Browsing by Subject "Databases"
Now showing 1 - 12 of 12
Item: Big Temporally-Detailed Graph Data Analytics
(2015-06) Gunturi, Venkata Maruti Viswanath
Increasingly, temporally-detailed graphs are of a size, variety, and update rate that exceed the capability of current computing technologies. Such datasets can be called Big Temporally-Detailed Graph (Big-TDG) Data. Examples include temporally-detailed (TD) roadmaps, which provide typical travel speeds experienced on every road segment for thousands of departure-times in a typical week, and temporally-detailed (TD) social networks, which contain a temporal trace of social interactions among the individuals in the network over a time window. Big-TDG data has transformative potential. For instance, a 2011 McKinsey Global Institute report estimates that location-aware data could save consumers hundreds of billions of dollars annually by 2020 by helping vehicles avoid traffic congestion via next-generation routing services such as eco-routing. However, Big-TDG data presents big challenges for the current computer science state of the art. First, Big-TDG data violates the cost-function decomposability assumption of current conceptual models for representing and querying temporally-detailed graphs. Second, the voluminous nature of Big-TDG data can violate the stationary ranking-of-candidate-solutions assumption of dynamic programming based techniques such as Dijkstra's shortest path algorithm. This thesis proposes novel approaches to address these challenges. To address the first challenge, this thesis proposes a novel conceptual model called "Lagrangian Xgraphs," which models non-decomposability through a series of overlapping (in space and time) relations, each representing a single atomic unit that retains the required semantics. An initial study shows that Lagrangian Xgraphs are more convenient for representing diverse temporally-detailed graph datasets and comparing candidate travel itineraries. For the second challenge, this thesis proposes a novel divide-and-conquer technique called the "critical-time-point (CTP) based approach," which efficiently divides the given time interval (over which non-stationary ranking is observed) into disjoint sub-intervals over which dynamic programming based techniques can be applied. Theoretical and experimental analyses show that CTP based approaches outperform the current state of the art.
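The critical-time-point idea lends itself to a small illustration. The sketch below finds departure times at which the best of several candidate routes changes; the toy routes, their piecewise costs, and the scan granularity are all invented for illustration, and this is not the thesis's implementation:

```python
# Hypothetical sketch of the critical-time-point (CTP) idea: find the
# departure times at which the best of several candidate routes changes.
# Route costs, the 0-24h horizon, and the two toy routes are invented;
# the thesis defines CTPs formally over full TD roadmaps.

def route_cost(piecewise, t):
    """Travel time of a route for departure time t, given piecewise
    (start_hour, cost) breakpoints sorted by start_hour."""
    cost = piecewise[0][1]
    for start, c in piecewise:
        if t >= start:
            cost = c
    return cost

# Two toy routes: route A is fast off-peak, route B is constant.
routes = {
    "A": [(0, 10), (7, 30), (9, 10)],   # slow from 7am to 9am
    "B": [(0, 15)],                     # constant 15 minutes
}

def critical_time_points(routes, horizon, step=0.25):
    """Scan departure times; record each time the arg-min route changes.
    Between consecutive CTPs the ranking is stationary, so one static
    shortest-path run (e.g., Dijkstra) suffices per sub-interval."""
    ctps, best, t = [], None, 0.0
    while t <= horizon:
        now = min(routes, key=lambda r: route_cost(routes[r], t))
        if now != best:
            ctps.append((t, now))
            best = now
        t += step
    return ctps

print(critical_time_points(routes, horizon=24))
# [(0.0, 'A'), (7.0, 'B'), (9.0, 'A')]
```

A full implementation would compute the CTPs analytically from the edge cost functions rather than by scanning, which is the source of the efficiency the abstract claims.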
Item: Compile and Make Digital the Lithologies for all NRRI Drill Logs, with Emphasis on the Duluth Complex Drill Holes (An Addendum to an Earlier NRRI Database)
(University of Minnesota Duluth, 2009) Severson, Mark J; Oreskovich, Julie A; Patelke, Marsha Meinders
This report and associated databases are updates on many of the holes that have been recently logged by the Natural Resources Research Institute (NRRI) in the Keweenawan Duluth Complex, the Paleoproterozoic Biwabik Iron Formation of the Mesabi Iron Range, and the Archean Deer Lake Complex of northeastern Itasca County, Minnesota. The main emphasis of this project was to update some of the databases that were presented in an earlier NRRI report (Patelke, 2003) with regard to lithologies in Duluth Complex drill holes logged by the NRRI since 2003 (and discussed in Severson and Hauck, 2008). To date, all but around 30 of the publicly available drill holes in the Duluth Complex have been logged by the NRRI; these 30 holes are all that are missing from either the databases in this report or the databases in Patelke (2003). It is strongly suggested that the databases herein be combined, at the user's discretion, with corresponding databases in Patelke (2003) in order to make an all-encompassing database of lithologies for all NRRI-logged drill holes in the Duluth Complex. A secondary goal of this project was to present a header file database for all the holes that were recently drilled in the Duluth Complex (post-2003). Most of these holes are not yet publicly available, but data regarding drill hole locations can be gleaned from abandonment files. Combining the Duluth Complex header files in this report with the similar header file in Patelke (2003) could provide an all-encompassing database of locations for all of the holes drilled to date in the Duluth Complex (pre-2010 data). This combining of the data is left to the user's discretion. Lastly, additional goals of this project (time permitting) were to present lithologic databases for all holes logged by the NRRI in the Mesabi Iron Range and, to a much lesser extent, holes logged by the NRRI in the Deer Lake Complex. The database for the Mesabi Iron Range contains information for almost 300 drill holes (over 5,947 lines of lithologic data) regarding the lithologic picks pertaining to informal members and submembers of the iron-formation. The data in this file are about 80% complete, in that not all of the iron-formation submembers are presented for holes drilled at the Keetac Taconite mine or in the Coleraine, MN, area (the latter holes are discussed in Zanko et al., 2003).

Item: Conducting Inductive Logic Programming Directly in Database Management Systems
(2015-07) Koppula, Akshay
Inductive logic programming (ILP) is a research area formed at the intersection of machine learning and logic programming. Given a set of background knowledge as well as positive and negative examples of a concept, an ILP system attempts to learn rules that cover all the positive examples and none of the negative examples by using the background knowledge. Over the years, ILP has been used extensively in medical applications. Existing ILP systems are implemented in Prolog, using first-order logic, but Prolog does not integrate well with database systems, where much of the data of interest is stored, and it is rarely used in business applications. This thesis presents a novel approach: storing the facts (background knowledge and examples) required for ILP in databases and using Java for easy access and retrieval of the stored knowledge. Since most ILP machine-learning data sets can be stored easily in databases, this approach provides an easier-to-use technique. Facts are stored as tables in the database, and rules are stored as database views by using a database join on the multiple predicates in a rule. A sequential covering algorithm that uses best-first search to learn rules for ILP problems is implemented in this thesis. The results obtained on two real-world test data sets using this approach are compared with traditional systems; the accuracy of the system presented in this thesis is on par with that of the traditional systems. These results are reassuring, and the system provides an easy-to-use approach for ILP users.
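The facts-as-tables, rules-as-views representation this abstract describes can be sketched in a few lines of SQL. The family-relations example, the table names, and the use of SQLite below are illustrative assumptions, not the thesis's actual schema or its Java access layer:

```python
# Illustrative sketch of storing ILP facts as tables and a learned rule
# as a view over a join of the predicate tables in its body. All names
# and data are invented for this example.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Background-knowledge predicates become tables of ground facts.
cur.execute("CREATE TABLE parent (par TEXT, child TEXT)")
cur.execute("CREATE TABLE female (name TEXT)")
cur.executemany("INSERT INTO parent VALUES (?, ?)",
                [("ann", "bob"), ("ann", "cay"), ("bob", "dee")])
cur.executemany("INSERT INTO female VALUES (?)", [("ann",), ("cay",)])

# A learned rule, e.g. daughter(C, P) :- parent(P, C), female(C),
# becomes a view joining the predicate tables in the rule body.
cur.execute("""
    CREATE VIEW daughter AS
    SELECT parent.child AS d, parent.par AS p
    FROM parent JOIN female ON parent.child = female.name
""")

# Querying the view evaluates the rule against the stored facts.
print(cur.execute("SELECT * FROM daughter").fetchall())
# [('cay', 'ann')]
```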
Item: Dataset: U.S. Public University Responses to Public Records Requests for Structured Data
(2021-03-10) Anderson, Jonathan; Wiley, Sarah K.
This dataset is the product of a study that assessed how public universities in the United States respond to public records requests of varying complexity for structured data. When a university provided a substantive response, the following variables were coded:
1. Nature of response: whether the university produced responsive data, produced or offered different data than what was requested, asserted there were no records, required prepayment before processing, required in-person inspection, or denied the request.
2. Response time: the number of business days (i.e., omitting weekends and holidays) from the day after a request was filed to the day a substantive response was received.
3. Format: the format in which data were released: Excel, CSV, PDF, or web page.
4. New record: whether the university expressly asserted that it is not obligated to create a new record in response to a public records request.
5. Fee estimate: the amount of money a university estimated it would cost to process the request.

Item: Efficiently Storing and Discovering Knowledge in Databases via Inductive Logic Programming Implemented Directly in Databases
(2015-07) Repaka, Ravikanth
Inductive Logic Programming (ILP) uses inductive, statistical techniques to generate hypotheses that incorporate the given background knowledge to induce concepts covering most of the positive examples and few of the negative examples. ILP draws on techniques from both logic programming and machine learning. Research in this field has been evolving for several years, and many systems have been developed to solve ILP problems; most are implemented in Prolog and take their input as text files or similar formats. This thesis proposes using a relational database to store background knowledge and positive and negative examples as database entities. This information is then manipulated directly with ILP techniques in the process of generating hypotheses. The database does the heavy lifting by efficiently handling and storing the very large number of intermediate rules generated in the process of finding the required hypotheses. The proposed system is helpful for generating hypotheses from relational databases, and it also provides a mechanism to load data that exists in text files into a database. A sequential covering algorithm is used to find hypotheses that cover all positive examples and few or none of the negative examples. The proposed system was tested on real-world datasets, Mutagenesis and Chess Endgame, and the generated hypotheses and their accuracy are similar to the results of existing systems tested on the same datasets. The results are promising and encourage using the system in the future to discover knowledge in other datasets or relational databases.
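Both ILP items above rely on sequential covering: learn one rule, remove the positive examples it covers, and repeat until the positives are exhausted. A minimal, generic skeleton of that loop follows; the learn_one_rule stub and the toy data are placeholders standing in for the best-first search described in the abstracts, not code from either thesis:

```python
# Generic sequential-covering skeleton. learn_one_rule here is a greedy
# stand-in for a best-first search over candidate rule bodies; all data
# and rule representations are illustrative assumptions.

def learn_one_rule(positives, negatives, candidate_rules):
    """Pick the candidate covering the most remaining positives while
    covering no negatives (a stand-in for best-first search)."""
    best, best_score = None, 0
    for rule in candidate_rules:
        covered_pos = {e for e in positives if rule(e)}
        covered_neg = {e for e in negatives if rule(e)}
        if not covered_neg and len(covered_pos) > best_score:
            best, best_score = rule, len(covered_pos)
    return best

def sequential_covering(positives, negatives, candidate_rules):
    positives = set(positives)
    learned = []
    while positives:
        rule = learn_one_rule(positives, negatives, candidate_rules)
        if rule is None:          # no consistent rule remains
            break
        learned.append(rule)
        positives -= {e for e in positives if rule(e)}  # drop covered
    return learned

# Toy run: learn rules covering even numbers but not odd ones.
pos, neg = [0, 2, 4, 6], [1, 3, 5]
rules = [lambda e: e % 2 == 0, lambda e: e > 10]
print(len(sequential_covering(pos, neg, rules)))  # 1
```

In the database-backed setting the abstracts describe, each candidate rule would be a view and coverage would be computed by querying it, with the intermediate rules themselves stored in tables.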
Item: Exploration Drill Hole Lithology, Geologic Unit, Copper-Nickel Assay, and Location Database for the Keweenawan Duluth Complex, Northeastern Minnesota
(University of Minnesota Duluth, 2003) Patelke, Richard L
This report and database compile virtually all publicly available drill hole location data, lithological logging data, copper-nickel assay data, and rock quality data for about 2,145 exploration drill holes in and near the Keweenawan Duluth Complex in northeastern Minnesota. The database covers about 1,779,600 feet of drilling, with about 70,000 lithological intervals and about 70,000 separate assay intervals. All of this drilling is in St. Louis, Lake, and Cook counties. The digital data are presented in an industry-standard exploration and mine modeling software format (Gemcom for Windows), as well as in spreadsheet and comma-delimited files for use in other programs; this format can be adapted for use in a GIS program such as ArcView. The purpose of this report is to make these data available to mineral exploration companies in a format almost immediately usable by them.

Item: Job Bank Merrick Community Services Client Database
(2001) Mendez, Marianallet

Item: Liberation Characteristics of Taconite Plant Feeds
(University of Minnesota Duluth, 2007) Ersayin, Salih

Item: Transit System Monitoring and Design
(1990-01) Stephanedes, Yorgos J.
Statistical techniques were developed for extracting the most significant features (indicators) from a transit system database and classifying proposed and existing transit systems according to the selected features. The database was constructed using information from all previous years available from Mn/DOT, the Census, and other sources, and emphasized raw characteristics of the operating system and the area socioeconomics. Feature extraction was performed so that the minimum number of features was extracted that could classify the transit systems with maximum accuracy. The classification method was designed around the database and is flexible, so it can use future data to update the database at minimum cost. The transit system patterns resulting from the classification method were identified according to need and performance, and the main characteristics were specified for each pattern. These characteristics and descriptions identifying each pattern determine whether it should be modified. A controlled experiment was required to test the classification method: a randomly selected part of the data was classified by the method, and the unselected data was treated as a control group. After the experiment, the percentage of misclassifications was calculated. (A minimal illustration of this train-and-evaluate split appears after the final item below.)

Item: Transitway Data Management Project
(Center for Transportation Studies, 2010-01) Borah, Jason C.; Craig, William J.
The purpose of this project is to make data available for studying the impact of transitways in the Twin Cities Metropolitan Area. We are doing this in two ways: 1) documenting the databases used by University of Minnesota researchers funded by TIRP (the Transitways Impact Research Program), and 2) developing a directory of public and private data sources that could be used by future TIRP researchers. This report documents work done to accomplish those goals. Preliminary work has been done using the Minnesota Metadata Guidelines to document two completed TIRP projects. Ten new data sources have been added to MetroGIS's DataFinder catalog, along with two new data categories. These sources and categories are documented in the report.
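Picking up the forward reference in the Transit System Monitoring and Design item: the sketch below illustrates that workflow of selecting a minimal feature set, classifying, and measuring misclassification on held-out data used as a control group. It assumes scikit-learn and wholly synthetic data; the report's actual statistical techniques are not specified here and may differ:

```python
# Hypothetical sketch of the transit-report workflow: select the most
# significant features, classify, and compute the misclassification
# percentage on a control group of unselected data. The synthetic data
# and model choices are illustrative assumptions, not the report's.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in for raw operating-system and socioeconomic characteristics.
X, y = make_classification(n_samples=200, n_features=12,
                           n_informative=4, random_state=0)

# A randomly selected part of the data is classified by the method;
# the unselected part is treated as the control group.
X_tr, X_ctl, y_tr, y_ctl = train_test_split(X, y, test_size=0.3,
                                            random_state=0)

# Feature extraction: keep only the few most significant indicators.
selector = SelectKBest(f_classif, k=4).fit(X_tr, y_tr)
clf = LogisticRegression().fit(selector.transform(X_tr), y_tr)

# Percentage of misclassifications on the control group.
errors = (clf.predict(selector.transform(X_ctl)) != y_ctl).mean()
print(f"misclassification rate: {errors:.1%}")
```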