Browsing by Subject "data integration"
Item Integrative Analyses for Multi-source Data with Multiple Shared Dimensions (2018-07) O'Connell, Michael
High dimensional data consists of matrices with a large number of features and is common across many fields of study, including genetics, imaging, and toxicology. This type of data is challenging to analyze because of its size, and many traditional methods are difficult to implement or interpret with such data. One way of handling high dimensional data is dimension reduction, which aims to reduce high rank, high-dimensional data sets into low-rank approximations, which maintain important components of the structures of the matrices but are easier to use in models. The most common method for dimension reduction of a single matrix is principal components analysis (PCA). Multi-source data are high dimensional data in which multiple data sources share a dimension. When two or more data sets share a feature set, this is called horizontal integration. When two or more data sets share a sample set, this is called vertical integration. Traditionally, there are two ways to approach such a data set: either analyze each data source separately or treat them as one data set. However, these analyses may miss important features that are unique to each data source or miss important relationships between the data sources. A number of recent methods have been developed for analyzing multi-source data that are either vertically or horizontally integrated. One such method is Joint and Individual Variation Explained (JIVE), which decomposes the variation in multi-source data sets into structure that is shared between data sources (called joint structure) and structure that is unique to each of the data sources (called individual structure) (Lock et al. 2013). We have created an R package, r.jive, that implements the JIVE algorithm and provides visualization tools for multi-source data, making multi-source methods more accessible.
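The low-rank approximation that PCA provides for a single matrix can be illustrated with a minimal Python sketch. This is not the r.jive implementation or the JIVE algorithm; it only shows rank reduction of one matrix through a truncated singular value decomposition. The function name `low_rank_approximation`, the convention that rows are samples and columns are features, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def low_rank_approximation(X, rank):
    """Rank-`rank` PCA approximation of X (rows = samples, columns = features).

    Returns the approximation together with the component scores and loadings.
    """
    col_means = X.mean(axis=0)
    Xc = X - col_means                      # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :rank] * s[:rank]         # sample coordinates on the components
    loadings = Vt[:rank]                    # feature weights of the components
    return scores @ loadings + col_means, scores, loadings


# Simulated (hypothetical) data: a rank-2 signal with small added noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 200))
X = signal + 0.01 * rng.normal(size=(50, 200))

approx, scores, loadings = low_rank_approximation(X, rank=2)
relative_error = np.linalg.norm(X - approx) / np.linalg.norm(X)
print(f"relative reconstruction error: {relative_error:.4f}")
```

Here a 50 x 200 matrix is summarized by two components, and the reconstruction error stays small because the simulated signal has rank 2 by construction.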
While there are several methods for data sets with horizontal or vertical integration, there have been no previous methods for data sets with simultaneous horizontal and vertical integration (which we call bidimensional integration). We introduce a method called Linked Matrix Factorization that allows for simultaneous decomposition of multi-source data sets with bidimensional integration. We also introduce a method for bidimensionally integrated data that are not normally distributed, called Generalized Linked Matrix Factorization, which is based on generalized linear models rather than ordinary least squares.

Item Weather- and process-based models for the estimation of maize and soybean growth, development, and yield (2020-01) Joshi, Vijaya
Field experiments in agricultural studies carried out at multiple sites and over several growing seasons are instrumental in improving crop management efforts. However, results from such experiments can have narrow applicability because they vary with spatial and temporal variability in crop management practices, weather, and soil properties. In this context, crop models offer opportunities to overcome the shortcomings of field experiments conducted over limited periods and locations by simulating crop growth, development, and yield under various scenarios of weather and soil conditions. Three experiments were conducted to evaluate the application of crop models for maize and soybean production in the growing conditions of the US central Corn Belt. The first experiment evaluated the use and relative importance of readily available weather data to develop weather-based yield estimation models for maize and soybean.
Total rainfall (Rain), average air temperature (Tavg), and the difference between maximum and minimum air temperature (Tdiff) at weekly, biweekly, and monthly time-scales from May to August were used to train multiple linear regression (MLR), generalized additive (GAM), and support vector machine (SVM) models to estimate county-level maize and soybean grain yields for Iowa, Illinois, Indiana, and Minnesota. For the total study area and at the individual state level, SVM outperformed the other models at all temporal levels for both maize and soybean. For maize, Tavg and Tdiff during July and August, and Rain during June and July, were relatively more important, whereas for soybean, Tavg in June, and Tdiff and Rain during August, were the more important weather variables for determining yield. The second experiment evaluated the simulation accuracy of the process-based CERES-Maize model. A study on four nitrogen (N) fertilizer rates for maize production was conducted during the growing seasons of 2016 and 2017 in southwest (Lamberton) and southern (Waseca) Minnesota. The model accurately simulated the dates of anthesis and maturity at both locations with a normalized root mean square error (nRMSE) of 1%. At Lamberton, final grain yield in both years was simulated within 16% nRMSE, but aboveground biomass was simulated with nRMSE as high as 30% and aboveground shoot N content and leaf area index (LAI) were simulated with nRMSEs as high as 38%. At Waseca, however, aboveground biomass over the growing season and final grain yield in both years were simulated with a 15% nRMSE, and aboveground shoot N content and LAI in both years were simulated with 21% nRMSE. Overall, the accuracy of the model was better with optimal growing conditions compared to no N fertilization. The third experiment compared the site-specific maize grain yield estimation accuracy of a stand-alone crop model, CERES-Maize, with a data-integration approach.
In the integration approach, maize biomass estimated using satellite multispectral data at the five (V5) and ten (V10) leaf-collar stages was used to optimize the total soil nitrogen concentration (SLNI) and soil fertility factor (SLPF) in CERES-Maize. Without integration, maize yield was simulated with an RMSE of 1264 kg ha⁻¹. Optimization of SLNI improved yield simulations at both V5 and V10. However, better simulations were obtained from optimization at V10 than at V5. Optimization of SLPF together with SLNI did not further improve the yield simulations.
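The RMSE and nRMSE statistics reported in the abstract above can be computed as in the following Python sketch. The abstract does not state how the RMSE was normalized; dividing by the mean of the observed values is a common choice in crop model evaluation and is an assumption here. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, simulated):
    """Root mean square error between paired observed and simulated values."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(s - o) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared_errors) / len(observed))


def nrmse_percent(observed, simulated):
    """RMSE as a percentage of the observed mean (assumed normalization)."""
    mean_observed = sum(observed) / len(observed)
    return 100.0 * rmse(observed, simulated) / mean_observed


# Hypothetical grain yields in Mg per hectare, for illustration only.
observed_yield = [10.0, 12.0, 8.0]
simulated_yield = [11.0, 11.0, 9.0]

print(rmse(observed_yield, simulated_yield))           # 1.0
print(nrmse_percent(observed_yield, simulated_yield))  # 10.0
```

Because nRMSE is unitless, it allows variables measured on different scales, such as grain yield, biomass, and LAI, to be compared on a common footing.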