Browsing by Subject "Prediction"
Now showing 1 - 9 of 9
Item Database of Nutrient Composition of Animal Protein Meals (2017-02-13)
Urriola, Pedro E; Kerr, Brian J; Jha, Rajesh; Shurson, Gerald C (urrio001@umn.edu); Department of Animal Sciences

An industry survey and an animal experiment were conducted to evaluate the compositional variability and DE and ME content of animal protein by-products, and to generate equations to predict DE and ME content from chemical analysis. For the 220 samples collected, the greatest concentration of CP was observed in blood meal (BM) and the least in meat and bone meal (MBM); the greatest concentration of ether extract (EE) was in meat meal and the least in BM; and ash content was greatest in MBM and least in BM, with Ca and P constituting 36.1 and 16.3% of the ash content, respectively. For the balance experiment, a corn-soybean meal basal diet was used, with test diets formulated by mixing 80% of the basal diet with 20% of the animal protein by-product, except for BM, which was included at 10 and 20% of the test diets. Ten groups of 24 gilts (final BW = 92.5 ± 7.4 kg) were used, with gilts randomly assigned to a test diet or the basal diet within each group, resulting in 16 replications per animal protein by-product or basal diet, except for the BM determinations (20 replications). Gilts were placed in metabolism crates and offered 2.4 kg daily of their assigned diet for 13 d, with total collection of feces and urine during the last 4 d. Gross energy was determined in the diets, feces, and urine to calculate the DE and ME content of each ingredient by the difference procedure, using the DE and ME content of the basal diet as covariates among groups of pigs. The DE content of the animal protein by-products ranged from 2,567 to 5,367 kcal/kg DM, and ME ranged from 2,340 to 4,783 kcal/kg DM. Using all animal protein by-products, the best-fit equations were: DE, kcal/kg DM = -2,468 + (1.26 × GE, kcal/kg DM), with R² = 0.84, SE = 390, and P < 0.01; and ME, kcal/kg DM = -2,331 + (1.15 × GE, kcal/kg DM), with R² = 0.86, SE = 327, and P < 0.01. The apparent total tract digestibility (ATTD) of Ca and P was also determined by the difference procedure; the average ATTD of Ca and P for the animal protein by-products, excluding BM and FM, was 27.1 and 39.1%, respectively. These data indicate that DE and ME varied substantially among the animal protein by-products and sources, and that a variety of nutritional components can be used to accurately predict DE and ME for finishing pigs. In addition, high dietary inclusion rates of animal protein by-products may result in low ATTD estimates of Ca and P, possibly because excessive concentrations of total Ca and P reduce digestibility.
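To make the reported equations concrete, the short sketch below applies them to a gross energy value. The coefficients are taken verbatim from the abstract, but `predict_energy` is a hypothetical helper written for illustration, not code from the study.

```python
def predict_energy(ge_kcal_per_kg_dm: float) -> dict:
    """Predict DE and ME (kcal/kg DM) of an animal protein by-product
    from its gross energy (GE), using the best-fit equations reported
    in the abstract (R^2 = 0.84 and 0.86, respectively)."""
    de = -2468 + 1.26 * ge_kcal_per_kg_dm
    me = -2331 + 1.15 * ge_kcal_per_kg_dm
    return {"DE": de, "ME": me}

# Example: a by-product with GE of 5,000 kcal/kg DM
print(predict_energy(5000))  # {'DE': 3832.0, 'ME': 3419.0}
```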
Item Dimension reduction and prediction in large p regressions (2009-05)
Adragni, Kofi Placid

A high-dimensional regression setting is considered with p predictors X = (X1, ..., Xp)^T and a response Y. The interest is in large p, possibly much larger than the number of observations n. Three novel methodologies based on Principal Fitted Components models (PFC; Cook, 2007) are presented: (1) Screening by PFC (SPFC) for variable screening when p is excessively large, (2) Prediction by PFC (PPFC), and (3) Sparse PFC (SpPFC) for variable selection. SPFC uses a test statistic to detect all predictors marginally related to the outcome. We show that SPFC subsumes the Sure Independence Screening of Fan and Lv (2008). PPFC is a novel methodology for prediction in regression where p can be as large as or larger than n. PPFC assumes that X|Y has a normal distribution and applies to continuous response variables regardless of their distribution. It yields better predictive accuracy than current leading methods. We adapt Sparse Principal Components Analysis (Zou et al., 2006) to the PFC model to develop SpPFC. SpPFC performs variable selection as well as forward linear model methods like the lasso (Tibshirani, 1996), and moreover encompasses cases where the distribution of Y|X is non-normal or the predictors and the response are not linearly related.
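The screening idea can be illustrated with a generic marginal filter. SPFC's actual test statistic is derived from the PFC model likelihood; the sketch below substitutes the absolute marginal correlation used by Sure Independence Screening (which SPFC subsumes), and the function name and settings are illustrative only.

```python
import numpy as np

def marginal_screen(X: np.ndarray, y: np.ndarray, keep: int) -> np.ndarray:
    """Rank predictors by a marginal association statistic and keep the
    top `keep` columns. SPFC uses a likelihood-based test statistic from
    the PFC model; absolute marginal correlation (as in Sure Independence
    Screening) stands in for it here."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    stat = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(stat)[::-1][:keep]  # indices of retained predictors

# Toy usage: 50 observations, 500 predictors, only the first 3 informative
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 500))
y = X[:, 0] - 2 * X[:, 1] + 0.5 * X[:, 2] + rng.standard_normal(50)
print(marginal_screen(X, y, keep=10))
```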
Item Genomewide Selection in Apple: Prediction and Postdiction in the University of Minnesota Apple Breeding Program (2019-10)
Blissett, Elizabeth

Although marker-assisted breeding is now considered routine in apple breeding programs, the adoption of genomewide selection is still in its infancy. Genomewide selection offers the potential to be a valuable tool for apple breeders. The first aim of this research was to assess the predictive ability of genomewide selection for fruit traits by testing an additive prediction model, a model fitting heterozygote effects, and a model fitting fixed effects for major QTL. The second aim was to assess the utility of genomewide selection for fruit traits in the University of Minnesota apple breeding program. This comprised two main objectives: a comparison of selections based on genomewide predictions to selections made based on phenotypic selection, and an analysis of the impact on predictive ability when full-sibs are included in the training data. This research finds that, in general, a simple linear model is the most efficient choice for genomewide selection in apple unless major-effect QTL are known, in which case including them as fixed effects may improve predictive ability. We also confirmed that selections based on genomewide predictions are consistent with selections based on traditional phenotypic selection, and that including 5 to 15 full-sibs from the test population in the training population can improve predictive ability.

Item Multi-source Data Decomposition and Prediction for Various Data Types (2022-12)
Palzer, Elise

Analyzing multi-source data, which are multiple views of data on the same subjects, has become increasingly common in molecular biomedical research. Recent methods have sought to uncover underlying structure and relationships within and/or between the data sources, and other methods have sought to build a predictive model for an outcome using all sources. However, existing methods that do both are presently limited because they either (1) consider only data structure shared by all datasets while ignoring structures unique to each source, or (2) extract underlying structures first without regard to the outcome. In Chapter 2, we propose a method called supervised joint and individual variation explained (sJIVE) [1] that can simultaneously (1) identify shared (joint) and source-specific (individual) underlying structure and (2) build a linear prediction model for an outcome using these structures. Simulations show sJIVE to outperform existing methods when large amounts of noise are present in the multi-source data, and an application to data from the COPDGene study reveals gene expression and proteomic patterns that are predictive of lung function. In Chapter 3, we extend sJIVE to allow for binary and/or count data and to incorporate sparsity, using a method called sparse exponential family sJIVE (sesJIVE). Simulations show the non-sparse version of sesJIVE to outperform existing methods when the data are Bernoulli- or Poisson-distributed with large amounts of noise, and sesJIVE outperforms other JIVE-based methods in our application to the COPDGene data. Lastly, Chapter 4 discusses our R package, sup.r.jive, which implements sJIVE, sesJIVE, and a previous method called JIVE-Predict [2]. Summary and visualization tools for all three methods are also available within the package.
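A minimal sketch of the joint-plus-individual decomposition idea underlying these methods, assuming sources are stacked as features by subjects with subjects shared across blocks. This one-pass SVD approach is an unsupervised simplification closer to plain JIVE; it is not the authors' sJIVE/sesJIVE algorithm, and the fixed joint rank and single individual rank for all blocks are illustrative choices.

```python
import numpy as np

def joint_individual_sketch(blocks, r_joint, r_indiv):
    """Crude JIVE-style decomposition of multi-source data (one simplified
    pass, not the authors' sJIVE algorithm): estimate joint structure from
    the SVD of the row-concatenated blocks, then individual structure from
    each block's residual."""
    X = np.vstack(blocks)                       # sources share columns (subjects)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    J = U[:, :r_joint] * s[:r_joint] @ Vt[:r_joint]   # joint approximation
    joint_parts = np.split(J, np.cumsum([b.shape[0] for b in blocks])[:-1])
    indiv_parts = []
    for b, j in zip(blocks, joint_parts):
        R = b - j                               # remove joint signal from block
        Ur, sr, Vrt = np.linalg.svd(R, full_matrices=False)
        indiv_parts.append(Ur[:, :r_indiv] * sr[:r_indiv] @ Vrt[:r_indiv])
    return joint_parts, indiv_parts
```

In the published JIVE family, individual ranks vary by source and orthogonality constraints separate the joint and individual subspaces; sJIVE additionally fits the outcome jointly with the decomposition.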
Item Partial sufficient dimension reduction in regression (2011-07)
Kim, Do Hyang

In this thesis we propose a new model-based reduction method to reduce the dimension of one set of predictors while retaining another set of predictors, and a response when one is present. Based on the probabilistic PCA model (Tipping and Bishop, 1999) and the PFC model (Cook, 2007), we develop new models in the partial dimension reduction context: partial probabilistic PCA models, partial PFC models, and combining models. We estimate the parameters of interest for the partial sufficient reduction using the maximum likelihood method. Methods are also proposed for prediction in partial PFC models.

Item Reduced-complexity epileptic seizure prediction with EEG (2012-01)
Park, Yun Sang

In this dissertation we seek to develop and validate reliable frameworks for human epileptic seizure prediction with electrocorticogram (ECoG) and intracranial electroencephalogram (iEEG). The long-term goal of the research is to develop and prototype an implantable device that can reliably provide alarms prior to a seizure in real time. The specific objective is to develop a patient-specific algorithm that can predict seizures in ECoG/iEEG with high sensitivity, a low false positive rate, and low complexity. The dissertation starts by demonstrating that seizures can be predicted with linear features of spectral power, and it ultimately focuses on developing a reduced-complexity algorithm that can decode ECoG/iEEG for human epileptic seizure prediction with high sensitivity and an acceptably low false positive rate. In contrast to prior prediction work, most of which focused on nonlinear measurements, we demonstrate that human epileptic seizures can be predicted with linear features of ECoG/iEEG in a machine-learning classification approach. To begin with, a new patient-specific seizure prediction algorithm with ECoG/iEEG is proposed. It is novel in the sense that it employs a set of linear features of spectral power from ECoG/iEEG for prediction, and that predictive models are established and tested with cost-sensitive support vector machines (SVMs) using a double cross-validation method. The proposed algorithm is tested over 433.2 hours of interictal recordings including 80 seizure events from 18 human epileptics in the Freiburg EEG database. It achieves a high sensitivity of 97.5% (78/80), a low false alarm rate of 0.27 per hour (117 FPs in total), and a total false prediction time of 13.0% (56.4 hours). Bipolar and/or time-differential preprocessing improves sensitivity and false positive rate. For the seizure prediction algorithm to be practically feasible on an implantable device, we further propose a reduced-complexity prediction algorithm. We lower the complexity of the algorithm by investigating and using a small number of essential features and by replacing nonlinear SVMs and the Kalman filter with linear SVMs and moving-average filters. The key features are determined using RFE-SVM (recursive feature elimination using SVMs). The proposed reduced-complexity algorithm significantly lowers the predictor's complexity, and thus its power consumption, while retaining high sensitivity and a reasonable false positive rate. It is tested on 9 subjects from the Freiburg database for whom the initial prediction algorithm achieved a high prediction rate, and with the selected six time-differential features it demonstrates a high sensitivity of 100.0% (38/38), a low false positive rate of 0.15 per hour (32 FPs in total), and a false positive portion of 9.65% (21.0 hours) over 217.5 hours of interictal recordings. Time-differential preprocessing has been observed to improve the prediction rate significantly. Additionally, we develop an enhanced approach for seizure onset and offset detection in rat ECoG, an improved version of an automatic seizure detection and termination system for in vivo rat ECoG. We improve the system by using the specific 14-22 Hz frequency range, which has been observed to be more relevant to seizure onsets than other bands; by using spectral power instead of spectral amplitudes as the feature set; and by substituting the Kalman filter for the 2-point moving average filter. While the proposed algorithm provides better detection statistics, it also lowers the system's complexity by removing the fast Fourier transform computation and keeping a single structure, even though it uses two different spectral features for detecting onsets and offsets.
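A rough sketch of the kind of spectral-power-plus-linear-SVM pipeline described above, assuming scipy and scikit-learn are available. The frequency bands, window shapes, and class weights are placeholder choices rather than the dissertation's settings, and the double cross-validation and postprocessing filters are omitted.

```python
import numpy as np
from scipy.signal import welch
from sklearn.svm import LinearSVC

# Illustrative frequency bands (Hz); the dissertation's exact bands may differ.
BANDS = [(0.5, 4), (4, 8), (8, 13), (13, 30), (30, 48)]

def band_power_features(window: np.ndarray, fs: float) -> np.ndarray:
    """Spectral power in fixed bands for each channel of one EEG window
    (channels x samples), flattened into a feature vector."""
    freqs, psd = welch(window, fs=fs, nperseg=min(window.shape[1], 256))
    feats = [psd[:, (freqs >= lo) & (freqs < hi)].sum(axis=1) for lo, hi in BANDS]
    return np.log(np.concatenate(feats) + 1e-12)  # log-power stabilizes scale

# Windows labeled preictal (1) or interictal (0); the cost-sensitive linear
# SVM penalizes missed preictal windows more heavily, as described above.
fs = 256.0
rng = np.random.default_rng(1)
windows = rng.standard_normal((100, 6, 1024))   # toy data: 100 windows, 6 channels
y = rng.integers(0, 2, size=100)                # toy labels
X = np.array([band_power_features(w, fs) for w in windows])
clf = LinearSVC(class_weight={1: 5.0, 0: 1.0}).fit(X, y)
```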
Item Reducing And Exploiting Genotype By Environment Interaction In The Context Of Genomewide Prediction In 969 Maize Biparental Populations (2018-03)
Ames, Nicholas

Multi-environment testing remains crucial in genomewide selection, and environmental effects (Ej) complicate selection. We aimed to: 1) determine whether past years' data on previous populations can be used to eliminate environments for a current training population; 2) assess whether genomewide predictions can reduce the number of environments used in subsequent phenotypic selection; 3) identify which statistical models and environmental factors are best for estimating Ej; and 4) determine the predictive ability of models that include and exclude genotype × environment interaction effects. A total of 969 Monsanto maize (Zea mays L.) populations were genotyped and phenotyped at multiple U.S. locations from 2000 to 2008. Environmental data from the National Oceanic and Atmospheric Administration were gathered and interpolated. The data included 154,000 lines, 448 million marker data points, 3.2 million phenotypic observations, 1,395 unique environments, and 1.3 million environmental covariable data points. For 27 biparental crosses that we chose as test populations, environmental stability and an index that used genomewide predictions and phenotypic data could replace one out of four environments in phenotypic evaluation. Correlations between predicted and observed Ej were between 0.25 and 0.35 even when only two environmental factors (precipitation and heat units) were used. A nonfactorial model for line performance in a given environment effectively combined both the line genetic effect and Ej, doubling prediction ability for grain yield and test weight. We speculate that this model can be combined with crop modelling to further improve prediction of plant performance in a given environment.

Item Variable Selection and Prediction in "Messy" High-Dimensional Data (2017-07)
Brown, Benjamin

When dealing with high-dimensional data, performing variable selection in a regression model reduces statistical noise and simplifies interpretation. There are many ways to perform variable selection when standard regression assumptions are met, but few that work well when one or more assumptions are violated. In this thesis, we propose three variable selection methods that outperform existing methods in such "messy data" situations where standard regression assumptions are violated. First, we introduce Thresholded EEBoost (ThrEEBoost), an iterative algorithm which applies a gradient-boosting-type algorithm to estimating equations. Extending its progenitor, EEBoost (Wolfson, 2011), ThrEEBoost allows multiple coefficients to be updated at each iteration. The number of coefficients updated is controlled by a threshold parameter on the magnitude of the estimating equation. By allowing more coefficients to be updated at each iteration, ThrEEBoost can explore a greater diversity of variable selection "paths" (i.e., sequences of coefficient vectors) through the model space, possibly finding models with smaller prediction error than any of those on the path defined by EEBoost. In a simulation of data with correlated outcomes, ThrEEBoost reduced prediction error compared to more naive methods and the less flexible EEBoost. We also applied our method to the Box Lunch Study, where it reduced error in predicting BMI from longitudinal data. Next, we propose a novel method, MEBoost, for variable selection and prediction when covariates are measured with error. To do this, we incorporate a measurement-error-corrected score function due to Nakamura (1990) into the ThrEEBoost framework. In both simulated and real data, MEBoost outperformed the CoCoLasso (Datta and Zou, 2017), a recently proposed penalization-based approach to variable selection in the presence of measurement error, and the (non-measurement-error-corrected) Lasso. Lastly, we consider the case where multiple regression assumptions may be simultaneously violated. Motivated by the idea of stacking, specifically the SuperLearner technique (van der Laan et al., 2007), we propose a novel method, Super Learner Estimating Equation Boosting (SuperBoost). SuperBoost performs variable selection in the presence of multiple data challenges by combining the results from variable selection procedures that are each tailored to address a different regression assumption violation. The ThrEEBoost framework is a natural fit for this approach, since the component "learners" (i.e., violation-specific variable selection techniques) are fairly straightforward to construct and implement using various estimating equations. We illustrate the application of SuperBoost on simulated data with both correlated outcomes and covariate measurement error, and show that it performs as well as or better than methods which address only one (or neither) of these factors.
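The ThrEEBoost update rule can be sketched for the ordinary linear-regression estimating equation U(beta) = X'(y - X*beta); the thesis applies it to general estimating equations (e.g., GEE-type equations for correlated outcomes), and the step size, threshold, and stopping rule below are illustrative choices.

```python
import numpy as np

def threeboost(X, y, tau=0.5, eps=0.01, n_iter=500):
    """Thresholded EEBoost sketch for the linear-model estimating equation
    U(beta) = X'(y - X beta): at each iteration, every coefficient whose
    estimating-equation component is within a factor tau of the largest
    takes a small step eps. tau = 1 recovers EEBoost's single-coordinate
    update; step size, threshold, and stopping rule are illustrative."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        U = X.T @ (y - X @ beta)            # estimating equation at current beta
        top = np.abs(U).max()
        if top < 1e-8:                      # equations (approximately) solved
            break
        active = np.abs(U) >= tau * top     # coefficients passing the threshold
        beta[active] += eps * np.sign(U[active])
    return beta
```

Coefficients never stepped remain exactly zero, so the iteration count traces out a variable selection path, with tau controlling how many coordinates move per iteration.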
Item Viewing Expert Judgment in Individual Assessments through the Lens Model: Testing the Limits of Expert Information Processing (2018-05)
Yu, Martin

The predictive validity of any assessment system is only as good as its implementation. Across a range of decision settings, algorithmic methods of data combination often match or outperform the accuracy of expert judges. Despite this, individual assessments still largely rely on expert judgment to combine candidate assessment information into an overall assessment rating for predicting desired criteria such as job performance, typically yielding lower validity than could theoretically be achieved. Based on archival assessment data from an international management consulting firm, this dissertation presents three related studies with the overarching goal of better understanding why expert judgment tends to be less accurate in prediction than algorithmic methods. First, the Lens Model is used to decompose expert judgment in individual assessments into its component processes, finding that when combining assessment information into an overall evaluation of candidates, expert assessors use suboptimal predictor weighting schemes and apply them inconsistently across candidates. Second, the ability of expert assessors to tailor their judgments to maximise predictive power for specific organisations is tested by comparing models of expert judgment local and non-local to organisations. No evidence of valid organisation-specific expertise is found, as models of expert judgment local to a specific organisation performed only as well as models non-local to that organisation. Third, the importance of judgmental consistency in maximising predictive validity is evaluated by testing random weighting schemes. Here, simply exercising mindless consistency, that is, applying a randomly generated weighting scheme the same way to every candidate, is enough to outperform expert judgment. Taken together, these results suggest that the suboptimal and inconsistent ways in which expert assessors combine assessment information drastically hamper their ability to make accurate evaluations of assessment candidates and to predict candidates' future job performance. Even if assessors demonstrate valid expert insight from time to time, over the long run the opportunities for human error far outweigh any opportunity for expertise to be truly influential. Implications of these findings for how assessments are conducted in organisations, as well as recommendations for how expert judgment could still be retained and improved, are discussed.
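The third study's consistency result can be illustrated with a toy simulation under assumed (hypothetical) predictor validities and noise levels, not the dissertation's data: a fixed random weighting applied identically to every candidate predicts an outcome better than near-optimal weights applied inconsistently.

```python
import numpy as np

# Toy illustration: a random weighting scheme applied *consistently* can
# predict better than near-optimal weights applied *inconsistently*.
# All validities, noise levels, and sample sizes are arbitrary choices.
rng = np.random.default_rng(42)
n, p = 5000, 6
true_w = np.array([0.5, 0.4, 0.3, 0.2, 0.1, 0.0])   # criterion weights
X = rng.standard_normal((n, p))                      # candidate scores
performance = X @ true_w + rng.standard_normal(n)    # future job performance

# Consistent judge: one random (but fixed, positive) weight vector for everyone
w_rand = rng.uniform(0, 1, p)
pred_consistent = X @ w_rand

# Inconsistent judge: starts from the true weights but jitters them per candidate
jitter = rng.standard_normal((n, p)) * 0.8
pred_inconsistent = np.sum(X * (true_w + jitter), axis=1)

for name, pred in [("consistent random", pred_consistent),
                   ("inconsistent expert", pred_inconsistent)]:
    r = np.corrcoef(pred, performance)[0, 1]
    print(f"{name:>20s} weights: validity r = {r:.2f}")
```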