Datasets to test the importance of genetic architecture in marker selection decisions for genomic prediction
2023-03-01
Published Date
2023-03-01
Author Contact
Hirsch, Candice N
cnhirsch@umn.edu
Type
Dataset
Genomics Data
Simulation Data
Abstract
This dataset contains the input files needed to simulate traits for maize recombinant inbred lines (RILs) and to run genomic prediction models with different marker types. Using real genotypic information from 333 maize RILs, with single nucleotide polymorphism (SNP) and structural variant (SV) information projected from their seven sequenced parental lines, we simulated traits with different genetic architectures in multiple environments using the R package simplePHENOTYPES. We varied the heritability, the number of quantitative trait loci (QTLs), the type of causative variant (SNPs or SVs), and the variant effect sizes. Weather data from five locations in the U.S. Midwest in 2020 were used to generate a residual correlation matrix among environments. After performing a two-stage analysis with a multivariate GBLUP prediction model for each marker type and genetic architecture, we obtained prediction accuracies using two types of cross-validation (CV1 and CV2). For instructions on how to perform this analysis, and for the analysis scripts, please see https://github.com/HirschLabUMN/genomic_prediction_svs
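To illustrate how heritability enters a trait simulation of this kind, the following minimal Python sketch simulates a single additive trait in one environment. It is not the simplePHENOTYPES implementation and does not reproduce the multi-environment residual correlation described above; the function name `simulate_trait`, the dosage coding of genotypes, and all parameter values are assumptions made for illustration only.

```python
import random
import statistics


def simulate_trait(genotypes, qtl_indices, effects, heritability, seed=0):
    """Simulate one additive trait (hypothetical helper, single environment).

    genotypes:    one list of numeric marker codes per line (e.g. 0/2 for inbreds)
    qtl_indices:  marker columns treated as causative variants
    effects:      additive effect of each causative variant
    heritability: target share of phenotypic variance explained by genetics
    """
    if not 0 < heritability <= 1:
        raise ValueError("heritability must be in (0, 1]")
    rng = random.Random(seed)

    # Genetic value: sum of marker code times effect over the causative variants.
    genetic = [
        sum(line[j] * a for j, a in zip(qtl_indices, effects))
        for line in genotypes
    ]

    # Choose the residual variance so that var_g / (var_g + var_e) = heritability.
    var_g = statistics.pvariance(genetic)
    var_e = var_g * (1 - heritability) / heritability
    sd_e = var_e ** 0.5

    phenotypes = [g + rng.gauss(0, sd_e) for g in genetic]
    return genetic, phenotypes


# Made-up genotypes for four lines and two causative markers.
example_genotypes = [[0, 2], [2, 0], [2, 2], [0, 0]]
genetic_values, phenotypes = simulate_trait(
    example_genotypes, qtl_indices=[0, 1], effects=[1.0, 0.5], heritability=0.5
)
```

The realized heritability in any one simulated sample will only approximate the target, because the residuals are random draws.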
Description
Files include structural variant calls of the maize parental lines, genotypic data for recombinant inbred lines (RILs), simulated trait values for each RIL with different genetic architectures, input data for genomic prediction models with different marker types, and genomic prediction accuracy for each combination of simulated genetic architecture and predictors. More detailed information for each file can be found in the README file.
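The genotypic data for the RILs are distributed in hapmap format (supp_file2.hmp.txt.gz). As a starting point for working with that file, the sketch below parses hapmap text using only the Python standard library. It assumes the conventional tab-delimited hapmap layout with 11 metadata columns before the sample columns; the README should be checked for the actual layout and for how SVs are encoded. The function name `read_hapmap` and the example rows are hypothetical.

```python
import csv
import io

# Conventional hapmap layout (assumption): rs#, alleles, chrom, pos, strand,
# assembly#, center, protLSID, assayLSID, panelLSID, QCcode, then one column per line.
HAPMAP_METADATA_COLUMNS = 11


def read_hapmap(handle):
    """Parse hapmap text from an open text handle (hypothetical helper).

    Returns the sample names and a list of per-marker dictionaries.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    samples = header[HAPMAP_METADATA_COLUMNS:]
    markers = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        markers.append(
            {
                "id": row[0],
                "alleles": row[1],
                "chrom": row[2],
                "pos": int(row[3]),
                "calls": dict(zip(samples, row[HAPMAP_METADATA_COLUMNS:])),
            }
        )
    return samples, markers


# Made-up two-line example, not taken from the dataset.
example_text = (
    "rs#\talleles\tchrom\tpos\tstrand\tassembly#\tcenter\tprotLSID\t"
    "assayLSID\tpanelLSID\tQCcode\tRIL_1\tRIL_2\n"
    "snp_1\tA/T\t1\t100\t+\tNA\tNA\tNA\tNA\tNA\tNA\tAA\tTT\n"
)
example_samples, example_markers = read_hapmap(io.StringIO(example_text))
```

For the downloaded file, a handle opened with `gzip.open(path, "rt", newline="")` can be passed to the same function. At roughly 109 MB compressed, the full file may be better processed row by row than held in memory as a list.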
Referenced by
Della Coletta, R., Fernandes, S.B., Monnahan, P.J. et al. (2023). Importance of genetic architecture in marker selection decisions for genomic prediction. Theoretical and Applied Genetics 136, 220.
https://doi.org/10.1007/s00122-023-04469-w
Replaces
https://hdl.handle.net/11299/252793.1
Funding information
United States Department of Agriculture (2018-67013-27571)
National Science Foundation (IOS-1546727)
Minnesota Agricultural Experiment Station
Suggested citation
Della Coletta, Rafael; Fernandes, Samuel B; Monnahan, Patrick J; Mikel, Mark A; Bohn, Martin O; Lipka, Alexander E; Hirsch, Candice N. (2023). Datasets to test the importance of genetic architecture in marker selection decisions for genomic prediction. Retrieved from the Data Repository for the University of Minnesota (DRUM), https://doi.org/10.13020/atq4-1b58.
Files
File | Description | Size
supp_file1.vcf.gz | Raw structural variant calls of the maize parental lines in VCF format | 28.14 MB
supp_file2.hmp.txt.gz | Filtered genotypic data of recombinant inbred lines (RILs) in hapmap format with projected SNPs and SVs | 108.77 MB
supp_file3.zip | Files containing simulated trait values for each RIL across different genetic architectures | 11.21 MB
supp_file4.zip | Files containing ANOVA results for each simulated scenario | 1.91 MB
supp_file5.zip | Files containing all the marker datasets used for genomic prediction | 24.65 MB
supp_file6.zip | Files containing simulated trait values for each RIL across different genetic architectures to understand the relationship between LD and prediction accuracy | 28.02 MB
supp_file7.zip | Files containing all the marker datasets used for genomic prediction to understand the relationship between LD and prediction accuracy | 16.93 MB
supp_file8.xlsx | Genomic prediction accuracy of different marker types for each replicate of simulated traits where either SNPs or SVs were the causative variants | 178.38 KB
supp_file9.xlsx | Genomic prediction accuracy of markers with low (r² < 0.5), moderate (0.5 < r² < 0.9) and high (r² > 0.9) linkage disequilibrium (LD) to a QTL for each replicate of simulated traits where both SNPs and SVs were the causative variants | 138.57 KB
Archival_Data.zip | Archival data (CSV format) | 186.39 KB
Readme_Coletta_2023.txt | Description of the data | 21.5 KB
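The description of supp_file9.xlsx groups markers by their LD to a QTL into low, moderate and high classes. The Python sketch below shows one way such a grouping could be computed. It assumes r² is the squared Pearson correlation between numeric genotype codes of two markers, and it assigns the boundary values 0.5 and 0.9 to the moderate class, which the description leaves unspecified. The function names `r_squared` and `ld_category` are hypothetical, and this is not the authors' script.

```python
def r_squared(marker_a, marker_b):
    """Squared Pearson correlation between two numerically coded markers."""
    if len(marker_a) != len(marker_b) or not marker_a:
        raise ValueError("markers must be non-empty and of equal length")
    n = len(marker_a)
    mean_a = sum(marker_a) / n
    mean_b = sum(marker_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(marker_a, marker_b))
    ss_a = sum((a - mean_a) ** 2 for a in marker_a)
    ss_b = sum((b - mean_b) ** 2 for b in marker_b)
    if ss_a == 0 or ss_b == 0:
        raise ValueError("r2 is undefined for a monomorphic marker")
    return cov * cov / (ss_a * ss_b)


def ld_category(r2):
    """Map an r2 value to the LD classes named in the file description."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("r2 must be between 0 and 1")
    if r2 < 0.5:
        return "low"
    if r2 > 0.9:
        return "high"
    return "moderate"  # includes exactly 0.5 and 0.9 (assumption)


# Made-up genotype codes for four lines at a marker and a QTL.
marker = [0, 2, 0, 2]
qtl = [0, 2, 0, 2]
category = ld_category(r_squared(marker, qtl))
```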