Datasets to test the importance of genetic architecture in marker selection decisions for genomic prediction



Published Date

2023-03-01

Author Contact

Hirsch, Candice N
cnhirsch@umn.edu

Type

Dataset
Genomics Data
Simulation Data

Abstract

This dataset contains the input files needed to simulate traits for maize recombinant inbred lines (RILs) and to run genomic prediction models with different marker types. Using real genotypic information from 333 maize RILs, with single nucleotide polymorphism (SNP) and structural variant (SV) information projected from their seven sequenced parental lines, we simulated traits with different genetic architectures in multiple environments using the R package simplePHENOTYPES. We varied the heritability, the number of quantitative trait loci (QTLs), the type of causative variant (SNPs or SVs), and the variant effect sizes. Weather data from five locations in the U.S. Midwest in 2020 were used to generate a residual correlation matrix among environments. After performing a two-stage analysis with a multivariate GBLUP prediction model for each marker type and genetic architecture, we obtained prediction accuracies using two types of cross-validation (CV1 and CV2). For instructions on how to perform this analysis, and for the analysis scripts, please see https://github.com/HirschLabUMN/genomic_prediction_svs
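To make the role of heritability and QTL number concrete, the following is a minimal Python sketch of how a single additive trait can be simulated at a target heritability. It is not the simplePHENOTYPES implementation and does not reproduce the multi-environment design described above. The function name `simulate_trait`, the normally distributed effect sizes, and the toy 0/2 genotype coding are all assumptions made for illustration.

```python
import random
import statistics


def simulate_trait(genotypes, n_qtl, h2, rng):
    """Simulate one additive trait (hypothetical helper, illustration only).

    genotypes: list of lines, each a list of numeric allele dosages.
    n_qtl:     number of markers sampled as causal variants.
    h2:        target heritability, 0 < h2 <= 1.
    Returns (phenotypes, qtl_indices, genetic_values).
    """
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must be in (0, 1]")
    n_markers = len(genotypes[0])
    qtl = rng.sample(range(n_markers), n_qtl)      # causal marker indices
    effects = [rng.gauss(0.0, 1.0) for _ in qtl]   # assumed effect-size model
    # Genetic value of a line: sum of dosage * effect over the causal markers.
    g = [sum(line[j] * a for j, a in zip(qtl, effects)) for line in genotypes]
    var_g = statistics.pvariance(g)
    # Pick the residual variance so that var_g / (var_g + var_e) equals h2.
    var_e = var_g * (1.0 - h2) / h2
    e = [rng.gauss(0.0, var_e ** 0.5) for _ in g]
    y = [gi + ei for gi, ei in zip(g, e)]
    return y, qtl, g


# Toy example: 333 homozygous lines, 50 markers coded 0 or 2.
rng = random.Random(1)
toy_genotypes = [[rng.choice((0, 2)) for _ in range(50)] for _ in range(333)]
y, qtl, g = simulate_trait(toy_genotypes, n_qtl=10, h2=0.5, rng=rng)
realized_h2 = statistics.pvariance(g) / statistics.pvariance(y)
```

In a finite sample the realized ratio of genetic to phenotypic variance only approximates the target, which is one reason simulations of this kind are usually replicated.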

Description

Files include structural variant calls of the maize parental lines, genotypic data for recombinant inbred lines (RILs), simulated trait values for each RIL with different genetic architectures, input data for genomic prediction models with different marker types, and genomic prediction accuracy for each combination of simulated genetic architecture and predictors. More detailed information for each file can be found in the README file.
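The prediction accuracies in these files come from two cross-validation schemes. As commonly defined in the multi-environment genomic prediction literature, CV1 holds out a line in every environment (predicting untested lines), while CV2 holds out individual line-by-environment records (predicting lines tested in some environments but not others). The sketch below shows one way to build such hold-out folds in Python. The function `cv_masks` and its arguments are hypothetical and are not taken from the deposited analysis scripts.

```python
import random


def cv_masks(lines, envs, n_folds, scheme, rng):
    """Return n_folds sets of (line, env) cells to hold out (illustration only).

    CV1: a held-out line is masked in all environments.
    CV2: single line-by-environment cells are masked, so a held-out line is
         usually still observed in some other environment.
    """
    if scheme == "CV1":
        units = list(lines)
    elif scheme == "CV2":
        units = [(line, env) for line in lines for env in envs]
    else:
        raise ValueError("scheme must be 'CV1' or 'CV2'")
    rng.shuffle(units)
    folds = [set() for _ in range(n_folds)]
    for i, unit in enumerate(units):
        if scheme == "CV1":
            # Mask every environment of this line in the same fold.
            folds[i % n_folds].update((unit, env) for env in envs)
        else:
            folds[i % n_folds].add(unit)
    return folds


# Toy example with made-up line and environment names.
toy_lines = [f"RIL{i}" for i in range(20)]
toy_envs = ["env1", "env2", "env3"]
cv1_folds = cv_masks(toy_lines, toy_envs, 5, "CV1", random.Random(0))
cv2_folds = cv_masks(toy_lines, toy_envs, 5, "CV2", random.Random(0))
```

Under random CV2 assignment a line can occasionally have all of its environments fall in the same fold; a stricter version would resample until every held-out line keeps at least one observed environment.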

Funding information

United States Department of Agriculture (2018-67013-27571)
National Science Foundation (IOS-1546727)
Minnesota Agricultural Experiment Station

View/Download file
File | Description | Size
supp_file1.vcf.gz | Raw structural variant calls of the maize parental lines in VCF format | 28.14 MB
supp_file2.hmp.txt.gz | Filtered genotypic data of recombinant inbred lines (RILs) in hapmap format with projected SNPs and SVs | 385.46 MB
supp_file3.tar.gz | Files containing simulated trait values for each RIL across different genetic architectures | 21.79 MB
supp_file4.tar.gz | Files containing ANOVA results for each simulated scenario | 4.4 MB
supp_file5.tar.gz | Files containing all the marker datasets used for genomic prediction | 17.31 MB
supp_file6.tar.gz | Files containing simulated trait values for each RIL across different genetic architectures to understand the relationship between LD and prediction accuracy | 8.17 MB
supp_file7.tar.gz | Files containing all the marker datasets used for genomic prediction to understand the relationship between LD and prediction accuracy | 11.15 MB
supp_file8.xlsx | Genomic prediction accuracy of different marker types for each replicate of simulated traits where either SNPs or SVs were the causative variants | 347.83 KB
supp_file9.xlsx | Genomic prediction accuracy of different marker types for each replicate of simulated traits where both SNPs and SVs were the causative variants | 708.01 KB
supp_file10.xlsx | Genomic prediction accuracy of markers with low (r2 < 0.5), moderate (0.5 < r2 < 0.9) and high (r2 > 0.9) linkage disequilibrium (LD) to a QTL for each replicate of simulated traits where both SNPs and SVs were the causative variants | 269.35 KB
Archival data.zip | Archival data (CSV format) | 782.72 KB
Readme_Colletta_2023.txt | Description of the data | 22.85 KB
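For readers who want to inspect the genotypic data outside of R, the following Python sketch streams a gzipped hapmap file one marker at a time. It assumes the standard hapmap layout of 11 tab-separated metadata columns (rs#, alleles, chrom, pos, and so on) followed by one column per line. How SVs are named and encoded in supp_file2.hmp.txt.gz is not described here, so check the README before relying on any field. The function name `read_hapmap` is hypothetical.

```python
import csv
import gzip

N_META = 11  # metadata columns preceding the sample columns in standard hapmap


def read_hapmap(path):
    """Yield one dict per marker from a (optionally gzipped) hapmap file.

    Assumes a tab-delimited file whose first four columns are marker id,
    alleles, chromosome, and an integer position.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader)
        samples = header[N_META:]  # line (sample) names
        for row in reader:
            yield {
                "id": row[0],
                "alleles": row[1],
                "chrom": row[2],
                "pos": int(row[3]),
                "calls": dict(zip(samples, row[N_META:])),
            }
```

Because the function is a generator, a call such as `next(read_hapmap("supp_file2.hmp.txt.gz"))` would read only the header and first marker rather than loading the whole file into memory.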
