This readme.txt file was generated on 2023-01-24 by Rafael Della Coletta

Recommended citation for the data: Della Coletta, Rafael; Liese, Sharon E; Fernandes, Samuel B; Mikel, Mark A; Bohn, Martin O; Lipka, Alexander E; Hirsch, Candice N. (2023). Datasets to build marker effect networks. Retrieved from the Data Repository for the University of Minnesota. https://doi.org/10.13020/b1e0-q828.

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: Datasets to build marker effect networks

2. Author Information

    Author Contact: Candice N Hirsch (cnhirsch@umn.edu)

    Name: Rafael Della Coletta
    Institution: University of Minnesota
    Email: della028@umn.edu
    ORCID: 0000-0001-6988-9598

    Name: Sharon E Liese
    Institution: University of Illinois
    Email: sharonf2@illinois.edu
    ORCID: -

    Name: Samuel B Fernandes
    Institution: University of Arkansas
    Email: samuelbf@uark.edu
    ORCID: 0000-0001-8269-535X

    Name: Mark A Mikel
    Institution: University of Illinois
    Email: mmikel@illinois.edu
    ORCID: -

    Name: Martin O Bohn
    Institution: University of Illinois
    Email: mbohn@illinois.edu
    ORCID: 0000-0003-2364-6229

    Name: Alexander E Lipka
    Institution: University of Illinois
    Email: alipka@illinois.edu
    ORCID: 0000-0003-1571-8528

    Name: Candice N Hirsch
    Institution: University of Minnesota
    Email: cnhirsch@umn.edu
    ORCID: 0000-0002-8833-3023

3. Date published or finalized for release: 2023-01-23

4. Date of data collection (single date, range, approximate date): 05/01/2019 to 10/31/2019 and 04/22/2020 to 10/31/2020.

5. Geographic location of data collection (where was data collected?): Bloomington, IL, USA; Champaign, IL, USA (2 different sites); St. Paul, MN, USA; Janesville, WI, USA

6. Information about funding sources that supported the collection of the data:
    United States Department of Agriculture (2018-67013-27571)
    Minnesota Agricultural Experiment Station

7. Overview of the data (abstract):
    This dataset contains the input files to build marker effect networks and identify markers associated with environmental adaptability. These networks are built by adapting software commonly used for building gene co-expression networks, with marker effects across growth environments as the input data. Here, we provide grain yield data from 400 maize hybrids grown across nine environments in the U.S. Midwest, a set of ~10,000 non-redundant markers, and environmental data containing 17 weather parameters in 3-day intervals collected from planting date to the end of the season. For instructions on how to perform this analysis, and for the analysis scripts, please see https://github.com/HirschLabUMN/meffs_networks. For more details on marker effect networks, please see the preprint at https://www.biorxiv.org/content/10.1101/2023.01.19.524532v1.

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: CC0 1.0 Universal (http://creativecommons.org/publicdomain/zero/1.0/)

2. Links to publications that cite or use the data:
    Rafael Della Coletta, Sharon E. Liese, Samuel B. Fernandes, Mark A. Mikel, Martin O. Bohn, Alexander E. Lipka, Candice N. Hirsch. 2023. Linking genetic and environmental factors through marker effect networks to understand trait plasticity. bioRxiv. Available at: https://www.biorxiv.org/content/10.1101/2023.01.19.524532v1

3. Was data derived from another source? If yes, list source(s): No

4. Terms of Use: Data Repository for the U of Minnesota (DRUM)
    By using these files, users agree to the Terms of Use. https://conservancy.umn.edu/pages/drum/policies/#terms-of-use

---------------------
DATA & FILE OVERVIEW
---------------------

File List

    Filename: supp_file1.csv
    Short description: Raw yield data of maize hybrids across different growth environments

    Filename: supp_file2.txt
    Short description: Best Linear Unbiased Estimates (BLUEs) of maize hybrids within each environment

    Filename: supp_file3.hmp.txt
    Short description: Raw genotypic data of maize parental lines in hapmap format

    Filename: supp_file4.hmp.txt
    Short description: Raw genotypic data of recombinant inbred lines (RILs) in hapmap format

    Filename: supp_file5.hmp.txt
    Short description: Filtered genotypic data of maize hybrids derived from RILs in hapmap format

    Filename: supp_file6.txt
    Short description: Environmental covariates data

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data: See Methods in Della Coletta et al., 2023 (https://www.biorxiv.org/content/10.1101/2023.01.19.524532v1)

2. Methods for processing the data: See Methods in Della Coletta et al., 2023 (https://www.biorxiv.org/content/10.1101/2023.01.19.524532v1)

3. Instrument- or software-specific information needed to interpret the data: All datasets are plain text and can be opened with a text editor or spreadsheet program (NotePad, Atom, Microsoft Excel, Google Sheets, etc.). For downstream analysis, please refer to https://github.com/HirschLabUMN/meffs_networks

4. Environmental/experimental conditions: See Methods in Della Coletta et al., 2023 (https://www.biorxiv.org/content/10.1101/2023.01.19.524532v1)

5. Describe any quality-assurance procedures performed on the data: See Methods in Della Coletta et al., 2023 (https://www.biorxiv.org/content/10.1101/2023.01.19.524532v1)

6. People involved with sample collection, processing, analysis and/or submission: Rafael Della Coletta, Sharon E. Liese, Samuel B. Fernandes, Mark A. Mikel, Martin O. Bohn, Alexander E. Lipka, Candice N. Hirsch

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: supp_file1.csv
-----------------------------------------

1. Number of variables: 18

2. Number of cases/rows: 10417

3. Missing data codes: NA

4. Variable List

    A. Name: Experiment
       Description: ID of experiment (NIFA) with respective year of data collection

    B. Name: Location
       Description: Location where hybrids were grown

    C. Name: Replication
       Description: Replicate number

    D. Name: Hybrid
       Description: ID of maize hybrid

    E. Name: ParentA
       Description: ID of one of the hybrid's parents

    F. Name: ParentB
       Description: ID of the hybrid's other parent

    G. Name: Fam
       Description: Family number for a hybrid

    H. Name: B73
       Description: Presence (1) or absence (0) of B73 line in hybrid's pedigree

    I. Name: PHG39
       Description: Presence (1) or absence (0) of PHG39 line in hybrid's pedigree

    J. Name: PH207
       Description: Presence (1) or absence (0) of PH207 line in hybrid's pedigree

    K. Name: PHG47
       Description: Presence (1) or absence (0) of PHG47 line in hybrid's pedigree

    L. Name: PHG35
       Description: Presence (1) or absence (0) of PHG35 line in hybrid's pedigree

    M. Name: LH82
       Description: Presence (1) or absence (0) of LH82 line in hybrid's pedigree

    N. Name: Group
       Description: Hybrid group

    O. Name: PlotWeight
       Description: Plot grain weight at harvest

    P. Name: Moisture
       Description: Grain moisture at harvest (%)

    Q. Name: TWT
       Description: Test weight at harvest (lb/bu)

    R. Name: YLD
       Description: Grain yield at 15.5% moisture (bu/ac)

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: supp_file2.txt
-----------------------------------------

1. Number of variables: 3

2. Number of cases/rows: 3600

3. Missing data codes: NA

4. Variable List

    A. Name: hybrid
       Description: ID of maize hybrid

    B. Name: env
       Description: ID of growth environment

    C. Name: BLUEs
       Description: Best linear unbiased estimates of grain yield

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: supp_file3.hmp.txt
-----------------------------------------

1. Number of variables: 18

2. Number of cases/rows: 20139

3. Missing data codes: NA for columns A to K; NN for remaining columns

4. Variable List

    A. Name: rs
       Description: Marker ID

    B. Name: alleles
       Description: Possible alleles for marker

    C. Name: chrom
       Description: Chromosome to which the marker was mapped

    D. Name: pos
       Description: Position of the marker on the chromosome

    E. Name: strand
       Description: Orientation of the marker in the DNA strand

    F. Name: assembly
       Description: Version of reference sequence assembly

    G. Name: center
       Description: Name of genotyping center that produced the genotypes

    H. Name: protLSID
       Description: ID for HapMap protocol

    I. Name: assayLSID
       Description: ID for HapMap assay used for genotyping

    J. Name: panel
       Description: ID for panel of individuals genotyped

    K. Name: QCcode
       Description: Quality control ID for all entries

    Remaining columns. Names: B73 to LH82
       Description: Marker genotypes of inbred parents

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: supp_file4.hmp.txt
-----------------------------------------

1. Number of variables: 344

2. Number of cases/rows: 20139

3. Missing data codes: NA for columns A to K; NN for remaining columns

4. Variable List

    A. Name: rs
       Description: Marker ID

    B. Name: alleles
       Description: Possible alleles for marker

    C. Name: chrom
       Description: Chromosome to which the marker was mapped

    D. Name: pos
       Description: Position of the marker on the chromosome

    E. Name: strand
       Description: Orientation of the marker in the DNA strand

    F. Name: assembly
       Description: Version of reference sequence assembly

    G. Name: center
       Description: Name of genotyping center that produced the genotypes

    H. Name: protLSID
       Description: ID for HapMap protocol

    I. Name: assayLSID
       Description: ID for HapMap assay used for genotyping

    J. Name: panel
       Description: ID for panel of individuals genotyped

    K. Name: QCcode
       Description: Quality control ID for all entries

    Remaining columns. Names: B73*PHG39-B-B-12-1-1-B-B to LH82*PH207-B-B-31-1-1-B
       Description: Marker genotypes of recombinant inbred lines

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: supp_file5.hmp.txt
-----------------------------------------

1. Number of variables: 411

2. Number of cases/rows: 10334

3. Missing data codes: NA for columns A to K; NN for remaining columns

4. Variable List

    A. Name: rs
       Description: Marker ID

    B. Name: alleles
       Description: Possible alleles for marker

    C. Name: chrom
       Description: Chromosome to which the marker was mapped

    D. Name: pos
       Description: Position of the marker on the chromosome

    E. Name: strand
       Description: Orientation of the marker in the DNA strand

    F. Name: assembly
       Description: Version of reference sequence assembly

    G. Name: center
       Description: Name of genotyping center that produced the genotypes

    H. Name: protLSID
       Description: ID for HapMap protocol

    I. Name: assayLSID
       Description: ID for HapMap assay used for genotyping

    J. Name: panel
       Description: ID for panel of individuals genotyped

    K. Name: QCcode
       Description: Quality control ID for all entries

    Remaining columns. Names: UIUC1 to UIUC400
       Description: Marker genotypes of hybrids

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: supp_file6.txt
-----------------------------------------

1. Number of variables: 4

2. Number of cases/rows: 7803

3. Missing data codes: NA

4. Variable List

    A. Name: env
       Description: ID of growth environment

    B. Name: covariable
       Description: ID of environmental parameter

    C. Name: intervals
       Description: Day of growing season in which covariable was obtained

    D. Name: value
       Description: Value of environmental parameter for a particular day interval
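
The missing-data codes listed above differ between file types (NA in the long-format tables, NN in the genotype columns of the hapmap files), so they need to be handled when the files are loaded. The following is a minimal sketch in Python (standard library only) of one way to do that. It is not the authors' pipeline, which is available at https://github.com/HirschLabUMN/meffs_networks. The helper names read_long_table and read_hapmap are hypothetical, and the sketch assumes that the .txt files are tab-delimited with a header row, which this readme does not state. The sample values are invented for illustration and are not taken from the dataset.

```python
import csv
import io

# Columns A to K in the hapmap variable lists above (rs ... QCcode).
N_HAPMAP_META = 11


def read_long_table(handle, row_key, col_key, value_key,
                    delimiter="\t", missing="NA"):
    """Pivot a long-format table into {row: {col: float or None}}.

    Assumes a header row and a tab delimiter (both are assumptions).
    """
    matrix = {}
    for record in csv.DictReader(handle, delimiter=delimiter):
        raw = record[value_key]
        value = None if raw == missing else float(raw)
        matrix.setdefault(record[row_key], {})[record[col_key]] = value
    return matrix


def read_hapmap(handle, delimiter="\t", missing="NN"):
    """Return (sample_names, {marker_id: [genotype or None, ...]}).

    The first 11 columns are treated as marker metadata and skipped,
    except for the marker ID in the first column.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    samples = header[N_HAPMAP_META:]
    genotypes = {}
    for row in reader:
        calls = [None if call == missing else call
                 for call in row[N_HAPMAP_META:]]
        genotypes[row[0]] = calls
    return samples, genotypes


# Toy inputs, invented for illustration only (not real values from the files).
blues_text = (
    "hybrid\tenv\tBLUEs\n"
    "UIUC1\tenvA\t180.5\n"
    "UIUC1\tenvB\tNA\n"
    "UIUC2\tenvA\t172.0\n"
)
hmp_text = (
    "rs\talleles\tchrom\tpos\tstrand\tassembly\tcenter\tprotLSID\t"
    "assayLSID\tpanel\tQCcode\tUIUC1\tUIUC2\n"
    "m1\tA/G\t1\t1000\t+\tNA\tNA\tNA\tNA\tNA\tNA\tAA\tNN\n"
)

blues = read_long_table(io.StringIO(blues_text), "hybrid", "env", "BLUEs")
samples, genotypes = read_hapmap(io.StringIO(hmp_text))
print(blues["UIUC1"])
print(samples, genotypes["m1"])
```

To apply the same helpers to the real files, the in-memory strings can be replaced with open file handles, for example open("supp_file2.txt", newline=""), provided the delimiter assumption holds. If a file turns out to use a different delimiter, pass it through the delimiter argument.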