This readme.txt file was generated on 20210518 by Kayla R. Altendorf

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: Phenotype and SNP marker data for an intermediate wheatgrass (Thinopyrum intermedium) nested association mapping (NAM) population evaluated in St. Paul, MN and Salina, KS in 2017 and 2018

2. Author Information

   Principal Investigator Contact Information
      Name: Kayla R. Altendorf
      Institution: USDA-ARS Forage Seed and Cereal Research Unit
      Address: 24106 N. Bunn Rd. Prosser, WA 99350
      Email: kayla.altendorf@usda.gov
      ORCID: https://orcid.org/0000-0002-4224-0171

   Associate or Co-investigator Contact Information
      Name: Lee R. DeHaan
      Institution: The Land Institute
      Address: 2440 E Water Well Rd, Salina, KS 67401
      Email: dehaan@landinstitute.org
      ORCID: https://orcid.org/0000-0002-6368-5241

   Associate or Co-investigator Contact Information
      Name: James A. Anderson
      Institution: University of Minnesota Department of Agronomy and Plant Genetics
      Address: 1991 Upper Buford Circle St. Paul, MN 55108
      Email: ander319@umn.edu
      ORCID: https://orcid.org/0000-0003-4655-6517

   Associate or Co-investigator Contact Information
      Name: Steven R. Larson
      Institution: USDA-ARS Forage and Range Research Laboratory
      Address: Utah State University, 696 North 1100 East, Logan, UT 84322
      Email: steve.larson@usda.gov
      ORCID: https://orcid.org/0000-0003-2742-2134

3. Date of data collection: 20170424 - 20180717

4. Geographic location of data collection (where was data collected?): St. Paul, MN and Salina, KS

5. Information about funding sources that supported the collection of the data: The Malone Family Foundation, Perennial Agriculture Project, The Land Institute

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: Please cite us if you use the data.

2. Links to publications that cite or use the data:

   Altendorf, K. R., Larson, S. R., DeHaan, L. R., Crain, J., Neyhart, J., Dorn, K. M., & Anderson, J. A. (2021). Nested association mapping reveals the genetic architecture of spike emergence and anthesis timing in intermediate wheatgrass. G3, 11(3), jkab025. https://doi.org/10.1093/g3journal/jkab025

   Altendorf, K. R., DeHaan, L. R., Heineck, G. C., Zhang, X., & Anderson, J. A. Floret site utilization and reproductive tiller number are primary components of grain yield in intermediate wheatgrass spaced plants. Crop Science. https://doi.org/10.1002/csc2.20385

3. Links to other publicly accessible locations of the data: www.github.com/kraltendorf

4. Links/relationships to ancillary data sets: No

5. Was data derived from another source? No

6. Recommended citation for the data:

   Altendorf, Kayla R; DeHaan, Lee R; Anderson, James A; Larson, Steven R. (2021). Phenotype and SNP marker data for an intermediate wheatgrass (Thinopyrum intermedium) nested association mapping (NAM) population evaluated in St. Paul, MN and Salina, KS in 2017 and 2018. Retrieved from the Data Repository for the University of Minnesota, https://doi.org/10.13020/cnpv-kf72.

---------------------
DATA & FILE OVERVIEW
---------------------

1. File List

   A. Filename: NAM_Data.txt
      Short description: Phenotypic data from the intermediate wheatgrass (Thinopyrum intermedium) population.

   B. Filename: backbone.csv
      Short description: A .csv file that includes a balanced number of individuals per family for easy analysis. Used as a backbone to filter the phenotypic data so that only relevant individuals are included, and to include missing values.

   C. Filename: NAM_GATK_filtered_maf_selfs.vcf
      Short description: Single nucleotide polymorphism (SNP) marker data for the intermediate wheatgrass NAM. These data have been filtered by minor allele frequency (MAF = 0.005), and selfs have been removed.
      The methods for obtaining the SNP marker data are described in detail in: Kayla R Altendorf, Steven R Larson, Lee R DeHaan, Jared Crain, Jeff Neyhart, Kevin M Dorn, James A Anderson, Nested association mapping reveals the genetic architecture of spike emergence and anthesis timing in intermediate wheatgrass, G3 Genes|Genomes|Genetics, Volume 11, Issue 3, March 2021, jkab025, https://doi.org/10.1093/g3journal/jkab025

   D. Filename: NAM_GATK_imputed.vcf
      Short description: Imputed single nucleotide polymorphism (SNP) marker data for the intermediate wheatgrass NAM. The data were imputed using the program LinkImpute (Money, D., K. Gardner, Z. Migicovsky, H. Schwaninger, G. Zhong, et al. 2015. LinkImpute: Fast and Accurate Genotype Imputation for Nonmodel Organisms. G3 5(11): 2383–2390. doi: 10.1534/g3.115.021667).

   E. Filename: new_key.txt
      Short description: The key to connect sample names in the .vcf files (listed as flowcell_lane_sample) to the sample names used in the phenotypic data analyses.

   F. Filename: WGN_map.map
      Short description: A custom consensus genetic map for the nested association mapping (NAM) population, created using JoinMap and described in https://doi.org/10.1093/g3journal/jkab025.

   G. Filename: Trait_ID
      Short description: Trait IDs used in NAM_Data.txt, their units, and the methods used to collect them.

2. Relationship between files: As described above.

3. Additional related data collected that was not included in the current data package:

4. Are there multiple versions of the dataset? No

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data: See the publications listed under item 2 of SHARING/ACCESS INFORMATION (https://doi.org/10.1093/g3journal/jkab025 and https://doi.org/10.1002/csc2.20385).

2. Methods for processing the data: See www.github.com/kraltendorf and the repositories beginning with “IWG_”.

3. Instrument- or software-specific information needed to interpret the data: N/A

4. Standards and calibration information, if appropriate:

5. Environmental/experimental conditions:

6. Describe any quality-assurance procedures performed on the data:

7. People involved with sample collection, processing, analysis and/or submission: Kayla Altendorf, Lee R. DeHaan, James A. Anderson, Steve Larson, Jeffrey Neyhart, Kevin M. Dorn, Jared Crain, Garett Heineck

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: NAM_Data.txt
-----------------------------------------

1. Number of variables: 19

2. Number of cases/rows: 443352

3. Missing data codes: “.”, NA, “”

4. Variable List

   A. Name: plant_id
      Description: identity used in the intermediate wheatgrass database maintained by Kansas State University and The Land Institute.

   B. Name: germplasm_id
      Description: identity used across phenotypic and genotypic data, where “WGN” indicates wheatgrass NAM, followed by the family ID (two digits); then “C” for the common parent or “D” for the donor parent as the mother, or “P” for a cloned parent of the population; followed by the individual progeny ID within the family.

   C. Name: experiment_id
      Description: as used in the intermediate wheatgrass database maintained by Kansas State University and The Land Institute, where 16_SAL_WGN and 16_STP_WGN indicate the planting year, the location (SAL = Salina, KS; STP = St. Paul, MN), and the population name (WGN).

   D. Name: planting_date
      Description: date that the experiment was planted.

   E. Name: row
      Description: row position in the field.

   F. Name: range
      Description: column or range position in the field.

   G. Name: serpentine
      Description: serpentine order for walking through the field.

   H. Name: rep
      Description: replication or block (n = 2 per environment).

   I. Name: female_parent
      Description: female parent of the individual, where WGN59P01 is the common parent.

   J. Name: male_parent
      Description: male parent of the individual, where WGN59P01 is the common parent.

   K. Name: family_name
      Description: family from which the individual was derived or to which it belongs.

   L. Name: sample_number
      Description: sample number, used in cases where multiple data points were collected per trait per individual.

   M. Name: trait_id
      Description: the phenotypic trait; please see the Trait_ID file for more information.

   N. Name: phenotype_value
      Description: the value associated with the phenotype. For details on units and methods, please see the Trait_ID file.

   O. Name: phenotype_date
      Description: the date the data were collected, when available.

   P. Name: phenotype_year
      Description: year that the data were collected.

   Q. Name: phenotype_person
      Description: the person who collected the data.

   R. Name: notes
      Description: any notes associated with the sample.

   S. Name: sampling_datetime
      Description: time and date the data were collected (unavailable in this dataset).
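
Example: reading NAM_Data.txt and decoding germplasm_id

The sketch below is illustrative only and is not part of the authors' processing code (see the “IWG_” repositories for that). It shows one way to apply the missing data codes and the germplasm_id naming scheme described above, using only the Python standard library. Assumptions: the file is tab-delimited with a header row matching the variable names listed here (the delimiter is not stated in this readme), and germplasm_id values follow the pattern “WGN” + two-digit family + C/D/P + a number. The function names and the small in-memory sample, including its trait name and values, are hypothetical.

```python
import csv
import io
import re

# Missing data codes listed for NAM_Data.txt: ".", NA, and the empty string.
MISSING_CODES = {".", "NA", ""}

# Assumed pattern, inferred from the germplasm_id description: "WGN", a
# two-digit family ID, one letter (C, D, or P), then a numeric ID.
GERMPLASM_RE = re.compile(r"^WGN(\d{2})([CDP])(\d+)$")

CODE_MEANING = {
    "C": "common parent as mother",
    "D": "donor parent as mother",
    "P": "cloned parent of the population",
}


def parse_germplasm_id(germplasm_id):
    """Split a germplasm_id into its parts; return None if it does not match."""
    match = GERMPLASM_RE.match(germplasm_id.strip())
    if match is None:
        return None
    family, code, number = match.groups()
    return {
        "family": family,
        "code": code,
        "role": CODE_MEANING[code],
        "number": number,
    }


def clean(value):
    """Return None for missing data codes, otherwise the stripped string."""
    if not isinstance(value, str):
        return None  # short or overlong rows yield non-string fields
    value = value.strip()
    return None if value in MISSING_CODES else value


def read_phenotypes(handle, delimiter="\t"):
    """Yield one dict per row with missing data codes converted to None."""
    for row in csv.DictReader(handle, delimiter=delimiter):
        yield {key: clean(value) for key, value in row.items()}


# Tiny made-up sample standing in for the real file (values are hypothetical).
sample = io.StringIO(
    "germplasm_id\ttrait_id\tphenotype_value\n"
    "WGN59P01\texample_trait\t12.5\n"
    "WGN07C012\texample_trait\t.\n"
)
for record in read_phenotypes(sample):
    parts = parse_germplasm_id(record["germplasm_id"])
    print(record["germplasm_id"], parts["family"], parts["code"],
          record["phenotype_value"])
```

To use this with the real file, open it with open("NAM_Data.txt", newline="") and pass the handle to read_phenotypes, changing the delimiter argument if the file turns out not to be tab-delimited. IDs that do not fit the assumed pattern return None from parse_germplasm_id and should be inspected by hand.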