This readme.txt file was generated on 2024-08-23 by data curator and updated by L. McGilp

Recommended citation for the data: McGilp, Lillian; Haas, Matthew W; Shao, Mingqin; Millas, Reneth; Castell-Miller, Claudia; Kern, Anthony J; Shannon, Laura M; Kimball, Jennifer A. (2024). Data for "Towards Stewardship of Wild Species and Their Domesticated Counterparts: A Case Study in Northern Wild Rice (Zizania palustris L.)". Retrieved from the Data Repository for the University of Minnesota (DRUM), https://doi.org/10.13020/TPV1-8J41.

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: Data for "Towards Stewardship of Wild Species and Their Domesticated Counterparts: A Case Study in Northern Wild Rice (Zizania palustris L.)"

2. Author Information

Author Contact: Jennifer Kimball (jkimball@umn.edu)

Name: Lillian McGilp
Institution: University of Minnesota
Email: garbe047@umn.edu
ORCID: https://orcid.org/0009-0007-3005-0059

Name: Matthew W Haas
Institution: University of Minnesota
Email: haasx092@umn.edu

Name: Mingqin Shao
Institution: Department of Energy Joint Genome Institute
Email: shaomingqin99@gmail.com

Name: Reneth Millas
Institution: University of Minnesota
Email: milla138@umn.edu

Name: Claudia Castell-Miller
Institution: University of Minnesota
Email: caste007@umn.edu
ORCID: https://orcid.org/0000-0002-5730-5863

Name: Anthony J Kern
Institution: University of Minnesota
Email: akern@umn.edu

Name: Laura M Shannon
Institution: University of Minnesota
Email: lmshannon@umn.edu
ORCID: https://orcid.org/0000-0003-3935-4909

Name: Jennifer A Kimball
Institution: University of Minnesota
Email: jkimball@umn.edu
ORCID: https://orcid.org/0000-0002-1210-8161

3. Date published or finalized for release: 2024-08-08

4. Date of data collection: The data was collected in 2010 and 2018.

5. Geographic location of data collection (where was data collected?): Data was collected from across Minnesota and Western Wisconsin.
More specific information for each location can be found in the linked publication.

6. Information about funding sources that supported the collection of the data: This work was supported by the State of Minnesota, Agricultural Research, Education, Extension and Technology Transfer program.

7. Overview of the data (abstract): This record contains SNP data from 839 Northern Wild Rice (Zizania palustris) plants, which included 12 wild NWR populations collected from across Minnesota and Western Wisconsin, some of which were collected over two time points; a representative collection of cultivated NWR varieties and breeding populations; and a Zizania aquatica outgroup. The digital record provides a sample key, a vcf file, and a README file which explains the relationship between the other two files.

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: Attribution 4.0 International (http://creativecommons.org/licenses/by/4.0/)

2. Links to publications that cite or use the data: McGilp, L., Haas, M., Shao, M., Millas, R., Castell-Miller, C., Kern, A., ... & Kimball, J. (2024). Towards Stewardship of Wild Species and Their Domesticated Counterparts: A Case Study in Northern Wild Rice (Zizania palustris L.). Authorea Preprints. https://www.biorxiv.org/content/10.1101/2022.08.25.505308v2

3. Was data derived from another source? no

4. Terms of Use: Data Repository for the U of Minnesota (DRUM) By using these files, users agree to the Terms of Use. https://conservancy.umn.edu/pages/policies/#drum-terms-of-use

---------------------
DATA & FILE OVERVIEW
---------------------

Filename: Full_Sample_Key_240122.xlsx
Short description: Sample key including SRA accession numbers.
The sample key has 5 columns: Sample_number, ID, Sample_ID, Class, and SRA_accession. Sample number and ID are identification values assigned at the time of sequencing to differentiate between individual samples.
The sample number can be used to link the key with the vcf file. Sample_ID denotes the lake or river of origin for natural stand samples or the breeding material for cultivated samples. Class indicates whether the sample came from a natural stand or from cultivated material. Finally, the SRA accession contains the sequence read archive identifier assigned by NCBI, which can be used to search for sequence information under the accession number PRJNA774842.

Filename: Fullsamplevcf.vcf
Short description: A concatenated VCF of filtered GBS SNP data
The vcf file (https://samtools.github.io/hts-specs/VCFv4.2.pdf) contains SNP data from the 15 largest NWR chromosomes as well as two large scaffolds. To generate this file, SNP data was filtered with VCFtools (https://vcftools.sourceforge.net/man_latest.html) using a maximum missing rate of 20% across all samples, a minor allele frequency threshold of 0.05, the exclusion of indels, and a minimum depth of 4 reads per variant. The names of the bamfiles within this vcf line up with the sample number values that are listed in the sample key but are in the format Sample_XXXX/Sample_XXXX_sorted.bam.

Filename: Full_Sample_Key_240122.csv
Short description: duplicate of the .xlsx file added by data curator for preservation and interoperability

Additional information about the generation of these files can be found on our github page at: https://github.com/UMNKimballLab/WildRiceGeneticDiversity2022

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data: Northern Wild Rice (NWR; Zizania palustris L.) is an aquatic, annual grass with significant ecological, cultural, and economic importance to the Great Lakes region of North America. In this study, we assembled and genotyped a diverse collection of 839 NWR individuals using genotyping-by-sequencing (GBS) and obtained 5,955 single-nucleotide polymorphisms (SNPs).
Our collection consisted of samples from 12 wild NWR populations collected across Minnesota and Western Wisconsin, some of which were collected over two time points; a representative collection of cultivated NWR varieties and breeding populations; and a Zizania aquatica outgroup. Using these data, we characterized the genetic diversity, relatedness, and population structure of this broad collection of NWR genotypes. In this dataset you will find two files, a sample key and a vcf file of the obtained and filtered SNP data.

2. Methods for processing the data: The methods used to process the data can be found in detail at https://github.com/UMNKimballLab/WildRiceGeneticDiversity2022

3. Instrument- or software-specific information needed to interpret the data: Any VCF or spreadsheet reading software will work with this data.

4. Standards and calibration information, if appropriate: n/a

5. Environmental/experimental conditions: Location and time were used as experimental conditions.

6. Describe any quality-assurance procedures performed on the data: Several quality procedures were performed on the data including FastQC and filtering with VCFtools.

7. People involved with sample collection, processing, analysis and/or submission: The listed authors were involved in the stated processes.
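
Example of linking the sample key to the vcf file: the file overview states that the bamfile names in the vcf have the form Sample_XXXX/Sample_XXXX_sorted.bam and that XXXX lines up with Sample_number in the sample key. The Python sketch below shows one way this link could be made using only the standard library. It is an illustration, not the authors' pipeline. The function names are hypothetical, and two things are assumptions to verify against the actual files: that the csv header is spelled exactly "Sample_number", and that any zero-padding of XXXX can be ignored when matching.

```python
import csv
import re

# Sample column format described in this README (XXXX = sample number).
BAM_NAME = re.compile(r"Sample_([^/]+)/Sample_\1_sorted\.bam")


def vcf_sample_names(lines):
    """Return the sample column names from the #CHROM header line of a VCF."""
    for line in lines:
        if line.startswith("#CHROM"):
            # The first nine tab-separated columns are fixed VCF fields
            # (CHROM through FORMAT); sample columns follow.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def sample_number(bam_name):
    """Extract XXXX from 'Sample_XXXX/Sample_XXXX_sorted.bam'."""
    match = BAM_NAME.fullmatch(bam_name)
    if match is None:
        raise ValueError(f"unexpected sample name: {bam_name!r}")
    return match.group(1)


def _normalize(value):
    # Assumption: leading zeros are not significant when matching numbers.
    return str(value).strip().lstrip("0") or "0"


def link_samples(vcf_lines, key_rows, key_column="Sample_number"):
    """Map each VCF sample column name to its sample key row (or None)."""
    by_number = {_normalize(row[key_column]): row for row in key_rows}
    return {
        name: by_number.get(_normalize(sample_number(name)))
        for name in vcf_sample_names(vcf_lines)
    }


def link_files(vcf_path, key_path):
    """Convenience wrapper that reads both files from disk."""
    # utf-8-sig tolerates a byte-order mark if the csv has one.
    with open(vcf_path) as vcf, open(key_path, newline="", encoding="utf-8-sig") as key:
        return link_samples(vcf, csv.DictReader(key))
```

Calling link_files("Fullsamplevcf.vcf", "Full_Sample_Key_240122.csv") would return a dictionary keyed by vcf sample column name. Each value is the matching sample key row, from which Sample_ID, Class, and SRA_accession can be read. A value of None marks a vcf sample with no match in the key, which would suggest that one of the assumptions above does not hold.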