This codebook.txt file was generated on 20210210 by wilsonkm

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset

   Whole Genome Assembly and Annotation of Northern Wild Rice (Zizania palustris L.), a North American Grain

2. Author Information

   Principal Investigator Contact Information
      Name: Jennifer A. Kimball
      Institution: University of Minnesota
      Email: jkimball@umn.edu

   Associate or Co-investigator Contact Information
      Name: Matthew W. Haas
      Institution: University of Minnesota

   Associate or Co-investigator Contact Information
      Name: Thomas Kono
      Institution: University of Minnesota, Minnesota Supercomputing Institute

   Associate or Co-investigator Contact Information
      Name: Marissa Macchietto
      Institution: University of Minnesota, Minnesota Supercomputing Institute

   Associate or Co-investigator Contact Information
      Name: Reneth Millas
      Institution: University of Minnesota

   Associate or Co-investigator Contact Information
      Name: Lillian McGilp
      Institution: University of Minnesota

   Associate or Co-investigator Contact Information
      Name: Mingqin Shao
      Institution: University of Minnesota, Lawrence Berkeley National Laboratory

   Associate or Co-investigator Contact Information
      Name: Jacques Duquette
      Institution: University of Minnesota

   Associate or Co-investigator Contact Information
      Name: Candice N. Hirsch
      Institution: University of Minnesota

3. Date of data collection: 2018 - 2021

4. Geographic location of data collection: All analyses were performed at the University of Minnesota. The data are the result of analysis of the Northern Wild Rice genome, so a geographic location is not applicable.

5. Information about funding sources that supported the collection of the data:

   Sponsorship: This work was supported by the Minnesota Cultivated Wild Rice Council and by the State of Minnesota, Agricultural Research, Education, Extension, and Technology Transfer program.

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: Attribution 3.0 United States

2. Links to publications that cite or use the data: The paper describing the data will soon be submitted for publication but is not yet available. The manuscript will also be posted to bioRxiv at that time, and the link can be provided once it is available.

3. Links to other publicly accessible locations of the data: The genome sequence and annotation have been submitted to NCBI and will be available there. The rest of the data cannot be deposited at NCBI.

4. Links/relationships to ancillary data sets: N/A

5. Recommended citation for the data:

   Haas, Matthew W; Kono, Thomas; Macchietto, Marissa; Millas, Reneth; McGilp, Lillian; Shao, Mingqin; Duquette, Jacques; Hirsch, Candice N; Kimball, Jennifer A. (). Whole Genome Assembly and Annotation of Northern Wild Rice (Zizania palustris L.), a North American Grain. Retrieved from the Data Repository for the University of Minnesota, https://doi.org/10.13020/ha32-4735.

---------------------
DATA & FILE OVERVIEW
---------------------

1. File List

   Notes: The .csv and .tsv files can be opened in Microsoft Office. The .fasta file can be viewed in a text editor equipped to handle large text files, such as Glogg.

   A. Filename: annotation_mapping_table_NCBI.txt
      Short description: Use this file to translate original gene names (beginning with "FUN") to newer "ZP" names

   B. Filename: cpm_list.txt
      Short description: Counts per million (cpm) values for all tissues

   C. Filename: Duplications_Figure_2.tsv
      Short description: Duplications.tsv file from OrthoFinder run for Figure 2

   D. Filename: Duplications_rice_relatives.tsv
      Short description: Duplications.tsv file from OrthoFinder run for rice relatives

   E. Filename: Duplications_whole_set_with_Zlatifolia.tsv
      Short description: Duplications.tsv file for the entire (20-member) run of OrthoFinder incl. Z. latifolia

   F. Filename: go_biological_process_genes.csv
      Short description: Genes with Biological Process GO terms

   G. Filename: go_cellular_component_genes.csv
      Short description: Genes with Cellular Component GO terms

   H. Filename: go_molecular_function_genes.csv
      Short description: Genes with Molecular Function GO terms

   I. Filename: Orthogroups_Figure_2.tsv
      Short description: Orthogroups.tsv file from OrthoFinder run for Figure 2

   J. Filename: Orthogroups_rice_relatives.tsv
      Short description: Orthogroups.tsv file from OrthoFinder run for rice relatives

   K. Filename: Orthogroups_UnassignedGenes_Figure_2.tsv
      Short description: Orthogroups.UnassignedGenes.tsv file from OrthoFinder run for Figure 2

   L. Filename: Orthogroups_UnassignedGenes_rice_relatives.tsv
      Short description: Orthogroups.UnassignedGenes.tsv file from OrthoFinder run for rice relatives

   M. Filename: Orthogroups_UnassignedGenes_whole_set_with_Zlatifolia.tsv
      Short description: Orthogroups.UnassignedGenes.tsv file for the entire (20-member) run of OrthoFinder incl. Z. latifolia

   N. Filename: Orthogroups_Whole_set_with_Zlatifolia.tsv
      Short description: Orthogroups.tsv file for the entire (20-member) run of OrthoFinder incl. Z. latifolia

   O. Filename: Orthogroups.GeneCount_Figure_2.tsv
      Short description: Orthogroups.GeneCount.tsv file from OrthoFinder run for Figure 2

   P. Filename: Orthogroups.GeneCount_rice_relatives.tsv
      Short description: Orthogroups.GeneCount.tsv file from OrthoFinder run for rice relatives

   Q. Filename: Orthogroups.GeneCount_whole_set_with_Zlatifolia.tsv
      Short description: Orthogroups.GeneCount.tsv file for the entire (20-member) run of OrthoFinder incl. Z. latifolia

   R. Filename: rice.gene_structures_post_PASA_updates.21917.gff3
      Short description: Northern Wild Rice (Zizania palustris L.) genome (cultivar Itasca-C12) GFF3 file, OLD NAMES

   S. Filename: WR_Tau_AllTissues.csv
      Short description: Tau values for all tissues

   T. Filename: WR_Tau_WithoutRoot.csv
      Short description: Tau values without root tissue

   U. Filename: zizania_palustris_13Nov2018_okGsv_renamedNCBI2.fasta
      Short description: Northern Wild Rice (Zizania palustris L.) genome (cultivar Itasca-C12) FASTA file, NEW NAMES

   V. Filename: 201003_updated_annotation_file.xlsx - genes.csv
      Short description: Annotation file with old and new gene names, genomic position, predicted function, gene ontology information, and tissue-specific counts (cpm)

2. Relationship between files: N/A

3. Additional related data collected that was not included in the current data package: N/A

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data: Scripts are available via GitHub: https://github.com/UMNKimballLab/NWRGenome_v1.0

2. Methods for processing the data: N/A

3. Instrument- or software-specific information needed to interpret the data: N/A

4. Standards and calibration information, if appropriate: N/A

5. Environmental/experimental conditions: N/A

6. Describe any quality-assurance procedures performed on the data: N/A

7. People involved with sample collection, processing, analysis and/or submission: The authors listed on this dataset
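
Note on reading the Orthogroups*.tsv files: the sketch below is one possible way to load an Orthogroups.tsv file in Python. It is not taken from the authors' scripts. It assumes the usual OrthoFinder layout: tab-delimited, a header row whose first column is "Orthogroup" followed by one column per species, and each cell holding a comma-separated list of gene IDs. The function names, species columns, and gene IDs in the example are hypothetical and only illustrate the format.

```python
import csv
import io


def parse_orthogroups(handle):
    """Parse an OrthoFinder-style Orthogroups.tsv stream.

    Returns {orthogroup_id: {species: [gene_id, ...]}}. A species with no
    genes in an orthogroup maps to an empty list.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    species = header[1:]  # the first column holds the orthogroup ID
    groups = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        og_id, cells = row[0], row[1:]
        # Pad short rows so every species column gets an entry.
        cells += [""] * (len(species) - len(cells))
        groups[og_id] = {
            sp: [g.strip() for g in cell.split(",") if g.strip()]
            for sp, cell in zip(species, cells)
        }
    return groups


def gene_counts(groups):
    """Count genes per species in each orthogroup (cf. Orthogroups.GeneCount.tsv)."""
    return {
        og: {sp: len(genes) for sp, genes in per_species.items()}
        for og, per_species in groups.items()
    }


# Hypothetical two-species sample; these IDs are made up for illustration.
SAMPLE = (
    "Orthogroup\tOryza_sativa\tZizania_palustris\n"
    "OG0000000\tOs01g0001, Os01g0002\tZP_0001\n"
    "OG0000001\t\tZP_0002, ZP_0003\n"
)

groups = parse_orthogroups(io.StringIO(SAMPLE))
counts = gene_counts(groups)
print(counts["OG0000000"])
```

To read one of the deposited files instead of the in-memory sample, open it with open("Orthogroups_Figure_2.tsv", newline="") and pass the file handle to parse_orthogroups. Check the header row of the actual file first, since the species column names are not documented in this codebook.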