This readme.txt file was generated on 20200311 by Peng Zhou

Contact Information
Name: Peng Zhou
Institution: University of Minnesota
Address: 422 Biological Sciences
Email: zhoux379@umn.edu

Principal Investigator Contact Information
Name: Nathan M. Springer
Institution: University of Minnesota
Address: 1479 Gortner Avenue
Email: springer@umn.edu

---------------------
DATA & FILE OVERVIEW
---------------------

1. File List

A. Filename: studies.xlsx
Short description: Details of the 45 public maize RNA-Seq studies and the IDs used in this study. Column "nid" is a unique ID assigned to each of the 45 regulatory networks built in this study. Columns "study", "note", "net_type" and "sample_size" give, respectively, the public RNA-Seq study used to build each network, the genotype / tissue information associated with each study, the type of the network, and the sample size of each study/network. Rows in the spreadsheet are colored by network type, consistent with the color coding in the manuscript figures. The full references for the studies are found in the related manuscript.

B. Filename: cpm_tables_raw.tar.gz
Short description: Raw CPM (Counts Per Million) expression tables of the 45 public maize RNA-Seq studies collected in this study. See the Methodology section for details.

C. Filename: cpm_tables_filtered.tar.gz
Short description: Filtered CPM (Counts Per Million) expression tables of the 45 public maize RNA-Seq studies collected in this study. Genes with no or low expression (CPM < 1 in more than 90% of samples) were removed from the filtered tables on a per-study basis.

D. Filename: rf_1m.tar.gz
Short description: Top 1 million (1,000,000) regulatory interactions predicted in each of the 45 studies using the Random Forest algorithm. The three columns in the file represent "regulator gene ID", "target gene ID" and "regulatory strength".

E. Filename: rf_100k.tar.gz
Short description: Top 100k (100,000) regulatory interactions predicted in each of the 45 studies using the Random Forest algorithm.
Three columns in the file represent "regulator gene ID", "target gene ID" and "regulatory strength".

F. Filename: et_1m.tar.gz
Short description: Top 1 million (1,000,000) regulatory interactions predicted in each of the 45 studies using the Extra Trees algorithm. The three columns in the file represent "regulator gene ID", "target gene ID" and "regulatory strength".

G. Filename: et_100k.tar.gz
Short description: Top 100k (100,000) regulatory interactions predicted in each of the 45 studies using the Extra Trees algorithm. The three columns in the file represent "regulator gene ID", "target gene ID" and "regulatory strength".

H. Filename: xgb_1m.tar.gz
Short description: Top 1 million (1,000,000) regulatory interactions predicted in each of the 45 studies using the XGBoost algorithm. The three columns in the file represent "regulator gene ID", "target gene ID" and "regulatory strength".

I. Filename: xgb_100k.tar.gz
Short description: Top 100k (100,000) regulatory interactions predicted in each of the 45 studies using the XGBoost algorithm. The three columns in the file represent "regulator gene ID", "target gene ID" and "regulatory strength".

2. Relationship between files:

3. Additional related data collected that was not included in the current data package:

4. Are there multiple versions of the dataset? no

5. Publications that cite or use the data:
Peng Zhou, Zhi Li, Erika Magnusson, Fabio A. Gomez Cano, Peter Alexander Crisp, Jaclyn Noshay, Erich Grotewold, Candice Hirsch, Steven Paul Briggs, Nathan M. Springer. (2020). Exploring Gene Regulatory Networks in Maize. The Plant Cell, tpc.00080.2020; https://doi.org/10.1105/tpc.20.00080

6. Recommended citation for the data:
Zhou, Peng; Springer, Nathan M. (2020). Data for: Meta gene regulatory networks in maize highlight functionally relevant regulatory interactions. Retrieved from the Data Repository for the University of Minnesota, https://doi.org/10.13020/p3g0-3170.
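
The low-expression filter described for cpm_tables_filtered.tar.gz (item C of the file list) can be illustrated with a minimal Python sketch. This is not the code used to produce the archive. The function name, the dictionary layout and the gene IDs are hypothetical, and the handling of the exact 90% boundary is an assumption based on the wording "more than 90%".

```python
def filter_low_expression(cpm_table, min_cpm=1.0, max_low_fraction=0.9):
    """Return only the genes that pass the low-expression filter.

    cpm_table maps a gene ID to its list of CPM values, one per sample
    (hypothetical layout; the archived tables may be organized differently).
    A gene is removed when CPM < min_cpm in MORE than max_low_fraction of
    the samples. A gene sitting exactly on the 90% boundary is kept, which
    is an assumption about how the rule is applied.
    """
    kept = {}
    for gene_id, values in cpm_table.items():
        if not values:
            # No samples at all: nothing to support keeping the gene.
            continue
        n_low = sum(1 for v in values if v < min_cpm)
        if n_low / len(values) > max_low_fraction:
            continue  # low or no expression in too many samples
        kept[gene_id] = values
    return kept


# Illustrative values only, not taken from the dataset.
example = {
    "geneA": [0.0] * 10,          # below 1 CPM in 100% of samples: removed
    "geneB": [0.5] * 9 + [3.0],   # below 1 CPM in exactly 90%: kept
    "geneC": [5.0] * 10,          # expressed everywhere: kept
}
filtered = filter_low_expression(example)
```

Because the filter is applied on a per-study basis, the same gene may be present in the filtered table of one study and absent from another.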
--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data:
Raw sequencing reads were downloaded from the NCBI Sequence Read Archive (SRA), trimmed using fastp and mapped to the maize B73 AGP_v4 genome with Hisat2 and a variant-aware reference. Uniquely mapped reads were assigned to and counted for the 46,117 reference gene models (Ensembl Plants v37) using featureCounts. Raw read counts were then normalized using the TMM normalization approach to give CPMs (Counts Per Million reads) and then further normalized by gene CDS lengths to give FPKM (Fragments Per Kilobase of exon per Million reads) values. Hierarchical clustering and principal component analysis were used to explore sample clustering patterns.

2. Methods for processing the data: see above

3. Instrument- or software-specific information:
fastp: https://github.com/OpenGene/fastp
Hisat2: http://daehwankimlab.github.io/hisat2/
Ensembl Plants: https://plants.ensembl.org/index.html

4. Standards and calibration information, if appropriate:

5. Environmental/experimental conditions:

6. Describe any quality-assurance procedures performed on the data:

7. People involved with sample collection, processing, analysis and/or submission:
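
The arithmetic behind the CPM and FPKM values described in item 1 of the methods can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline used for this dataset. The TMM factor is taken as a given input (norm_factor) and is assumed to scale the library size, as in edgeR-style workflows; this README does not state which implementation was used. All function names, gene IDs and numbers are hypothetical.

```python
def cpm_from_counts(counts, norm_factor=1.0):
    """Convert raw read counts for one sample into CPM values.

    counts maps gene ID -> raw read count. norm_factor stands in for a
    precomputed TMM scaling factor (assumption: it multiplies the library
    size to give an effective library size; computing it is not shown).
    """
    library_size = sum(counts.values())
    effective_size = library_size * norm_factor
    return {gene: count / effective_size * 1e6 for gene, count in counts.items()}


def fpkm_from_cpm(cpm, cds_length_bp):
    """Divide each CPM by the gene's CDS length in kilobases to give FPKM."""
    return {gene: value / (cds_length_bp[gene] / 1000.0) for gene, value in cpm.items()}


# Illustrative values only, not taken from the dataset.
raw_counts = {"geneA": 500, "geneB": 1500}
cds_lengths = {"geneA": 2000, "geneB": 500}  # CDS lengths in base pairs

cpm = cpm_from_counts(raw_counts, norm_factor=1.0)
fpkm = fpkm_from_cpm(cpm, cds_lengths)
```

With norm_factor left at 1.0 the calculation reduces to plain library-size scaling, so any difference from the archived CPM tables would come from the TMM factors and from details of the original pipeline.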