This readme.txt file was generated on 20220713 by Mark Jankowski

Recommended citation for the data: Jankowski, Mark D; Fairbairn, David J; Baller, Joshua A; Westerhoff, Benjamin M; Schoenfuss, Heiko L. (2022). Data for: Using the Daphnia magna Transcriptome to Distinguish Water Source: Wetland and Stormwater Case Studies. Retrieved from the Data Repository for the University of Minnesota, https://doi.org/10.13020/4hq3-v890.

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset

Data for: Using the Daphnia magna Transcriptome to Distinguish Water Source: Wetland and Stormwater Case Studies

2. Author Information

Principal Investigator Contact Information
Name: Mark D Jankowski
Institution: Minnesota Pollution Control Agency
Address: 1200 6th Ave, Seattle, WA 98101
Email: jankowski.mark@epa.gov
ORCID: 0000-0002-1422-2611

Associate or Co-investigator Contact Information
Name: David J Fairbairn
Institution: Minnesota Pollution Control Agency
Address: Lloyd 700 Building, 700 NE Multnomah St #600, Portland, OR 97232
Email: david.fairbairn@deq.oregon
ORCID: NA

Associate or Co-investigator Contact Information
Name: Joshua A Baller
Institution: University of Minnesota
Address: 599 Walter Library, 117 Pleasant St. SE, Minneapolis, MN 55455
Email: jballer@umn.edu
ORCID: NA

Associate or Co-investigator Contact Information
Name: Heiko L Schoenfuss
Institution: St. Cloud State University
Address: 599 Walter Library, 117 Pleasant St. SE, Minneapolis, MN 55455
Email: hschoenfuss@stcloudstate.edu
ORCID: 0000-0001-5464-992X

Associate or Co-investigator Contact Information
Name: Benjamin M Westerhoff
Institution:
Address:
Email:
ORCID:

3. Date published or finalized for release: 05/25/2022

4. Date of data collection (single date, range, approximate date): 20160228 - 20170131

5. Geographic location of data collection (where was data collected?): Minneapolis, St. Paul and St. Cloud, Minnesota

6. Information about funding sources that supported the collection of the data: This work was supported by the Clean Water Fund of Minnesota's Clean Water, Land and Legacy Amendment and Clean Water Act Section 106 funds provided by the U.S. Environmental Protection Agency to the Minnesota Pollution Control Agency.

7. Overview of the data (abstract):

A major challenge in ecotoxicology is accurately and sufficiently measuring chemical exposures and biological effects given the presence of complex and dynamic contaminant mixtures in surface waters. Our study examined the performance of the Daphnia magna transcriptome to detect distinct responses across three water sources in Minnesota: laboratory [well] waters, wetland waters, or stormwaters. Pyriproxyfen (PPF) was included as a gene expression and male neonate production positive control to examine whether gene expression resulting from exposure to this well-studied juvenoid hormone analog can be detected in complex matrices. Laboratory-reared (<24 hr) D. magna were exposed to a water source and/or PPF for 16 d to monitor phenotypic changes or 96 hr to examine gene expression responses using Illumina HiSeq 2500 (10 million reads per library, 50-bp paired-end (2x50)). Results indicated a unique gene expression profile was produced for each water source. At 119 ng/L PPF (approximately EC25) for male neonate production, as expected, the Doublesex1 gene was upregulated. In descending order, gene expression patterns were most discernible with respect to PPF exposure status, season of stormwater sample collection, and wetland quality, as indicated by the index of biological integrity. However, the biological implications of the affected genes were not broadly clear given limited genome resources for invertebrates. Our study provides support for the utility of short-term whole organism transcriptomic testing in D. magna to discern sample type but highlights the need for further work on invertebrate genomics.
--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: Open Access; CC0 1.0 Universal

2. Links to publications that cite or use the data: Jankowski, M.D., Fairbairn, D.J., Baller, J.A., Westerhoff, B.M. and Schoenfuss, H.L. (2022), Using the Daphnia magna Transcriptome to Distinguish Water Source: Wetland and Stormwater Case Studies. Environ Toxicol Chem. Accepted Author Manuscript. https://doi.org/10.1002/etc.5392

3. Was data derived from another source? No.
If yes, list source(s):

4. Terms of Use: Data Repository for the U of Minnesota (DRUM). By using these files, users agree to the Terms of Use. https://conservancy.umn.edu/pages/drum/policies/#terms-of-use

---------------------
DATA & FILE OVERVIEW
---------------------

1. File List

A. Filename: short ppf data for EC10 analysis.csv
   Short description: Laboratory pyriproxyfen exposure sex ratio response dataset used for ECx determination (Figure 2).

B. Filename: diffExp_meanTPM_INvsOut.csv
   Short description: File includes RNAseq TPM data for each gene from daphnia treated with stormwater that was not treated (In column) or was treated (Out column) with BMP. Gene annotation information is also included. Data used in Figures 4 and 5 and Table S9.

C. Filename: diffExp_meanTPM_LabvsWetland.csv
   Short description: File includes RNAseq TPM data for each gene from daphnia treated with laboratory or wetland water with and without 119 ng/L pyriproxyfen. Gene annotation information is also included. Data used in Figures 3, 5, S2-S4 and Tables S4-S6.

D. Filename: diffExp_meanTPM_PPF.csv
   Short description: File includes RNAseq TPM data for each gene from daphnia treated with and without 119 ng/L pyriproxyfen. Gene annotation information is also included. Data used in Table 1.

E. Filename: diffExp_meanTPM_Seasons.csv
   Short description: File includes RNAseq TPM data for each gene from daphnia treated with stormwater collected in the spring, early summer, or late summer. Gene annotation information is also included. Data used in Figures 4 and 5 and Table 2.

2. Relationship between files: File A differs from files B through E in that it contains phenotypic responses to treatment rather than gene expression data. Files B through E contain the gene expression data used to create the figures and tables noted above, as well as for gene ontology analyses and logistic regression modeling.

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data: see associated article DOI 10.1002/etc.5392

2. Methods for processing the data: see associated article DOI 10.1002/etc.5392

3. Instrument- or software-specific information needed to interpret the data: see associated article DOI 10.1002/etc.5392

4. Standards and calibration information, if appropriate: see associated article DOI 10.1002/etc.5392

5. Environmental/experimental conditions: see associated article DOI 10.1002/etc.5392

6. Describe any quality-assurance procedures performed on the data: see associated article DOI 10.1002/etc.5392

7. People involved with sample collection, processing, analysis and/or submission: All article authors listed above. See associated article DOI 10.1002/etc.5392

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: short ppf data for EC10 analysis.csv
-----------------------------------------

1. Number of variables: 8

2. Number of cases/rows: 96

3. Missing data codes: N/A

4. Variable List

A. Name: Date
   Description: Date of observation

B. Name: Replicate
   Description: Experimental replicate number

C. Name: Conc_nom
   Description: Nominal pyriproxyfen concentration in ng/L added to laboratory water

D. Name: Conc_adj
   Description: Adjusted pyriproxyfen concentration in ng/L added to laboratory water. Concentrations were adjusted using the regression equation between measured and nominal concentrations in a subset of samples, as described in DOI 10.1002/etc.5392

E. Name: Males
   Description: Number of male neonates observed

F. Name: Females
   Description: Number of female neonates observed

G. Name: Total
   Description: Total number of neonates observed

H. Name: Perc_Male
   Description: Percent of neonates that were male

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: diffExp_meanTPM_INvsOut.csv
-----------------------------------------

1. Number of variables: 10

2. Number of cases/rows: 36992

3. Missing data codes: N/A

4. Variable List

A. Name: ID
   Description: Transcript identification

B. Name: In
   Description: Transcripts per million (TPM) for each gene from daphnia exposed to untreated stormwater

C. Name: Out
   Description: Transcripts per million (TPM) for each gene from daphnia exposed to treated stormwater

D. Name: log2FoldChange
   Description: log2 of In TPMs divided by Out TPMs

E. Name: padj
   Description: FDR-adjusted P value for log2FoldChange

F. Name: geneID
   Description: Gene identification

G. Name: Name
   Description: Gene name

H. Name: CDD
   Description: Conserved Domains Database code

I. Name: PFAM
   Description: The protein families database code

J. Name: GO
   Description: Gene ontology code

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: diffExp_meanTPM_LabvsWetland.csv
-----------------------------------------

1. Number of variables: 13

2. Number of cases/rows: 17078

3. Missing data codes: N/A

4. Variable List

A. Name: ID
   Description: Transcript identification

B. Name: meanTPM.Lab0
   Description: Average transcripts per million (TPM) of laboratory water exposures of daphnia with 0 ng/L pyriproxyfen

C. Name: meanTPM.Wetland0
   Description: Average transcripts per million (TPM) of wetland water exposures of daphnia with 0 ng/L pyriproxyfen

D. Name: meanTPM.Wetland119
   Description: Average transcripts per million (TPM) of wetland water exposures of daphnia with 119 ng/L pyriproxyfen

E. Name: log2FC.Lab0vsWet119
   Description: log2 of TPMs from Treatment 1 (lab water with 0 ng/L pyriproxyfen) divided by TPMs from Treatment 2 (wetland water with 119 ng/L pyriproxyfen)

F. Name: padj.Lab0vsWet119
   Description: FDR-adjusted P value for log2FC.Lab0vsWet119

G. Name: log2FC.Lab0vsWet0
   Description: log2 of TPMs from Treatment 1 (lab water with 0 ng/L pyriproxyfen) divided by TPMs from Treatment 2 (wetland water with 0 ng/L pyriproxyfen)

H. Name: padj.Lab0vsWet0
   Description: FDR-adjusted P value for log2FC.Lab0vsWet0

I. Name: geneID
   Description: Gene identification

J. Name: Name
   Description: Gene name

K. Name: CDD
   Description: Conserved Domains Database code

L. Name: PFAM
   Description: The protein families database code

M. Name: GO
   Description: Gene ontology code

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: diffExp_meanTPM_PPF.csv
-----------------------------------------

1. Number of variables: 10

2. Number of cases/rows: 36992

3. Missing data codes: N/A

Pyriproxyfen 0 or 119: Exposure of daphnia to 0 or 119 ng/L pyriproxyfen

4. Variable List

A. Name: ID
   Description: Transcript identification

B. Name: 0
   Description: 0 ng/L pyriproxyfen exposure of daphnia

C. Name: 119
   Description: 119 ng/L pyriproxyfen exposure of daphnia

D. Name: log2FoldChange
   Description: log2 of TPMs from treatment with 0 ng/L pyriproxyfen divided by TPMs from treatment with 119 ng/L pyriproxyfen

E. Name: padj
   Description: FDR-adjusted P value for log2FoldChange (0 ng/L pyriproxyfen TPMs divided by 119 ng/L pyriproxyfen TPMs)

F. Name: geneID
   Description: Gene identification

G. Name: Name
   Description: Gene name

H. Name: CDD
   Description: Conserved Domains Database code

I. Name: PFAM
   Description: The protein families database code

J. Name: GO
   Description: Gene ontology code

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: diffExp_meanTPM_Seasons.csv
-----------------------------------------

1. Number of variables: 15

2. Number of cases/rows: 36992

3. Missing data codes: N/A

TPM Definition: Average transcripts per million (TPM) of daphnia

4. Variable List

A. Name: ID
   Description: Transcript identification

B. Name: Spring
   Description: Exposure of daphnia to stormwater collected in the spring

C. Name: Summer
   Description: Exposure of daphnia to stormwater collected in the early summer

D. Name: Summer2
   Description: Exposure of daphnia to stormwater collected in the late summer

E. Name: log2FC.SPvsSU
   Description: log2 of TPMs from water collected in the spring divided by TPMs from water collected in the early summer

F. Name: padj.SPvsSU
   Description: FDR-adjusted P value for log2FC.SPvsSU

G. Name: log2FC.SPvsSU2
   Description: log2 of TPMs from water collected in the spring divided by TPMs from water collected in the late summer

H. Name: padj.SPvsSU2
   Description: FDR-adjusted P value for log2FC.SPvsSU2

I. Name: log2FC.SUvsSU2
   Description: log2 of TPMs from water collected in the early summer divided by TPMs from water collected in the late summer

J. Name: padj.SUvsSU2
   Description: FDR-adjusted P value for log2FC.SUvsSU2

K. Name: geneID
   Description: Gene identification

L. Name: Name
   Description: Gene name

M. Name: CDD
   Description: Conserved Domains Database code

N. Name: PFAM
   Description: The protein families database code

O. Name: GO
   Description: Gene ontology code
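
-----------------------------------------
EXAMPLE: READING A GENE EXPRESSION FILE (ILLUSTRATIVE SKETCH)
-----------------------------------------

The variable lists for the gene expression files describe log2 fold changes as one TPM column divided by another, each paired with an FDR-adjusted P value. As a minimal sketch of how such a file might be read, the Python snippet below parses rows that use the column names listed for diffExp_meanTPM_INvsOut.csv, recomputes log2(In/Out) from the TPM columns, and keeps transcripts whose padj falls below a cutoff. The sample rows, the 0.05 cutoff, the function names, and the treatment of "NA" or blank cells as missing are all assumptions made for illustration; they are not part of the dataset or the published analysis. The log2FoldChange values stored in the files come from the processing described in DOI 10.1002/etc.5392 and may not equal a simple ratio of the TPM columns.

```python
import csv
import io
import math

# Hypothetical rows for illustration only; these are NOT values from the dataset.
SAMPLE_CSV = """ID,In,Out,log2FoldChange,padj,geneID,Name,CDD,PFAM,GO
tx_example_1,8.0,2.0,2.0,0.01,gene_example_1,example gene 1,,,
tx_example_2,3.0,3.0,0.0,0.90,gene_example_2,example gene 2,,,
tx_example_3,0.0,4.0,NA,NA,gene_example_3,example gene 3,,,
"""


def parse_float(text):
    """Return a float, or None for blank or non-numeric cells such as 'NA'."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(value) else value


def log2_ratio(numerator_tpm, denominator_tpm):
    """Return log2(numerator / denominator), or None if either TPM is missing or not positive."""
    if numerator_tpm is None or denominator_tpm is None:
        return None
    if numerator_tpm <= 0 or denominator_tpm <= 0:
        return None
    return math.log2(numerator_tpm / denominator_tpm)


def significant_rows(handle, padj_cutoff=0.05):
    """Yield (ID, recomputed log2 ratio, padj) for rows with padj below the cutoff."""
    for row in csv.DictReader(handle):
        padj = parse_float(row["padj"])
        if padj is None or padj >= padj_cutoff:
            continue  # skip rows with missing or non-significant padj
        ratio = log2_ratio(parse_float(row["In"]), parse_float(row["Out"]))
        yield row["ID"], ratio, padj


results = list(significant_rows(io.StringIO(SAMPLE_CSV)))
for transcript_id, ratio, padj in results:
    print(transcript_id, ratio, padj)
```

To try this on the real file, the io.StringIO(SAMPLE_CSV) argument could be replaced with a handle from open("diffExp_meanTPM_INvsOut.csv", newline=""), assuming the file is comma-delimited with a header row matching the variable list above. The same pattern should apply to the other gene expression files once the column names are swapped for those in their variable lists.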