This readme.txt file was generated on <20230928> by

Recommended citation for the data:
Abdul Halim, Mohd Farid; Costa, Kyle C.; Fonseca, Dallas; Niehaus, Thomas. (2023). Transcriptomics analysis (RNA-sequencing) of Methanococcus maripaludis wild-type strain and moeA deletion mutant. Retrieved from the Data Repository for the University of Minnesota, https://doi.org/10.13020/YY0Q-8870.

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset
   Transcriptomics analysis (RNA-sequencing) of Methanococcus maripaludis wild-type strain and moeA deletion mutant.

2. Author Information

   Principal Investigator Contact Information
   Name: Kyle Costa
   Institution: University of Minnesota
   Address: 670 Biological Sciences Center, 1445 Gortner Avenue, St. Paul, MN 55108
   Email: kcosta@umn.edu
   ORCID: https://orcid.org/0000-0003-0407-1431

   Associate or Co-investigator Contact Information
   Name: Mohd Farid Abdul Halim
   Institution: University of Minnesota
   Address: 685 Biological Sciences Center, 1445 Gortner Avenue, St. Paul, MN 55108
   Email: faridh@umn.edu
   ORCID: https://orcid.org/0000-0002-2327-3621

   Associate or Co-investigator Contact Information
   Name: Dallas Fonseca
   Institution: University of Minnesota
   Address: 685 Biological Sciences Center, 1445 Gortner Avenue, St. Paul, MN 55108
   Email: fonse039@umn.edu
   ORCID:

   Associate or Co-investigator Contact Information
   Name: Thomas Niehaus
   Institution: University of Minnesota
   Address: 685 Biological Sciences Center, 1445 Gortner Avenue, St. Paul, MN 55108
   Email: tniehaus@umn.edu
   ORCID: https://orcid.org/0000-0002-3575-8001

3. Date of data collection
   20230111 - 20230217

4. Geographic location of data collection (where was data collected?):
   SeqCenter, 91 43rd St Suite 250, Pittsburgh, PA 15201

5. Information about funding sources that supported the collection of the data:
   U.S. Department of Energy, Office of Science, Basic Energy Sciences under grant number DE-SC0019148

6. Overview of the data (abstract):
   Transcriptomic analysis of total RNA from Methanococcus maripaludis grown in McCas-formate medium. The data compare RNA abundance between the wild-type strain and a mutant strain in which the gene encoding molybdopterin molybdotransferase (moeA, MMP1619) was deleted. Released to accompany the submission of a manuscript for publication.

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data:
   CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

2. Links to publications that cite or use the data:
   Mohd Farid Abdul Halim, Dallas R Fonseca, Thomas D Niehaus, Kyle C Costa. 2023. Functionally redundant formate dehydrogenases enable formate-dependent growth in Methanococcus maripaludis. bioRxiv 2023.05.09.540023. https://doi.org/10.1101/2023.05.09.540023

3. Was data derived from another source? No
   If yes, list source(s):

4. Terms of Use:
   Data Repository for the U of Minnesota (DRUM). By using these files, users agree to the Terms of Use. https://conservancy.umn.edu/pages/drum/policies/#terms-of-use

---------------------
DATA & FILE OVERVIEW
---------------------

1. File List

   A. Filename: alignment.tsv
      Short description: Total RNA reads for all samples submitted.

   B. Filename: hisat2_counts.tsv
      Short description: RNA read counts based on the homolog identified.

   C. Filename: 143 Differentially Expressed Genes Heatmap.tiff
      Short description: Heatmap of the 143 differentially expressed genes (image)

   D. Filename: 143 Differentially Expressed Genes In DmoeA vs WT.tsv
      Short description: List of the 143 differentially expressed genes (tsv format)

   E. Filename: 143 Differentially Expressed Genes In DmoeA vs WT.tsv.xlsx
      Short description: List of the 143 differentially expressed genes (Excel format)

   F. Filename: All Quantified Genes.tsv
      Short description: List of all quantified expressed genes (tsv format)

   G. Filename: All Quantified Genes.tsv.xlsx
      Short description: List of all quantified expressed genes (Excel format)

   H. Filename: Sample PCA.pdf
      Short description: PCA analysis results for all samples

   I. Filename: SeqCenter Project Report
      Short description: Experimental protocols provided by SeqCenter

2. Relationship between files:
   The files contain the raw transcriptomics data of independent samples (biological replicates) of RNA extracted from Methanococcus maripaludis wild-type and DmoeA strains.

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data:
   Five ml of Methanococcus maripaludis wild-type and DmoeA strain cultures were collected by centrifugation. Total RNA was extracted from the cell culture.

2. Methods for processing the data:
   Quality control and adapter trimming were performed with bcl-convert [1]. Read mapping was performed with HISAT2 [2]. Read quantification was performed using Subread’s featureCounts [3] functionality. Mapping statistics can be found in the “alignment.tsv” file. Raw, quantified counts are available in the “hisat2_counts.tsv” file. Read counts were loaded into R [4] and normalized using edgeR’s [5] Trimmed Mean of M values (TMM) algorithm. The normalized values were then converted to counts per million (cpm). Differential expression analysis between treatment groups was performed using edgeR’s quasi-likelihood F-test (glmQLFTest) functionality. The file “All Quantified Genes.tsv” contains the results of this test for all genes, in addition to the normalized counts per million. The file “143 Differentially Expressed Genes In DmoeA vs WT.tsv” is the subset of that file with |logFC| > 1 and p < 0.05. The normalized counts per million of the differentially expressed genes were then used to create the heatmap, “143 Differentially Expressed Genes Heatmap.tiff”. The PCA, “Sample PCA.pdf”, is based on the global expression of all genes in the “All Quantified Genes.tsv” file.
   No pathway analysis could be performed for the selected reference.

   References:
   [1] bcl-convert: A proprietary Illumina software for the conversion of bcl files to basecalls. https://support-docs.illumina.com/SW/BCL_Convert/Content/SW/FrontPages/BCL_Convert.htm
   [2] Kim D, Paggi JM, Park C, et al. (2019). Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol 37, 907–915.
   [3] Liao Y, Smyth GK and Shi W (2014). featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features. Bioinformatics 30(7), 923-930.
   [4] R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
   [5] Robinson MD, McCarthy DJ and Smyth GK (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140.

3. Instrument- or software-specific information needed to interpret the data:
   NA

4. Standards and calibration information, if appropriate:
   NA

5. Environmental/experimental conditions:
   NA

6. Describe any quality-assurance procedures performed on the data:
   NA

7. People involved with sample collection, processing, analysis and/or submission:
   Mohd Farid Abdul Halim
   Dallas Fonseca
   SeqCenter RNA Sequencing Services

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: hisat2_counts.tsv
-----------------------------------------

1. Number of variables: 5

2. Number of cases/rows: 1722

3. Missing data codes:
   There is no missing data.

4. Variable List

   A. Name: Locustag
      Description: The gene locus tag identification for Methanococcus maripaludis S2

   B. Name: DmoeA_1
      Description: Raw RNA read count in the moeA deletion strain - Replicate #1

   C. Name: DmoeA_2
      Description: Raw RNA read count in the moeA deletion strain - Replicate #2

   D. Name: WT_1
      Description: Raw RNA read count in the wild-type strain - Replicate #1

   E. Name: WT_2
      Description: Raw RNA read count in the wild-type strain - Replicate #2

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: 143 Differentially Expressed Genes In DmoeA vs WT.tsv
-----------------------------------------

1. Number of variables: 10

2. Number of cases/rows: 143

3. Missing data codes:
   There is no missing data.

4. Variable List

   A. Name: Locustag
      Description: The gene locus tag identification for Methanococcus maripaludis S2

   B. Name: Description
      Description: Annotation of the gene of interest

   C. Name: FeatureType
      Description: Feature type of the gene of interest; "CDS" denotes a coding sequence

   D. Name: logFC
      Description: Log2 fold change of the RNA read counts between the DmoeA and wild-type strains

   E. Name: PValue
      Description: P-value of the differential expression test: the probability, assuming no true difference between the strains, of observing a test statistic at least as extreme as the one obtained

   F. Name: FDR
      Description: False Discovery Rate

   G. Name: WT_1
      Description: Normalized counts per million (cpm) of RNA reads for the wild-type strain - Replicate #1

   H. Name: WT_2
      Description: Normalized counts per million (cpm) of RNA reads for the wild-type strain - Replicate #2

   I. Name: DmoeA_1
      Description: Normalized counts per million (cpm) of RNA reads for the moeA deletion strain - Replicate #1

   J. Name: DmoeA_2
      Description: Normalized counts per million (cpm) of RNA reads for the moeA deletion strain - Replicate #2

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: All Quantified Genes.tsv
-----------------------------------------

1. Number of variables: 10

2. Number of cases/rows: 1530

3. Missing data codes:
   There is no missing data.

4. Variable List

   A. Name: Locustag
      Description: The gene locus tag identification for Methanococcus maripaludis S2

   B. Name: Description
      Description: Annotation of the gene of interest

   C. Name: FeatureType
      Description: Feature type of the gene of interest; "CDS" denotes a coding sequence

   D. Name: logFC
      Description: Log2 fold change of the RNA read counts between the DmoeA and wild-type strains

   E. Name: PValue
      Description: P-value of the differential expression test: the probability, assuming no true difference between the strains, of observing a test statistic at least as extreme as the one obtained

   F. Name: FDR
      Description: False Discovery Rate

   G. Name: WT_1
      Description: Normalized counts per million (cpm) of RNA reads for the wild-type strain - Replicate #1

   H. Name: WT_2
      Description: Normalized counts per million (cpm) of RNA reads for the wild-type strain - Replicate #2

   I. Name: DmoeA_1
      Description: Normalized counts per million (cpm) of RNA reads for the moeA deletion strain - Replicate #1

   J. Name: DmoeA_2
      Description: Normalized counts per million (cpm) of RNA reads for the moeA deletion strain - Replicate #2
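
-----------------------------------------
EXAMPLE: SELECTING DIFFERENTIALLY EXPRESSED GENES
-----------------------------------------

The list of 143 differentially expressed genes is described above as the subset of "All Quantified Genes.tsv" with |logFC| > 1 and p < .05. The following Python sketch illustrates that kind of filter. It is not the code used to produce the dataset (that analysis was done in R with edgeR). It assumes the file has a header row with the column names given in the Variable List (Locustag, logFC, PValue), and the helper names read_tsv and select_de_genes are hypothetical. The small table in the example is made up for illustration and does not contain values from the dataset.

```python
import csv
import io


def read_tsv(handle):
    """Read a tab-separated table with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def select_de_genes(rows, lfc_cutoff=1.0, p_cutoff=0.05):
    """Keep rows with |logFC| > lfc_cutoff and PValue < p_cutoff."""
    selected = []
    for row in rows:
        log_fc = float(row["logFC"])
        p_value = float(row["PValue"])
        if abs(log_fc) > lfc_cutoff and p_value < p_cutoff:
            selected.append(row)
    return selected


# Made-up rows for illustration only; these are NOT values from the dataset.
EXAMPLE_TSV = (
    "Locustag\tlogFC\tPValue\n"
    "GENE_A\t2.5\t0.001\n"
    "GENE_B\t-1.8\t0.020\n"
    "GENE_C\t0.4\t0.001\n"
    "GENE_D\t3.0\t0.200\n"
)

example_rows = read_tsv(io.StringIO(EXAMPLE_TSV))
example_hits = [row["Locustag"] for row in select_de_genes(example_rows)]
print(example_hits)  # ['GENE_A', 'GENE_B']
```

To try this on the real data, open "All Quantified Genes.tsv" in text mode and pass the file handle to read_tsv in place of the in-memory example. Whether the result matches the 143-gene list exactly depends on details not stated here, such as whether the threshold was applied to the PValue or the FDR column.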