This readme.txt file was generated on 20200529 by

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: 16S RNA data for biofilm in contact with antimicrobial peptide coatings

2. Author Information

   Principal Investigator Contact Information
      Name: Conrado Aparicio
      Institution: University of Minnesota
      Address: 16-212 Moos Tower, 515 Delaware St. SE, Minneapolis, MN 55455
      Email: apari003@umn.edu
      ORCID: 0000-0003-2969-6067

   Associate or Co-investigator Contact Information
      Name: Dina Moussa
      Institution: University of Minnesota
      Address: 16-212 Moos Tower, 515 Delaware St. SE, Minneapolis, MN 55455
      Email: mouss023@umn.edu
      ORCID: 0000-0003-0376-3452

3. Date of data collection (single date, range, approximate date): 20171121

4. Geographic location of data collection (where was data collected?): Data generated by the University of Minnesota Genomics Center and processed by the University of Minnesota Informatics Institute

5. Information about funding sources that supported the collection of the data: National Institute of Dental and Craniofacial Research, National Institutes of Health, USA (R01-DE026117-01, R90-DE023058-05)

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: CC0 1.0 Universal

2. Links to publications that cite or use the data: D.G. Moussa, C. Aparicio. "Targeting the Oral Plaque Microbiome with Immobilized Anti-biofilm Peptides at Tooth-restoration Interfaces". Manuscript under review.

3. Links to other publicly accessible locations of the data: None

4. Links/relationships to ancillary data sets: None

5. Was data derived from another source? No
   If yes, list source(s):

6. Recommended citation for the data: Aparicio, Conrado and Moussa, Dina G. (2020). 16S RNA data for biofilm in contact with antimicrobial peptide coatings. Retrieved from the Data Repository for the University of Minnesota, https://doi.org/10.13020/3dh2-xh66.
---------------------
DATA & FILE OVERVIEW
---------------------

1. File and Folder List

   A. Analysis (folder)
      Short description: Contains an XLSX file listing all detected taxa in rows and all samples in columns. The columns highlighted in yellow are samples with low sequencing-quality reads. The folder also contains a TXT file with the data after filtering.

   B. Resources (folder)
      Short description: Contains a fastQC folder with the FastQC reports for all samples. There are 4 groups of 6 samples each (24 samples in total) plus 2 water control samples. The groups are:
         1. Control ("CTRL")
         2. 1018 peptide ("1018")
         3. DJK2 peptide ("DJK2")
         4. D-GL13K peptide ("GL13k")
      Each sample is labelled twice, without or with "pma". The "pma" label indicates the pre-amplification treatment for selective detection of viable cells. Because a dual-indexing approach was used for a paired-end run, one Read 1 (R1) and one Read 2 (R2) FASTQ file was created for each sample in each lane. In total there are 53 FastQC reports, with R1 and R2 reports for each sample.

   C. File name: index.html
      Short description: The Illumina BasicQC Report, showing the number of samples (26) and the library type (paired-end). It links to the FASTQ quality plots for all the samples described above.

2. Relationship between files: dataset

3. Additional related data collected that was not included in the current data package: None

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data:
   Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJ, Holmes SP. DADA2: High-resolution sample inference from Illumina amplicon data. Nat Methods. 2016;13(7):581-3. Epub 2016/05/24. doi: 10.1038/nmeth.3869. PubMed PMID: 27214047; PubMed Central PMCID: PMC4927377.
   Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17(1):10-2.

2.
Methods for processing the data: Illumina data were processed with cutadapt for adapter removal and with DADA2 for quality processing and amplicon sequence variant (ASV) table analysis, following currently suggested practices. The processing steps were:
   - Primers were removed with filterAndTrim.
   - Paired reads were merged with mergePairs.
   - Bimeras were removed with removeBimeraDenovo.
   - Taxonomy was assigned using Silva version 132.
   - Sequences shorter than 251 or longer than 255 were removed, as were sequences with a total read count of 1 across all samples.
   Further analysis of the ASV table was performed in R with the vegan, reshape2, ggplot2, grid, gridExtra, plyr and dplyr packages. The alpha diversity indices Shannon, Simpson and Inverse Simpson (InvSympson) were calculated at 97% identity. Beta diversity was analyzed by principal coordinates analysis (PCoA) based on Bray-Curtis distances at the OTU level. Permutational multivariate analysis of variance (PERMANOVA), using the vegan function adonis, and TukeyHSD for multiple comparisons were applied to the treatment variables and to the alpha diversity statistics Shannon, Simpson and InvSympson.

3. Instrument- or software-specific information needed to interpret the data: R or QIIME2 software

4. Standards and calibration information, if appropriate: As mentioned above.

5. Environmental/experimental conditions: In vitro culturing of oral biofilms.

6. Describe any quality-assurance procedures performed on the data: Quality filtering with the following QC tools:
   (1) FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), which evaluates multiple aspects of raw sequencing data quality, such as per-base quality, per-base GC content and sequence length distribution.
   (2) FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/), a collection of command-line tools for preprocessing short-read FASTA/FASTQ files. The tools cover read-length trimming, collapsing of identical reads, adapter removal and format conversion.

7.
People involved with sample collection, processing, analysis and/or submission:
   Dina Moussa
   Trevor Gould
   Corbin and Allison, for the University of Minnesota Genomics Center
   Conrado Aparicio

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Aparicio.xlsx
-----------------------------------------

1. Number of variables: 2

2. Number of cases/rows: 4 treatments x 2 pre-amplification conditions (without or with "pma"), 3 samples each, transposed into columns.

3. Missing data codes: None

4. Variable List

   A. Name:
      Description: Peptide coating treatment (control or antimicrobial peptide)
      Value labels if appropriate: DGL13K, 1018, DJK2, CTRL

   B. Name:
      Description: Pre-amplification treatment for selective detection of viable cells
      Value labels if appropriate: "pma"
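The filtering rules and diversity statistics named under METHODOLOGICAL INFORMATION can be illustrated on a count table like the one in Aparicio.xlsx. The Python sketch below is NOT the R/vegan code used to produce this dataset. It assumes a hypothetical in-memory table that maps each ASV sequence to its per-sample read counts, and all function names are illustrative. The sketch does the following:
   - applies the 251 to 255 length filter and removes ASVs with a total count of 1;
   - computes Shannon (natural log), Simpson (1 - D) and Inverse Simpson (1 / D), where D is the sum of squared relative abundances, following the definitions used by vegan's diversity();
   - computes the Bray-Curtis dissimilarity used as input to PCoA.

```python
import math
from typing import Dict, List

# Hypothetical layout (an assumption, not the format of the files in this
# dataset): ASV sequence -> read counts, one entry per sample, same sample order.
AsvTable = Dict[str, List[int]]


def filter_asvs(table: AsvTable, min_len: int = 251, max_len: int = 255) -> AsvTable:
    """Keep ASVs with min_len <= length <= max_len and a total count > 1."""
    return {
        seq: counts
        for seq, counts in table.items()
        if min_len <= len(seq) <= max_len and sum(counts) > 1
    }


def sample_counts(table: AsvTable, j: int) -> List[int]:
    """Column j of the table: the count of every ASV in sample j."""
    return [counts[j] for counts in table.values()]


def alpha_diversity(counts: List[int]) -> Dict[str, float]:
    """Shannon (natural log), Simpson (1 - D) and inverse Simpson (1 / D),
    with D = sum(p_i ** 2). Raises ZeroDivisionError for an all-zero sample."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]  # zero counts contribute nothing
    d = sum(p * p for p in props)
    return {
        "shannon": -sum(p * math.log(p) for p in props),
        "simpson": 1.0 - d,
        "invsimpson": 1.0 / d,
    }


def bray_curtis(x: List[int], y: List[int]) -> float:
    """Bray-Curtis dissimilarity: sum(|x_i - y_i|) / sum(x_i + y_i)."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den


if __name__ == "__main__":
    # Toy data for illustration only; not taken from this dataset.
    toy = {
        "A" * 253: [10, 0],
        "C" * 252: [10, 20],
        "G" * 240: [5, 5],  # removed: shorter than 251
        "T" * 254: [1, 0],  # removed: total read count equals 1
    }
    kept = filter_asvs(toy)
    s0, s1 = sample_counts(kept, 0), sample_counts(kept, 1)
    print(len(kept))  # 2
    print(alpha_diversity(s0))  # Shannon = ln 2, Simpson 0.5, InvSimpson 2.0
    print(bray_curtis(s0, s1))  # 0.5
```

Samples with zero total reads would have to be excluded before computing alpha diversity. A full analysis would also pass the pairwise Bray-Curtis matrix to a PCoA routine, which this sketch omits.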