Data supporting "Sodium azide mutagenesis induces a unique pattern of mutations"

Author Contact

Liu, Chaochih
chaochih.l@gmail.com

Abstract

This dataset contains genomic variant calls, structural variation data, and phenotypic measurements for Hordeum vulgare (barley). It includes VCF and BED files from multiple sequencing platforms (10x Genomics, Oxford Nanopore, PacBio, and Illumina), featuring raw, filtered, and phased variants. Structural variant calls in BEDPE format and lists of callable and uncallable regions are provided. Phenotype data from field trials provide raw and spatially adjusted trait values for agronomic characteristics. These data provide insight into genomic diversity, structural variation, and genotype-phenotype associations in barley. The use of multiple sequencing technologies enables cross-validation of variant calls, allowing for high-confidence genome annotation and population genetics studies. The filtered VCFs help isolate biologically relevant mutations, while the callable region data support rigorous quality control for variant interpretation. This dataset is being released to support the associated paper and the broader research community in comparative genomics, evolutionary biology, and crop improvement. By making these resources publicly available, we aim to enhance reproducibility, enable novel insights into barley's genomic architecture, and assist breeding efforts focused on climate resilience and agronomic performance.

Description

1. File List

A. Filename: Barley_MorexV3_pseudomolecules_parts.entropy_0.7_masked.subtracted_gene_ann.bed
   Short description: A mask of low complexity sequence using BBMask.

B. Filename: M01-3-3_dels.vcf.gz
   Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.

C. Filename: M01-3-3_large_sv_calls.bedpe
   Short description: Paired end BED file from 10x Genomics Longranger pipeline.

D. Filename: M01-3-3_large_svs.vcf.gz
   Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.

E. Filename: M01-3-3_phased_variants.vcf.gz
   Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.

F. Filename: M01_ont_partsRefv3_90.vcf.gz
   Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.

G. Filename: M20-2-2_dels.vcf.gz
   Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.

H. Filename: M20-2-2_large_sv_calls.bedpe
   Short description: Paired end BED file from 10x Genomics Longranger pipeline.

I. Filename: M20-2-2_large_svs.vcf.gz
   Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.

J. Filename: M20-2-2_phased_variants.vcf.gz
   Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.

K. Filename: M20_ont_partsRefv3_90.vcf.gz
   Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.

L. Filename: M29-2-2_dels.vcf.gz
   Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.

M. Filename: M29-2-2_large_sv_calls.bedpe
   Short description: Paired end BED file from 10x Genomics Longranger pipeline.

N. Filename: M29-2-2_large_svs.vcf.gz
   Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.

O. Filename: M29-2-2_phased_variants.vcf.gz
   Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.

P. Filename: M29_ont_partsRefv3_90.vcf.gz
   Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.

Q. Filename: Morex_85X_sorted_renamed.vcf
   Short description: Sorted and renamed VCF with Oxford Nanopore data.

R. Filename: Morex_ont_partsRefv3_90.vcf.gz
   Short description: Unfiltered VCF of Oxford Nanopore data.

S. Filename: Morex_pacbio_90.vcf.gz
   Short description: Unfiltered VCF of PacBio data.

T. Filename: concat_mut_3_lines_dels_10xGenomics_and_ONTSniffles2.callable.noRefDiffs.final.noConflicts.vcf
   Short description: Filtered VCF for 3 mutated lines including only deletions from 10x Genomics and Oxford Nanopore data called by Sniffles2.

U. Filename: hybrid13.INDELs.common.vcf.gz
   Short description: Filtered VCF for hybrid samples (no treatment) including only common variants.

V. Filename: hybrid13.INDELs.rare.vcf.gz
   Short description: Filtered VCF for hybrid samples (no treatment) including only rare variants.

W. Filename: hybrid13.SNPs.common.vcf.gz
   Short description: Filtered VCF for hybrid samples (no treatment) including only common variants.

X. Filename: hybrid13.SNPs.rare.vcf.gz
   Short description: Filtered VCF for hybrid samples (no treatment) including only rare variants.

Y. Filename: hybrid13_indels_biallelic.callable.vcf.gz
   Short description: Filtered VCF only including filters for biallelic sites and callable regions.

Z. Filename: hybrid13_snps_biallelic.callable.vcf.gz
   Short description: Filtered VCF only including filters for biallelic sites and callable regions.

AA. Filename: mmx 2020-2022_plot level_031623.csv
    Short description: Plot level phenotype data. Traits which have a continuous scale ("yield_bua", "yield_kgha", "heading_dap", and "height_cm") were analyzed for spatial variation using a moving average. Raw data for those traits will appear as just listed, and spatially adjusted data will have the extension "_use". Traits with a discrete scoring scale ("lodging_score") were not analyzed for spatial variation. The trait title is "lodging_score_use" because the raw value is the final value used by our summary generation function in R.

AB. Filename: mmx 2020-2022_trial avg_031623.csv
    Short description: Trial level phenotype data. Traits which have a continuous scale ("yield_bua", "yield_kgha", "heading_dap", and "height_cm") were analyzed for spatial variation using a moving average. Raw data for those traits will appear as just listed, and spatially adjusted data will have the extension "_use". Traits with a discrete scoring scale ("lodging_score") were not analyzed for spatial variation. The trait title is "lodging_score_use" because the raw value is the final value used by our summary generation function in R.

AC. Filename: morex-sample2_dels.vcf.gz
    Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.

AD. Filename: morex-sample2_large_svs.vcf.gz
    Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.

AE. Filename: morex-sample2_phased_variants.vcf.gz
    Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.

AF. Filename: morex_v3_callable.bed
    Short description: List of callable regions in BED format.

AG. Filename: morex_v3_callable.low_complexity_excluded.bed
    Short description: List of callable regions excluding low complexity regions in BED format.

AH. Filename: morex_v3_combined_uncallable.low_complexity.nochrUn.bed
    Short description: List of uncallable regions plus low complexity regions in BED format.

AI. Filename: morex_v3_combined_uncallable.low_complexity.nochrUn.pseudo_pos.bed
    Short description: List of uncallable regions plus low complexity regions in BED format.

AJ. Filename: morex_v3_combined_uncallable.nochrUn.bed
    Short description: List of uncallable regions in BED format.

AK. Filename: mut8_and_3mut10xGenomics.INDELs.private.vcf.gz
    Short description: Filtered VCF containing variants private to each mutated line.

AL. Filename: mut8_and_3mut10xGenomics.SNPs.private.vcf.gz
    Short description: Filtered VCF containing variants private to each mutated line.

AM. Filename: mut8_and_hybrid_barley_raw_variants_indels.vcf.gz
    Short description: Unfiltered VCF for Illumina short-read sequencing data called by GATK.

AN. Filename: mut8_and_hybrid_barley_raw_variants_snps.vcf.gz
    Short description: Unfiltered VCF for Illumina short-read sequencing data called by GATK.
    FILE AVAILABLE FOR DOWNLOAD AT: https://drive.google.com/file/d/1zO-hmka75PQnmmZult3ddJ_h2JjzKgtN/view?usp=share_link

AO. Filename: mut_3_lines_dels_merged.callable.noRefDiffs.private.supports.final.vcf
    Short description: Filtered VCF including only deletions in callable regions that are private to each mutated line with visual validation.

AP. Filename: mut_3_lines_large_svs_merged.callable.noRefDiffs.vcf
    Short description: Filtered VCF including large SVs in callable regions.

AQ. Filename: pixy_pi_400bp_win.gt0.02.bed
    Short description: Windowed diversity estimates using Pixy in BED format.
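
The column naming convention described for the two phenotype CSV files (raw trait names, plus a "_use" extension for spatially adjusted values) can be handled programmatically. The following is a minimal Python sketch, not the authors' R workflow. It assumes the CSV headers use the trait names exactly as quoted in the descriptions above. The function names, the "plot" column, and the demo values are hypothetical and are not taken from the dataset.

```python
import csv
import io

# Trait names as quoted in the file descriptions (assumed to match the CSV headers).
CONTINUOUS_TRAITS = ["yield_bua", "yield_kgha", "heading_dap", "height_cm"]
DISCRETE_TRAITS = ["lodging_score"]  # not spatially adjusted
USE_SUFFIX = "_use"


def split_trait_columns(header):
    """Sort header names into raw, spatially adjusted, and raw-as-final columns."""
    present = set(header)
    raw = [t for t in CONTINUOUS_TRAITS if t in present]
    adjusted = [t + USE_SUFFIX for t in CONTINUOUS_TRAITS if t + USE_SUFFIX in present]
    # For discrete traits the "_use" column holds the unadjusted raw score.
    raw_as_final = [t + USE_SUFFIX for t in DISCRETE_TRAITS if t + USE_SUFFIX in present]
    return raw, adjusted, raw_as_final


def to_float(value):
    """Convert a CSV cell to float; return None for empty or non-numeric cells."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def adjustment_by_row(rows, trait):
    """Return (adjusted - raw) for each row where both values are numeric."""
    diffs = []
    for row in rows:
        raw = to_float(row.get(trait))
        adjusted = to_float(row.get(trait + USE_SUFFIX))
        if raw is None or adjusted is None:
            continue  # skip rows with missing data
        diffs.append(adjusted - raw)
    return diffs


# Made-up rows for illustration only; these are NOT values from the dataset.
DEMO_CSV = """plot,yield_kgha,yield_kgha_use,lodging_score_use
1,5000,5100,2
2,4800,4700,1
3,,4900,0
"""

demo_rows = list(csv.DictReader(io.StringIO(DEMO_CSV)))
demo_header = list(demo_rows[0].keys())
print(split_trait_columns(demo_header))
print(adjustment_by_row(demo_rows, "yield_kgha"))
```

To apply the same functions to one of the real files, open it with `open(path, newline="")` and pass the file object to `csv.DictReader` in place of the in-memory demo string. The actual headers should be checked first, since the exact column layout is an assumption here.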

Referenced by

Chaochih Liu, Giulia Frascarelli, Adrian O. Stec, Shane Heinen, Li Lei, Skylar R. Wyant, Erik Legg, Monika Spiller, Gary J. Muehlbauer, Kevin P. Smith, Justin C. Fay, Peter L. Morrell. (2024). Sodium azide mutagenesis induces a unique pattern of mutations. bioRxiv 2024.05.06.592067
https://doi.org/10.1101/2024.05.06.592067

License

CC0 1.0 Universal
http://creativecommons.org/publicdomain/zero/1.0/

Funding Information

- University of Minnesota Informatics Institute MnDRIVE Graduate Assistantship award to Chaochih Liu
- National Science Foundation (grant IOS-1339393)
- Minnesota Agricultural Experiment Station fund (MIN-13-122 in support of Peter L. Morrell)

Suggested Citation

Liu, Chaochih; Morrell, Peter; Frascarelli, Giulia; Stec, Adrian; Heinen, Shane; Lei, Li; Wyant, Skylar; Legg, Erik; Spiller, Monika; Muehlbauer, Gary; Smith, Kevin; Fay, Justin. (2025). Data supporting "Sodium azide mutagenesis induces a unique pattern of mutations". Retrieved from the Data Repository for the University of Minnesota (DRUM), https://doi.org/10.13020/sewd-qq35.
