Data supporting "Sodium azide mutagenesis induces a unique pattern of mutations"
Author Contact
Liu, Chaochih
chaochih.l@gmail.com
Abstract
This dataset contains genomic variant calls, structural variation data, and phenotypic measurements for Hordeum vulgare (barley). It includes VCF and BED files from multiple sequencing platforms (10x Genomics, Oxford Nanopore, PacBio, and Illumina), covering raw, filtered, and phased variants. Structural variant calls in BEDPE format and lists of callable and uncallable regions are provided. Phenotype data from field trials provide raw and spatially adjusted trait values for agronomic characteristics.
These data provide insight into genomic diversity, structural variation, and genotype-phenotype associations in barley. The use of multiple sequencing technologies enables cross-validation of variant calls, allowing for high-confidence genome annotation and population genetics studies. The filtered VCFs help isolate biologically relevant mutations, while the callable region data support rigorous quality control for variant interpretation.
This dataset is released to support the associated paper and the broader research community in comparative genomics, evolutionary biology, and crop improvement studies. By making these resources publicly available, we aim to enhance reproducibility, enable novel insights into barley’s genomic architecture, and assist breeding efforts focused on climate resilience and agronomic performance.
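One way the callable-region lists could be applied to variant positions is sketched below. This is an illustration only, not the authors' pipeline: the function names `load_bed` and `is_callable` and the example intervals are hypothetical, and the sketch assumes standard BED coordinates (0-based, half-open) and standard VCF positions (1-based).

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into per-chromosome sorted, merged (start, end) lists."""
    by_chrom = defaultdict(list)
    for line in lines:
        # Skip blank lines and optional BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        by_chrom[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlapping or adjacent interval: extend the previous one.
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def is_callable(regions, chrom, pos):
    """Return True if 1-based VCF position `pos` lies in an interval on `chrom`."""
    intervals = regions.get(chrom, [])
    zero_based = pos - 1  # convert VCF (1-based) to BED (0-based) coordinates
    # Index of the last interval whose start is <= zero_based.
    i = bisect.bisect_right(intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= zero_based < intervals[i][1]


# Hypothetical intervals, for illustration only; not taken from the dataset.
example_bed = ["chr1H\t10\t20\n", "chr1H\t15\t30\n", "chr2H\t0\t5\n"]
regions = load_bed(example_bed)
```

Merging overlapping intervals first keeps each lookup to a single binary search per position. For the full files, an established tool such as bcftools or bedtools would likely be a better fit than this sketch.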
Description
1. File List
A. Filename: Barley_MorexV3_pseudomolecules_parts.entropy_0.7_masked.subtracted_gene_ann.bed
Short description: A mask of low complexity sequence using BBMask
B. Filename: M01-3-3_dels.vcf.gz
Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.
C. Filename: M01-3-3_large_sv_calls.bedpe
Short description: Paired end BED file from 10x Genomics Longranger pipeline.
D. Filename: M01-3-3_large_svs.vcf.gz
Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.
E. Filename: M01-3-3_phased_variants.vcf.gz
Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.
F. Filename: M01_ont_partsRefv3_90.vcf.gz
Short description: Unfiltered VCF of Oxford Nanopore data.
G. Filename: M20-2-2_dels.vcf.gz
Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.
H. Filename: M20-2-2_large_sv_calls.bedpe
Short description: Paired end BED file from 10x Genomics Longranger pipeline.
I. Filename: M20-2-2_large_svs.vcf.gz
Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.
J. Filename: M20-2-2_phased_variants.vcf.gz
Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.
K. Filename: M20_ont_partsRefv3_90.vcf.gz
Short description: Unfiltered VCF of Oxford Nanopore data.
L. Filename: M29-2-2_dels.vcf.gz
Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.
M. Filename: M29-2-2_large_sv_calls.bedpe
Short description: Paired end BED file from 10x Genomics Longranger pipeline.
N. Filename: M29-2-2_large_svs.vcf.gz
Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.
O. Filename: M29-2-2_phased_variants.vcf.gz
Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.
P. Filename: M29_ont_partsRefv3_90.vcf.gz
Short description: Unfiltered VCF of Oxford Nanopore data.
Q. Filename: Morex_85X_sorted_renamed.vcf
Short description: Sorted and renamed VCF with Oxford Nanopore data.
R. Filename: Morex_ont_partsRefv3_90.vcf.gz
Short description: Unfiltered VCF of Oxford Nanopore data.
S. Filename: Morex_pacbio_90.vcf.gz
Short description: Unfiltered VCF of PacBio data.
T. Filename: concat_mut_3_lines_dels_10xGenomics_and_ONTSniffles2.callable.noRefDiffs.final.noConflicts.vcf
Short description: Filtered VCF for 3 mutated lines including only deletions from 10x Genomics and Oxford Nanopore data called by Sniffles2.
U. Filename: hybrid13.INDELs.common.vcf.gz
Short description: Filtered VCF for hybrid samples (no treatment) including only common variants.
V. Filename: hybrid13.INDELs.rare.vcf.gz
Short description: Filtered VCF for hybrid samples (no treatment) including only rare variants.
W. Filename: hybrid13.SNPs.common.vcf.gz
Short description: Filtered VCF for hybrid samples (no treatment) including only common variants.
X. Filename: hybrid13.SNPs.rare.vcf.gz
Short description: Filtered VCF for hybrid samples (no treatment) including only rare variants.
Y. Filename: hybrid13_indels_biallelic.callable.vcf.gz
Short description: Filtered VCF only including filters for biallelic sites and callable regions.
Z. Filename: hybrid13_snps_biallelic.callable.vcf.gz
Short description: Filtered VCF only including filters for biallelic sites and callable regions.
AA. Filename: mmx 2020-2022_plot level_031623.csv
Short description: Plot level phenotype data. Traits with a continuous scale ("yield_bua", "yield_kgha", "heading_dap", and "height_cm") were analyzed for spatial variation using a moving average. Raw data for those traits appear under the names just listed, and spatially adjusted data carry the suffix "_use". Traits with a discrete scoring scale ("lodging_score") were not analyzed for spatial variation; that trait is titled "lodging_score_use" because the raw value is the final value used by our summary generation function in R.
AB. Filename: mmx 2020-2022_trial avg_031623.csv
Short description: Trial level phenotype data. Traits with a continuous scale ("yield_bua", "yield_kgha", "heading_dap", and "height_cm") were analyzed for spatial variation using a moving average. Raw data for those traits appear under the names just listed, and spatially adjusted data carry the suffix "_use". Traits with a discrete scoring scale ("lodging_score") were not analyzed for spatial variation; that trait is titled "lodging_score_use" because the raw value is the final value used by our summary generation function in R.
AC. Filename: morex-sample2_dels.vcf.gz
Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.
AD. Filename: morex-sample2_large_svs.vcf.gz
Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.
AE. Filename: morex-sample2_phased_variants.vcf.gz
Short description: Unfiltered VCF from 10x Genomics Longranger pipeline.
AF. Filename: morex_v3_callable.bed
Short description: List of callable regions in BED format.
AG. Filename: morex_v3_callable.low_complexity_excluded.bed
Short description: List of callable regions excluding low complexity regions in BED format.
AH. Filename: morex_v3_combined_uncallable.low_complexity.nochrUn.bed
Short description: List of uncallable regions plus low complexity regions in BED format.
AI. Filename: morex_v3_combined_uncallable.low_complexity.nochrUn.pseudo_pos.bed
Short description: List of uncallable regions plus low complexity regions in BED format.
AJ. Filename: morex_v3_combined_uncallable.nochrUn.bed
Short description: List of uncallable regions in BED format.
AK. Filename: mut8_and_3mut10xGenomics.INDELs.private.vcf.gz
Short description: Filtered VCF containing variants private to each mutated line.
AL. Filename: mut8_and_3mut10xGenomics.SNPs.private.vcf.gz
Short description: Filtered VCF containing variants private to each mutated line.
AM. Filename: mut8_and_hybrid_barley_raw_variants_indels.vcf.gz
Short description: Unfiltered VCF for Illumina short-read sequencing data called by GATK.
AN. Filename: mut8_and_hybrid_barley_raw_variants_snps.vcf.gz
Short description: Unfiltered VCF for Illumina short-read sequencing data called by GATK.
FILE AVAILABLE FOR DOWNLOAD AT: https://drive.google.com/file/d/1zO-hmka75PQnmmZult3ddJ_h2JjzKgtN/view?usp=share_link
AO. Filename: mut_3_lines_dels_merged.callable.noRefDiffs.private.supports.final.vcf
Short description: Filtered VCF including only deletions in callable regions that are private to each mutated line with visual validation.
AP. Filename: mut_3_lines_large_svs_merged.callable.noRefDiffs.vcf
Short description: Filtered VCF including large SVs in callable regions.
AQ. Filename: pixy_pi_400bp_win.gt0.02.bed
Short description: Windowed diversity estimates using Pixy in BED format.
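The column naming convention described for the phenotype files (AA and AB) can be sketched as follows. This is a minimal illustration, assuming that each spatially adjusted column is named by appending "_use" to the raw trait name and that "lodging_score_use" has no raw counterpart. The function name `pair_trait_columns`, the "plot_id" column, and the example values are hypothetical and are not taken from the files.

```python
import csv
import io

SUFFIX = "_use"


def pair_trait_columns(header):
    """Map each raw trait column to its "_use" counterpart, if present.

    Returns (paired, use_only): `paired` maps raw name -> adjusted name,
    and `use_only` lists "_use" columns that have no raw counterpart.
    """
    names = set(header)
    paired = {}
    use_only = []
    for col in header:
        if not col.endswith(SUFFIX):
            continue
        raw = col[: -len(SUFFIX)]
        if raw in names:
            paired[raw] = col
        else:
            use_only.append(col)
    return paired, use_only


# Hypothetical header and row, for illustration only; not copied from the dataset.
example = io.StringIO(
    "plot_id,yield_kgha,yield_kgha_use,height_cm,height_cm_use,lodging_score_use\n"
    "p1,5000,5100,80,82,2\n"
)
reader = csv.DictReader(example)
paired, use_only = pair_trait_columns(reader.fieldnames)
```

With the real files, the same function could be given the header of either CSV to check which traits have both raw and adjusted values before any analysis.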
Referenced by
Chaochih Liu, Giulia Frascarelli, Adrian O. Stec, Shane Heinen, Li Lei, Skylar R. Wyant, Erik Legg, Monika Spiller, Gary J. Muehlbauer, Kevin P. Smith, Justin C. Fay, Peter L. Morrell. (2024). Sodium azide mutagenesis induces a unique pattern of mutations. bioRxiv 2024.05.06.592067
https://doi.org/10.1101/2024.05.06.592067
License
CC0 1.0 Universal
http://creativecommons.org/publicdomain/zero/1.0/
Funding Information
- University of Minnesota Informatics Institute MnDRIVE Graduate Assistantship award to Chaochih Liu
- National Science Foundation (grant IOS-1339393)
- Minnesota Agricultural Experiment Station fund (MIN-13-122 in support of Peter L. Morrell)
Suggested Citation
Liu, Chaochih; Morrell, Peter; Frascarelli, Giulia; Stec, Adrian; Heinen, Shane; Lei, Li; Wyant, Skylar; Legg, Erik; Spiller, Monika; Muehlbauer, Gary; Smith, Kevin; Fay, Justin. (2025). Data supporting "Sodium azide mutagenesis induces a unique pattern of mutations". Retrieved from the Data Repository for the University of Minnesota (DRUM), https://doi.org/10.13020/sewd-qq35.
View/Download File
File: Description (Size)
Morrell_Readme_2025: Description of the data (17.81 KB)
Barley_MorexV3_pseudomolecules_parts.entropy_0.7_masked.subtracted_gene_ann.bed: Mask of low complexity sequence using BBMask (5.53 MB)
M01-3-3_dels.vcf.gz: Unfiltered VCF from 10x Genomics Longranger pipeline (30.71 KB)
M01-3-3_large_sv_calls.bedpe: Paired end BED file from 10x Genomics Longranger pipeline (61.19 KB)
M01-3-3_large_svs.vcf.gz: Unfiltered VCF from 10x Genomics Longranger pipeline (557.9 KB)
