Title: Discovery of barley gene candidates for low temperature and drought tolerance via environmental association

Authors: Li Lei, Ana M. Poets, Chaochih Liu, Skylar R. Wyant, Paul J. Hoffman, Corey K. Carter, Richard M. Trantow, Brian G. Shaw, Xin Li, Gary J. Muehlbauer, Fumiaki Katagiri, Peter L. Morrell

Abstract: Barley is cultivated from the equator to the Arctic Circle. The wild progenitor species, Hordeum vulgare ssp. spontaneum, occupies a relatively narrow latitudinal range (~30 - 40° N) primarily at low elevation, < 1500 m. Adaptation to the range of cultivation has occurred over ~8,000 years. The genetic basis of this adaptation is amenable to study through environmental association. Using genotyping from 7,864 SNPs in 784 barley landraces, we perform mixed model association analysis relative to bioclimatic variables and analysis of allele frequency differentiation across multiple partitions of the data. Using resequencing data from a subset of the landraces, we test for linkage disequilibrium (LD) between SNPs queried in genotyping and SNPs in neighboring loci. We identify seven loci previously reported to contribute to adaptive differences to flowering time and abiotic stress in barley and four loci previously identified in other plant species. In many cases, patterns of LD are consistent with the causative variant occurring in the immediate vicinity of the queried SNP. The identification of barley orthologs to well characterized genes may provide new understanding of the nature of adaptive variation and could permit a more targeted use of potentially adaptive variants in barley breeding and germplasm improvement.

-----------------------------------------
Supplemental datasets:
-----------------------------------------
Supplemental data 1: VCF file for the 6152 SNPs from Poets et al., 2014.
Supplemental data 2: Genotype matrix with 5,800 SNPs for environmental association.
Supplemental data 3: The physical positions of 9K SNPs.
Supplemental data 4: The annotations for SNPs called from exome capture resequencing data of 62 landraces.
Supplemental data 5: Phenotype matrix with 25 geographic and climatic variables for environmental association.
Supplemental data 6: Inferred ancestral status for each 9K SNP.
Supplemental data 7: Inferred ancestral status for each exome resequencing SNP.
Supplemental data 8: All p-values and FST from elevation, low and high latitude, longitude, and growth habit.
Supplemental data 9: All p-values and Benjamini-Hochberg FDR-values from the environmental associations for 25 variables.
Supplemental data 10: VCF file for SNPs called from 62 barley exome-capture resequencing data.

-----------------------------------------
PLEASE NOTE: On .vcf files
-----------------------------------------
The headers present within .vcf files provide additional metadata. Header lines begin with #. Meta-information lines, which carry specific keywords, begin with ##. Data lines contain genotype data with one variant per line. For more information on the VCF format, please see the available documentation for VCF at https://github.com/samtools/hts-specs.

Please note this dataset contains multiple versions of .vcf files. Versions are noted in individual file sections.

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Supplemental dataset 1.vcf
-----------------------------------------
Description: VCF file for the 6152 SNPs from Poets et al., 2014.

1a. Meta-information line(s): 12
##fileformat: details the VCF format version number, which is v4.1.
##filedate: date the file was created, in ISO 8601 format.
##source: refers to PLINK, an open-source whole genome analysis toolset.
##FORMAT: describes the genotype fields present in the data lines.
##contig: includes tags describing the contigs referred to in the VCF file.

1b. Header(s): 1
The header line names 8 fixed columns and 1 additional field, "FORMAT", denoting that genotype information is present.
#CHROM: the chromosome.
POS: position
ID: identifier
REF: reference base(s)
ALT: alternate base(s)
QUAL: quality
FILTER: filter status
INFO: additional information
FORMAT: genotype format (please refer to VCFv4.1 documentation, section 1.4.2 - https://github.com/samtools/hts-specs)

2. Total number of lines: 6082

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Supplemental dataset 2.csv
-----------------------------------------
Description: Genotype matrix with 5,800 SNPs for environmental association.

1. Number of rows: 785

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Supplemental dataset 3.vcf
-----------------------------------------
Description: The physical positions of 9K SNPs.

1a. Meta-information line(s): 7
##filedate: date the file was created, in ISO 8601 format.
##fileformat: details the VCF format version number, which is v4.2.
##INFO: information field format. (please refer to VCFv4.2 documentation, section 1.1.2 - https://github.com/samtools/hts-specs)
##reference: the .fasta file used as the reference.
##source: refers to the source program, SNP_Utils, a Python program documented on GitHub - https://github.com/mojaveazure/SNP_Utils

1b. Header(s): 1
The header line names 8 fixed columns.
#CHROM: the chromosome.
POS: position
ID: identifier
REF: reference base(s)
ALT: alternate base(s)
QUAL: quality
FILTER: filter status
INFO: additional information

2. Total number of lines: 7,764

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Supplemental dataset 4.csv
-----------------------------------------
Description: The annotations for SNPs called from exome capture resequencing data of 62 landraces.

1. Number of rows: 785

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Supplemental dataset 5.txt
-----------------------------------------
Description: Phenotype matrix with 25 geographic and climatic variables for environmental association.

1. Number of rows: 5,635

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Supplemental dataset 6.txt
-----------------------------------------
Description: Inferred ancestral status for each 9K SNP.

1. Number of rows: 357,378

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Supplemental dataset 7.txt
-----------------------------------------
Description: Inferred ancestral status for each exome resequencing SNP.

1. Number of rows: 357,378

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: supplemental dataset 8.csv
-----------------------------------------
Description: All p-values and FST from elevation, low and high latitude, longitude, and growth habit.

1. Number of rows: 145,001

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Supplemental dataset 9.txt
-----------------------------------------
Description: All p-values and Benjamini-Hochberg FDR-values from the environmental associations for 25 variables.

1. Number of rows: 482,714

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Supplemental dataset10.vcf.tar.gz
-----------------------------------------
Description: VCF file for SNPs called from 62 barley exome-capture resequencing data.

1a. Meta-information line(s): 114; only unique lines detailed below
##fileformat: details the VCF format version number, which is v4.2.
##ALT: alternate bases and description.
##FILTER: filter status.
(please refer to VCFv4.2 documentation, section 1.4.1 - https://github.com/samtools/hts-specs)
##FORMAT: describes the genotype fields present in the data lines.
##GATKCommandLine: lines added by GATK (Genome Analysis Toolkit) that contain the parameters used to run the program that produced the VCF file.
##GVCFBlock: lines that define genotype quality.
##INFO: information field format. (please refer to VCFv4.2 documentation, section 1.1.2 - https://github.com/samtools/hts-specs)
##contig: includes tags describing the contigs referred to in the VCF file.
##reference: the .fasta file used as the reference.

1b. Header(s): 1
The header line names 8 fixed columns and 1 additional field, "FORMAT", denoting that genotype information is present.
#CHROM: the chromosome.
POS: position
ID: identifier
REF: reference base(s)
ALT: alternate base(s)
QUAL: quality
FILTER: filter status
INFO: additional information
FORMAT: genotype format (please refer to VCFv4.1 documentation, section 1.4.2 - https://github.com/samtools/hts-specs)

2. Total number of lines: 482,829
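
-----------------------------------------
EXAMPLE: Checking the line counts of a .vcf file
-----------------------------------------
The line counts reported above (meta-information lines, header lines, total lines) can be checked with a short script. The following is a minimal sketch, not part of the original analysis. It assumes that lines starting with ## are meta-information, a line starting with a single # is the header, and every other non-empty line is a data line. The function names summarize_vcf_lines and summarize_vcf_file are hypothetical. The sketch reads plain or gzip-compressed text only, so a .tar.gz archive would need to be extracted first.

```python
import gzip
from collections import Counter
from typing import Iterable


def summarize_vcf_lines(lines: Iterable[str]) -> dict:
    """Count meta-information (##), header (#), and data lines in VCF text."""
    counts = {"meta": 0, "header": 0, "data": 0}
    meta_keys = Counter()  # occurrences of each ## keyword, e.g. fileformat, contig
    columns = []           # column names taken from the header line
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # blank lines are ignored (assumption)
        if line.startswith("##"):
            counts["meta"] += 1
            # The keyword is the text between "##" and the first "=".
            meta_keys[line[2:].split("=", 1)[0]] += 1
        elif line.startswith("#"):
            counts["header"] += 1
            columns = line.lstrip("#").split("\t")
        else:
            counts["data"] += 1
    counts["total"] = counts["meta"] + counts["header"] + counts["data"]
    return {"counts": counts, "meta_keys": dict(meta_keys), "columns": columns}


def summarize_vcf_file(path: str) -> dict:
    """Summarize a .vcf or .vcf.gz file on disk (path supplied by the user)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return summarize_vcf_lines(handle)
```

For example, calling summarize_vcf_file on an extracted .vcf file returns the three counts along with the distinct ## keywords, which can be compared against sections 1a, 1b, and 2 for that dataset.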