This readme.txt file was generated on 20150928 by Zhuozhi Georgia Huang, and was modified by John Garbe and Yang Da on 8/5/2016.

GENERAL INFORMATION

1. Title of Dataset: Single Nucleotide Polymorphism Calls for 49 Giant Pandas

2. File Information:
   A. Filename: GVCBLUP
   B. Short description: folder of input files in GVCBLUP format
   C. Filename: Plink
   D. Short description: folder of input files in Plink format

3. Principal Investigator Contact Information
   A. Name: Garbe, John
   B. Institution: University of Minnesota
   C. Email: jgarbe@umn.edu

4. Associate or Co-investigator Contact Information
   A. Name: Da, Yang

5. Date of data collection: 20150924

6. Geographic location of data collection: raw data downloaded from NCBI Sequence Read Archive (SRA053353), DNA Data Bank of Japan
   ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA053/SRA053353

7. Date files were created: 20150924

8. Are there multiple versions of the dataset? no

9. Information about funding sources that supported the collection of the data: This project was supported by project MN-16-043 of the Agricultural Experiment Station.

METHODOLOGICAL INFORMATION

1. Methods for processing the data:
   A. Trimmomatic was used to trim low-quality bases and Illumina sequencing adapter sequences from the ends of reads.
   B. BWA (version 0.7.12-r1039) was used to map the paired-end resequencing reads to the panda reference genome (ailMel1).
   C. Sequence Alignment/Map (SAM) format files were imported into Samtools (version 0.1.18).
   D. Samtools (version 0.1.18) was used to sort, merge, and convert mapping results into the BAM format.
   E. Picard package (version 1.126) was used to filter duplicated reads.
   F. Samtools (version 0.1.18) was used to remove duplicate, unmapped, and discordantly mapping reads.
   G. Variants were called jointly across all samples using Samtools mpileup and bcftools call.
   H. bcftools view was used to remove insertion/deletion variants.
   I. bcftools filter was used to remove SNPs meeting any of the following criteria:
      a. located within 10 bp of an indel;
      b. coverage across all samples of less than 20 reads or more than 800 reads;
      c. minor allele frequency less than 10%;
      d. QUAL value less than 20.
      This process called 6,993,226 SNPs.
   J. SNPs with the highest quality were selected according to the following two requirements:
      a. the SNP must have no missing genotype for all 49 pandas;
      b. the SNP must pass the Hardy-Weinberg equilibrium (HWE) test with p≥0.01.
      A total of 150,025 SNPs satisfied these two requirements and were used in this study.

2. Instrument-specific information needed to interpret the data: Input files can be viewed in R. PLINK and GVCBLUP are required to run the analyses. The files were originally run with GVCBLUP (version 3.9) and PLINK (version 1.90 beta) on Linux.

3. Describe any quality-assurance procedures performed on the data: Because Hardy-Weinberg equilibrium is assumed by all three methods used for estimating genomic inbreeding and kinship coefficients, an HWE test was performed and SNPs with p<0.01 were removed. See Step J in Methods for processing the data.

DATA-SPECIFIC INFORMATION

1. Parameters and/or variables used in the data set
   A. Name: QIO
   B. Description: Qionglai
   D. Name: QIN
   E. Description: Qinling
   G. Name: MIN
   H. Description: Minshan
   I. Name: DXL
   J. Description: Daxiangling
   K. Name: XXL
   L. Description: Xiaoxiangling
   M. Name: LS
   N. Description: Liangshan
   O. Name: Wolong GPCRC
   P. Description: China Conservation and Research Center for the Giant Panda in Wolong
   Q. Name: Chengdu BC
   R. Description: Chengdu Research Base of Giant Panda Breeding
   S. Name: QIO x LS
   T. Description: the individual is a captive-bred animal with one parent from QIO and the other parent from LS

2. Column headings for tabular data
   A. Full name: Sample
   B. Definition: Sample ID
   C. Full name: Locality ID
   D. Definition: City, Province or regions
   E. Full name: Mountain Range
   F. Definition: location of the genetic cluster the individual belongs to
   G. Full name: Origin
   H. Definition: wild bred or captive bred
   I. Full name: Type of samples
   J. Definition: type of samples used to obtain SNPs

SHARING/ACCESS INFORMATION

1. Licenses/restrictions placed on the data: Placed under CC0 1.0 Universal Public Domain Dedication

2. Links to publications that cite or use the data:
   Garbe JR, Prakapenka D, Tan C, Da Y (2016) Genomic Inbreeding and Relatedness in Wild Panda Populations. PLoS ONE 11(8): e0160496. doi:10.1371/journal.pone.0160496.
   http://journals.plos.org/plosone/article?id=10.1371%2Fjournal.pone.0160496

3. Links to other publicly accessible locations of the data: http://hdl.handle.net/11299/174479

4. Recommended citation for the data:
   Garbe, John R; Da, Yang. (2015). Single Nucleotide Polymorphism Calls for 49 Giant Pandas [dataset]. Retrieved from the Data Repository for the University of Minnesota, http://dx.doi.org/10.1038/ng.2494

5. Links to panda sequence data:
   Zhao, S., Zheng, P., Dong, S., Zhan, X., Wu, Q., Guo, X., ... & Wei, F. (2013). Whole-genome sequencing of giant pandas provides insights into demographic history and local adaptation. Nature Genetics, 45(1), 67-71. http://dx.doi.org/10.1038/ng.2494

Credits: Template provided by the University of Minnesota Libraries, http://lib.umn.edu/datamanagement
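
Illustrative note on the SNP filters (not part of the original pipeline): the per-SNP thresholds listed in Methods steps I and J can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the commands that were actually run. The function names, the genotype coding (0/1/2 copies of one allele, None for missing), and the use of a 1-d.f. chi-square HWE test are all assumptions made for illustration; the README does not state which HWE test was used. The "within 10 bp of an indel" criterion is omitted because it needs positional information from neighboring variants.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 d.f.) p-value for HWE from genotype counts.

    Assumption: a chi-square goodness-of-fit test is used here for
    illustration; the original analysis may have used a different test.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic site: no deviation can be tested
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_filters(qual, depth, genotypes, n_samples=49):
    """Apply the README thresholds to one SNP (hypothetical helper).

    qual      -- QUAL value of the variant call
    depth     -- read coverage summed across all samples
    genotypes -- one entry per panda: 0, 1, 2, or None if missing
    """
    if qual < 20 or depth < 20 or depth > 800:
        return False
    if len(genotypes) != n_samples or any(g is None for g in genotypes):
        return False  # step J.a: no missing genotypes allowed
    n_aa = genotypes.count(0)
    n_ab = genotypes.count(1)
    n_bb = genotypes.count(2)
    p = (2 * n_aa + n_ab) / (2 * n_samples)
    if min(p, 1.0 - p) < 0.10:
        return False  # step I.c: minor allele frequency below 10%
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= 0.01  # step J.b
```

For example, a SNP with QUAL 30, total coverage 400, and genotype counts 20/20/9 across the 49 pandas would be retained by this sketch, while a SNP that is heterozygous in all 49 pandas would be removed by the HWE check.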