This readme.txt file was generated on 2024-05-02 by Ryan Briscoe Runquist
Recommended citation for the data: Briscoe Runquist, Ryan. (2024). Genotype and population data for "Isolation-by-environment and its consequences for range shifts with global change: landscape genomics of the invasive plant common tansy". Retrieved from the Data Repository for the University of Minnesota. https://doi.org/10.13020/nx0n-f098.

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: Genotype and population data for "Isolation-by-environment and its consequences for range shifts with global change: landscape genomics of the invasive plant common tansy"

2. Author Information

	Author Contact:  Ryan Briscoe Runquist (rbriscoe@umn.edu)

  Principal Investigator Contact Information
        Name: Ryan Briscoe Runquist
           Institution: University of Minnesota
           Address: Department of Plant and Microbial Biology, 1479 Gortner Ave, 140 Gortner Laboratory, St. Paul, MN 55108
           Email: rbriscoe@umn.edu
	   ORCID: https://orcid.org/0000-0001-7160-9110

  Associate or Co-investigator Contact Information
        Name: David A. Moeller
           Institution: University of Minnesota
           Address: Department of Plant and Microbial Biology, 1479 Gortner Ave, 140 Gortner Laboratory, St. Paul, MN 55108
           Email: moeller@umn.edu
	   ORCID: https://orcid.org/0000-0002-6202-9912


3. Date published or finalized for release: 2024-04-29


4. Date of data collection (single date, range, approximate date): 2019-05-01 to 2019-12-31


5. Geographic location of data collection (where was data collected?): 

	Minnesota, USA (specific locality information is in the associated CSV file, tansy_pop_info_allsamples_3.csv)

6. Information about funding sources that supported the collection of the data:
	
	Minnesota Invasive Terrestrial Plants and Pests Center through the Environment and Natural Resources Trust Fund as recommended by the Legislative-Citizen Commission on Minnesota Resources (LCCMR)
	
7. Overview of the data (abstract):

	Invasive species are a growing global economic and ecological problem. However, it is not well understood how environmental factors mediate invasive range expansion. In this study, we investigated the recent and rapid range expansion of common tansy across environmental gradients in Minnesota, U.S.A. We densely sampled individuals across the expanding range and performed reduced representation sequencing to generate a dataset of 3071 polymorphic loci for 176 individuals. The dataset also includes samples from the native range in Finland that were not used in the downstream analysis but are provided for completeness. The dataset includes the genotype calls for all individuals sampled and sequenced. The genotype file was generated with stacks 2.59 by running the denovo pipeline and then the populations function, keeping loci that were present in at least 70% of populations and had a minor allele frequency of at least 1%. We used non-spatial and spatially-explicit analyses to determine the relative influences of geographic distance and environmental variation on patterns of genomic variation. We found no evidence for isolation-by-distance (IBD) but strong evidence for isolation-by-environment (IBE), indicating that environmental factors may have modulated patterns of range expansion.


--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: Attribution-NonCommercial-ShareAlike 3.0 United States (http://creativecommons.org/licenses/by-nc-sa/3.0/us/)

2. Links to publications that cite or use the data:
10.22541/au.171118898.88465185/v1 (pre-print)


3. Was data derived from another source? No.
	If yes, list source(s):

4. Terms of Use: Data Repository for the U of Minnesota (DRUM). By using these files, users agree to the Terms of Use: https://conservancy.umn.edu/pages/drum/policies/#terms-of-use


---------------------
DATA & FILE OVERVIEW
---------------------

1. File List

	a. Filename: populations snps.vcf
	   Description: This file contains SNP genotype calls for 3071 loci from 178 populations of common tansy (one individual sampled per population). 176 of the samples are from the invaded range in or near Minnesota, USA (174 in Minnesota and 2 in Wisconsin). The remaining 2 samples are from the native range in Finland. Only the 176 invaded-range samples were used in the associated paper.

Genotypes were generated from GBS Illumina sequencing. Raw reads are available at SRA (BioProject PRJNA1099706 SUB14375599). Genotypes were generated using the stacks 2.59 denovo pipeline with the pipeline parameters of M=2, n=2, m=3. To filter and write out loci for population genetics analysis, we ran the populations function in stacks. We kept one random SNP per locus in order to maintain locus independence during downstream analyses. 
Loci were included if they were present in at least 70% of individuals (which was also equivalent to populations since we have 1 individual/population), had a minor allele frequency (MAF) of at least 1%, and had a maximum heterozygosity of <=95%. 

	b. Filename: tansy_pop_info_allsamples_3.csv
	   Description: CSV file containing metadata about populations from the vcf file used in landscape genomics analyses.


2. Relationship between files: The csv has descriptive geographic information about the samples in the genotype file.
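
The link between the two files is the genID column, which names each sample in the vcf file. The following is a minimal Python sketch of checking that link. It assumes the vcf file has a standard #CHROM header line with sample names from the tenth column onward. The function names and the miniature inline data are hypothetical and only stand in for the two real files.

```python
import csv
import io

def vcf_sample_names(vcf_lines):
    """Return sample names from the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # The first 9 columns are fixed VCF fields; sample names follow.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")

def compare_ids(vcf_lines, csv_file):
    """Return (only_in_vcf, only_in_csv) as sorted lists of sample IDs."""
    vcf_ids = set(vcf_sample_names(vcf_lines))
    csv_ids = {row["genID"] for row in csv.DictReader(csv_file)}
    return sorted(vcf_ids - csv_ids), sorted(csv_ids - vcf_ids)

# Hypothetical miniature inputs, not taken from the real dataset.
demo_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\n",
]
demo_csv = io.StringIO("genID,pop,name\ns1,1,A\ns3,3,C\n")
only_vcf, only_csv = compare_ids(demo_vcf, demo_csv)
```

With the real files, open file handles for the vcf and csv can be passed in place of the demo inputs. Two empty lists would indicate that every sample appears in both files.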


--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data:

	Samples were processed from leaf tissue that was collected fresh at the sampling locality and then immediately placed in a bag of silica gel and labeled.
Locality information for each sample was taken using a handheld GPS unit.
Genotypes were generated from GBS Illumina sequencing at the University of Minnesota Genomics Center (UMGC). Raw reads are available at SRA (BioProject PRJNA1099706 SUB14375599).

2. Methods for processing the data:

	Genotypes were generated using the stacks 2.59 denovo pipeline with the pipeline parameters of M=2, n=2, m=3.
To filter and write out loci for population genetics analysis, we ran the populations function in stacks. 
We kept one random SNP per locus in order to maintain locus independence during downstream analyses. 
Loci were included if they were present in at least 70% of individuals (which was also equivalent to populations since we have 1 individual/population), had a minor allele frequency (MAF) of at least 1%, and had a maximum heterozygosity of <=95%. 
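
The three locus filters above can be restated as a short Python sketch. This is an illustration of the thresholds only, not the stacks implementation. It assumes biallelic SNPs coded as 0, 1, or 2 copies of the alternate allele, with None for a missing call. The function names are hypothetical.

```python
def locus_stats(genotypes):
    """Return (call_rate, minor_allele_freq, heterozygosity) for one SNP.

    genotypes: one entry per individual, coded as 0, 1, or 2 copies of
    the alternate allele, or None when the call is missing.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0, 0.0, 0.0
    call_rate = len(called) / len(genotypes)
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    het = sum(1 for g in called if g == 1) / len(called)
    return call_rate, maf, het

def keep_locus(genotypes, min_call_rate=0.70, min_maf=0.01, max_het=0.95):
    """Apply the thresholds described above to a single locus."""
    call_rate, maf, het = locus_stats(genotypes)
    return call_rate >= min_call_rate and maf >= min_maf and het <= max_het
```

For example, a locus called in 3 of 4 individuals as 0, 1, and 2 passes all three filters, while a locus where every individual is heterozygous fails the heterozygosity filter.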

3. Instrument- or software-specific information needed to interpret the data:

	The VCF file can be read by any genetics software (e.g. Genodive) or statistical software (e.g. R) that handles genotype files.
The CSV file can be opened with Excel or any text editor.
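
For readers without dedicated genetics software, the genotype calls can also be read with a few lines of standard-library Python. The sketch below assumes diploid calls and that GT is the first sub-field of each sample column, as the VCF specification requires when GT is present. The function names and the one-record demo are hypothetical.

```python
def gt_to_alt_count(sample_field):
    """Convert a VCF sample field such as '0/1:8' to 0, 1, 2, or None."""
    gt = sample_field.split(":")[0]           # GT comes first when present
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None                           # missing or non-diploid call
    return sum(1 for a in alleles if a != "0")

def read_genotypes(vcf_lines):
    """Return (sample_names, {(chrom, pos): [alt count per sample]})."""
    samples, calls = [], {}
    for line in vcf_lines:
        if line.startswith("##") or not line.strip():
            continue                          # skip meta lines and blanks
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            continue
        key = (fields[0], int(fields[1]))     # (CHROM, POS)
        calls[key] = [gt_to_alt_count(f) for f in fields[9:]]
    return samples, calls

# Hypothetical one-record VCF, not taken from the real dataset.
demo = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3\n",
    "1\t10\t.\tA\tG\t.\tPASS\t.\tGT:DP\t0/1:8\t1/1:5\t./.:0\n",
]
samples, calls = read_genotypes(demo)
```

Passing an open file handle for the vcf file instead of the demo list gives one list of alternate-allele counts per locus, in the same order as the sample names.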

4. Standards and calibration information, if appropriate:

	NA

5. Environmental/experimental conditions:

	Collected from natural field conditions.

6. Describe any quality-assurance procedures performed on the data:

	Samples were processed at the University of Minnesota Genomics Center (UMGC; https://genomics.umn.edu/services/gbs) using Illumina NextSeq sequencing and 
	the following protocol. UMGC created dual-indexed GBS libraries using the enzyme combination BamHI + NsiI. Enzyme selection followed from a small pilot 
	study used to identify the enzyme combination that would produce approximately 5000-10000 loci at an average read depth of approximately 1 million reads per 
	individual. Briefly, extracted DNA was quantified using PicoGreen® (Thermo Fisher Scientific, MA, USA) and normalized to 10 ng/µl. A total of 100 ng of 
	DNA per sample was digested with 10 units of BamHI and NsiI (New England Biolabs®, Inc., MA, USA) restriction enzymes, incubated at 37 °C for 2 hours, 
	and then heat inactivated at 80 °C for 20 minutes. The DNA samples were then ligated with 200 units of T4 ligase (New England Biolabs®, Inc., MA, USA) 
	and phased adaptors with -GATC and -TGCA overhangs at 22 °C for 1 hour and then heat killed. The ligated samples were purified with solid phase reversible 
	immobilization (SPRI) beads and amplified for 18 cycles with 2X NEB Taq Master Mix to add the barcodes. Libraries were SPRI purified, quantified, 
	and pooled. Fragments in the 300-744 bp size range were selected and diluted to 2 nM for sequencing on the Illumina NextSeq 2000 (Illumina, CA, USA) 
	using single-end 1x150 bp reads. UMGC generated ≈ 320M pass-filter reads during sequencing. Once the run was completed, UMGC performed quality control analysis 
	and determined that all expected barcodes and samples were detected, reads were well balanced, and mean quality scores were ≥Q30 for all libraries.

7. People involved with sample collection, processing, analysis and/or submission:

	Collection: RB Runquist, Thomas A. Lake
	DNA Extractions: RB Runquist
	Sequencing: UMGC
	Data processing: RB Runquist


-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: tansy_pop_info_allsamples_3.csv
-----------------------------------------

1. Number of variables: 15

2. Number of cases/rows: 178

3. Missing data codes: <NULL>

	Code/symbol	Definition

4. Variable List

genID: Name of the sample in the vcf file
pop: Unique identifying number used during data handling and analyses
name: Population name
geo_grp: Geographic grouping structure - level not used in final analyses
lat: Latitude of population
lon: Longitude of population
PROVNAME: MN DNR Ecological Classification System (https://www.dnr.state.mn.us/ecs/index.html) Province
ECS_PROV: MN DNR Ecological Classification System (https://www.dnr.state.mn.us/ecs/index.html) Province Number
SECNAME: MN DNR Ecological Classification System (https://www.dnr.state.mn.us/ecs/index.html) Section
ECS_SEC: MN DNR Ecological Classification System (https://www.dnr.state.mn.us/ecs/index.html) Section Number
SUBSECNAME: MN DNR Ecological Classification System (https://www.dnr.state.mn.us/ecs/index.html) Subsection
ECS_SUBSECTION: MN DNR Ecological Classification System (https://www.dnr.state.mn.us/ecs/index.html) Subsection Number
ECS_fac: Numeric designation of ECS subsection used for analyses
ECS_fac2: Numeric designation of ECS subsection used for analyses re-leveled to be continuous
MN_quad: Geographic quadrant of MN (NE, NW, SE, SW) in which the population occurs
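
The variables above can be loaded with the Python standard library. The sketch below converts lat and lon to numbers and groups sample IDs by a chosen column. It assumes the column names match the variable list exactly and that lat and lon are decimal degrees with no missing values. The function names and the demo rows (including their coordinates) are hypothetical.

```python
import csv
import io
from collections import defaultdict

def load_metadata(csv_file):
    """Read the metadata CSV into a list of dicts with numeric lat/lon."""
    rows = []
    for row in csv.DictReader(csv_file):
        row["lat"] = float(row["lat"])
        row["lon"] = float(row["lon"])
        rows.append(row)
    return rows

def group_by(rows, column):
    """Map each value of `column` to the genID values that carry it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row["genID"])
    return dict(groups)

# Hypothetical rows with made-up coordinates, not from the real dataset.
demo = io.StringIO(
    "genID,lat,lon,MN_quad\n"
    "s1,47.1,-93.2,NE\n"
    "s2,44.0,-95.5,SW\n"
    "s3,47.9,-91.8,NE\n"
)
rows = load_metadata(demo)
by_quad = group_by(rows, "MN_quad")
```

The same group_by call works for any of the categorical columns, for example SECNAME or ECS_fac.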