This readme.txt file was generated on 2022-09-23 by <Name>
Recommended citation for the data: Qiao, Yiming; Ma, Zixue; Onyango, Clive; Cheng, Xiang; Dorfman, Kevin D. (2022). Data for DNA fragmentation in a steady shear flow. Retrieved from the Data Repository for the University of Minnesota, https://doi.org/10.13020/tsva-hn81

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: Data for: DNA fragmentation in a steady shear flow

2. Author Information

	Author Contact:  Yiming Qiao (qiao0017@umn.edu)

	Name:  Yiming Qiao
	Institution: University of Minnesota
	Email: qiao0017@umn.edu
	ORCID: https://orcid.org/0000-0002-1917-2588

	Name:  Zixue Ma
	Institution: University of Minnesota
	Email: ma000052@umn.edu
	ORCID:

	Name:  Clive Onyango
	Institution: University of Minnesota
	Email: onyan021@umn.edu
	ORCID:

	Name:  Xiang Cheng
	Institution: University of Minnesota
	Email: xcheng@umn.edu
	ORCID:

	Name:  Kevin D. Dorfman
	Institution: University of Minnesota
	Email: dorfman@umn.edu
	ORCID: https://orcid.org/0000-0003-0065-5157

3. Date published or finalized for release: 2022-09-23

4. Date of data collection (single date, range, approximate date): 2021-05-31 to 2022-08-31

5. Geographic location of data collection (where was data collected?): N/A

6. Information about funding sources that supported the collection of the data:
	NIH R21-HG011251

7. Overview of the data (abstract):
We have determined the susceptibility of T4 DNA (166 kilobase pairs, kbp) to fragmentation under steady shear in a cone-and-plate rheometer.

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: CC0 1.0 Universal (http://creativecommons.org/publicdomain/zero/1.0/)

2. Links to publications that cite or use the data:
Qiao, Yiming; Ma, Zixue; Onyango, Clive; Cheng, Xiang; Dorfman, Kevin D. (2022). DNA fragmentation in a steady shear flow. Biomicrofluidics.

3. Was data derived from another source?
	If yes, list source(s):

4. Terms of Use: Data Repository for the U of Minnesota (DRUM). By using these files, users agree to the Terms of Use. https://conservancy.umn.edu/pages/drum/policies/#terms-of-use

---------------------
DATA & FILE OVERVIEW
---------------------

1. File List

	Filename: Image_Analysis.zip 
	Short description: This file contains all the MATLAB code used to process the PFGE data. BoxFinderv2.m locates each lane (this code is built into Intensity_analysis.m). Intensity_analysis.m obtains the distribution for every lane. gaussianfit.m fits the distributions and calculates the broken percentage.

	Filename: Figure_Data.zip 
	Short description: This file contains all the raw PFGE images and the corresponding data sets (distribution and fitting). The folder names indicate the details of the experiments.

	Filename: Automate_Python.zip 
	Short description: This file contains one example of how to use Python to extract information from the Excel files in the folders in Figure_Data.zip.


2. Relationship between files:
The MATLAB codes in Image_Analysis.zip process the PFGE images in Figure_Data.zip and produce the Excel files in Figure_Data.zip. The Python code in Automate_Python.zip uses these Excel files to generate the final figures in Figure_Data.zip.

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data:
A commercial rotational rheometer (DHR, TA Instruments) was used for the DNA fragmentation experiments. We used the standard method, pulsed field gel electrophoresis (PFGE), for sizing the DNA samples. Additional information can be found in the associated publication.

2. Methods for processing the data:
The PFGE images were processed using a custom-written MATLAB script that outputs the size distribution of the DNA samples. Please see the main text of the paper for additional details. 

3. Instrument- or software-specific information needed to interpret the data:
MATLAB R2018a was used to analyze the data.

4. Standards and calibration information, if appropriate:
Please see the main text of the paper for additional information.

5. Environmental/experimental conditions:
Please see the main text of the paper for additional information.

6. Describe any quality-assurance procedures performed on the data:


7. People involved with sample collection, processing, analysis and/or submission:
Kevin D. Dorfman


-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Image_Analysis.zip
-----------------------------------------

Instructions on how to run image analysis codes
Software: MATLAB
Part A: Process DNA images to generate data
Step 1: In Intensity_analysis.m, BoxFinderv2.m, and gaussianfit.m (under the Image_Analysis folder), change the directory that MATLAB uses to find the MATLAB codes.
Step 2: Open Intensity_analysis.m. Change the directory folder containing the PFGE images in the Figure_Data subfolders (lines 15 and 16). Run the code. Follow the instructions in the pop-up window, select each lane of the DNA samples (not the marker lanes), and return. If this does not work, you can either increase the contrast of the PFGE image file in ImageJ or adjust the cut-off value for the correlation coefficient on line 148.
Step 3: Open gaussianfit.m. Change the lane number that you would like to fit (line 9). Run the code. If this does not work, try changing the initial guess values (lines 45 and 60).
Step 4: Put all the information in Excel files. The 'result.xlsx' file is the distribution gathered from step 2. The 'Gaussian_fit_results_165.xlsx' file is the fitted distribution gathered from step 3. If the saving process does not work on your PC, you can manually save the data to the Excel files. In our case:
	'result.xlsx':
		result_bp, sheet 1
		result_fraction_y, sheet 2
		fitting_y, sheet 3
		coefficients, sheet 4
		result_fraction_y_bp, sheet 5
		Mw, sheet 4
		Mn, sheet 5
		PDI, sheet 6
		mean_distribution, sheet 7
		std_distribution, sheet 8
	'Gaussian_fit_results_165.xlsx':
		y_fitted_unsheared: fitted intensity values for unsheared part (right small peak), sheet 1
		y_fitted_sheared: fitted intensity values for sheared part (left large peak), sheet 2
		coefficients_unsheared: fitted eqn coefficients for unsheared part, sheet 3
		coefficients_sheared: fitted eqn coefficients for sheared part, sheet 4
		Gaussian_values: mean_trans (transformed mean value), mean (original mean value), sigma (transformed sigma value), amplitude (peak amplitude), unsheared area (area of transformed Gaussian curve), sheets 5 and 6
		sheared_percentage, sheet 7
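
For readers who want to sanity-check the summary values stored in the workbooks, the following Python sketch shows one conventional way to compute Mn, Mw, and PDI from a size distribution, and a sheared percentage from two fitted Gaussian peaks. It is an illustration only and is not taken from the MATLAB codes. It assumes that the distribution is a weight (mass) fraction over fragment size in bp, that a Gaussian peak has area amplitude * sigma * sqrt(2*pi), and that the sheared percentage is the sheared peak area divided by the total area of both peaks. All function and argument names are hypothetical.

```python
import math


def molecular_weight_averages(sizes_bp, weight_fractions):
    """Return (Mn, Mw, PDI) for a discrete size distribution.

    Assumption: weight_fractions are mass-based fractions (one per size bin).
    They are renormalized here so that they sum to 1.
    """
    total = sum(weight_fractions)
    w = [f / total for f in weight_fractions]
    mw = sum(wi * m for wi, m in zip(w, sizes_bp))          # weight average
    mn = 1.0 / sum(wi / m for wi, m in zip(w, sizes_bp))    # number average
    return mn, mw, mw / mn


def gaussian_area(amplitude, sigma):
    """Area under amplitude * exp(-(x - mean)**2 / (2 * sigma**2))."""
    return amplitude * abs(sigma) * math.sqrt(2.0 * math.pi)


def sheared_percentage(amp_sheared, sigma_sheared, amp_unsheared, sigma_unsheared):
    """Assumed definition: sheared peak area as a percentage of both peak areas."""
    area_sheared = gaussian_area(amp_sheared, sigma_sheared)
    area_unsheared = gaussian_area(amp_unsheared, sigma_unsheared)
    return 100.0 * area_sheared / (area_sheared + area_unsheared)
```

If the values computed this way differ from those in the Excel files, the definitions used in gaussianfit.m and Intensity_analysis.m (for example, the transformed coordinate mentioned above) take precedence.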
Part B: Extract information from excel files and make figures 
Step 1: Open data_for_paper_ligation.ipynb in Automate_Python.zip 
Step 2: Change the directory that Python uses to find the Excel files (cell 4). Depending on how many Excel files you have in the folder, you can select the ones we used to generate the plot ('result.xlsx' and 'Gaussian_fit_results_165.xlsx').
Step 3: Run the code.
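
If you prefer to locate the Excel files yourself before running the notebook, the Python sketch below walks a folder tree and lists every folder that contains both workbooks named in Part B, Step 2. It is not part of Automate_Python.zip. The function name find_experiment_folders and the example root path are hypothetical, and only the two file names come from this readme.

```python
from pathlib import Path

# Workbook names taken from Part A, Step 4 of this readme.
REQUIRED_FILES = ("result.xlsx", "Gaussian_fit_results_165.xlsx")


def find_experiment_folders(root, required=REQUIRED_FILES):
    """Map each folder under root that holds all required workbooks to their paths.

    Keys are folder paths relative to root; values are dicts of
    {file name: full path}. Folders missing any required file are skipped.
    """
    root = Path(root)
    found = {}
    folders = [root] + sorted(p for p in root.rglob("*") if p.is_dir())
    for folder in folders:
        paths = {name: folder / name for name in required}
        if all(p.is_file() for p in paths.values()):
            found[folder.relative_to(root).as_posix()] = paths
    return found


if __name__ == "__main__":
    # Hypothetical location of the unzipped Figure_Data folder.
    for name, paths in find_experiment_folders("Figure_Data").items():
        print(name, "->", [str(p) for p in paths.values()])
```

Each returned path can then be passed to a spreadsheet reader, for example pandas.read_excel(path, sheet_name=None), which returns one table per sheet. Whether the notebook reads the files this way is not stated here.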