-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset

   Extension distribution for DNA confined in a nanochannel near the Odijk regime

2. Author Information

   Principal Investigator Contact Information
      Name: Hui-Min Chuang
      Institution: University of Minnesota
      Address: Department of Chemical Engineering and Materials Science, 421 Washington Ave SE, Minneapolis, Minnesota 55455
      Email: chuan077@umn.edu

   Associate or Co-investigator Contact Information
      Name: Jeffrey G. Reifenberger
      Institution: BioNano Genomics
      Address: 9640 Towne Centre Drive, Suite 100, San Diego, California 92121
      Email: jreifenberger@bionanogenomics.com

   Associate or Co-investigator Contact Information
      Name: Aditya Bikram Bhandari
      Institution: University of Minnesota
      Address: Department of Chemical Engineering and Materials Science, 421 Washington Ave SE, Minneapolis, Minnesota 55455
      Email: bhand050@umn.edu

   Associate or Co-investigator Contact Information
      Name: Kevin D. Dorfman
      Institution: University of Minnesota
      Address: Department of Chemical Engineering and Materials Science, 421 Washington Ave SE, Minneapolis, Minnesota 55455
      Email: dorfman@umn.edu

3. Date of data collection: 2018-12 to 2018-12

4. Geographic location of data collection: N/A

5. Information about funding sources that supported the collection of the data:
   Sponsorship: National Institutes of Health under grant R01-HG006851

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: N/A

2. Links to publications that cite or use the data: Accepted and waiting for the link

3. Links to other publicly accessible locations of the data:

4. Links/relationships to ancillary data sets:

5. Was data derived from another source? If yes, list source(s):

6. Recommended citation for the data:

   Chuang, Hui-Min; Reifenberger, Jeff G.; Bhandari, Aditya Bikram; Dorfman, Kevin D. (2019). Extension distribution for DNA confined in a nanochannel near the Odijk regime.
   Retrieved from the Data Repository for the University of Minnesota, https://doi.org/10.13020/99cv-2243.

---------------------
DATA & FILE OVERVIEW
---------------------

1. File List

   A. Filename: Fig3.m
      Short description: This is the MATLAB code to generate figure 3 in the main text of the paper. The corresponding data can be found in the folder "Reinhart_data", files "si_fig1.csv" and "si_fig4.csv". The user needs to modify the working path used to locate the data.

   B. Filename: Fig4.m
      Short description: This is the MATLAB code to generate figure 4 in the main text of the paper. The corresponding data can be found in the file "RMSE.xlsx".

   C. Filename: Fig5.m
      Short description: This is the MATLAB code to generate figure 5 in the main text of the paper. The corresponding data can be found in the file "RMSE.xlsx".

   D. Filename: Fig7.m
      Short description: This is the MATLAB code to generate figure 7 in the main text of the paper. The corresponding data can be found in the folder "simulation_data" and the file "data_array.txt". The user needs to modify the working path used to locate the data.

   E. Filename: Fig8.m
      Short description: This is the MATLAB code to generate figure 8 in the main text of the paper.

   F. Filename: FigS1S2S3.m
      Short description: This is the MATLAB code to generate figures S1, S2 and S3 in the Supplemental Material. The corresponding data can be found in the folder "Reinhart_data", files "si_fig2.csv", "fig4.csv" and "si_fig3.csv", respectively. The user needs to modify the working path used to locate the data.

   G. Filename: FigS4.m
      Short description: This is the MATLAB code to generate figure S4 in the Supplemental Material. The corresponding data can be found in the folder "Reinhart_data". The user needs to modify the working path used to locate the data.

   H. Filename: FigS6.m
      Short description: This is the MATLAB code to generate figure S6 in the Supplemental Material.
      The corresponding data can be found in the files "data_array.txt" and "Sensitivity_Analysis.xlsx" (sheet 1).

   I. Filename: FigS7.m
      Short description: This is the MATLAB code to generate figure S7 in the Supplemental Material. The corresponding data can be found in the files "data_array.txt" and "Sensitivity_Analysis.xlsx" (sheet 2).

   J. Filename: FigS8.m
      Short description: This is the MATLAB code to generate figure S8 in the Supplemental Material. The corresponding data can be found in the folder "simulation_data" and the file "data_array.txt". The user needs to modify the working path used to locate the data.

   M. Filename: data_array.xlsx
      Short description: This data file contains the positions of the nick labels on each lambda-DNA molecule (one molecule per line) in the experiment, in units of base pairs (bp).

   N. Filename: RMSE.xlsx
      Short description: This data file lists the results of the three statistical tests.

   O. Filename: Sensitivity_analysis.xlsx
      Short description: This data file lists the results of the sensitivity analysis.

   P. Filename: rgb.m
      Short description: This MATLAB file sets the curve colors used in the other files.

   Q. Filename: telegraph_model.m
      Short description: This code calculates the extension distribution based on Mehlig's theory for given DNA parameters - ABB if known.

   R. Foldername: simulation_data
      Short description: This folder stores the theoretical DNA extension data for the Odijk sigma, the best-fit sigma, and 2 Odijk sigma at different values of D_eff. Further information is provided below.

   S. Foldername: Reinhart_data
      Short description: This folder stores the E. coli data set from Reinhart et al. Further information is provided below.

   T. Foldername: Statistical test
      Short description: This folder stores all the MATLAB codes and the raw data for the three statistical tests. For more detailed information about the codes, please see https://conservancy.umn.edu/handle/11299/205494

2. Relationship between files: All relationships between files have been provided in the description of each file.

3. Additional related data collected that was not included in the current data package: No

4. Are there multiple versions of the dataset? No

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data: Please see the main text of the paper for additional information.

2. Methods for processing the data: Please see the main text and the supplementary information of the paper for additional information.

3. Instrument- or software-specific information needed to interpret the data: Please see the main text of the paper for additional information.

4. Standards and calibration information, if appropriate: Please see the main text of the paper for additional information.

5. Environmental/experimental conditions: Please see the main text of the paper for additional information.

6. Describe any quality-assurance procedures performed on the data: Please see the main text of the paper for additional information.

7. People involved with sample collection, processing, analysis and/or submission: Hui-Min Chuang, Aditya Bikram Bhandari, Jeffrey G. Reifenberger, and Kevin D. Dorfman.

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: RMSE.xlsx
-----------------------------------------

1. Number of variables: 2

2. Number of cases/rows:

3. Missing data codes:

4. Variable List

   A. Name: length
      Description: length of the DNA sequence of interest, in units of base pairs (bp)

   B. Name: D
      Description: channel size, in units of nm

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Sensitivity_Analysis.xlsx
-----------------------------------------

1. Number of variables: 3

2. Number of cases/rows:

3. Missing data codes:
   Symbol: “#NAME?” = -Inf

4. Variable List

   A. Name: D_eff
      Description: effective channel size, in units of nm

   B. Name: extension
      Description: extension - avg(extension), in units of nm

   C. Name: cc
      Description: correlation coefficient

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: all data files in the folder "simulation_data"
-----------------------------------------

1. Number of variables/columns: 2

2. Column List

   A. column 1
      Description: extension - avg(extension), in units of nm

   B. column 2
      Description: log10 of the probability of the DNA extension

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: all data files in the folder "Reinhart_data"
-----------------------------------------

1. Number of variables: 3

2. Variable List

   A. Each row: length
      Description: DNA sequence length, in units of base pairs (bp)

   B. Each column
      Description: extension - avg(extension), in units of nm

   C. Each value: log10(pr)
      Description: log10 of the probability of the DNA extension
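
-----------------------------------------
EXAMPLE: reading a two-column file from "simulation_data" outside MATLAB
-----------------------------------------

For users who want to inspect the files in the folder "simulation_data" without MATLAB, the two-column layout described above (extension - avg(extension) in nm, log10 of the probability) could be read as in the Python sketch below. This is an illustrative example and not part of the deposited code. It assumes the columns are separated by whitespace or commas, which this README does not state. The function names and the sample values are hypothetical.

```python
import re


def parse_two_column(lines):
    """Parse rows of (extension - avg(extension) in nm, log10 probability).

    Assumption: fields are separated by commas and/or whitespace.
    Blank lines are skipped.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = re.split(r"[,\s]+", line)
        # float() accepts "-Inf", so log10(0) entries are kept as -inf.
        rows.append((float(fields[0]), float(fields[1])))
    return rows


def to_probability(log10_p):
    """Convert a log10 probability to a probability; -inf maps to 0.0."""
    return 10.0 ** log10_p


# Hypothetical sample rows, for illustration only (not taken from the dataset).
sample = ["-100 -3", "0, -1", "", "100 -Inf"]
rows = parse_two_column(sample)
probabilities = [to_probability(lp) for _, lp in rows]
```

To read an actual file, pass an open file object instead of the sample list, for example parse_two_column(open(path)), where path points to a file in "simulation_data". If the files use a different delimiter, the regular expression in the sketch needs to be adjusted.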