This DCN_EndUserSurvey2021_readme.txt file was generated on 2021-10-14 by Sophia Lafferty-Hess

GENERAL INFORMATION

1. Title of Dataset: Data Curation Network End User Survey 2021

2. Author Information

   A. Principal Investigator Contact Information
      Name: Sarah Wright
      Institution: Cornell University
      Email:

   B. Associate or Co-investigator Contact Information
      Name: Lisa Johnston
      Institution: University of Minnesota
      Email:
      ORCID:

   C. Associate or Co-investigator Contact Information
      Name: Wanda Marsolek
      Institution: University of Minnesota
      Email:
      ORCID:

   D. Associate or Co-investigator Contact Information
      Name: Hoa Luong
      Institution: University of Illinois at Urbana-Champaign
      Email:
      ORCID:

   E. Associate or Co-investigator Contact Information
      Name: Susan Braxton
      Institution: University of Illinois at Urbana-Champaign
      Email:
      ORCID:

   F. Associate or Co-investigator Contact Information
      Name: Sophia Lafferty-Hess
      Institution: Duke University
      Email:
      ORCID:

   G. Associate or Co-investigator Contact Information
      Name: Joel Herndon
      Institution: Duke University
      Email:
      ORCID:

   H. Associate or Co-investigator Contact Information
      Name: Jake Carlson
      Institution: University of Michigan
      Email:

3. Date of data collection (single date, range, approximate date): April-June 2021

4. Geographic location of data collection: United States

5. Information about funding sources that supported the collection of the data: Alfred P. Sloan Foundation, “Launching the Data Curation Network”

6. Overview of the data (abstract): This dataset contains the processed data from the 2021 End User Survey conducted by the Data Curation Network.

SHARING/ACCESS INFORMATION

1. Licenses/restrictions placed on the data: CC BY-NC (Creative Commons Attribution-NonCommercial)

2. Links to publications that cite or use the data: manuscript forthcoming

4. DRUM Terms of Use: By using these files, users agree to the Terms of Use.

5. Was data derived from another source? No

6. Recommended citation for this dataset: Wright, Sarah; Johnston, Lisa; Marsolek, Wanda; Luong, Hoa; Braxton, Susan; Lafferty-Hess, Sophia; Herndon, Joel; Carlson, Jake. (2021). Data Curation Network End User Survey 2021. Retrieved from the Data Repository for the University of Minnesota,

DATA & FILE OVERVIEW

1. File List:

   DCN_EndUserSurvey2021_Readme.txt - overview of the project, data, and methods
   DCN_EndUserSurvey2021_Instrument.pdf - full survey instrument with question text, skip patterns, and coded values
   DCN_EndUserSurvey2021_ProcessedData.csv - merged dataset containing all responses from the 6 participating institutions
   DCN_EndUserSurvey2021_Codebook.txt - codebook containing value codes and frequencies for variables
   Raw_Data folder - raw output from Qualtrics, excluding identifiable information

METHODOLOGICAL INFORMATION

1. Description of methods used for collection/generation of data:

   The survey was implemented in Qualtrics and distributed to depositors at the repositories of 6 Data Curation Network institutions: Cornell University eCommons, Duke University Research Data Repository, Johns Hopkins University Data Archive, Illinois Data Bank (University of Illinois), University of Michigan Deep Blue Data, and the Data Repository for the University of Minnesota (DRUM).

   The survey population consisted of researchers who deposited a dataset with one of these repositories between January 1, 2019 and March 15, 2021. The survey ran for about 2 weeks at each institution between April and June 2021. Each participant's response was linked to a submitted dataset and DOI for the purposes of future analysis and identification.

   A subset of additional questions was also asked by two institutions, Duke and Michigan, for internal planning and assessment purposes; these questions have been included for others to use and reuse. All survey data was de-identified prior to sharing. Each institution completed its own IRB process.

2. Methods for processing the data:

   Raw Qualtrics data files from each institution were cleaned and processed following these steps:

   ** Automatically generated Qualtrics variables and rows unnecessary for analysis were dropped (i.e., StartDate, EndDate, Status, IPAddress, Progress, Duration (in seconds), Finished, DistributionChannel, UserLanguage)
   ** Any identifying information related to the survey participant or to staff members within institutions was removed (email address, IP address, name of associated dataset, curator names, etc.)
   ** Individual institution raw data files were merged into one processed analysis dataset
   ** Variable names were standardized (as a result, variable names in the raw data may not match the names in the merged dataset)
      - JHU Q69-Q70 changed to Q11-Q13
      - Duke raw data columns L (Q2) to R (Q6_6_TEXT) were moved and renamed to columns Q (Q14) to AE (Q18_6_TEXT) in the processed dataset
      - Michigan raw data columns U (Q71) to AA (Q75_6_Text) were moved and renamed to columns Q (Q14) to AE (Q18_6_TEXT) in the processed dataset
   ** The subset of questions asked only by Duke and Michigan was moved to the end of the primary DCN question block and the variable names were revised
   ** An institution column was added to the dataset
   ** Multi-response questions were separated into individual columns (e.g., Q18_1, Q18_2, etc.)
   ** String values in the raw data were recoded to numeric codes for analysis
   ** Missing data were coded

6. Describe any quality-assurance procedures performed on the data:

   Frequencies were run on the summary analyses and then rerun on the processed dataset to ensure consistency across variables and to check for any errors. Frequency tables are available within the codebook.

7. People involved with sample collection, processing, analysis and/or submission:

   Data Curation Network members:
   Sarah Wright and Wendy Kozlowski, Cornell University eCommons
   Lisa Johnston and Wanda Marsolek, University of Minnesota - Data Repository for the U of M (DRUM)
   Hoa Luong and Susan Braxton, University of Illinois Data Bank
   Joel Herndon and Sophia Lafferty-Hess, Duke University Research Data Repository
   Jake Carlson, University of Michigan Deep Blue Data
   Mara Blake, Johns Hopkins University Data Archive

DATA-SPECIFIC INFORMATION FOR: DCN_EndUserSurvey2021_ProcessedData.csv

1. Number of variables: 18

2. Number of cases/rows: 239

3. Variable List: See the codebook and survey instrument for full variable information.

4. Missing data codes:

   97 - Question not asked at that institution
   98 - Response missing due to survey design skip pattern or multi-response item in the survey
   99 - No response provided by the survey participant
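
The missing data codes above need to be set aside before any counts or averages are computed. The following is a minimal sketch of one way to do that with the Python standard library, not part of the original processing. The column name "Institution", the helper name recode_missing, and the sample rows are assumptions made for illustration; check the codebook for the actual variable names, and for any variable where 97, 98, or 99 could be a substantive value.

```python
import csv
import io

# Missing data codes as documented in this readme.
MISSING_CODES = {
    "97": "question not asked at that institution",
    "98": "skip pattern or multi-response item",
    "99": "no response provided",
}

def recode_missing(rows, id_columns=("Institution",)):
    """Return copies of the rows with 97/98/99 replaced by None.

    Columns listed in id_columns are left untouched. The default column
    name "Institution" is hypothetical; the readme does not state it.
    """
    cleaned = []
    for row in rows:
        cleaned.append({
            col: (None
                  if col not in id_columns and (val or "").strip() in MISSING_CODES
                  else val)
            for col, val in row.items()
        })
    return cleaned

# Invented sample standing in for DCN_EndUserSurvey2021_ProcessedData.csv.
SAMPLE = """Institution,Q1,Q14
Duke,1,2
Cornell,99,97
Minnesota,3,98
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cleaned = recode_missing(rows)
for row in cleaned:
    print(row)
```

To run this against the real file, replace io.StringIO(SAMPLE) with an open file handle for DCN_EndUserSurvey2021_ProcessedData.csv. Collapsing all three codes to None discards the reason a value is missing, so keep the original values if that distinction matters for the analysis.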
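
For readers who want to relate the Raw_Data files to the processed dataset, the steps listed under "Methods for processing the data" can be sketched roughly as follows. This is an illustration under stated assumptions, not the script the authors used: the function names, the "Institution" column name, and the sample rows are invented, and only the first documented rename for Duke (Q2 to Q14) and Michigan (Q71 to Q14) is shown. The sketch also does not handle the extra rows that the readme says were dropped from the raw Qualtrics output.

```python
import csv
import io

# Qualtrics-generated variables that the readme lists as dropped.
DROPPED = {
    "StartDate", "EndDate", "Status", "IPAddress", "Progress",
    "Duration (in seconds)", "Finished", "DistributionChannel", "UserLanguage",
}

def process_raw(raw_text, institution, rename=None):
    """Drop Qualtrics metadata, rename variables, and tag rows with the institution."""
    rename = rename or {}
    processed = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        out = {"Institution": institution}  # hypothetical column name
        for col, val in row.items():
            if col in DROPPED:
                continue
            out[rename.get(col, col)] = val
        processed.append(out)
    return processed

def merge(*datasets):
    """Stack per-institution rows; variables absent at an institution get code 97."""
    columns = []
    for data in datasets:
        for row in data:
            for col in row:
                if col not in columns:
                    columns.append(col)
    return [{col: row.get(col, "97") for col in columns}
            for data in datasets for row in data]

# Invented raw samples; real files live in the Raw_Data folder.
DUKE_RAW = "StartDate,IPAddress,Q1,Q2\n2021-04-20,0.0.0.0,1,3\n"
MICHIGAN_RAW = "StartDate,Finished,Q1,Q71\n2021-05-03,1,2,4\n"
CORNELL_RAW = "StartDate,Q1\n2021-05-10,1\n"

merged = merge(
    process_raw(DUKE_RAW, "Duke", {"Q2": "Q14"}),
    process_raw(MICHIGAN_RAW, "Michigan", {"Q71": "Q14"}),
    process_raw(CORNELL_RAW, "Cornell"),
)
for row in merged:
    print(row)
```

In this sketch the Cornell row receives 97 for Q14 because that question appears only in the Duke and Michigan samples, which mirrors the documented meaning of code 97.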