This readme.txt file was generated on August 16, 2021

Recommended citation for the data: Johnston, Lisa; Curty, Renata; Lafferty-Hess, Sophia; Hadley, Hannah; Petters, Jonathan; Luong, Hoa; Braxton, Susan; Carlson, Jake; Kozlowski, Wendy A. (2021). Value of Curation Survey, January 2021. Retrieved from the Data Repository for the University of Minnesota, https://doi.org/10.13020/04ee-q089.

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: Value of Curation Survey, January 2021

2. Author Information

   Principal Investigator Contact Information
      Name: Lisa Johnston
      Institution: University of Minnesota
      Email: ljohnsto@umn.edu
      ORCID: http://orcid.org/0000-0001-6908-9240

   Associate or Co-investigator Contact Information
      Name: Renata Curty
      Institution: University of California Santa Barbara
      Email: rcurty@ucsb.edu
      ORCID: https://orcid.org/0000-0002-4615-6030

   Associate or Co-investigator Contact Information
      Name: Sophia Lafferty-Hess
      Institution: Duke University
      Email:

   Associate or Co-investigator Contact Information
      Name: Hannah Hadley
      Institution: Penn State University
      Email: pennkitsune@gmail.com

   Associate or Co-investigator Contact Information
      Name: Jonathan Petters
      Institution: Virginia Tech
      Email: jpetters@vt.edu
      ORCID: https://orcid.org/0000-0002-0853-5814

   Associate or Co-investigator Contact Information
      Name: Hoa Luong
      Institution: University of Illinois at Urbana-Champaign
      Email: hluong2@illinois.edu
      ORCID: https://orcid.org/0000-0001-6758-5419

   Associate or Co-investigator Contact Information
      Name: Susan Braxton
      Institution: University of Illinois at Urbana-Champaign
      Email: braxton@illinois.edu
      ORCID: https://orcid.org/0000-0001-6605-216X

   Associate or Co-investigator Contact Information
      Name: Jake Carlson
      Institution: University of Michigan
      Email: jakecar@umich.edu

   Associate or Co-investigator Contact Information
      Name: Wendy A Kozlowski
      Institution: Cornell University
      Email: wak57@cornell.edu

3. Date published: August 30, 2021

4. Date of data collection: January 4 - 22, 2021

5. Geographic location of data collection: United States

6. Information about funding sources that supported the collection of the data: Alfred P Sloan Foundation

7. Overview of the data (abstract): This dataset includes the raw and augmented survey results from the January 2021 Value of Curation survey run by the Data Curation Network. Distributed to US data repository staff and directors via email listservs, the survey received a total of 120 responses. Twenty-two responses were for non-US repositories and three did not provide a repository of reference. A majority of the participants self-identified as staff members, with 52 staff and 34 repository directors. The remaining participants were 5 unaffiliated users and 4 unaffiliated depositors. A third of the responses (68) were associated with certified CoreTrustSeal repositories, and 27 responses were related to members of the Data Curation Network (DCN).

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: CC-BY-NC Creative Commons Attribution non-commercial

2. Links to publications that cite or use the data: manuscript under review

3. Was data derived from another source? na

4. DRUM Terms of Use: By using these files, users agree to the Terms of Use. https://conservancy.umn.edu/pages/drum/policies/#terms-of-use

---------------------
DATA & FILE OVERVIEW
---------------------

The survey data consist of 9 files:

1. File name: VofC-RawData.csv
   Description: Unprocessed dataset downloaded from Qualtrics with 120 cases and 87 variables as described in the codebook.
   Size: 85.46Kb

2. File name: VofC-AddedColumns.csv
   Description: Includes 12 new variables added to the raw dataset (see 'Data Augmentation' and codebook for details) in order to enhance data analysis.
   Size: 89.17Kb

3. File name: VofC-Transformed-Data.csv
   Description: Transformed dataset after removing some of the columns not used for the analysis and reversing some scales, while keeping the original variables (see 'Data Transformations' and codebook for details).
   Size: 37.19Kb

4. File name: VofC-AnalysisData-Quanti.csv
   Description: Contains the clean, analysis-ready version of the transformed dataset with 81 quantitative variables for the 120 cases.
   Size: 24.60Kb

5. File name: VofC-AnalysisData-Quant-USAonly.csv
   Description: Same as file 4, filtered to USA cases only (95 cases).
   Size: 19.31Kb

6. File name: VofC-AnalysisData-Qual.csv
   Description: Dataset used for the qualitative analysis, redacted to avoid linking the participant response to the name of the repository.
   Size: 45.99Kb

7. File name: VofC-Codebook.csv
   Description: Explains all the variables, values, and codes used in files 1-6.
   Size: 66.46Kb

8. File name: VofC-Survey.pdf
   Description: Export of the survey instrument from Qualtrics, saved as a pdf.
   Size: 233.2Kb

9. File name: Readme.txt
   Description: This file; includes the survey recruitment email and data analysis processes.
   Size: 12Kb

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

To understand the current practice of US-based data repositories, we developed a survey instrument to address the following research questions:

   What level of curation do repositories provide?
   What is the perceived value-add that curation has on the data sharing process from the perspective of members of the data curation community?

Our survey adapts levels of curation from the CoreTrustSeal levels and the FAIR principles (Edmunds et al., 2016; Wilkinson et al., 2016), and the curation actions were informed by a qualitative review of 100 CoreTrustSeal certified repositories (2017-2019) (Johnston, 2021).

Our project was classified as exempt by the University of Minnesota (U of M) Institutional Review Board (STUDY00011146).
We distributed the survey via U of M Qualtrics. It remained open for three weeks (January 4 - 22, 2021). The target population included repository staff members and directors; however, we allowed end users to participate if our distribution reached them. The recruitment email (below) was sent to multiple listservs, including RDAP, DataCure, DataLibs, IASSIST, and NIH Data Science, as well as to RDA working groups and DCN repository staff.

Citations:

Edmunds, Rorie; L'Hours, Hervé; Rickards, Lesley; Trilsbeek, Paul; Vardigan, Mary; and Mokrane, Mustapha. (2016). Core Trustworthy Data Repositories Requirements V01.00. https://www.coretrustseal.org/wp-content/uploads/2017/01/Core_Trustworthy_Data_Repositories_Requirements_01_00.pdf

Johnston, Lisa R. (2021). Level of curation self-reported by 100 CoreTrustSeal certified repositories (2017-2019). Data Repository for the University of Minnesota, https://doi.org/10.13020/w0z3-z709.

Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018. https://doi.org/10.1038/sdata.2016.18

--------------------------
Survey Recruitment Email
--------------------------

Subject: Survey invitation "Value of data curation," please participate by January 22, 2021

Dear Colleagues,

Our team from the Data Curation Network seeks input from users, staff, curators, and directors at US-based data repositories (disciplinary, government, institutional, and general) to help us better understand (1) the level of data curation provided by your data repository and (2) what you perceive as the most important value-add that data curation has on the data sharing process. By data curation, we mean the various actions that may be taken to ensure that data are fit for purpose and available for discovery and reuse.
Your participation in this 10-minute survey will help establish which data curation actions are commonly in practice across various types of data repositories and any perceived value these actions have on the data sharing process. With your help, the results of this survey will better enable data repositories to benchmark curation actions in a meaningful way and to make evidence-based decisions regarding the value proposition of doing data curation at one level versus another.

What to expect: You will be asked to define the “level of curation” taken by a specific data repository and share any perceived value that you feel this work adds to the resulting data product.

Begin survey now:

Disclaimer and data sharing: There are no incentives for your participation nor penalties for lack of participation. Survey responses will be collected anonymously and all effort will be made to protect the identity of the respondent. To encourage honest feedback, the identity of the data repository will not be tied to participant-level responses. A deidentified dataset of participant-level responses to this study will be published for open and long-term reuse in the Data Repository for the University of Minnesota (http://z.umn.edu/drum) within six months of the survey close. This research was reviewed by the University of Minnesota Institutional Review Board as STUDY00011146.

Credits: This survey is authored by members of the Data Curation Network, an alliance of US-based academic and non-profit data repositories that share a mission to help researchers ethically share their research data.
Thank you on behalf of the survey authors:

Sophia Lafferty-Hess, Duke University
Hannah Hadley, Penn State University
Renata Curty, University of California, Santa Barbara
Hoa Luong, University of Illinois
Susan Braxton, University of Illinois
Jonathan Petters, Virginia Tech University
Jake Carlson, University of Michigan
Wendy Anne Kozlowski, Cornell University
Lisa Johnston, University of Minnesota

--------------------------
Data Augmentation
--------------------------

Prior to the data analysis, we augmented the dataset with a few variables to help us better organize the data and to run additional comparative analyses for a more in-depth exploration of the survey results. The column with the repository link was removed to de-identify responses. A total of 12 columns were added to the original dataset downloaded directly from Qualtrics, as described below:

Add2_Country, Country (text)
Add3_CountryCoded, Country code: (1) USA, (0) Other
Add4_RepoNumber, Repo number (assigned number)
Add6_RepoTypeCoded, Repo type: (15) Disciplinary, (25) Generalist, (35) Institutional
Add7_CTS, CoreTrustSeal certification status: (1) Yes, (0) No
Add8_DCN, DCN member status: (1) Yes, (0) No
Add9_Q3_1, Record level - Q3 split into binary: Yes (1), No (0)
Add10_Q3_2, File level - Q3 split into binary: Yes (1), No (0)
Add11_Q3_3, Documentation level - Q3 split into binary: Yes (1), No (0)
Add12_Q3_4, Data level - Q3 split into binary: Yes (1), No (0)
Add14_Q3_5, Data as distributed - Q3 split into binary: Yes (1), No (0)
Add15_Level(Q3), Highest level of curation selected, graded 0-4 from the lowest to the highest level of curation: (0) distributed as deposited - L0, (1) record level - L1, (2) file level - L2, (3) documentation level - L3, and (4) data level - L4

--------------------------
Data Transformations
--------------------------

In addition to the data augmentation step, we also performed data transformations using OpenRefine to clean some of the data exported natively from Qualtrics. For example, Q3 allowed multiple selections, so we transformed its items into individual dichotomous (yes/no) variables and assigned the corresponding level of curation using the following scheme:

0, Distributed as deposited (no additional curation taken)
1, Record level curation, e.g., perform brief metadata checks for increased findability
2, File level curation, e.g., review file arrangement and make file format conversions for increased accessibility
3, Documentation level curation, e.g., review documentation and request/add missing documentation for increased reusability
4, Data level curation, e.g., open files, review data contents, and possibly annotate/edit the data for accuracy or interoperability

Some of the scales were reversed in order to keep low to high values for agreement and frequency consistent across all survey items. These and other transformations performed are recorded in the dataset’s RequiredTransforms.csv.

--------------------------
Data Codes
--------------------------

Missing data and “I don’t know” responses were recoded as 99. The qualitative data (free-text answers) were redacted by attempting to remove all mentions of the repository name and replacing them with the text: [redacted repository name].

--------------------------
Data Analysis
--------------------------

Data analysis for closed-ended questions was carried out in SPSS version 26. We performed descriptive statistics and frequencies along with cross tabulations and non-parametric tests in order to compare perceptions and opinions about the value of data curation among and across groups.

Free-text responses were examined by two independent coders following an inductive thematic analysis approach. An iterative joint analysis of the initial, independently assigned codes was performed, seeking harmonization of labels and their definitions.
This resulted in a total of 75 codes, organized into five broad categories: 1) curation action, 2) engagement, 3) goals and impact, 4) limitations and 5) workflow. Each category contained a set of subcategories which provided more specificity for analysis.
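
For readers who want to work with the Q3 variables outside of OpenRefine or SPSS, the following is a minimal Python sketch of the logic described under 'Data Augmentation', 'Data Transformations', and 'Data Codes'. It is an illustration only, not the procedure the authors ran. The column names and the 0-4 levels come from this readme. The option labels used as dictionary keys, the function names, the 1-5 scale range, and the choice to return 99 when no Q3 option is selected are assumptions; consult VofC-Codebook.csv for the actual values.

```python
# Q3 options mapped to (binary column name from this readme, curation level).
# The option labels (dictionary keys) are hypothetical stand-ins for the raw
# Qualtrics values; check the codebook for the real ones.
Q3_OPTIONS = {
    "Data as distributed": ("Add14_Q3_5", 0),
    "Record level": ("Add9_Q3_1", 1),
    "File level": ("Add10_Q3_2", 2),
    "Documentation level": ("Add11_Q3_3", 3),
    "Data level": ("Add12_Q3_4", 4),
}

MISSING = 99  # code used in this dataset for missing and "I don't know" responses


def split_q3(selected):
    """Turn the set of selected Q3 options into one 0/1 flag per binary column."""
    return {col: int(label in selected) for label, (col, _level) in Q3_OPTIONS.items()}


def highest_level(flags):
    """Return the highest curation level (0-4) that is flagged.

    Returning MISSING when nothing is flagged is an assumption of this sketch.
    """
    levels = [level for col, level in Q3_OPTIONS.values() if flags.get(col) == 1]
    return max(levels) if levels else MISSING


def reverse_scale(value, low=1, high=5):
    """Reverse a scale item (assumed range low..high), leaving code 99 untouched."""
    if value == MISSING:
        return MISSING
    return low + high - value


# Example: a respondent who selected record level and documentation level.
flags = split_q3({"Record level", "Documentation level"})
print(flags)
print(highest_level(flags))  # 3
print(reverse_scale(2))      # 4
```

When computing descriptive statistics from the analysis files, values of 99 should be excluded or treated as missing first, since they are codes rather than scale points.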