This readme.txt file was generated on Feb 9, 2021 by Lisa Johnston

Recommended citation for the data:
Johnston, Lisa R. (2021). Level of curation self-reported by 100 CoreTrustSeal certified repositories (2017-2019). Retrieved from the Data Repository for the University of Minnesota, https://doi.org/10.13020/w0z3-z709.

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: Level of curation self-reported by 100 CoreTrustSeal certified repositories (2017-2019)

2. Author Information

   Principal Investigator Contact Information
   Name: Lisa Johnston
   Institution: University of Minnesota
   Address:
   Email: ljohnsto@umn.edu
   ORCID: http://orcid.org/0000-0001-6908-9240

3. Date published: Feb 9, 2021

4. Date of data collection: 11-20-2020 to 02-09-2021

5. Geographic location of data collection: na

6. Information about funding sources that supported the collection of the data: The Alfred P. Sloan Foundation funds the Data Curation Network

7. Overview of the data (abstract):
This dataset extracts and makes machine-actionable the responses to the "Level of curation performed" component of the CoreTrustSeal (CTS) application v01 (2017-2019). The author reviewed 100 pdf CTS applications and compiled the responses into one spreadsheet for further analysis. Additionally, the CTS application instructions for v01 were parsed in order to analyze the completed applications, and the parsed instructions are included here in a spreadsheet. This may be useful to others interested in this type of extraction beyond the focus of this study. The applications were linked from the CTS website and released as CC0 per 11-20-20 email correspondence with Dr. Jonas Recker, Chair, CoreTrustSeal Standards and Certification Board.

Related work:
Lindlar, Michelle, & Rudnik, Pia. (2019). Eye on Core Trust Seal - Data Set (Version 1.0) [Data set]. Presented at the 16th International Conference on Digital Preservation (iPRES2019), Amsterdam, Netherlands: Zenodo.
http://doi.org/10.5281/zenodo.3267690

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: CC0 1.0 Universal http://creativecommons.org/publicdomain/zero/1.0/

2. Links to publications that cite or use the data: This data was gathered as part of a literature review in the "value of curation" project, undertaken by members of the Data Curation Network in 2020. In January 2021, my co-authors and I surveyed US-based data repository staff on the level of curation performed, and these results will be released in a future publication.

3. Was data derived from another source? YES
   If yes, list source(s): A list of CoreTrustSeal repositories was generated from their website at https://www.coretrustseal.org/maps/export/geojson/4/.

4. DRUM Terms of Use: By using these files, users agree to the Terms of Use. https://conservancy.umn.edu/pages/drum/policies/#terms-of-use

---------------------
DATA & FILE OVERVIEW
---------------------

There are two csv files and a json file included here.

1. File List

   A. Filename: CTSv01.00
      Short description: The CTS requirements and instructions for v01, parsed

   B. Filename: CTS_levelofcuration
      Short description: The level of curation extracted for 100 CTS applications

   C. Filename: 4.json
      Short description: The JSON file listing CTS certified repositories, downloaded from the Coretrustseal.org website Nov 19, 2020

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

The list of CoreTrustSeal repositories was downloaded from their website at https://www.coretrustseal.org/maps/export/geojson/4/ on Nov 19, 2020. This file was called 4.json. Pulling the data into OpenRefine, I transformed it into 101 rows of data.
Since I was only looking at CTS requirements from 2017-2019, one row was deleted because its application fell under the newer requirements: the Odum Institute, per this press release, https://www.coretrustseal.org/wp-content/uploads/2020/11/For_Release_2020-11-05-100th_CTS_Certification_Odum.pdf.

Next, I added a column for Country information by extracting it from the address using OpenRefine and cleaning up any addresses with a missing country name (mostly from the US).

In order to review the pdf applications, I needed to understand how the application was formatted. To do this I created CTS_applications.csv based on the CTS requirements and guidance from 2017-2019 found at https://www.coretrustseal.org/wp-content/uploads/2017/01/Core_Trustworthy_Data_Repositories_Requirements_01_00.pdf. Here I parsed each requirement into Heading, Subheading, Req #, Requirement Short, Requirement long, Guidance.

I reviewed the CTS guide to identify which sections would contain information about the level of curation each repository provides. Some aspects of the applications could be considered curation, such as storage, security, and technology. I did not include these and instead limited my focus to actions that fall under CTS's own definition in section R0.3, reprinted below.

***********
R0.3 Level of Curation

Level of Curation Performed. Select all relevant types from:
A. Content distributed as deposited
B. Basic curation – e.g., brief checking, addition of basic metadata or documentation
C. Enhanced curation – e.g., conversion to new formats, enhancement of documentation
D. Data-level curation – as in C above, but with additional editing of deposited data for accuracy

"(3) Level of Curation.
This item is intended to elicit whether the repository distributes its content to data consumers without any changes, or whether the repository adds value by enhancing the content in some way. All levels of curation assume initial deposits are retained unchanged and that edits are only made on copies of those originals. Annotations/edits must fall within the terms of the licence agreed with the data producer and be clearly within the skillset of those undertaking the curation. Thus, the repository will be expected to demonstrate that any such annotations/edits are undertaken and documented by appropriate experts and that the integrity of all original copies is maintained. Knowing this will help reviewers in assessing other certification requirements. Further details can be added that would help to understand the levels of curation you undertake."

Source: https://www.coretrustseal.org/wp-content/uploads/2017/01/Core_Trustworthy_Data_Repositories_Requirements_01_00.pdf
***********

Finally, I extracted the relevant components from each of the 100 applications and compiled these into two columns, "Highest Level of Curation" and "Comments on the Level of Curation." Note that the applications allowed multiple levels to be selected; I included only the highest level indicated, on a scale of A-D where D is the highest. The comments were pulled verbatim and analyzed via a close-read methodology, in which key themes were extracted for use in a survey we were developing to analyze the value that curation has on the data sharing process.

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: CTS_levelofcuration
-----------------------------------------

1. Number of variables: 9

2. Number of cases/rows: 100

3. Missing data codes: BLANK (used when no information was present in the Comments section)

4. Variable List

   ID = an ID number generated by CoreTrustSeal; range is 228-330
   Name = name of the repository undergoing CTS certification
   URL Home = URL of the repository
   CTS application = URL of the published CTS application pdf
   Generation = The CTS application was updated in 2020. 100 of the 101 repositories reviewed applied under the 2017-2019 guidelines.
   Address = address of the repository, used in the source JSON file to generate a map on the CTS website
   Country = country of the repository, extracted from the address
   Highest Level of Curation = CTS provides four levels and applicants may choose all that apply. This column captures the "highest" level identified in the application, where A is the lowest and D is the highest.
   Comments on the Level of Curation = This field copies and pastes the comments made by the applicant in the "Level of curation" section. If no comments were made, the field will say "Blank".
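
Illustration (not part of the original workflow): the "Highest Level of Curation" column was compiled by hand from the pdf applications. For anyone who wants to repeat that reduction on their own extraction, the rule described above (several levels may be selected, only the highest is kept, A lowest and D highest) could be sketched in Python as follows. The function name highest_level and the sample selections are hypothetical and are not taken from the dataset.

```python
from typing import Iterable

# Levels ordered from lowest (A) to highest (D), per the R0.3 definitions.
LEVELS = "ABCD"


def highest_level(selected: Iterable[str]) -> str:
    """Return the highest curation level among the selected letters.

    Hypothetical helper for illustration; the dataset itself was
    compiled manually, not with this code.
    """
    # Normalize whitespace and case, then keep only valid level letters.
    cleaned = {s.strip().upper() for s in selected}
    valid = cleaned & set(LEVELS)
    if not valid:
        raise ValueError("no valid level (A-D) selected")
    # Rank by position in LEVELS so the ordering is explicit.
    return max(valid, key=LEVELS.index)


# Made-up example: an applicant who ticked levels A, B and C.
print(highest_level(["A", "B", "C"]))  # C
```

An application that selected only level A would reduce to "A"; an empty or invalid selection raises an error rather than guessing a level.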