This readme.txt file was generated on 2020-06-02 by Elizabeth Coburn

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: Data supporting: "Testing Our Assumptions: Preliminary Results from the Data Curation Network"

2. Author Information

   Name: Lisa R Johnston
   Institution: University of Minnesota Libraries
   Email: ljohnsto@umn.edu
   ORCID: http://orcid.org/0000-0001-6908-9240

   Name: Liza Coburn
   Institution: University of Minnesota Libraries
   Email: ecoburn@umn.edu
   ORCID: https://orcid.org/0000-0001-8764-0040

3. Date of data collection: 2019-01-01 to 2019-12-31

4. Geographic location of data collection: n/a

5. Information about funding sources that supported the collection of the data: Alfred P Sloan Foundation

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: Attribution-NonCommercial-NoDerivs 3.0 United States
   http://creativecommons.org/licenses/by-nc-nd/3.0/us/

2. Links to publications that cite or use the data: n/a (publication of associated manuscript forthcoming)

3. Recommended citation for the data:
   Coburn, Elizabeth, Johnston, Lisa R. (2020). "Data supporting: 'Testing Our Assumptions: Preliminary Results from the Data Curation Network'" Retrieved from the Data Repository for the University of Minnesota, https://doi.org/10.13020/ak4d-ge34.

--------------------------
GENERAL INFORMATION
--------------------------

This dataset was compiled to support the findings discussed in the associated (as yet unpublished) manuscript "Testing Our Assumptions: Preliminary Results from the Data Curation Network". Data were collected between 2019-01-01 and 2019-12-31, that is, during the first year of the Data Curation Network's pilot shared curation service. Data were compiled, organized and analyzed by Elizabeth Coburn, Data Curation Network Project Coordinator.

-----------------------------------------
Methods
-----------------------------------------

Datasets are submitted to the DCN through Jira, a project management and work tracking tool that is part of a suite of tools developed and offered by Atlassian. Because Jira access is restricted to DCN project personnel, certain attributes of submitted datasets are pulled from Jira and recorded in Google Sheets for ease of tracking and visualization, and to facilitate quick analysis and the overall coordination of the DCN's curation activities.

---------------------
DATA & FILE OVERVIEW
---------------------

1. File List

   A. Filename: 2019_DCN_datasets_v2.csv
      Short description: 2019 DCN datasets

   B. Filename: 2019_DCN_data_type_expertise.csv
      Short description: 2019 DCN data type expertise

   C. Filename: 2019_DCN_disciplinary_expertise.csv
      Short description: 2019 DCN disciplinary expertise

   D. Filename: 2019_DCN_partner_dataset_totals.csv
      Short description: 2019 DCN partner dataset totals

   E. Filename: 2019_DCN_partner_submission_rationale.csv
      Short description: 2019 DCN partner submission rationale

   F. Filename: 2019_DCN_data_v2.xlsx
      Short description: Contains all data files in one workbook

-------------------------------------------------------
DATA-SPECIFIC INFORMATION FOR: 2019_DCN_datasets_v2.csv
-------------------------------------------------------

Col A "Dataset ID" = unique identifier generated by our project management tool (Atlassian Jira) for each dataset submitted to the DCN
Col B "Submitting Partner" = shorthand name for the DCN partner submitting the dataset to the DCN
Col C "Curating Partner" = shorthand name for the DCN partner assigned to curate the dataset submitted to the DCN
Col D "Submission Date" = date the dataset was submitted to the DCN
Col E "Due Date" = date by which DCN curation must be completed, as provided by the submitting partner
Col F "Curation Completion Date" = date DCN curation was actually completed
Col G "DCN Turnaround Time (business days)" = number of business days (Monday - Friday) between the submission date and the curation completion date
Col H "DCN Curator Time (hours)" = number of hours a DCN curator spent curating a DCN dataset, as logged in Jira
Col I "Response time in hours (first assignment)" = number of hours it took a DCN curator to respond to a dataset assignment (by accepting or declining the assignment)
Col J "Subject area" = the subject area of a submitted dataset (possible values include: Life Sciences, Physical Sciences & Mathematics, Social and Behavioral Sciences, Engineering, and Arts and Humanities)
Col K "Discipline" = the discipline of a submitted dataset (too many possible terms to list, but the vocabulary is controlled)
Col L "Shortened Discipline" = the shortened name of the discipline of a submitted dataset, for ease of visualization in charts and figures in the associated manuscript
Col M "Primary data type" = the main data type of a submitted dataset, and the basis of assignment to a DCN curator (the DCN uses a controlled list of terms)
Col N "Programming language (if code)" = for code datasets, the programming language of the code
Col O "Level of curation" = the level of curation of a submitted dataset as assessed by the local curator after both the local and the DCN curation processes are complete (possible values include: major curation actions taken, essential curation actions taken, minimal curation actions taken, no curation actions taken, NA and blank)

Cells containing "NA" stand for "not applicable" and indicate that data are not available for a particular variable because the submitted dataset was not successfully assigned and curated by the DCN (and was most likely returned to the submitting partner for local curation).

Cells containing no value indicate that values were not available at the time of publication (the local curation process was not complete and the dataset had not yet been evaluated or published) or were not appropriate (non-code datasets do not have programming languages).

----------------------------------------------------------------
DATA-SPECIFIC INFORMATION FOR: 2019_DCN_data_type_expertise.csv
----------------------------------------------------------------

Col A "Data Type" = the type of data a DCN curator is expert in (the DCN uses a controlled list of terms)
Col B "Data Subtype" = the subtype of data a DCN curator is expert in (applicable to the following data types: programming language, database language or statistical programming language)
Col C "Expert Count" = the number of DCN curators possessing curation expertise in a particular data type/subtype
Col D "Notes" = relevant notes on the way DCN curators ("experts") were counted

Cells containing no value result from the way the data are structured for ease of comprehension.

------------------------------------------------------------------
DATA-SPECIFIC INFORMATION FOR: 2019_DCN_disciplinary_expertise.csv
------------------------------------------------------------------

Col A "Subject area" = the subject area related to the discipline a DCN curator possesses curation expertise in (possible values include: Life Sciences, Physical Sciences & Mathematics, Social and Behavioral Sciences, Engineering, and Arts and Humanities)
Col B "Discipline" = the discipline a DCN curator possesses curation expertise in (too many possible terms to list, but the vocabulary is controlled)
Col C "Shortened Discipline" = the shortened name of a discipline, for ease of visualization in charts and figures in the associated manuscript
Col D "Expert Count" = the number of DCN curators possessing curation expertise in a particular discipline

------------------------------------------------------------------
DATA-SPECIFIC INFORMATION FOR: 2019_DCN_partner_dataset_totals.csv
------------------------------------------------------------------

Col A "DCN partner name" = the shorthand name for DCN partners submitting datasets to the DCN
Col B "Dataset count" = the total number of datasets a DCN partner either received or approved between 2019-01-01 and 2019-12-31
Col C "Notes" = a notation indicating whether the value in Col B "Dataset count" refers to datasets received or datasets approved

------------------------------------------------------------------------
DATA-SPECIFIC INFORMATION FOR: 2019_DCN_partner_submission_rationale.csv
------------------------------------------------------------------------

Col A "Rationale for submitting to the DCN" = the reason a DCN partner submitted a dataset to the DCN (possible values include: lack data or file type expertise necessary to curate the data, lack bandwidth to curate the data by the due date, lack subject or disciplinary expertise necessary to curate the data, hope to receive a higher level of curation than we could provide locally, already curated locally, but we'd like a second look from an expert)
Col B "Proportion of total responses" = the percentage or proportion of total responses (Col D) attributed to each possible value in Col A
Col C "Affirmative responses" = the number of responses attributed to each possible value in Col A
Col D "Total responses" = the total number of responses collected
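
For readers who want to work with the datasets file programmatically, the following is a minimal sketch (not part of the original analysis) of two points described above: keeping "NA" cells distinct from blank cells, and recomputing a Monday - Friday turnaround from the submission and completion dates. Everything in it beyond the column names is an assumption: the sample rows and "DCN-1"-style identifiers are made up, the dates are assumed to be ISO formatted (YYYY-MM-DD), the CSV headers are assumed to match the column names listed in this readme, and the submission day itself is assumed not to count toward the turnaround. Values recomputed this way may therefore differ from those recorded in Col G.

```python
import csv
import io
from datetime import date, timedelta

# Hypothetical sample rows for illustration only; the real values are in
# 2019_DCN_datasets_v2.csv. ISO date formatting is an assumption.
SAMPLE = """Dataset ID,Submission Date,Curation Completion Date,Programming language (if code)
DCN-1,2019-03-01,2019-03-08,
DCN-2,2019-03-04,NA,NA
"""


def classify(cell):
    """Separate the two kinds of missing value described in this readme."""
    if cell == "NA":
        return "not applicable"   # dataset was not assigned and curated by the DCN
    if cell == "":
        return "not available"    # value missing at publication, or not appropriate
    return "value"


def business_days(start, end):
    """Count Monday - Friday days after `start`, up to and including `end`.

    Assumption: the start day is excluded. The readme does not say whether
    the submission date itself counts.
    """
    count = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            count += 1
    return count


def summarize(text):
    """Return one small summary dict per CSV row."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        completed = row["Curation Completion Date"]
        turnaround = None
        if classify(completed) == "value":
            turnaround = business_days(
                date.fromisoformat(row["Submission Date"]),
                date.fromisoformat(completed),
            )
        rows.append({
            "id": row["Dataset ID"],
            "turnaround": turnaround,
            "language": classify(row["Programming language (if code)"]),
        })
    return rows


if __name__ == "__main__":
    for summary in summarize(SAMPLE):
        print(summary)
```

The standard csv module is used here because it leaves "NA" and empty cells as the literal strings "NA" and "", so the two cases stay distinguishable. Tools that convert both to a single missing-value marker on import would lose that distinction unless configured otherwise.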