-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: Data Underlying "Is the open access citation advantage real? A systematic review of the citation of open access and subscription-based articles"

2. Author Information

   Name: Allison Langham-Putrow
   Institution: University of Minnesota Libraries
   Address: 335C Walter Library, 117 Pleasant St SE, Minneapolis, MN 55455-0291
   Email: lang0636@umn.edu
   ORCID: 0000-0003-0196-7224

   Name: Caitlin Bakker
   Institution: University of Minnesota Health Sciences Library
   Address: Room 5-110 PWB, 516 Delaware Street SE, Minneapolis, MN 55455-0374
   Email: cjbakker@umn.edu
   ORCID: 0000-0003-4154-8382

   Name: Amy Riegelman
   Institution: University of Minnesota Libraries
   Address: Room 10 OMWL, 309 19th Ave S, Minneapolis, MN 55455
   Email: aspringe@umn.edu
   ORCID: 0000-0003-4127-5222

3. Date of data collection: July 2019 to March 2021

4. Geographic location of data collection (where was data collected?): Not applicable

5. Information about funding sources that supported the collection of the data: Not applicable

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: These data are made available under a CC-BY-NC license. https://creativecommons.org/licenses/by-nc/3.0/us/

2. Links to publications that cite or use the data: Manuscript forthcoming through PLoS One.

3. Links to other publicly accessible locations of the data: Not applicable.

4. Links/relationships to ancillary data sets: Not applicable.

5. Was data derived from another source? If yes, list source(s): These data underlie a systematic review project. All data were extracted from the research studies included in the review. A complete list of citations is available in the manuscript publication.

6. Recommended citation for the data: Langham-Putrow, Allison; Bakker, Caitlin; Riegelman, Amy. (2021). Data Underlying "Is the open access citation advantage real? A systematic review of the citation of open access and subscription-based articles". Retrieved from the Data Repository for the University of Minnesota, https://doi.org/10.13020/357e-ek33.

---------------------
DATA & FILE OVERVIEW
---------------------

1. File List

   A. Filename: S1_data
      Short description: The supplementary data file describing all data extracted from or relating to the studies included in this systematic review project.

2. Relationship between files: Not applicable.

3. Additional related data collected that was not included in the current data package: Not applicable.

4. Are there multiple versions of the dataset? No

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data: A systematic review of 17 databases was conducted. All titles and abstracts were screened by two independent reviewers, followed by full-text screening by two independent reviewers. Data extraction forms were developed and piloted using Google Forms, and all extraction was completed in duplicate. Risk of Bias was assessed using Glynn's Evidence-based Librarianship Critical Appraisal Tool. Any discrepancies in extracted data or critical appraisal were resolved through consensus or by a third party.

2. Methods for processing the data: See above.

3. Instrument- or software-specific information needed to interpret the data: Data are in CSV format and should not require specialized software for access, analysis, or interpretation.

4. Standards and calibration information, if appropriate: Not applicable.

5. Environmental/experimental conditions: Not applicable.

6. Describe any quality-assurance procedures performed on the data: All data were collected in duplicate by two independent researchers. Any discrepancies in the data were resolved by consensus or by a third researcher.

7. People involved with sample collection, processing, analysis and/or submission: Allison Langham-Putrow, Caitlin Bakker, Amy Riegelman

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: S1_data
-----------------------------------------

1. Number of variables: 17

2. Number of cases/rows: 134

3. Missing data codes:
   NR - Not Reported (data was missing or unclear in original study)

4. Variable List

   A. Name: Citation
      Description: First author's surname and year of publication

   B. Name: Discipline (OECD)
      Description: Six major subject codes outlined by the Organisation for Economic Cooperation and Development (OECD)’s Field of Science and Technology Classification. Available at https://www.oecd.org/science/inno/38235147.pdf

   C. Name: OA modes included
      Description: Broad classification of open access mode.
         Green: Materials made open access in a location other than where they were originally published, such as a repository
         Gold: Materials made open access through their venue of publication, generally through an open access journal
         Both: Studies including green and gold modes of open access

   D. Name: Study Design
      Description: Broad description of the design of the study. Noted as either randomized or non-randomized.

   E. Name: Data Sources
      Description: Source(s) of the citation data, including: Scopus/Elsevier, Web of Science/Clarivate, Google Scholar, and Other. Google Scholar as a data source includes Publish or Perish. Other refers to any data source aside from the three listed here.

   F. Name: OA (n)
      Description: Number of open access items included in the study. The unit of measure (articles or journals) is indicated.

   G. Name: TA (n)
      Description: Number of toll access (or subscription-based) items included in the study. The unit of measure (articles or journals) is indicated.

   H. Name: Metrics used
      Description: The metrics used to describe or capture the primary outcome of citation counts or rates.
         Mean Citations Per Article
         Median Citations Per Article
         Total Citations for Each Article
         Other Metric (this includes any measure not described in the above categories, including calculations or measures unique to the study)

   I. Name: Does the Open Access Citation Advantage Exist?
      Description: The overall finding of the study.
         Yes - Authors of this study found that open access articles receive a greater number of citations compared to their toll access or subscription-based counterparts
         No - Authors of this study found that open access articles do not receive a greater number of citations compared to their toll access or subscription-based counterparts
         Sometimes - Authors of this study found that open access articles may receive more citations than toll access articles in specific cases, but not in all instances. For example, open access articles published in a certain year or a certain country may receive more citations than toll access, but open access articles published in another year or another country do not.
         Inconclusive - The authors found insufficient data to determine whether there was an open access citation advantage.

   J. Name: Risk of Bias (Overall)
      Description: An assessment of the overall methodological quality or potential biases in the study. This is based on the 26 prompting questions and 3 prompting sub-questions described in the four following sections. Each question could be responded to as Yes (Y), No (N), Unclear (U), or NA. To arrive at the overall assessment, the following calculation is used: (Y + N + U = T). If Y/T < 75% or if (N + U)/T > 25%, there is a significant risk of bias. This was marked as "No/Unclear." Where there was no significant risk of bias detected, this was marked as "Yes." Each section described below relates to a specific domain of bias. The original tool is described in: Glynn L, Cleyle S. A critical appraisal tool for library and information research. Library Hi Tech. 2006;24(3):387-99. doi: 10.1108/07378830610692154.

   K. Name: Section A: Population
      Description: Critical appraisal of potential biases or methodological issues introduced in the selection of a population, including clarity of inclusion and exclusion criteria, sufficient sample size, and representativeness of sample. This section includes 7 prompting questions and 3 sub-questions, which can be responded to as Yes (Y), No (N), Unclear (U), or NA. To arrive at the overall assessment, the following calculation is used: (Y + N + U = T). If Y/T < 75% or if (N + U)/T > 25%, there is a significant risk of bias. This was marked as "No/Unclear." Where there was no significant risk of bias detected, this was marked as "Yes."

   L. Name: Section B: Data Collection
      Description: Critical appraisal of potential biases or methodological issues introduced in the data collection process. This includes clarity of data collection methods, validity of study measures, and appropriate time frames for outcome measurement. This section includes 8 prompting questions, which can be responded to as Yes (Y), No (N), Unclear (U), or NA. To arrive at the overall assessment, the following calculation is used: (Y + N + U = T). If Y/T < 75% or if (N + U)/T > 25%, there is a significant risk of bias. This was marked as "No/Unclear." Where there was no significant risk of bias detected, this was marked as "Yes."

   M. Name: Section C: Study Design
      Description: Critical appraisal of potential biases or methodological issues introduced by the study design. This includes research methodology being presented at a level of detail that facilitates replication and outcomes clearly discussed in relation to data collected. This section includes 5 prompting questions, which can be responded to as Yes (Y), No (N), Unclear (U), or NA. To arrive at the overall assessment, the following calculation is used: (Y + N + U = T). If Y/T < 75% or if (N + U)/T > 25%, there is a significant risk of bias. This was marked as "No/Unclear." Where there was no significant risk of bias detected, this was marked as "Yes."

   N. Name: Section D: Results
      Description: Critical appraisal of potential biases or methodological challenges introduced in the presentation of results. This includes the clarity of results outlined, accounting for confounding variables, and conclusions that accurately reflect findings of analysis. This section includes 6 prompting questions, which can be responded to as Yes (Y), No (N), Unclear (U), or NA. To arrive at the overall assessment, the following calculation is used: (Y + N + U = T). If Y/T < 75% or if (N + U)/T > 25%, there is a significant risk of bias. This was marked as "No/Unclear." Where there was no significant risk of bias detected, this was marked as "Yes."

   O. Name: Item Type
      Description: General description of the publication type of the study: Article, Thesis, Conference Presentation, Report, or Preprint.

   P. Name: Journal (if applicable)
      Description: The name of the journal that published the article.

   Q. Name: Sherpa-Romeo
      Description: The open access status of the journal publishing the original study, noted as either "Open" (meaning a fully gold journal) or "Closed" (meaning a toll access or hybrid journal). Data is based on Sherpa-Romeo (available at https://v2.sherpa.ac.uk/romeo/about.html) and through consultation of individual journal policies.
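
The risk-of-bias rule described for the Risk of Bias (Overall) and Section A through Section D variables, together with the NR missing data code, can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the authors' own procedure: the function names `assess_risk_of_bias` and `read_rows` are hypothetical, the behavior when every response is NA is not specified in this README (the sketch returns None), and the sample row is invented. Because Y + N + U = T, the two conditions in the rule are equivalent, so the sketch tests only Y/T < 75%.

```python
import csv
import io

MISSING_CODE = "NR"  # "Not Reported", per the missing data codes above


def assess_risk_of_bias(yes, no, unclear):
    """Apply the README's rule to counts of Y, N and U responses.

    NA responses are left out because T is defined as Y + N + U.
    Returns "Yes" (no significant risk of bias detected) or
    "No/Unclear" (significant risk of bias).
    """
    total = yes + no + unclear
    if total == 0:
        # Assumption: the README does not cover the all-NA case,
        # so this sketch returns None rather than guessing.
        return None
    # Y/T < 75% is the same condition as (N + U)/T > 25%.
    # Integer arithmetic avoids rounding exactly at the threshold.
    if 4 * yes < 3 * total:
        return "No/Unclear"
    return "Yes"


def read_rows(handle):
    """Read CSV rows as dictionaries, turning the NR code into None."""
    reader = csv.DictReader(handle)
    return [
        {name: (None if value == MISSING_CODE else value)
         for name, value in row.items()}
        for row in reader
    ]


if __name__ == "__main__":
    # Hypothetical two-column excerpt; the values are invented.
    sample = io.StringIO("Citation,OA (n)\nExample 2000,NR\n")
    print(read_rows(sample))
    print(assess_risk_of_bias(yes=6, no=1, unclear=1))
```

With 6 Yes, 1 No, and 1 Unclear response, Y/T is exactly 75%, which is not below the threshold, so the assessment is "Yes". To read the real file, pass an open file handle to `read_rows`, for example `open("S1_data.csv", newline="")`; the `.csv` extension is an assumption, since the file is listed above only as S1_data.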