This readme.txt file was generated on 20210519 by Cody Hennesy.

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset

LibGuides dataset: Subject guides at academic libraries

2. Author Information

Principal Investigator Contact Information
Name: Cody Hennesy
Institution: University of Minnesota, Twin Cities
Address: Wilson Library
Email: chennesy@umn.edu
ORCID: 0000-0002-9410-9810

Associate or Co-investigator Contact Information
Name: Annis Lee Adams
Institution: California State University, East Bay
Address: University Libraries
Email: lee.adams@csueastbay.edu
ORCID: 0000-0003-0428-7793

3. Date of data collection (single date, range, approximate date): 2020-10-15 to 2020-12-09 (YYYY-MM-DD)

4. Geographic location of data collection (where was data collected?): Online

5. Information about funding sources that supported the collection of the data: Not funded.

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: CC BY-NC, Attribution-NonCommercial 3.0 United States

2. Links to publications that cite or use the data:

Hennesy, C. & Adams, A.L. (2021). Measuring actual practices: A computational analysis of LibGuides in academic libraries. (Under review)

3. Links to other publicly accessible locations of the data: -

4. Links/relationships to ancillary data sets: -

5. Was data derived from another source?

Data was collected from public LibGuides HTML pages and from an API provided by LibGuides. A fair use analysis was conducted to ensure that data collection and sharing would be a fair use (much of the content is also licensed for re-use, though licensing varies). The compilation of the specific data elements captured here is a transformative use: it enables quantitative analyses of specific features of the guides (e.g., average number of tabs per guide) without reproducing substantial portions of any guide. The dataset focuses on formal properties of guides rather than the content therein (e.g., box names are included but not the content from the boxes).

For more information on the LibGuides data types referred to throughout (e.g., boxes, tabs, profiles), see the LibGuides documentation: https://ask.springshare.com/libguides

6. Recommended citation for the data:

Hennesy, C. & Adams, A.L. (2021). "LibGuides dataset: Subject guides at academic libraries (2020)." Data Repository for the University of Minnesota.

---------------------
DATA & FILE OVERVIEW
---------------------

1. File List

A. Filename: lg_data.csv
Short description: Each row contains data collected from one subject guide landing page. The url column operates as a unique identifier (and should be resolvable as long as the website is maintained). The other columns contain data elements from each guide.

B. Filename: inst_list.txt
Short description: List of institutions included in the study.

2. Relationship between files: -

3. Additional related data collected that was not included in the current data package:

The original data collection included all content links from each guide, the subject tags associated with each guide profile, all "related guides" listed on each page, and tabs contained within boxes. These columns were dropped from the dataset because they were not used for analysis.

4. Are there multiple versions of the dataset? No

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data:

The requests and Beautiful Soup Python packages were used to scrape data from publicly accessible LibGuides HTML pages and from an API provided by LibGuides to enable machine access (Reitz, 2015; Richardson, 2015). See the "Data collection" section of the "Measuring actual practices…" paper for more details.

2. Methods for processing the data:

Data elements were compiled into a Python Pandas dataframe and then exported into the repository CSV. Further cleaning was undertaken before analysis, but the data here is unprocessed apart from minor cleaning steps taken as part of data collection: Unicode escaping to correct for URL encoding issues, and .strip() to remove whitespace from the beginnings and endings of strings. See the "Data cleaning and analysis" sections of the "Measuring actual practices…" paper for more information.

3. Instrument- or software-specific information needed to interpret the data: -

4. Standards and calibration information, if appropriate: -

5. Environmental/experimental conditions: -

6. Describe any quality-assurance procedures performed on the data: -

7. People involved with sample collection, processing, analysis and/or submission: Cody Hennesy

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: lg_data.csv
-----------------------------------------

1. Number of variables: 11

2. Number of cases/rows: 12,781

3. Missing data codes:

Blank cell: No data was found/collected from the guide for this element.
{}: Empty Python dict. No data was found/collected from the guide for this key/value pair element.
[]: Empty Python list. No data was found/collected from the guide for this list element.

4. Variable List

A. Name: site_id
Description: Unique ID for each LibGuides instance/site.
Pandas dtype: int64

B. Name: url
Description: URL of guide page collected.
Pandas dtype: object

C. Name: page_size
Description: Number of characters in guide page source code: len(src).
Pandas dtype: int64

D. Name: n_links
Description: Number of <a> (link) tags found on the full guide page.
Pandas dtype: int64

E. Name: title
Description: Title of subject guide page (e.g., Psychology: Articles & Databases).
Pandas dtype: object

F. Name: description
Description: Description of guide from guide page.
Pandas dtype: object

G. Name: updated
Description: Date the guide was last updated. Unformatted text strings as collected from the guide page. These can be converted to Python datetime with a format of '%b %d, %Y %I:%M %p' (more info: https://docs.python.org/3/library/datetime.html).
Pandas dtype: object

H. Name: tabs_dict
Description: Key/value pairs (in Python dictionary format) where the key is the title of each tab on the guide page and the value is the corresponding URL for the tab.
Pandas dtype: object

I. Name: boxes
Description: Python list of the titles for each "box" on the subject guide page.
Pandas dtype: object

J. Name: profile_dict
Description: Key/value pairs where the key is the name associated with a profile box and the value is the URL linking to the profile account page.
Pandas dtype: object

K. Name: subjects_dict
Description: Key/value pairs where the key is the name of a subject tag associated with the guide, and the value is the URL to the subject page associated with it.
Pandas dtype: object
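
Example: reading the text-encoded columns

The variable list above describes several columns that are stored as plain text in the CSV: `updated` holds date strings, and `tabs_dict`, `boxes`, `profile_dict`, and `subjects_dict` hold Python dict or list literals. The following is a minimal sketch of one way to convert them, not the authors' own processing code. It assumes the dict and list cells are written as Python literals that `ast.literal_eval` can read, and it uses a small invented sample (`SAMPLE_CSV`, with made-up example.edu URLs) in place of lg_data.csv. The helper names `parse_updated` and `parse_container` are hypothetical.

```python
import ast
import csv
import io
from datetime import datetime

# Format string given in the variable list for the `updated` column.
UPDATED_FORMAT = "%b %d, %Y %I:%M %p"

# Hypothetical two-row sample standing in for lg_data.csv (all values invented).
SAMPLE_CSV = """\
url,updated,tabs_dict,boxes
https://example.edu/psych,"Oct 15, 2020 3:45 PM","{'Home': 'https://example.edu/psych/home', 'Articles': 'https://example.edu/psych/articles'}","['Welcome', 'Key Databases']"
https://example.edu/empty,,{},[]
"""


def parse_updated(text):
    """Return a datetime for an `updated` string, or None for a blank cell."""
    text = text.strip()
    if not text:
        return None
    return datetime.strptime(text, UPDATED_FORMAT)


def parse_container(text, empty):
    """Turn a dict/list literal stored as CSV text into a Python object.

    Blank cells fall back to `empty`; "{}" and "[]" parse to empty containers.
    Assumes the cell is a valid Python literal (an assumption about the file).
    """
    text = text.strip()
    if not text:
        return empty
    return ast.literal_eval(text)


rows = []
for record in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    rows.append({
        "url": record["url"],
        "updated": parse_updated(record["updated"]),
        "tabs": parse_container(record["tabs_dict"], {}),
        "boxes": parse_container(record["boxes"], []),
    })

# Report the parsed date plus the number of tabs and boxes for each guide.
for row in rows:
    print(row["url"], row["updated"], len(row["tabs"]), len(row["boxes"]))
```

To run this against the real file, replace `io.StringIO(SAMPLE_CSV)` with an open handle on lg_data.csv. The month and AM/PM parsing in `strptime` depends on an English-language locale. If the file is loaded with pandas instead, note that blank cells are read as NaN by default, so the helpers would need to check for that before calling `.strip()`.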