This codebook.txt file was generated on 20200220 by Lara Friedman-Shedlov

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: YMCA World War I Service Punch Cards

2. Author Information
Young Men's Christian Association of North America. International Committee.

Principal Investigator Contact Information
Name:
Institution: Kautz Family YMCA Archives
Address: 318 Elmer L. Andersen Library, 222 - 21st Ave S., Minneapolis, MN 55455
Email: ymcaarch@umn.edu

3. Date of data collection (single date, range, approximate date): 1917-1919

4. Geographic location of data collection (where was data collected?):
United States of America
France
Germany

5. Information about funding sources that supported the collection of the data:
The data was originally compiled on punch cards by the YMCA in the course of its operations during the First World War. The cards were digitized by the Kautz Family YMCA Archives in 2014 with funding from the University of Minnesota Libraries Strategic Digitization Program. In 2016-2017, the data was transcribed by volunteers working for FamilySearch International. Subsequently, the digitized images were made publicly available on the Zooniverse crowdsourcing platform for additional transcription by volunteers.

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: The data is either in the public domain or made available under a Creative Commons CC0 license.

2. Links to publications that cite or use the data: Unknown

3. Links to other publicly accessible locations of the data: The data can be searched by name via FamilySearch at https://www.familysearch.org/search/collection/2513098

4. Links/relationships to ancillary data sets: The data is part of the YMCA World War I-Related Records at the Kautz Family YMCA Archives at the University of Minnesota: https://archives.lib.umn.edu/repositories/7/resources/920

5. Recommended citation for the data: World War I Service Cards, 1917-1919. Armed Services World War I-Related Records. Kautz Family YMCA Archives. University of Minnesota.

---------------------
DATA & FILE OVERVIEW
---------------------

1. File List

A. Filename: YMCA-WWI-Punch-Card-Images.7z
Short description: Compressed archive containing 27,364 digital scans of index cards recording the names of individuals who served with the YMCA in WWI, including some cross-reference cards. Cards are arranged alphabetically by surname from Abbe to Wilson (unfortunately, the cards for surnames falling alphabetically after Wilson were not preserved). Cards include some or all of the following data: name, gender (men are on beige/buff cards, women on white), African American Y/N (blue cards), year of birth, address, occupation, work placement, placement date, salary, date left or returned, qualifications, religion, placement at home vs. overseas, marital status, and education.

B. Filename: YMCA-WWI-Punch-Card-Data.xlsx
Short description: Excel spreadsheet containing the data transcribed from the cards, including place of residence, gender, whether African American, year of birth, occupation, placement, whether placed in the U.S. or overseas, placement date, salary, date left or returned, qualifications, religion, marital status, and education. The spreadsheet also includes the filename of the corresponding scanned card image.

C. Filename: YMCA-WWI-Scan-Metadata.csv
Short description: Technical metadata for the digital scans of the punch cards, created by the Digital Library Services department of the University of Minnesota Libraries.

D. Filename: YMCA-WWI-Zooniverse_Processed_Data.csv
Short description: CSV file compiling data transcribed through crowdsourcing on the Zooniverse platform.
For each card (identified by Zooniverse subject ID and the filename of the digital scan), the file gives: the total number of times the card was transcribed ("classified"); a list of the numeric codes of the punches on the card; the number of classifiers/transcribers who recorded the color of the card as white, beige, blue, or not sure; and the number of classifiers/transcribers who recorded the card as punched for each category indicated in the columns. Punches that were not labeled are identified by their corresponding number. Column F indicates whether at least 50% of the classifiers agreed on the color of the card. Note: This file does not include any of the data that was written or typed on the cards as text, such as name or place of residence.

Updates to file list: "YMCA-WWI-Zooniverse_Processed_Data.csv" was added as an additional file on February 21, 2020. The readme was updated to reference this file, but no changes were made to the existing files (listed above as A, B, and C).

2. Additional related data collected that was not included in the current data package:
TIFF versions of the digital scans of the cards are available from the Kautz Family YMCA Archives. The raw crowdsourced transcription data downloaded from Zooniverse as a .json file, and the Python scripts used to process it into the file described as item D above, are also available from the Archives.

3. Are there multiple versions of the dataset? NO
If yes, list versions:
Name of file that was updated:
i. Why was the file updated?
ii. When was the file updated?
Name of file that was updated:
i. Why was the file updated?
ii. When was the file updated?

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data:
The data was originally compiled by the YMCA in the course of recruiting and managing personnel for its wartime work, and was recorded on cards.
Some of the data was written on the cards and some was encoded in the form of machine-readable punches.

2. Methods for processing the data:
The cards were digitized by staff of the University of Minnesota Libraries Digital Library Services department. The data was transcribed by 22,889 FamilySearch volunteers: the digital scan of each card was viewed by two indexers and then reviewed by an arbitrator, who compared both sets of data to ensure the most accurate result. The data transcribed by FamilySearch was captured in a spreadsheet. As initially received from FamilySearch, this data did not include the filename of the digital scan of each punch card; University of Minnesota Libraries staff added these filenames to the spreadsheet (item B above).

Subsequently, as a second and separate transcription project, the digital scans were uploaded to the Zooniverse platform for crowdsourced transcription, by volunteers, of the card color and punches only (not the textual data). This data was downloaded as a .json file and processed with Python scripts to generate the file described as item D above. First, the "data_prep.py" script extracted the necessary features and information from the original .json raw data and counted how many volunteers recorded each feature of each punch card. In the dataset generated in this step, all of the features are stored in a single column as a JSON dictionary. Next, a Python 3 script, "data_split.py", generated the final version of the data: it extracted the features from that JSON data structure and split them into thirty separate columns. Finally, this script converted the features from numeric codes into text labels, which are easier for users to read.

3. Describe any quality-assurance procedures performed on the data:
For the transcription by FamilySearch, the digital scan of each card was viewed by two indexers and then reviewed by an arbitrator, who compared both sets of data to ensure the most accurate result. For the crowdsourced transcription in Zooniverse, the digital scan of each card was viewed by at least ten classifiers before it was retired (removed from the set of images shown to volunteers). The processed file (item D) gives, for each card, the number of classifiers who recorded each punch and the number who recorded each card color. Comparing these counts with the total number of classifiers who reviewed that card indicates the level of agreement among them.
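The agreement check described in item 3 can be sketched in Python. This is a minimal illustration under stated assumptions, not the Archives' data_prep.py or data_split.py: it assumes one card's row from YMCA-WWI-Zooniverse_Processed_Data.csv has already been loaded into a hypothetical CardTally record (the CSV's actual column headers are not listed in this codebook), it treats "not sure" votes as counting toward the total but never as a consensus color, and it uses a 50% threshold to mirror the rule described for column F.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CardTally:
    """Hypothetical in-memory form of one row of the processed Zooniverse CSV.

    Field names are assumptions for illustration, not the CSV's real headers.
    """
    subject_id: int                # Zooniverse subject ID
    classifications: int           # total times the card was classified
    color_votes: dict[str, int] = field(default_factory=dict)  # color -> votes
    punch_votes: dict[str, int] = field(default_factory=dict)  # punch -> votes


def color_consensus(card: CardTally, threshold: float = 0.5) -> tuple[str | None, bool]:
    """Return the most-voted color and whether it reached `threshold` of classifiers.

    Assumption: "not sure" votes stay in the denominator but cannot win.
    """
    candidates = {c: n for c, n in card.color_votes.items() if c != "not sure"}
    if card.classifications <= 0 or not candidates:
        return None, False
    color, votes = max(candidates.items(), key=lambda item: item[1])
    return color, votes / card.classifications >= threshold


def punch_agreement(card: CardTally) -> dict[str, float]:
    """Fraction of classifiers who recorded each punch on the card."""
    if card.classifications <= 0:
        return {punch: 0.0 for punch in card.punch_votes}
    return {punch: n / card.classifications for punch, n in card.punch_votes.items()}


def confident_punches(card: CardTally, threshold: float = 0.5) -> list[str]:
    """Punches recorded by at least `threshold` of classifiers, sorted by label."""
    return sorted(p for p, frac in punch_agreement(card).items() if frac >= threshold)


if __name__ == "__main__":
    # Illustrative values only; not taken from the dataset.
    card = CardTally(
        subject_id=1,
        classifications=10,
        color_votes={"white": 6, "beige": 3, "blue": 0, "not sure": 1},
        punch_votes={"married": 9, "overseas": 4, "7": 10},
    )
    print(color_consensus(card))    # ('white', True)
    print(confident_punches(card))  # ['7', 'married']
```

A card whose color flag comes back False, or whose punch fractions sit close to the threshold, is a natural candidate for checking against the scanned image itself.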