This readme.txt file was generated by Abbey Hammell.

Recommended citation for the data:
Tripp, Alayo; Hammell, Abbey; Munson, Benjamin. (2021). Social Information in Written Standard Sentence Materials: Methods and Data. Retrieved from the Data Repository for the University of Minnesota, https://hdl.handle.net/11299/225199.

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset
Social Information in Written Standard Sentence Materials: Methods and Data

2. Author Information

   Principal Investigator Contact Information
      Name: Tripp, Alayo
      Institution: University of Minnesota
      Email: tripp158@umn.edu
      ORCID: https://orcid.org/0000-0003-4294-8820

   Associate or Co-investigator Contact Information
      Name: Hammell, Abbey
      Institution: University of Minnesota
      Email: hammell@umn.edu
      ORCID: https://orcid.org/0000-0002-4054-0998

   Associate or Co-investigator Contact Information
      Name: Munson, Benjamin
      Institution: University of Minnesota
      Email: munso005@umn.edu
      ORCID: https://orcid.org/0000-0002-1547-6912

3. Date published or finalized for release: 2021-11-02

4. Date of data collection (single date, range, approximate date): 2019-10-25 to 2021-10-25

5. Geographic location of data collection (where was data collected?): online recruitment from Prolific.co

6. Information about funding sources that supported the collection of the data:
   Sponsorship: National Institutes of Health Grant R21 DC018070

7. Overview of the data (abstract):
[From the under-consideration paper, authored by Tripp & Munson]: The Harvard/IEEE (henceforth H/I) sentences are widely used for testing speech recognition in English. This study examined whether two talker characteristics, race and gender, are conveyed by 80 of the H/I sentences in their written form, and by a comparison set of sentences from the internet message board Reddit, which were expected to convey social information.
As predicted, a significant proportion of raters reported perceiving race and gender information in the H/I sentences. Suggestions of how to manage the potential influence of this social information on measures of speech intelligibility are provided. This archive includes the raw data from this paper and the code used to generate the experiment in Qualtrics. The latter was programmed by Abbey Hammell.

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: Attribution-NonCommercial-NoDerivs 3.0 United States

2. Links to publications that cite or use the data:
Tripp, A., & Munson, B. (under consideration). Written standard sentence materials convey social information. Journal of the Acoustical Society of America Express Letters.

3. Was data derived from another source? N/A.

4. Terms of Use: Data Repository for the U of Minnesota (DRUM)
By using these files, users agree to the Terms of Use.
https://conservancy.umn.edu/pages/drum/policies/#terms-of-use

---------------------
DATA & FILE OVERVIEW
---------------------

1. File List

   A. Filename: TrippMunson2021_surveyProgramming.zip
      Short description: Contains all of the files needed to recreate the survey described in Tripp and Munson (2021)

   B. Filename: TrippMunson2021_raterData.csv
      Short description: Contains all of the raw data from Tripp and Munson (2021)

   C. Filename: readme.txt
      Short description: Readme file

2. Relationship between files:
The reproducibility files for the Qualtrics survey are bound together in a .zip archive (TrippMunson2021_surveyProgramming.zip). The dataset (TrippMunson2021_raterData.csv) contains rater data from this survey. Some of its columns can only be understood by reading the file 2277719_Munson_PrePilot_surveydetails.pdf in the .zip archive.

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data:

2.
Methods for processing the data:
The survey was programmed on, and responses collected through, the Qualtrics online survey platform. Data was exported from Qualtrics using the following export options:
   - Export as CSV
   - Download all fields
   - Use numeric values
   - Remove line breaks
   - Recode seen-but-unanswered questions as -999
   - Split multi-value fields into columns

After the data was exported from Qualtrics, the R script "2277719_RemovBlankLMColumns.R" (in "TrippMunson2021_surveyProgramming.zip") was run on the CSV dataset to remove superfluous columns that were created as an artifact of how the survey was programmed. After the "2277719_RemovBlankLMColumns.R" script was run, the data was reshaped into long format.

3. Instrument- or software-specific information needed to interpret the data:
You can open the "2277719_Munson_PrePilotCleaned_20191028.csv" data file with any program that can read a CSV. The codebook for the experiment is given in PDF format: "2277719_Munson_PrePilot_surveydetails.pdf". If you would like to review the survey within Qualtrics itself, you will need a Qualtrics account. From there, you can import the "2277719_Munson_PrePilot.qsf" file into your own Qualtrics account.

4. Standards and calibration information, if appropriate: NA

5. Environmental/experimental conditions:
160 sentences were chunked into 4 groups (i.e., blocks) of 40. Each of the 4 blocks contained 20 Reddit sentences and 20 IEEE sentences. The sentences belonging to each block are listed in the "Pre-Pilot_Blocks_LONG.csv" and "Pre-Pilot_Blocks_WIDE.csv" documents in "TrippMunson2021_surveyProgramming.zip". The sentences within each block were shown in a random order between participants, and the viewing order of the four blocks of sentences was also randomized between participants.
There are 24 possible block permutation conditions, which are outlined in the "Groups_BlockOrders.pdf" and "Groups_BlockOrders.csv" documents in "TrippMunson2021_surveyProgramming.zip". Each of the 4 blocks had 4 "engagement" sentences, asked after the 5th, 15th, 25th, and 35th sentences of the block. The engagement sentences were presented in the same order for each block across participants. The engagement sentences and their order details can be found in the "Pre-Pilot_EngagementSents.csv" document in "TrippMunson2021_surveyProgramming.zip".

6. Describe any quality-assurance procedures performed on the data:
Raw data was exported directly from Qualtrics using the export options described in point #2 above. Data was then restructured, first using the "2277719_RemovBlankLMColumns.R" script and then the "BEN ADD SCRIPT NAME HERE THAT YOU USED TO PUT DATA INTO LONG FORMAT" R script, both of which can be found in "TrippMunson2021_surveyProgramming.zip". Data was restructured using these R scripts only; it was never manually modified or restructured.

7. People involved with sample collection, processing, analysis, and/or submission:
Alayo Tripp, Ph.D., Abbey Hammell, M.A., & Benjamin Munson, Ph.D.

-----------------------------------------
FILE-SPECIFIC INFORMATION FOR: TrippMunson2021_surveyProgramming.zip
-----------------------------------------

The zip file contains a folder with the following files:

Files for Analysis of Current Data & Replication:

1. 2277719_Munson_PrePilot_surveydetails.pdf - A human-readable outline of the survey, as programmed into Qualtrics. This document includes things such as: questions, response options, skip/display logic, variable names, numeric recode values, survey flow, etc.

Files for Replication:

1. 2277719_Munson_PrePilot.qsf - A QSF file is a "Qualtrics Survey Format" file. This file has all of the information required to import a copy of our programmed survey into Qualtrics for your own use. The QSF file is in JSON format.
Importing the QSF file into your own Qualtrics account will give you a better sense of how the survey was set up/programmed.

2. Pre-Pilot_Blocks_WIDE.csv - Provides information about the sentences that were present in each block. This document was used to program the "loop & merge" functionality in each sentence block within the survey. See "5. Environmental/experimental conditions:" under METHODOLOGICAL INFORMATION for more information.

3. Pre-Pilot_Blocks_LONG.csv - Provides information about the sentences that were present in each block. This is the same information as the "Pre-Pilot_Blocks_WIDE.csv" document, but in long format.

4. Pre-Pilot_EngagementSents.csv - Outlines the engagement sentences that were used within each sentence block. Engagement questions were given after the 5th, 15th, 25th, and 35th sentences within each block. Engagement sentences were presented in the same order across participants. See "5. Environmental/experimental conditions:" under METHODOLOGICAL INFORMATION for more information.

5. Groups_BlockOrders.csv - Shows the sentence block order for each of the 24 "block order" groups subjects could be randomized into. See "5. Environmental/experimental conditions:" under METHODOLOGICAL INFORMATION for more information.

6. Groups_BlockOrders.pdf - The same information as #5 directly above, but color-coded for ease of interpretation and in PDF format.

7. 2277719_Munson_JS_SentenceCount_Script.js - Provides an explanation of how to program the sentence progress counter (#/160) so that it works across ALL randomized Qualtrics sentence blocks. Note that this is already programmed into the survey if you create the survey using the "2277719_Munson_PrePilot.qsf" file. However, this document gives a brief outline of how the sentence count code works.

8. 2277719_RemovBlankLMColumns.R - R script that takes in raw data exported from Qualtrics and puts out a cleaned version.
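As a rough illustration of the two processing steps described under METHODOLOGICAL INFORMATION (dropping blank loop-&-merge artifact columns, then reshaping to long format): the project's actual scripts are in R and are the authoritative versions; the Python/pandas sketch below is only an assumed equivalent, and the column names in it are made up for illustration.

```python
import pandas as pd

# Hypothetical wide-format export: one row per participant, one column per
# sentence rating, plus a blank column of the kind the loop & merge
# programming can leave behind. These names are NOT the real column names.
df = pd.DataFrame({
    "ResponseId": ["R_1", "R_2"],
    "S1_Gender": [1, 2],
    "S2_Gender": [2, 1],
    "Blank_LM_1": [None, None],  # superfluous artifact column
})

# Step 1: drop columns that are entirely blank
# (what 2277719_RemovBlankLMColumns.R does for the real export).
df = df.dropna(axis=1, how="all")

# Step 2: reshape from wide (one column per sentence) to long
# (one row per participant x sentence).
long_df = df.melt(id_vars="ResponseId",
                  var_name="sentence_item", value_name="rating")
```

After the melt, `long_df` has one row per participant-sentence pair, which matches the one-rating-per-row layout of TrippMunson2021_raterData.csv.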
-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: TrippMunson2021_raterData.csv
-----------------------------------------

Some of these descriptions require access to the file 2277719_Munson_PrePilot_surveydetails.pdf (in the TrippMunson2021_surveyProgramming.zip archive), which is one of the documents that describes the design of the Qualtrics survey that was used to collect data.

sentID
   A project-internal ID given to each of the target sentences.

sentence
   The specific sentence used as a stimulus, with words separated by underscores instead of spaces.

type
   Whether the sentence came from the Harvard/IEEE corpus ("IEEE") or from Reddit ("Reddit").

EndDate
   The date on which participation occurred (reflects that an additional set of data was collected in response to reviewer suggestion).

ResponseId
   The response ID assigned by the Qualtrics survey. Note that this is not the Prolific ID, which is assigned by the website Prolific Academic and which is not included in this data set in order to protect participants' privacy. Each participant has a unique Qualtrics ResponseId.

age
   The participant's self-reported age in years.

gender
   The participant's self-reported gender, with codes described in 2277719_Munson_PrePilot_surveydetails.pdf.

gender_3_TEXT
   For individuals who reported a gender that was neither male nor female, their self-reported gender.

race
   The participant's self-reported race, with codes described in 2277719_Munson_PrePilot_surveydetails.pdf.

ethnicity
   The participant's self-reported ethnicity, with codes described in 2277719_Munson_PrePilot_surveydetails.pdf.

Group
   The experiment used multiple randomizations of the sentences so that stability of ratings across time can be assessed. This column shows which randomization group (out of 24 possible combinations of sentence blocks) the participant was in.
The randomization scheme is described in 2277719_Munson_PrePilot_surveydetails.pdf and in Groups_BlockOrders.csv (or Groups_BlockOrders.pdf).

Year
   Response to the naturalness question, with response categories described in 2277719_Munson_PrePilot_surveydetails.pdf. NAs are individuals who skipped this question.

Gender_Response
   Response to the gender question, with response categories described in 2277719_Munson_PrePilot_surveydetails.pdf. NAs are for individuals who skipped this question.

Gender_Category
   For individuals who answered "yes" to the prior question, the specific gender category they believed the sentence to convey, with codes described in 2277719_Munson_PrePilot_surveydetails.pdf. NAs are for people who answered "no" to the previous question, or who did not respond to the question.

Race_Response
   Response to the race/ethnicity question, with response categories described in 2277719_Munson_PrePilot_surveydetails.pdf. NAs are for individuals who skipped this question.

Race_Category
   For individuals who answered "yes" to the prior question, the specific race/ethnicity category they believed the sentence to convey, with codes described in 2277719_Munson_PrePilot_surveydetails.pdf. NAs are for people who answered "no" to the previous question, or who did not respond to the question.

Race_Category_text
   For individuals who chose "other" for the Race_Category question, the open-ended response (if provided).
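Because the Qualtrics export recoded seen-but-unanswered questions as -999 (see the export options under METHODOLOGICAL INFORMATION), it may be convenient to treat -999 as missing when loading the data. The Python/pandas sketch below shows one way to do this; the inline two-row sample is illustrative only, not real data from the file.

```python
import io
import pandas as pd

# Illustrative stand-in for TrippMunson2021_raterData.csv; in practice you
# would pass the filename to pd.read_csv instead of this StringIO buffer.
sample = io.StringIO(
    "sentID,type,ResponseId,Gender_Response\n"
    "s001,IEEE,R_1,1\n"
    "s002,Reddit,R_1,-999\n"
)

# Treat the Qualtrics seen-but-unanswered code (-999) as missing on import.
ratings = pd.read_csv(sample, na_values=[-999])

# -999 responses now show up as NaN, alongside any genuinely skipped items.
unanswered = ratings["Gender_Response"].isna().sum()
```

Note that column-specific NA codes (e.g., the "no"-to-prior-question NAs in Gender_Category and Race_Category) are documented in 2277719_Munson_PrePilot_surveydetails.pdf and should be handled per that codebook.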