This readme.txt file was generated on 2022-10-10 by <Name>
Recommended citation for the data: Finestack, Lizbeth H.; Linert, Jamie; Hilliard, Lisa; Ancel, Elizabeth; Kuchler, Kirstin; Matthys, Olivia. (2022). Verbs Matter: Verb Frequency and Phonological Complexity in Four Morphosyntactic Contexts. Retrieved from the Data Repository for the University of Minnesota. https://hdl.handle.net/11299/241882.

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: Verbs Matter: Verb Frequency and Phonological Complexity in Four Morphosyntactic Contexts

2. Author Information

	Author Contact:  Lizbeth H. Finestack (finestack@umn.edu)

	Name:  Lizbeth H. Finestack
	Institution: University of Minnesota, Twin Cities 
	Email: finestack@umn.edu
	ORCID: 0000-0002-5300-9282


	Name:  Jamie Linert
	Institution: University of Minnesota, Twin Cities 
	Email: liner004@umn.edu
	ORCID:


	Name:  Lisa Hilliard
	Institution: University of Minnesota, Twin Cities 
	Email: hilli096@umn.edu 
	ORCID:

	Name:  Elizabeth Ancel
	Institution: University of Minnesota, Twin Cities 
	Email: ancel014@umn.edu 
	ORCID:

	Name:  Kirstin Kuchler
	Institution: University of Minnesota, Twin Cities 
	Email: kuchl001@umn.edu 
	ORCID:


	Name:  Olivia Matthys
	Institution: University of Minnesota, Twin Cities 
	Email: matth312@umn.edu 
	ORCID:


3. Date published or finalized for release: 2022-10-06


4. Date of data collection (single date, range, approximate date): 2020-09-01 to 2021-12-31


5. Geographic location of data collection (where was data collected?): Minneapolis, MN


6. Information about funding sources that supported the collection of the data:
	This study was supported by funding from the National Institute on Deafness and Other Communication Disorders (R01 DC019374-01, awarded to L. H. Finestack) and from the Leadership Education in Neurodevelopmental and Other Disorders Training Program (LEND), awarded by the US Department of Health and Human Services, Health Resources and Services Administration (T73MC12835-03-00, awarded to A. Hewitt).


7. Overview of the data (abstract):
Research indicates that when teaching grammatical forms to children, the verbs used to model specific grammatical inflections matter: children learn grammatical forms more successfully when they hear many unique verbs that vary in frequency and phonological complexity. This dataset includes verbs drawn from the language samples of English-speaking children aged 5 to 8.9 years, used in one of four contexts: regular past tense -ed, third person singular -s, is/are + verb-ing, and do/does questions. We ranked the verbs by frequency and by phonological complexity, measuring complexity with the Word Complexity Measure (WCM) developed by Stoel-Gammon (2010). We used these data to identify verbs for assessing children's grammatical skills and for providing interventions that target these forms.


--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: Attribution-NonCommercial 3.0 United States (http://creativecommons.org/licenses/by-nc/3.0/us/)

2. Links to publications that cite or use the data:
Paper under development


3. Was data derived from another source?
	If yes, list source(s): CHILDES TalkBank (https://childes.talkbank.org/)

4. Terms of Use: Data Repository for the U of Minnesota (DRUM). By using these files, users agree to the Terms of Use: https://conservancy.umn.edu/pages/drum/policies/#terms-of-use


---------------------
DATA & FILE OVERVIEW
---------------------

File List

	A. Filename: Finestack_VerbsMatter_TalkBankDB_Transcripts.csv
	   Short Description: Transcripts used for analysis and the CHILDES databases they came from
	B. Filename: Finestack_VerbsMatter_Verb_ed.csv
	   Short Description: CLAN output for past tense -ed verbs
	C. Filename: Finestack_VerbsMatter_Verb_ed_Frequency.csv
	   Short Description: Frequency of past tense -ed verbs
	D. Filename: Finestack_VerbsMatter_WCM-ed.csv
	   Short Description: Word Complexity Measure for past tense -ed verbs
	E. Filename: Finestack_VerbsMatter_Verb_3S.csv
	   Short Description: CLAN output for 3rd person singular verbs
	F. Filename: Finestack_VerbsMatter_Verb_3S_Frequency.csv
	   Short Description: Frequency of 3rd person singular verbs
	G. Filename: Finestack_VerbsMatter_WCM-3s.csv
	   Short Description: Word Complexity Measure for 3rd person singular verbs
	H. Filename: Finestack_VerbsMatter_IsAre_Verb_ing.csv
	   Short Description: CLAN output for is/are + verb-ing 
	I. Filename: Finestack_VerbsMatter_IsAre_Verb_ing_Frequency.csv
	   Short Description: Frequency of is/are + verb-ing 
	J. Filename: Finestack_VerbsMatter_WCM-ing.csv
	   Short Description: Word Complexity Measure for -ing verbs
	K. Filename: Finestack_VerbsMatter_DoDoes_Verb.csv
	   Short Description: CLAN output for do/does question verbs
	L. Filename: Finestack_VerbsMatter_DoDoes_Verb_Frequency.csv
	   Short Description: Frequency of do/does question verbs
	M. Filename: Finestack_VerbsMatter_WCM-DoDoes.csv
	   Short Description: Word Complexity Measure for do/does question verbs


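2. Relationship between files: All files support a manuscript in preparation. Finestack_VerbsMatter_TalkBankDB_Transcripts.csv lists the transcripts from which all verbs were drawn. For each of the four contexts, the CLAN output file (e.g., Finestack_VerbsMatter_Verb_ed.csv) summarizes each verb's Average Frequency Rating, taken from the matching _Frequency file, and its WCM Score, taken from the matching WCM file.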
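Judging from the column definitions in the data-specific sections, each per-context summary file appears to pair a verb's Average Frequency Rating with its WCM Total Points. The sketch below shows one way a user might rebuild that pairing from the _Frequency and WCM files. It is an illustration under stated assumptions, not the authors' procedure: it assumes the CSV headers match the column names in this readme and that the "Combined List" tags match the WCM file's "Verb" tags exactly. The function name and the example tags are hypothetical.

```python
import csv
import io


def build_summary(frequency_file, wcm_file):
    """Join each verb's Average Frequency Rating with its WCM Total Points.

    ASSUMPTIONS: headers match the column names documented in this readme,
    and "Combined List" tags match the WCM file's "Verb" tags exactly.
    Rows with a blank tag, rating, or score are skipped.
    """
    ratings = {
        row["Combined List"]: float(row["Average Frequency Rating"])
        for row in csv.DictReader(frequency_file)
        if row["Combined List"] and row["Average Frequency Rating"]
    }
    wcm_scores = {
        row["Verb"]: int(row["Total Points"])
        for row in csv.DictReader(wcm_file)
        if row["Verb"] and row["Total Points"]
    }
    # Most frequent verbs (lowest average rank) first.
    return [
        {"CLAN Output": tag, "Average Frequency Rating": rating, "WCM Score": wcm_scores[tag]}
        for tag, rating in sorted(ratings.items(), key=lambda item: item[1])
        if tag in wcm_scores
    ]


# Hypothetical miniature files (not taken from the dataset). Real files could be
# opened with open(path, newline="", encoding="utf-8-sig").
freq_csv = io.StringIO("Combined List,Average Frequency Rating\njump-PAST,1.5\nplay-PAST,1\n")
wcm_csv = io.StringIO("Verb,Total Points\njump-PAST,4\nplay-PAST,3\n")
for row in build_summary(freq_csv, wcm_csv):
    print(row)
```

Verbs that appear in the frequency file but not in the WCM file are dropped, so comparing the output length with the summary file's row count is a quick consistency check.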


--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data:
We used the CHILDES TalkBank (https://childes.talkbank.org/) database to determine how frequently children used each verb in four targeted contexts (i.e., regular past tense -ed, third person singular -s, is/are + verb-ing, and do/does questions). We began by identifying all available CHILDES transcripts of English-speaking children between the ages of 5 and 8.9 years who were engaged in at least one of the following activities: playing with toys, telling stories, talking during mealtime, talking with other children, and/or other activities across the day. We selected these contexts to capture children's spontaneous language in natural settings, excluding contexts that potentially constrained the child's language (e.g., describing actions in pictures, an adult reading to the child, the child reading).

2. Methods for processing the data:
To identify the verbs children used in our specified contexts, we first used CLAN to create .cha files for each transcript. Next, we ran eight batches of CLAN code on the .cha files, two for each targeted context (one for each transcript set, Set 1 and Set 2), using the following commands:
	Regular past tense -ed: freq +t*CHI +o +u +sm-PAST @
	Third person singular -s: freq +t*CHI +o +u +sm-3S @
	Is/are + verb-ing: freq +t*CHI +u +s"m;be |v" +s"m;be |part" +c7 +sm;*,o% +o @
	Do/does questions: freq +t*CHI +u +s"m;do |sub |v" +s"m;do |pro:per |co" +s"m;do |pro:per |v" +c7 +sm;*,o% @

After running all eight batches through CLAN, we rank-ordered each verb list from most to least frequent and removed erroneous entries (e.g., duplicates, irregular past-tense verbs, copulas, and past-tense forms of is/are).
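The rank-ordering step above can be sketched as follows, assuming the CLAN freq output for one transcript set has already been reduced to verb/count pairs and cleaned of erroneous entries. This readme does not say how ties were handled, so this sketch assumes tied counts share the best rank ("1, 1, 3" competition ranking); the function name and example counts are hypothetical.

```python
def rank_by_frequency(counts):
    """Rank verbs from most to least frequent (rank 1 = most frequent).

    ASSUMPTION: the readme does not say how ties were handled, so tied
    counts here share the best rank ("1, 1, 3" competition ranking).
    """
    # Sort by descending count; break ties alphabetically for a stable order.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    ranks = {}
    previous_count = None
    current_rank = 0
    for position, (verb, count) in enumerate(ordered, start=1):
        if count != previous_count:
            current_rank = position  # a new count value starts a new rank
            previous_count = count
        ranks[verb] = current_rank
    return ranks


# Hypothetical counts for one transcript set (not taken from the dataset).
set1_counts = {"jump": 5, "walk": 3, "play": 5, "climb": 1}
print(rank_by_frequency(set1_counts))
# {'jump': 1, 'play': 1, 'walk': 3, 'climb': 4}
```

Running the same function on each set yields the "Set 1 Ranking" and "Set 2 Ranking" style columns; a different tie rule (e.g., averaging tied positions) would shift some ranks.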

We used the Word Complexity Measure (WCM) developed by Stoel-Gammon (2010) to determine the phonological complexity of each verb in our corpora. 


3. Instrument- or software-specific information needed to interpret the data: None


4. Standards and calibration information, if appropriate: NA


5. Environmental/experimental conditions: NA


6. Describe any quality-assurance procedures performed on the data:
Prior to assigning the WCM points, trained undergraduate students phonetically transcribed each word using the International Phonetic Alphabet. A PhD-level graduate student reviewed all transcriptions and corrected any errors. Then, guided by the transcriptions, a PhD-level student assigned WCM points, which were checked by a trained graduate assistant. 


7. People involved with sample collection, processing, analysis and/or submission: All listed authors.


----------------------------------------- 
DATA-SPECIFIC INFORMATION FOR:  Finestack_VerbsMatter_TalkBankDB_Transcripts.csv 
-----------------------------------------
rows:  886
cols: 10 
	 A. Name: Database 
	    Description: CHILDES database in which the transcript is located

	 B. Name: Name 
	    Description:  Transcription file name

	 C. Name: Language 
	    Description: Language of the transcript

	 D. Name: Format 
	    Description:  Format of original file, if available

	 E. Name: Date 
	    Description:  Date of language sample, if known

	 F. Name: Context 
	    Description:  Language sample context

	 G. Name: Child 
	    Description:  Child characteristics

	 H. Name: Set 1 
	    Description: Sample number in Set 1

	 I. Name: Set 2 
	    Description: Sample number in Set 2 


----------------------------------------- 
DATA-SPECIFIC INFORMATION FOR:  Finestack_VerbsMatter_Verb_ed.csv 
----------------------------------------- 
rows:  133
cols: 7
 
	 A. Name: CLAN Output 
	    Description: CLAN verb tag

	 B. Name: Verb 
	    Description: Identified verb

	 C. Name: IPA with Inflection 
	    Description: Verb phonetically transcribed with -ed inflection

	 D. Name: Average Frequency Rating 
	    Description: Verb rank of Set 1 + Verb rank of Set 2, divided by 2

	 E. Name: Frequency Code 
	    Description: High = top half of verbs with highest frequency; Low = bottom half of verbs with lowest frequency 

	 F. Name: WCM Score 
	    Description: Word Complexity Measure Score 

	 G. Name: Complexity Code 
	    Description: High = top half of verbs with highest WCM; Low = bottom half of verbs with lowest WCM  
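The High/Low codes split the verb list into halves. A minimal sketch of that split is below. Because rank 1 is the most frequent verb, a smaller Average Frequency Rating means higher frequency, so the Frequency Code sorts ascending while the Complexity Code sorts WCM Scores descending. How the authors handled ties and an odd number of verbs is not stated here; this sketch assumes alphabetical tie-breaking and places the middle verb in the Low half. The function name and example values are hypothetical.

```python
def half_split(scores, higher_is_high=True):
    """Label each verb "High" (top half) or "Low" (bottom half).

    For the Frequency Code, pass Average Frequency Ratings with
    higher_is_high=False, because rank 1 is the most frequent verb.
    For the Complexity Code, pass WCM Scores with higher_is_high=True.
    ASSUMPTIONS (not stated in this readme): ties are broken alphabetically,
    and with an odd number of verbs the middle verb is labeled "Low".
    """
    direction = -1 if higher_is_high else 1
    ordered = sorted(scores, key=lambda verb: (direction * scores[verb], verb))
    cutoff = len(ordered) // 2
    return {verb: ("High" if i < cutoff else "Low") for i, verb in enumerate(ordered)}


# Hypothetical values (not taken from the dataset).
print(half_split({"play": 1.0, "jump": 2.5, "climb": 4.0, "walk": 3.0}, higher_is_high=False))
# {'play': 'High', 'jump': 'High', 'walk': 'Low', 'climb': 'Low'}
print(half_split({"play": 3, "jump": 4, "climb": 6}))
# {'climb': 'High', 'jump': 'Low', 'play': 'Low'}
```

Alphabetical tie-breaking can give tied verbs different codes at the boundary; a median-based rule that keeps ties together would be a reasonable alternative.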


----------------------------------------- 
DATA-SPECIFIC INFORMATION FOR:  Finestack_VerbsMatter_Verb_ed_Frequency.csv 
----------------------------------------- 
rows:  129
cols: 8 

	 A. Name: Set 1 Verb 
	    Description: CLAN verb tag  

	 B. Name: Set 1 Frequency 
	    Description: Number of verb occurrences in Set 1 

	 C. Name: Set 1 Ranking 
	    Description: Set 1 verb ranking with 1 being most frequently occurring verb 

	 D. Name: Set 2 Verb 
	    Description: CLAN verb tag

	 E. Name: Set 2 Frequency 
	    Description: Number of verb occurrences in Set 2   

	 F. Name: Set 2 Ranking 
	    Description: Set 2 verb ranking with 1 being most frequently occurring verb   

	 G. Name: Combined List 
	    Description: All verbs appearing in Sets 1 and 2  

	 H. Name: Average Frequency Rating 
	    Description: Verb rank of Set 1 + Verb rank of Set 2, divided by 2   
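The Average Frequency Rating formula above, (Set 1 rank + Set 2 rank) / 2, can be sketched over the combined verb list as follows. This readme does not say how a verb ranked in only one set was rated, so rather than inventing a rule, this sketch returns None for such verbs. The function name and example ranks are hypothetical.

```python
def average_frequency_rating(set1_ranks, set2_ranks):
    """Average Frequency Rating = (Set 1 rank + Set 2 rank) / 2.

    Covers every verb in either set (the "Combined List"). ASSUMPTION: the
    readme does not say how a verb ranked in only one set was rated, so this
    sketch returns None for such verbs rather than inventing a rule.
    """
    combined = sorted(set(set1_ranks) | set(set2_ranks))
    ratings = {}
    for verb in combined:
        if verb in set1_ranks and verb in set2_ranks:
            ratings[verb] = (set1_ranks[verb] + set2_ranks[verb]) / 2
        else:
            ratings[verb] = None
    return ratings


# Hypothetical ranks (not taken from the dataset).
print(average_frequency_rating({"play": 1, "jump": 2}, {"play": 2, "jump": 1, "climb": 3}))
# {'climb': None, 'jump': 1.5, 'play': 1.5}
```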


----------------------------------------- 
DATA-SPECIFIC INFORMATION FOR:  Finestack_VerbsMatter_WCM-ed.csv 
----------------------------------------- 

IPA Reminders
Velars: /k, g, ŋ/
Unvoiced Fricatives: /f, θ, s, ʃ/
Voiced Fricatives: /v, ð, z, ʒ/
Unvoiced Affricate: /ʧ/
Voiced Affricate: /ʤ/
Liquid/Rhotics: /ɹ, j, l, w/
rows:  129
cols: 12 

	 A. Name: Verb 
	    Description: CLAN verb tag   

	 B. Name: IPA 
	    Description: Verb phonetically transcribed

	 C. Name: IPA with Inflection 
	    Description: Verb phonetically transcribed with -ed inflection

	 D. Name: >2 Syllables 
	    Description: Greater than 2 syllables = 1 point 

	 E. Name: Stress 
	    Description: Stress on non-initial syllable = 1 point

	 F. Name: Cluster 
	    Description: 2+ sequential consonants within a syllable = 1 point per cluster

	 G. Name: Final Consonant 
	    Description: Word ends with consonant = 1 point 

	 H. Name: Velar 
	    Description: 1 point per velar consonant

	 I. Name: Fricative, Affricate 
	    Description:  1 point per fricative or affricate

	 J. Name: Voiced Fric/Aff 
	    Description:  1 additional point per voiced fricative or affricate

	 K. Name: Liquid/Rhotic V 
	    Description: 1 point for each liquid, syllabic liquid, or rhotic vowel

	 L. Name: Total Points 
	    Description: Sum of all points awarded
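The scoring categories above can be sketched in code. The segment-based points (velars, fricatives/affricates, voiced fricatives/affricates, liquids/rhotic vowels) follow directly from the IPA Reminders list, so the sketch computes them from a list of IPA segments. The word-pattern points (more than two syllables, non-initial stress, clusters, final consonant) depend on syllabification, so they are passed in as already-judged values. Assumptions not stated in this readme: rhotic vowels are transcribed as ɚ or ɝ, and a syllabic liquid is written with a combining diacritic. All function names are hypothetical, and the "jumped" example is a hand-worked illustration, not a value taken from the dataset.

```python
import unicodedata

# Phoneme classes from the "IPA Reminders" for this file. ASSUMPTION: rhotic
# vowels are transcribed as ɚ or ɝ; syllabic liquids carry a combining mark.
VELARS = {"k", "g", "ŋ"}
VOICELESS_FRIC_AFF = {"f", "θ", "s", "ʃ", "ʧ"}
VOICED_FRIC_AFF = {"v", "ð", "z", "ʒ", "ʤ"}
ALL_FRIC_AFF = VOICELESS_FRIC_AFF | VOICED_FRIC_AFF
LIQUID_RHOTIC = {"ɹ", "j", "l", "w", "ɚ", "ɝ"}


def base_symbol(segment):
    """Strip combining diacritics, e.g. syllabic l + U+0329 -> "l"."""
    return "".join(ch for ch in segment if not unicodedata.combining(ch))


def segment_points(segments):
    """Points for the segment-based categories, given a list of IPA segments."""
    bases = [base_symbol(s) for s in segments]
    return {
        "Velar": sum(b in VELARS for b in bases),
        "Fricative, Affricate": sum(b in ALL_FRIC_AFF for b in bases),
        "Voiced Fric/Aff": sum(b in VOICED_FRIC_AFF for b in bases),  # additional point
        "Liquid/Rhotic V": sum(b in LIQUID_RHOTIC for b in bases),
    }


def wcm_total(more_than_two_syllables, non_initial_stress, clusters, final_consonant, segments):
    """Total Points: word-pattern points plus segment points.

    Syllable count, stress, and clusters depend on syllabification, so they
    are passed in as already-judged values rather than computed here.
    """
    pattern = int(more_than_two_syllables) + int(non_initial_stress) + clusters + int(final_consonant)
    return pattern + sum(segment_points(segments).values())


# "jumped" /ʤʌmpt/: one syllable, initial stress, one final cluster /mpt/.
print(wcm_total(False, False, 1, True, ["ʤ", "ʌ", "m", "p", "t"]))  # 4
```

The same functions apply to the other WCM files, which use the same categories under different column letters.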


----------------------------------------- 
DATA-SPECIFIC INFORMATION FOR:  Finestack_VerbsMatter_Verb_3S.csv 
----------------------------------------- 
rows:  108
cols: 7 

	 A. Name: CLAN Output 
	    Description: CLAN verb tag  

	 B. Name: Verb 
	    Description: Identified verb 

	 C. Name: IPA with Inflection 
	    Description: Verb phonetically transcribed with -s inflection

	 D. Name: Average Frequency Rating 
	    Description: Verb rank of Set 1 + Verb rank of Set 2, divided by 2  

	 E. Name: Frequency Code 
	    Description: High = top half of verbs with highest frequency; Low = bottom half of verbs with lowest frequency   

	 F. Name: WCM Score 
	    Description: Word Complexity Measure Score   

	 G. Name: Complexity Code 
	    Description: High = top half of verbs with highest WCM; Low = bottom half of verbs with lowest WCM    


----------------------------------------- 
DATA-SPECIFIC INFORMATION FOR:  Finestack_VerbsMatter_Verb_3S_Frequency.csv 
----------------------------------------- 
rows:  108
cols: 8 

	 A. Name: Set 1 Verb 
	    Description: CLAN verb tag   

	 B. Name: Set 1 Frequency 
	    Description: Number of verb occurrences in Set 1   

	 C. Name: Set 1 Ranking 
	    Description: Set 1 verb ranking with 1 being most frequently occurring verb   

	 D. Name: Set 2 Verb 
	    Description: CLAN verb tag  

	 E. Name: Set 2 Frequency 
	    Description: Number of verb occurrences in Set 2  

	 F. Name: Set 2 Ranking 
	    Description: Set 2 verb ranking with 1 being most frequently occurring verb   

	 G. Name: Combined List 
	    Description: All verbs appearing in Sets 1 and 2    

	 H. Name: Average Frequency Rating 
	    Description: Verb rank of Set 1 + Verb rank of Set 2, divided by 2   


----------------------------------------- 
DATA-SPECIFIC INFORMATION FOR:  Finestack_VerbsMatter_WCM-3s.csv 
----------------------------------------- 

IPA Reminders
Velars: /k, g, ŋ/
Unvoiced Fricatives: /f, θ, s, ʃ/
Voiced Fricatives: /v, ð, z, ʒ/
Unvoiced Affricate: /ʧ/
Voiced Affricate: /ʤ/
Liquid/Rhotics: /ɹ, j, l, w/
rows:  108 
cols: 11 

	 A. Name: Verb 
	    Description: Identified verb   

	 B. Name: IPA 
	    Description: Verb phonetically transcribed with -s inflection 

	 C. Name: >2 Syllables 
	    Description:  Greater than 2 syllables = 1 point

	 D. Name: Stress 
	    Description: Stress on non-initial syllable = 1 point 

	 E. Name: Cluster 
	    Description: 2+ sequential consonants within a syllable = 1 point per cluster

	 F. Name: Final Consonant 
	    Description:  Word ends with consonant = 1 point

	 G. Name: Velar 
	    Description: 1 point per velar consonant 

	 H. Name: Fricative, Affricate 
	    Description: 1 point per fricative or affricate

	 I. Name: Voiced Fric/Aff 
	    Description: 1 additional point per voiced fricative or affricate

	 J. Name: Liquid/Rhotic V 
	    Description: 1 point for each liquid, syllabic liquid, or rhotic vowel

	 K. Name: Total Points 
	    Description: Sum of all points awarded  


----------------------------------------- 
DATA-SPECIFIC INFORMATION FOR:  Finestack_VerbsMatter_IsAre_Verb_ing.csv 
----------------------------------------- 
rows:  69
cols: 7 

	 A. Name: CLAN Output 
	    Description: CLAN verb tag   

	 B. Name: Verb 
	    Description: Identified verb 

	 C. Name: IPA with Inflection 
	    Description: Verb phonetically transcribed with -ing inflection 

	 D. Name: Average Frequency Rating 
	    Description: Verb rank of Set 1 + Verb rank of Set 2, divided by 2   

	 E. Name: Frequency Code 
	    Description: High = top half of verbs with highest frequency; Low = bottom half of verbs with lowest frequency   

	 F. Name: WCM Score 
	    Description: Word Complexity Measure Score  

	 G. Name: Complexity Code 
	    Description:  High = top half of verbs with highest WCM; Low = bottom half of verbs with lowest WCM     


----------------------------------------- 
DATA-SPECIFIC INFORMATION FOR:  Finestack_VerbsMatter_IsAre_Verb_ing_Frequency.csv
----------------------------------------- 
rows:  69
cols: 8 

	 A. Name: Set 1 Verb 
	    Description: CLAN verb tag   

	 B. Name: Set 1 Frequency 
	    Description: Number of verb occurrences in Set 1  

	 C. Name: Set 1 Ranking 
	    Description: Set 1 verb ranking with 1 being most frequently occurring verb   

	 D. Name: Set 2 Verb 
	    Description: CLAN verb tag   

	 E. Name: Set 2 Frequency 
	    Description: Number of verb occurrences in Set 2  

	 F. Name: Set 2 Ranking 
	    Description: Set 2 verb ranking with 1 being most frequently occurring verb   

	 G. Name: Combined List 
	    Description: All verbs appearing in Sets 1 and 2    

	 H. Name: Average Frequency Rating 
	    Description: Verb rank of Set 1 + Verb rank of Set 2, divided by 2     


----------------------------------------- 
DATA-SPECIFIC INFORMATION FOR:  Finestack_VerbsMatter_WCM-ing.csv 
----------------------------------------- 

IPA Reminders
Velars: /k, g, ŋ/
Unvoiced Fricatives: /f, θ, s, ʃ/
Voiced Fricatives: /v, ð, z, ʒ/
Unvoiced Affricate: /ʧ/
Voiced Affricate: /ʤ/
Liquid/Rhotics: /ɹ, j, l, w/
rows:  69 
cols: 11 

	 A. Name: Verb 
	    Description: Identified verb    

	 B. Name: IPA with Inflection 
	    Description: Verb phonetically transcribed with -ing inflection

	 C. Name: >2 Syllables 
	    Description: Greater than 2 syllables = 1 point

	 D. Name: Stress 
	    Description: Stress on non-initial syllable = 1 point 

	 E. Name: Cluster 
	    Description: 2+ sequential consonants within a syllable = 1 point per cluster

	 F. Name: Final Consonant 
	    Description: Word ends with consonant = 1 point

	 G. Name: Velar 
	    Description: 1 point per velar consonant

	 H. Name: Fricative, Affricate 
	    Description: 1 point per fricative or affricate

	 I. Name: Voiced Fric/Aff 
	    Description: 1 additional point per voiced fricative or affricate

	 J. Name: Liquid/Rhotic V 
	    Description: 1 point for each liquid, syllabic liquid, or rhotic vowel

	 K. Name: Total Points 
	    Description: Sum of all points awarded    


----------------------------------------- 
DATA-SPECIFIC INFORMATION FOR:  Finestack_VerbsMatter_DoDoes_Verb.csv 
----------------------------------------- 
rows:  16
cols: 7 

	 A. Name: CLAN Output 
	    Description: CLAN verb tag   

	 B. Name: Verb 
	    Description: Identified verb    

	 C. Name: IPA 
	    Description: Verb phonetically transcribed  

	 D. Name: Average Frequency Rating 
	    Description: Verb rank of Set 1 + Verb rank of Set 2, divided by 2    

	 E. Name: Frequency Code 
	    Description: High = top half of verbs with highest frequency; Low = bottom half of verbs with lowest frequency  

	 F. Name: WCM Score 
	    Description: Word Complexity Measure Score  

	 G. Name: Complexity Code 
	    Description:  High = top half of verbs with highest WCM; Low = bottom half of verbs with lowest WCM     


----------------------------------------- 
DATA-SPECIFIC INFORMATION FOR:  Finestack_VerbsMatter_DoDoes_Verb_Frequency.csv 
----------------------------------------- 
rows:  16
cols: 8 

	 A. Name: Set 1 Verb 
	    Description: CLAN verb tag   

	 B. Name: Set 1 Frequency 
	    Description: Number of verb occurrences in Set 1   

	 C. Name: Set 1 Ranking 
	    Description: Set 1 verb ranking with 1 being most frequently occurring verb   

	 D. Name: Set 2 Verb 
	    Description: CLAN verb tag   

	 E. Name: Set 2 Frequency 
	    Description: Number of verb occurrences in Set 2  

	 F. Name: Set 2 Ranking 
	    Description: Set 2 verb ranking with 1 being most frequently occurring verb   

	 G. Name: Combined List 
	    Description: All verbs appearing in Sets 1 and 2    

	 H. Name: Average Frequency Rating 
	    Description: Verb rank of Set 1 + Verb rank of Set 2, divided by 2     


----------------------------------------- 
DATA-SPECIFIC INFORMATION FOR:  Finestack_VerbsMatter_WCM-DoDoes.csv 
----------------------------------------- 

IPA Reminders
Velars: /k, g, ŋ/
Unvoiced Fricatives: /h, f, θ, s, ʃ/
Voiced Fricatives: /v, ð, z, ʒ/
Unvoiced Affricate: /ʧ/
Voiced Affricate: /ʤ/
Liquid/Rhotics: /ɹ, j, l, w/

rows:  16 
cols: 11 

	 A. Name: Verb 
	    Description: Identified verb    

	 B. Name: IPA 
	    Description: Verb phonetically transcribed  

	 C. Name: >2 Syllables 
	    Description:  Greater than 2 syllables = 1 point

	 D. Name: Stress 
	    Description: Stress on non-initial syllable = 1 point 

	 E. Name: Cluster 
	    Description: 2+ sequential consonants within a syllable = 1 point per cluster

	 F. Name: Final Consonant 
	    Description:  Word ends with consonant = 1 point

	 G. Name: Velar 
	    Description: 1 point per velar consonant

	 H. Name: Fricative, Affricate 
	    Description:  1 point per fricative or affricate

	 I. Name: Voiced Fric/Aff 
	    Description:  1 additional point per voiced fricative or affricate

	 J. Name: Liquid/Rhotic V 
	    Description:  1 point for each liquid, syllabic liquid, or rhotic vowel

	 K. Name: Total Points 
	    Description: Sum of all points awarded