This readme.txt file was generated on 2022-10-10 by

Recommended citation for the data: Finestack, Lizbeth H.; Linert, Jamie; Hilliard, Lisa; Ancel, Elizabeth; Kuchler, Kirstin; Matthys, Olivia. (2022). Verbs Matter: Verb Frequency and Phonological Complexity in Four Morphosyntactic Contexts. Retrieved from the Data Repository for the University of Minnesota. https://hdl.handle.net/11299/241882.

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: Verbs Matter: Verb Frequency and Phonological Complexity in Four Morphosyntactic Contexts

2. Author Information

   Author Contact: Lizbeth H. Finestack (finestack@umn.edu)

   Name: Lizbeth H. Finestack
   Institution: University of Minnesota, Twin Cities
   Email: finestack@umn.edu
   ORCID: 0000-0002-5300-9282

   Name: Jamie Linert
   Institution: University of Minnesota, Twin Cities
   Email: liner004@umn.edu
   ORCID:

   Name: Lisa Hilliard
   Institution: University of Minnesota, Twin Cities
   Email: hilli096@umn.edu
   ORCID:

   Name: Elizabeth Ancel
   Institution: University of Minnesota, Twin Cities
   Email: ancel014@umn.edu
   ORCID:

   Name: Kirstin Kuchler
   Institution: University of Minnesota, Twin Cities
   Email: kuchl001@umn.edu
   ORCID:

   Name: Olivia Matthys
   Institution: University of Minnesota, Twin Cities
   Email: matth312@umn.edu
   ORCID:

3. Date published or finalized for release: 2022-10-06

4. Date of data collection (single date, range, approximate date): 2020-09-01 to 2021-12-31

5. Geographic location of data collection (where was data collected?): Minneapolis, MN

6. Information about funding sources that supported the collection of the data: This study was supported by funding from the National Institute on Deafness and Other Communication Disorders, R01 DC019374-01 awarded to L. H. Finestack and the Leadership Education in Neurodevelopmental and Other Disorders Training Program (LEND) awarded by the US Department of Health and Human Services Health Resources and Services Administration, T73MC12835-03-00 to A. Hewitt.

7. Overview of the data (abstract): Research indicates that when teaching grammatical forms to children, the verbs used to model specific grammatical inflections matter. When learning grammatical forms, children have higher performance when they hear many unique verb forms that vary in their frequency and phonological complexity. This dataset includes verbs that English-speaking children aged 5 to 8.9 years used in language samples in one of the following four contexts: regular past tense -ed, third person singular -s, is/are + verb+ing, and do/does questions. We ranked verbs by frequency and by phonological complexity, measuring complexity with the Word Complexity Measure developed by Stoel-Gammon (2010). We used these data to identify verbs to use when assessing the grammatical skills of children and when providing interventions for the targeted forms.

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data: Attribution-NonCommercial 3.0 United States (http://creativecommons.org/licenses/by-nc/3.0/us/)

2. Links to publications that cite or use the data: Paper under development

3. Was data derived from another source? If yes, list source(s): CHILDES Talk Bank (https://childes.talkbank.org/)

4. Terms of Use: Data Repository for the U of Minnesota (DRUM) By using these files, users agree to the Terms of Use. https://conservancy.umn.edu/pages/drum/policies/#terms-of-use

---------------------
DATA & FILE OVERVIEW
---------------------

File List

   A. Filename: Finestack_VerbsMatter_TalkBankDB_Transcripts.csv
      Short Description: Databases of transcripts used for analysis

   B. Filename: Finestack_VerbsMatter_Verb_ed.csv
      Short Description: CLAN output for past tense -ed verbs

   C. Filename: Finestack_VerbsMatter_Verb_ed_Frequency.csv
      Short Description: Frequency of past tense -ed verbs

   D. Filename: Finestack_VerbsMatter_WCM-ed.csv
      Short Description: Word Complexity Measure for past tense -ed verbs

   E. Filename: Finestack_VerbsMatter_Verb_3S.csv
      Short Description: CLAN output for 3rd person singular verbs

   F. Filename: Finestack_VerbsMatter_Verb_3S_Frequency.csv
      Short Description: Frequency of 3rd person singular verbs

   G. Filename: Finestack_VerbsMatter_WCM-3s.csv
      Short Description: Word Complexity Measure for 3rd person singular verbs

   H. Filename: Finestack_VerbsMatter_IsAre_Verb_ing.csv
      Short Description: CLAN output for is/are + verb-ing

   I. Filename: Finestack_VerbsMatter_IsAre_Verb_ing_Frequency.csv
      Short Description: Frequency of is/are + verb-ing

   J. Filename: Finestack_VerbsMatter_WCM-ing.csv
      Short Description: Word Complexity Measure for -ing verbs

   K. Filename: Finestack_VerbsMatter_DoDoes_Verb.csv
      Short Description: CLAN output for do-does verbs

   L. Filename: Finestack_VerbsMatter_DoDoes_Verb_Frequency.csv
      Short Description: Frequency of do-does verbs

   M. Filename: Finestack_VerbsMatter_WCM-DoDoes.csv
      Short Description: Word Complexity Measure for do-does verbs

2. Relationship between files: All files support manuscript in preparation.

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data: We used the CHILDES Talk Bank (https://childes.talkbank.org/) database to determine the relative frequency of children's verb use in four targeted contexts (i.e., regular past tense -ed, third person singular -s, is/are + verb+ing, and do/does questions). We began by identifying all of the available transcripts in the CHILDES Talk Bank that included an English-speaking child between the ages of 5 and 8.9 years who was engaged in at least one of the following activities: playing with toys, telling stories, talking during mealtime, talking with other children, and/or other activities across the day. We selected these contexts to reflect child spontaneous language in natural settings, excluding contexts that potentially constrained the child’s language (e.g., describing actions in pictures, adult reading to child, child reading).

2. Methods for processing the data: To identify the verbs children used in our specified contexts, we first used CLAN to create .cha files for each transcript. Next, we ran eight batches of CLAN code on the .cha files: two for each of our targeted contexts, including regular past tense -ed (freq +t*CHI +o +u +sm-PAST @), third person singular -s (freq +t*CHI +o +u +sm-3S @), is/are + verb+ing (freq +t*CHI +u +s"m;be |v" +s"m;be |part" +c7 +sm;*,o% +o @), and do/does questions (freq +t*CHI +u +s"m;do |sub |v" +s"m;do |pro:per |co" +s"m;do |pro:per |v" +c7 +sm;*,o% @). After running all eight transcript batches through CLAN, we rank-ordered each verb list from most to least frequent and removed any errors (e.g., duplicates, irregular past, copulas, is/are in past tense). We used the Word Complexity Measure (WCM) developed by Stoel-Gammon (2010) to determine the phonological complexity of each verb in our corpora.

3. Instrument- or software-specific information needed to interpret the data: None

4. Standards and calibration information, if appropriate: NA

5. Environmental/experimental conditions: NA

6. Describe any quality-assurance procedures performed on the data: Prior to assigning the WCM points, trained undergraduate students phonetically transcribed each word using the International Phonetic Alphabet. A PhD-level graduate student reviewed all transcriptions and corrected any errors. Then, guided by the transcriptions, a PhD-level student assigned WCM points, which were checked by a trained graduate assistant.

7. People involved with sample collection, processing, analysis and/or submission: All listed authors.
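
Illustration (not part of the original processing pipeline): the rank-ordering step described under "Methods for processing the data" could be reproduced along the following lines. This is a minimal Python sketch under stated assumptions. The function name rank_by_frequency and the example counts are hypothetical, and the rule that tied verbs share a rank is an assumption, since this readme does not say how ties were handled.

```python
from collections import Counter


def rank_by_frequency(counts):
    """Return {verb: rank}, where rank 1 is the most frequent verb.

    Assumption: verbs with equal counts share the same rank and the
    next rank is skipped (1, 2, 2, 4). The readme does not specify
    how ties were handled.
    """
    # Sort by descending count, then alphabetically for a stable order.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    ranks = {}
    previous_count = None
    current_rank = 0
    for position, (verb, count) in enumerate(ordered, start=1):
        if count != previous_count:
            current_rank = position
            previous_count = count
        ranks[verb] = current_rank
    return ranks


# Hypothetical counts for illustration only; not taken from the dataset.
example_counts = Counter({"jump": 12, "walk": 30, "play": 12, "kick": 3})
example_ranks = rank_by_frequency(example_counts)
print(example_ranks)
```

Running the same function once on the Set 1 counts and once on the Set 2 counts would give the two rankings that the frequency files report side by side.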
-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Finestack_VerbsMatter_TalkBankDB_Transcripts.csv
-----------------------------------------

rows: 886
cols: 10

   A. Name: Database
      Description: CHILDES database in which transcript was located

   B. Name: Name
      Description: Transcription file name

   C. Name: Language
      Description: File transcription language

   D. Name: Format
      Description: Format of original file, if available

   E. Name: Date
      Description: Date of language sample, if known

   F. Name: Context
      Description: Language sample context

   G. Name: Child
      Description: Child characteristics

   H. Name: Set 1
      Description: Sample number in Set 1

   I. Name: Set 2
      Description: Sample number in Set 2

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Finestack_VerbsMatter_Verb_ed.csv
-----------------------------------------

rows: 133
cols: 7

   A. Name: CLAN Output
      Description: CLAN verb tag

   B. Name: Verb
      Description: Identified verb

   C. Name: IPA with Inflection
      Description: Verb phonetically transcribed with -ed inflection

   D. Name: Average Frequency Rating
      Description: Verb rank of Set 1 + Verb rank of Set 2, divided by 2

   E. Name: Frequency Code
      Description: High = top half of verbs with highest frequency; Low = bottom half of verbs with lowest frequency

   F. Name: WCM Score
      Description: Word Complexity Measure Score

   G. Name: Complexity Code
      Description: High = top half of verbs with highest WCM; Low = bottom half of verbs with lowest WCM

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Finestack_VerbsMatter_Verb_ed_Frequency.csv
-----------------------------------------

rows: 129
cols: 8

   A. Name: Set 1 Verb
      Description: CLAN verb tag

   B. Name: Set 1 Frequency
      Description: Number of verb occurrences in Set 1

   C. Name: Set 1 Ranking
      Description: Set 1 verb ranking with 1 being most frequently occurring verb

   D. Name: Set 2 Verb
      Description: CLAN verb tag

   E. Name: Set 2 Frequency
      Description: Number of verb occurrences in Set 2

   F. Name: Set 2 Ranking
      Description: Set 2 verb ranking with 1 being most frequently occurring verb

   G. Name: Combined List
      Description: All verbs appearing in Sets 1 and 2

   H. Name: Average Frequency Rating
      Description: Verb rank of Set 1 + Verb rank of Set 2, divided by 2

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Finestack_VerbsMatter_WCM-ed.csv
-----------------------------------------

IPA Reminders
Velars: /k, g, ŋ/
Unvoiced Fricatives: /f, θ, s, ʃ/
Voiced Fricatives: /v, ð, z, ʒ/
Unvoiced Affricate: /ʧ/
Voiced Affricate: /ʤ/
Liquid/Rhotics: /ɹ, j, l, w/

rows: 129
cols: 12

   A. Name: Verb
      Description: CLAN verb tag

   B. Name: IPA
      Description: Verb phonetically transcribed

   C. Name: IPA with Inflection
      Description: Verb phonetically transcribed with -ed inflection

   D. Name: >2 Syllables
      Description: Greater than 2 syllables = 1 point

   E. Name: Stress
      Description: Stress on non-initial syllable = 1 point

   F. Name: Cluster
      Description: 2+ sequential consonants w/in a syllable = 1 point per cluster

   G. Name: Final Consonant
      Description: Word ends with consonant = 1 point

   H. Name: Velar
      Description: 1 point per velar consonant

   I. Name: Fricative, Affricate
      Description: 1 point per fricative or affricate

   J. Name: Voiced Fric/Aff
      Description: 1 additional point per voiced fricative or affricate

   K. Name: Liquid/Rhotic V
      Description: 1 point for each liquid, syllabic liquid, or rhotic vowel

   L. Name: Total Points
      Description: Sum of all points awarded

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Finestack_VerbsMatter_Verb_3S.csv
-----------------------------------------

rows: 108
cols: 7

   A. Name: CLAN Output
      Description: CLAN verb tag

   B. Name: Verb
      Description: Identified verb

   C. Name: IPA with Inflection
      Description: Verb phonetically transcribed with -s inflection

   D. Name: Average Frequency Rating
      Description: Verb rank of Set 1 + Verb rank of Set 2, divided by 2

   E. Name: Frequency Code
      Description: High = top half of verbs with highest frequency; Low = bottom half of verbs with lowest frequency

   F. Name: WCM Score
      Description: Word Complexity Measure Score

   G. Name: Complexity Code
      Description: High = top half of verbs with highest WCM; Low = bottom half of verbs with lowest WCM

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Finestack_VerbsMatter_Verb_3S_Frequency.csv
-----------------------------------------

rows: 108
cols: 8

   A. Name: Set 1 Verb
      Description: CLAN verb tag

   B. Name: Set 1 Frequency
      Description: Number of verb occurrences in Set 1

   C. Name: Set 1 Ranking
      Description: Set 1 verb ranking with 1 being most frequently occurring verb

   D. Name: Set 2 Verb
      Description: CLAN verb tag

   E. Name: Set 2 Frequency
      Description: Number of verb occurrences in Set 2

   F. Name: Set 2 Ranking
      Description: Set 2 verb ranking with 1 being most frequently occurring verb

   G. Name: Combined List
      Description: All verbs appearing in Sets 1 and 2

   H. Name: Average Frequency Rating
      Description: Verb rank of Set 1 + Verb rank of Set 2, divided by 2

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Finestack_VerbsMatter_WCM-3s.csv
-----------------------------------------

IPA Reminders
Velars: /k, g, ŋ/
Unvoiced Fricatives: /f, θ, s, ʃ/
Voiced Fricatives: /v, ð, z, ʒ/
Unvoiced Affricate: /ʧ/
Voiced Affricate: /ʤ/
Liquid/Rhotics: /ɹ, j, l, w/

rows: 108
cols: 11

   A. Name: Verb
      Description: Identified verb

   B. Name: IPA
      Description: Verb phonetically transcribed with -s inflection

   C. Name: >2 Syllables
      Description: Greater than 2 syllables = 1 point

   D. Name: Stress
      Description: Stress on non-initial syllable = 1 point

   E. Name: Cluster
      Description: 2+ sequential consonants w/in a syllable = 1 point per cluster

   F. Name: Final Consonant
      Description: Word ends with consonant = 1 point

   G. Name: Velar
      Description: 1 point per velar consonant

   H. Name: Fricative, Affricate
      Description: 1 point per fricative or affricate

   I. Name: Voiced Fric/Aff
      Description: 1 additional point per voiced fricative or affricate

   J. Name: Liquid/Rhotic V
      Description: 1 point for each liquid, syllabic liquid, or rhotic vowel

   K. Name: Total Points
      Description: Sum of all points awarded

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Finestack_VerbsMatter_IsAre_Verb_ing.csv
-----------------------------------------

rows: 69
cols: 7

   A. Name: CLAN Output
      Description: CLAN verb tag

   B. Name: Verb
      Description: Identified verb

   C. Name: IPA with Inflection
      Description: Verb phonetically transcribed with -ing inflection

   D. Name: Average Frequency Rating
      Description: Verb rank of Set 1 + Verb rank of Set 2, divided by 2

   E. Name: Frequency Code
      Description: High = top half of verbs with highest frequency; Low = bottom half of verbs with lowest frequency

   F. Name: WCM Score
      Description: Word Complexity Measure Score

   G. Name: Complexity Code
      Description: High = top half of verbs with highest WCM; Low = bottom half of verbs with lowest WCM

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Finestack_VerbsMatter_IsAre_Verb_ing_Frequency.csv
-----------------------------------------

rows: 69
cols: 8

   A. Name: Set 1 Verb
      Description: CLAN verb tag

   B. Name: Set 1 Frequency
      Description: Number of verb occurrences in Set 1

   C. Name: Set 1 Ranking
      Description: Set 1 verb ranking with 1 being most frequently occurring verb

   D. Name: Set 2 Verb
      Description: CLAN verb tag

   E. Name: Set 2 Frequency
      Description: Number of verb occurrences in Set 2

   F. Name: Set 2 Ranking
      Description: Set 2 verb ranking with 1 being most frequently occurring verb

   G. Name: Combined List
      Description: All verbs appearing in Sets 1 and 2

   H. Name: Average Frequency Rating
      Description: Verb rank of Set 1 + Verb rank of Set 2, divided by 2

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Finestack_VerbsMatter_WCM-ing.csv
-----------------------------------------

IPA Reminders
Velars: /k, g, ŋ/
Unvoiced Fricatives: /f, θ, s, ʃ/
Voiced Fricatives: /v, ð, z, ʒ/
Unvoiced Affricate: /ʧ/
Voiced Affricate: /ʤ/
Liquid/Rhotics: /ɹ, j, l, w/

rows: 69
cols: 11

   A. Name: Verb
      Description: Identified verb

   B. Name: IPA with Inflection
      Description: Verb phonetically transcribed with inflection

   C. Name: >2 Syllables
      Description: Greater than 2 syllables = 1 point

   D. Name: Stress
      Description: Stress on non-initial syllable = 1 point

   E. Name: Cluster
      Description: 2+ sequential consonants w/in a syllable = 1 point per cluster

   F. Name: Final Consonant
      Description: Word ends with consonant = 1 point

   G. Name: Velar
      Description: 1 point per velar consonant

   H. Name: Fricative, Affricate
      Description: 1 point per fricative or affricate

   I. Name: Voiced Fric/Aff
      Description: 1 additional point per voiced fricative or affricate

   J. Name: Liquid/Rhotic V
      Description: 1 point for each liquid, syllabic liquid, or rhotic vowel

   K. Name: Total Points
      Description: Sum of all points awarded

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Finestack_VerbsMatter_DoDoes_Verb.csv
-----------------------------------------

rows: 16
cols: 7

   A. Name: CLAN Output
      Description: CLAN verb tag

   B. Name: Verb
      Description: Identified verb

   C. Name: IPA
      Description: Verb phonetically transcribed

   D. Name: Average Frequency Rating
      Description: Verb rank of Set 1 + Verb rank of Set 2, divided by 2

   E. Name: Frequency Code
      Description: High = top half of verbs with highest frequency; Low = bottom half of verbs with lowest frequency

   F. Name: WCM Score
      Description: Word Complexity Measure Score

   G. Name: Complexity Code
      Description: High = top half of verbs with highest WCM; Low = bottom half of verbs with lowest WCM

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Finestack_VerbsMatter_DoDoes_Verb_Frequency.csv
-----------------------------------------

rows: 16
cols: 8

   A. Name: Set 1 Verb
      Description: CLAN verb tag

   B. Name: Set 1 Frequency
      Description: Number of verb occurrences in Set 1

   C. Name: Set 1 Ranking
      Description: Set 1 verb ranking with 1 being most frequently occurring verb

   D. Name: Set 2 Verb
      Description: CLAN verb tag

   E. Name: Set 2 Frequency
      Description: Number of verb occurrences in Set 2

   F. Name: Set 2 Ranking
      Description: Set 2 verb ranking with 1 being most frequently occurring verb

   G. Name: Combined List
      Description: All verbs appearing in Sets 1 and 2

   H. Name: Average Frequency Rating
      Description: Verb rank of Set 1 + Verb rank of Set 2, divided by 2

-----------------------------------------
DATA-SPECIFIC INFORMATION FOR: Finestack_VerbsMatter_WCM-DoDoes.csv
-----------------------------------------

IPA Reminders
Velars: /k, g, ŋ/
Unvoiced Fricatives: /h, f, θ, s, ʃ/
Voiced Fricatives: /v, ð, z, ʒ/
Unvoiced Affricate: /ʧ/
Voiced Affricate: /ʤ/
Liquid/Rhotics: /ɹ, j, l, w/

rows: 16
cols: 11

   A. Name: Verb
      Description: Identified verb

   B. Name: IPA
      Description: Verb phonetically transcribed

   C. Name: >2 Syllables
      Description: Greater than 2 syllables = 1 point

   D. Name: Stress
      Description: Stress on non-initial syllable = 1 point

   E. Name: Cluster
      Description: 2+ sequential consonants w/in a syllable = 1 point per cluster

   F. Name: Final Consonant
      Description: Word ends with consonant = 1 point

   G. Name: Velar
      Description: 1 point per velar consonant

   H. Name: Fricative, Affricate
      Description: 1 point per fricative or affricate

   I. Name: Voiced Fric/Aff
      Description: 1 additional point per voiced fricative or affricate

   J. Name: Liquid/Rhotic V
      Description: 1 point for each liquid, syllabic liquid, or rhotic vowel

   K. Name: Total Points
      Description: Sum of all points awarded
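
Illustration (not part of the original dataset): the derived columns described in the sections above (Average Frequency Rating, Frequency Code, Complexity Code, Total Points) could be recomputed roughly as follows. This is a minimal Python sketch under stated assumptions. All function names and example values are hypothetical. The handling of an odd number of verbs (middle verb coded Low) and of ties (broken alphabetically) is assumed, because this readme does not specify either.

```python
# Point columns of the WCM files, named as in the column descriptions.
WCM_COLUMNS = [
    ">2 Syllables", "Stress", "Cluster", "Final Consonant",
    "Velar", "Fricative, Affricate", "Voiced Fric/Aff", "Liquid/Rhotic V",
]


def average_frequency_rating(rank_set1, rank_set2):
    """Verb rank of Set 1 + verb rank of Set 2, divided by 2."""
    return (rank_set1 + rank_set2) / 2


def total_points(row):
    """Sum of all WCM points awarded to one verb."""
    return sum(row[column] for column in WCM_COLUMNS)


def code_halves(scores, larger_is_high):
    """Label each verb "High" or "Low" by splitting the ordered list in half.

    Assumptions: with an odd number of verbs the middle verb is coded
    "Low", and ties are broken alphabetically.
    """
    def sort_key(verb):
        value = -scores[verb] if larger_is_high else scores[verb]
        return (value, verb)

    ordered = sorted(scores, key=sort_key)
    cutoff = len(ordered) // 2
    return {
        verb: ("High" if index < cutoff else "Low")
        for index, verb in enumerate(ordered)
    }


# Hypothetical values for illustration only; not taken from the dataset.
ratings = {"walk": 1.5, "jump": 2.0, "play": 4.5, "kick": 6.0}
wcm_scores = {"walk": 3, "jump": 5, "play": 2, "kick": 4}
example_row = {
    ">2 Syllables": 0, "Stress": 0, "Cluster": 1, "Final Consonant": 1,
    "Velar": 0, "Fricative, Affricate": 2, "Voiced Fric/Aff": 1,
    "Liquid/Rhotic V": 0,
}

# A lower average rank means a more frequent verb, so small values are "High".
frequency_codes = code_halves(ratings, larger_is_high=False)
# A higher WCM score means a more complex verb, so large values are "High".
complexity_codes = code_halves(wcm_scores, larger_is_high=True)

print(average_frequency_rating(3, 6))
print(total_points(example_row))
print(frequency_codes)
print(complexity_codes)
```

Note the direction of the two splits: for frequency, rank 1 is the most frequent verb, so the smallest ratings fall in the High half, whereas for complexity the largest WCM scores do.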