Cognitive Processing Language in Communities of Inquiry: An Examination of Cognitive Presence, Instruction Modality, and Academic Performance of Online Learners

A THESIS SUBMITTED TO THE FACULTY OF UNIVERSITY OF MINNESOTA BY
Samuel Bullard

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS

Keisha Varma, Ph.D., Advisor

May 2024

© Samuel Bullard, 2024

Acknowledgements

I would first like to acknowledge my advisor, Dr. Keisha Varma, who supported and guided me throughout the process of writing this thesis. Next, I want to express my gratitude to my other committee members and thesis reviewers, Dr. Panayiota Kendeou and Dr. Seth Thompson, for dedicating their time to provide me with valuable feedback and encouragement. I would also like to thank Dr. Martin Van Boekel and Beryl Belmonte, whose assistance during the early stages of this project was instrumental in making this work possible. Lastly, I want to thank my family, who provided me with enduring support throughout this journey.

Abstract

Within the Community of Inquiry framework, the cognitive presence of learners represents a critical feature of the knowledge construction process in online settings. In this study, we analyzed 1852 thematic units of college students’ online discussion posts using both manual coding of cognitive presence and automated linguistic analysis of cognitive processing language. Following these preliminary analyses, we developed a series of regression models to examine relations between cognitive presence, instruction modality, academic performance, and cognitive processing language. We compared four multilevel models using phases of cognitive presence and instruction modality as predictors of cognitive processing language. The final model comprised all four cognitive presence phases, excluding instruction modality. We also found an effect of this linguistic proxy measure on students’ academic performance.
These findings have both practical and theoretical implications for future use of surface-level linguistic proxies to assess student learning in online discourse-based learning environments.

Table of Contents

Acknowledgements
Abstract
Table of Contents
List of Tables
Chapter 1: Introduction
Chapter 2: Literature Review
    Online Communities of Inquiry
    Cognitive Presence
    Discourse-Centric Learning Analytics
Chapter 3: Current Study
Chapter 4: Method
    Participants
    Materials and Data Collection
    Study Procedure
Chapter 5: Results
    Research Question 1
    Research Question 2
    Research Question 3
Chapter 6: Discussion
    Cognitive Presence Model
    Instruction Modality
    Academic Performance
    Limitations
    Conclusion
References
Appendix A
Appendix B

List of Tables

Table 1. Sampled text from online discussion threads in two undergraduate college classes.
Table 2. Cognitive presence coding system used in manual content analysis.
Table 3. Cognitive processing subcategories and example target words.
Table 4. Descriptive statistics of cognitive processing language for each phase of cognitive presence.
Table 5. Coefficients and standard errors for four candidate models predicting use of cognitive processing language in online discussion posts of undergraduate college students.
Table 6. Estimates of a model predicting students’ final term paper scores based on cognitive processing words used in online learning discourse.

Chapter 1: Introduction

Although student enrollment in blended or online course modalities has been a noted trend in higher education for years (Hew & Cheung, 2014), the compulsory shift to virtual learning during the COVID-19 pandemic magnified the need for pedagogical tools which can facilitate a level of student engagement analogous to in-person instruction (Adedoyin & Soykan, 2020; Crawford et al., 2020). One instrument for achieving this goal is instructors' use of asynchronous online discussion forums, in which students post written messages reflecting on learning material and interact with their classmates. However, the current research literature presents contradictory findings regarding the medium's effectiveness in promoting the cognitive skills (e.g., critical thinking, reflective inquiry, etc.) necessary for reaching satisfactory levels of academic performance. On the one hand, some studies suggest student learning is enhanced in these forums via the co-construction of knowledge with peers (De Wever et al., 2006; Galikyan & Admiraal, 2019; Pena-Shaff & Nicholls, 2004).
On the other hand, findings from other studies suggest that these interactions rarely go beyond a surface-level exchange of information (Al-Husban, 2020; Garrison, 2007; Garrison & Cleveland-Innes, 2005; Tan & Ng, 2014). This lack of consensus holds implications for both research and instructional practice in higher education.

In the present study, we aimed to clarify these discrepancies by developing a model which provides empirical support for the identification of higher-order cognition in online discourse based on discrete, surface-level linguistic information. In addition, we compared the cognitive processing language used in discussion posts by students enrolled in the same college course, differentiated solely by the modality of its instructional content delivery (i.e., blended vs. entirely online). Finally, we modeled the association between these cognitive processes and students' academic performance in the course. To do so, we integrated two distinct theoretical and methodological approaches: the Community of Inquiry (CoI) Framework (Garrison et al., 1999) and Learning Analytics (Siemens, 2013). It was our expectation that integrating these two approaches would result in a more comprehensive understanding of the cognitive processes that are facilitated in online learning discourse.

Chapter 2: Literature Review

The educational research literature has consistently supported the notion that distance education is at least as effective as traditional, face-to-face instruction in supporting positive student learning outcomes (Siemens et al., 2015). However, distance education is not a singularly defined set of instructional practices. Online mediums, like traditional classrooms, can create many different forms of learning opportunities.
Provided sufficient technological support, instructors may teach their courses entirely online using a variety of asynchronous (e.g., online discussion forums, pre-recorded lectures) or synchronous (e.g., live video conferencing software) resources. Alternatively, instructors might also consider combining elements of both online and face-to-face learning (i.e., blended learning; Siemens et al., 2015). Meta-analyses of studies comparing traditional, blended, and online learning have shown that, on average, student learning outcomes are enhanced by courses using blended modes of instruction, compared to those which are either only face-to-face or online (Bernard et al., 2014; Means et al., 2013; Zhao et al., 2005).

Online Communities of Inquiry

The Community of Inquiry framework (CoI) has persisted as a foundational theoretical guide for research on online teaching and learning. This framework was originally developed around the turn of the 21st century, a time when higher education saw greater pedagogical application for the 'world wide web' and computer-mediated conferencing technology (Garrison et al., 1999). The CoI framework describes three interrelated elements inherent to successful computer-mediated education: (1) cognitive presence, defined as the ability of learners to construct meaning via online communication (Garrison et al., 2001), (2) social presence, or the capacity for a computer-mediated learning environment to foster social and emotional connections (Garrison, 2007), and (3) teaching presence, characterized as pedagogical elements such as the curricular design or facilitation of instructional strategies within a virtual educational experience (Anderson et al., 2001). Some proponents of the CoI framework have also advocated for the inclusion of a fourth element reflecting the self-regulative behaviors inherent to online learning (i.e., “learning presence”; Shea et al., 2014; Shea & Bidjerano, 2010, 2012).
However, the integration of this construct is somewhat contested given its conceptual similarities with elements of cognitive presence and failure to accommodate the collaborative nature of co-constructing knowledge (Garrison & Akyol, 2013). Thus, the inclusion of learning presence as a unique component in the CoI framework has not yet been broadly recognized by the empirical literature.

Cognitive Presence

Cognitive presence serves as the primary theoretical construct within the CoI framework for describing the cognitive processes required for text-based collaborative learning environments to engage learners in critical thinking and inquiry (Garrison et al., 2001). Garrison (2007) characterized cognitive presence as "a cycle of practical inquiry where participants move deliberately from understanding the problem or issue through to exploration, integration and application" (p. 65). Research has consistently shown that the cognitive presence of learners within an online community is critical to their perceived learning, academic performance, and overall satisfaction with their online educational experience (Akyol & Garrison, 2011; Galikyan & Admiraal, 2019; Van Wart et al., 2020). Given this influence, education researchers and practitioners alike have a vested interest in developing means for evaluating the degree to which learners are cognitively present during their online courses.

Traditional methods for studying cognitive presence within text-based discourse commonly include manual content analysis (De Wever et al., 2006; Garrison et al., 1999; Kovanović et al., 2016), a method which Krippendorff (2003) described as "a research technique for making replicable and valid inferences from texts (or other meaningful matter) to the contexts of their use" (p. 18). To make such inferences regarding online learners' cognitive presence, the CoI framework uses the Practical Inquiry model to guide this process (Garrison et al., 1999).
Inspired by Dewey (1910), this model describes cognitive presence in terms of a progression through four phases of inquiry:
1. Triggering Events, in which an issue or problem is identified causing some unease among learners;
2. Exploration, where students exchange relevant information or ideas related to the object of inquiry;
3. Integration, or the synthesis of ideas deliberated on in the exploration phase;
4. Resolution of the problem via hypothesis testing or application of ideas to new domains.
Together, these four phases comprise the Practical Inquiry model, and each is described by a set of indicators to facilitate the coding process (see Table 2).

Since the original publication of the CoI framework, the growth of technology and increasing ubiquity of the internet have made online interaction a pervasive feature of our personal, professional, and social lives – a development which has undoubtedly had far-reaching implications for research on online learning. While employing manual content analysis may have been workable for early CoI research, the unprecedented amount of unstructured data (i.e., written text) made available to researchers because of widespread internet adoption presents critical challenges to the framework's methodological feasibility in the current digital age (Boyd & Pennebaker, 2015). These concerns can be characterized in a variety of ways. For one, coding data by hand is a time-consuming and labor-intensive process, especially for large sets of data (Kovanović et al., 2016). In addition, manual coding largely depends on the interpretations of individual coders, making generalization across studies of cognitive presence difficult to assess (De Wever et al., 2006).
Thus, despite the method offering a strong theoretical foundation for understanding cognitive presence in online communities, it is most often restricted to retroactive analyses by research teams, limited to small sample sizes, and of minimal practical utility for instructors – resulting in a framework constrained in its ability to inform instructional design and practice (Donnelly & Gardner, 2011; Kovanović et al., 2016). Such methodological limitations require researchers to integrate new analytic approaches which allow for larger-scale analyses of the development of cognitive presence in online learning discourse.

Discourse-Centric Learning Analytics

The field of Discourse-Centric Learning Analytics focuses on the question of how discursive learning processes might be measured and eventually improved in online learning contexts (Knight & Littleton, 2015). Inspired by Vygotsky's (1962, 1978) emphasis on the relationship between language and thought, this discipline places particular importance on the linguistic activity of learners engaged in discourse, arguing that such activity "is not merely an indicator (or proxy) for deeper learning, [rather] it is often the site of that learning" (Knight & Littleton, 2015, p. 187). Given this sentiment, research on online learning has gradually shifted attention towards the automated processing of natural language features embedded within online discourse.

Natural Language Processing (NLP) is broadly defined as a computational approach for analyzing and interpreting human languages (Chowdhury, 2003). With the massive corpus of text data provided by the world wide web, NLP techniques are especially useful in performing various functions, such as information retrieval/indexing, text classification, translation, knowledge acquisition, and others (Chowdhary, 2020; Chowdhury, 2003). These analyses can vary in complexity, ranging from simpler, rule-based language processing to more advanced machine learning systems.
Although, like manual content analysis, one of the primary functions of these techniques is to extract semantic information from learning content (such as discussion post transcripts), they differ in their use of automation and computational capabilities (Hoppe, 2017).

The Linguistic Inquiry and Word Count (LIWC) software is one example of a popular dictionary-based NLP tool (Pennebaker et al., 2015). The LIWC software classifies linguistic activity into various psychological categories by comparing words in a text input with a validated internal dictionary, identifying "target" words which are then quantified in relation to the overall text input. In contrast to manual content analysis, these analyses are automated, and therefore can be conducted with minimal user effort. On the front-end of the software, users simply upload a file containing their text input and select the desired word categories for the program to analyze, and the program returns numerical data within a matter of seconds. As a result, this analytic software might be a feasible option for educators seeking to monitor the level of cognitive presence of their students in real time, or for researchers studying online Communities of Inquiry on a larger scale than previously thought possible.

Several studies have used LIWC measures in their examination of student learning outcomes. In their analysis of college admissions essays, Pennebaker et al. (2014) found strong associations between students' use of function words (e.g., articles, prepositions, etc.) and their college GPA. Similar associations have been found between the linguistic features comprising students' written introductions of themselves and their academic performance in their college coursework (Robinson et al., 2013).
While informative, these studies sampled longer, less discursive forms of student writing from face-to-face learning environments, leading one to question how such a tool might be appropriated for the analysis of language in online, discussion-based learning environments.

Further, Kovanović et al. (2014, 2016) demonstrated the potential of using these linguistic features for automated detection of cognitive presence phases using more computationally complex techniques such as random forest classification. Such efforts have since been expanded upon in subsequent literature, resulting in classification models with an appreciable degree of accuracy (Hayati et al., 2019, 2020; Hind et al., 2018; Neto et al., 2018, 2021). These combined findings suggest that particular LIWC measures may be promising candidates to further explore the cognitive presence of online learners, and perhaps eventually expand instructors' ability to evaluate, in real time, how successfully their online discussion-based courses facilitate these learning processes.

However, to reach this degree of accuracy, such methods required a substantial collection of linguistic features to train the classification algorithm – many of which, on a conceptual level, appear wholly unrelated to the targeted construct (i.e., cognitive presence phase), resulting in theoretically questionable interpretations. To operationalize the cognitive presence construct with linguistic proxies, and therefore open the possibility of automated assessment, greater confidence in the construct validity of such proxies must be established. Thus, research aimed at identifying linguistic proxies of cognitive presence might benefit from a priori selection of model features determined by their theoretical alignment with the targeted construct. This perspective was exemplified by Joksimović et al.
(2014), who identified LIWC features which are both conceptually and empirically affiliated with an individual’s cognitive processing in online discourse. In this study, researchers selected several LIWC features thought to be associated with increased cognitive load (e.g., exclusive and causal words) and evaluated the prevalence of these features in online discussion posts relative to the students' observed phase of cognitive presence. As hypothesized, Joksimović et al. (2014) observed a higher concentration of these words as students progressed toward the latter phases of cognitive presence (i.e., integration and resolution).

Shortly following the publication of Joksimović et al.'s (2014) findings, an updated version of the LIWC software was released (LIWC2015). This update expanded the tool's internal dictionary to introduce several novel linguistic features, including the cognitive processing category. Consistent with many of the features analyzed by Joksimović et al. (2014), cognitive processing is a composite of several word subcategories related to an individual's verbal expression of insight, causation, discrepancy, tentativeness, certainty, and differentiation (Pennebaker et al., 2015; see Table 3). Yet aside from one isolated recent study (Moore et al., 2019), the cognitive presence literature has lacked a specific focus on this new linguistic feature since its addition to LIWC dictionaries. Additionally, as the technologies for large-scale automated language processing become more advanced, several questions remain regarding their application to the CoI framework, including their relationships with specific instruction modalities and learning outcomes, prompting the present study.

Chapter 3: Current Study

The present study sought to address three primary questions regarding the role of linguistic features within an undergraduate-level online learning environment. Specifically, we asked the following:
1. To what extent are the linguistic features of student discussion posts, defined as cognitive processes, viable indicators of cognitive presence phases as described in the CoI framework?
2. To what extent does variation in the instruction modality (blended vs. fully online) of a college course influence students' cognitive processing language in online discussion forums?
3. To what extent does cognitive processing language influence students' academic performance in an online course?
To answer these questions, we used a combination of manual content analysis and automated natural language processing, techniques which are further described in the following chapters.

Chapter 4: Method

In this section, we describe the research context and sample used in this study, the procedures for analyzing these data, and the statistical procedures used to model relationships between variables.

Participants

At a large research university in the Midwestern U.S., two classes of undergraduate college students (N = 53) enrolled in an introductory course covering the theoretical and methodological foundations of Educational Psychology. Students voluntarily enrolled in this course to receive academic credit. These student participants had self-selected into two separate offerings of this course, one of which was held in the Fall Semester of 2019 ('Semester A'; n = 33), while the other was held in the Fall Semester of 2020 ('Semester B'; n = 20). Although the course materials and instructor remained consistent between the two semesters, the mode of instruction differed because of public health restrictions which arose from the COVID-19 pandemic. Semester A was held in a blended learning format with face-to-face lectures and asynchronous online discussion. Semester B was held in a fully online format, with lectures delivered over web conferencing software and, like Semester A, included students taking part in asynchronous online discussions.

Materials and Data Collection

The following sections detail the materials and data collection process used in this study. All raw data (i.e., discussion post transcripts, student grades) were collected from the online learning management system used by the instructor of both semesters of the course. Note that all data were collected with a waiver of consent because the local Institutional Review Board deemed this study to be of minimal risk to participants. Thus, specific demographic information of the study participants could not be collected.

Discussion Posts

In both offerings of this course (regardless of modality), the instructor assigned weekly required readings for students to reflect on with their peers in the asynchronous online discussion forum. These readings consisted of theoretical or empirical research articles which held considerable influence in the field of educational psychology. Each week, students were expected to contribute at least one paragraph-long discussion post describing their thoughts about the article. To encourage students' critical thinking skills, the instructor included a list of open-ended prompts for students to base their discussion posts on. These prompts included seeking clarification (when necessary), raising questions/issues with aspects of the article, drawing connections between the research findings and other topic areas/research studies, and proposing alternative solutions or explanations for the study's findings. Students were also expected to read and respond to their peers' posts, but these responses were not included as a part of their discussion grade. The precise language used by the instructor for these discussions is provided in Appendix A.

Students' discussion post transcripts were copied and pasted from the university's online learning platform to word processing software and prepared for analysis.
Across 12 weekly discussion threads, students contributed 626 posts (M = 11.81 posts per student, SD = 2.25) throughout both offerings of the course. As described in greater detail in the following sections, these discussion posts were analyzed using manual content analysis of cognitive presence, as well as automated linguistic analysis of cognitive processing language.

Assessment Data

During the final weeks of the course, students were expected to write an in-depth analysis concerning a research topic relevant to the ones discussed throughout the course. This assessment could take one of two forms: either a comprehensive literature review or a hypothetical research proposal of the chosen subject area (see Appendix B). Student scores on these final term papers were recorded from the university's online gradebook for analysis of the third research question, which inquired about the relations between students' academic performance and cognitive processing language in online discourse. We operationalized academic performance via this assignment for two reasons. First, as the instructor did not administer a formal final exam, this paper accounted for the largest proportion of the students' final course grade (30%). Second, the instructor's outlined expectations for this assignment were well-aligned with both the subjects and reflective style of writing encouraged in the asynchronous online discussion forums – suggesting that students' scores on this assignment would serve as an effective summative assessment of their learning from participation in online discussions. Note that grade data were unavailable for three students in Semester A and one student in Semester B, requiring all data associated with these four students to be omitted from the analysis of the third research question.

Study Procedure

In the following sections, we outline the procedures used to assess the cognitive variables underlying students' written reflections in their discussion posts.
These analyses included (a) the manual coding of students' cognitive presence according to the Practical Inquiry model, and (b) automated analysis of students' use of cognitive processing language through the LIWC software. However, given the unstructured nature of discussion post transcripts, a few prerequisite decisions had to be made prior to conducting these analyses. This involved both cleaning of the texts as well as segmenting individual discussion posts into an appropriate unit of analysis.

Text Cleaning

Basic text cleaning measures were applied to prepare our sample for linguistic analysis, such as the correction of common spelling errors and removal of URLs from student discussion posts. While doing so, we noticed that students frequently incorporated direct quotations from their assigned readings. We were concerned that including extensive quotations could yield LIWC output that was skewed toward the language used in the readings, rather than the students' original writing. A paired-samples t-test was conducted to compare the cognitive processing values in a sample of thematic units with and without quotations included in the LIWC input. On average, thematic units with the quotations included in LIWC input had higher cognitive processing values (M = 18.35, SD = 4.73) compared to the same units when quotations were removed (M = 17.26, SD = 6.12). This difference of 1.09 was statistically significant, t(53) = 2.57, p < .05. Thus, quotes exceeding five words were replaced by a tag that the LIWC software would not register in its analysis.

Unit of Analysis

Many schools of thought exist regarding the ideal unit for the analysis of online discussion post transcripts. In their review of relevant literature, Rourke et al. (2001) note the most common units used in these studies include the whole message, individual sentences, and thematic units. Following recommendations by De Wever et al.
(2006), we chose a unit of analysis based on considerations of our specific study's context. Specifically, we observed that students would often include several distinct insights about the weekly discussion topic within a single post, rather than making multiple separate posts for each unique idea. Based on this observation, we rejected the whole message unit of analysis, believing that assigning one code to a discussion post in its entirety would conceal potentially meaningful information about the learner’s cognitive presence. Additionally, because the LIWC software calculates cognitive processing values in proportion to the total word count of the text input, smaller text inputs result in less reliable measurements (Pennebaker et al., 2015), suggesting that using individual sentences would diminish the confidence in our linguistic analysis. Thus, we ultimately segmented discussion transcript data according to thematic units, or 'units of meaning' (Henri, 1992).

Discussion post data were segmented into thematic units according to a protocol developed by the research team. This protocol instructed coder(s) to read the entirety of a student's discussion post several times. Once a firm understanding of the post was established, coder(s) would independently identify the primary arguments or ideas brought up by the author of the post. These coders would then segment these passages at specific points where a student appeared to transition rhetorically from one idea to the next. These transitions were often (but not exclusively) signified by linguistic markers such as "However" or "On the other hand". To ensure confidence in this segmentation protocol, a sample of 31 student discussion posts was used to train and establish reliability between the two researchers assigned to segment the data. Once an acceptable inter-rater reliability was reached (IRR = .84), the two researchers segmented the remaining discussion post data.
This resulted in 1852 total thematic units, which served as the final sample for both the coding of cognitive presence and the automated linguistic analysis. See Table 1 for summary information regarding the sample.

Table 1
Sampled text from online discussion threads in two undergraduate college classes.

| Topic | Discussion Posts, Semester A | Discussion Posts, Semester B | Thematic Units, Semester A | Thematic Units, Semester B |
|---|---|---|---|---|
| Active Learning | 32 | 20 | 96 | 64 |
| Growth Mindset | 30 | 14 | 95 | 39 |
| Problem-Solving | 34 | 23 | 93 | 65 |
| Learning by Teaching | 37 | 19 | 104 | 58 |
| Concreteness Fading | 30 | 17 | 92 | 66 |
| Inquiry-Based Learning | 32 | 19 | 90 | 61 |
| Discovery Learning | 30 | 20 | 84 | 63 |
| Concepts & Categories | 31 | 22 | 90 | 63 |
| Distributed Learning | 32 | 19 | 89 | 65 |
| Stereotype Threat | 36 | 21 | 102 | 58 |
| Prior Knowledge | 34 | 16 | 99 | 47 |
| Mental Models | 33 | 25 | 92 | 77 |
| Total (both semesters) | 626 | | 1852 | |

Note. Semester A contained 33 students; Semester B contained 20 students.

Qualitative Analysis

We employed a combination of deductive and inductive approaches for coding phases of cognitive presence within students’ discussion posts. Researchers were trained on the coding system first established by Garrison et al. (1999) and later revisited by Park (2009). This system is based on the previously described Practical Inquiry model and provided researchers with indicators and examples reflecting the different phases of cognitive presence. Coders initially coded a random subset of the data (N = 148) using this set of a priori codes. After code comparison and discussion, a few minor adjustments were made to the original coding system to better align with trends noted in the dataset. For example, a unit occasionally could not be accurately characterized by any of the indicators in the Practical Inquiry model. These units were typically either the exchange of social pleasantries or information wholly irrelevant to the subject of discussion.
If both coders agreed that a particular unit could not be sufficiently described by any of the indicators in the Practical Inquiry model, they would code that unit as “Other”. Codes were mutually exclusive, in that a single observation could not be considered both a triggering event and exploration, for example. Following this revision, coders completed another round of coding to assess inter-rater reliability. After reaching a sufficient level of agreement (IRR = .93; Cohen’s κ = .85), the two coders coded the remaining data. Table 2 presents the final coding scheme used to conduct the qualitative analyses.

Table 2
Cognitive presence coding system used in manual content analysis.

| Phase | Code | Definition | Example |
|---|---|---|---|
| Triggering Event | Sense of Puzzlement | Expressing confusion, unease | “This was confusing!” |
| Triggering Event | Expressing Interest | Expressing interest, intrigue, etc. | “It’s so fascinating how…” |
| Triggering Event | Clarification | Effort to ensure correct understanding | “What did they mean by…” |
| Triggering Event | Restating | Summarizing a previously made point | “On page 498, they claim…” |
| Exploration | Information Exchange | Adding new information | “Yesterday I learned that…” |
| Exploration | Agree/Disagree | Unsubstantiated (dis)agreement | “I agree.” |
| Exploration | Personal Narrative | Sharing relevant personal experiences | “In high school, I…” |
| Exploration | Opinion | Expressing a belief or attitude | “I disliked how…” |
| Integration | Connect/Build-On | Connecting or expanding on ideas | “This seems related to…” |
| Integration | Explain/Solve | Offering explanation/solution to issue | “We could fix this by…” |
| Integration | Agree/Disagree | Substantiated (dis)agreement | “I agree because…” |
| Resolution | Thought Experiment | Well-structured, hypothetical reasoning | “Imagine if…” |
| Resolution | Apply/Test/Defend | Reasoning for supporting idea/solution | “It would work because…” |
| Resolution | Follow-Up Inquiry | Questions based on new understanding | “Considering this, how…?” |

Note. Text segments which could not be characterized by any of these indicators were coded as “Other”.
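To make the two reliability statistics reported for the coding rounds concrete, the following Python sketch computes simple percent agreement and Cohen's κ for two coders. It assumes that the reported IRR value refers to percent agreement, which the text does not state explicitly. The function names and the ten toy labels (TE, EX, IN, RE, OT) are hypothetical and are not drawn from the study data.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Share of units to which both coders assigned the same code."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(labels_a)
    p_observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = set(labels_a) | set(labels_b)
    # Chance agreement: product of each coder's marginal proportions, summed.
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes for ten thematic units (illustration only, not study data).
coder_1 = ["TE", "EX", "EX", "IN", "RE", "EX", "IN", "OT", "TE", "RE"]
coder_2 = ["TE", "EX", "IN", "IN", "RE", "EX", "IN", "OT", "EX", "RE"]

print(f"Percent agreement: {percent_agreement(coder_1, coder_2):.2f}")
print(f"Cohen's kappa: {cohens_kappa(coder_1, coder_2):.2f}")
```

Because κ discounts matches that two coders would produce by chance, it is generally lower than raw percent agreement, which fits the pattern of the two values reported above.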
Natural Language Processing

Following our coding of cognitive presence, we processed these 1852 thematic units of student discussion post data through the LIWC software to identify relative levels of cognitive processing language. As previously described, the cognitive processing category is a composite of 797 target words associated with the expression of insight, causation, discrepancy, certainty, differentiation, and tentativeness (Pennebaker et al., 2015). See Table 3 for examples of words included in this LIWC category.

Table 3
Cognitive processing subcategories and example target words.

| LIWC Category | Examples | Words in Category |
|---|---|---|
| Cognitive Processes | Cause, know, ought | 797 |
| Insight | Think, know | 259 |
| Causation | Because, effect | 135 |
| Discrepancy | Should, would | 83 |
| Tentative | Maybe, perhaps | 178 |
| Certainty | Always, never | 113 |
| Differentiation | Hasn’t, but, else | 81 |

Note. Adapted from Pennebaker et al. (2015). LIWC = Linguistic Inquiry and Word Count (2015) software.

The LIWC program processed these data and provided us with numerical values that quantify the degree to which students used language reflective of these processes. These values were then used in a series of regression analyses to investigate relations between cognitive processing language, cognitive presence, instructional modality, and students' academic performance in the course.

Chapter 5: Results

Prior to the analysis of specific research questions, we used descriptive statistics to examine general trends in the dataset. As illustrated in Table 4, students used a relatively high percentage of cognitive processing language on average, which also appeared to vary depending on the researcher-coded phase of cognitive presence. Among the codes, the exploration phase of cognitive presence occurred most frequently in the data (29%), followed by integration (27%), triggering events (21%), resolution (17%), and “Other” (6%).
Although less common, thematic units coded as the resolution phase yielded the highest values of cognitive processing language among the four phases described in the Practical Inquiry model.

Table 4
Descriptive statistics of cognitive processing language for each phase of cognitive presence.

| Phase | Thematic Units | M | SD |
|---|---|---|---|
| Triggering Event | 396 | 18.49 | 6.94 |
| Exploration | 539 | 19.10 | 6.44 |
| Integration | 505 | 18.86 | 5.44 |
| Resolution | 309 | 20.26 | 5.44 |
| Other | 103 | 16.49 | 8.23 |

Research Question 1

After examining these descriptive statistics, we turned to the investigation of our first research question: To what extent are the linguistic features of student discussion posts, defined as cognitive processes, viable indicators of the higher-level cognitive presence described by the CoI framework?

To answer this question, we developed a series of candidate models estimating the effects of each phase of cognitive presence on students’ cognitive processing language scores. However, given the inherently nested structure of the data set, the standard assumption of independent observations required for multiple linear regression could not be met, so linear mixed modeling was used. Specifically, each candidate model incorporated nested random effects to account for both the individual differences between students and the unwanted variability introduced by repeated observations. In each model, we included the fixed effect(s) of cognitive presence phase(s) predicting the degree to which an observation contained language associated with cognitive processing. The first developed model (Model A) comprised all four phases of cognitive presence. We then employed backward elimination to identify whether this fully specified model could be reduced by sequentially eliminating lower-level variables without excessively compromising model fit.
Specifically, we began by eliminating the term associated with the triggering event phase, followed by exploration, and finally, integration. From this elimination procedure, three additional candidate models were produced, and observations of diagnostic plots determined that the assumptions for regression analyses were reasonably met. After fitting each candidate model, we examined corrected Akaike Information Criterion (AICc) values to identify the model of best fit. See Table 5 for coefficient-level estimates, variance components, and information criteria used for model comparison. At the coefficient-level, intercepts represent the approximate means for observations not coded as any cognitive presence phase, whereas model estimates for each phase indicate the estimated change in cognitive processing values relative to its intercept.

Table 5
Coefficients and standard errors for four candidate models predicting use of cognitive processing language in online discussion posts of undergraduate college students.

| Cognitive Processing Language | Model A | Model B | Model C | Model D |
|---|---|---|---|---|
| Fixed Effects | | | | |
| Intercept | 16.43 (0.64) | 18.06 (0.34) | 18.64 (0.27) | 18.67 (0.25) |
| Triggering Event | 2.05 (0.68) | | | |
| Exploration | 2.74 (0.66) | 1.11 (0.38) | | |
| Integration | 2.33 (0.67) | 0.69 (0.39) | 0.12 (0.34) | |
| Resolution | 3.88 (0.70) | 2.25 (0.45) | 1.67 (0.41) | 1.63 (0.39) |
| Random Effects | | | | |
| σ² Discussion | 1.58 | 1.48 | 1.32 | 1.58 |
| σ² Student | 1.80 | 1.79 | 1.78 | 1.78 |
| σ² Residual | 35.66 | 35.94 | 36.26 | 35.67 |
| Goodness of Fit | | | | |
| AIC | 12016.3 | 12023.4 | 12029.8 | 12027.9 |
| BIC | 12060.5 | 12062.1 | 12063.0 | 12063.0 |

Note. All models were fitted using Maximum Likelihood Estimation. Standard errors in parentheses. AIC = Akaike Information Criterion. BIC = Bayesian Information Criterion.

These comparisons revealed that, given the data and other candidate models, Model A demonstrated the strongest empirical evidence of predicting cognitive processing language.
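As a rough illustration of this model comparison, the sketch below applies the small-sample correction to the AIC values in Table 5 and reports each model's difference from the best-scoring candidate. The parameter counts (fixed effects plus three variance components) and the use of the 1852 thematic units as the sample size are assumptions made for illustration, since the text does not report them. The function name is hypothetical.

```python
def aicc(aic, n_params, n_obs):
    """Corrected AIC: AIC plus a penalty that shrinks as n_obs grows."""
    return aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


N_UNITS = 1852  # assumption: thematic units treated as the sample size

# (AIC from Table 5, assumed parameter count = fixed effects + 3 variances)
candidates = {
    "A": (12016.3, 8),
    "B": (12023.4, 7),
    "C": (12029.8, 6),
    "D": (12027.9, 5),
}

scores = {name: aicc(aic, k, N_UNITS) for name, (aic, k) in candidates.items()}
best = min(scores, key=scores.get)
deltas = {name: score - scores[best] for name, score in scores.items()}

for name in sorted(deltas, key=deltas.get):
    print(f"Model {name}: AICc = {scores[name]:.2f}, delta = {deltas[name]:.2f}")
```

With a sample this large the correction term is a small fraction of a point, so under these assumptions the ranking by AICc matches the ranking by the uncorrected AIC values in Table 5.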
Model A consisted of all four phases included in the Practical Inquiry model, all of which contributed to a meaningful degree of variation in cognitive processing scores. This model shows that higher levels of cognitive processing language are associated with the more advanced phases of cognitive presence. Specifically, it predicts the highest use of cognitive processing language for student messages characterized as the resolution phase of cognitive presence, followed by exploration, integration, and triggering event(s). Model comparisons also revealed that including the lower-level parameters (i.e., triggering event and exploration) did not contribute to unnecessary model complexity.

As previously mentioned, the data set incorporated multiple observations of cognitive processing language for each student’s discussion post, and each student provided multiple discussion post contributions. Thus, it was necessary to understand how much of the variation in cognitive processing language scores was due to individual differences in students’ writing style or to the level of cognitive processing language that a particular discussion post might have elicited compared to others. In the adopted model (Model A), the writing style characteristic of individual students accounted for about 5% of the total variance in cognitive processing language, while about 4% of the variance was associated with discussion post-level differences.

Research Question 2

After selecting a model predicting cognitive processing language based on cognitive presence phases, we followed a similar process to address the second research question: To what extent does variation in the instruction modality (blended vs. fully online) of a college course influence students’ cognitive processing language in online discussion forums?
Analysis of this question included the development and evaluation of a model that incorporated the effect of instructional modality on cognitive processing language (Model E). Like the model adopted in the previous analysis, Model E included all four fixed effects of cognitive presence phases as well as a nested effect structure. However, unlike Model A, Model E incorporated an additional fixed effect: a binary predictor reflecting whether a particular thematic unit was written by a student in either the blended or fully online version of the course (1 = blended; 0 = fully online).

Following model development, two criteria were employed in deciding whether to keep instruction modality as a parameter in the final adopted model. First, we conducted a Likelihood Ratio Test to compare relative goodness-of-fit between Model A and Model E. Results from this test showed that adding the instruction modality parameter failed to explain a meaningful amount of variation in cognitive processing language scores, χ²(1) = 0.159, p = .689. In addition, AICc value comparison between the two models suggested that the cost of including this additional parameter outweighed the impact of the parameter’s effect. Given these two criteria, we determined that Model A remained the strongest candidate model for our data.

Research Question 3

We then sought to address our final research question: To what extent does cognitive processing language influence students' academic performance in an online course?

We employed a simple linear regression to test if students’ use of cognitive processing language in online discourse predicted their performance on a major written assessment. The predictor, or independent variable in this model, comprised the sum of all cognitive processing words used by students across all their individual contributions to the online discussion forum (M = 485.20, SD = 131.06).
The outcome, or dependent variable in this model, comprised students’ scores on the final term paper, an assessment which accounted for 30% of their final grade in the course. Students could earn a maximum of 300 points (M = 254.71, SD = 44.47). Note that assessment data from four of the fifty-three student participants included in the prior analyses were missing, requiring their exclusion from this regression (N = 49).

The developed model explained a statistically significant proportion of the variance in final term paper scores, R² = 0.096, F(1, 47) = 4.97, p < .05. At the coefficient level, each additional cognitive processing word used by students in their discussion posts was associated with a 0.10 increase in points earned on their final term paper, a finding which was deemed statistically significant (α = 0.05; see Table 6).

Table 6
Estimates of a model predicting students’ final term paper scores based on cognitive processing words used in online learning discourse.

| Effect | Estimate | SE | 95% CI LL | 95% CI UL | p |
|---|---|---|---|---|---|
| Intercept | 203.79 | 23.64 | 156.24 | 251.34 | < .001 |
| Cognitive Processing Words | 0.10 | 0.05 | 0.01 | 0.20 | .031 |

Note. N = 49. CI = confidence interval; LL = lower limit; UL = upper limit.

On a larger scale, the model estimates predict a nearly 14-point increase (β = 13.76) in final term paper scores for each standard deviation increase in cognitive processing language (SD = 131.06). Considering this assessment was graded out of 300 points, a difference of 14 points accounts for nearly half a letter grade (i.e., receiving an “A” over a “B” grade, etc.). In addition, a follow-up analysis using a Welch t-test found the difference in term paper scores between the blended and fully online groups to be non-significant, t(44.2) = −0.564, p = .576, suggesting that variation in instructional modality was unlikely to have confounded the effect of cognitive processing language on academic performance.
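The reported regression results can be checked with some back-of-the-envelope arithmetic. The sketch below recovers an approximate F statistic from R² and an unrounded slope from the fact that an ordinary least squares line passes through the means of the predictor and the outcome. It assumes the reported means describe the same 49 students used in the regression, which the text does not confirm. All variable names are illustrative.

```python
# Values reported in the text and in Table 6.
r_squared = 0.096
n_students = 49
intercept = 203.79
mean_words, sd_words = 485.20, 131.06
mean_score, max_score = 254.71, 300

# F statistic for a single-predictor model, rebuilt from R-squared.
df_model = 1
df_residual = n_students - df_model - 1
f_statistic = (r_squared / df_model) / ((1 - r_squared) / df_residual)

# An OLS line passes through (mean_x, mean_y), so the slope can be
# recovered from the intercept and the two means (assumed same sample).
implied_slope = (mean_score - intercept) / mean_words

# Expected change in term paper points per one-SD change in the predictor.
points_per_sd = implied_slope * sd_words
share_of_total = points_per_sd / max_score

print(f"F({df_model}, {df_residual}) = {f_statistic:.2f}")
print(f"Implied slope = {implied_slope:.4f} points per word")
print(f"Points per SD = {points_per_sd:.2f} ({share_of_total:.1%} of {max_score})")
```

Under these assumptions the recovered values land close to the reported F(1, 47) = 4.97 and the 13.76-point change per standard deviation. Small gaps are expected because the inputs are rounded, and the implied slope suggests that the 0.10 estimate in Table 6 is itself a rounded value.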
Chapter 6: Discussion

Overall, we found a high level of cognitive processing language in students’ contributions to online discussions. Across the cognitive presence codes, mean cognitive processing language ranged from roughly 16% to 20% of the total word count. To contextualize this finding, Pennebaker et al. (2015) reported that, in a corpus of both online and physical texts, cognitive processing language comprised only about 11% of the total word count. For cognitive presence, the most commonly observed phase was exploration (29%), closely followed by integration (27%). Considering that this finding is consistent with prior research in the CoI framework (Garrison et al., 1999; Joksimović et al., 2014; Park, 2009), we can conclude that the forms of engagement most characteristic of online discussion forums include the exchange of information/brainstorming ideas and the synthesis of said information/ideas into more coherent representations. However, it is important to note that triggering events (21%) and resolution (17%) also occurred regularly.

Cognitive Presence Model

In the first research question, we sought to operationalize online learners’ cognitive presence via linguistic proxies generated by the LIWC software. To do so, we manually coded thematic units from student discussion posts according to the Practical Inquiry model of cognitive presence. Subsequently, we developed a series of theoretically-informed candidate models using combinations of the four cognitive presence codes as predictors of cognitive processing language. Examination of model evidence (AICc values) revealed that all four phases of cognitive presence were important for predicting cognitive processing language scores. Our findings suggest that the automated analysis of discussion transcripts can produce linguistic proxies for the phase of cognitive presence, perhaps avoiding the feasibility concerns associated with manual content analysis of large sets of data (Kovanović et al., 2016).
Our model suggests that the amount of cognitive processing language used by students in online discourse depends on their progression through the cycle of Practical Inquiry. Among the four phases of cognitive presence, the resolution phase was associated with the greatest use of cognitive processing language in online learning discourse, followed by exploration, integration, and triggering events. The Practical Inquiry model treats resolution as the highest level of cognitive presence, one associated with deep-level learning and critical thinking (Garrison et al., 1999). Correspondingly, it is often noted to be one of the most difficult phases of the Practical Inquiry model to assess in the manual content analysis procedures used in prior research (Akyol & Garrison, 2011; Garrison & Arbaugh, 2007), motivating the present study’s effort to expand our measurement capabilities.

Our adopted model also showed a greater effect of the exploration phase on cognitive processing language compared to integration. This finding might appear theoretically inconsistent given that the Practical Inquiry model places the integration phase as a relatively higher indicator of cognition. We offer two potential explanations for why this might be the case. First, because LIWC calculates cognitive processing language values based on the composite of several subcategories, it is possible that units coded as exploration contained higher values in one or two specific subcategories compared to integration, potentially resulting in higher overall cognitive processing values. This explanation is supported by Joksimović et al. (2014), who found a notably higher concentration of insight words (e.g., ‘think’, ‘know’, etc.) in the exploration phase compared to the integration phase. A second potential explanation for these findings may lie in the dictionary-based approach used by the LIWC program.
It is possible that the type of discursive activity occurring during moments of exploration (i.e., information exchange/brainstorming) naturally lends itself to be more easily captured by word-count-based measures compared to the activity occurring in moments of integration, or idea synthesis. Measurement of the integration phase might require more complex NLP models, such as ones that incorporate the semantic relationships between words or the overall coherence of written text.

Overall, it is important to emphasize that the cycle of Practical Inquiry is fluid, meaning that discursive activity is not easily bounded within a strict realm of four discrete cognitive presence phases (as evidenced by a moderate proportion of thematic units coded as ‘Other’). However, when taken as a whole, the present study found that students’ progression from low- to high-level cognitive presence is reflected by their increased usage of words relating to cognitive processing.

Instruction Modality

The existing research literature on distance/online learning has shown that such modalities are at least as effective as traditional, face-to-face learning environments in supporting positive student learning outcomes (Siemens et al., 2015). Distance learning can take many forms, such as blended/hybrid or fully online, yet little inquiry has been made about how these specific instruction modalities might affect the development of cognitive presence in student discussions. Because blended modalities offer a combination of online and face-to-face interaction, one might assume these offer relatively more opportunities for learners to engage in reflective inquiry (Garrison & Kanuka, 2004), and prior research in the CoI framework has provided some moderate support for that hypothesis (Akyol & Garrison, 2011). However, the results from our analysis do not support this intuition.
Given our data and model findings, our results suggest that the two modes of instruction are comparable in terms of the cognitive processing language used by students during asynchronous discussion.

Academic Performance

Conceptually, cognitive presence can be understood as a learning process defined by the progression through a cycle of practical inquiry (Garrison, 2007). Accordingly, it would make sense for researchers to want to discern the impact of this learning process on actualized student outcomes. Such ‘learning products’ are impactful for students' continued academic development and eventual attainment of undergraduate college degrees. In the final set of analyses, we investigated the effect of students’ cognitive processing language use on their academic performance in a college course. Our findings revealed that students who used more language indicative of cognitive processing also scored higher on their final term papers. Prior research using the LIWC instrument has demonstrated an impact of students' use of small words, such as articles and pronouns, on their academic outcomes (Pennebaker et al., 2014; Robinson et al., 2013). Other literature has correspondingly found strong associations between coded cognitive presence phases and final course performance (Akyol & Garrison, 2011; Galikyan & Admiraal, 2019; Guo et al., 2021). However, no research to our knowledge has incorporated cognitive processing language in its analysis of cognitive presence to study such outcomes. To our knowledge, this study is the first to model these academic outcomes by operationalizing cognitive presence through automated linguistic proxies. Our combined findings suggest that cognitive processing words reflect meaningful processes (i.e., cognitive presence) essential for student learning in online environments.

Limitations

The present study included a few limitations worthy of acknowledgement.
First, the LIWC software is restricted to the classification of individual words, as opposed to the classification of sentences, paragraphs, etc. Unlike human coders, it cannot detect irony, sarcasm, idiom, or any other sub-textual characteristics contained within the text input. This is precisely the reason we refer to cognitive processing as a surface-level measurement of students’ cognitive presence, as LIWC only incorporates directly observable characteristics of text input (e.g., frequencies of free morphemes, punctuation marks, etc.) in its analysis of written text. This study did attempt to account for this limitation by integrating manual content analysis, which is generally much more considerate of subtext given its reliance on human interpretation.

The second limitation of this study pertains to the unit of analysis. Instead of syntactical units (sentences, paragraphs, whole messages, etc.), we opted for thematic units, or ‘units of meaning’ (Rourke et al., 2001). Compared to segmenting by message or paragraph, these offered a better representation of our data; however, thematic units are not without their own disadvantages. De Wever et al. (2006) note that these units are often poorly operationalized and vulnerable to subjective researcher interpretations, raising concerns about generalizability. To mitigate this limitation, we developed a segmentation protocol which was found to be sufficiently reliable.

The final limitation regards our data sampling context, specifically for the Semester B course offering. Students in the Semester B group had enrolled in a fully online version of this course because of public health restrictions mandated during the COVID-19 pandemic. Under normal conditions, college students select coursework based on a variety of factors, including instruction modality (McPartlan et al., 2021). Students in this study did not have this choice, meaning individual differences based on modality preferences were not accounted for.
However, this might have been advantageous, as students could not self-select into their preferred instruction modalities, minimizing potential sampling bias.

Conclusion

The present study offers a unique methodological approach to the study of cognitive presence, which is a critical dimension of student learning in online Communities of Inquiry. In the CoI research literature, quantitative content analysis is the traditional methodological approach for describing a learner's cognitive presence. This approach is considerably time-consuming and labor-intensive, a constraint which may limit the ability of instructors in higher education to apply the CoI framework in the real-time monitoring of their students’ learning (Kovanović et al., 2016). In response to these challenges, this study used surface-level linguistic indicators of cognitive processing as a proxy measure of students' cognitive presence in online asynchronous discourse. We found high levels of language reflecting students’ cognitive presence in asynchronous online discussion forums, which varied depending on their phase of practical inquiry. From these findings, we conclude that asynchronous online discussion forums can be a potent method for online instructors to advance student learning, and that automated linguistic measures of cognitive presence may serve as effective indicators of this learning.

References

Adedoyin, O. B., & Soykan, E. (2020). Covid-19 pandemic and online learning: The challenges and opportunities. Interactive Learning Environments, 0(0), 1–13. https://doi.org/10.1080/10494820.2020.1813180
Akyol, Z., & Garrison, R. (2011). Understanding cognitive presence in an online and blended community of inquiry: Assessing outcomes and processes for deep approaches to learning. British Journal of Educational Technology, 42(2), 233–250. https://doi.org/10.1111/j.1467-8535.2009.01029.x
Al-Husban, N. A. (2020). Critical thinking skills in asynchronous discussion forums: A case study.
International Journal of Technology in Education, 3(2), 82–91.
Anderson, T., Rourke, L., Garrison, R., & Archer, W. (2001). Assessing teaching presence in a computer conferencing context. https://auspace.athabascau.ca/handle/2149/725
Bernard, R., Borokhovski, E., Schmid, R., Tamim, R., & Abrami, P. (2014). A meta-analysis of blended learning and technology use in higher education: From the general to the applied. Journal of Computing in Higher Education, 26. https://doi.org/10.1007/s12528-013-9077-3
Boyd, R., & Pennebaker, J. (2015). A way with words: Using language for psychological science in the modern era (pp. 222–236).
Chowdhary, K. R. (2020). Natural Language Processing. In K. R. Chowdhary (Ed.), Fundamentals of Artificial Intelligence (pp. 603–649). Springer India. https://doi.org/10.1007/978-81-322-3972-7_19
Chowdhury, G. G. (2003). Natural Language Processing. Annual Review of Information Science and Technology, 37(1), 51–89. https://doi.org/10.1002/aris.1440370103
Crawford, J., Butler-Henderson, K., Rudolph, J., Malkawi, B., Glowatz, M., Burton, R., Magni, P., & Lam, S. (2020). COVID-19: 20 countries’ higher education intra-period digital pedagogy responses. Journal of Applied Learning & Teaching, 3(1), Article 1. https://doi.org/10.37074/jalt.2020.3.1.7
De Wever, B., Schellens, T., Valcke, M., & Van Keer, H. (2006). Content analysis schemes to analyze transcripts of online asynchronous discussion groups: A review. Computers & Education, 46(1), 6–28. https://doi.org/10.1016/j.compedu.2005.04.005
Dewey, J. (1910). How we think. D.C. Heath & Co.
Donnelly, R., & Gardner, J. (2011). Content analysis of computer conferencing transcripts. Interactive Learning Environments, 19(4), 303–315. https://doi.org/10.1080/10494820903075722
Galikyan, I., & Admiraal, W. (2019). Students’ engagement in asynchronous online discussion: The relationship between cognitive presence, learner prominence, and academic performance. The Internet and Higher Education, 43, 100692.
https://doi.org/10.1016/j.iheduc.2019.100692
Garrison, D. R. (2007). Online community of inquiry review: Social, cognitive, and teaching presence issues. Journal of Asynchronous Learning Networks, 11(1), 61–72.
Garrison, D. R., & Akyol, Z. (2013). Toward the development of a metacognition construct for communities of inquiry. The Internet and Higher Education, 17, 84–89. https://doi.org/10.1016/j.iheduc.2012.11.005
Garrison, D. R., Anderson, T., & Archer, W. (1999). Critical inquiry in a text-based environment: Computer conferencing in higher education. The Internet and Higher Education, 2(2–3), 87–105. https://doi.org/10.1016/S1096-7516(00)00016-6
Garrison, D. R., Anderson, T., & Archer, W. (2001). Critical thinking, cognitive presence, and computer conferencing in distance education. American Journal of Distance Education, 15(1), 7–23. https://doi.org/10.1080/08923640109527071
Garrison, D. R., & Arbaugh, J. B. (2007). Researching the community of inquiry framework: Review, issues, and future directions. The Internet and Higher Education, 10(3), 157–172. https://doi.org/10.1016/j.iheduc.2007.04.001
Garrison, D. R., & Cleveland-Innes, M. (2005). Facilitating cognitive presence in online learning: Interaction is not enough. American Journal of Distance Education, 19(3), 133–148. https://doi.org/10.1207/s15389286ajde1903_2
Garrison, D. R., & Kanuka, H. (2004). Blended learning: Uncovering its transformative potential in higher education. The Internet and Higher Education, 7(2), 95–105. https://doi.org/10.1016/j.iheduc.2004.02.001
Guo, P., Saab, N., Wu, L., & Admiraal, W. (2021). The Community of Inquiry perspective on students’ social presence, cognitive presence, and academic performance in online project-based learning. Journal of Computer Assisted Learning, 37(5), 1479–1493. https://doi.org/10.1111/jcal.12586
Hayati, H., Abdessamad, C., Idrissi, M., & Bennani, S. (2019).
Doc2vec & Naïve Bayes: Learners’ cognitive presence assessment through asynchronous online discussion TQ transcripts. International Journal of Emerging Technologies in Learning (iJET), 14, 70. https://doi.org/10.3991/ijet.v14i08.9964
Hayati, H., Khalidi Idrissi, M., & Bennani, S. (2020). Automatic classification for cognitive engagement in online discussion forums: Text mining and machine learning approach. In I. I. Bittencourt, M. Cukurova, K. Muldner, R. Luckin, & E. Millán (Eds.), Artificial Intelligence in Education (Vol. 12164, pp. 114–118). Springer. https://doi.org/10.1007/978-3-030-52240-7_21
Henri, F. (1992). Computer conferencing and content analysis. In A. R. Kaye (Ed.), Collaborative Learning Through Computer Conferencing (pp. 117–136). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-77684-7_8
Hew, K. F., & Cheung, W. S. (2014). Students’ and instructors’ use of massive open online courses (MOOCs): Motivations and challenges. Educational Research Review, 12, 45–58. https://doi.org/10.1016/j.edurev.2014.05.001
Hind, H., Khalidi Idrissi, M., & Bennani, S. (2018). Automatic assessment of CoI-cognitive presence within asynchronous online learning. 17th International Conference on Information Technology Based Higher Education and Training (ITHET), 1–5. https://doi.org/10.1109/ITHET.2018.8424791
Hoppe, H. U. (2017). Computational methods for the analysis of learning and knowledge building communities. In Handbook of learning analytics (First, pp. 23–33). Society for Learning Analytics Research (SoLAR).
Joksimović, S., Gašević, D., Kovanović, V., Adesope, O., & Hatala, M. (2014). Psychological characteristics in cognitive presence of communities of inquiry: A linguistic analysis of online discussions. The Internet and Higher Education, 22, 1–10. https://doi.org/10.1016/j.iheduc.2014.03.001
Knight, S., & Littleton, K. (2015). Discourse Centric Learning Analytics: Mapping the terrain. Journal of Learning Analytics, 2(1), Article 1.
https://doi.org/10.18608/jla.2015.21.9
Kovanović, V., Joksimović, S., Gašević, D., & Hatala, M. (2014). Automated cognitive presence detection in online discussion transcripts. CEUR Workshop Proceedings, 1137. https://www.research.ed.ac.uk/en/publications/automated-cognitive-presence-detection-in-online-discussion-trans
Kovanović, V., Joksimović, S., Waters, Z., Gašević, D., Kitto, K., Hatala, M., & Siemens, G. (2016). Towards automated content analysis of discussion transcripts: A cognitive presence case. Proceedings of the Sixth International Conference on Learning Analytics & Knowledge - LAK ’16, 15–24. https://doi.org/10.1145/2883851.2883950
Krippendorff, K. (2003). Content analysis: An introduction to its methodology. SAGE Publications.
McPartlan, P., Rutherford, T., Rodriguez, F., Shaffer, J. F., & Holton, A. (2021). Modality motivation: Selection effects and motivational differences in students who choose to take courses online. The Internet and Higher Education, 49, 100793. https://doi.org/10.1016/j.iheduc.2021.100793
Means, B., Toyama, Y., Murphy, R., & Baki, M. (2013). The effectiveness of online and blended learning: A meta-analysis of the empirical literature. Teachers College Record: The Voice of Scholarship in Education, 115(3), 1–47. https://doi.org/10.1177/016146811311500307
Moore, R. L., Oliver, K. M., & Wang, C. (2019). Setting the pace: Examining cognitive processing in MOOC discussion forums with automatic text analysis. Interactive Learning Environments, 27(5–6), 655–669. https://doi.org/10.1080/10494820.2019.1610453
Neto, V., Rolim, V., Ferreira, R., Kovanović, V., Gašević, D., Dueire Lins, R., & Lins, R. (2018). Automated analysis of cognitive presence in online discussions written in Portuguese. In V. Pammer-Schindler, M. Pérez-Sanagustín, H. Drachsler, R. Elferink, & M. Scheffel (Eds.), Lifelong Technology-Enhanced Learning (pp. 245–261). Springer International Publishing.
https://doi.org/10.1007/978-3-319-98572-5_19
Neto, V., Rolim, V., Pinheiro, A., Lins, R. D., Gašević, D., & Mello, R. F. (2021). Automatic content analysis of online discussions for cognitive presence: A study of the generalizability across educational contexts. IEEE Transactions on Learning Technologies, 14(3), 299–312. https://doi.org/10.1109/TLT.2021.3083178
Park, C. L. (2009). Replicating the use of a cognitive presence measurement tool. Journal of Interactive Online Learning, 8(2), 16.
Pena-Shaff, J. B., & Nicholls, C. (2004). Analyzing student interactions and meaning construction in computer bulletin board discussions. Computers & Education, 42(3), 243–265. https://doi.org/10.1016/j.compedu.2003.08.003
Pennebaker, J. W., Boyd, R. L., Jordan, K., & Blackburn, K. (2015). The development and psychometric properties of LIWC2015. https://doi.org/10.15781/T29G6Z
Pennebaker, J. W., Chung, C. K., Frazee, J., Lavergne, G. M., & Beaver, D. I. (2014). When small words foretell academic success: The case of college admissions essays. PLoS ONE, 9(12), e115844. https://doi.org/10.1371/journal.pone.0115844
Robinson, R. L., Navea, R., & Ickes, W. (2013). Predicting final course performance from students’ written self-introductions: A LIWC analysis. Journal of Language and Social Psychology, 32(4), 469–479. https://doi.org/10.1177/0261927X13476869
Rourke, L., Anderson, T., Garrison, D. R., & Archer, W. (2001). Methodological issues in the content analysis of computer conference transcripts. International Journal of Artificial Intelligence in Education, 11. https://auspace.athabascau.ca/handle/2149/715
Shea, P., & Bidjerano, T. (2010). Learning presence: Towards a theory of self-efficacy, self-regulation, and the development of a communities of inquiry in online and blended learning environments. Computers & Education, 55(4), 1721–1731. https://doi.org/10.1016/j.compedu.2010.07.017
Shea, P., & Bidjerano, T. (2012).
Learning presence as a moderator in the community of inquiry model. Computers & Education, 59(2), 316–326. https://doi.org/10.1016/j.compedu.2012.01.011
Shea, P., Hayes, S., Uzuner-Smith, S., Gozza-Cohen, M., Vickers, J., & Bidjerano, T. (2014). Reconceptualizing the community of inquiry framework: An exploratory analysis. The Internet and Higher Education, 23, 9–17. https://doi.org/10.1016/j.iheduc.2014.05.002
Siemens, G. (2013). Learning Analytics: The emergence of a discipline. American Behavioral Scientist, 57(10), 1380–1400. https://doi.org/10.1177/0002764213498851
Siemens, G., Gašević, D., & Dawson, S. (2015). Preparing for the digital university: A review of the history and current state of distance, blended and online learning. Athabasca University Press. https://research.monash.edu/en/publications/preparing-for-the-digital-university-a-review-of-the-history-and-
Tan, C. L., & Ng, L. L. (2014). Assessing critical thinking performance of postgraduate students in threaded discussions. In International Association for Development of the Information Society. International Association for the Development of the Information Society. https://eric.ed.gov/?id=ED557315
Van Wart, M., Ni, A., Medina, P., Canelon, J., Kordrostami, M., Zhang, J., & Liu, Y. (2020). Integrating students’ perspectives about online learning: A hierarchy of factors. International Journal of Educational Technology in Higher Education, 17(1), 53. https://doi.org/10.1186/s41239-020-00229-8
Vygotsky, L. S. (1962). Thought and language (E. Hanfmann & G. Vakar, Eds.). MIT Press. https://doi.org/10.1037/11193-000
Vygotsky, L. S. (1978). Mind in society: Development of higher psychological processes (M. Cole, V. John-Steiner, S. Scribner, & E. Souberman, Eds.). Harvard University Press. https://doi.org/10.2307/j.ctvjf9vz4
Zhao, Y., Lei, J., Yan, B., Lai, C., & Tan, H. S. (2005). What makes the difference? A practical analysis of research on the effectiveness of distance education.
Teachers College Record, 107(8), 1836–1884. https://doi.org/10.1111/j.1467-9620.2005.00544.x

Appendix A

During most weeks, there will be one class devoted to Discussion. (During other weeks, there will be two.) On Discussion days, we will read and discuss one or more scientific articles. When the readings are marked “***”, you must post a question or comment about one of these articles to the discussion forum on the Canvas site. Your question/comment should be at least one paragraph long, and should represent your grappling with the article.
• It might concern an aspect that you are not sure you understand. If so, it is not enough to say that you didn’t understand X – you must say what you think X means, and why you are unsure of your understanding.
• It might concern a problem with the theoretical claims, experimental design, or interpretation of the findings.
• It might draw a connection to another article that we have read, or to a finding or theory that you know of but that we have not covered.
• It might propose a new study, for example to rule out an alternate explanation or to build on the findings to address a new research question.
Your question/comment should be directed to the whole class, not to me. I will select a subset of these questions/comments for further discussion in class.

Appendix B

You are required to write a final paper of approximately 10 double-spaced pages, with at least 8 full pages of body text. (The rest can include a mandatory list of references, an optional title page, and figures and tables as appropriate.) The topic can be one we discussed in class. It can also be a topic that we did not discuss in class, but which is relevant to educational psychology, subject to instructor approval.
Your paper should review one aspect of the educational psychology literature in depth and provide a cohesive summary of conceptual and empirical advances.
It should contain the following sections: an Introduction identifying the literature of interest and the research questions it targets, a Review section summarizing relevant studies in a principled way, a Discussion section evaluating the current state of the literature, and a Future Directions section discussing outstanding empirical questions and suggesting future studies.
• If you feel more ambitious, your final paper can describe a small empirical study that you run to address an open question in cognitive psychology. The study should be co-designed with the instructor. Because this is a pilot study, the number of participants can be rather small (as few as five). The paper should contain the standard sections of an empirical paper:
  • an Introduction reviewing the relevant literature and motivating your research question (this will be relatively short and will review only 3 or 4 papers)
  • a Method section describing the mechanics of the study
  • a Results section describing your analyses of the data
  • a Discussion section interpreting the results
  • a Future Directions section suggesting possible follow-up experiments.