Cognitive Processing Language in Communities of Inquiry: An Examination of 

Cognitive Presence, Instruction Modality, and Academic Performance of Online Learners 

 
A THESIS 

SUBMITTED TO THE FACULTY OF 

UNIVERSITY OF MINNESOTA 

BY 

 
Samuel Bullard 

 
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS 

FOR THE DEGREE OF 

MASTER OF ARTS 

 
Keisha Varma, Ph.D., Advisor 

 
May 2024


© Samuel Bullard, 2024 



Acknowledgements 

I would first like to acknowledge my advisor, Dr. Keisha Varma, who supported 

and guided me throughout the process of writing this thesis. Next, I want to express my 

gratitude to my other committee members and thesis reviewers, Dr. Panayiota Kendeou 

and Dr. Seth Thompson, for dedicating their time to provide me with valuable feedback 

and encouragement. I would also like to thank Dr. Martin Van Boekel and Beryl 

Belmonte, whose assistance during the early stages of this project was instrumental in 

making this work possible. Lastly, I want to thank my family, who provided me with 

enduring support throughout this journey.



Abstract 

Within the Community of Inquiry framework, the cognitive presence of learners 

represents a critical feature of the knowledge construction process in online settings. In 

this study, we analyzed 1852 thematic units of college students’ online discussion posts 

using both manual coding of cognitive presence and automated linguistic analysis of 

cognitive processing language. Following these preliminary analyses, we developed a 

series of regression models to examine relations between cognitive presence, instruction 

modality, academic performance, and cognitive processing language. We compared four 

multilevel models using phases of cognitive presence and instruction modality as 

predictors of cognitive processing language. The final model comprised all four cognitive 

presence phases, excluding instruction modality. We also found an effect of this linguistic 

proxy measure on students’ academic performance. These findings have both practical 

and theoretical implications for future use of surface-level linguistic proxies to assess 

student learning in online discourse-based learning environments.



Table of Contents 

Acknowledgements

Abstract

Table of Contents

List of Tables

Chapter 1: Introduction

Chapter 2: Literature Review

Online Communities of Inquiry

Cognitive Presence

Discourse-Centric Learning Analytics

Chapter 3: Current Study

Chapter 4: Method

Participants

Materials and Data Collection

Study Procedure

Chapter 5: Results

Research Question 1

Research Question 2

Research Question 3

Chapter 6: Discussion

Cognitive Presence Model

Instruction Modality

Academic Performance

Limitations

Conclusion

References

Appendix A

Appendix B



List of Tables 

Table 1. Sampled text from online discussion threads in two undergraduate college

classes.

Table 2. Cognitive presence coding system used in manual content analysis.

Table 3. Cognitive processing subcategories and example target words.

Table 4. Descriptive statistics of cognitive processing language for each phase of

cognitive presence.

Table 5. Coefficients and standard errors for four candidate models predicting use of

cognitive processing language in online discussion posts of undergraduate college

students.

Table 6. Estimates of a model predicting students’ final term paper scores based on

cognitive processing words used in online learning discourse.



Chapter 1: Introduction 

Although student enrollment in blended or online course modalities has been a 

noted trend in higher education for years (Hew & Cheung, 2014), the compulsory shift to 

virtual learning during the COVID-19 pandemic magnified the need for pedagogical tools 

which can facilitate a level of student engagement analogous to in-person instruction 

(Adedoyin & Soykan, 2020; Crawford et al., 2020). One instrument for achieving this 

goal is instructors' use of asynchronous online discussion forums, in which students post 

written messages reflecting on learning material and interact with their classmates. 

However, the current research literature presents contradictory findings regarding 

the medium's effectiveness in promoting the cognitive skills (e.g., critical thinking, 

reflective inquiry) necessary for reaching satisfactory levels of academic

performance. On the one hand, some studies suggest student learning is enhanced in these 

forums via the co-construction of knowledge with peers (De Wever et al., 2006; Galikyan 

& Admiraal, 2019; Pena-Shaff & Nicholls, 2004). On the other hand, findings from other 

studies suggest that these interactions rarely go beyond a surface-level exchange of 

information (Al-Husban, 2020; Garrison, 2007; Garrison & Cleveland-Innes, 2005; Tan 

& Ng, 2014). This lack of consensus holds implications for both research and

instructional practice in higher education.

In the present study, we aimed to clarify these discrepancies by developing a 

model which provides empirical support for the identification of higher-order cognition 

in online discourse based on discrete, surface-level linguistic information. In addition, we 

compared the cognitive processing language used in discussion posts by students enrolled 

in the same college course, differentiated solely by the modality of its instructional 


content delivery (i.e., blended vs. entirely online). Finally, we modeled the association

between these cognitive processes and students' academic performance in the course. To 

do so, we integrated two distinct theoretical and methodological approaches: the 

Community of Inquiry (CoI) Framework (Garrison et al., 1999) and Learning Analytics 

(Siemens, 2013). It was our expectation that integrating these two approaches would result

in a more comprehensive understanding of the cognitive processes that are facilitated in 

online learning discourse. 



Chapter 2: Literature Review 

The educational research literature has consistently supported the notion that 

distance education is at least as effective in supporting positive student learning outcomes 

as traditional, face-to-face instruction (Siemens et al., 2015). However,

distance education is not a singularly defined set of instructional practices. Online 

mediums, like traditional classrooms, can create many different forms of learning 

opportunities. Provided sufficient technological support, instructors may teach their

courses entirely online using a variety of asynchronous (e.g., online discussion forums, 

pre-recorded lectures) or synchronous (e.g., live video conferencing software) resources. 

Alternatively, instructors might also consider combining elements of both online and 

face-to-face learning (i.e., blended learning; Siemens et al., 2015). Meta-analyses of 

studies comparing traditional, blended, and online learning have shown that, on average, 

student learning outcomes are enhanced by courses using blended modes of instruction, 

compared to those which are either only face-to-face or online (Bernard et al., 2014; 

Means et al., 2013; Zhao et al., 2005).  

Online Communities of Inquiry 

The Community of Inquiry framework (CoI) has persisted as a foundational 

theoretical guide for research on online teaching and learning. This framework was 

originally developed around the turn of the 21st century, a time when higher education 

saw greater pedagogical application for the 'world wide web' and computer-mediated 

conferencing technology (Garrison et al., 1999).  

The CoI framework describes three interrelated elements inherent to successful 

computer-mediated education: (1) cognitive presence, defined as the ability for learners 


to construct meaning via online communication (Garrison et al., 2001), (2) social 

presence, or the capacity for a computer-mediated learning environment to foster social 

and emotional connections (Garrison, 2007), and (3) teaching presence, characterized as 

pedagogical elements such as the curricular design or facilitation of instructional 

strategies within a virtual educational experience (Anderson et al., 2001). Some 

proponents of the CoI framework have also advocated for the inclusion of a fourth 

element reflecting the self-regulative behaviors inherent to online learning (i.e., “learning

presence”; Shea et al., 2014; Shea & Bidjerano, 2010, 2012). However, the integration of

this construct is somewhat contested given its conceptual similarities with elements of

cognitive presence and its failure to accommodate the collaborative nature of

co-constructing knowledge (Garrison & Akyol, 2013). Thus, the inclusion of learning

presence as a unique component in the CoI framework has not yet been broadly 

recognized by the empirical literature. 

Cognitive Presence 

Cognitive presence serves as the primary theoretical construct within the CoI 

framework for describing the cognitive processes required for text-based collaborative 

learning environments to engage learners in critical thinking and inquiry (Garrison et al., 

2001). Garrison (2007) characterized cognitive presence as "a cycle of practical inquiry 

where participants move deliberately from understanding the problem or issue through to 

exploration, integration and application" (p. 65). Research has consistently shown that the 

cognitive presence of learners within an online community is critical to their perceived 

learning, academic performance, and overall satisfaction with their online educational 

experience (Akyol & Garrison, 2011; Galikyan & Admiraal, 2019; Van Wart et al., 


2020). Given this influence, education researchers and practitioners alike have a

vested interest in developing means for evaluating the degree to which learners are

cognitively present during their online courses. 

Traditional methods for studying cognitive presence within text-based discourse 

commonly include manual content analysis (De Wever et al., 2006; Garrison et al., 1999; 

Kovanović et al., 2016), a method which Krippendorff (2003) described as "a research 

technique for making replicable and valid inferences from texts (or other meaningful 

matter) to the contexts of their use" (p. 18). To make such inferences regarding online 

learners' cognitive presence, the CoI framework uses the Practical Inquiry model to guide 

this process (Garrison et al., 1999). Inspired by Dewey (1910), this model describes 

cognitive presence in terms of a progression through four phases of inquiry: 

1. Triggering Events, in which an issue or problem is identified causing some 

unease among learners; 

2. Exploration, where students exchange relevant information or ideas related to 

the object of inquiry; 

3. Integration, or the synthesis of ideas deliberated on in the exploration phase; 

4. Resolution of the problem via hypothesis testing or application of ideas to new 

domains.  

Together, these four phases comprise the Practical Inquiry model, and each is described by a set

of indicators to facilitate the coding process (see Table 2).

Since the original publication of the CoI framework, the growth of technology 

and increasing ubiquity of the internet have made online interaction a pervasive feature of

our personal, professional, and social lives – a development which has undoubtedly had 


far-reaching implications for research on online learning. While employing manual 

content analysis may have been workable for early CoI research, the unprecedented 

amount of unstructured data (i.e., written text) made available to researchers because of 

widespread internet adoption presents critical challenges to the framework's

methodological feasibility in the current digital age (Boyd & Pennebaker, 2015). These 

concerns can be characterized in a variety of ways.  

For one, coding data by hand is a time-consuming and labor-intensive process, 

especially for large sets of data (Kovanović et al., 2016). In addition, manual coding 

largely depends on the interpretations of individual coders, making generalization across 

studies of cognitive presence difficult to assess (De Wever et al., 2006). Thus, despite the 

method offering a strong theoretical foundation for understanding cognitive presence in 

online communities, it is most often restricted to the retroactive analyses of research

teams, limited to small sample sizes, and of minimal practical utility for instructors – resulting in a

framework constrained in its ability to inform instructional design and practice (Donnelly 

& Gardner, 2011; Kovanović et al., 2016). Such methodological limitations require 

researchers to integrate new analytic approaches which allow for larger-scale analyses of 

the development of cognitive presence in online learning discourse. 

Discourse-Centric Learning Analytics 

The field of Discourse-Centric Learning Analytics focuses on the question of how 

discursive learning processes might be measured and eventually improved in online 

learning contexts (Knight & Littleton, 2015). Inspired by Vygotsky's (1962, 1978) 

emphasis on the relationship between language and thought, this discipline places 

particular importance on the linguistic activity of learners engaged in discourse, arguing 


that such activity "is not merely an indicator (or proxy) for deeper learning, [rather] it is 

often the site of that learning" (Knight & Littleton, 2015, p. 187). Given this sentiment, 

research on online learning has gradually shifted attention towards the automated 

processing of natural language features embedded within online discourse.  

Natural Language Processing (NLP) is broadly defined as a computational 

approach for analyzing and interpreting human languages (Chowdhury, 2003). With the 

massive corpus of text data provided by the world wide web, NLP techniques are 

especially useful in performing various functions, such as information retrieval/indexing, 

text classification, translation, knowledge acquisition, and others (Chowdhary, 2020; 

Chowdhury, 2003). These analyses can vary in complexity, ranging from simpler, rule-

based language processing to more advanced machine learning systems. Although, like 

manual content analysis, one of the primary functions of these techniques is to extract 

semantic information from learning content (such as discussion post transcripts), they differ

in their use of automation and computational capabilities (Hoppe, 2017).

The Linguistic Inquiry and Word Count software (LIWC) is one example of a

popular dictionary-based NLP tool (Pennebaker et al., 2015). The LIWC software 

classifies linguistic activity into various psychological categories by comparing words in 

a text input with a validated internal dictionary, identifying "target" words which are then 

quantified in relation to the overall text input. In contrast to manual content analysis, 

these analyses are automated, and therefore can be conducted with minimal user effort. 

On the front-end of the software, users simply upload a file containing their text input, 

select the desired word categories for the program to analyze, and receive numerical

output within a matter of seconds. As a result, this analytic software


might be a feasible option for educators seeking to monitor the level of cognitive 

presence of their students in real-time, or for researchers studying online Communities of 

Inquiry on a larger scale than previously thought possible.  
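
The dictionary-based procedure described above can be illustrated with a minimal sketch. The category names echo LIWC subcategories, but the word lists, the tokenizer, and the function names are hypothetical simplifications written for this example; they are not the LIWC dictionary or its implementation.

```python
import re

# Hypothetical toy dictionary for illustration only. The real LIWC
# dictionary is far larger and is not reproduced here.
TOY_DICTIONARY = {
    "insight": {"think", "know", "realize"},
    "causation": {"because", "effect", "hence"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_percentages(text, dictionary=TOY_DICTIONARY):
    """Return, per category, the percentage of all words that are target words."""
    tokens = tokenize(text)
    total = len(tokens)
    if total == 0:
        # An empty input has no words to score.
        return {category: 0.0 for category in dictionary}
    return {
        category: 100.0 * sum(token in words for token in tokens) / total
        for category, words in dictionary.items()
    }


post = "I think the effect occurred because students know the material"
scores = category_percentages(post)
print(scores)
```

In this toy input, two of the ten words fall in each category, so each category accounts for 20% of the text. The sketch matches whole words only, which is a simplifying assumption.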

Several studies have used LIWC measures in their examination of student 

learning outcomes. In their analysis of college admissions essays, Pennebaker et al. 

(2014) found strong associations between students' use of function words (e.g., articles, 

prepositions) and their college GPA. Similar associations have been found between

the linguistic features comprising students' written introductions of themselves and their

academic performance in their college coursework (Robinson et al., 2013). While 

informative, these studies sampled longer, less discursive forms of student writing from 

face-to-face learning environments, leading one to question how such a tool might be 

appropriated for the analysis of language in online, discussion-based learning 

environments.  

Further, Kovanović et al. (2014, 2016) demonstrated the potential of using these 

linguistic features for automated detection of cognitive presence phases using more 

computationally complex techniques such as random forests classification. Such efforts 

have since been expanded upon in subsequent literature, resulting in classification models 

with an appreciable degree of accuracy (Hayati et al., 2019, 2020; Hind et al., 2018; Neto 

et al., 2018, 2021). These combined findings suggest that particular LIWC measures may 

be promising candidates to further explore the cognitive presence of online learners, and

perhaps eventually expand instructors' ability to evaluate how well their online

discussion-based courses facilitate these learning processes in real time.

However, to reach this degree of accuracy, such methods required a substantial 


collection of linguistic features to train the classification algorithm, many of

which appear, on a conceptual level, wholly unrelated to the targeted construct (i.e.,

cognitive presence phase), resulting in theoretically questionable interpretations. To

operationalize the cognitive presence construct with linguistic proxies, and therefore open 

the possibility of automated assessment, greater confidence in the construct validity of 

such proxies must be established. Thus, research aimed at identifying linguistic proxies of 

cognitive presence might benefit from a priori selection of model features determined by 

their theoretical alignment with the targeted construct.  

This perspective was exemplified by Joksimović et al. (2014), who identified 

LIWC features which are both conceptually and empirically affiliated with an 

individual’s cognitive processing in online discourse. In that study, the researchers selected

several LIWC features thought to be associated with increased cognitive load (e.g.,

exclusive and causal words) and evaluated the prevalence of these features in online

discussion posts relative to the students' observed phase of cognitive presence. As

hypothesized, Joksimović et al. (2014) observed a higher concentration of these words as

students progressed toward the latter phases of cognitive presence (i.e., integration and 

resolution).  

Shortly following the publication of Joksimović et al.'s (2014) findings, an 

updated version of the LIWC software was released (LIWC2015). This update

expanded the tool's internal dictionary to introduce several novel linguistic features, 

including the cognitive processing category. Consistent with many of the features 

analyzed by Joksimović et al. (2014), cognitive processing is a composite of several word 

subcategories related to an individual's verbal expression of insight, causation, 


discrepancy, tentativeness, certainty, and differentiation (Pennebaker et al., 2015; see 

Table 3). Yet aside from one isolated recent study (Moore et al., 2019), the cognitive 

presence literature has lacked a specific focus on this new linguistic feature since its 

addition to LIWC dictionaries. Additionally, as the technologies for large-scale

automated language processing become more advanced, several questions remain

regarding their application to the CoI framework, including how cognitive processing

language relates to specific instruction modalities and learning outcomes. These open

questions prompted the present study.



Chapter 3: Current Study

The present study sought to address three primary questions regarding the role of 

linguistic features within an undergraduate-level online learning environment. 

Specifically, we asked the following:  

1. To what extent are the linguistic features of student discussion posts, defined as 

cognitive processes, viable indicators of cognitive presence phases as described in 

the CoI framework? 

2. To what extent does variation in the instruction modality (blended vs. fully 

online) of a college course influence students' cognitive processing language in

online discussion forums?  

3. To what extent does cognitive processing language influence students' academic 

performance in an online course?  

To answer these questions, we used a combination of manual content analysis and 

automated natural language processing, techniques which are further described in 

the following chapters.



Chapter 4: Method 

In this section, we describe the research context and sample used in this study, the 

procedures for analyzing these data, as well as the statistical procedures used to model

relationships between variables.  

Participants 

At a large research university in the Midwestern U.S., two classes of 

undergraduate college students (N = 53) enrolled in an introductory course covering the 

theoretical and methodological foundations of Educational Psychology. Students 

voluntarily enrolled in this course to receive academic credit. These student participants 

had self-selected into two separate offerings of this course, one of which was held in the 

Fall Semester of 2019 ('Semester A'; n = 33), while the other was held in the Fall 

Semester of 2020 ('Semester B'; n = 20). Although the course materials and instructor 

remained consistent between the two semesters, the mode of instruction differed because 

of public health restrictions which arose from the COVID-19 pandemic. Semester A was 

held in a blended learning format with face-to-face lectures and asynchronous online 

discussion. Semester B was held in a fully online format, with lectures delivered over 

web conferencing software and, like Semester A, included students taking part in 

asynchronous online discussions. 

Materials and Data Collection 

The following sections detail the materials and data collection process used in this 

study. All raw data (i.e., discussion post transcripts, student grades) were collected from 

the online learning management system used by the instructor of both semesters of the 

course. Note that all data were collected with a waiver of consent because the local 


Institutional Review Board deemed this study to be of minimal risk to participants. Thus, 

specific demographic information of the study participants could not be collected. 

Discussion Posts 

In both offerings of this course (regardless of modality), the instructor assigned 

weekly required readings for students to reflect on with their peers in the asynchronous 

online discussion forum. These readings consisted of theoretical or empirical research 

articles which held considerable influence in the field of educational psychology. Each 

week, students were expected to contribute a discussion post of at least one paragraph

describing their thoughts about the article. To encourage students' critical

thinking skills, the instructor included a list of open-ended prompts for students to base 

their discussion posts on. These prompts included seeking clarification (when necessary), 

raising questions/issues with aspects of the article, drawing connections between the 

research findings and other topic areas/research studies, and proposing alternative 

solutions or explanations for the study's findings. Students were also expected to read and 

respond to their peers' posts, but these responses were not included as a part of their 

discussion grade. The precise language used by the instructor for these discussions is 

provided in Appendix A.  

Students' discussion post transcripts were copied and pasted from the university's 

online learning platform to word processing software and prepared for analysis. Across 

12 weekly discussion threads, students contributed 626 posts (M = 11.81, SD = 2.25) 

throughout both offerings of the course. As described in greater detail in the following 

sections, these discussion posts were analyzed using manual content analysis of cognitive 

presence, as well as automated linguistic analysis of cognitive processing language. 



Assessment Data 

During the final weeks of the course, students were expected to write an in-depth 

analysis concerning a research topic relevant to the ones discussed throughout the course. 

This assessment could take one of two forms: either a comprehensive literature review or 

a hypothetical research proposal of the chosen subject area (see Appendix B). Student 

scores on these final term papers were recorded from the university's online gradebook 

for analysis of the third research question, which inquired about the relations between 

students' academic performance and cognitive processing language in online discourse.  

We operationalized academic performance via this assignment for two reasons. 

First, as the instructor did not administer a formal final exam, this paper accounted for the 

largest proportion of each student's final course grade (30%). Second, the instructor's

outlined expectations for this assignment were well-aligned with both the subjects and 

reflective style of writing encouraged in the asynchronous online discussion forums – 

suggesting that students' scores on this assignment would serve as an effective

summative assessment of their learning from participation in online discussions. Note 

that grade data were unavailable for three students in Semester A and one student in 

Semester B, requiring all data associated with these four students to be omitted from the 

analysis of the third research question. 

Study Procedure 

In the following sections, we outline the procedures used to assess the cognitive 

variables underlying students' written reflections in their discussion posts. These analyses

included (a) the manual coding of students' cognitive presence according to the Practical

Inquiry model, and (b) automated analysis of students' use of cognitive processing


language through the LIWC software. However, given the unstructured nature of 

discussion post transcripts, a few prerequisite decisions had to be made prior to 

conducting these analyses. This involved both cleaning of the texts as well as segmenting 

individual discussion posts into an appropriate unit of analysis.   

Text Cleaning 

Basic text cleaning measures were applied to prepare our sample for linguistic 

analysis, such as the correction of common spelling errors and removal of URLs from 

student discussion posts. While doing so, we noticed that students frequently 

incorporated direct quotations from their assigned readings. We were concerned that 

including extensive quotations could yield LIWC output that was skewed toward the 

language used in the readings, rather than the students' original writing. A paired-samples 

t-test was conducted to compare the cognitive processing values in a sample of thematic 

units with and without quotations included in the LIWC input. On average, thematic units 

with the quotations included in LIWC input had higher cognitive processing values (M = 

18.35, SD = 4.73) compared to the same units when quotations were removed (M = 

17.26, SD = 6.12). This difference of 1.09 was statistically significant, t(53) = 2.57, p <

.05. Thus, quotes exceeding five words were replaced by a tag that the LIWC software 

would not register in its analysis. 
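
The quotation-replacement step can be sketched as follows. This is one possible implementation under stated assumptions, not the procedure used in this study: the tag string `[QUOTE]`, the regular expression, and the function name are hypothetical, and whether a given tag is ignored depends on the tokenizer of the analysis software.

```python
import re

# Hypothetical placeholder tag; the tag actually used is not specified here.
QUOTE_TAG = "[QUOTE]"

# Matches a span enclosed in straight or curly double quotation marks.
QUOTE_PATTERN = re.compile('["“]([^"“”]*)["”]')


def replace_long_quotes(text, max_words=5, tag=QUOTE_TAG):
    """Replace quoted spans longer than max_words words with a tag."""

    def substitute(match):
        quoted = match.group(1)
        if len(quoted.split()) > max_words:
            return tag
        # Short quotations are left untouched, quotation marks included.
        return match.group(0)

    return QUOTE_PATTERN.sub(substitute, text)


example = (
    'The authors state "learning is an active process of constructing '
    'knowledge" and call it "active learning" here.'
)
cleaned = replace_long_quotes(example)
print(cleaned)
```

Here the eight-word quotation is replaced by the tag, while the two-word quotation is kept because it does not exceed the five-word threshold.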

Unit of Analysis  

Many schools of thought exist regarding the ideal unit for the analysis of online 

discussion post transcripts. In their review of relevant literature, Rourke et al. (2001) note 

the most common units used in these studies include the whole message, individual 

sentences, and thematic units. Following recommendations by De Wever et al. (2006), we 


chose a unit of analysis based on considerations of our specific study's context.  

Specifically, we observed that students would often include several distinct insights about 

the weekly discussion topic within a single post, rather than making multiple separate 

posts for each unique idea. Based on this observation, we rejected the whole message unit 

of analysis, believing that assigning one code to a discussion post in its entirety would 

conceal potentially meaningful information about the learner’s cognitive presence. 

Additionally, because the LIWC software calculates cognitive processing values in 

proportion to the total word count of the text input, smaller text inputs result in less 

reliable measurements (Pennebaker et al., 2015), suggesting that using individual 

sentences would diminish the confidence in our linguistic analysis. Thus, we ultimately 

segmented discussion transcript data according to thematic units, or 'units of meaning' 

(Henri, 1992).  

Discussion post data were segmented into thematic units according to a protocol 

developed by the research team. This protocol instructed coder(s) to read the entirety of a 

student's discussion post several times. Once a firm understanding of the post was 

established, coder(s) would independently identify the primary arguments or ideas 

brought up by the author of the post. These coders would then segment these passages at 

specific points where a student appeared to transition from one idea to the next 

rhetorically. These transitions were often (but not exclusively) signified by linguistic 

markers such as "However" or "On the other hand." To ensure confidence in this 

segmentation protocol, a sample of 31 student discussion posts was used to train and 

establish reliability between the two researchers assigned to segment the data. Once an 

acceptable inter-rater reliability was reached (IRR = .84), the two researchers segmented 


the remaining discussion post data. This resulted in 1852 total thematic units, which 

served as the final sample for both cognitive presence coding and the automated 

linguistic analysis. See Table 1 for summary information regarding the sample.  

Table 1 

Sampled text from online discussion threads in two undergraduate college classes. 

                           Discussion Posts           Thematic Units
Topic                      Semester A   Semester B    Semester A   Semester B
Active Learning            32           20            96           64
Growth Mindset             30           14            95           39
Problem-Solving            34           23            93           65
Learning by Teaching       37           19            104          58
Concreteness Fading        30           17            92           66
Inquiry-Based Learning     32           19            90           61
Discovery Learning         30           20            84           63
Concepts & Categories      31           22            90           63
Distributed Learning       32           19            89           65
Stereotype Threat          36           21            102          58
Prior Knowledge            34           16            99           47
Mental Models              33           25            92           77
Total (both semesters)     626                        1852

Note. Semester A contained 33 students; Semester B contained 20 students. 

Qualitative Analysis 

We employed a combination of deductive and inductive approaches for coding 

phases of cognitive presence within students’ discussion posts. Researchers were trained 

on the coding system first established by Garrison et al. (1999) and later revisited by Park 

(2009). This system is based on the previously described Practical Inquiry model and 

provided researchers with indicators and examples reflecting the different phases of 

cognitive presence. Coders initially coded a random subset of the data (N = 148) using 

this set of a priori codes. After code comparison and discussion, a few minor adjustments 

were made to the original coding system to better align with trends noted in the dataset.  

For example, there would be occasional instances where a unit could not be accurately 


characterized by any of the indicators in the Practical Inquiry model. These units were 

typically either the exchange of social pleasantries or information wholly irrelevant to the 

subject of discussion. If both coders agreed that a particular unit could not be sufficiently 

described by any of the indicators in the Practical Inquiry model, they would code said 

units as “Other”. Codes were mutually exclusive, in that a single observation could not be 

considered both a triggering event and exploration, for example.  

Following this revision, coders completed another round of coding to assess 

inter-rater reliability. After reaching a sufficient level of agreement (IRR = .93; Cohen’s κ = 

.85), the two coders subsequently coded the remaining data. Table 2 represents the final 

coding scheme used to conduct the qualitative analyses.  
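
For readers unfamiliar with these agreement statistics, the sketch below shows how simple percent agreement and Cohen's kappa can be computed for two coders. It assumes that the IRR values reported here refer to percent agreement, which the text does not state explicitly; the function names and the toy codes in the usage note are hypothetical.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of units to which both coders assigned the same code."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of each coder's marginal proportions, per code.
    categories = set(coder_a) | set(coder_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    return (p_observed - p_expected) / (1 - p_expected)
```

With toy codes such as `["T", "E", "I", "R", "E", "E"]` and `["T", "E", "I", "R", "I", "E"]`, the two coders agree on five of six units, while kappa is lower because some of that agreement would be expected by chance.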

Table 2 

Cognitive presence coding system used in manual content analysis. 

Code                      Definition                                 Example
Triggering Event
  Sense of Puzzlement     Expressing confusion, unease               “This was confusing!”
  Expressing Interest     Expressing interest, intrigue, etc.        “It’s so fascinating how…”
  Clarification           Effort to ensure correct understanding     “What did they mean by…”
  Restating               Summarizing a previously made point        “On page 498, they claim…”
Exploration
  Information Exchange    Adding new information                     “Yesterday I learned that…”
  Agree/Disagree          Unsubstantiated (dis)agreement             “I agree.”
  Personal Narrative      Sharing relevant personal experiences      “In high school, I…”
  Opinion                 Expressing a belief or attitude            “I disliked how…”
Integration
  Connect/Build-On        Connecting or expanding on ideas           “This seems related to…”
  Explain/Solve           Offering explanation/solution to issue     “We could fix this by…”
  Agree/Disagree          Substantiated (dis)agreement               “I agree because…”
Resolution
  Thought Experiment      Well-structured, hypothetical reasoning    “Imagine if…”
  Apply/Test/Defend       Reasoning for supporting idea/solution     “It would work because…”
  Follow-Up Inquiry       Questions based on new understanding       “Considering this, how…?”

Note. Text segments which could not be characterized by any of these indicators were coded as “Other”. 


Natural Language Processing 

Following our coding of cognitive presence, we processed these 1852 thematic 

units of student discussion post data through the LIWC software to identify relative levels 

of cognitive processing language. As previously described, the cognitive processing 

category is a composite of 797 target words associated with the expression of insight, 

causation, discrepancy, certainty, differentiation, and tentativeness (Pennebaker et al., 

2015). See Table 3 for examples of words included in this LIWC category. 

Table 3 

Cognitive processing subcategories and example target words. 

LIWC Category          Examples             Words in Category
Cognitive Processes    Cause, know, ought   797
  Insight              Think, know          259
  Causation            Because, effect      135
  Discrepancy          Should, would        83
  Tentative            Maybe, perhaps       178
  Certainty            Always, never        113
  Differentiation      Hasn’t, but, else    81

Note. Adapted from Pennebaker et al. (2015). LIWC = Linguistic Inquiry and Word Count (2015) software. 

The LIWC program processed these data and provided us with numerical values 

that quantify the degree to which students used language reflective of these 

processes. These values were then used in a series of regression analyses to investigate 

relations between cognitive processing language, cognitive presence, instructional 

modality, and students' academic performance in the course.
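
The logic of a dictionary-based score can be sketched in a few lines of Python. The word list below is a toy set drawn from the example words in Table 3; it is not the LIWC2015 dictionary, and the tokenizer and function name are simplifying assumptions. The actual software relies on a much larger dictionary and its own matching rules.

```python
import re

# Toy word list for illustration only; NOT the LIWC2015 dictionary.
TOY_COGPROC = {
    "think", "know", "because", "effect", "should", "would",
    "maybe", "perhaps", "always", "never", "but", "else",
}


def cogproc_percent(text, dictionary=TOY_COGPROC):
    """Percentage of tokens in `text` that appear in `dictionary`."""
    # Lowercase the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(token in dictionary for token in tokens)
    return 100.0 * hits / len(tokens)
```

Because the score is a proportion of the total word count, a single matched word shifts the value far more in a short input than in a long one, which is the reliability concern that motivated our choice of thematic units over individual sentences.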


Chapter 5: Results 

Prior to the analysis of specific research questions, we used descriptive statistics 

to examine general trends in the dataset. As illustrated in Table 4, students used a 

relatively high percentage of cognitive processing language on average, which also 

appeared to vary depending on the researcher-coded phase of cognitive presence. Among 

these, the exploration phase of cognitive presence was the most frequently occurring code 

in the data (29%), followed by integration (27%), triggering events (21%), resolution 

(17%), and “Other” (6%). Although less common, thematic units coded as the resolution 

phase of cognitive presence also yielded the highest values of cognitive processing 

language compared to the other three phases described in the Practical Inquiry model. 

Table 4 

Descriptive statistics of cognitive processing language for each phase of cognitive 

presence. 

Phase               Thematic Units   M       SD
Triggering Event    396              18.49   6.94
Exploration         539              19.10   6.44
Integration         505              18.86   5.44
Resolution          309              20.26   5.44
Other               103              16.49   8.23

Research Question 1 

After examining these descriptive statistics, we turned to the investigation of our 

first research question: To what extent are the linguistic features of student discussion 

posts, defined as cognitive processes, viable indicators of higher-level cognitive presence 

described by the CoI framework?  

To answer this question, we developed a series of candidate models estimating the 

effects of each phase of cognitive presence on students’ cognitive processing language 


scores. However, given the inherently nested structure of the data set, the standard 

assumption of independent observations required for multiple linear regression could not 

be met, so linear mixed modeling was used instead. Specifically, each candidate model 

incorporated nested random effects to account for both the individual differences between 

students as well as unwanted variability introduced as a result of repeated observations.  
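
One way to write a model of this kind is given below. The notation is an assumption made for illustration: it supposes random intercepts for students and for discussion posts nested within students, matching the three variance components (Discussion, Student, Residual) reported for the candidate models. Here y is the cognitive processing score of thematic unit i, written in discussion post k by student j, and each Phase term is a 0/1 indicator for one of the four cognitive presence phases.

```latex
y_{ijk} = \beta_0 + \sum_{p=1}^{4} \beta_p \, \mathrm{Phase}_{p,ijk}
          + u_{j} + v_{k(j)} + \varepsilon_{ijk},
\qquad
u_{j} \sim \mathcal{N}\!\left(0, \sigma^2_{\mathrm{Student}}\right), \quad
v_{k(j)} \sim \mathcal{N}\!\left(0, \sigma^2_{\mathrm{Discussion}}\right), \quad
\varepsilon_{ijk} \sim \mathcal{N}\!\left(0, \sigma^2_{\mathrm{Residual}}\right)
```

Under this parameterization, the intercept corresponds to units coded as "Other", and each phase coefficient is the expected difference from that baseline.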

In each model, we included the fixed effect(s) of cognitive presence phase(s) 

predicting the degree to which an observation contained language associated with 

cognitive processing. The first developed model (Model A) comprised all four phases of 

cognitive presence. We then employed backward elimination to identify whether this 

fully specified model could be reduced by sequentially eliminating lower-level variables 

without excessively compromising model fit. Specifically, we began by eliminating the 

term associated with the triggering event phase, followed by exploration, and finally, 

integration. From this elimination procedure, three additional candidate models were 

produced, and inspection of diagnostic plots indicated that the assumptions for 

regression analyses were reasonably met. After fitting each candidate model, we 

examined corrected Akaike Information Criterion (AICc) values to identify the model of 

best fit.  
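
The small-sample correction behind AICc can be sketched as follows. The function name is hypothetical, and the parameter count used in the usage note (k = 8 for the full model: five fixed-effect coefficients plus three variance components) and the choice of n = 1852 thematic units as the sample size are assumptions made for illustration.

```python
def aicc(aic: float, k: int, n: int) -> float:
    """Corrected AIC: AIC plus the penalty 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1.")
    return aic + (2 * k * (k + 1)) / (n - k - 1)
```

For example, `aicc(12016.3, k=8, n=1852)` adds 144 / 1843, or less than 0.1, to the AIC of the full model. With a sample this large relative to the number of parameters, the correction is negligible, so AIC and AICc would be expected to rank the candidate models in the same order.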

See Table 5 for coefficient-level estimates, variance components, and information 

criteria used for model comparison. At the coefficient-level, intercepts represent the 

approximate means for observations not coded as any cognitive presence phase, whereas 

model estimates for each phase indicate the estimated change in cognitive processing 

values relative to its intercept. 
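
To make this interpretation concrete, the sketch below converts the Model A estimates from Table 5 into model-implied means for each phase by adding each coefficient to the intercept. The dictionary layout and function name are illustrative; the numeric values are the rounded estimates reported in the table.

```python
# Rounded fixed-effect estimates for Model A, as reported in Table 5.
MODEL_A = {
    "Intercept": 16.43,
    "Triggering Event": 2.05,
    "Exploration": 2.74,
    "Integration": 2.33,
    "Resolution": 3.88,
}


def predicted_means(coefficients):
    """Model-implied mean per phase: intercept plus the phase estimate."""
    intercept = coefficients["Intercept"]
    means = {"Other": intercept}
    for phase, estimate in coefficients.items():
        if phase != "Intercept":
            means[phase] = round(intercept + estimate, 2)
    return means
```

Calling `predicted_means(MODEL_A)` places resolution highest, followed by exploration, integration, and triggering events, with "Other" units at the intercept. These model-implied means sit close to the raw descriptive means in Table 4.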


Table 5 

Coefficients and standard errors for four candidate models predicting use of cognitive 

processing language in online discussion posts of undergraduate college students. 

                      Cognitive Processing Language
                      Model A        Model B        Model C        Model D
Fixed Effects
  Intercept           16.43 (0.64)   18.06 (0.34)   18.64 (0.27)   18.67 (0.25)
  Triggering Event    2.05 (0.68)
  Exploration         2.74 (0.66)    1.11 (0.38)
  Integration         2.33 (0.67)    0.69 (0.39)    0.12 (0.34)
  Resolution          3.88 (0.70)    2.25 (0.45)    1.67 (0.41)    1.63 (0.39)
Random Effects
  σ² Discussion       1.58           1.48           1.32           1.58
  σ² Student          1.80           1.79           1.78           1.78
  σ² Residual         35.66          35.94          36.26          35.67
Goodness of Fit
  AIC                 12016.3        12023.4        12029.8        12027.9
  BIC                 12060.5        12062.1        12063.0        12063.0

Note. All models were fitted using Maximum Likelihood Estimation. Standard errors in parentheses. AIC = Akaike 

Information Criterion. BIC = Bayesian Information Criterion. 

These comparisons revealed that, given the data and other candidate models, 

Model A demonstrated the strongest empirical evidence of predicting cognitive 

processing language. This model consisted of all four phases included in the practical 

inquiry model, all of which contributed to a meaningful degree of variation in cognitive 

processing scores. This model shows that higher levels of cognitive processing language 

are associated with the more advanced phases of cognitive presence. Specifically, it 

predicts the highest use of cognitive processing language for student messages 

characterized as the resolution phase of cognitive presence, followed by exploration, 

integration, and triggering event(s). Model comparisons also revealed that including the 

lower-level parameters (i.e., triggering event and exploration) did not contribute to 

unnecessary model complexity.  

As previously mentioned, the data set incorporated multiple observations of 


cognitive processing language for each student’s discussion post, and each student 

provided multiple discussion post contributions. Thus, it was necessary to understand 

how much of the variation in cognitive processing language scores is due to individual 

differences in students’ writing style or the level of cognitive processing language which 

a particular discussion post might have elicited compared to others. In the adopted model 

(Model A), the writing style characteristic to individual students accounted for about 5% 

of the total variance in cognitive processing language, while about 4% of the variance in 

cognitive processing language in the data was associated with discussion post-level 

differences. 
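
These percentages follow from dividing each variance component by the total variance. A minimal sketch using the rounded Model A variance components from Table 5 (the function name is hypothetical):

```python
def variance_shares(components):
    """Proportion of total variance attributable to each component."""
    total = sum(components.values())
    return {name: value / total for name, value in components.items()}


# Rounded variance components for Model A, as reported in Table 5.
MODEL_A_VARIANCE = {"Discussion": 1.58, "Student": 1.80, "Residual": 35.66}
shares = variance_shares(MODEL_A_VARIANCE)
```

Here the student component is 1.80 / 39.04, or roughly 5% of the total, and the discussion component is 1.58 / 39.04, or roughly 4%. The remaining variance, a little over 90%, is residual variation between thematic units.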

Research Question 2 

After selecting a model predicting cognitive processing language based on 

cognitive presence phases, we followed a similar process to address the second research 

question: To what extent does variation in the instruction modality (blended vs. fully 

online) of a college course influence students’ cognitive processing language in online 

discussion forums?  

Analysis of this question included the development and evaluation of a model 

that incorporated the effect of instructional modality on cognitive processing language 

(Model E). Like the model adopted in the previous analysis, Model E included all four 

fixed effects of cognitive presence phases as well as a nested effect structure. However, 

unlike Model A, we incorporated an additional fixed effect: a binary predictor reflecting 

whether a particular thematic unit was written by a student in either the blended or fully 

online version of the course (1 = blended; 0 = fully online).  

Following model development, two criteria were employed in deciding whether to 


keep instruction modality as a parameter in the final adopted model. First, we conducted 

a Likelihood Ratio Test to compare relative goodness-of-fit between Model A and Model 

E. Results from this test showed that adding the instruction modality parameter failed to 

explain a meaningful amount of variation in cognitive processing language scores, χ²(1) 

= 0.159, p = .689. In addition, AICc value comparison between the two models suggested 

that the cost of including this additional parameter outweighed the impact of the 

parameter’s effect. Given these two criteria, we determined that Model A 

remained the strongest candidate model for our data.  
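
The p-value for a likelihood ratio test with one degree of freedom can be recovered from the test statistic using only the Python standard library. The sketch below relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable; the function names are hypothetical, and the log-likelihoods of the two models are not reported here, so only the second function is applied to our result.

```python
import math


def lrt_statistic(loglik_reduced: float, loglik_full: float) -> float:
    """Likelihood ratio statistic for two nested models fitted by ML."""
    return 2.0 * (loglik_full - loglik_reduced)


def lrt_p_value_df1(chi_square: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # P(X > x) for X ~ chi-square(1) equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_square / 2.0))
```

Applied to the reported statistic, `lrt_p_value_df1(0.159)` returns approximately .69, in line with the p-value above; a statistic of about 3.84 would be needed to reach the conventional .05 threshold.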

Research Question 3 

We then sought to address our final research question: To what extent does 

cognitive processing language influence students' academic performance in an online 

course? We employed a simple linear regression to test if students’ use of cognitive 

processing language in online discourse predicted their performance on a major written 

assessment. The predictor, or independent variable in this model, comprised the sum of 

all cognitive processing words used by students across all their individual contributions to 

the online discussion forum (M = 485.20, SD = 131.06). The outcome, or dependent 

variable in this model, was students’ scores on the final term paper, an assessment 

which comprised 30% of their final grade in the course. Students could earn a maximum 

of 300 points (M = 254.71, SD = 44.47). Note that assessment data from four of the fifty-

three student participants included in the prior analyses were missing, requiring their 

exclusion from this regression (N = 49).  

The developed model explained a statistically significant proportion of the 

variance in final term paper scores, R² = 0.096, F(1, 47) = 4.97, p < .05. At the coefficient 


level, each additional cognitive processing word used by students in their discussion 

posts was associated with a 0.10 increase in points earned on their final term paper, a 

finding which was deemed statistically significant (α = 0.05; see Table 6). 
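
As a consistency check on the reported fit statistics, the F statistic of a regression can be reconstructed from its R² and degrees of freedom. The function below is a hypothetical helper, and the inputs are the rounded values reported above, so small discrepancies from the published F value are expected.

```python
def f_from_r_squared(r_squared: float, n: int, predictors: int = 1) -> float:
    """F statistic implied by R-squared for a model with an intercept."""
    df_model = predictors
    df_residual = n - predictors - 1
    return (r_squared / df_model) / ((1.0 - r_squared) / df_residual)
```

With R² = 0.096 and N = 49, the residual degrees of freedom are 47 and the implied F is about 4.99, which agrees with the reported F(1, 47) = 4.97 once rounding of R² is taken into account.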

Table 6 

Estimates of a model predicting students’ final term paper scores based on cognitive 

processing words used in online learning discourse. 

                                                95% CI
Effect                       Estimate   SE      LL       UL       p
Intercept                    203.79     23.64   156.24   251.34   < .001
Cognitive Processing Words   0.10       0.05    0.01     0.20     .031

Note. N = 49. CI = confidence interval; LL = lower limit; UL = upper limit. 

On a larger scale, the model estimates predict a near 14-point increase (β = 13.76) 

in final term paper scores for each standard deviation increase in cognitive processing 

language (SD = 131.06). Considering this assessment was graded out of 300 points, a 

difference of 14 points accounts for nearly half a letter grade (i.e., receiving an “A” over 

a “B” grade, etc.). In addition, a follow-up analysis using a Welch t-test found the 

difference in term paper scores between the two instruction modality groups to be 

non-significant, t(44.2) = −0.564, p = .576, suggesting that variation in instructional modality was unlikely to have 

confounded the effect of cognitive processing language on academic performance.
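
The per-standard-deviation effect is the raw slope multiplied by the standard deviation of the predictor. The short sketch below works through this arithmetic with the rounded values reported in this chapter; the constant and function names are hypothetical. Because the slope in Table 6 is rounded to two decimals, the product differs slightly from the 13.76 points reported above, which was presumably computed from the unrounded estimate.

```python
SLOPE_AS_REPORTED = 0.10     # points per cognitive processing word (Table 6)
SD_COGPROC_WORDS = 131.06    # SD of cognitive processing words per student
PER_SD_AS_REPORTED = 13.76   # points per SD, as reported in the text
MAX_POINTS = 300             # maximum score on the final term paper


def per_sd_change(slope: float, sd: float) -> float:
    """Expected change in the outcome for a one-SD increase in the predictor."""
    return slope * sd


def implied_slope(per_sd: float, sd: float) -> float:
    """Raw slope implied by a reported per-SD change."""
    return per_sd / sd


def share_of_scale(points: float, max_points: float) -> float:
    """Point difference expressed as a proportion of the grading scale."""
    return points / max_points
```

Using these values, the rounded slope gives about 13.1 points per standard deviation, the reported 13.76 implies an unrounded slope near 0.105, and 13.76 points corresponds to roughly 4.6% of the 300-point scale.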


Chapter 6: Discussion 

Overall, we found a high level of cognitive processing language in students’ 

contributions to online discussions. Across all units, cognitive processing language 

comprised 16–20% of the total word count. To contextualize this finding, Pennebaker et 

al. (2015) reported that, in a corpus of both online and physical texts, cognitive 

processing language only comprised about 11% of the total word count. For cognitive 

presence, the most commonly observed phase was exploration (29%), closely followed 

by integration (27%). Considering that this finding is consistent with prior research in the 

CoI framework (Garrison et al., 1999; Joksimović et al., 2014; Park, 2009), we can 

conclude that the forms of engagement most characteristic of online discussion forums 

include the exchange of information/brainstorming ideas and the synthesis of said 

information/ideas into more coherent representations. However, it is important to note 

that triggering events (21%) and resolution (17%) were also semi-regularly occurring 

phases. 

Cognitive Presence Model 

In the first research question, we sought to operationalize online learners’ 

cognitive presence via linguistic proxies generated by the LIWC software. To do so, we 

manually coded thematic units from student discussion posts according to the practical 

inquiry model of cognitive presence. Subsequently, we developed a series of 

theoretically informed candidate models using combinations of the four cognitive 

presence codes as predictors of cognitive processing language. Examination of model 

evidence (AICc values) revealed that all four phases of cognitive presence were 

important for predicting cognitive processing language scores. Our findings suggest that 


the automated analyses of discussion transcripts can produce linguistic proxies for the 

phase of cognitive presence, perhaps avoiding the feasibility concerns associated with 

manual content analysis of large sets of data (Kovanović et al., 2016). 

Our model suggests that the amount of cognitive processing language used by 

students in online discourse depends on their progression through the cycle of Practical 

Inquiry. Between the four phases of cognitive presence, the resolution phase was 

associated with the greatest use of cognitive processing language in online learning 

discourse, followed by exploration, integration, and triggering events. The Practical 

Inquiry model positions resolution as the highest level of cognitive presence, one that is 

associated with deep-level learning and critical thinking (Garrison et al., 1999). 

Correspondingly, it is often noted to be one of the most difficult phases of the practical 

inquiry model to assess in the manual content analysis procedures used in prior research 

(Akyol & Garrison, 2011; Garrison & Arbaugh, 2007), motivating the present study’s 

effort to expand our measurement capabilities. 

Our adopted model also showed a greater effect of the exploration phase on 

cognitive processing language compared to integration. This finding might appear 

theoretically inconsistent given that the Practical Inquiry model places the integration 

phase as a relatively higher indicator of cognition. We offer two potential explanations 

for why this might be the case. First, because LIWC calculates cognitive processing 

language values based on the composite of several subcategories, it is possible that units 

coded as exploration contained higher values in one or two specific subcategories 

compared to integration, potentially resulting in higher overall cognitive processing 

values. This explanation is supported by Joksimović et al. (2014), who found a notably 


higher concentration of insight words (e.g., ‘think’, ‘know’, etc.) in the exploration 

compared to the integration phase. A second potential explanation for these findings may 

result from the dictionary-based approach used by the LIWC program. It is possible that 

the type of discursive activity occurring during moments of exploration (i.e., information 

exchange/brainstorming) naturally lends itself to be more easily captured by word-count 

based measures compared to the activity occurring in moments of integration, or idea 

synthesis. Measurement of the integration phase might require more complex NLP 

models, such as ones that incorporate the semantic relationships between words or the 

overall coherence of written text. 

Overall, it is important to emphasize that the cycle of Practical Inquiry is fluid, 

meaning that discursive activity is not easily bounded within a strict realm of four 

discrete cognitive presence phases (as evidenced by the roughly 6% of thematic 

units coded as ‘Other’). However, when taken as a whole, the present study found that 

students’ progression from low- to high-level cognitive presence is reflected by their 

increased usage of words relating to cognitive processing. 

Instruction Modality 

The existing research literature on distance/online learning has shown that such 

modalities are at least as effective as traditional, face-to-face learning environments in 

supporting positive student learning outcomes (Siemens et al., 2015). Distance learning 

can take many forms, such as blended/hybrid or fully online, yet little inquiry has been 

made about how these specific instruction modalities might affect the development of 

cognitive presence in student discussions. Because blended modalities offer a 

combination of online and face-to-face interaction, one might assume these offer 


relatively more opportunities for learners to engage in reflective inquiry (Garrison & 

Kanuka, 2004) and prior research in the CoI framework has provided some moderate 

support for that hypothesis (Akyol & Garrison, 2011). However, the results from our 

analysis do not support this intuition. Given our data and model findings, we suggest that 

the two modes of instruction are comparable in terms of the cognitive processing language 

used by students during asynchronous discussion. 

Academic Performance 

Conceptually, cognitive presence can be understood as a learning process defined 

by the progression through a cycle of practical inquiry (Garrison, 2007). Accordingly, it 

would make sense for researchers to want to discern the impact of this learning process 

on actualized student outcomes. Such ‘learning products’ are impactful for students' 

continued academic development and eventual attainment of undergraduate college 

degrees. In the final set of analyses, we investigated the effect of students’ cognitive 

processing language use on their academic performance in a college course. Our findings 

revealed that, when students used more language indicative of cognitive processing, they 

also scored higher on their final term papers. 

Prior research using the LIWC instrument has demonstrated an impact of students' 

use of small words, such as articles and pronouns, on their academic outcomes 

(Pennebaker et al., 2014; Robinson et al., 2013). Other literature has correspondingly 

found strong associations between coded cognitive presence phases and final course 

performance (Akyol & Garrison, 2011; Galikyan & Admiraal, 2019; Guo et al., 2021). 

However, no research to our knowledge has incorporated cognitive processing language 

in its analysis of cognitive presence to study such outcomes. This study is the first to 


model these academic outcomes by operationalizing cognitive presence through 

automated linguistic proxies. Our combined findings suggest that cognitive processing 

words reflect meaningful processes (i.e., cognitive presence) essential for student 

learning in online environments. 

Limitations 

The present study included a few limitations worthy of acknowledgement. First, 

the LIWC software is restricted to the classification of individual words, as opposed to 

the classification of sentences, paragraphs, etc. Unlike human coders, it cannot detect 

irony, sarcasm, idiom, or any other sub-textual characteristics contained within the text 

input. This is precisely the reason we refer to cognitive processing as a surface-level 

measurement of students’ cognitive presence, as LIWC only incorporates directly 

observable characteristics of text input (e.g., frequencies of free morphemes, punctuation 

marks, etc.) in its analysis of written text. This study did attempt to account for this 

limitation by integrating manual content analysis, which is generally much more 

sensitive to subtext given its reliance on human interpretation. 

The second limitation of this study pertains to the unit of analysis. Instead of 

syntactical units (sentences, paragraphs, whole messages, etc.), we opted for thematic 

units, or ‘units of meaning’ (Rourke et al., 2001). Compared to segmenting by message or 

paragraph, these offered a better representation of our data; however, thematic units are 

not without their own disadvantages. De Wever et al. (2006) note that these units are 

often poorly operationalized and vulnerable to subjective researcher interpretations, raising 

concerns about generalizability. To mitigate this limitation, we developed a segmentation 

protocol which was found to be sufficiently reliable. 


The final limitation concerns our data sampling context, specifically for the 

Semester B course offering. Students in the Semester B group had enrolled in a fully 

online version of this course because of public health restrictions mandated by the 

COVID-19 pandemic. Under normal conditions, college students select coursework 

based on a variety of factors, including instruction modality (McPartlan et al., 2021). 

Students in this study did not have this choice, meaning individual differences based on 

modality preferences were not accounted for. However, this might have been 

advantageous, as students could not self-select into their preferred instruction modalities, 

minimizing potential sampling bias. 

Conclusion 

The present study offers a unique methodological approach to the study of 

cognitive presence, which is a critical dimension of student learning in online 

Communities of Inquiry. In the CoI research literature, quantitative content analysis is the 

traditional methodological approach for describing a learner's cognitive presence. This 

approach is considerably time-consuming and labor-intensive, a constraint that may 

limit the ability of instructors in higher education to apply the CoI framework in the 

real-time monitoring of their students’ learning (Kovanović et al., 2016). In response to 

these challenges, this study used surface-level linguistic indicators of cognitive 

processing as a proxy measure of students' cognitive presence in online asynchronous 

discourse. We found high levels of language reflecting students’ cognitive presence in 

asynchronous online discussion forums, which varied depending on their phase of 

practical inquiry. From these findings, we conclude that asynchronous online discussion 

forums can be a potent method for online instructors to advance student learning, and that 


automated linguistic measures of cognitive presence may serve as effective indicators of 

this learning.


References 

Adedoyin, O. B., & Soykan, E. (2020). Covid-19 pandemic and online learning: The 

challenges and opportunities. Interactive Learning Environments, 0(0), 1–13. 

https://doi.org/10.1080/10494820.2020.1813180 

Akyol, Z., & Garrison, R. (2011). Understanding cognitive presence in an online and 

blended community of inquiry: Assessing outcomes and processes for deep 

approaches to learning. British Journal of Educational Technology, 42(2), 233–250. 

https://doi.org/10.1111/j.1467-8535.2009.01029.x 

Al-Husban, N. A. (2020). Critical thinking skills in asynchronous discussion forums: A 

case study. International Journal of Technology in Education, 3(2), 82–91. 

Anderson, T., Rourke, L., Garrison, R., & Archer, W. (2001). Assessing teaching 

presence in a computer conferencing context. 

https://auspace.athabascau.ca/handle/2149/725 

Bernard, R., Borokhovski, E., Schmid, R., Tamim, R., & Abrami, P. (2014). A meta-

analysis of blended learning and technology use in higher education: From the 

general to the applied. Journal of Computing in Higher Education, 26. 

https://doi.org/10.1007/s12528-013-9077-3 

Boyd, R., & Pennebaker, J. (2015). A way with words: Using language for psychological 

science in the modern era (pp. 222–236). 

Chowdhary, K. R. (2020). Natural Language Processing. In K. R. Chowdhary (Ed.), 

Fundamentals of Artificial Intelligence (pp. 603–649). Springer India. 

https://doi.org/10.1007/978-81-322-3972-7_19 

Chowdhury, G. G. (2003). Natural Language Processing. Annual Review of Information 

Science and Technology, 37(1), 51–89. https://doi.org/10.1002/aris.1440370103 

Crawford, J., Butler-Henderson, K., Rudolph, J., Malkawi, B., Glowatz, M., Burton, R., 

Magni, P., & Lam, S. (2020). COVID-19: 20 countries’ higher education intra-

period digital pedagogy responses. Journal of Applied Learning & Teaching, 3(1), 

Article 1. https://doi.org/10.37074/jalt.2020.3.1.7 

De Wever, B., Schellens, T., Valcke, M., & Van Keer, H. (2006). Content analysis 

schemes to analyze transcripts of online asynchronous discussion groups: A 

review. Computers & Education, 46(1), 6–28. 

https://doi.org/10.1016/j.compedu.2005.04.005 

Dewey, J. (1910). How we think. D.C. Heath & Co. 

Donnelly, R., & Gardner, J. (2011). Content analysis of computer conferencing 

transcripts. Interactive Learning Environments, 19(4), 303–315. 

https://doi.org/10.1080/10494820903075722 

Galikyan, I., & Admiraal, W. (2019). Students’ engagement in asynchronous online 

discussion: The relationship between cognitive presence, learner prominence, and 

academic performance. The Internet and Higher Education, 43, 100692. 

https://doi.org/10.1016/j.iheduc.2019.100692 

Garrison, D. R. (2007). Online community of inquiry review: Social, cognitive, and 

teaching presence issues. Journal of Asynchronous Learning Networks, 11(1), 61–72. 


Garrison, D. R., & Akyol, Z. (2013). Toward the development of a metacognition 

construct for communities of inquiry. The Internet and Higher Education, 17, 84–89. 

https://doi.org/10.1016/j.iheduc.2012.11.005 

Garrison, D. R., Anderson, T., & Archer, W. (1999). Critical inquiry in a text-based 

environment: Computer conferencing in higher education. The Internet and 

Higher Education, 2(2–3), 87–105. https://doi.org/10.1016/S1096-7516(00)00016-6 

Garrison, D. R., Anderson, T., & Archer, W. (2001). Critical thinking, cognitive 

presence, and computer conferencing in distance education. American Journal of 

Distance Education, 15(1), 7–23. https://doi.org/10.1080/08923640109527071 

Garrison, D. R., & Arbaugh, J. B. (2007). Researching the community of inquiry 

framework: Review, issues, and future directions. The Internet and Higher 

Education, 10(3), 157–172. https://doi.org/10.1016/j.iheduc.2007.04.001 

Garrison, D. R., & Cleveland-Innes, M. (2005). Facilitating cognitive presence in online 

learning: Interaction is not enough. American Journal of Distance Education, 

19(3), 133–148. https://doi.org/10.1207/s15389286ajde1903_2 

Garrison, D. R., & Kanuka, H. (2004). Blended learning: Uncovering its transformative 

potential in higher education. The Internet and Higher Education, 7(2), 95–105. 

https://doi.org/10.1016/j.iheduc.2004.02.001 

Guo, P., Saab, N., Wu, L., & Admiraal, W. (2021). The Community of Inquiry 

perspective on students’ social presence, cognitive presence, and academic 

performance in online project-based learning. Journal of Computer Assisted 

Learning, 37(5), 1479–1493. https://doi.org/10.1111/jcal.12586 

Hayati, H., Abdessamad, C., Idrissi, M., & Bennani, S. (2019). Doc2vec & Naïve Bayes: 

Learners’ cognitive presence assessment through asynchronous online discussion 

TQ transcripts. International Journal of Emerging Technologies in Learning 

(iJET), 14, 70. https://doi.org/10.3991/ijet.v14i08.9964 

Hayati, H., Khalidi Idrissi, M., & Bennani, S. (2020). Automatic classification for 

cognitive engagement in online discussion forums: Text mining and machine 

learning approach. In I. I. Bittencourt, M. Cukurova, K. Muldner, R. Luckin, & E. 

Millán (Eds.), Artificial Intelligence in Education (Vol. 12164, pp. 114–118). 

Springer. https://doi.org/10.1007/978-3-030-52240-7_21 

Henri, F. (1992). Computer conferencing and content analysis. In A. R. Kaye (Ed.), 

Collaborative Learning Through Computer Conferencing (pp. 117–136). Springer 

Berlin Heidelberg. https://doi.org/10.1007/978-3-642-77684-7_8 

Hew, K. F., & Cheung, W. S. (2014). Students’ and instructors’ use of massive open 

online courses (MOOCs): Motivations and challenges. Educational Research 

Review, 12, 45–58. https://doi.org/10.1016/j.edurev.2014.05.001 

Hind, H., Khalidi Idrissi, M., & Bennani, S. (2018). Automatic assessment of CoI-

cognitive presence within asynchronous online learning. 17th International 

Conference on Information Technology Based Higher Education and Training 

(ITHET), 1–5. https://doi.org/10.1109/ITHET.2018.8424791 

Hoppe, H. U. (2017). Computational methods for the analysis of learning and knowledge 

building communities. In Handbook of learning analytics (1st ed., pp. 23–33). 

Society for Learning Analytics Research (SoLAR). 



Joksimović, S., Gašević, D., Kovanović, V., Adesope, O., & Hatala, M. (2014). 

Psychological characteristics in cognitive presence of communities of inquiry: A 

linguistic analysis of online discussions. The Internet and Higher Education, 22, 

1–10. https://doi.org/10.1016/j.iheduc.2014.03.001 

Knight, S., & Littleton, K. (2015). Discourse Centric Learning Analytics: Mapping the 

terrain. Journal of Learning Analytics, 2(1), Article 1. 

https://doi.org/10.18608/jla.2015.21.9 

Kovanović, V., Joksimović, S., Gašević, D., & Hatala, M. (2014). Automated cognitive 

presence detection in online discussion transcripts. CEUR Workshop Proceedings, 

1137.

https://www.research.ed.ac.uk/en/publications/automated-cognitive-presence-detection-in-online-discussion-trans

Kovanović, V., Joksimović, S., Waters, Z., Gašević, D., Kitto, K., Hatala, M., & 

Siemens, G. (2016). Towards automated content analysis of discussion 

transcripts: A cognitive presence case. Proceedings of the Sixth International 

Conference on Learning Analytics & Knowledge - LAK ’16, 15–24. 

https://doi.org/10.1145/2883851.2883950 

Krippendorff, K. (2003). Content analysis: An introduction to its methodology. SAGE 

Publications. 

McPartlan, P., Rutherford, T., Rodriguez, F., Shaffer, J. F., & Holton, A. (2021). 

Modality motivation: Selection effects and motivational differences in students 

who choose to take courses online. The Internet and Higher Education, 49, 

100793. https://doi.org/10.1016/j.iheduc.2021.100793 

Means, B., Toyama, Y., Murphy, R., & Baki, M. (2013). The effectiveness of online and 

blended learning: A meta-analysis of the empirical literature. Teachers College 

Record: The Voice of Scholarship in Education, 115(3), 1–47. 

https://doi.org/10.1177/016146811311500307 

Moore, R. L., Oliver, K. M., & Wang, C. (2019). Setting the pace: Examining cognitive 

processing in MOOC discussion forums with automatic text analysis. Interactive 

Learning Environments, 27(5–6), 655–669. 

https://doi.org/10.1080/10494820.2019.1610453 

Neto, V., Rolim, V., Ferreira, R., Kovanović, V., Gašević, D., Dueire Lins, R., & Lins, R. 

(2018). Automated analysis of cognitive presence in online discussions written in 

Portuguese. In V. Pammer-Schindler, M. Pérez-Sanagustín, H. Drachsler, R. 

Elferink, & M. Scheffel (Eds.), Lifelong Technology-Enhanced Learning (pp. 

245–261). Springer International Publishing.

https://doi.org/10.1007/978-3-319-98572-5_19

Neto, V., Rolim, V., Pinheiro, A., Lins, R. D., Gašević, D., & Mello, R. F. (2021). 

Automatic content analysis of online discussions for cognitive presence: A study 

of the generalizability across educational contexts. IEEE Transactions on 

Learning Technologies, 14(3), 299–312. 

https://doi.org/10.1109/TLT.2021.3083178 

Park, C. L. (2009). Replicating the use of a cognitive presence measurement tool. Journal 

of Interactive Online Learning, 8(2), 16. 

Pena-Shaff, J. B., & Nicholls, C. (2004). Analyzing student interactions and meaning 

construction in computer bulletin board discussions. Computers & Education, 

42(3), 243–265. https://doi.org/10.1016/j.compedu.2003.08.003 



Pennebaker, J. W., Boyd, R. L., Jordan, K., & Blackburn, K. (2015). The development 

and psychometric properties of LIWC2015. https://doi.org/10.15781/T29G6Z 

Pennebaker, J. W., Chung, C. K., Frazee, J., Lavergne, G. M., & Beaver, D. I. (2014). 

When small words foretell academic success: The case of college admissions 

essays. PLoS ONE, 9(12), e115844. https://doi.org/10.1371/journal.pone.0115844 

Robinson, R. L., Navea, R., & Ickes, W. (2013). Predicting final course performance 

from students’ written self-introductions: A LIWC analysis. Journal of Language 

and Social Psychology, 32(4), 469–479. 

https://doi.org/10.1177/0261927X13476869 

Rourke, L., Anderson, T., Garrison, D. R., & Archer, W. (2001). Methodological issues 

in the content analysis of computer conference transcripts. International Journal 

of Artificial Intelligence in Education, 11. 

https://auspace.athabascau.ca/handle/2149/715 

Shea, P., & Bidjerano, T. (2010). Learning presence: Towards a theory of self-efficacy, 

self-regulation, and the development of a communities of inquiry in online and 

blended learning environments. Computers & Education, 55(4), 1721–1731. 

https://doi.org/10.1016/j.compedu.2010.07.017 

Shea, P., & Bidjerano, T. (2012). Learning presence as a moderator in the community of 

inquiry model. Computers & Education, 59(2), 316–326. 

https://doi.org/10.1016/j.compedu.2012.01.011 

Shea, P., Hayes, S., Uzuner-Smith, S., Gozza-Cohen, M., Vickers, J., & Bidjerano, T. 

(2014). Reconceptualizing the community of inquiry framework: An exploratory 

analysis. The Internet and Higher Education, 23, 9–17. 

https://doi.org/10.1016/j.iheduc.2014.05.002 

Siemens, G. (2013). Learning Analytics: The emergence of a discipline. American 

Behavioral Scientist, 57(10), 1380–1400. 

https://doi.org/10.1177/0002764213498851 

Siemens, G., Gašević, D., & Dawson, S. (2015). Preparing for the digital university: A 

review of the history and current state of distance, blended and online learning. 

Athabasca University Press. 

https://research.monash.edu/en/publications/preparing-for-the-digital-university-a-review-of-the-history-and-

Tan, C. L., & Ng, L. L. (2014). Assessing critical thinking performance of postgraduate 

students in threaded discussions. In International Association for Development of 

the Information Society. International Association for the Development of the 

Information Society. https://eric.ed.gov/?id=ED557315 

Van Wart, M., Ni, A., Medina, P., Canelon, J., Kordrostami, M., Zhang, J., & Liu, Y. 

(2020). Integrating students’ perspectives about online learning: A hierarchy of 

factors. International Journal of Educational Technology in Higher Education, 

17(1), 53. https://doi.org/10.1186/s41239-020-00229-8 

Vygotsky, L. S. (1962). Thought and language (E. Hanfmann & G. Vakar, Eds.). MIT 

Press. https://doi.org/10.1037/11193-000 

Vygotsky, L. S. (1978). Mind in society: Development of higher psychological processes 

(M. Cole, V. John-Steiner, S. Scribner, & E. Souberman, Eds.). Harvard 

University Press. https://doi.org/10.2307/j.ctvjf9vz4 



Zhao, Y., Lei, J., Yan, B., Lai, C., & Tan, H. S. (2005). What makes the difference? A 

practical analysis of research on the effectiveness of distance education. Teachers 

College Record, 107(8), 1836–1884.

https://doi.org/10.1111/j.1467-9620.2005.00544.x



Appendix A 

During most weeks, there will be one class devoted to Discussion. (During other 

weeks, there will be two.) On Discussion days, we will read and discuss one or more 

scientific articles. When the readings are marked “***”, you must post a question or 

comment about one of these articles to the discussion forum on the Canvas site. Your 

question/comment should be at least one paragraph long, and should represent your 

grappling with the article. 

 
• It might concern an aspect that you are not sure you understand. If so, it is 

not enough to say that you didn’t understand X – you must say what you 

think X means, and why you are unsure of your understanding. 

• It might concern a problem with the theoretical claims, experimental 

design, or interpretation of the findings.  

• It might draw a connection to another article that we have read, or to a 

finding or theory that you know of but that we have not covered.  

• It might propose a new study, for example to rule out an alternate 

explanation or to build on the findings to address a new research question. 

Your question/comment should be directed to the whole class, not to me. I 

will select a subset of these questions/comments for further discussion in 

class.



Appendix B 

You are required to write a final paper of approximately 10 double-spaced pages, 

with at least 8 full pages of body text. (The rest can include a mandatory list of 

references, an optional title page, and figures and tables as appropriate.) The topic can be 

one we discussed in class. It can also be a topic that we did not discuss in class, but which 

is relevant to educational psychology, subject to instructor approval. Your paper should 

review one aspect of the educational psychology literature in depth and provide a 

cohesive summary of conceptual and empirical advances. It should contain the following 

sections: an Introduction identifying the literature of interest and the research questions it 

targets, a Review section summarizing relevant studies in a principled way, a Discussion 

section evaluating the current state of the literature, and a Future Directions section 

discussing outstanding empirical questions and suggesting future studies. 

 
If you feel more ambitious, your final paper can describe a small empirical 

study that you run to address an open question in cognitive psychology. 

The study should be co-designed with the instructor. Because this is a pilot 

study, the number of participants can be rather small (as few as five). The 

paper should contain the standard sections of an empirical paper: 

• an Introduction reviewing the relevant literature and motivating your 

research question (this will be relatively short and will review only 3 or 4 

papers) 

• a Method section describing the mechanics of the study 

• a Results section describing your analyses of the data 

• a Discussion section interpreting the results 

• a Future Directions section suggesting possible follow-up experiments. 

