Examination 
Of Three Practice Schedules for 
Single Digit Math 
 
A Dissertation 
SUBMITTED TO THE FACULTY OF THE UNIVERSITY OF MINNESOTA  
BY 
 
 
Kyle Wagner 
 
 
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF 
DOCTOR OF PHILOSOPHY 
 
 
Dr. Kristen McMaster, Advisor 
 
 
September 2019 
 
 
 
 
 
 
 
 
 
 
 
 
 
© 2019 Kyle B. Wagner
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Acknowledgements 
This dissertation represents an important waypoint in my own personal and 
professional adventure and, like many epic quests, would have been doomed to failure if 
undertaken alone.  I would like to offer my sincerest thanks to some of the guides, 
companions, and fellow travelers who have helped me get here. 
First, my advisor, Kristen McMaster, has been a better mentor than I could have 
imagined.  She has shaped my approach to research and professional conduct, and has 
pushed me to expect more of myself and my work.  I am a better researcher and 
professional because of her. 
Second, I would also like to thank my committee.  Their input and support have 
been incredibly valuable throughout this project.  This is a better dissertation because of 
their questions, recommendations, and time. 
Third, I would like to thank my wife, Hallie.  Her example, reassurance, and 
support have kept me on track and focused.  She has encouraged me to follow my passion, 
and has helped me build confidence in my strengths.  Along with my wife, my parents 
have been in my corner and incredibly understanding of a son who can’t seem to avoid 
going back to school just one more time. 
Fourth, this project owes a lot to the team at FastBridge.  Their software was the 
backbone of this design and allowed me to focus on the science rather than burying 
myself in “how to” books for software design.   
Finally, I need to thank and acknowledge all of my family, friends, colleagues, 
professors, office staff, and everyone who has supported me and contributed to who I am.  
My time in this program has been amazing.  I have grown more than I could have 
imagined, and I couldn’t have done it without help from all of you. 
 
 
 
 
 
Dedication  
This dissertation is dedicated to my daughters, Penelope and Minerva.  Follow 
your passion.  Thank you for inspiring me to follow mine. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Abstract 
The primary goal of this project is to expand and generalize the literature base for 
interleaved practice.  This study compares interleaved practice to repetitive practice and 
incremental rehearsal within the context of learning single digit math facts.  Third grade 
(n = 34) and fourth grade (n = 40) students learned target single digit math facts in one of 
three practice schedules.  Using a within-subjects counterbalanced and crossed design, 
students were exposed to three different learning conditions.  Comparisons were made 
regarding accuracy of responses during acquisition trials and retention trials, as well as 
learning efficiency.  Results indicated very few differences between practice conditions 
regarding acquisition accuracy, increased accuracy during retention trials for interleaved 
and incremental rehearsal practice, and higher learning efficiency for interleaved practice 
when compared to incremental rehearsal.  Student pretest accuracy moderated the effects of 
practice schedule and opportunities to practice, resulting in different outcomes for 
students with different levels of mastery at the outset of the intervention.  This study is 
the first comparison of interleaved and incremental rehearsal practice, and the results 
suggest that interleaved practice is the most efficient schedule for drilling math facts. 
 
 
 
 
 
 
 
 
 
Table of Contents 
List of Tables
List of Figures
Chapter 1 INTRODUCTION
Problem Statement
Learning Targets
Study Purpose and Research Questions
Hypotheses
Structure of Dissertation
Chapter 2 LITERATURE REVIEW
Conceptual Framework and Relevant Constructs
Background and Groundwork
Benefits of interleaving.
Task characteristics and levels of interleaving.
Synthesis of the background.
Purpose of the Present Review
Method
Results
Discussion
Limitations of the Review
Future Research
Chapter 3 METHODS
Research Questions Revisited
Setting and Participants
Design
Materials
Student interface.
Social validity questionnaire.
Procedure
Measures
Outcome variables.
Predictor variables.
Analysis Plan
Research Question 1.
Research Question 2.
Research Question 3.
Model Checking
Chapter 4 RESULTS
Random Assignment
Combining Addition and Multiplication for the Analysis
Research Question 1
Research Question 2
Research Question 3
Targets Correct at Immediate and Delayed Posttests
Time in Practice by Practice Condition
Correct Targets per 20 Minutes of Practice
Model Checking for Research Question 3
Chapter 5 DISCUSSION
REFERENCES
Appendices
Appendix A: Candidate Models for Research Question 2
 
  
 
 
List of Tables 
 
Table 1 Examples of Practice Schedules
Table 2 Examples of Interleaving at Different Levels and Dimensions
Table 3 Summary of Results for Interleaved Practice Schedule Studies
Table 4 Summary of Task Characteristics and How Studies Interleaved
Table 5 Participant Characteristics
Table 6 Latin Square for Counterbalancing
Table 7 Student Experience of Study
Table 8 Breakdown and Planned Spacing of Acquisition vs Retention Trials Across Practice Sessions
Table 9 Breakdown and Example of Altered Spacing of Acquisition vs Retention Trials Across Practice Sessions
Table 10 Problem Organization
Table 11 Variable List
Table 12 Number of Participants in Each Condition
Table 13 Addition and Multiplication First Pretest Descriptive Statistics for Proportion Correct
Table 14 Accuracy Rates by Operation and Practice Schedule
Table 15 Skew and Kurtosis for Three Numeric Predictors Used in Models for Research Question 2
Table 16 Summary of the Nine Selected Retention Models Ordered by AICc Weight
Table 17 Parameter Estimates for the Nine Selected Models Ordered by AICc Weight With Odds Ratios Calculated from Beta Averages
Table 18 Number of students in each Practice Schedule at Immediate and Delayed Posttests
Table 19 Summary Table for Model Predicting Correct Targets by Practice Schedule at Immediate Posttest
Table 20 Summary Table for Model Predicting Correct Targets by Practice Schedule at Delayed Posttest
Table 21 Summary Table for Model Predicting Minutes in Practice by Practice Schedule and Pretest at Immediate Posttest
Table 22 Summary Table for Model Predicting Minutes in Practice by Practice Schedule and Pretest at Delayed Posttest
Table 23 Summary Table for Model Predicting Targets Learned per 20 Minutes of Practice by Practice Schedule and Pretest at Immediate Posttest
Table 24 Summary Table for Model Predicting Targets Learned per 20 Minutes of Practice by Practice Schedule and Pretest at Delayed Posttest
Table 25 Descriptive Statistics for Research Question 3 Model Residuals
Table 26 Descriptive Statistics for Survey Questions
Table A1 Candidate models for Research Question 2
 
 
 
List of Figures 
 
Figure 1. Representation of task dimensions.
Figure 2. Example of student interface.
Figure 3. Histogram of correct and incorrect response frequencies by Incremental Rehearsal (IR), Interleaved (IL), and Repetitive (Rep) practice schedules.
Figure 4. Graph of probability of a correct response on retention trials by practice schedule (Interleaved (IL), Incremental Rehearsal (IR), and Repetitive (Rep)) based on model 1 and split into four quartiles from the first pretest.
Figure 5. Model 1 retention ROC Curve.
Figure 6. Deviance residuals for retention model.
Figure 7. Half-normal plot for retention model.
Figure 8. Targets correct at immediate posttest
Figure 9. Targets correct at delayed posttest
Figure 10. Time in practice at immediate posttest
Figure 11. Time in practice at delayed posttest
Figure 12. Number of Targets Learned per 20 Minutes of Practice
Figure 13. Number of targets learned per 20 minutes of practice.
Figure 14. Density plot of residuals from models used to address research question 3.
Figure 15. Scatter plot of residuals from models used to address research question 3 with a loess smoother
Figure 16. Proportion of responses in each of seven responses for first survey question: “How helpful was this practice?”
Figure 17. Proportion of responses in each of seven responses for second survey question: “How fun was this practice?”
 
 
 
 
 
 
 
 
 
 
Chapter 1 
INTRODUCTION 
Students’ acquisition, retention, and transfer of academic skills are critical goals 
of our education system.  Acquisition is conceptualized as gaining facility in a task in the 
short term (Kornell, Castel, Eich, & Bjork, 2010; Pashler, Rohrer, Cepeda, & Carpenter, 
2007; Sorensen & Woltz, 2016), retention is maintenance of that facility over some 
period of time, and transfer is applying increased facility in one skill to another skill 
(Healy, Kole, & Bourne, 2014).  Within the context of multi-tiered systems of supports, 
students who struggle to acquire and retain skills at the same rates as their classmates are 
specifically monitored and receive intervention to increase the trajectory of their learning.  
A core concept of an intensive intervention is that of engaging in instructional practices 
that are more efficient and effective than those provided within a standard core 
instructional approach. Teachers often direct students to practice specific skills with the 
belief that more practice leads to acquisition and retention.   
The conceptualization of the relation between practice and acquisition and 
retention has evolved from simple ideas such as the Total Time Hypothesis (Cooper & 
Pantle, 1967), in which the only important variable is presumed to be the number of 
practice opportunities (exposures to the target).  Subsequent investigation has highlighted 
the added benefit of the distribution of practice (Cepeda, Pashler, Vul, Wixted, & Rohrer, 
2006), showing that there are ways to modify practice to take better advantage of finite 
resources such as time, attention, and staffing.  Changing the schedule of practice to 
increase effectiveness and efficiency of classroom time is closely tied to the overarching 
idea of intensive intervention.  This project is designed to further expand a literature base 
that can be used to inform instructional practices and increase the efficiency of learning 
in K-12 settings. 
 This project aims to take a practice schedule with some promise (interleaved 
practice) and compare it to practice schedules that are currently in use and to a dosage 
control.  To that end, this study is focused on the comparison of three practice schedules.   
Repetitive practice can serve as a business-as-usual dosage control.  This is a simple 
practice schedule and involves the learner repeating a single target skill for a set number 
of trials before switching to the next target.  Within an academic context a repetitive 
schedule might involve a student who is learning transcription skills writing a letter 
several times before moving to another letter, or a student studying math facts answering 
the same target problem a number of times before moving to the next problem.  In this 
study, repetition is being used to represent the total time hypothesis mentioned above.  It 
is a no-frills example of dosage.   
Incremental rehearsal practice schedules are evidence that the field has moved 
beyond the total time hypothesis.  They are a type of distributed practice (Varma & 
Schleisman, 2014) in which the learner inserts gradually increasing numbers of known 
skills in between unknown (target) skills.  If a student has a set of math facts that have 
been mastered, they would be interspersed between exposures of a target fact. 
Incremental rehearsal has been chosen because it is a practice schedule that has a solid 
foundation in the literature and has been widely adopted in classrooms.  Searches for 
incremental rehearsal return links to interventioncentral.org, school and district 
webpages, and other education resource sites.  It has been established as an effective tool 
for acquiring and retaining academic skills (Burns, 2005; Joseph, 2006).   
Finally, an interleaved practice schedule has the learner switch between target 
skills.  In a classroom setting, a teacher might select three target math facts for a student 
and mix them together during a practice session.  An interleaved schedule can repeat the 
same pattern or have a pseudo-random pattern (ABC-ABC-ABC vs ACB-BCB-ACA).  
Interleaved schedules are much less prevalent in the scientific literature and the 
vocabulary of practitioners.  Despite a deep background in motor learning and cognitive 
literature (Carvalho & Goldstone, 2015; Magill & Hall, 1990; C. H. Shea, Kohl, & 
Indermill, 1990), these schedules have yet to have a large impact in educational 
psychology literature.  This study will focus on examining the utility of interleaved 
practice for learning single digit math facts.  See Table 1 for examples of all three 
schedules. 
An example of these three schedules taken out of a classroom context might look at 
improving target basketball skills.  Imagine someone attempting to improve performance on 
shooting free throws, lay-ups, and a baseline three-point shot.  If the player is instructed 
to shoot 10 shots from each place in a repetitive schedule, they would shoot 10 lay-ups, then 
10 free throws, and finally 10 baseline shots.  In an incremental rehearsal schedule, a 
player would shoot a free throw, then make a chest pass (assuming that a chest pass is a 
mastered skill), then a free throw, then a chest pass and a bounce pass (again, assuming 
that a bounce pass is a mastered skill), and so on until the player has shot 10 free throws.  
The process would then be repeated for the other two target skills.  A player practicing in 
an interleaved schedule would shoot a free throw, then a lay-up, and then a baseline shot.  
The player would then continue switching skills on each trial until 10 shots from each 
position have been attempted.   
Table 1 
Examples of Practice Schedules 

Targets 
Incremental Rehearsal: 4 x 6, 7 x 9, 8 x 3 
Repetitive: 9 x 2, 7 x 7, 4 x 3 
Interleaved: 3 x 5, 2 x 8, 7 x 9 

Incremental Rehearsal                                 Repetitive    Interleaved
4 x 6    4 x 6    7 x 9    7 x 9    8 x 3    8 x 3    9 x 2         3 x 5
1 x 1    1 x 1    1 x 1    1 x 1    1 x 1    1 x 1    9 x 2         2 x 8
4 x 6    2 x 3    7 x 9    2 x 3    8 x 3    2 x 3    9 x 2         7 x 9
1 x 1    7 + 3    1 x 1    7 + 3    1 x 1    7 + 3    9 x 2         3 x 5
2 x 3    8 + 1    2 x 3    8 + 1    2 x 3    8 + 1    9 x 2         3 x 5
4 x 6    1 x 9    7 x 9    1 x 9    8 x 3    1 x 9    9 x 2         7 x 9
1 x 1    6 x 2    1 x 1    6 x 2    1 x 1    6 x 2    9 x 2         2 x 8
2 x 3    5 + 5    2 x 3    5 + 5    2 x 3    5 + 5    9 x 2         2 x 8
7 + 3    4 x 6    7 + 3    7 x 9    7 + 3    8 x 3    9 x 2         7 x 9
4 x 6    1 x 1    7 x 9    1 x 1    8 x 3    1 x 1    7 x 7         3 x 5
1 x 1    2 x 3    1 x 1    2 x 3    1 x 1    2 x 3    7 x 7         7 x 9
2 x 3    7 + 3    2 x 3    7 + 3    2 x 3    7 + 3    7 x 7         3 x 5
7 + 3    8 + 1    7 + 3    8 + 1    7 + 3    8 + 1    7 x 7         7 x 9
8 + 1    1 x 9    8 + 1    1 x 9    8 + 1    1 x 9    7 x 7         2 x 8
4 x 6    6 x 2    7 x 9    6 x 2    8 x 3    6 x 2    7 x 7         3 x 5
1 x 1    5 + 5    1 x 1    5 + 5    1 x 1    5 + 5    7 x 7         2 x 8
2 x 3    4 + 4    2 x 3    4 + 4    2 x 3    4 + 4    7 x 7         7 x 9
7 + 3    4 x 6    7 + 3    7 x 9    7 + 3    8 x 3    7 x 7         3 x 5
8 + 1    1 x 1    8 + 1    1 x 1    8 + 1    1 x 1    4 x 3         7 x 9
1 x 9    2 x 3    1 x 9    2 x 3    1 x 9    2 x 3    4 x 3         2 x 8
4 x 6    7 + 3    7 x 9    7 + 3    8 x 3    7 + 3    4 x 3         7 x 9
1 x 1    8 + 1    1 x 1    8 + 1    1 x 1    8 + 1    4 x 3         2 x 8
2 x 3    1 x 9    2 x 3    1 x 9    2 x 3    1 x 9    4 x 3         2 x 8
7 + 3    6 x 2    7 + 3    6 x 2    7 + 3    6 x 2    4 x 3         3 x 5
8 + 1    5 + 5    8 + 1    5 + 5    8 + 1    5 + 5    4 x 3         2 x 8
1 x 9    4 + 4    1 x 9    4 + 4    1 x 9    4 + 4    4 x 3         7 x 9
6 x 2    3 + 8    6 x 2    3 + 8    6 x 2    3 + 8    4 x 3         3 x 5
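
The three schedules illustrated in Table 1 can also be expressed as short generating rules.  The following is a minimal sketch in Python, not the software used in this study.  The function names are hypothetical, the list of known facts is read off the incremental rehearsal columns of Table 1, and the shuffled interleaved variant is only one plausible way to produce a pseudo-random order (it allows the same target to appear twice in a row, as happens in the interleaved column of Table 1).

```python
import random

# Known (mastered) facts, in the order they appear in Table 1.
KNOWN_FACTS = ["1 x 1", "2 x 3", "7 + 3", "8 + 1", "1 x 9",
               "6 x 2", "5 + 5", "4 + 4", "3 + 8"]


def repetitive(targets, reps):
    """Present each target `reps` times before moving to the next target."""
    return [t for t in targets for _ in range(reps)]


def incremental_rehearsal(target, knowns):
    """Present the target, then 1 known; the target, then 2 knowns; and so on."""
    trials = []
    for n in range(1, len(knowns) + 1):
        trials.append(target)
        trials.extend(knowns[:n])
    return trials


def interleaved(targets, reps, seed=None):
    """Mix targets: fixed rotation when seed is None, shuffled order otherwise."""
    if seed is None:
        return [t for _ in range(reps) for t in targets]
    trials = [t for t in targets for _ in range(reps)]
    random.Random(seed).shuffle(trials)  # hypothetical randomization scheme
    return trials


ir = incremental_rehearsal("4 x 6", KNOWN_FACTS)
print(len(ir), ir.count("4 x 6"))        # 54 9
print(repetitive(["9 x 2", "7 x 7"], 2))  # ['9 x 2', '9 x 2', '7 x 7', '7 x 7']
print(interleaved(["A", "B", "C"], 2))    # ['A', 'B', 'C', 'A', 'B', 'C']
```

Under these assumptions, one incremental rehearsal target requires 54 trials to yield nine target exposures, whereas the repetitive and interleaved rules yield nine exposures of each of three targets in 27 trials.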
 
 
 
Problem Statement 
The primary goal of this project is to expand and generalize the literature base for 
interleaved practice.  This dissertation describes the effects of interleaving target skills 
within academic practice compared to a dosage control (repetitive) and to familiar and 
effective (incremental rehearsal) schedules.  The practice of interleaving has a well-
established base in cognitive and motor learning literature (Birnbaum, Kornell, Bjork, & 
Bjork, 2013; Carvalho & Goldstone, 2014a; Magill & Hall, 1990; J. B. Shea & Morgan, 
1979; Zulkiply & Burt, 2013).  Research in these areas has demonstrated that participants 
who practice in an interleaved schedule tend to retain target skills at a higher rate than 
those who practice in repetitive blocks.  A foundational study that demonstrated that 
effect compared learning and performance on barrier knockdown tasks (J. B. Shea & 
Morgan, 1979).  Participants in a high contextual interference, or interleaved, condition 
retained acquired movement patterns better than did participants who practiced in a 
repetitive schedule.  Subsequent research in motor learning (Magill & Hall, 1990), 
cognitive (Carvalho & Goldstone, 2014b; Zulkiply & Burt, 2013), and academic (Rohrer, 
Dedrick, & Stershic, 2015) contexts has reinforced the original findings. 
 Thus far, only seven studies have extended the literature base for interleaved 
practice manipulations into K-12 academic settings (Booth et al., 2015; Ostrow, 
Heffernan, Heffernan, & Peterson, 2015; Rau, Aleven, & Rummel, 2013; Rau, Aleven, 
Rummel, & Pardos, 2014; Rohrer, Dedrick, & Burgess, 2014; Rohrer et al., 2015; Taylor 
& Rohrer, 2010).  While small, this group of studies has demonstrated the positive effect 
associated with an interleaved practice schedule across writing, fractions, and geometry, 
with effect sizes ranging from d = .20 to as high as d = 2.02.  Overall, the evidence 
presented in these seven studies suggests that employing an interleaved practice schedule 
has potential to have a demonstrable and meaningful effect on the efficiency of learning 
in classrooms.   
 The studies referenced above have demonstrated the promise of interleaved 
practice schedules, but have some limitations.  For example, most of these studies 
focused on comparisons between repetitive schedules and interleaved practice and their 
contribution to the retention of target skills. Exceptions are studies by Rau (2013a & 
2013b), who examined interleaving on different dimensions, specifically task type and 
task presentation format.  Additional research is needed to compare interleaved practice 
to other schedules that have been associated with strong retention benefits, such as 
incremental rehearsal (Burns, 2005; MacQuarrie, Tucker, Burns, & Hartman, 2002; 
Varma & Schleisman, 2014).  Specifically, while incremental rehearsal is associated with 
better retention when compared to traditional drill and repetitive practice, it is necessarily 
a lengthy procedure.  Students must be exposed to 45 non-target trials to be exposed to a 
target just nine times.  If an interleaved schedule can lead to similar, or better, retention 
through a simple switch in practice schedule, there is potential to reduce practice time by 
a factor of 6. This study provides the opportunity to compare the effect of different 
practice schedules on acquisition and efficiency, as well as retention.   
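
The trial counts behind this comparison can be checked with a short calculation.  As a sketch, assume an incremental rehearsal set of nine known facts in which the k-th presentation of the target is followed by k known facts (the pattern shown in Table 1), and assume that every trial takes roughly the same amount of time (a simplification, not a finding of this study):

```latex
\[
\underbrace{\sum_{k=1}^{9} k}_{\text{non-target trials}} = \frac{9 \cdot 10}{2} = 45,
\qquad
\frac{\overbrace{45 + 9}^{\text{all trials}}}{\underbrace{9}_{\text{target trials}}} = 6 .
\]
```

The factor of 6 is therefore a ratio of total trials to target exposures; any realized saving in practice time would depend on how long known and unknown items actually take to answer.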
Learning Targets 
Adequate mathematics preparation is important for personal success and the 
success of a technical society (National Mathematics Advisory Panel, 2008).  Students 
with more math preparation earn more, and countries are increasingly dependent on a 
workforce that is mathematically literate (National Mathematics Advisory Panel, 2008).  
To that end, an important goal of the American educational institution should be giving 
students tools they will need to build their mathematical skills.   
Math is a deep and complicated topic.  Math skills range from basic concepts of 
numeracy and the determination of whether one set has more of something than another, 
to the complexities of calculus, matrix algebra, and geometry.  Teaching math is also a 
broad and deep endeavor.  Instruction must match student assets and needs, and engage 
the learner to form a conceptual understanding of the topic (Carr & Alexeev, 2011; 
National Research Council & Mathematics Learning Study Committee, 2001; Geary, 
2005; Gersten et al., 2009; National Mathematics Advisory Panel, 2008; Powell, Fuchs, 
& Fuchs, 2013).  The National Research Council (2001) breaks math proficiency into 
five intertwined strands: Conceptual Understanding, Procedural Fluency, Strategic 
Competence, Adaptive Reasoning, and Productive Disposition.  No strand can be taught 
in isolation, and each strand is itself composed of further component parts.  This project 
is focused on math facts, an aspect of the Procedural Fluency strand.  The National 
Research Council (2001) describes Procedural Fluency as “skill in carrying out 
procedures flexibly, accurately, efficiently, and appropriately.”  Fast and accurate access 
to single digit addition and multiplication facts is an important part of that overarching 
Procedural Fluency strand.   
Although building fluency in single digit addition and multiplication facts should 
not be the sole focus of math instruction, several sources underscore the importance of 
fluent access to those skills (Gersten et al., 2009; National Mathematics Advisory Panel, 
2008; Powell et al., 2013).   The literature indicates that increased fluency with one-digit 
math facts facilitates learning of more complex processes.  To that end, the goal of this 
project is to compare the learning of single-digit math facts across three practice 
schedules.  This comparison is intended to add to a body of literature that will help 
practitioners make teaching decisions that are more effective and efficient for their 
students.    
 Single digit math facts play an important role in the context of math instruction.  
Within the context of this project, they are also relevant as a target skill in that they are 
discrete, important for learners, and have a finite set (and thus have a learning 
endpoint)—all properties of a specific skill that can be learned through practice.  As 
mentioned, the interleaved literature has a deep base in cognitive psychology and motor 
learning.  The foundational Shea and Morgan (1979) study used barrier knockdown tasks 
that were discrete, had specific solutions, drew on a deeper latent coordination skill, and 
had potential to transfer to other related tasks.  Similarly, single digit math facts are 
discreet, benefit from a latent numeracy skill, and are useful in their application to more 
complex and diverse mathematics skills.  A more efficient path to accurate and fluent 
math facts frees time for instruction, and increases the opportunity for learning more 
complex skills (National Research Council & Mathematics Learning Study Committee, 
2001).  Single digit math facts are also a part of the Common Core State Standards 
(2010) for grades 1-3.  For these reasons, single digit math facts are an ideal skill for this 
extension of the interleaved practice literature base. 
 Single digit addition and multiplication facts are useful within the context of this 
study because of some important attributes.  First, there is a finite and convenient number 
of single digit addition and multiplication facts.  From 0:0 to 9:9 (counting commuted 
pairs such as 3:4 and 4:3 only once) there are 55 single digit addition and 55 single digit 
multiplication problems.  Sets of 55 are easy to divide into subsets, and an exhaustive item 
base can be used within those parameters.  Second, each of those 110 items has a sum or 
product that is a non-negative integer.  Subtraction and division of single digits can result 
in negative and fractional numbers, respectively.  As the goal of this study is a simple 
generalization and comparison of interleaved practice, an item set that is manageable and 
results in simple responses is most desirable.   
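The size of the item set described above can be checked with a short enumeration.  The sketch below is illustrative only; it assumes that each pair of digits is counted once regardless of order, and the variable names are not drawn from the study materials. 

```python
from itertools import combinations_with_replacement

DIGITS = range(10)  # the single digits 0 through 9

# Unordered pairs (a, b) with a <= b, so 3 + 4 and 4 + 3 are counted once.
pairs = list(combinations_with_replacement(DIGITS, 2))

# Map each fact to its answer (labels are illustrative, not study materials).
addition_facts = {f"{a} + {b}": a + b for a, b in pairs}
multiplication_facts = {f"{a} x {b}": a * b for a, b in pairs}

print(len(addition_facts), len(multiplication_facts))  # 55 55

# Every answer is a whole number: no negatives and no fractions.
answers = list(addition_facts.values()) + list(multiplication_facts.values())
print(all(isinstance(v, int) and v >= 0 for v in answers))  # True
```

Running the same enumeration with subtraction or division would produce negative or fractional answers for some pairs, which is the reason given above for limiting the item set to addition and multiplication. 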
Study Purpose and Research Questions 
The goal of the proposed research is to compare acquisition, retention, and 
efficiency in learning across three practice schedules: 1) a repetitive schedule that acts as 
a control, 2) an incremental rehearsal schedule that has been demonstrated to improve 
retention of learning targets, and 3) an interleaved practice schedule, which has only recently 
begun to be examined in K-12 academic learning.  For the purposes of this study, acquisition is 
defined as accuracy within a single practice session, retention is defined as accuracy at an 
assessment opportunity outside of a practice session, and efficiency is defined as the 
number of problems retained divided by the amount of time spent in the intervention (average 
of x problems retained per y minutes of practice).   
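The efficiency definition above, read through its parenthetical (x problems retained per y minutes of practice), can be sketched as a simple ratio.  The function name and the numbers below are hypothetical and chosen only for illustration; they are not study data. 

```python
def efficiency(problems_retained: int, minutes_practiced: float) -> float:
    """Return problems retained per minute of practice."""
    if minutes_practiced <= 0:
        raise ValueError("minutes_practiced must be positive")
    return problems_retained / minutes_practiced


# Hypothetical example: 6 problems retained after 12 minutes of practice.
example = efficiency(problems_retained=6, minutes_practiced=12.0)
print(example)  # 0.5 problems retained per minute
```

Under this reading, two schedules that produce the same retention differ in efficiency only through the time they require. 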
 
 
Specifically, my research questions are: 
1)  How is acquisition of target math facts influenced by practice schedule 
(repetitive, incremental rehearsal, and interleaved)?   
2) Does retention of target math facts differ by practice schedule?  
3) Does efficiency of learning differ by practice schedule in terms of time 
investment per math fact? 
Hypotheses 
Research Question 1: Likelihood of a correct response will increase at the fastest rate for the 
Repetitive schedule, but will reach an asymptote over the course of several sessions.  Likelihood 
of a correct response will increase at the next fastest rate for incremental rehearsal.  
Interleaved practice will yield the slowest change.  Repetitive practice has been linked 
with fast acquisition across the literature (Magill & Hall, 1990). 
Research Question 2: Likelihood of a correct response at retention trials will be highest 
for incremental rehearsal and interleaved practice and will be almost indistinguishable 
between the two.  Interleaved and incremental rehearsal practice are both associated with 
established records of high retention. 
Research Question 3: Interleaved practice should be associated with a much better 
efficiency rate than incremental rehearsal.  The nature of the schedules dictates this 
difference.  Seven exposures to each of three targets require 21 trials in an interleaved 
schedule.  The same number of exposures in an incremental rehearsal schedule requires 
105 trials.  If 42 trials (seven trials in each of six bundles) are enough to achieve high rates 
of delayed retention with a Repetitive schedule, then it should have an efficiency score 
similar to interleaved practice.   
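The trial counts in the hypothesis for Research Question 3 can be laid out as arithmetic.  In the sketch below, the figure of 105 trials for incremental rehearsal is taken from the text as given rather than derived, and the treatment of each repetitive bundle as a single target is an assumption made only for illustration. 

```python
EXPOSURES_PER_TARGET = 7
TARGETS = 3

# Interleaved: every trial is an exposure to one of the three targets.
interleaved_trials = EXPOSURES_PER_TARGET * TARGETS  # 7 x 3 = 21

# Incremental rehearsal: count quoted in the text, not derived here.
incremental_rehearsal_trials = 105

# Assumption for illustration: each of the six repetitive bundles holds one target.
REPETITIVE_BUNDLES = 6
repetitive_trials = EXPOSURES_PER_TARGET * REPETITIVE_BUNDLES  # 7 x 6 = 42

trials_per_target = {
    "interleaved": interleaved_trials / TARGETS,
    "incremental rehearsal": incremental_rehearsal_trials / TARGETS,
    "repetitive": repetitive_trials / REPETITIVE_BUNDLES,
}

for schedule, cost in trials_per_target.items():
    print(f"{schedule}: {cost:.0f} trials per target")
```

Under these assumptions the interleaved and repetitive schedules each cost seven trials per target, while incremental rehearsal costs 35, which is consistent with the expectation of similar efficiency for the first two schedules if their retention is comparable. 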
Structure of Dissertation 
This paper describes a study that examines acquisition, retention, and student 
efficiency in learning single digit math facts.  The analytical framework is that of mixed-
effects logistic regression models for the first two research questions, and linear 
regression modeling for the third.  No published articles have used this analytical 
framework in the context of comparing practice schedules in K-12 academic skills.  
Chapter 2 provides a context for interleaving in cognitive literature followed by a 
systematic literature review of interleaved practice in K-12 settings.  Chapter 3 describes, 
in detail, the methods alluded to above, and outlines reasoning for using those methods.  
Results of the study are in Chapter 4, and a discussion of the results, limitations of the 
project, and future research directions are in Chapter 5. 
Chapter 2 
LITERATURE REVIEW 
 
Time and practice are requisite for learners to acquire skills (Cooper & Pantle, 
1967).  Over time, the science of learning has developed, and the Total Time 
Hypothesis (the idea that only the amount of practice influences skill acquisition) has 
been discarded as research has explored practices such as distributed learning, which increases the 
amount of time between practice opportunities (Cepeda et al., 2006).   
The finite nature of the temporal resources available to teachers underscores a need for 
the development of still more efficient methods of practice.  Within special education, 
specifically, finding ways to intensify instruction that address specific student needs is an 
important pursuit (Fuchs, Fuchs, & Malone, 2017).   
While there is a clear foundation in the literature for various methods of 
increasing the effectiveness of student practice—for example, using distributed practice 
or incremental rehearsal (Benjamin & Tullis, 2010; Burns, 2005; Codding, Archer, & 
Connell, 2010; Fishman, Keller, & Atkinson, 1968; Gettinger, Bryant, & Fayne, 1982; 
Schutte et al., 2015; Varma & Schleisman, 2014)—this review will focus on interleaved 
practice, and the specific benefits and parameters associated with it.  Interleaved practice, 
defined as the interleaving of a target skill among other tasks, has been shown to produce 
a retention benefit compared to traditional repetitive practice above and beyond the 
benefit of distributed practice (Kang & Pashler, 2012; Lee & Magill, 1983; Magill & 
Hall, 1990; Rohrer, 2012; Taylor & Rohrer, 2010).  The goal of this review is to describe 
the context for research on interleaved practice, discuss possible parameters that limit or 
enhance the effect of interleaved practice, review current literature related to the 
effectiveness of interleaved practice in academic skills, and suggest a program of 
research for the future. 
Conceptual Framework and Relevant Constructs 
 For the purposes of this review, I adopt a model in which learning is broken into 
three outcomes: acquisition (sometimes called induction) (Kornell, Castel, Eich, & Bjork, 
2010; Pashler, Rohrer, Cepeda, & Carpenter, 2007; Sorensen & Woltz, 2016), retention, 
and transfer (Healy et al., 2014).  Acquisition is conceptualized as gaining facility in a 
task in the short term.  Retention is maintenance of that facility over some period of time.  
Transfer is applying some increased facility in one skill to another.   
This model can be illustrated with the following example:  Someone new to the 
game of golf might practice to improve in all three aspects of learning.  At the driving 
range, this novice golfer may take lessons from an instructor, and over the course of an 
hour improve a swing such that the ball consistently (80% of the time) lands within 15 
feet of a target at 50 yards.  The golfer has demonstrated acquisition of the skill of 
accurately hitting a golf ball at a 50 yard target.  However, upon returning to the range 
later, the novice’s accuracy has fallen back to the pre-lesson level of 5%.  After more 
lessons, this golfer might be able to maintain 80% accuracy across multiple sessions on 
the range, demonstrating retention.  If the burgeoning golfer is able to apply this newly 
acquired and retained skill to other tasks, such as hitting targets at 20 yards or 100 yards, 
he or she has demonstrated transfer.   
Research related to interleaved practice tends to focus on retention (Magill & 
Hall, 1990).  When acquisition is measured and compared between practice schedules, 
researchers have often found that interleaved practice does not lead to more efficient skill 
acquisition (Blandin, Proteau, & Alain, 1994; Magill & Hall, 1990; Pollatou, 
Kioumourtzoglou, Agelousis, & Mavromatis, 1997; C. H. Shea et al., 1990; J. B. Shea & 
Morgan, 1979).  The acquisition detriment can appear paradoxical when compared to the 
retention benefit, because it seems odd that practice that leads to higher accuracy in fewer 
trials is also associated with poorer retention.  Transfer is less often addressed in the 
literature.  The focus on retention, rather than transfer, in the literature is reflected in the 
studies covered in the rest of this review.  Transfer is a topic that is certainly worthy of 
examination; however, the dearth of extant literature addressing the effects of interleaved 
practice on transfer means that such a review will have to wait. 
 Within the contextual interference and interleaved practice literature, researchers 
have often compared blocked practice schedules with interleaved, or high contextual 
interference, practice schedules (Birnbaum et al., 2013; Kang & Pashler, 2012; Lee & 
Magill, 1983; Magill & Hall, 1990; Taylor & Rohrer, 2010).  Blocked practice refers to a 
practice schedule in which the same task is repeated until all practice trials for that task 
have been completed before moving to another task (Magill & Hall, 1990; C. H. Shea et 
al., 1990).  An interleaved or high contextual interference schedule introduces distractor 
tasks between target tasks to create interference between iterations of the target task 
(Kang & Pashler, 2012; Magill & Hall, 1990; J. B. Shea & Morgan, 1979).  Many times 
the distractor tasks differ from the target task in some way.  For example, in Shea and 
Morgan’s (1979) study, participants learned three similar, but different, tasks that 
required them to knock down small barriers set up in different patterns.  In another study 
(Landin & Hebert, 1997) participants practiced basketball shots from different positions 
on the court.   
For the purposes of this paper, practice schedules that are high in contextual 
interference or in which a target task is interleaved with other tasks will be called 
interleaved schedules.  A high contextual interference schedule may not always be 
strictly interleaved; many researchers, particularly in the motor learning literature, have 
used pseudo-random schedules that might place two identical trials next to each other 
(Blandin et al., 1994; Lee & Magill, 1983; Magill & Hall, 1990; Pollatou et al., 1997; J. 
B. Shea & Morgan, 1979).  However, the characteristics of interest are that interleaved 
schedules provide built-in distribution of practice trials, and create an environment in 
which different tasks are presented together.  Further, to avoid confusion around the 
terms blocked, blocking, block, and repetitive, any practice schedule in which the main 
feature is that each trial of a target task is practiced at one time before shifting to another 
task will be referred to as a repetitive or repeating schedule.  Some researchers 
(Kulasegaram et al., 2015; Sorensen & Woltz, 2016) refer to the control condition as 
blocked when stimuli that are different, but highly similar, are presented together, while 
interleaved practice refers to the interleaving of items from different categories.  Other 
authors (J. B. Shea & Morgan, 1979; Stambaugh, 2011) use the term blocked to refer to 
identical stimuli presented one after the other. Again, to reduce confusion, these 
schedules will be referred to as repetitive as they repeat the salient dimension of the task.   
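The distinction between repetitive and interleaved schedules, as the terms are used in this paper, can be made concrete with a small sketch.  The task labels, function names, and the strict rotation used for the interleaved schedule are illustrative assumptions; as noted above, many studies used pseudo-random orders rather than a strict rotation. 

```python
from itertools import chain


def repetitive_schedule(tasks, trials_per_task):
    """All trials of one task are completed before moving to the next task."""
    return list(chain.from_iterable([task] * trials_per_task for task in tasks))


def interleaved_schedule(tasks, trials_per_task):
    """Tasks rotate in a fixed order, so other tasks separate each repetition."""
    return [task for _ in range(trials_per_task) for task in tasks]


def gaps_between_repeats(schedule, task):
    """Number of intervening trials between successive trials of one task."""
    positions = [i for i, t in enumerate(schedule) if t == task]
    return [later - earlier - 1 for earlier, later in zip(positions, positions[1:])]


tasks = ["A", "B", "C"]  # hypothetical task labels
repetitive = repetitive_schedule(tasks, 3)
interleaved = interleaved_schedule(tasks, 3)

print(repetitive)   # ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C']
print(interleaved)  # ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C']
print(gaps_between_repeats(repetitive, "A"))   # [0, 0]
print(gaps_between_repeats(interleaved, "A"))  # [2, 2]
```

The gap counts illustrate the two characteristics of interest: the interleaved schedule spaces repetitions of a task by construction, and it places different tasks next to one another. 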
Background and Groundwork   
There is a robust foundation of interleaved practice in motor learning literature 
(Blandin et al., 1994; Lee & Magill, 1983; Magill & Hall, 1990; Pollatou et al., 1997; J. 
B. Shea & Morgan, 1979).  There is also a body of literature that examines the benefits of 
interleaved practice in learning to discriminate between artists (Kornell & Bjork, 2008), 
identify birds (Birnbaum et al., 2013), play clarinet music (Carter & Grahn, 
2016), and perform mirror-drawing (Desmottes, Maillart, & Meulemans, 2017).  The literature 
represents convincing evidence for the utility of interleaved practice for improving 
retention when compared to blocked or repetitive practice. 
Benefits of interleaving. In a foundational motor learning study, Shea and 
Morgan (1979) used three different barrier knockdown tasks, and compared the results of 
participants in an interleaved practice condition with the performance of participants in a 
repetitive practice condition.  The authors found that, during acquisition, the interleaved 
schedule appeared to lead to worse performance than the repetitive schedule; however, the 
interleaved schedule led to better retention of the tasks that were learned. 
They also found a transfer benefit for tasks that were similar, but not identical, to the 
tasks targeted in practice.  The retention benefit is derived in part from the natural spacing of the 
target tasks, which any distributed practice schedule provides (Benjamin & Tullis, 
2010).  Interleaving has also been shown to convey an additional benefit, which has been attributed in 
part to the learner’s increased opportunity to compare the tasks that have been presented 
(Carpenter, Cepeda, Rohrer, Kang, & Pashler, 2012), and, in motor skills, the opportunity 
to reconstruct the action plan of the target tasks (Magill & Hall, 1990).   
One possible explanation for the retention benefit of interleaved practice over 
repetitive practice is that trials that interleave target tasks have a built-in distribution of 
practice.  The benefits of distributed practice schedules to retention are well documented 
(Benjamin & Tullis, 2010; Burns, 2005; Codding et al., 2010; Fishman et al., 1968; 
Gettinger et al., 1982; Schutte et al., 2015; Varma & Schleisman, 2014).  While 
increasing the time between study instances (the inter-study interval) can lead to better 
retention (Cepeda et al., 2006), there is evidence of a benefit of interleaved practice 
above and beyond the distribution of practice that occurs as an artifact of interleaving 
(Mitchell, Nash, & Hall, 2008).  Mitchell et al. (2008) controlled for trial spacing and 
found a retention benefit for interleaved practice in a discrimination task that asked 
participants (n=24 undergraduate students) to engage in pattern recognition.  Their 
second experiment (with n=32 undergraduate students) added an interrupter condition in 
which the inter-study interval was filled by a distractor task. Participants in the 
interleaved condition performed better even when the intra-task interval was held 
constant between conditions.  This finding suggests there is something about interleaving 
similar tasks that provides a learning benefit beyond the mechanisms of distribution.  
These findings were replicated in a later study (Zulkiply & Burt, 2013) that also 
controlled for the temporal interval between exposures. 
 To explain the retention disparity between interleaved and distributed practice, 
Birnbaum et al. (2013) describe a “discriminative-contrast hypothesis,” in which part of 
the benefit of interleaved practice is derived from comparing two similar but different 
stimuli that are presented consecutively.  In a pair of studies, participants (n=102) 
recruited through Amazon’s Mechanical Turk were asked to learn to pair pictures of birds 
with their species via either a contiguous interleaving schedule or an interleaving 
schedule that interspersed unrelated trivia questions.  The discriminative-contrast 
hypothesis would predict that the contiguous schedule would lead to better performance 
on a retention task than the schedule with interspersed trivia questions.  Because the 
contiguously interleaved condition led to better performance, the authors concluded that 
the discriminative-contrast hypothesis was supported.  Birnbaum et al. (2013) raised 
questions related to how task characteristics influence the effect of interleaved practice on 
learning outcomes. 
The transfer benefit of variable practice was further explored in a study that used 
the Tower of Hanoi (ToH) puzzle (Vakil & Heled, 2016).  In this study, 84 participants 
practiced the puzzle with either the same ToH configuration or a combination of ToH 
configurations.  Skill acquisition through practice for both conditions was quite similar.  
Transfer cost, both in terms of number of moves needed to solve and time before first 
move, was lower for the variable condition.  This finding provides promising evidence 
for the benefit of an interleaved practice schedule in learning abstract, procedural tasks.   
Task characteristics and levels of interleaving.  Some researchers have focused 
on the nature of the target task.  Within the field of motor learning, in which interleaved 
practice has its foundation, tasks can have various characteristics (Schmidt & Lee, 2011).  
For example, walking is a continuous task in that it links a specific right foot/left foot 
cycle over and over again (you would likely not consider a two-step sequence to be 
walking outside of some very specific circumstances), whereas throwing a ball at a target 
is a discrete task.  Cognitive and academic tasks can be broken down into various 
dimensions.  Several studies (Birnbaum et al., 2013; Carvalho & Goldstone, 2014a, 
2014b, 2015; Kornell & Bjork, 2008) used discrimination tasks where participants were 
asked to distinguish between artists, shapes, birds, or patterns.  Other studies (Rohrer et 
al., 2014; Rohrer & Taylor, 2007; Sana, Yan, & Kim, 2017) focused on procedural skills 
like solving area problems in geometry.   
Another dimension to consider is the presentation of the task.  Fractions can be 
presented as numbers or shaded portions of a shape (Rau, Aleven, & Rummel, 2010; Rau 
et al., 2013).  Whether procedural or discriminatory, tasks can fall along a continuum of 
similarity (Carvalho & Goldstone, 2014a, 2014b, 2015) where one could conceivably 
interleave different multiplication math facts (very similar) or interleave finding the area 
of three-dimensional shapes with naming the capitals of African countries (very 
different).   
Further, it may be helpful to consider task characteristics.  While the literature 
has not formalized such a model, tasks seem to exist along continua in four dimensions:  
similar to dissimilar, easy to difficult discriminability, simple to complex, and discrete to 
continuous (see Figure 1).  The next two paragraphs describe tasks along the similarity 
and discriminability dimensions.  
 
Figure 1. Representation of task dimensions. 
 
 Task similarity appears to have an influence on the interleaved practice effect.  
Carvalho and Goldstone (2014a, 2014b, 2015) conducted a series of studies in which they 
asked participants (n=290 undergraduates, n=241 undergraduates, and n=211 
undergraduates, respectively) to discriminate between stimuli that varied on two levels.  
Shapes were generated for the studies.  All shapes were at least slightly different from 
each other (level 1), but might have features in common within a category (level 2).  
Thus, several shapes from category A would differ only slightly from one another, but 
differ greatly from all the shapes in category B.  From this series of studies, the authors 
concluded that there was a retention benefit for interleaved practice when the participants 
were studying similar stimuli; however, that benefit disappeared when the stimuli were 
dissimilar.  There was also a transfer benefit for interleaved practice with high similarity 
stimuli when participants were asked to learn novel stimuli. The authors (Carvalho & 
Goldstone, 2015) also included conditions in which participants had to generate the 
category to which the stimuli belonged.  In this study, participants in the generative 
condition performed better when using an interleaved schedule than when using a 
repetitive schedule.  The opposite was true of the passive condition in which participants 
were told the category.   
 Another facet of task representation is discriminability.  Zulkiply and Burt (2013) 
varied discriminable load by increasing the number of distractors present in a 
discrimination task.  The more distractors present on a stimulus, the less discriminable it 
is.  They found that participants (n=125 undergraduates) performed better on a task with 
interleaved practice when discriminability was low (several distractors), and the 
retention benefit was reversed when discriminability was high.   
 Within a practice session, tasks can be arranged such that interleaving can take 
place at multiple levels or dimensions.  For example, interleaving could happen at the 
individual task level, with clusters of tasks, at task type, or with practice sessions.  
Interleaving could also take place in the presentation of the material in that the same task 
type, or even the same basic task, could be presented in several ways.  Table 2 depicts 
examples of how interleaving can occur across dimensions such as task representation or 
task type, as well as across levels of individual tasks, clusters or practice sessions.  Even 
within the levels described, one could interleave different subjects.  A spelling task could 
be interleaved with a math problem and a novel sight word.  Given the discriminative-
contrast hypothesis, intersubject interleaving may not afford much benefit above a 
distributed schedule, but the example is still useful in conceptualizing the many ways 
interleaving can be applied. 
Table 2 
Examples of Interleaving at Different Levels and Dimensions 

Dimension            Example 
Task representation  1/2 + 1/4      .5 + .25                      1/2 + 1/4      .5 + .25 
Task Type            1/2 + 1/4      15 + 17        11 - 4         1/2 + 1/4      15 + 17        11 - 4 

Level                Example 
Individual Task      1/2 + 1/4      15 + 17        11 - 4         1/2 + 1/4      15 + 17        11 - 4 
Task Cluster         1/2 + 1/4      1/3 + 2/6      3/8 + 1/4      2 x 12         9 x 6          21 x 2 
Practice Session     10min math     10min reading  10min writing  10min math     10min reading  10min writing 
 
 Kulasegaram et al. (2015) extended the idea of mixing repetitive and interleaved 
practice into yet another application.  They had 42 undergraduate students study 
physiological concepts in either an interleaved schedule (all concepts read before 
practicing) or repetitive schedule (practice for a concept followed the reading).  The 
second level of the design manipulated whether the practice itself was repetitive or 
interleaved.  Participants completed practice problems that pertained to one (repetitive) or 
two (interleaved) different organs.  Results showed no main effect of practice type or 
number of organs practiced.  Results of near and far transfer tests revealed a benefit of 
studying multiple organs.  Results of far transfer tests revealed benefits of both 
interleaved learning and practicing multiple organs.   
The article “Mixing topics while studying does not enhance learning” (Hausman 
& Kornell, 2014) is another example of the importance of attending to the level of 
interleaving in a practice schedule.  This article describes a series of experiments that 
asked participants (n=55, n=79, n=77, and n=133, respectively) to learn English 
translations of Indonesian words and anatomical definitions.  There was no significant 
benefit for interleaved practice in the first two experiments, as well as the fourth.  A 
retention benefit was found for the repetitive condition in experiment 3.  These results 
appear to run counter to the evidence described above.  However, in this study, the 
repetitive condition interleaved material at the individual task level, while the mixed 
condition interleaved at the cluster level.  The relative retention benefit or detriment of 
the schedule depends in part on the similarity of the task.  Hausman and Kornell’s (2014) 
findings parallel those of Carvalho and Goldstone (2014a, 2014b, 2015).   
 Sorensen and Woltz (2016) asked participants (n=160) to memorize which non-
words were associated with which non-word categories, and varied the amount of 
repetitive and interleaved practice for the participants.  The authors found acquisition and 
retention benefits for the most repetitive schedule.  The authors had four non-words in 
each of six categories.  The most repetitive schedule presented all four words in a given 
category in sequence before moving to the next category (“A₁A₂A₃A₄B₁B₂B₃B₄ …”; 
Sorensen & Woltz, 2016), while the high interleaved schedule combined single 
exemplars from each of the categories together in a practice block.  Looking at that 
practice schedule through the lens of the Carvalho and Goldstone (2014a, 2014b, 2015) 
studies, it appears that Sorensen and Woltz (2016) replicated the finding that interleaved 
practice is ideal for studying different, but highly similar, stimuli.   
 Synthesis of the background.  Interleaved practice has a growing body of 
literature with boundaries and parameters that are gaining definition.  Interleaved practice 
is fertile ground for research into remaining questions.  A consideration that was not 
addressed in any studies described above is the importance of the underlying structure of 
tasks and practice.  Sorensen and Woltz’s (2016) findings appear counter to much of the 
foundational research in this area, perhaps due to the level of interleaving, as I discussed 
above.  Their contradictory findings may also be related to the nature of the task.  
Sorensen and Woltz (2016) asked participants to learn which non-words belonged in 
specific categories that were named with other non-words.  Questions remain as to 
whether it matters that everything about the task was arbitrary, and whether the similarity 
dimension interacted with learners’ prior knowledge of the “language.”  A series of 
experiments by Carvalho and Goldstone (2014a, 2014b, 2015) placed great emphasis on 
task similarity, indicating that a complete lack of structure may influence any effect 
interleaved practice has on learning outcomes.  It may be important to consider the 
complexity of the task (Blasiman, 2017), and how complexity, and these other task 
dimensions, might interact with individual differences in the learner.  
Another important consideration may be the combination and degree of 
interleaving.  A constant throughout the literature on interleaved practice is the relative 
acquisition benefit of repetitive practice.  Given the apparent acquisition benefit of a 
repetitive schedule, and retention and transfer benefits of interleaved schedules (at least 
within the parameters of tasks that benefit from interleaved practice), it may be that there 
is a way to leverage both schedules.  The preceding introduction to interleaved practice 
provides some context for the focus of the remainder of this review, which examines the 
extension of interleaved practice research into academic learning.  The beginning of this 
paper alluded to the importance of finding more efficient methods of learning so they can 
be applied to the specific environment of a classroom.  As described below, interleaving 
during practice provides a promising path towards developing more efficient learning 
environments for students. 
Purpose of the Present Review 
 Despite its potential for improving retention as demonstrated above, interleaved 
practice is not well known among educators (Morehead, Rhodes, & DeLozier, 2016). 
This section will describe several studies that explored the potential for interleaved 
practice in academic settings.  The purpose of this review is to summarize current 
literature pertaining to the use of interleaved practice schedules in academic skills, and 
provide some brief methodological comments with the aim of guiding future research.   
Method 
I searched for articles for this review in the PsycINFO and ERIC electronic databases, 
using the following parameters: variations on “contextual interference,” “interleave,” 
“contingent switching,” or “win-shift, lose-stay” (the last two to capture specific 
variations of high contextual interference schedules; Simon, Lee, & Cullen, 2008), 
combined with variations on “math,” “writing,” and “reading.”  Articles about interleaved 
practice in academic skills, specifically the application of an interleaved practice schedule 
to learning materials in a K-12 classroom, were included.  Articles that referenced 
academic skills in post-secondary environments were excluded, as were articles that 
focused on K-12 academic skills but did not use an interleaved practice schedule.  
References from articles obtained in the search were added to the review if they pertained 
to interleaved practice schedules with academic skills. 
Results 
 My search returned seven studies that examined the effect of interleaved practice 
on academic skills with K-12 populations.  Of those, six studies targeted math (Ostrow, 
Heffernan, Heffernan, & Peterson, 2015; Rau, Aleven, Rummel, & Pardos, 2013; Rau et 
al., 2013; Rohrer et al., 2014; Rohrer, Dedrick, & Stershic, 2015; Taylor & Rohrer, 
2010), and one focused on handwriting (Ste-Marie, Clark, Findlay, & Latimer, 2004).  
An additional study evaluated a math intervention in which interleaved practice was a 
feature but was not examined specifically (Booth et al., 2015).  See Table 3 for a 
breakdown of participants, measures and results of the studies cited below.  
Table 3
Summary of Results for Interleaved Practice Schedule Studies

| Study | Participants | Learned Skill | Measure | Acquisition Results | Retention Results (Brief Delay) | Retention Results (Long Delay) |
|---|---|---|---|---|---|---|
| Ste-Marie et al., 2004 | 44 1st grade students from three classrooms in two schools | Three novel symbols | Reproducing three novel symbols scored on 3 point scale | Acquisition benefit for Blocked practice, p = .08. Random practice did not improve to level of Blocked practice | Benefit for Random practice, p = .0467 | N/A |
| | 50 6-7 year old students from two schools | Three novel symbols | Reproducing three novel cursive letters scored | Benefit for Random practice in trial blocks 2, 3, and 4 | Interleaved benefit for a and h. Repetitive benefit for y. | Interleaved benefit for a and y. Repetitive benefit for h. |
| | 68 5.5-7 year old students from five schools | Three novel symbols | Reproducing three novel cursive letters scored | Main effect for trial block | Benefit for Random practice, p = .10, d = .65 | Benefit for Random practice, d = 1.03 |
| Rohrer et al., 2014 | 140 12 year old students taught by three teachers in eight classes | Four different kinds of mathematics problems | Two week delay test of three novel problems of each of the four types | N/A | N/A | Retention benefit for Interleaved group. t(139) = 10.49, p < .001, d = 1.05 |
| Rohrer et al., 2015 | 126 middle school students | Math problems related to graphing and slope | 1 and 30 day delayed retention tests | N/A | Benefit for interleaving, p = .02, d = .42 | Benefit for interleaving, p < .001, d = .79 |
| Taylor and Rohrer, 2010 | 22 fourth graders from Florida | Solving four types of math problems | Tests of each problem type | Practice benefit for Blocked group. t(22) = 4.94, p < .01, d = 2.02 | Retention benefit for Interleaved group. t(22) = 2.96, p < .01, d = 1.21 | No significant benefit, t(22) = 1.19 |
| Ostrow et al., 2015 | 146 High and Low skill seventh grade students | Geometry | Posttest at 2-5 days | N/A | N/A | Interleaved main effect for the Low skill group, p < .05, g = .6, but not High group (p > .05) |
| Rau et al., 2013 | 230 4th and 5th grade students | Fraction related math skills | Computer based math assessment | No benefit for any group during acquisition phase. | No main effect of condition | No main effect of condition |
| Rau et al., 2013 (Operational Results) | 101 5th and 6th grade students | Fractions | Computer based math assessment | N/A | No significant effects for efficiency or effectiveness | No significant effects for efficiency or effectiveness |
| Rau et al., 2013 (Representational Effectiveness) | 101 5th and 6th grade students | Fractions | Computer based math assessment | N/A | Benefit for interleaved types, t(100) = 2.03, p < .05, d = .09 | Benefit for interleaved types, t(100) = 4.74, p < .01, d = .21 |
| Rau et al., 2013 (Representational Efficiency) | 101 5th and 6th grade students | Fractions | Computer based math assessment | N/A | Benefit for interleaved types, t(100) = 2.34, p < .05, d = .37 | Benefit for interleaved types, t(100) = 5.55, p < .01, d = .88 |
In a series of three experiments Ste-Marie et al. (2004) demonstrated the benefit 
of a high contextual interference practice schedule for learning handwritten symbols 
(letters from the phonemic alphabet and cursive letters).  In the first experiment, 44 first 
grade students were taught three novel symbols in high and low interleaved conditions.  
The findings of the first experiment converged with previous research in other domains 
with regard to both retention and acquisition.  Students in the high interleaved condition 
performed worse during acquisition and better in retention. The subsequent experiments 
extended the findings of Experiment 1 by introducing a 24-hour retention assessment and 
a measure of transfer.  Interleaved practice led to better scores on both the 24-hour 
retention and transfer measures (Ste-Marie et al., 2004).  The series of three experiments 
provides converging evidence for the robust nature of the contextual interference effect. 
A group of studies co-authored by Doug Rohrer (Rohrer et al., 2014, 2015; 
Taylor & Rohrer, 2010) provides substantial evidence for the retention benefits of an 
interleaved practice schedule in the context of middle school mathematics tasks.  In the 
first study (Taylor & Rohrer, 2010) the authors asked students (n=24) to solve for various 
aspects of prisms (edge, angle, face, and corners). Throughout the acquisition phase, 
performance in the interleaved condition was worse than in the repetitive condition 
(Cohen’s d = 2.02, 1.20, and 1.01).  However, in a retention test, there was an accuracy 
benefit for students in the interleaved practice condition (Cohen’s d = 1.21).  In Rohrer et 
al. (2014), students (n=140) were asked to solve equations and word-problem based 
proportion problems, as well as problems asking students to draw lines for slopes and 
solve for slopes.  The authors reported a significant retention benefit for interleaved 
practice (p < .001, d = 1.05).    Rohrer et al. (2015) used two related, but different types 
of math problems: drawing a line on a graph based on a slope equation, and finding the 
slope of a line on a graph.  Participants (n= 126) were asked to solve math problems and 
were assessed at 1 or 30 day delays.  The authors found a retention benefit for the 
interleaved practice condition on 1- and 30-day delayed retention tests.  Effect sizes for 
these results ranged from d=.47 to d=.87.  These findings are consistent with previous 
research on interleaved practice schedules. Interleaved practice appears to be associated 
with improved retention, though not necessarily with faster, more accurate acquisition. 
Ostrow et al. (2015) studied interleaved practice schedules within the context of a 
computer-based tutoring system by randomly assigning 146 seventh-grade students to 
two groups.  One group practiced skills related to geometry and probability in a repetitive 
schedule.  The other group studied the same concepts in an interleaved schedule.  The 
authors split the students into low- and high-skill groups for the analysis.  The results 
showed an interaction between skill level and practice schedule: among low-skill students, 
the interleaved practice group earned higher posttest scores than the repetitive practice 
group (p < .05, Hedges’ g = .60), an effect that was not found for the 
high-skill students.  This finding indicates that individual differences could play an 
important role in the effectiveness of interleaved practice for retention. 
Rau et al. (2013) examined the interleaving of graphical representations (GR) of 
fractions.  In this study, students (n=230) were randomly assigned into one of four 
conditions: Blocked, Moderate, Fully Interleaved, and Increased.  The conditions referred 
to the degree to which the graphical representations were interleaved.  The researchers’ 
two hypotheses were: (1) students would improve from pretest to posttest on all 
measures, and (2) students in the interleaved condition would outperform students in the 
blocked condition. They determined that students benefited from the tutoring system on 
each of the four areas measured, regardless of the practice condition.  While the 
researchers found no significant effect of practice condition, they did find a significant 
interaction between pretest score and practice condition.  Students with different skill sets 
at the outset of the learning had differential retention rates in different practice schedules.  
Specifically, students who scored below 25% on the pretest received more benefit from a 
fully interleaved practice schedule on the conceptual transfer measure.  This finding 
further demonstrates the importance of individual differences and relative task difficulty.   
Another aspect of this study merits further investigation: a deep body of literature 
demonstrates the benefit of interleaved practice schedules 
on delayed retention (Broadbent, Causer, Ford, & Williams, 2014; Lee & Magill, 
1983; Magill & Hall, 1990; Rohrer et al., 2014; Rohrer & Taylor, 2007; C. H. Shea et al., 
1990; Ste-Marie et al., 2004; Taylor & Rohrer, 2010).  This study did not find the same 
effect, other than the pretest skill-level interaction mentioned above.  This failure to find 
converging results could be an artifact of the dimensions interleaved in this study.  By 
blocking task type in groups of six while varying the level of contextual interference as a 
predictor variable, there could have been an unforeseen dimensional interaction.  It could 
also mean that the main effect of interleaved practice on retention is not as robust as other 
literature would seem to indicate, and that further research is warranted. 
Another Rau et al. (2013) study investigated whether interleaving should take 
place along task or representational dimensions.  One hundred one fifth- and sixth-
graders used a web-based intelligent tutoring program to study fractions. Representation 
was varied by depicting fractions as line segments, partitions of a circle, or sets.  Task 
type varied by what students were asked to do: identifying fractions, comparing fractions, 
adding fractions, and so on.  Students were randomly assigned to two groups in which 
either representation or task type was interleaved. The authors predicted that interleaving 
on the task type dimension would lead to a combined effect of “more effective 
representational knowledge (hypothesis 1a); more efficient representational knowledge 
(hypothesis 1b); more effective operational knowledge (hypothesis 2a); and more 
efficient operational knowledge (hypothesis 2b)” (Rau et al., 2013, p. 101). They 
concluded that evidence supported hypotheses 1a and 1b, but not hypotheses 2a and 2b. 
In other words, interleaving led to improvements in representational knowledge, but not 
operational knowledge.  It is difficult to discern why there was an effect for 
representational, but not operational knowledge.  One possible explanation could be 
related to some of the learner characteristics that have been mentioned.  For instance, it is 
possible that the students were at a stage of their learning that was more optimal for 
representational learning than operational learning. 
Discussion 
The body of literature that extends the findings of interleaved practice schedules 
into the domain of K-12 academic skills is small.  Despite the relative dearth of research, 
the strength of the evidence is impressive.  The research presented above demonstrates an 
effect that is maintained across writing (Ste-Marie et al., 2004) and math (Ostrow et al., 
2015; Rau et al., 2013; Rau et al., 2013; Rohrer et al., 2014, 2015; Taylor & Rohrer, 
2010).  Given that there is only one writing study, replication in that area would 
strengthen the evidence for interleaved practice as a strategy that is robust across 
domains.  The research has also demonstrated that the effect persists across skills that are 
at least as dissimilar as those used in Rohrer et al. (2014), and has made inroads regarding 
the possible benefit of interleaving on multiple dimensions (Rau et al., 2013; Rau et al., 
2014).  Finally, the researchers who authored these studies have started to ask important 
questions related to interactions between individual learner differences and the 
mechanisms and parameters that influence the effect of interleaved practice.   
The studies in this review had a clear focus on retention as the target aspect of 
learning.  Each of the seven studies reviewed used a retention measure.  Only three (Rau 
et al., 2013; Ste-Marie et al., 2004; Taylor & Rohrer, 2010) directly addressed 
acquisition, and one (Ste-Marie et al., 2004) mentioned transfer.  The studies comparing 
interleaved practice to repetitive practice (Rau et al., 2013; Rohrer et al., 2014, 2015; Ste-
Marie et al., 2004; Taylor & Rohrer, 2010) found results consistent with the literature in 
cognitive psychology and motor learning.  Interleaved practice tends to lead to better 
retention of the target skills.  Studies addressing acquisition (Rau et al., 2013; Ste-Marie 
et al., 2004; Taylor & Rohrer, 2010) yielded mixed results.  Some results 
(Taylor & Rohrer, 2010) were consistent with the background literature and found higher 
accuracy as a result of repetitive practice, while others (Rau et al., 2013; Ste-Marie et al., 
2004) showed an acquisition benefit for interleaved practice, or no significant difference.  
The only study to address transfer (Ste-Marie et al., 2004) found a benefit for students 
who studied with an interleaved schedule. 
One article did briefly address the idea of the discrete versus continuous nature of 
tasks, as well as task complexity.  The authors of Rau et al. (2013) posited that the lack 
of between-group differences in operational outcomes may be due to practice schedules 
having greater impact on conceptual knowledge than on procedural knowledge.  
Considering that quite a bit of practice schedule research comes from the field of motor 
learning (D. I. Anderson, Magill, & Sekiya, 2001; Guadagnoli & Lee, 2004; Lee & 
Magill, 1983; C. H. Shea et al., 1990; Wulf & Schmidt, 1997), where it could be argued 
that procedural skills are the primary area of focus, it raises the question of whether 
procedural knowledge in motor learning is qualitatively different than in other cognitive 
areas. The authors suggest that task complexity is inversely related to the effectiveness of 
interleaved practice schedules (M. A. Rau et al., 2013).  If so, future research examining 
that relation, and how task complexity might vary relative to student skill and task type, 
would be useful.   
Limitations of the Review 
 This review is limited by the lack of literature focusing on interleaved practice in 
K-12 academic settings.  I was only able to find seven studies addressing the topic, and 
six of them addressed the same academic area.  While the results are consistent, both 
within the set and with the foundational work in other fields, the implications for practice 
must be couched in general terms.  Interleaving appears to be a beneficial practice, at 
least for mathematics, and has promise for the classroom.  The studies reviewed 
demonstrated effects in geometry, fractions, and solving equations.  Questions remain 
about the utility of interleaved schedules for simpler mathematics skills.  Fact 
fluency, for example, might be quite amenable to such a schedule, and is an important 
skill for later math success. There is also some evidence that students with lower skill 
levels might benefit more from interleaving, but there is still a lot of work that should be 
done regarding how to conceptualize what low skill means, and how skill level interacts 
with practice effects, task characteristics, and dimensions of interleaving.   
Very little attention was paid to task characteristics.  Task characteristics and 
dimensions are rarely addressed in the literature on K-12 academics.  Given 
the paucity of published literature in the area, this lack of attention could be an artifact of 
researchers making initial inroads with broad strokes before focusing on fine details.  It 
may also be that the task characteristics of similarity and discriminability are difficult to 
manipulate or quantify in academic skills.  Further, while Rau et al. (2013) addressed 
which dimension to interleave, there was very little attention paid to level or dimension 
of interleaving outside of that reference.  See Table 4 for a breakdown of tasks addressed 
and levels and dimensions of interleaving.  The experiment described in the following 
chapters does not manipulate task characteristics, but does stay within the arena of 
simple, similar, discrete, and discriminable, about which more will be said in Chapter 3. 
 
 
 
 
 
 
 
 
 
 
Table 4
Summary of Task Characteristics and How Studies Interleaved

| Study | Task Characteristics | Subject | Dimension of Interleaving | Level of Interleaving |
|---|---|---|---|---|
| (Rohrer et al., 2014) | Discrete and similar | Math | Task Type | Problem |
| (Rohrer et al., 2015) | Discrete and similar | Math | Task Type | Problem |
| (Taylor & Rohrer, 2010) | Discrete and similar | Math | Task Type | Problem |
| (Ostrow et al., 2015) | Discrete and similar | Math | Task Type | Problem |
| (Rau et al., 2013) | Discrete and similar | Math | Task Type and Task Representation | Problem |
| (Ste-Marie et al., 2004) | Discrete and similar | Writing | Task Type | Problem |
| (Rau et al., 2013b) | Discrete and similar | Math | Task Type and Task Representation | Problem |

Methodologically, these studies were generally sound, though no study is perfect.  
For example, more attention could be paid to operationally defining what is being 
interleaved and how.  Also, there was an apparent error in the analysis for one study 
(Rohrer et al., 2014) in which the authors did not match the unit of assignment to 
condition (classroom level) with the unit of analysis (individual level), threatening 
internal validity (Shadish et al., 2002).  It is difficult to discern the actual t and p values 
for the analysis, as standard deviations among the classes were not reported.  However, 
this is a young field in which definitions are still being formed, the direction of the results 
is consistent across studies, and the papers should be considered a relevant part of the 
body of evidence pertaining to interleaved practice in academic settings.   
Future Research 
 Future research on interleaved practice schedules has a solid foundation upon 
which it can expand.  Moving forward, I envision two main research tracks that interact 
as they evolve.  The first track could be called a Basic Research Track (BRT).  Along this 
path, research will clarify questions surrounding mechanisms and parameters of 
interleaved schedules.  What are the limits of “similar” tasks as described by Carvalho 
(2014a, 2014b, 2015)?  The same questions could be asked of discriminability (Zulkiply 
& Burt, 2013).  How do those two concepts interact?   What learner traits influence the 
effects of task dimension and level of interleaving?   Where does prior knowledge factor 
in, and how does the underlying structure of the task interact with all of those factors?  In 
short, the BRT will be aimed at operationalizing the dimensions and facets that might 
influence the possible effects of interleaved practice.   
The second track could be called a Translational track.  The translational track 
would take the theory and results produced by the BRT and transform it into something 
that can be implemented in actual learner environments.  The translational track may be 
more focused on specific, real-world task characteristics and how they can be mapped 
onto the theories developed by the BRT.  Further, the translational track will need to find 
ways to scale up procedures in ways that are practical for schools, software, and learners.  
For both programs of research, researchers may benefit from using a mixed effects 
framework in their experimental design and analysis.  Using a multi-level approach 
would allow researchers to measure rates of learning and retention effects while 
accounting for correlation of data points within subjects.  For example, given the 
evidence presented above, it is easy to imagine a model in which Level-1 effects of 
time and practice condition (interleaved vs. repetitive) interact with a Level-2 variable 
such as learner skill, while accounting for random effects.   
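One way to write down a model of this kind is sketched here.  It is an illustration only, not a model fit in any of the reviewed studies: the outcome $Y_{ti}$ (a score for learner $i$ at occasion $t$), the coding of the predictors, and the choice of which coefficients receive random effects are all assumptions made for the example.

```latex
% Hypothetical two-level model: occasion t nested within learner i.
\begin{align*}
% Level 1 (within learner)
Y_{ti}   &= \pi_{0i} + \pi_{1i}\,\mathrm{Time}_{ti}
            + \pi_{2i}\,\mathrm{Interleaved}_{ti} + e_{ti} \\
% Level 2 (between learners)
\pi_{0i} &= \beta_{00} + \beta_{01}\,\mathrm{Skill}_{i} + r_{0i} \\
\pi_{1i} &= \beta_{10} + \beta_{11}\,\mathrm{Skill}_{i} + r_{1i} \\
\pi_{2i} &= \beta_{20} + \beta_{21}\,\mathrm{Skill}_{i} + r_{2i}
\end{align*}
```

In this sketch, $\beta_{11}$ and $\beta_{21}$ carry the cross-level interactions of learner skill with time and with practice condition, and the $r$ terms are the learner-level random effects.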
 There is a lot of variety in both learners and the skills they are trying to acquire.  
The current state of the literature seems to indicate that interleaved practice is a useful 
tool for those students.  However, as useful as interleaved practice schedules may be, 
there is a lot of room to build on the research foundation that has been created thus far.  
In the future, researchers and practitioners may be able to use what they know about 
learners and their target skills to build individualized practice schedules that leverage the 
strengths of an interleaved schedule. 
 
 
Chapter 3 
 
METHODS 
 
Research Questions Revisited 
 The broad purpose of this project was to compare an interleaved practice schedule 
to an established improved practice method (incremental rehearsal), and a control 
(repetitive practice).  Further, the goal was to compare performance in a meaningful and 
practical context.   To those ends, I compared the acquisition, retention, and efficiency of 
learning single-digit math facts in a sample of third- and fourth-grade students 
across three practice schedules.  Specific Research Questions are as follows. 
1) How is acquisition of target math facts influenced by practice schedule (repetitive, 
incremental rehearsal, and interleaved)? 
2) Does retention of target math facts differ by practice schedule? 
3) Does efficiency of learning differ by practice schedule in terms of targets learned 
per unit of time? 
Setting and Participants 
 This study was conducted at a charter school in an urban Midwestern city.  The 
school serves pre-kindergarten through sixth grade with approximately 250 students. 
Because working with educational software and getting extra math help is something that 
is part of their typical school day, informed consent occurred via parental notification and 
an opt-out procedure as per IRB #STUDY00000162. The sample for this project is 74 
third- (n = 34) and fourth-grade (n = 40) students from a charter school in an urban 
district in a Midwestern metropolitan area. All students were African-American.  Special 
education and free and reduced lunch status were not provided for individual students; 
however, 76.2% of the school’s students receive free and reduced lunches, and 16% of 
students have IEPs.  Table 5 presents a specific demographic breakdown of participants 
by sex and grade.   
Table 5 
Participant Characteristics 
 Male Female 
Third Grade 19 15 
Fourth Grade 18 22 
Total 37 37 
 
 The range of effect sizes of the studies in the literature review was from .09 to 
1.21 (with a weighted mean of .41) for the studies that reported an effect size at the 
immediate posttest, and from .21 to 1.05 (with a weighted mean of .72) for studies that 
reported an effect size at the delayed posttest. Power analyses using trends from the 
literature as guidelines for effect sizes (Ostrow et al., 2015; M. Rau et al., 2010, 2014; 
Rohrer & Taylor, 2007; Taylor & Rohrer, 2010) were conducted with 
G*Power (Faul, Erdfelder, Buchner, & Lang, 2009) and Optimal Design (Spybrook et al., 
2011) software.  With effect size set at .41, alpha at .05, and power at .8, G*Power 
returned a sample size of 39.  With effect size set at .72, alpha at .05, and power at .8, 
G*Power returned a sample size of 14.  A sample of 74 should be sufficient to detect a 
difference between interleaved and repetitive practice at either an immediate or delayed 
posttest.  There are no published studies comparing interleaved and incremental rehearsal 
groups, so estimating effect sizes for this comparison is difficult.   
Minimum sample size guidelines for mixed-effects models follow a general rule 
of 20 clusters with five observations per cluster (Raudenbush & Bryk, 2002).  In this 
study, observations (up to 405) are clustered within students (74).  Given the nature of the 
analyses (described below), the relative lack of risk and burden on the part of the 
participants, and the benefits of technology regarding ease of data collection, 74 
participants is a reasonable sample size.  Seventy-four participants is more than three 
times the rule of thumb mentioned above, and more than double the number suggested to 
detect a retention difference between repetitive and interleaved practice.  
Design 
 This study employed a crossover, within-subjects design in which an attempt was 
made to expose each participant to every practice condition.  Every permutation of 
practice condition order was implemented with six groups of students in each grade.  
Each student was randomly assigned to one of the six permutations. See Table 6 for a 
depiction of the Latin Square and Table 7 for a depiction of the student experience.  
Students had three targets to practice seven times per session across six sessions in each 
practice condition.  The pretest followed by six practice sessions and two posttests is 
referred to as a bundle in this study.   Students took a pretest, and then three problems 
that they answered incorrectly during the pretest were assigned as learning targets for a 
bundle.  The students were then administered six practice sessions, followed by an 
immediate posttest of all items from the pretest, and then a delayed posttest administered 
at least 10 days after the last practice session. The plan for the study was for each student 
to proceed through three practice schedules across three consecutive bundles.  Each target 
was practiced 42 times within a bundle in addition to exposures at pretest, immediate 
posttest, and delayed posttest, for a total of 45 exposures per target per student per 
practice schedule, 135 exposures per student per practice schedule, and 405 exposures per 
student if a student was administered all practice sessions and took the pretest and both 
posttests for each bundle. 
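The exposure totals above follow directly from the design parameters.  The short sketch below reproduces the arithmetic; the variable names are introduced here for illustration and do not come from the study software.

```python
# Design parameters as described in the text.
TARGETS_PER_BUNDLE = 3      # unknown items assigned as learning targets
PRACTICES_PER_SESSION = 7   # exposures to each target within one session
SESSIONS_PER_BUNDLE = 6     # practice sessions per bundle
TESTS_PER_BUNDLE = 3        # pretest, immediate posttest, delayed posttest
BUNDLES_PER_STUDENT = 3     # one bundle per practice schedule

# Practice exposures to a single target within one bundle.
practice_per_target = PRACTICES_PER_SESSION * SESSIONS_PER_BUNDLE

# Add the one exposure at each of the three tests.
exposures_per_target = practice_per_target + TESTS_PER_BUNDLE

# Scale up to all targets in a bundle, then to all bundles.
exposures_per_bundle = exposures_per_target * TARGETS_PER_BUNDLE
exposures_per_student = exposures_per_bundle * BUNDLES_PER_STUDENT

print(practice_per_target, exposures_per_target,
      exposures_per_bundle, exposures_per_student)
```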
Table 6 
Latin Square for Counterbalancing 
1 Rep IL IR 
2 Rep IR IL 
3 IR Rep IL 
4 IR IL Rep 
5 IL IR Rep 
6 IL Rep IR 
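
The six orders in Table 6 are all permutations of the three practice schedules.  The sketch below shows one way such a counterbalancing scheme could be generated and students assigned to it.  It is a hypothetical illustration, not the procedure used by the study software: the function names are invented here, and keeping group sizes as equal as possible is an assumption (the text states only that assignment was random).

```python
import itertools
import random

SCHEDULES = ("Rep", "IL", "IR")  # repetitive, interleaved, incremental rehearsal


def all_orders(schedules=SCHEDULES):
    """Return every possible order of the practice schedules."""
    return list(itertools.permutations(schedules))


def assign_orders(student_ids, seed=None):
    """Randomly assign each student to one order.

    Assumption: students are shuffled and dealt out in turn, so group
    sizes differ by at most one.
    """
    rng = random.Random(seed)
    orders = all_orders()
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    return {sid: orders[i % len(orders)] for i, sid in enumerate(shuffled)}
```

Calling `assign_orders(range(74), seed=1)` would, under these assumptions, place 12 or 13 students in each of the six orders.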
 
Table 7 
Student Experience of Study 
Bundle 1 Bundle 2 Bundle 3 
Pretest A Pretest B Pretest C 
Practice Schedule 1 Session 1 Practice Schedule 2 Session 1 Practice Schedule 3 Session 1 
Practice Schedule 1 Session 2 Practice Schedule 2 Session 2 Practice Schedule 3 Session 2 
Practice Schedule 1 Session 3 Practice Schedule 2 Session 3 Practice Schedule 3 Session 3 
Practice Schedule 1 Session 4 Practice Schedule 2 Session 4 Practice Schedule 3 Session 4 
Practice Schedule 1 Session 5 Practice Schedule 2 Session 5 Practice Schedule 3 Session 5 
Practice Schedule 1 Session 6 Practice Schedule 2 Session 6 Practice Schedule 3 Session 6 
Immediate Posttest Immediate Posttest Immediate Posttest 
Delayed Posttest Delayed Posttest Delayed Posttest 
 
 To address research questions 1 and 2 it was necessary to classify each exposure 
to a learning target as either an acquisition trial or a retention trial.  An acquisition trial 
was any exposure in which the participant had already seen the target that day.  A retention trial 
was any exposure in which the participant had not seen the target for at least two days. 
Table 8 shows how acquisition and retention trials were distributed among the pretest, 
posttests, and six practice sessions as the study was planned.  Due to school closures, 
student absences, and other factors, the spacing between practice was often different from 
planned (see Table 9).  However, days since practice was a predictor in several candidate 
models, so these variations were accounted for.  Note that the schedules depicted in 
Tables 8 and 9 are for one target only.  Depending on schedule, these exposures would be 
(a) followed by another target in the same pattern (repetitive schedule), (b) interleaved with 
other targets (interleaved schedule), or (c) interspersed with known targets (incremental 
rehearsal schedule).  All possible items for a bundle were presented during the pretests and 
posttests. 
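
The trial definitions above can be expressed as a simple rule on the number of days since the target was last seen.  The following is a minimal sketch under stated assumptions, not the study's analysis code: the function name and labels are hypothetical, and the handling of a student's very first exposure to a target and of a one-day gap (neither of which the definitions cover) is a choice made here for illustration.

```python
from datetime import date
from typing import Optional


def classify_exposure(exposure_day: date, previous_day: Optional[date]) -> str:
    """Label one exposure to a learning target.

    exposure_day: calendar day of this exposure.
    previous_day: calendar day the target was last seen, or None if never.
    """
    if previous_day is None:
        # Assumption: a first-ever exposure is kept in its own category.
        return "first_exposure"
    gap_days = (exposure_day - previous_day).days
    if gap_days == 0:
        return "acquisition"   # target already seen earlier that day
    if gap_days >= 2:
        return "retention"     # target not seen for at least two days
    # Assumption: a one-day gap fits neither definition.
    return "unclassified"
```

For example, a second exposure on the same day as the first is labeled an acquisition trial, while the first exposure of a session held two days after the previous session is labeled a retention trial.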
Table 8  
Breakdown and Planned Spacing of Acquisition vs Retention Trials Across Practice 
Sessions 
Days: Day 1, Day 3, Day 5, Day 7, Day 9, Day 11, Day 25 

| Pretest | Session 1 | Session 2 | Session 3 | Session 4 | Session 5 | Session 6 | Immediate Posttest | Delayed Posttest |
|---|---|---|---|---|---|---|---|---|
| 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 |
| | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | | |
| | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | | |
| | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | | |
| | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | | |
| | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | | |
| | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | | |

Note: Items in bold are retention trials while items in italics are acquisition trials. 
 
 
Table 9 
Breakdown and Example of Altered Spacing of Acquisition vs Retention Trials Across 
Practice Sessions 
Days: Day 1, Day 3, Day 10, Day 12, Day 22, Day 24, Day 34 

| Pretest | Session 1 | Session 2 | Session 3 | Session 4 | Session 5 | Session 6 | Immediate Posttest | Delayed Posttest |
|---|---|---|---|---|---|---|---|---|
| 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 |
| | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | | |
| | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | | |
| | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | | |
| | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | | |
| | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | | |
| | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | | |

Note: Items in bold are retention trials while items in italics are acquisition trials. 
 
 
Materials 
 This project was implemented in collaboration with FastBridge Learning (FBL) 
using software that is under development to help experimenters build and run studies.  
FBL is an educational software company that specializes in assessment and progress 
monitoring.  Using the FBL infrastructure, a new experimenter user interface was 
developed with the purpose of investigating new instructional methods for students.  This 
interface was used to create the practice conditions for students.  The student interface 
mirrored typical item interfaces within the FastBridge system.   
Student interface. The student interface is based on math fact items already 
developed for the FBL systems. Students access the interface using tablet devices (in this 
study, they used classroom iPads). In each session, students were presented with an 
introductory set of two problems with instructional narration.  They were then instructed 
to click a start button when they were ready to begin their practice session.  Each problem 
was presented, and after each response, students received feedback about whether their 
response was correct or incorrect.  If their response was incorrect, they were shown the 
correct answer.  See Figure 2 for examples of the problem presentation and feedback 
received by the students. This basic interface has been used by thousands of students 
around the country as a part of the FBL suite of assessments.   
 
Figure 2. Example of student interface. 
  
Learning targets. As mentioned in Chapter 1, single digit addition and 
multiplication facts were chosen as the learning targets for this study.  All third-grade 
students were given addition problems and all fourth-grade students were given 
multiplication problems.  The operator split was based on consultation with a math expert 
before the start of the study. The single digit addition and multiplication problems were 
split into three problem sets each that were of approximately similar difficulty.  See Table 
10 for lists of problems by pretest.  All students regardless of schedule encountered the 
problems in the same order, Set A followed by B, and then C.  Items were presented in 
the order given in Table 10.  Students received the appropriate set as a pretest, as an 
immediate posttest, and as a delayed posttest. 
 
 
 
 
Table 10 
Problem Organization 
Multiplication  Addition 
A  B  C   A  B  C  
0x0 0x1 0x2  0+0 0+1 0+2 
0x3 0x5 0x4  0+3 0+5 0+4 
0x6 0x7 0x8  0+6 0+7 0+8 
0x9 1x3 1x2  0+9 1+3 1+2 
1x1 1x6 1x5  1+1 1+6 1+5 
1x4 1x9 1x8  1+4 1+9 1+8 
1x7 2x3 2x2  1+7 2+3 2+2 
2x4 2x5 2x7  2+4 2+5 2+7 
2x6 2x8 3x5  2+6 2+8 3+5 
2x9 3x4 3x8  2+9 3+4 3+8 
3x3 3x6 4x4  3+3 3+6 4+4 
4x5 3x7 4x6  4+5 3+7 4+6 
4x7 3x9 4x8  4+7 3+9 4+8 
5x7 4x9 5x5  5+7 4+9 5+5 
5x9 5x6 6x7  5+9 5+6 6+7 
6x6 5x8 6x9  6+6 5+8 6+9 
6x8 7x7 7x8  6+8 7+7 7+8 
8x8 7x9 9x9  8+8 7+9 9+9 
8x9    8+9   
 
Social validity questionnaire. A two-question questionnaire was administered as 
a social validity measure.  This questionnaire was not aligned to a specific research 
question but was used to gauge student perceptions of the intervention.  The questions 
were (1) How helpful was this practice for learning math? and (2) How fun was this math 
practice? 
Procedure 
 Students used an iPad for each session.  In their first session, students took the 
pretest, which included one third of the possible addition or multiplication items. For the 
purposes of this experiment, the terms “known” and “unknown” are used to refer to items 
to which students responded correctly or incorrectly, respectively.  This naming 
convention is consistent with the incremental rehearsal literature (Burns, 2005; Varma & 
Schleisman, 2014).  The software tracked correct and incorrect responses, and randomly 
selected three unknown targets for the subsequent practice bundle.  The software also 
selected seven known items for use in the incremental rehearsal schedule.  In cases where 
there were not seven known items, known items were recycled as needed.  In cases where 
there were fewer than three unknown items, targets were recycled. 
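
The selection step described above can be sketched as follows.  This is a hypothetical illustration rather than the FastBridge implementation: the function names and the data structure (a mapping from item to pretest correctness) are assumptions, and recycling is modeled as drawing extra items at random from the same pool.

```python
import random


def pick_with_recycling(pool, n, rng):
    """Pick n items from pool, reusing items if the pool is too small."""
    if not pool:
        return []
    picked = rng.sample(pool, min(n, len(pool)))  # distinct items first
    while len(picked) < n:
        picked.append(rng.choice(pool))           # recycle as needed
    return picked


def select_items(pretest_results, n_targets=3, n_known=7, seed=None):
    """Choose unknown targets and known items from pretest results.

    pretest_results: dict mapping an item (e.g. "6x8") to True if the
    student answered it correctly on the pretest, else False.
    """
    rng = random.Random(seed)
    unknown = [item for item, ok in pretest_results.items() if not ok]
    known = [item for item, ok in pretest_results.items() if ok]
    targets = pick_with_recycling(unknown, n_targets, rng)
    known_items = pick_with_recycling(known, n_known, rng)
    return targets, known_items
```

With only two unknown items on a pretest, for instance, this sketch returns both of them plus one repeated at random, so that three target slots are always filled.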
Students had six seconds to respond to each item.  After six seconds the trial was 
scored incorrect.  Other math fact practice research (Burns, 2005) has used a two-second 
response time for oral responses.  The four extra seconds in this procedure were added to 
allow for extra time needed to respond via typing on the iPad, which can be cumbersome.  
During each exposure, students were given feedback about the correctness of their 
response.  If incorrect, students were shown the correct answer.   
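
The scoring and feedback rule for a single trial can be summarized in a few lines.  The sketch below is an assumption-laden illustration, not the study software: the function name, the feedback strings, and the treatment of a response arriving at exactly six seconds (counted here as on time) are all hypothetical.

```python
TIME_LIMIT_SECONDS = 6.0  # response window described in the text


def score_trial(response, correct_answer, response_time,
                limit=TIME_LIMIT_SECONDS):
    """Score one trial and build the feedback shown to the student.

    response: the student's answer, or None if nothing was entered.
    response_time: seconds elapsed before the response was submitted.
    Returns a (is_correct, feedback) tuple.
    """
    timed_out = response is None or response_time > limit
    is_correct = (not timed_out) and response == correct_answer
    if is_correct:
        feedback = "Correct"
    else:
        # Incorrect or timed-out trials show the correct answer.
        feedback = f"Incorrect. The answer is {correct_answer}."
    return is_correct, feedback
```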
A bundle included a pretest, six practice sessions in which the student was 
exposed to each target seven times in the appropriate schedule, a posttest immediately 
following the last practice session and a delayed posttest.  Ideally, these sessions would 
be evenly spaced over two weeks; however, student absences, school schedules, and 
other factors necessitated altered schedules.  Some students received the intervention 
each school day.  In other cases, the interval between sessions was longer than the 
intended delay for the delayed retention test.  For example, due to inclement weather 
many students did not interact with the experiment between January 23rd and February 4th 
(or later).  Data were collected each day class was in session and all students who were 
available and willing were asked to participate.  The same process was repeated for each 
bundle.  A questionnaire was attempted at the end of each bundle.   
It should be noted that the intention in this study was to hold the total practice 
opportunities equal across students and practice schedules.  However, there were some 
cases in which students accidentally navigated away from the intervention, or the system 
froze.  In those cases, the practice session was restarted.  Because the system restarted 
students from the beginning of the session, these students had more practice 
opportunities than students who did not have issues.  Further, when a student had only 
one or two potential targets to choose from, the software chose one (or two) at random and 
substituted it or them for the missing targets in the schedules.  This inadvertent overexposure is 
controlled for in the analysis with the practice opportunities predictor variable.  
Measures 
Outcome variables. Accurate responses are the main outcomes used for the first 
and second research questions.  The log-odds of an accurate response on any given 
observation is modeled in the case of acquisition, and the log-odds of an accurate 
response on retention observations is modeled in the case of retention.  Retention 
observations are defined as any observations that are at least two days after the previous 
practice opportunity. 
Originally, the plan was to model only responses on the delayed posttest.  
However, a combination of many canceled school days and student absences during data 
collection periods led to gaps of greater than 10 days between practice sessions within a 
bundle.  Thus, I decided to change the definition of a retention observation as described 
above. Specifically, a retention trial is any trial of a target in which the student has not 
seen that target for at least two days. This change allows for models that include number 
of practice opportunities and days since last practice opportunity.  Thus, there were a 
varying number of retention observations for each student.  Further, attrition (typically in 
the form of unwillingness to participate further, but also lack of attendance) meant that not 
every student was exposed to every practice condition or received all practice or 
posttest sessions for a bundle.  The change in the definition of a retention observation allows 
observations from those students to be used in the models. 
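The revised definition can be illustrated with a short sketch that flags retention observations for one student and one target. The function names are hypothetical, and treating a target's first-ever trial as having no prior exposure (and therefore not a retention observation) is an assumption made for the example.

```python
from datetime import date


def days_since_last_exposure(trial_dates):
    """Days between each trial of one target and the previous trial of it.

    trial_dates must be in chronological order. The first trial has no
    previous exposure, so its gap is None (an assumption of this sketch).
    """
    gaps = [None]
    for previous, current in zip(trial_dates, trial_dates[1:]):
        gaps.append((current - previous).days)
    return gaps


def is_retention_observation(gap_days, min_gap=2):
    """True when the target has not been seen for at least min_gap days."""
    return gap_days is not None and gap_days >= min_gap


# Hypothetical trial dates for one target, including a long weather gap.
dates = [date(2019, 1, 22), date(2019, 1, 22), date(2019, 1, 23), date(2019, 2, 4)]
gaps = days_since_last_exposure(dates)
flags = [is_retention_observation(g) for g in gaps]
```

Under this definition the number of retention observations varies by student, because it depends on the gaps that actually occurred rather than on a fixed delayed posttest.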
 For Research Question 3, a rate was calculated for each set of math facts 
associated with a particular practice schedule for each student.  This rate was the number 
of target facts learned at the immediate or delayed posttests divided by the amount of 
time the student spent in the intervention (including the pretest and posttests).  For 
example, if a student learned all three target facts in the incremental rehearsal bundle and 
spent an hour doing so, then the outcome rate for that student for that bundle was 1 target 
learned per 20 minutes of practice.   
 Predictor variables. For Research Question 1, predictor variables of interest 
include pretest score, practice schedule, and number of exposures to the target (referred 
to as practice opportunities).  Interactions between predictor variables were also included 
in the models.  Predictor variables of interest for Research Question 2 include those for 
Research Question 1, as well as the number of days since the last practice of the target.  
Predictor variables of interest for Research Question 3 include pretest score and practice 
schedule.   
 The number of exposures to the target, or practice opportunities, accounts in some 
ways for the Total Time Hypothesis (Cooper & Pantle, 1967; Underwood, 1970). Pretest 
score is included to determine if the effects of practice or practice schedule interact with 
student skill level before the intervention (Ostrow, Heffernan, Heffernan, & Peterson, 
2015; Rau, Aleven, & Rummel, 2013).   Pretest score is the percent correct on the pretest 
for a bundle.  The role of Practice Schedule as a predictor variable is clear, as it is the 
driving feature of the primary research questions.  The number of days since practice 
should give an idea of retention over differing lengths of time.   
 Previous research in interleaved practice (Magill & Hall, 1990) indicates that an 
asymptote is a possibility; as such, a quadratic form of the number of exposures of the 
target was included in the modeling process.  Also, there is a potential effect of day of 
intervention, which corresponds to the number of calendar days since the start of the 
study.  While there is an expectation that day of intervention will share a lot of variance 
with number of exposures, there is a possibility of some anomaly related to the day of a 
practice session (e.g., related to the school calendar, the day of the week, and so forth).  
To account for such a possibility, day of intervention was included as a predictor. 
Analysis Plan 
 First, I decided whether to analyze third-grade (addition) and fourth-grade 
(multiplication) students with the same or separate analyses.  This decision was based on 
whether third- and fourth-grade students performed significantly differently on the first 
pretest of the study. If they did, then they should be analyzed separately, because they 
respond differently enough to be treated as two different groups. Once that decision was 
made, the following analysis plan was followed for each research question.  Table 11 lists 
variables of interest for answering all three research questions, additional details of which 
are described below.  Appendix A lists candidate models for research questions 1 and 2. 
Research Question 1. How is acquisition of target math facts influenced by 
practice schedule (repetitive, incremental rehearsal, and interleaved)? 
In this study multiple observations were made for each student.  Thus, the 
assumption of independence of observations was violated.  A mixed-effects regression 
model accounts for autocorrelation between within-student observations (Long, 2012; 
Raudenbush & Bryk, 2002).  Further, this question has a binary (0 = incorrect and 1 = 
correct) outcome variable and cannot have normally distributed residuals.  Logistic 
regression uses a generalized linear model and maximum likelihood estimation to model the 
log-odds of a response of 1 (Hilbe, 2009).  A logistic mixed-effects regression 
model is able to model the log-odds of the binary outcome with nested data.  Thus, it 
is a suitable tool for approaching Research Question 1.  Unfortunately, the mixed-effects 
models would not converge and could not be used.  Using a logistic model that does not 
include an estimate of random effects was deemed not appropriate, because it would not 
be able to account for interdependence of the repeated measures data, violating the 
assumption of independence.  Thus, this research question could only be addressed with 
descriptive statistics. 
Table 11 
Variable List 
Variable Type | Variable | Research Question | Operationalization 
Predictor | Pretest score | 1, 2, and 3 | Percent correct on first pretest 
Predictor | Number of practice opportunities | 1 and 2 | Number of times target is seen 
Predictor | Practice schedule | 1, 2, and 3 | IL, IR, or Rep 
Predictor | Days since practice | 2 | Number of days since target has appeared 
Predictor | Day of Intervention | 1 and 2 | Calendar days since start of study 
Predictor | Practice Schedule x Number of opportunities | 1 and 2 | Two-way Interaction 
Predictor | Practice Schedule x Pretest score | 1, 2, and 3 | Two-way Interaction 
Predictor | Practice Schedule x Grade | 1, 2, and 3 | Two-way Interaction 
Predictor | Practice Schedule x Number of opportunities x Pretest Score | 1 and 2 | Three-way Interaction 
Predictor | Practice Schedule x Days since practice x Pretest Score | 1 and 2 | Three-way Interaction 
Predictor | Practice Schedule x Days since practice | 2 | Interaction 
Outcome | Accuracy of response at acquisition observation | 1 | Correct or incorrect response at observations in which the target has been seen that day 
Outcome | Accuracy of response at retention observations | 2, 3 | Correct or incorrect response at observations in which the target has not been seen for at least two days 
Outcome | Time spent in intervention | 3 | Sum of reaction times on trials to date (including on pretests and posttests) 
Note: Quadratic and cubic forms of practice opportunities will be included as well to capture asymptotes and reversals of trends.  IL, IR, and 
Rep refer to interleaved, incremental rehearsal, and repetitive schedules, respectively. 
Research Question 2. Does retention of target math facts differ by practice 
schedule?  
A mixed-effects logistic regression model is the most appropriate analytical tool 
for this question.  Unlike the models for research question 1, these mixed-effects models converged.  Only 
one random effect was estimated for two reasons: (1) models with more than one random 
parameter were not able to converge, and (2) output summaries for models with more 
than one random parameter assigned a small proportion of the variance to parameters other 
than the random intercept.  See Table A1 for a list of candidate models for research 
question 2. 
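For reference, a random-intercept logistic model of the kind described here can be written in general form as follows.  The fixed-effect (γ) and random-intercept (u) notation mirrors the candidate models in Table 16; the generic predictors X, the count P, and the variance symbol τ are assumed here for illustration and do not appear in the original.

```latex
\operatorname{logit}\bigl(\Pr(Y_{ij} = 1)\bigr)
  = \ln\frac{\Pr(Y_{ij} = 1)}{1 - \Pr(Y_{ij} = 1)}
  = \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} X_{pij} + u_{0j},
\qquad u_{0j} \sim N(0, \tau_{00})
```

Here i indexes observations and j indexes students, so the random intercept allows each student's baseline log-odds of a correct response to differ from the overall intercept.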
Further, an information-theoretic (IT) model selection framework (D. Anderson, 
2008; Burnham & Anderson, 2002; Burnham, Anderson, & Huyvaert, 2011) was used to 
make a selection of a final model.  With this approach, a set of variables of interest is 
used to create an a priori set of candidate models.  All models are run, and information 
criteria scores (Akaike Information Criterion [AIC], Bayesian Information Criterion [BIC]) 
are calculated based on the log-likelihood of the models and compared using both the 
criteria scores and the Akaike weights.  An Akaike weight is the probability that a model 
is the best model in the candidate set given the data (D. Anderson, 2008).  A model with 
a weight of .9 has a 90% probability of being the best model given the data.  The 
model with the lowest value of the chosen information criterion, and thus the highest 
weight, is chosen as the 
final model.  In this case, given the sample size involved, the corrected Akaike 
Information Criterion (AICc) was used to pick the best fitting model.  Parameter 
estimates were interpreted through a lens of Beta averaging (Burnham & Anderson, 
2002) in which the parameter (Beta) estimates for each parameter are multiplied by the 
weight of the model and summed across models.  Further details about Beta averaging 
are presented in the results section. 
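The relationship between information criterion scores and Akaike weights can be sketched as follows.  This is a minimal illustration using the standard formulas for AICc and Akaike weights; the function names and the example scores are hypothetical and are not taken from the study's models.

```python
import math


def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC for k estimated parameters and n observations."""
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(criteria):
    """Convert information criterion scores into weights that sum to 1.

    Each weight is proportional to exp(-delta / 2), where delta is the
    model's score minus the lowest score in the candidate set.
    """
    best = min(criteria)
    relative = [math.exp(-0.5 * (c - best)) for c in criteria]
    total = sum(relative)
    return [r / total for r in relative]


# Hypothetical scores for three candidate models (lower is better).
example_scores = [100.0, 102.0, 110.0]
example_weights = akaike_weights(example_scores)
```

Because the weights depend only on differences between scores, the model with the lowest criterion value always receives the largest weight.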
Research Question 3. Does efficiency of learning differ by practice schedule in 
terms of targets learned per unit of time? 
For this question, I ran three sets of two linear regression models.  Models were 
run for the immediate and delayed posttest for each of the following outcomes: total 
targets answered correctly on the posttest, total time spent in a specific practice condition 
in the intervention by the end of each posttest, and targets correct per 20 minutes spent in 
the intervention.  Predictors included practice schedule and centered pretest score.  
These three sets of models allowed for comparisons of the practice schedules in terms of 
total targets learned and time spent doing the intervention individually, as well as the 
matter of interest, efficiency. 
The decision to divide the number of correct responses to targets at posttest by 20 
minute intervals (rather than milliseconds, seconds, minutes, or hours) was made for 
three primary reasons. First, it was a number that resulted in a reasonably interpretable 
result.  For example, the referent group averaged about 1.8 correct responses per 20 
minutes of practice.  That is easier to interpret than the referent group averaging about .09 
correct responses per minute.  Second, each schedule took at least 20 minutes to complete 
on average. And third, only one practice condition lasted for greater than an hour, so the 
hour unit makes little sense for the other two groups. 
Model Checking 
Research question 1 could not be addressed via the planned modeling approach, 
so there were no models to check.  Research question 2 was addressed with a binary 
mixed-effects logistic regression model.  A normal distribution of residuals is impossible 
with a binary outcome.  Further, overdispersion is not a concern with a binary outcome 
(Hilbe, 2009); thus, the main check on model adequacy was via a visual inspection of 
extreme residuals in a half-normal plot and uniform variation of deviance residuals 
plotted against a linear predictor (Faraway, 2005).  Visual inspections of half-normal and 
deviance residual plots are discussed in the results sections in reference to model 
checking and model adequacy.  Distributions of numeric predictors were checked for 
normality.  Distributions are noted in the results section. 
 Research question 3 employs ordinary least squares linear regression models to 
determine learning efficiency differences between the three practice schedules.  In this 
case residuals were plotted on a frequency distribution to check for normally distributed 
residuals and a scatterplot with a loess smoother to check the appropriateness of a linear 
model.  
Chapter 4 
 
RESULTS 
 
Random Assignment 
 Due to absences, snow days and extreme cold days on which school was 
cancelled, student refusal, and other unforeseen events, not every student received every 
condition.  However, because all conditions were counterbalanced, and all students 
randomly assigned, each condition had a similar number of participants within each 
grade/operation and in total.  See Table 12 for the number of participants in each 
condition.  There were 34 unique students in 3rd grade and 38 unique students in 4th 
grade.  A total of 72 students of the original 74 were included in these analyses.  Two 
fourth-grade students were not included because they earned perfect scores on the pretests. 
Table 12 
Number of Participants in Each Condition 
 Addition Multiplication Total 
Interleaved 29 22 51 
Incremental Rehearsal 27 22 52 
Repetitive 30 23 53 
 
Combining Addition and Multiplication for the Analysis 
Recall that if third- and fourth-grade students performed similarly on the first 
addition and multiplication pretests, respectively, then they could be combined in the 
final analyses.  Table 13 shows descriptive statistics for pretest scores for each group.  A 
two-sample t-test was run to determine if there was a statistically significant difference 
between pretest scores for addition and multiplication.  A modified Levene’s test was 
used to make sure assumptions of homoscedasticity were met to run the two-sample t-test.  
The null hypothesis for the Levene’s test was not rejected, indicating that the 
variances could be treated as equal ($F_{1,70} = 1.76$, $p = .19$), and the t-test was run.  The 
null hypothesis for the t-test was not rejected ($t_{66} = 1.62$, $p = .11$), indicating that there 
did not appear to be a statistically significant difference between the performance of the 
two groups on their respective pretests.  However, the effect size (g = 0.37) for the 
difference between the two groups was large enough to warrant controlling for group 
differences in the models (What Works Clearinghouse, 2017, p. 14).  Student scores on 
the first pretest were used as a predictor in the candidate models. 
Table 13 
Addition and Multiplication First Pretest Descriptive Statistics for Proportion Correct 
 N Mean SD Median Min Max Skew Kurtosis 
Addition 34 .70 .13 .74 .42 .84 -0.67 -0.94 
Multiplication 38 .63 .19 .68 .16 .89 -0.72 -0.30 
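As an illustration of the effect size calculation, the sketch below computes Hedges' g from group summary statistics using the pooled standard deviation and the usual small-sample correction.  The function name is hypothetical, and the inputs are the rounded values from Table 13, so the result only approximates the value obtained from the unrounded student-level data.

```python
import math


def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference with a small-sample correction."""
    df = n1 + n2 - 2
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (mean1 - mean2) / pooled_sd
    # Correction factor that shrinks d slightly in small samples.
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)
    return d * correction


# Rounded summary statistics from Table 13 (addition vs. multiplication).
g_rounded = hedges_g(0.70, 0.13, 34, 0.63, 0.19, 38)
```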
 
Research Question 1 
How is acquisition of target math facts influenced by practice schedule (repetitive, 
incremental rehearsal, and interleaved)? 
 The analyses performed for research question 1 were completed with data from 
acquisition trials.  An observation that is an acquisition trial is an observation of a target 
in which the participant has been exposed to that target on that day.  Any practice 
opportunity in a practice session that is not the first exposure is an acquisition trial (with 
the exception of the first practice session in which the participant has seen the target in 
the pretest).  The original analysis plan called for logistic mixed effects regression models 
to be built to account for the binary outcome variable and autocorrelation of the data 
structure.  However, even relatively simple logistic mixed effects regression models 
would not converge.  This failure to converge could be an artifact of a lack of variation in 
responses.  Acquisition observations included in the model were only observations in 
which the participant had already seen the problem and the accuracy across all 
observations was quite high (79%).  There might not have been enough variance to model 
random effects.  Descriptive statistics are outlined below. 
 Accuracy rates for acquisition trials were 75%, 80%, and 81% for interleaved 
practice, incremental rehearsal, and repetitive practice, respectively.  The accuracy rate 
was between 72% and 84% across both addition and multiplication for all acquisition 
trials and practice conditions.  See Table 14 for percent of correct responses broken down 
by operation and practice schedule.   
Table 14  
Accuracy Rates by Operation and Practice Schedule 
 Addition Multiplication Total 
Interleaved 72% 79% 75% 
Incremental Rehearsal 79% 81% 80% 
Repetitive 79% 84% 81% 
Total 77% 81% 79% 
 
 While the proportion of correct responses is generally higher for incremental 
rehearsal than interleaved practice, and generally higher for repetitive practice than 
incremental rehearsal,  these rates tend to be high across all schedules.  Incremental 
rehearsal does match repetitive practice for addition.  A similar pattern can be seen in a 
histogram of responses by practice opportunity (Figure 3).  In general, accuracy greatly 
increases within the first several acquisition trials, and maintains a high rate throughout 
the remaining practice.  These data show that acquisition trials tend to asymptote 
relatively quickly and maintain a high level of accuracy across practice.  In acquisition 
trials, student errors reduce dramatically and stay low.  While I hypothesized such a 
pattern for the repetitive schedule, it appears that for math facts, the trend is the same for 
the incremental rehearsal and interleaved practice schedules as well.  The pattern is 
similar for both addition and multiplication.  In the figure there are bins for practice 
opportunities beyond the 45 expected in the design.  This is due to the issue of session 
restarts and duplicated targets mentioned in the methods section.  Rather than truncate the 
figure based on an ideal experimental situation that did not exist, all observations were 
graphed. 
 
Figure 3. Histogram of correct and incorrect response frequencies by Incremental Rehearsal (IR), 
Interleaved (IL), and Repetitive (Rep) practice schedules. 
 
Research Question 2 
Does retention of target math facts differ by practice schedule? 
 Retention observations were defined as any trials in which the student had not 
seen that item for at least two days.  These outcomes were modeled with logistic mixed 
effects regression.  Models with more than one random effect estimated would not 
converge, so models were run with only a random intercept estimated.  
 Numerical predictors for these models included practice opportunities, score at 
first pretest, and days since practice.  All three numeric predictors had approximately 
normal distributions.  See Table 15 for skew and kurtosis for the three predictors. 
 
Table 15  
Skew and Kurtosis for Three Numeric Predictors Used in Models for Research Question 
2 
 Mean SD Min Max Skew Kurtosis 
Practice Opportunities 18 15 1 65 .55 -.91 
Pretest Score .64 .17 .16 .95 -.52 -.48 
Days Since Practice 13 11 2 59 1.9 4.6 
n = 1852 observations of 72 students 
 
Forty models were fit to attempt to answer research question 2.  These models 
were created from various combinations of the variables of interest listed above, and can 
be found in Appendix A.  An AICc weight of .005 was used as a conservative cut-off for 
models listed in these results.  A model with an AICc weight lower than .005 has less 
than a .5% chance of being the best model given the data.  Nine models in the retention 
analysis converged and had an AICc weight of .005 or higher; these are the only models 
discussed further in this analysis.  The cumulative AICc weight for all models in this 
analysis through the .005 cut-off rounds to 1.00 at three significant digits.  See Table 16 
for a comparison of the nine selected models by number of parameters estimated and 
AICc metrics.   
Beta averaging is a way to weight parameter estimates by model likelihood and 
get a holistic picture of the relative influence of variables estimated by the models 
examined within the information theoretic framework (Burnham & Anderson, 2002).  To 
obtain beta averages, parameter estimates are multiplied by the AICc weight of the model 
and summed across models.  Table 17 displays the parameter estimates for each of the 
nine selected models and the beta averages for each parameter.  It also shows the odds 
ratio for each beta average, which is the exponentiation of the averaged log-odds estimate for 
each parameter.  Repetitive practice is the referent group for this analysis, so all 
intercepts are for the repetitive practice condition at time point zero for a student who got 
no items correct on the pretest. 
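Table 16 
Summary of the Nine Selected Retention Models Ordered by AICc Weight 
No. | Model | Parameters | AICc | AICc Weight 
1 | $\hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{PO}) + \gamma_{30}(\text{IL}) + \gamma_{40}(\text{IR}) + \gamma_{50}(\text{IL} \times \text{PO}) + \gamma_{60}(\text{IR} \times \text{PO}) + \gamma_{70}(\text{Pretest Score}) + \gamma_{80}(\text{IL} \times \text{Pretest Score} \times \text{PO}) + \gamma_{90}(\text{IR} \times \text{Pretest Score} \times \text{PO}) + u_{0j} + r_{ij}$ | 11 | 2288 | 0.436 
2 | $\hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{PO}) + \gamma_{30}(\text{IL}) + \gamma_{40}(\text{IR}) + \gamma_{50}(\text{IL} \times \text{PO}) + \gamma_{60}(\text{IR} \times \text{PO}) + u_{0j} + r_{ij}$ | 8 | 2289 | 0.274 
3 | $\hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{PO}) + \gamma_{30}(\text{IL}) + \gamma_{40}(\text{IR}) + u_{0j} + r_{ij}$ | 6 | 2291 | 0.109 
4 | $\hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{PO}) + \gamma_{30}(\text{IL}) + \gamma_{40}(\text{IR}) + \gamma_{50}(\text{IL} \times \text{PO}) + \gamma_{60}(\text{IR} \times \text{PO}) + \gamma_{70}(\text{Pretest Score}) + \gamma_{80}(\text{Days Since Practice}) + \gamma_{90}(\text{Days Since Practice} \times \text{Pretest Score}) + u_{0j} + r_{ij}$ | 11 | 2291 | 0.089 
5 | $\hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{PO}) + \gamma_{30}(\text{IL}) + \gamma_{40}(\text{IR}) + \gamma_{50}(\text{Pretest Score}) + u_{0j} + r_{ij}$ | 7 | 2293 | 0.045 
6 | $\hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{PO}) + u_{0j} + r_{ij}$ | 4 | 2295 | 0.013 
7 | $\hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{PO}) + u_{0j} + r_{ij}$ | 4 | 2295 | 0.013 
8 | $\hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{PO}) + \gamma_{30}(\text{Days Since Practice}) + u_{0j} + r_{ij}$ | 5 | 2297 | 0.006 
9 | $\hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{PO}) + \gamma_{30}(\text{Pretest Score}) + u_{0j} + r_{ij}$ | 5 | 2297 | 0.005 
Note: Abbreviations above are as follows: Practice Opportunities (PO), Interleaved (IL), Incremental 
Rehearsal (IR), and Repetitive (Rep). 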
Table 17 
Parameter Estimates for the Nine Selected Models Ordered by AICc Weight With Odds Ratios Calculated from Beta Averages 
Parameter | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 | Beta Average | Odds Ratio 
Intercept (repetitive) | 0.26 | 0.23 | 0.05 | -0.32 | -0.18 | 0.25 | 0.25 | 0.27 | 0.07 | 0.15 | 1.16 
Day | -0.01 | -0.01 | -0.01 | -0.01 | -0.01 | -0.01 | -0.01 | -0.01 | -0.01 | -0.01 | .99 
IL | 0.15 | 0.16 | 0.431 | 0.15 | 0.43 | NA | NA | NA | NA | 0.19 | 1.21 
IR | -0.20 | -0.17 | 0.17 | -0.16 | 0.16 | NA | NA | NA | NA | -0.12 | .89 
PO | 0.01 | 0.01 | 0.02 | 0.01 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 1.02 
Pretest Score | -0.06 | NA | NA | 0.90 | 0.34 | NA | NA | NA | 0.27 | 0.07 | 1.07 
Days Since Practice | NA | NA | NA | 0.02 | NA | NA | NA | 0.00 | NA | 0.00 | 1.00 
IL x PO | 0.00 | 0.02 | NA | 0.02 | NA | NA | NA | NA | NA | 0.01 | 1.01 
IR x PO | -0.05 | 0.02 | NA | 0.02 | NA | NA | NA | NA | NA | -0.02 | .98 
Days Since Practice x Pretest Score | NA | NA | NA | -0.05 | NA | NA | NA | NA | NA | 0.00 | 1.00 
IL x Pretest Score x PO | 0.02 | NA | NA | NA | NA | NA | NA | NA | NA | 0.01 | 1.01 
IR x Pretest Score x PO | 0.10 | NA | NA | NA | NA | NA | NA | NA | NA | 0.04 | 1.07 
Note: All values are rounded to two digits.  Abbreviations above are as follows: Practice Opportunities (PO), Interleaved (IL), Incremental Rehearsal (IR), and 
Repetitive (Rep). 
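The beta averaging calculation can be illustrated with the interleaved (IL) row of Table 17 and the AICc weights from Table 16.  This is a minimal sketch: the function name is hypothetical, and treating a model that omits a parameter (NA) as contributing nothing to the sum is an assumption, although it yields the tabled values for this row.

```python
import math

# AICc weights for the nine selected models (Table 16).
WEIGHTS = [0.436, 0.274, 0.109, 0.089, 0.045, 0.013, 0.013, 0.006, 0.005]


def beta_average(estimates, weights=WEIGHTS):
    """Multiply each model's estimate by its AICc weight and sum across models.

    None marks a model that does not include the parameter (NA in Table 17);
    such models add nothing to the sum (an assumption of this sketch).
    """
    return sum(w * b for w, b in zip(weights, estimates) if b is not None)


# Interleaved (IL) estimates for Models 1 through 9 (Table 17).
il_estimates = [0.15, 0.16, 0.431, 0.15, 0.43, None, None, None, None]
il_average = beta_average(il_estimates)

# Exponentiating the averaged log-odds estimate gives the odds ratio.
il_odds_ratio = math.exp(il_average)

print(round(il_average, 2), round(il_odds_ratio, 2))  # prints 0.19 1.21
```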
 One way to consider the relative importance of different predictors in candidate 
models is to look at the relative probability of the models that include that predictor 
compared to models that do not (Anderson, 2008; Burnham & Anderson, 2002).  Practice 
schedule variables do not appear in four of the nine top models; however, the combined 
model weight of the five models that include practice schedule variables is .953.  Thus, there 
is a greater than 95% probability that one of those five models is the best model given the 
data.  By comparison, the combined model weight of the models that do not include 
practice schedule variables is .037, indicating that models that include practice schedule 
variables are about 26 times more likely than models that do not.  
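The combined weights and their ratio follow directly from the AICc weights in Table 16, as the short sketch below shows.  The variable and function names are hypothetical.

```python
# AICc weights for Models 1 through 9 (Table 16).
weights = [0.436, 0.274, 0.109, 0.089, 0.045, 0.013, 0.013, 0.006, 0.005]

# Models 1 through 5 include practice schedule terms; Models 6 through 9 do not.
includes_schedule = [True, True, True, True, True, False, False, False, False]


def summed_weight(weights, flags, wanted):
    """Sum the weights of the models whose flag equals wanted."""
    return sum(w for w, flag in zip(weights, flags) if flag == wanted)


with_schedule = summed_weight(weights, includes_schedule, True)
without_schedule = summed_weight(weights, includes_schedule, False)

# Ratio of the combined weights of the two sets of models.
evidence_ratio = with_schedule / without_schedule

print(round(with_schedule, 3), round(without_schedule, 3), round(evidence_ratio))
```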
Another way to consider the relative effects of the variables is to examine the 
odds ratios.  There are some variables that are part of the top models that do not appear to 
appreciably influence outcomes.  These variables, including the Days Since Practice term, 
for example, had odds ratios of 1.00, indicating that the number of days since practice did 
not increase or decrease the likelihood of a correct response after accounting for other 
variables in the model.  Also, parameter estimates were largely stable across models.  The 
variable with the largest-magnitude odds ratio is the interleaved practice schedule, which is 
associated with an increased probability of a correct response.  Specifically, 
there is a 13.6% greater chance of getting a response correct when using an interleaved 
schedule than when using a repetitive schedule, and a 23% greater chance of a correct 
answer with an interleaved schedule compared to incremental rehearsal.  Conversely, 
there is an 8.5% lower chance of getting an item correct when using an incremental 
rehearsal schedule than when using a repetitive schedule.   These relative probabilities 
change over time, as demonstrated by the interaction variables.  Students using an 
incremental rehearsal schedule with high pretest scores have increased odds of a correct 
response over time when compared to students using a repetitive schedule.   
Visual analysis of the predicted probability of a correct response on retention trials 
from Model 1 makes the differences between practice schedules clearer.  See Figure 4 for 
predicted probabilities faceted across four quartiles based on the first pretest score.  The 
model includes pretest accuracy as a continuous variable with the results presented in 
quartiles in the figure for ease of interpretation.  
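To show how a predicted probability is obtained from a logistic model, the sketch below applies the inverse logit to the Model 1 fixed-effect estimates reported in Table 17.  It is illustrative only: the estimates are rounded to two digits, the student-level random intercept is set to zero, and the function and dictionary names are hypothetical, so the output should not be expected to reproduce Figure 4.

```python
import math

# Rounded fixed-effect estimates for Model 1 (Table 17).
# Repetitive practice is the referent schedule.
MODEL_1 = {
    "intercept": 0.26,
    "day": -0.01,
    "po": 0.01,
    "il": 0.15,
    "ir": -0.20,
    "il_x_po": 0.00,
    "ir_x_po": -0.05,
    "pretest": -0.06,
    "il_x_pretest_x_po": 0.02,
    "ir_x_pretest_x_po": 0.10,
}


def predicted_probability(schedule, day, po, pretest, coef=MODEL_1):
    """Predicted probability of a correct response for an average student.

    schedule is "IL", "IR", or "Rep"; pretest is a proportion from 0 to 1;
    po is the number of practice opportunities.
    """
    il = 1.0 if schedule == "IL" else 0.0
    ir = 1.0 if schedule == "IR" else 0.0
    log_odds = (
        coef["intercept"]
        + coef["day"] * day
        + coef["po"] * po
        + coef["il"] * il
        + coef["ir"] * ir
        + coef["il_x_po"] * il * po
        + coef["ir_x_po"] * ir * po
        + coef["pretest"] * pretest
        + coef["il_x_pretest_x_po"] * il * pretest * po
        + coef["ir_x_pretest_x_po"] * ir * pretest * po
    )
    # Inverse logit converts log-odds to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


baseline_rep = predicted_probability("Rep", day=0, po=0, pretest=0.0)
baseline_il = predicted_probability("IL", day=0, po=0, pretest=0.0)
```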
The largest difference is evident in the lowest quartile in which predicted 
probability of retention accuracy approaches 80% for only the interleaved practice 
condition.  In that group of lower performing students, there is also a pattern of repetitive 
practice leading to a higher predicted probability of a correct response than incremental 
rehearsal across all practice opportunities.  In the second and third quartiles interleaved 
and incremental rehearsal conditions are more similar.  
The predicted probability of a correct response in repetitive practice schedules is 
stable across all ability levels.  Incremental rehearsal schedules are associated with the 
sharpest change in predicted probability of a correct response across ability levels.  Low 
ability students have a very flat trajectory with incremental rehearsal practice, while 
higher ability students reach a high predicted accuracy in the same or fewer exposures 
than interleaved practice.  The pattern of predicted accuracy for students in interleaved 
schedules appears quite stable across ability levels. 
 
Another pattern that emerged was that the retention benefit for interleaved 
practice appears relatively early in the practice and grows steadily as practice 
opportunities accumulate.  Conversely, incremental rehearsal practice appears to have a 
lower retention benefit early in the intervention before accelerating more than the other 
schedules later in the intervention for the highest ability learners. 
The predicted probabilities of a correct response in interleaved schedules when 
compared to repetitive schedules across all ability levels is consistent with the original 
hypothesis.  The relative similarities in the predicted probability of correct responses of 
students in interleaved and incremental rehearsal schedules is consistent with what was 
expected among students in the higher quartiles.  The disparity between interleaved and 
incremental rehearsal schedules in the lower two quartiles was not expected.   
 
 
Figure 4. Graph of probability of a correct response on retention trials by practice 
schedule (Interleaved (IL), Incremental Rehearsal (IR), and Repetitive (Rep)) based on 
model 1 and split into four quartiles from the first pretest. 
 
Receiver Operating Characteristic (ROC) curves can be used to quantify and 
visualize the specificity and sensitivity of a classification model (Hilbe, 2009).  Binary 
logistic models are classification models, as the model predicts the likelihood of a 
response falling into one of two categories (correct or incorrect).  The ROC analysis 
looks at the model predictions, compares them to the observed responses, and plots a curve along the X 
and Y axes (specificity and sensitivity, respectively).  The curve is compared to a model 
with no predictive value, and the area under the curve is evaluated for predictive fit.  An area 
under the curve of 1.0 is a perfect fit.  A value of .5 predicts no better than chance, and a 
value of less than .5 predicts worse than chance.  The pROC (Robin et al., 2011) package 
in R was used to run ROC curve analyses.  The area under the ROC curve for the top 
retention model was .65.  The ROC curve is shown in Figure 5. 
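The area under the ROC curve has a simple interpretation that can be sketched in a few lines: it is the probability that a randomly chosen correct response received a higher predicted probability than a randomly chosen incorrect response.  The sketch below is not the pROC routine used in the analysis; the function name and the toy data are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    labels are 1 (correct) or 0 (incorrect); scores are predicted
    probabilities. Tied scores count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one correct and one incorrect response")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four correct/incorrect pairs are ordered properly.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2])
```

This pairwise form is quadratic in the number of observations, which is acceptable for illustration but slower than rank-based implementations on large data sets.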
 
Figure 5. Model 1 retention ROC Curve. 
 
Graphing the deviance residuals for the top retention model (Figure 6) shows 
similar variation across the linear predictor.  It also shows approximately equal scatter 
above and below the 0 line which indicates that a linear model is appropriate. This 
distribution supports the top model as an adequate one (Faraway, 2016).  Graphing the 
data on a half-normal plot (Figure 7) shows that there is no need to be concerned about 
extreme or unusual cases, as there are no points that are split far out from the others 
(Faraway, 2016). 
 
 
Figure 6. Deviance residuals for retention model. 
 
Figure 7. Half-normal plot for retention model. 
 
 
Research Question 3 
 
Does efficiency of learning differ by practice schedule in terms of targets learned per unit 
of time? 
 Sixty-one individual students took an immediate posttest in at least one practice 
schedule, and 52 took a delayed posttest in at least one practice schedule.  A breakdown of 
the number of students in each practice condition at the immediate and delayed posttests 
is shown in Table 18.  Three sets of models were run to look at learning efficiency.  First, 
models were run to compare the three practice schedules at immediate and delayed 
posttests with respect to the number of targets answered correctly.  Second, models were 
run to compare the three practice schedules based on the amount of time students spent in 
practice.  Finally, models were run comparing the practice schedules based on a learning 
efficiency rate.  All models included practice schedule, mean-centered pretest score, and 
their interaction as predictors.  Pretest scores were mean centered to aid in interpretation.  A per-condition 
learning rate was calculated for each student by computing the total time the student spent 
across all trials for a condition, determining the number of 20-minute intervals 
represented by the total time, and dividing the total number of learning targets answered 
correctly at the posttest (immediate or delayed) by the number of 20-minute intervals.  
This process yields an outcome, items learned per 20 minutes of practice, that can be 
modeled and predicted by practice schedule.   
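
The learning rate calculation described above can be sketched in Python as follows.  The function name, the assumption that trial times are recorded in seconds, and the assumption that partial 20-minute intervals are counted fractionally are all illustrative; the text does not specify these details.

```python
def targets_per_20_minutes(trial_seconds, targets_correct):
    """Targets answered correctly at posttest per 20 minutes of practice.

    trial_seconds: durations (in seconds) of every trial a student completed
                   in one practice condition (hypothetical input format).
    targets_correct: number of learning targets correct at the posttest.
    """
    total_minutes = sum(trial_seconds) / 60.0
    if total_minutes <= 0:
        raise ValueError("total practice time must be positive")
    intervals = total_minutes / 20.0   # fractional intervals assumed
    return targets_correct / intervals


# Hypothetical student: six sessions of 300 seconds each, three targets correct.
example_rate = targets_per_20_minutes([300] * 6, 3)
```

For this hypothetical student, 30 minutes of practice is 1.5 intervals, so three correct targets correspond to a rate of 2.0 targets per 20 minutes.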
   
 
 
 
Table 18 
Number of Students in Each Practice Schedule at Immediate and Delayed Posttests 
Practice Schedule        Immediate Posttest   Delayed Posttest 
Repetitive               37                   27 
Interleaved              47                   21 
Incremental Rehearsal    33                   24 
Total                    117                  72 
 
Targets Correct at Immediate and Delayed Posttests 
 
The linear model predicting the number of items correct at the immediate posttest 
by practice schedule indicates that the effect of practice schedule is significant at α = .05 
($F_{5,111} = 2.89$, $p = .017$, adjusted $R^2 = .08$); however, the model does not account for 
much variance.  See Table 19 for a summary of model estimates.  The linear model predicting the number of 
items correct at the delayed posttest by practice schedule indicates that the effect of 
practice schedule was not significant at α = .05 ($F_{5,66} = 1.02$, $p = .41$, adjusted $R^2 = .07$).  See 
Table 20 for a summary of model estimates for the delayed posttest.  At the immediate 
posttest, there is no significant difference in targets learned between the three practice 
schedules, and only a small interaction between interleaved practice and pretest score.  A 
main effect of interleaved practice approaches significance at the delayed posttest and, at an 
average increase of .6 targets correct, has a meaningful magnitude. 
 
 
 
Table 19 
Summary Table for Model Predicting Correct Targets by Practice Schedule at Immediate Posttest 
Parameter                          β       SE      t       p 
Intercept (repetitive)             1.74    0.13    13.43   <.001 
Interleaved Schedule               0.02    0.20    0.12    .91 
Incremental Rehearsal Schedule     0.49    0.20    2.40    .19 
Pretest (Mean Centered)            0       0.01    -0.13   .90 
Pretest x Interleaved              0.02    0.01    -2.00   .05 
Pretest x Incremental Rehearsal    -0.01   0.01    -0.90   .37 
Adj. $R^2 = .08$, $F_{5,111} = 2.89$, $p = .017$ 
 
 
Table 20 
Summary Table for Model Predicting Correct Targets by Practice Schedule at Delayed Posttest 
Parameter                          β       SE      t       p 
Intercept (repetitive)             1.45    0.21    6.86    <.001 
Interleaved Schedule               0.60    0.32    1.88    .06 
Incremental Rehearsal Schedule     0.16    0.31    0.51    .61 
Pretest (Mean Centered)            0       0.01    0.26    .79 
Pretest x Interleaved              0       0.02    -0.20   .42 
Pretest x Incremental Rehearsal    .01     0.02    0.81    .84 
Adj. $R^2 = .07$, $F_{5,66} = 1.02$, $p = .41$ 
 
 
 
 
Figures 8 and 9 show boxplots of the number of targets learned by students in 
each condition, faceted by pretest quartile.  In the figures, the solid line represents the 
median while the dashed line is the mean.  The figures show considerable overlap 
between practice schedules among the highest ability learners.  Also, interleaved practice 
appears to be associated with more targets learned among the lower ability learners at the 
delayed posttest.  Specifically, more than half of students in the interleaved condition in 
the lower two quartiles remembered all targets at the delayed posttest.  This was also true 
of the middle two quartiles at the immediate posttest.   
Another important pattern is the difference between the immediate and delayed 
posttest for incremental rehearsal practice.  At the immediate posttest, over half the 
participants in the lower two quartiles earned perfect scores on the learning targets.  
That was not true of any quartile at the delayed posttest. 
 
 
 
 
Figure 8. Targets correct at immediate posttest  
 
 
Figure 9. Targets correct at delayed posttest 
 
 
 
Time in Practice by Practice Condition 
Models of time in practice at immediate and delayed posttest are both 
significant at α = .05 (adjusted $R^2 = .80$, $F_{5,111} = 93.78$, $p < .001$ and adjusted $R^2 = .79$, 
$F_{5,66} = 55.37$, $p < .001$, respectively).  Given the nature of the practice conditions, it is 
no surprise that both the immediate posttest and delayed posttest models show a 
statistically significant effect of practice schedule on time in practice (see Tables 21 and 
22).  Note that average time in practice is different at the immediate and delayed 
posttests.  The time students spent taking the posttests was included in the total time 
students spent in practice.  The rationale for this inclusion is twofold.  First, the format for individual 
exposures is identical across pretests, posttests, and practice:  students received 
feedback as to the correctness of their response, and corrective feedback in the case of 
incorrect responses.  Second, the posttests were bundled with the practice and are a part 
of the practice experience.  Incremental rehearsal took much longer in both models.  
There was not a statistically significant difference between repetitive and interleaved 
practice at either the immediate or the delayed posttest.  There was a significant difference 
between repetitive and incremental rehearsal practice at both the immediate and delayed 
posttests.  Incremental rehearsal practice took an average of approximately 80 minutes longer than 
repetitive practice at the immediate posttest, and 81 minutes longer than repetitive 
practice at the delayed posttest.  In addition to illustrating the much higher 
average time spent, Figures 10 and 11 show how much more spread there is in the 
amount of time spent in incremental rehearsal practice.  In the figures, the solid line 
represents the median while the dashed line is the mean.  Pretest score does not appear to 
appreciably influence the amount of time students are spending in practice.  At both 
immediate and delayed posttests, a 10% improvement in pretest accuracy above the mean 
translates to less than a 30 second difference in practice time. 

Table 21 
Summary Table for Model Predicting Minutes in Practice by Practice Schedule and Pretest at Immediate Posttest 
Parameter                          β       SE      t       p 
Intercept (repetitive)             23.06   2.75    8.37    <.001 
Interleaved Schedule               0.62    4.15    0.15    .88 
Incremental Rehearsal Schedule     80.63   4.30    18.75   <.001 
Pretest (Mean Centered)            -0.26   0.16    -1.78   .78 
Pretest x Interleaved              0       0.23    0.02    .99 
Pretest x Incremental Rehearsal    -1.56   .29     -5.47   <.001 
Adj. $R^2 = .80$, $F_{5,111} = 93.78$, $p < .001$ 
 
 
 
 
 
 
 
 
Table 22 
Summary Table for Model Predicting Minutes in Practice by Practice Schedule and Pretest at Delayed Posttest 
Parameter                          β       SE      t       p 
Intercept (repetitive)             27.61   3.84    7.19    <.001 
Interleaved Schedule               -0.08   5.81    -0.01   .99 
Incremental Rehearsal Schedule     81.14   5.67    14.31   <.001 
Pretest (Mean Centered)            -0.37   0.21    -1.79   .07 
Pretest x Interleaved              0.17    0.36    0.47    .63 
Pretest x Incremental Rehearsal    -1.57   0.34    -4.52   <.001 
Adj. $R^2 = .79$, $F_{5,66} = 55.37$, $p < .001$ 
 
 
Figure 10. Time in practice at immediate posttest 
 
 
 
 
Figure 11. Time in practice at delayed posttest 
 
Correct Targets per 20 Minutes of Practice 
 
There was a significant effect of practice schedule in both the immediate posttest 
and delayed posttest models.  In both cases, correct responses per 20 minutes of practice 
were similar for repetitive and interleaved practice, and were much lower for incremental 
rehearsal practice.  See Tables 23 and 24 for summaries of model estimates.  A large 
amount of variance was accounted for by the models at both immediate and delayed 
posttests (adjusted $R^2 = .51$ and adjusted $R^2 = .37$, respectively).   
 
 
 
Table 23 
Summary Table for Model Predicting Targets Learned per 20 Minutes of Practice by Practice Schedule and Pretest at Immediate Posttest 
Parameter                          β       SE      t       p 
Intercept (repetitive)             1.86    0.13    14.06   <.001 
Interleaved Schedule               0.07    0.20    0.37    .71 
Incremental Rehearsal Schedule     -1.35   0.21    -6.51   <.001 
Pretest (Mean Centered)            0.03    0.01    4.00    <.001 
Pretest x Interleaved              0.03    0.01    2.50    .01 
Pretest x Incremental Rehearsal    -.018   0.01    -1.27   .21 
Adj. $R^2 = .51$, $F_{5,111} = 25.03$, $p < .001$ 
 
 
Table 24 
Summary Table for Model Predicting Targets Learned per 20 Minutes of Practice by Practice Schedule and Pretest at Delayed Posttest 
Parameter                          β       SE      t       p 
Intercept (repetitive)             1.48    0.17    8.74    <.001 
Interleaved Schedule               0.31    0.26    1.23    .22 
Incremental Rehearsal Schedule     -1.13   0.25    -4.56   <.001 
Pretest (Mean Centered)            0.03    0.01    3.49    <.001 
Pretest x Interleaved              -0.02   0.02    -1.44   .15 
Pretest x Incremental Rehearsal    -0.02   0.02    -1.42   .16 
Adj. $R^2 = .37$, $F_{5,66} = 9.31$, $p < .001$ 
 
 
 
The large difference in the time it takes to complete each practice schedule has an 
important influence on the efficiency of the schedules.  It is clear from both the model 
outputs and Figures 12 and 13 that interleaved and repetitive practice schedules are 
much more efficient than incremental rehearsal schedules.  In the figures, the solid line 
represents the median while the dashed line is the mean.  For students in the lowest 
quartile of pretest scores, the three schedules are the most even at the immediate posttest.  
However, as student ability at pretest increases, the gap between incremental rehearsal 
and the other schedules widens.  Also, the variability in posttest scores decreases, and 
the predicted effects of interleaved practice begin to separate from, and look more efficient than, 
those of repetitive practice. 
At the delayed posttest, interleaved practice appears to have a retention benefit 
that is larger among students who scored poorly on the pretest, and as student ability 
increases, repetitive practice begins to look more like interleaved practice when learning 
math facts.  In general, the number of targets learned per 20 minutes of practice is quite 
similar between interleaved and repetitive practice.  The similarity between repetitive and 
interleaved practice with respect to efficiency is interesting in that it appears to run 
counter to much of the established literature on the effects of interleaving practice 
(Magill & Hall, 1990; Rohrer et al., 2014; Rohrer & Taylor, 2007; Shea et al., 1990).  As 
with the total items correct outcome, it would be interesting to see whether there would be 
more differentiation if there were more potential targets to learn. 
 
 
 
Figure 12. Number of Targets Learned per 20 Minutes of Practice 
 
 
Figure 13. Number of targets learned per 20 minutes of practice. 
 
 
 
Model Checking for Research Question 3 
Residuals for these models were approximately symmetrical and normally 
distributed.  Table 25 shows descriptive statistics for all six model residuals.  Overall 
assumptions for normal distribution of model residuals appear acceptable.  See Figure 14 
for density plots of residuals for all six models graphed against a normal distribution and 
Figure 15 for a scatter plot of the residuals with a loess smoother.  These figures indicate 
that the residuals are approximately normally distributed and that a linear model is 
appropriate.  Model residuals for the models predicting time in practice are more 
leptokurtic than is ideal; however, all models were generally symmetrical, and did not 
show any problematic skew. 
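
For readers who want to reproduce this kind of residual check, the Python sketch below computes skew and kurtosis from a list of residuals.  It assumes simple moment-based estimators and reports excess kurtosis (a normal distribution scores 0, and positive values indicate a leptokurtic distribution); the software used in this study may apply a slightly different small-sample correction, and the function name and toy data are hypothetical.

```python
def skew_and_excess_kurtosis(residuals):
    """Moment-based skew and excess kurtosis (normal distribution = 0, 0)."""
    n = len(residuals)
    if n < 2:
        raise ValueError("need at least two residuals")
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n   # second central moment
    m3 = sum((r - mean) ** 3 for r in residuals) / n   # third central moment
    m4 = sum((r - mean) ** 4 for r in residuals) / n   # fourth central moment
    if m2 == 0:
        raise ValueError("residuals have zero variance")
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skew, excess_kurtosis


# Hypothetical, perfectly symmetric residuals.
toy_skew, toy_kurtosis = skew_and_excess_kurtosis([-2, -1, 0, 1, 2])
```

The symmetric toy residuals give a skew of 0 and a negative excess kurtosis, that is, a flatter-than-normal distribution.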
Table 25 
Descriptive Statistics for Research Question 3 Model Residuals 
Model                                        N     Mean   SD      Median   Skew   Kurtosis 
Targets Correct 
  Immediate                                  117   0      .87     .23      -.19   -1.08 
  Delayed                                    72    0      1.06    -.03     -.16   -1.1 
Time in Practice 
  Immediate                                  117   0      18.46   .33      .56    3.88 
  Delayed                                    72    0      19.21   1.49     1.01   4.29 
Targets Correct per 20 Minutes of Practice 
  Immediate                                  117   0      .89     -.06     .69    2.31 
  Delayed                                    72    0      .85     -.02     .15    .23 
 
 
 
 
 
  
  
Figure 14. Density plot of residuals from models used to address research question 3. 
 
 
 
Figure 15. Scatter plot of residuals from models used to address research question 3 with a loess smoother
 
Social Validity Measure and Observations 
Responses to the survey showed little variation and a negative 
skew.  See Table 26 for descriptive statistics of both survey questions.  
Both questions were asked after the immediate posttest for a practice schedule.  
Questions were framed on a Likert-type scale ranging from 1 (Not at all helpful/fun) to 7 
(Extremely helpful/fun).  Based on the mean rating for each survey question, students 
tended to find the practice to be helpful and fun across all practice schedules.  See Figures 
16 and 17 for histograms of the proportion of responses in each of the 7 response 
categories for each question.  For both survey items, more than 70% of the responses were 
5 or higher.  Linear models were used to look for differences in student responses 
predicted by practice schedule.  The models did not account for significant variance in 
student responses (p = .28 and p = .94 for helpful and fun, respectively). 

Table 26 
Descriptive Statistics for Survey Questions 
Question   N     Mean   SD     Median   Skew    Kurtosis 
Helpful    112   5.79   1.82   7        -1.58   1.46 
Fun        112   5.51   2.07   7        -1.18   -.02 
 
 
 
 
Figure 16. Proportion of responses in each of seven response categories for the first survey question: “How 
helpful was this practice?” 
 
Figure 17. Proportion of responses in each of seven response categories for the second survey question: “How 
fun was this practice?” 
 
 
Anecdotally, students generally enjoyed the math practice activity.  They tended to be 
on-task and cooperative throughout data collection.  Some students liked the activity less 
than others did.  On occasion, some students would scream and hide under desks to avoid 
participating on a given day.  This kind of avoidance behavior was fairly rare and, despite 
the survey results, appeared to be more typical during incremental rehearsal practice.  
Teachers generally had positive things to say about the program and what they saw their 
students doing.  Specifically, teachers commented on the high level of engagement (in 
general) and the need for more math fact drill practice.  Throughout the two months of 
the experiment, teachers typically continued their lessons as scheduled, but also took 
several opportunities to watch their students engaging with the intervention.  They 
reported that they liked the interface, speed of presentation, and feedback mechanisms. 
 
Chapter 5 
 
DISCUSSION 
 
Goal and Design 
Learning math facts is an important component of mathematics preparation 
(National Research Council, 2001).  To ensure that students have opportunities to 
practice math facts in the most efficient and effective way, teachers need to know how to 
optimize this practice time.  Researchers have demonstrated that distributed 
practice is superior to massed practice (Cepeda et al., 2006; Pashler, Bain, et al., 2007; 
Pashler, Rohrer, et al., 2007).  Further, interleaving learning targets has been shown to 
convey benefits to retention beyond what would be expected from the distribution 
inherent in the schedule (Kang & Pashler, 2012; Lee & Magill, 1983; Magill & Hall, 
1990; Rohrer, 2012; Taylor & Rohrer, 2010).  
The goal of this project was two-fold:  First, to determine whether a promising 
practice schedule (interleaved practice), which has been shown to be effective for motor 
learning (Magill & Hall, 1990), generalizes to a skill such as math facts; and second, to 
compare that promising schedule to a dosage control schedule (repetitive practice) and to 
a practice schedule that has more support in the literature and use in schools (incremental 
rehearsal).  This project is the first experimental comparison of interleaved and 
incremental rehearsal practice, and one of only a handful to examine interleaved practice 
in an academic context. 
The design of this study was counterbalanced and within subjects.  Students were 
exposed to multiple practice schedules across practice bundles.  The order of schedule 
presentation was randomized, so a direct comparison of schedules was possible without 
being obscured by order effects.  Analyses included logistic regression, logistic mixed-effects 
regression, and linear regression.  In the following sections, I review each research 
question along with a brief statement about the results.  Following the review of the 
research questions are discussions of the limitations of the study, implications for 
practice, and the potential for future research. 
Research Question 1 
Question:  How is acquisition of target math facts influenced by practice schedule 
(repetitive, incremental rehearsal, and interleaved)?   
Hypothesis: Likelihood of a correct response will increase at a faster rate for the 
repetitive schedule, but will asymptote over the course of several sessions.  Likelihood of 
a correct response will increase the next most quickly for incremental rehearsal.  
Interleaved practice will yield the slowest change.  Repetitive practice has demonstrated 
a link with fast acquisition across the literature.   
Across all three schedules, accuracy increased towards an asymptote relatively 
quickly and was maintained throughout practice.  There was less deviation between 
practice schedules than was predicted in the hypothesis.  The acquisition pattern for 
repetitive practice was consistent with what is seen in motor learning and academic skills 
literature (Magill & Hall, 1990; Taylor & Rohrer, 2010).  Accuracy for both incremental 
rehearsal and interleaved practice closely mirrored the pattern seen in repetitive practice.  
This finding is not congruent with what is typically seen with interleaved practice in the 
motor learning literature, but is not very different from studies of novel symbol and letter 
writing (Ste-Marie et al., 2004).  It may be that memorizing and 
reproducing letter symbols and memorizing and responding to single digit math problems 
share characteristics that moderate the acquisition benefit typically found in 
repetitive schedules.  It may also be that the binary outcomes in this study 
are not sensitive enough to capture any differences.  More research is needed to 
determine if the similarity of acquisition across schedules is an artifact of random error or 
the nature of the task chosen for the experiment. 
Research Question 2 
Question:  Does retention of target math facts differ by practice schedule?   
Hypothesis: Likelihood of a correct response at retention trials will be highest for 
incremental rehearsal and interleaved practice and will be almost indistinguishable 
between the two. 
The difference in retention rates between the interleaved and incremental 
rehearsal conditions among the lower scoring students was an important finding of this 
study.  As these schedules have not been compared before, it was difficult to predict the 
relative effects of the two schedules.  The difference in the averaged odds ratios when 
compared to the referent group (1.21 for interleaved and .89 for incremental rehearsal) 
indicates that, efficiency aside, interleaved practice can be a more effective way to learn 
math facts than incremental rehearsal.  Particularly in the lowest quartile, incremental 
rehearsal showed lower predicted retention accuracy than repetitive practice.  Among 
students who scored average and above on the pretest, the results reflected the 
hypothesis.  Predictions for accuracy for both interleaved and incremental rehearsal 
conditions were quite similar, with the repetitive condition performing much worse.  This 
finding is consistent with the extant literature that indicates that distributed and 
interleaved practice tend to lead to improved retention outcomes compared to repetitive 
practice (Benjamin & Tullis, 2010; Magill & Hall, 1990; Varma & Schleisman, 2014). 
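
To make the averaged odds ratios more concrete, the Python sketch below converts an odds ratio into a probability of a correct retention response given a baseline probability for the referent group.  The baseline of .60 is purely hypothetical and is not an estimate from this study, and treating the averaged odds ratios as if they applied at a single baseline is a simplifying assumption.

```python
def apply_odds_ratio(baseline_probability, odds_ratio):
    """Probability implied by multiplying the baseline odds by an odds ratio."""
    if not 0.0 < baseline_probability < 1.0:
        raise ValueError("baseline probability must be strictly between 0 and 1")
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


HYPOTHETICAL_BASELINE = 0.60   # illustrative referent-group accuracy, not a study result

p_interleaved = apply_odds_ratio(HYPOTHETICAL_BASELINE, 1.21)   # odds ratio from the text
p_incremental = apply_odds_ratio(HYPOTHETICAL_BASELINE, 0.89)   # odds ratio from the text
```

Under that assumed baseline, the implied probabilities are roughly .64 for interleaved practice and .57 for incremental rehearsal.  An odds ratio above 1 moves the probability above the referent group and an odds ratio below 1 moves it below.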
Research Question 3 
Question:  Does efficiency of learning differ by practice schedule in terms of time 
investment per math fact? 
Hypothesis: Interleaved practice should be associated with a much better efficiency rate 
than incremental rehearsal. 
In general, incremental rehearsal and interleaved practice were associated with 
better retention outcomes than repetitive practice, which is consistent with the bulk of the 
literature available (Joseph, 2006; Magill & Hall, 1990; Rohrer et al., 2014; Rohrer & 
Taylor, 2007; J. B. Shea & Morgan, 1979; Taylor & Rohrer, 2010; Varma & Schleisman, 
2014).  The largest difference between the practice conditions emerges in the results from 
the analysis related to learning efficiency.  The mean number of targets answered correctly per 20 
minutes of practice was about 1.86 for the repetitive condition, 1.93 for the interleaved 
condition, and approximately .52 for the incremental rehearsal condition at the immediate 
posttest, and approximately 1.48, 1.78, and .34, respectively, at the delayed posttest.  Ability, as 
measured by pretest score, also seemed to have an impact on efficiency.  These 
differences are not artifacts of accuracy alone, as indicated by the models that predict the 
number of targets answered correctly regardless of time, but are a result of the 
incremental rehearsal schedule taking approximately 80 minutes longer than the 
repetitive or interleaved schedules.   
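
The condition means reported above can be approximately recovered from the dummy-coded estimates in Tables 23 and 24: with pretest mean centered, the intercept is the predicted value for the repetitive schedule at the mean pretest score, and each schedule coefficient is an offset from it.  The Python sketch below shows that arithmetic.  The dictionary and function names are illustrative, and the interaction terms are ignored because they vanish at the mean pretest score.

```python
# Coefficients copied from Tables 23 and 24 (targets per 20 minutes of practice).
TABLE_23_IMMEDIATE = {"intercept": 1.86, "interleaved": 0.07, "incremental_rehearsal": -1.35}
TABLE_24_DELAYED = {"intercept": 1.48, "interleaved": 0.31, "incremental_rehearsal": -1.13}


def predicted_means_at_mean_pretest(coefs):
    """Predicted outcome per schedule when mean-centered pretest equals 0."""
    base = coefs["intercept"]   # repetitive schedule is the reference group
    return {
        "repetitive": round(base, 2),
        "interleaved": round(base + coefs["interleaved"], 2),
        "incremental_rehearsal": round(base + coefs["incremental_rehearsal"], 2),
    }


immediate_means = predicted_means_at_mean_pretest(TABLE_23_IMMEDIATE)
delayed_means = predicted_means_at_mean_pretest(TABLE_24_DELAYED)
```

The sketch gives 1.86, 1.93, and .51 at the immediate posttest and 1.48, 1.79, and .35 at the delayed posttest.  The small differences from the values in the text (.52, 1.78, and .34) presumably reflect rounding of the tabled coefficients.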
These results generally follow the predictions made in the hypothesis, with the 
notable exception that retention for students in the repetitive condition appeared to be 
closer to retention in the other schedules than might be expected.  Three potential 
explanations for the lack of differentiation between schedules come to mind.  First, it may 
be that the number of practice trials at the posttest was sufficient to retain the target skills 
regardless of schedule.  The results from research question two that model data collected 
throughout the experiment certainly seem to indicate that there is a tangible difference 
between the three schedules in the accuracy of students during retention trials.  Second, 
practice was delivered over six sessions that were at least a day apart.  It may be that 
there was sufficient distribution of practice to impart some extra retention benefit to 
students engaged in a repetitive schedule.  Third, it may be that the ceiling of three 
potential correct responses per practice bundle at each pretest did not allow for the 
demonstration of a substantial difference. 
There is no precedent in the literature for comparing the efficiency of interleaved and 
incremental rehearsal practice.  The result that interleaved practice is more efficient than 
incremental rehearsal was expected given that the logistics of incremental rehearsal 
necessitate a longer schedule.  While researchers have examined differing efficiency 
between different ratios of incremental rehearsal schedules (Swehla et al., 2016), this is 
the first study that compares efficiency for incremental rehearsal to interleaved practice.  
The discovery of the efficiency benefit for interleaved practice over incremental rehearsal 
is an important addition to the literature as well as being directly applicable to teachers 
in the classroom. 
Limitations 
Findings of this study should be interpreted in light of the following limitations.  
First, the sample recruited for this project was neither diverse nor representative of students 
from the broader population of interest (mid-elementary).  The students who participated 
were a very specific subset of the general population: African-American students in an 
urban charter school setting, many of whom were on free and reduced lunch plans.  The 
field would benefit from similar experiments conducted with more diverse participant 
samples to determine if the findings replicate. 
 Second, dosage was not controlled as tightly as originally planned.  Due to the 
limitations of the technology used, students with interrupted sessions were restarted 
within the session and had more trials than others.  Fortunately, the practice opportunity 
variable accounted for dosage in the model and allowed for comparison controlling for 
dosage. 
Third, it was not possible to run the desired model type for the acquisition data.  
Technically, clustered data with a binary response should be modeled with logistic 
mixed-effects regression, but the logistic mixed-effects models would not converge.  Because 
technical best practices are important (Odenkirk & Ervin, 2000), descriptive statistics 
were used to provide some insight about the influence of practice schedules rather than 
reporting suspect models or violating assumptions of independence.  A future analysis 
might define acquisition differently to take advantage of the more appropriate modeling 
technique.  For example, a researcher might model reaction times for acquisition 
responses and use accuracy as a predictor variable. 
 Finally, this study used single digit addition and multiplication facts as the 
learning targets.  From the results obtained, how different practice schedules might 
perform within this specific context is now clearer, but it is unclear whether these 
findings would generalize to other academic skills.  This study is not intended to provide 
a comprehensive statement on the appropriateness of an interleaved schedule when 
compared to incremental rehearsal or repetitive practice in all practice situations.  It 
simply provides another piece of evidence that suggests the potential for interleaved 
practice to help optimize academic temporal resources, particularly in drill and 
memorization scenarios.  Further research is warranted to continue to explore the 
conditions in which these results hold true, and what mechanisms might drive practice 
decisions in different scenarios. 
Implications for Practice 
Within the context of providing practice opportunities for students to learn single 
digit math facts, the results of this study can be used to derive some clear expectations for 
practitioners.  First, both incremental rehearsal and interleaved practice would be 
expected to lead to high predicted rates of accuracy over similar timelines.  Second, 
predicted accuracy for those schedules is higher than for repetitive practice.  Thus, not all 
practice is equal.  Third, for the same outcome, interleaved practice can be implemented 
for less than one third of the temporal resources needed for incremental rehearsal.  Thus, 
all things considered, practitioners might strongly consider using interleaved practice to 
support single-digit math fact learning. 
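
The time comparison behind the third expectation can be checked against the estimates in Table 21.  The Python sketch below adds each schedule coefficient to the intercept to obtain predicted minutes in practice at the mean pretest score and then takes the ratio of interleaved to incremental rehearsal time.  The names are illustrative, and the calculation deliberately ignores the pretest interaction terms.

```python
# Coefficients copied from Table 21 (minutes in practice, immediate posttest).
TABLE_21_MINUTES = {"intercept": 23.06, "interleaved": 0.62, "incremental_rehearsal": 80.63}


def predicted_minutes(coefs, schedule):
    """Predicted minutes in practice at the mean pretest score for one schedule."""
    if schedule == "repetitive":   # reference group: intercept only
        return coefs["intercept"]
    return coefs["intercept"] + coefs[schedule]


interleaved_minutes = predicted_minutes(TABLE_21_MINUTES, "interleaved")
incremental_minutes = predicted_minutes(TABLE_21_MINUTES, "incremental_rehearsal")
time_ratio = interleaved_minutes / incremental_minutes
```

With these estimates, interleaved practice is predicted to take about 24 minutes against about 104 minutes for incremental rehearsal, a ratio of roughly .23, which is consistent with the "less than one third" statement.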
As instructional technology is developed to drill math facts, creators can start 
tailoring tools to maximize retention for a given amount of instructional time.  
Developers of instructional software, educational games, and intervention tools can 
leverage the results of this study by organizing practice in a way that interleaves learning 
targets. Further, although this project was implemented with a technology component, the 
same principles can be applied with low-tech instructional tools.  A teacher could 
administer a pretest and, based on the results, give each student a set of three unknown 
math fact flash cards.  Approximately two minutes of individual practice per day would 
closely replicate the conditions of this study, with a high probability of leading to long-
term retention of the target facts. 
Implications for Future Research 
The study described in this paper can serve as a starting point both for further 
analyses of the data collected and for future studies.  As with most research, this 
study has highlighted some promising results that require further investigation to fully 
illuminate the potential costs, benefits, and implications of the constructs examined. 
Data collected throughout this experiment include reaction times and responses to 
non-target items, both within incremental rehearsal practice and at pretests and posttests.  
These data could be used to examine the relation between accuracy and reaction time 
with a potential of creating a metric for mastery that reflects both.  It may also be worth 
looking at skill transfer by using student responses to non-target items in the pretest and 
posttests.  Does practice of math facts lead to increased accuracy of non-practiced facts 
that use some of the same numbers?  Do skills transfer beyond similar facts?  Does 
number of facts retained at the first posttest predict scores on the next pretest, or future 
learning rates?  How does practice condition influence these relationships? 
The results described above also create a jumping-off point for future studies.  
Direct replications with different sample populations or target skills could continue the 
generalization of interleaved practice research results and perhaps outline boundaries of 
effectiveness.  Extending research to include different dimensions of interleaving is also 
a natural avenue for further examination.  This study interleaved individual targets within 
a very specific skill.  Could there be a benefit to interleaving between operations or 
different mathematical processes entirely?   
A novel line of research could look at the blending of practice schedules.  Is it 
possible that practice that starts repetitive (or incrementally rehearsed) and shifts to 
interleaved works over a broader range of students?  Could student characteristics dictate 
which schedule or blend of schedules should be used or what dimension is interleaved? 
Practice schedule manipulation and practice optimization are fertile ground for 
future follow-up.  There is great potential along these lines of inquiry to improve 
classroom outcomes, both in absolute learning and in learning 
efficiency.   
 
 
Conclusion 
 The purpose of this study was to compare an interleaved practice schedule to an 
established practice technique and a dosage control along the dimensions of acquisition 
accuracy, retention accuracy, and learning efficiency.  Results support the use of 
interleaved practice in single digit math fact drill practice as a method that leads to a high 
probability of accuracy in retention trials while conserving temporal resources.  This 
study has taken steps to generalize previous findings related to interleaved practice to this 
specific domain.  It has also introduced a novel comparison into the scientific corpus.  
Future research should continue efforts of generalization and also dive deeper into the 
mechanisms of interleaving that may allow for further differentiation and utility. 
 
 
REFERENCES 
 
Anderson, D. (2008). Model based inference in the life sciences: A primer on evidence. 
New York; London: Springer. 
Anderson, D. I., Magill, R. A., & Sekiya, H. (2001). Motor Learning as a Function of KR 
Schedule and Characteristics of Task-Intrinsic Feedback. Journal of Motor 
Behavior, 33(1), 59–66. https://doi.org/10.1080/00222890109601903 
Benjamin, A. S., & Tullis, J. (2010). What makes distributed practice effective? 
Cognitive Psychology, 61(3), 228–247. 
https://doi.org/10.1016/j.cogpsych.2010.05.004 
Birnbaum, M. S., Kornell, N., Bjork, E. L., & Bjork, R. a. (2013). Why Interleaving 
Enhances Inductive Learning: The Roles of Discrimination and Retrieval. Memory 
& Cognition, 41(3), 392–402. https://doi.org/10.3758/s13421-012-0272-7 
Blandin, Y., Proteau, L., & Alain, C. (1994). On the cognitive processes underlying 
contextual interference and observational learning. Journal of Motor Behavior, 
26(1), 18–26. https://doi.org/10.1080/00222895.1994.9941657 
Blasiman, R. N. (2017). Distributed Concept Reviews Improve Exam Performance. 
Teaching of Psychology, 44(1), 46–50. https://doi.org/10.1177/0098628316677646 
Booth, J. L., Cooper, L. A., Donovan, M. S., Huyghe, A., Koedinger, K. R., & Paré-
Blagoev, E. J. (2015). Design-Based Research Within the Constraints of Practice: 
AlgebraByExample. Journal of Education for Students Placed at Risk (JESPAR), 
  100 
 
20(1–2), 79–100. https://doi.org/10.1080/10824669.2014.986674 
Broadbent, D. P., Causer, J., Ford, P. R., & Mark Williams,  a. (2014). Contextual 
Interference Effect in Perceptual-Cognitive Skills Training. Medicine and science in 
sports and exercise. https://doi.org/10.1249/MSS.0000000000000530 
Burnham, K. P., & Anderson, D. R. (2002). Model Selection and Multimodel Inference: 
A Practical Information-Theoretic Approach Second Edition (Second). New York: 
Springer. 
Burnham, K. P., Anderson, D. R., & Huyvaert, K. P. (2011). AIC model selection and 
multimodel inference in behavioral ecology: some background, observations, and 
comparisons. Behavioral Ecology and Sociobiology, 65(1), 23–35. Retrieved from 
http://journals.sagepub.com/doi/10.1177/0888406417730112 
Burns, M. K. (2005). Using Incremental Rehearsal to Increase Fluency of Single-Digit 
Multiplication Facts With Children Identified as Learning Disabled in Mathematics 
Computation. Education and Treatment of Children, 28(3), 237–249. Retrieved 
from 
https://pantherfile.uwm.edu/dermer/public/courses/620/Articles/peter_platten_first.p
df 
Carpenter, S. K., Cepeda, N. J., Rohrer, D., Kang, S. H. K., & Pashler, H. (2012). Using 
Spacing to Enhance Diverse Forms of Learning: Review of Recent Research and 
Implications for Instruction. Educational Psychology Review, 24(3), 369–378. 
https://doi.org/10.1007/s10648-012-9205-z 
  101 
 
Carr, M., & Alexeev, N. (2011). Fluency, accuracy, and gender predict developmental 
trajectories of arithmetic strategies. Journal of Educational Psychology, 103(3), 
617–631. https://doi.org/10.1037/a0023864 
Carter, C. E., & Grahn, J. A. (2016). Optimizing music learning: Exploring how blocked 
and interleaved practice schedules affect advanced performance. Frontiers in 
Psychology, 7(AUG), 1–10. https://doi.org/10.3389/fpsyg.2016.01251 
Carvalho, P. F., & Goldstone, R. L. (2014a). Effects of Interleaved and Blocked Study on 
Delayed Test of Category Learning Generalization. Frontiers in Psychology, 
5(AUG). https://doi.org/10.3389/fpsyg.2014.00936 
Carvalho, P. F., & Goldstone, R. L. (2014b). Putting category learning in order: Category 
structure and temporal arrangement affect the benefit of interleaved over blocked 
study. Memory and Cognition, 42, 481–495. https://doi.org/10.3758/s13421-013-
0371-0 
Carvalho, P. F., & Goldstone, R. L. (2015). The benefits of interleaved and blocked 
study: different tasks benefit from different schedules of study. Psychonomics 
Bulletin Review, 22, 281–288. 
Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed 
Practice in Verbal Recall Tasks : A Review and Quantitative Synthesis. 
Psychological Bulletin, 132(3), 354–380. https://doi.org/10.1037/0033-
2909.132.3.354 
Codding, R. S., Archer, J., & Connell, J. (2010). A systematic replication and extension 
  102 
 
of using incremental rehearsal to improve multiplication skills: An investigation of 
generalization. Journal of Behavioral Education, 19(1), 93–105. 
https://doi.org/10.1007/s10864-010-9102-9 
Committee, N. R. C. & M. L. S. (2001). Adding It Up. National Academies Press. 
https://doi.org/10.17226/9822 
Cooper, E. H., & Pantle, A. J. (1967). THE TOTAL-TIME HYPOTHESIS IN VERBAL 
LEARNING. Psychological Bulletin, 68(4), 221–234. 
Desmottes, L., Maillart, C., & Meulemans, T. (2017). Mirror-drawing skill in children 
with specific language impairment: Improving generalization by incorporating 
variability into the practice session. Child Neuropsychology, 23(4), 463–482. 
https://doi.org/10.1080/09297049.2016.1170797 
Faraway, J. (2005). Linear models in R, 56(5). 
Faraway, J. (2016). Extending the Linear Model with R Generalized Linear, Mixed 
Effects and Nonparametric Regression Models (Second). Boca Raton: CRC Press. 
Faul, F., Erdfelder, E., Buchner, A., & Lang, A.-G. (2009). Statistical power analyses 
using G*Power 3.1: tests for correlation and regression analyses. Behavior Research 
Methods, 41(4), 1149–1160. https://doi.org/10.3758/BRM.41.4.1149 
Fishman, E. J., Keller, L., & Atkinson, R. C. (1968). Massed versus distributed practice 
in computerized spelling drills. Journal of Education & Psychology, 59(4), 290–296. 
https://doi.org/10.1037/h0020055 
  103 
 
Fuchs, L. S., Fuchs, D., & Malone, A. S. (2017). The Taxonomy of Intervention 
Intensity. Teaching Exceptional Children, 50(1), 35–43. 
https://doi.org/10.1177/0040059917703962 
Geary, D. C. (2005). Role of cognitive theory in the study of learning disability in 
mathematics. Journal of Learning Disabilities, 38(4), 305–307. 
https://doi.org/10.1177/00222194050380040401 
Gersten, R., Beckmann, S., Clarke, B., Foegen, A., Marsh, L., Star, J. R., & Witzel, B. 
(2009). Assisting Students Struggling with Mathematics:Response to Intervention 
(RtI) for elementary and middle schools. What Works Clearinghouse, 1–98. 
https://doi.org/10.1016/j.jhazmat.2011.04.026 
Gettinger, M., Bryant, N. D., & Fayne, H. R. (1982). Designing Spelling Instruction for 
Learning-Disabled Children: An Emphasis on Unit Size, Distributed Practice, and 
Training for Transfer. The Journal of Special Education, 16(4), 439–448. 
https://doi.org/10.1177/002246698201600407 
Guadagnoli, M. a, & Lee, T. D. (2004). Challenge Point: a Framework for 
Conceptualizing the Effects of Various Practice Conditions in Motor Learning. 
Journal of Motor Behavior, 36(2), 212–224. 
https://doi.org/10.3200/JMBR.36.2.212-224 
Hausman, H., & Kornell, N. (2014). Mixing topics while studying does not enhance 
learning. Journal of Applied Research in Memory and Cognition, 3(3), 153–160. 
https://doi.org/10.1016/j.jarmac.2014.03.003 
  104 
 
Healy, A. F., Kole, J. A., & Bourne, L. E. (2014). Training principles to advance 
expertise. Frontiers in Psychology. https://doi.org/10.3389/fpsyg.2014.00131 
Hilbe, J. M. (2009). Logistic Regression Models (1st ed.). New York: Chapman and 
Hall/CRC. https://doi.org/https://doi.org/10.1201/9781420075779 
Joseph, L. M. (2006). Incremental Rehearsal: A Flashcard Drill Technique for Increasing 
Retention of Reading Words. The Reading Teacher, 59(8), 803–807. 
https://doi.org/10.1598/RT.59.8.8 
Kang, S. H. K., & Pashler, H. (2012). Learning Painting Styles : Spacing is 
Advantageous when it Promotes Discriminative Contrast. Applied Cognitive 
Psychology, 103(May 2011), 97–103. 
Kornell, N., & Bjork, R. a. (2008). Learning concepts and categories: is spacing the 
“enemy of induction”? Psychological Science, 19(6), 585–592. 
https://doi.org/10.1111/j.1467-9280.2008.02127.x 
Kornell, N., Castel, A. D., Eich, T. S., & Bjork, R. a. (2010). Spacing as the friend of 
both memory and induction in young and older adults. Psychology and Aging, 25(2), 
498–503. https://doi.org/10.1037/a0017807 
Kulasegaram, K., Min, C., Howey, E., Neville, A., Woods, N., Dore, K., & Norman, G. 
(2015). The mediating effect of context variation in mixed practice for transfer of 
basic science. Advances in Health Sciences Education : Theory and Practice, 20(4), 
953–968. https://doi.org/10.1007/s10459-014-9574-9 
  105 
 
Landin, D., & Hebert, E. P. (1997). Comparison of Three Practice Schedules along the 
Contextual Interference Continuum. Research Quarterly for Exercise and Sport, 
68(4), 357–361. https://doi.org/10.1080/02701367.1997.10608017 
Lee, T. D., & Magill, R. a. (1983). The Locus of Contextual Interference in Motor-Skill 
Acquisition. Journal of Experimental Psychology: Learning, Memory, and 
Cognition, 9(4), 730–746. https://doi.org/10.1037//0278-7393.9.4.730 
Long, J. D. (2012). Longitudinal Data Analysis for the Behavioral Sciences Using R. Los 
Angeles: Sage. 
MacQuarrie, L. L., Tucker, J. a., Burns, M. K., & Hartman, B. (2002). Comparison of 
Retntion Rates Using Traditional, Drill Sanwich, and Incremental Rehearsal Flash 
Card Methods. School Psychology Review. 
Magill, R. A., & Hall, K. G. (1990). A Review of the Contextual Interference Effect in 
Motor Skill Acquisition. Human Movement Science, 9, 241–289. 
Mitchell, C., Nash, S., & Hall, G. (2008). The intermixed-blocked effect in human 
perceptual learning is not the consequence of trial spacing. Journal of Experimental 
Psychology: Learning, Memory, and Cognition, 34(1), 237–242. 
https://doi.org/10.1037/0278-7393.34.1.237 
Morehead, K., Rhodes, M. G., & DeLozier, S. (2016). Instructor and student knowledge 
of study strategies. Memory, 24(2), 257–271. 
https://doi.org/10.1080/09658211.2014.1001992 
  106 
 
National Mathematics Advisory Panel. (2008). The Final Report of the National 
Mathematics Advisory Panel. Foundations, 37(9), 595–601. 
https://doi.org/10.3102/0013189X08329195 
Odenkirk, B. (Writer), & Ervin, M. (Director).  (2000, April 2).  How Hermes 
 Requisitioned His Groove Back. [Television Series Episode] In D. Cohen, M. 
 Groening & C. Katz (Producers), Futurama. Los Angeles. 
Ostrow, K., Heffernan, N., Heffernan, C., & Peterson, Z. (2015). Blocking vs. 
Interleaving: Examining Single-Session Effects Within Middle School Math 
Homework. In International Conference on Artificial Intelligence in Education (pp. 
338–347). https://doi.org/10.1007/978-3-319-19773-9 
Pashler, H., Bain, P. M., Bottge, B. A., Graesser, A., Koedinger, K., McDaniel, M., & 
Metcalfe, J. (2007). Organizing Instruction and Study to Improve Student Learning. 
US Department of Education National Center for Education Research. 
Pashler, H., Rohrer, D., Cepeda, N., & Carpenter, S. (2007). Enhancing learning and 
retarding forgetting : Choices and consequences. Psychonomic Bulletin & Review, 
14(2), 187–193. 
Pollatou, E., Kioumourtzoglou, E., Agelousis, N., & Mavromatis, G. (1997). Contextual 
Interference Effects in Learning Novel Motor Skills. Perceptual and Motor Skills, 
84, 487–496. 
Powell, S. R., Fuchs, L. S., & Fuchs, D. (2013). Reaching the mountaintop: Addressing 
the common core standards in mathematics for students with mathematics 
  107 
 
difficulties. Learning Disabilities Research and Practice, 28(1), 38–48. 
https://doi.org/10.1111/ldrp.12001 
Rau, M. A., Aleven, V., & Rummel, N. (2013). Interleaved practice in multi-dimensional 
learning tasks: Which dimension should we interleave? Learning and Instruction, 
23(1), 98–114. https://doi.org/10.1016/j.learninstruc.2012.07.003 
Rau, M., Aleven, V., & Rummel, N. (2010). How to Schedule Multiple Graphical 
Representations? A Classroom Experiment with an Intelligent Tutoring System for 
Fractions. In Intelligent Tutoring Systems (pp. 413–422). 
Rau, M., Aleven, V., & Rummel, N. (2013). How to use multiple graphical 
representations to support conceptual learning? research-based principles in the 
fractions tutor. Lecture Notes in Computer Science (Including Subseries Lecture 
Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 7926 LNAI, 
762–765. https://doi.org/10.1007/978-3-642-39112-5-107 
Rau, M., Aleven, V., Rummel, N., & Pardos, Z. (2014). How Should Intelligent Tutoring 
Systems Sequence Multiple Graphical Representations of Fractions? A Multi-
Methods Study. International Journal of Artificial Intelligence in Education, 24(2), 
125–161. https://doi.org/10.1007/s40593-013-0011-7 
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical Linear Models: Applications and 
Data Analysis Methods. SAGE Publications. Retrieved from 
https://books.google.com/books?id=uyCV0CNGDLQC&pgis=1 
Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchex, J.-C., & Muller, M. 
  108 
 
(2011). pROC : an open-source package for R and S+ to analyze and compare ROC 
curves. BMC Bioinformatics. 
Rohrer, D. (2012). Interleaving Helps Students Distinguish among Similar Concepts. 
Educational Psychology Review, 24(3), 355–367. https://doi.org/10.1007/s10648-
012-9201-3 
Rohrer, D., Dedrick, R., & Burgess, K. (2014). The benefit of interleaved mathematics 
practice is not limited to superficially similar kinds of problems. Psychonomic 
Bulletin & Review. Retrieved from http://link.springer.com/article/10.3758/s13423-
014-0588-3 
Rohrer, D., Dedrick, R. F., & Stershic, S. (2015). Interleaved practice improves 
mathematics learning. Journal of Educational Psychology, 107(3), 900–908. 
https://doi.org/10.1037/edu0000001 
Rohrer, D., & Taylor, K. (2007). The shuffling of mathematics problems improves 
learning. Instructional Science, 35(6), 481–498. https://doi.org/10.1007/s11251-007-
9015-8 
Sana, F., Yan, V. X., & Kim, J. A. (2017). Study sequence matters for the inductive 
learning of cognitive concepts. Journal of Educational Psychology, 109(1), 84–98. 
https://doi.org/10.1037/edu0000119 
Schmidt, R. A., & Lee, T. D. (2011). Motor control and learning : a behavioral 
emphasis. Human Kinetics. 
  109 
 
Schutte, G. M., Duhon, G. J., Solomon, B. G., Poncy, B. C., Moore, K., & Story, B. 
(2015). A Comparative Analysis of Massed vs Distributed Practice on Basic Math 
Fact Fluency Growth Rates. Journal of School Psychology, 53, 149–159. 
Shea, C. H., Kohl, R., & Indermill, C. (1990). Contextual interference: Contributions of 
practice. Acta Psychologica, 73(2), 145–157. https://doi.org/10.1016/0001-
6918(90)90076-R 
Shea, J. B., & Morgan, R. L. (1979). Contextual interference effects on the acquisition, 
retention, and transfer of a motor skill. Journal of Experimental Psychology: Human 
Learning & Memory, 5(2), 179–187. https://doi.org/10.1037//0278-7393.5.2.179 
Simon, D. A., Lee, T. D., & Cullen, J. D. (2008). Win-shift, lose-stay: contingent 
switching and contextual interference in motor learning. Percept Mot Skills, 107(2), 
407–418. https://doi.org/10.2466/pms.107.2.407-418 
Sorensen, L. J., & Woltz, D. J. (2016). Blocking as a friend of induction in verbal 
category learning. MEMORY & COGNITION, 44(7), 1000–1013. 
https://doi.org/10.3758/s13421-016-0615-x 
Spybrook, J., Bloom, H., Congdon, R., Hill, C., Martinez,  a, & Raudenbush, S. (2011). 
Optimal design for longitudinal and multilevel research: Documentation for the 
“Optimal Design” software. Survey Research …, 1–215. 
https://doi.org/10.1037/h0065543 
Stambaugh, L. a. (2011). When Repetition Isn’t the Best Practice Strategy: Effects of 
Blocked and Random Practice Schedules. Journal of Research in Music Education, 
  110 
 
58(4), 368–383. https://doi.org/10.1177/0022429410385945 
Ste-Marie, D. M., Clark, S. E., Findlay, L. C., & Latimer, A. E. (2004). High levels of 
contextual interference enhance handwriting skill acquisition. Journal of Motor 
Behavior, 36(1), 115–126. https://doi.org/10.3200/JMBR.36.1.115-126 
Swehla, J., Burns, M., Zaslofsky, A., Hall, M., Varma, S., & Volpe, R. (2016). 
Examining the Use of Spacing Effect to Increase the Efficiency of Incremental 
Rehearsal. Psychology in the Schools, 53(4), 404–415. 
https://doi.org/10.1002/pits.21909 
Taylor, K., & Rohrer, D. (2010). The Effects of Interleaved Practice. Applied Cognitive 
Psychology, 848(July 2009), 837–848. https://doi.org/10.1002/acp 
Underwood, B. J. (1970). A breakdown of the total-time law in free-recall learning. 
Journal of Verbal Learning and Verbal Behavior, 9(5), 573–580. 
https://doi.org/10.1016/S0022-5371(70)80104-9 
Vakil, E., & Heled, E. (2016). The effect of constant versus varied training on transfer in 
a cognitive skill learning task: The case of the Tower of Hanoi Puzzle. Learning and 
Individual Differences, 47, 207–214. https://doi.org/10.1016/j.lindif.2016.02.009 
Varma, S., & Schleisman, K. B. (2014). The Cognitive Underpinnings of Incremental 
Rehearsal. School Psychology Review, 43(2), 222–228. 
What Works Clearinghouse. (2017). Standards Handbook (4th ed.). Princeton, NJ. 
Retrieved from http://ies.ed.gov/ncee/wwc 
  111 
 
Wulf, G., & Schmidt, R. a. (1997). Variability of practice and implicit motor learning. 
Journal of Experimental Psychology: Learning, Memory, and Cognition, 23(4), 
987–1006. https://doi.org/10.1037//0278-7393.23.4.987 
Zulkiply, N., & Burt, J. S. (2013). The exemplar interleaving effect in inductive learning: 
moderation by the difficulty of category discriminations. Memory & Cognition, 
41(1), 16–27. https://doi.org/10.3758/s13421-012-0238-9 
112 
 
Appendices 
 
Appendix A: Candidate Models for Research Question 2 
Table A1 
Candidate Models for Research Question 2 
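Model 

\[ \hat{Y}_{ij} = \gamma_{00} + u_{0j} + r_{ij} \]

\[ \hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) \]

\[ \hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{IL}) + \gamma_{40}(\text{IR}) + u_{0j} + r_{ij} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{IL}) + \gamma_{40}(\text{IR}) \\
& + \gamma_{50}(\text{IL} \times \text{PO}) + \gamma_{60}(\text{IR} \times \text{PO}) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{IL}) + \gamma_{40}(\text{IR}) \\
& + \gamma_{50}(\text{IL} \times \text{PO}) + \gamma_{60}(\text{IR} \times \text{PO}) + \gamma_{70}(\text{Days Since Practice}) \\
& + \gamma_{80}(\text{IL} \times \text{Days Since Practice}) + \gamma_{90}(\text{IR} \times \text{Days Since Practice}) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{IL}) + \gamma_{40}(\text{IR}) \\
& + \gamma_{50}(\text{IL} \times \text{PO}) + \gamma_{60}(\text{IR} \times \text{PO}) + \gamma_{70}(\text{Pretest Score}) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{IL}) + \gamma_{40}(\text{IR}) \\
& + \gamma_{50}(\text{IL} \times \text{PO}) + \gamma_{60}(\text{IR} \times \text{PO}) + \gamma_{70}(\text{Pretest Score}) \\
& + \gamma_{80}(\text{IL} \times \text{Pretest Score} \times \text{PO}) + \gamma_{90}(\text{IR} \times \text{Pretest Score} \times \text{PO}) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{IL}) + \gamma_{40}(\text{IR}) \\
& + \gamma_{50}(\text{IL} \times \text{PO}) + \gamma_{60}(\text{IR} \times \text{PO}) + \gamma_{70}(\text{Pretest Score}) + \gamma_{80}(\text{Days Since Practice}) \\
& + \gamma_{90}(\text{Days Since Practice} \times \text{Pretest Score}) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{IL}) + \gamma_{40}(\text{IR}) \\
& + \gamma_{50}(\text{IL} \times \text{PO}) + \gamma_{60}(\text{IR} \times \text{PO}) + \gamma_{70}(\text{Pretest Score}) + \gamma_{80}(\text{Days Since Practice}) \\
& + \gamma_{90}(\text{Days Since Practice} \times \text{Pretest Score}) + \gamma_{10,0}(\text{IL} \times \text{Pretest Score} \times \text{PO}) \\
& + \gamma_{11,0}(\text{IR} \times \text{Pretest Score} \times \text{PO}) + \gamma_{12,0}(\text{IL} \times \text{Days Since Practice} \times \text{PO}) \\
& + \gamma_{13,0}(\text{IR} \times \text{Days Since Practice} \times \text{PO}) + u_{0j} + r_{ij} \end{aligned} \]

\[ \hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) + u_{0j} + r_{ij} \]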
 
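\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{IL}) + \gamma_{50}(\text{IR}) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{IL}) + \gamma_{50}(\text{IR}) + \gamma_{60}(\text{IL} \times \text{PO}^2) + \gamma_{70}(\text{IR} \times \text{PO}^2) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{IL}) + \gamma_{50}(\text{IR}) + \gamma_{60}(\text{IL} \times \text{PO}^2) + \gamma_{70}(\text{IR} \times \text{PO}^2) \\
& + \gamma_{80}(\text{Days Since Practice}) + \gamma_{90}(\text{IL} \times \text{Days Since Practice}) \\
& + \gamma_{10,0}(\text{IR} \times \text{Days Since Practice}) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{IL}) + \gamma_{50}(\text{IR}) + \gamma_{60}(\text{IL} \times \text{PO}^2) + \gamma_{70}(\text{IR} \times \text{PO}^2) \\
& + \gamma_{80}(\text{Pretest Score}) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{IL}) + \gamma_{50}(\text{IR}) + \gamma_{60}(\text{IL} \times \text{PO}^2) + \gamma_{70}(\text{IR} \times \text{PO}^2) \\
& + \gamma_{80}(\text{Pretest Score}) + \gamma_{90}(\text{IL} \times \text{Pretest Score} \times \text{PO}^2) \\
& + \gamma_{10,0}(\text{IR} \times \text{Pretest Score} \times \text{PO}^2) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{IL}) + \gamma_{50}(\text{IR}) + \gamma_{60}(\text{IL} \times \text{PO}^2) + \gamma_{70}(\text{IR} \times \text{PO}^2) \\
& + \gamma_{80}(\text{Pretest Score}) + \gamma_{90}(\text{Days Since Practice}) \\
& + \gamma_{10,0}(\text{Days Since Practice} \times \text{Pretest Score}) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{IL}) + \gamma_{50}(\text{IR}) + \gamma_{60}(\text{IL} \times \text{PO}^2) + \gamma_{70}(\text{IR} \times \text{PO}^2) \\
& + \gamma_{80}(\text{Pretest Score}) + \gamma_{90}(\text{Days Since Practice}) \\
& + \gamma_{10,0}(\text{Days Since Practice} \times \text{Pretest Score}) + \gamma_{11,0}(\text{IL} \times \text{Pretest Score} \times \text{PO}^2) \\
& + \gamma_{12,0}(\text{IR} \times \text{Pretest Score} \times \text{PO}^2) + \gamma_{13,0}(\text{IL} \times \text{Days Since Practice} \times \text{PO}^2) \\
& + \gamma_{14,0}(\text{IR} \times \text{Days Since Practice} \times \text{PO}^2) + u_{0j} + r_{ij} \end{aligned} \]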
 
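\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{Practice Opportunities}^3) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{Practice Opportunities}^3) + \gamma_{50}(\text{IL}) + \gamma_{60}(\text{IR}) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{Practice Opportunities}^3) + \gamma_{50}(\text{IL}) + \gamma_{60}(\text{IR}) + \gamma_{70}(\text{IL} \times \text{PO}^3) \\
& + \gamma_{80}(\text{IR} \times \text{PO}^3) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{Practice Opportunities}^3) + \gamma_{50}(\text{IL}) + \gamma_{60}(\text{IR}) + \gamma_{70}(\text{IL} \times \text{PO}^3) \\
& + \gamma_{80}(\text{IR} \times \text{PO}^3) + \gamma_{90}(\text{Days Since Practice}) + \gamma_{10,0}(\text{IL} \times \text{Days Since Practice}) \\
& + \gamma_{11,0}(\text{IR} \times \text{Days Since Practice}) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{Practice Opportunities}^3) + \gamma_{50}(\text{IL}) + \gamma_{60}(\text{IR}) + \gamma_{70}(\text{IL} \times \text{PO}^3) \\
& + \gamma_{80}(\text{IR} \times \text{PO}^3) + \gamma_{90}(\text{Pretest Score}) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{Practice Opportunities}^3) + \gamma_{50}(\text{IL}) + \gamma_{60}(\text{IR}) + \gamma_{70}(\text{IL} \times \text{PO}^3) \\
& + \gamma_{80}(\text{IR} \times \text{PO}^3) + \gamma_{90}(\text{Pretest Score}) + \gamma_{10,0}(\text{IL} \times \text{Pretest Score} \times \text{PO}^3) \\
& + \gamma_{11,0}(\text{IR} \times \text{Pretest Score} \times \text{PO}^3) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{Practice Opportunities}^3) + \gamma_{50}(\text{IL}) + \gamma_{60}(\text{IR}) + \gamma_{70}(\text{IL} \times \text{PO}^3) \\
& + \gamma_{80}(\text{IR} \times \text{PO}^3) + \gamma_{90}(\text{Pretest Score}) + \gamma_{10,0}(\text{Days Since Practice}) \\
& + \gamma_{11,0}(\text{Days Since Practice} \times \text{Pretest Score}) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{Practice Opportunities}^3) + \gamma_{50}(\text{IL}) + \gamma_{60}(\text{IR}) + \gamma_{70}(\text{IL} \times \text{PO}^3) \\
& + \gamma_{80}(\text{IR} \times \text{PO}^3) + \gamma_{90}(\text{Pretest Score}) + \gamma_{10,0}(\text{Days Since Practice}) \\
& + \gamma_{11,0}(\text{Days Since Practice} \times \text{Pretest Score}) + \gamma_{12,0}(\text{IL} \times \text{Pretest Score} \times \text{PO}^3) \\
& + \gamma_{13,0}(\text{IR} \times \text{Pretest Score} \times \text{PO}^3) + \gamma_{14,0}(\text{IL} \times \text{Days Since Practice} \times \text{PO}^3) \\
& + \gamma_{15,0}(\text{IR} \times \text{Days Since Practice} \times \text{PO}^3) + u_{0j} + r_{ij} \end{aligned} \]

\[ \hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Days Since Practice}) + u_{0j} + r_{ij} \]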
 
 
 
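\[ \hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Pretest Score}) + u_{0j} + r_{ij} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Pretest Score}) \\
& + \gamma_{40}(\text{Pretest Score} \times \text{PO}) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Pretest Score}) \\
& + \gamma_{40}(\text{Days Since Practice}) + \gamma_{50}(\text{Days Since Practice} \times \text{Pretest Score}) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Pretest Score} \times \text{PO}) \\
& + \gamma_{40}(\text{Days Since Practice} \times \text{PO}) \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & (\gamma_{00} + u_{10}) + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{Days Since Practice}) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{Pretest Score}) \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{Pretest Score}) + \gamma_{50}(\text{Pretest Score} \times \text{PO}^2) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{Pretest Score}) + \gamma_{50}(\text{Days Since Practice}) \\
& + \gamma_{60}(\text{Days Since Practice} \times \text{Pretest Score}) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{Pretest Score} \times \text{PO}^2) + \gamma_{50}(\text{Days Since Practice} \times \text{PO}^2) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{Practice Opportunities}^3) + \gamma_{50}(\text{Days Since Practice}) + u_{0j} + r_{ij} \end{aligned} \]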
 
 
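\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{Practice Opportunities}^3) + \gamma_{50}(\text{Pretest Score}) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{Practice Opportunities}^3) + \gamma_{50}(\text{Pretest Score}) \\
& + \gamma_{60}(\text{Pretest Score} \times \text{PO}^2) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{Practice Opportunities}^3) + \gamma_{50}(\text{Pretest Score}) + \gamma_{60}(\text{Days Since Practice}) \\
& + \gamma_{70}(\text{Days Since Practice} \times \text{Pretest Score}) + u_{0j} + r_{ij} \end{aligned} \]

\[ \begin{aligned} \hat{Y}_{ij} ={} & \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{Practice Opportunities}) + \gamma_{30}(\text{Practice Opportunities}^2) \\
& + \gamma_{40}(\text{Practice Opportunities}^3) + \gamma_{50}(\text{Pretest Score} \times \text{PO}^2) \\
& + \gamma_{60}(\text{Days Since Practice} \times \text{PO}^2) + u_{0j} + r_{ij} \end{aligned} \]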