Examination of Three Practice Schedules for Single Digit Math

A Dissertation
SUBMITTED TO THE FACULTY OF THE UNIVERSITY OF MINNESOTA
BY
Kyle Wagner

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Dr. Kristen McMaster, Advisor

September 2019

© Kyle B. Wagner 2019

Acknowledgements

This dissertation represents an important waypoint in my own personal and professional adventure, and like many epic quests, it would have been doomed to failure if undertaken alone. I would like to offer my sincerest thanks to some of the guides, companions, and fellow travelers who have helped me get here.

First, my advisor, Kristen McMaster, has been a better mentor than I could have imagined. She has shaped my approach to research and professional conduct, and has pushed me to expect more of myself and my work. I am a better researcher and professional because of her.

Second, I would also like to thank my committee. Their input and support have been incredibly valuable throughout this project. This is a better dissertation because of their questions, recommendations, and time.

Third, I would like to thank my wife, Hallie. Her example, reassurance, and support have kept me on track and focused. She has encouraged me to follow my passion, and has helped me build confidence in my strengths. Along with my wife, my parents have been in my corner and incredibly understanding of a son who can’t seem to avoid going back to school just one more time.

Fourth, this project owes a lot to the team at FastBridge. Their software was the backbone of this design and allowed me to focus on the science rather than burying myself in “how to” books for software design.

Finally, I need to thank and acknowledge all of my family, friends, colleagues, professors, office staff, and everyone who has supported me and contributed to who I am. My time in this program has been amazing.
I have grown more than I could have imagined, and I couldn’t have done it without help from all of you.

Dedication

This dissertation is dedicated to my daughters, Penelope and Minerva. Follow your passion. Thank you for inspiring me to follow mine.

Abstract

The primary goal of this project is to expand and generalize the literature base for interleaved practice. This study compares interleaved practice to repetitive practice and incremental rehearsal within the context of learning single digit math facts. Third-grade (n = 34) and fourth-grade (n = 40) students learned target single digit math facts in one of three practice schedules. Using a within-subjects counterbalanced and crossed design, students were exposed to three different learning conditions. Comparisons were made regarding accuracy of responses during acquisition trials and retention trials, as well as learning efficiency. Results indicated very few differences between practice conditions regarding acquisition accuracy, increased accuracy during retention trials for interleaved and incremental rehearsal practice, and higher learning efficiency for interleaved practice when compared to incremental rehearsal. Student pretest accuracy moderated the effects of practice schedule and opportunities to practice, resulting in different outcomes for students with different levels of mastery at the outset of the intervention. This study is the first comparison of interleaved and incremental rehearsal practice, and the results suggest that interleaved practice is the most efficient schedule for drilling math facts.

Table of Contents

List of Tables
List of Figures
Chapter 1 INTRODUCTION
  Problem Statement
  Learning Targets
  Study Purpose and Research Questions
  Hypotheses
  Structure of Dissertation
Chapter 2 LITERATURE REVIEW
  Conceptual Framework and Relevant Constructs
  Background and Groundwork
    Benefits of interleaving.
    Task characteristics and levels of interleaving.
    Synthesis of the background.
  Purpose of the Present Review
  Method
  Results
  Discussion
  Limitations of the Review
  Future Research
Chapter 3 METHODS
  Research Questions Revisited
  Setting and Participants
  Design
  Materials
    Student interface.
    Social validity questionnaire.
  Procedure
  Measures
    Outcome variables.
    Predictor variables.
  Analysis Plan
    Research Question 1.
    Research Question 2.
    Research Question 3.
  Model Checking
Chapter 4 RESULTS
  Random Assignment
  Combining Addition and Multiplication for the Analysis
  Research Question 1
  Research Question 2
  Research Question 3
  Targets Correct at Immediate and Delayed Posttests
  Time in Practice by Practice Condition
  Correct Targets per 20 Minutes of Practice
  Model Checking for Research Question 3
Chapter 5 DISCUSSION
REFERENCES
Appendices
  Appendix A: Candidate Models for Research Question 2

List of Tables

Table 1 Examples of Practice Schedules
Table 2 Examples of Interleaving at Different Levels and Dimensions
Table 3 Summary of Results for Interleaved Practice Schedule Studies
Table 4 Summary of Task Characteristics and How Studies Interleaved
Table 5 Participant Characteristics
Table 6 Latin Square for Counterbalancing
Table 7 Student Experience of Study
Table 8 Breakdown and Planned Spacing of Acquisition vs Retention Trials Across Practice Sessions
Table 9 Breakdown and Example of Altered Spacing of Acquisition vs Retention Trials Across Practice Sessions
Table 10 Problem Organization
Table 11 Variable List
Table 12 Number of Participants in Each Condition
Table 13 Addition and Multiplication First Pretest Descriptive Statistics for Proportion Correct
Table 14 Accuracy Rates by Operation and Practice Schedule
Table 15 Skew and Kurtosis for Three Numeric Predictors Used in Models for Research Question 2
Table 16 Summary of the Nine Selected Retention Models Ordered by AICc Weight
Table 17 Parameter Estimates for the Nine Selected Models Ordered by AICc Weight With Odds Ratios Calculated from Beta Averages
Table 18 Number of Students in Each Practice Schedule at Immediate and Delayed Posttests
Table 19 Summary Table for Model Predicting Correct Targets by Practice Schedule at Immediate Posttest
Table 20 Summary Table for Model Predicting Correct Targets by Practice Schedule at Delayed Posttest
Table 21 Summary Table for Model Predicting Minutes in Practice by Practice Schedule and Pretest at Immediate Posttest
Table 22 Summary Table for Model Predicting Minutes in Practice by Practice Schedule and Pretest at Delayed Posttest
Table 23 Summary Table for Model Predicting Targets Learned per 20 Minutes of Practice by Practice Schedule and Pretest at Immediate Posttest
Table 24 Summary Table for Model Predicting Targets Learned per 20 Minutes of Practice by Practice Schedule and Pretest at Delayed Posttest
Table 25 Descriptive Statistics for Research Question 3 Model Residuals
Table 26 Descriptive Statistics for Survey Questions
Table A1 Candidate Models for Research Question 2

List of Figures

Figure 1. Representation of task dimensions.
Figure 2. Example of student interface.
Figure 3. Histogram of correct and incorrect response frequencies by Incremental Rehearsal (IR), Interleaved (IL), and Repetitive (Rep) practice schedules.
Figure 4. Graph of probability of a correct response on retention trials by practice schedule (Interleaved (IL), Incremental Rehearsal (IR), and Repetitive (Rep)) based on model 1 and split into four quartiles from the first pretest.
Figure 5. Model 1 retention ROC curve.
Figure 6. Deviance residuals for retention model.
Figure 7. Half-normal plot for retention model.
Figure 8. Targets correct at immediate posttest.
Figure 9. Targets correct at delayed posttest.
Figure 10. Time in practice at immediate posttest.
Figure 11. Time in practice at delayed posttest.
Figure 12. Number of targets learned per 20 minutes of practice.
Figure 13. Number of targets learned per 20 minutes of practice.
Figure 14. Density plot of residuals from models used to address research question 3.
Figure 15. Scatter plot of residuals from models used to address research question 3 with a loess smoother.
Figure 16. Proportion of responses in each of seven responses for first survey question: “How helpful was this practice?”
Figure 17.
Proportion of responses in each of seven responses for second survey question: “How fun was this practice?”

Chapter 1 INTRODUCTION

Students’ acquisition, retention, and transfer of academic skills are critical goals of our education system. Acquisition is conceptualized as gaining facility in a task in the short term (Kornell, Castel, Eich, & Bjork, 2010; Pashler, Rohrer, Cepeda, & Carpenter, 2007; Sorensen & Woltz, 2016), retention is maintenance of that facility over some period of time, and transfer is applying increased facility in one skill to another skill (Healy, Kole, & Bourne, 2014). Within the context of multi-tiered systems of supports, students who struggle to acquire and retain skills at the same rates as their classmates are specifically monitored and receive intervention to increase the trajectory of their learning. A core concept of intensive intervention is the use of instructional practices that are more efficient and effective than those provided within a standard core instructional approach.

Teachers often direct students to practice specific skills with the belief that more practice leads to acquisition and retention. The conceptualization of the relation between practice and acquisition and retention has evolved from simple ideas such as the Total Time Hypothesis (Cooper & Pantle, 1967), in which the only important variable is presumed to be the number of practice opportunities (exposures to the target). Subsequent investigation has highlighted the added benefit of the distribution of practice (Cepeda, Pashler, Vul, Wixted, & Rohrer, 2006), showing that there are ways to modify practice to take better advantage of finite resources such as time, attention, and staffing.
Changing the schedule of practice to increase the effectiveness and efficiency of classroom time is closely tied to the overarching idea of intensive intervention. This project is designed to further expand a literature base that can be used to inform instructional practices and increase the efficiency of learning in K-12 settings.

This project aims to take a practice schedule with some promise (interleaved practice) and compare it to practice schedules that are currently in use and to a dosage control. To that end, this study is focused on the comparison of three practice schedules.

Repetitive practice can serve as a business-as-usual dosage control. This is a simple practice schedule in which the learner repeats a single target skill for a set number of trials before switching to the next target. Within an academic context, a repetitive schedule might involve a student who is learning transcription skills writing a letter several times before moving to another letter, or a student studying math facts answering the same target problem a number of times before moving to the next problem. In this study, repetition is used to represent the Total Time Hypothesis mentioned above. It is a no-frills example of dosage.

Incremental rehearsal practice schedules are evidence that the field has moved beyond the Total Time Hypothesis. They are a type of distributed practice (Varma & Schleisman, 2014) in which the learner inserts gradually increasing numbers of known skills in between unknown (target) skills. If a student has mastered a set of math facts, those facts would be interspersed between exposures to a target fact. Incremental rehearsal has been chosen because it is a practice schedule that has a solid foundation in the literature and has been widely adopted in classrooms. Searches for incremental rehearsal return links to interventioncentral.org, school and district webpages, and other education resource sites.
It has been established as an effective tool for acquiring and retaining academic skills (Burns, 2005; Joseph, 2006).

Finally, an interleaved practice schedule has the learner switch between target skills. In a classroom setting, a teacher might select three target math facts for a student and mix them together during a practice session. An interleaved schedule can repeat the same pattern or follow a pseudo-random pattern (ABC-ABC-ABC vs ACB-BCB-ACA). Interleaved schedules are much less prevalent in the scientific literature and the vocabulary of practitioners. Despite a deep background in motor learning and cognitive literature (Carvalho & Goldstone, 2015; Magill & Hall, 1990; C. H. Shea, Kohl, & Indermill, 1990), these schedules have yet to have a large impact in educational psychology literature. This study will focus on examining the utility of interleaved practice for learning single digit math facts. See Table 1 for examples of all three schedules.

An example of these three schedules taken out of a classroom context might look at improving target basketball skills. Imagine someone attempting to improve performance on shooting free throws, lay-ups, and a baseline three-point shot in basketball. If the player is instructed to shoot 10 shots from each place in a repetitive schedule, they would shoot 10 lay-ups, then 10 free throws, and finally 10 baseline shots. In an incremental rehearsal schedule, a player would shoot a free throw, then make a chest pass (assuming that a chest pass is a mastered skill), then a free throw, then a chest pass and a bounce pass (again, assuming that a bounce pass is a mastered skill), and so on until the player has shot 10 free throws. The process would then be repeated for the other two target skills. A player practicing in an interleaved schedule would shoot a free throw, then a lay-up, and then a baseline shot. The player would then switch skills each trial until 10 shots from each position have been attempted.
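The three schedules described above can be sketched as short sequence generators. This is a minimal illustration in Python and not the software used in the study: the function names, the per-round shuffling used for the pseudo-random interleaved pattern, and the item labels are all assumptions made for the example.

```python
import random


def repetitive(targets, exposures):
    """Blocked practice: finish every exposure to one target before the next."""
    return [target for target in targets for _ in range(exposures)]


def incremental_rehearsal(target, knowns, exposures):
    """One target: target, 1 known, target, 2 knowns, ... (assumed pattern)."""
    if exposures > len(knowns):
        raise ValueError("need at least as many known items as exposures")
    trials = []
    for n_known in range(1, exposures + 1):
        trials.append(target)            # one exposure to the target
        trials.extend(knowns[:n_known])  # followed by a growing run of knowns
    return trials


def interleaved(targets, exposures, rng=None):
    """Visit every target once per round; shuffle each round if rng is given."""
    trials = []
    for _ in range(exposures):
        round_order = list(targets)
        if rng is not None:
            rng.shuffle(round_order)     # pseudo-random order within the round
        trials.extend(round_order)
    return trials


# Hypothetical labels borrowed from the basketball illustration.
skills = ["free throw", "lay-up", "baseline shot"]
print(repetitive(skills, 2))
print(incremental_rehearsal("free throw", ["chest pass", "bounce pass"], 2))
print(interleaved(skills, 2))
print(interleaved(skills, 2, rng=random.Random(0)))
```

Under these assumptions, with nine known items `incremental_rehearsal` produces 54 trials for a single target (9 target exposures plus 1 + 2 + ... + 9 = 45 known-item trials), whereas the repetitive and interleaved generators use exactly one trial per exposure.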
Table 1
Examples of Practice Schedules

Targets
Incremental Rehearsal: 4 x 6, 7 x 9, 8 x 3
Repetitive: 9 x 2, 7 x 7, 4 x 3
Interleaved: 3 x 5, 2 x 8, 7 x 9

Incremental Rehearsal                           Repetitive  Interleaved
4 x 6   4 x 6   7 x 9   7 x 9   8 x 3   8 x 3   9 x 2       3 x 5
1 x 1   1 x 1   1 x 1   1 x 1   1 x 1   1 x 1   9 x 2       2 x 8
4 x 6   2 x 3   7 x 9   2 x 3   8 x 3   2 x 3   9 x 2       7 x 9
1 x 1   7 + 3   1 x 1   7 + 3   1 x 1   7 + 3   9 x 2       3 x 5
2 x 3   8 + 1   2 x 3   8 + 1   2 x 3   8 + 1   9 x 2       3 x 5
4 x 6   1 x 9   7 x 9   1 x 9   8 x 3   1 x 9   9 x 2       7 x 9
1 x 1   6 x 2   1 x 1   6 x 2   1 x 1   6 x 2   9 x 2       2 x 8
2 x 3   5 + 5   2 x 3   5 + 5   2 x 3   5 + 5   9 x 2       2 x 8
7 + 3   4 x 6   7 + 3   7 x 9   7 + 3   8 x 3   9 x 2       7 x 9
4 x 6   1 x 1   7 x 9   1 x 1   8 x 3   1 x 1   7 x 7       3 x 5
1 x 1   2 x 3   1 x 1   2 x 3   1 x 1   2 x 3   7 x 7       7 x 9
2 x 3   7 + 3   2 x 3   7 + 3   2 x 3   7 + 3   7 x 7       3 x 5
7 + 3   8 + 1   7 + 3   8 + 1   7 + 3   8 + 1   7 x 7       7 x 9
8 + 1   1 x 9   8 + 1   1 x 9   8 + 1   1 x 9   7 x 7       2 x 8
4 x 6   6 x 2   7 x 9   6 x 2   8 x 3   6 x 2   7 x 7       3 x 5
1 x 1   5 + 5   1 x 1   5 + 5   1 x 1   5 + 5   7 x 7       2 x 8
2 x 3   4 + 4   2 x 3   4 + 4   2 x 3   4 + 4   7 x 7       7 x 9
7 + 3   4 x 6   7 + 3   7 x 9   7 + 3   8 x 3   7 x 7       3 x 5
8 + 1   1 x 1   8 + 1   1 x 1   8 + 1   1 x 1   4 x 3       7 x 9
1 x 9   2 x 3   1 x 9   2 x 3   1 x 9   2 x 3   4 x 3       2 x 8
4 x 6   7 + 3   7 x 9   7 + 3   8 x 3   7 + 3   4 x 3       7 x 9
1 x 1   8 + 1   1 x 1   8 + 1   1 x 1   8 + 1   4 x 3       2 x 8
2 x 3   1 x 9   2 x 3   1 x 9   2 x 3   1 x 9   4 x 3       2 x 8
7 + 3   6 x 2   7 + 3   6 x 2   7 + 3   6 x 2   4 x 3       3 x 5
8 + 1   5 + 5   8 + 1   5 + 5   8 + 1   5 + 5   4 x 3       2 x 8
1 x 9   4 + 4   1 x 9   4 + 4   1 x 9   4 + 4   4 x 3       7 x 9
6 x 2   3 + 8   6 x 2   3 + 8   6 x 2   3 + 8   4 x 3       3 x 5

Problem Statement

The primary goal of this project is to expand and generalize the literature base for interleaved practice. This dissertation describes the effects of interleaving target skills within academic practice compared to a dosage control (repetitive practice) and to a familiar and effective schedule (incremental rehearsal). The practice of interleaving has a well-established base in cognitive and motor learning literature (Birnbaum, Kornell, Bjork, & Bjork, 2013; Carvalho & Goldstone, 2014a; Magill & Hall, 1990; J. B.
Shea & Morgan, 1979; Zulkiply & Burt, 2013). Research in these areas has demonstrated that participants who practice in an interleaved schedule tend to retain target skills at a higher rate than those who practice in repetitive blocks. A foundational study that demonstrated this effect compared learning and performance on barrier knockdown tasks (J. B. Shea & Morgan, 1979). Participants in a high contextual interference, or interleaved, condition retained acquired movement patterns better than did participants who practiced in a repetitive schedule. Subsequent research in motor learning (Magill & Hall, 1990), cognitive (Carvalho & Goldstone, 2014b; Zulkiply & Burt, 2013), and academic (Rohrer, Dedrick, & Stershic, 2015) contexts has reinforced the original findings.

Thus far, only seven studies have extended the literature base for interleaved practice manipulations into K-12 academic settings (Booth et al., 2015; Ostrow, Heffernan, Heffernan, & Peterson, 2015; Rau, Aleven, & Rummel, 2013; Rau, Aleven, Rummel, & Pardos, 2014; Rohrer, Dedrick, & Burgess, 2014; Rohrer et al., 2015; Taylor & Rohrer, 2010). While small, this group of studies has demonstrated the positive effect associated with an interleaved practice schedule across writing, fractions, and geometry, with effect sizes ranging from d = .20 to as high as d = 2.02. Overall, the evidence presented in these seven studies suggests that employing an interleaved practice schedule has the potential to have a demonstrable and meaningful effect on the efficiency of learning in classrooms.

The studies referenced above have demonstrated the promise of interleaved practice schedules, but have some limitations. For example, most of these studies focused on comparisons between repetitive schedules and interleaved practice and their contribution to the retention of target skills.
Exceptions are studies by Rau (2013a & 2013b), who examined interleaving on different dimensions, specifically task type and task presentation format. Additional research is needed to compare interleaved practice to other schedules that have been associated with strong retention benefits, such as incremental rehearsal (Burns, 2005; MacQuarrie, Tucker, Burns, & Hartman, 2002; Varma & Schleisman, 2014). Specifically, while incremental rehearsal is associated with better retention when compared to traditional drill and repetitive practice, it is necessarily a lengthy procedure. Students must be exposed to 45 non-target trials to be exposed to a target just nine times, for 54 trials in total. If an interleaved schedule can lead to similar, or better, retention through a simple switch in practice schedule, there is potential to reduce practice time by a factor of six (nine trials rather than 54). This study provides the opportunity to compare the effect of different practice schedules on acquisition and efficiency, as well as retention.

Learning Targets

Adequate mathematics preparation is important for personal success and the success of a technical society (National Mathematics Advisory Panel, 2008). Students with more math preparation earn more, and countries are increasingly dependent on a workforce that is mathematically literate (National Mathematics Advisory Panel, 2008). To that end, an important goal of the American educational institution should be giving students the tools they will need to build their mathematical skills.

Math is a deep and complicated topic. Math skills range from basic concepts of numeracy and the determination of whether one set has more of something than another, to the complexities of calculus, matrix algebra, and geometry. Teaching math is also a broad and deep endeavor.
Instruction must match student assets and needs, and engage the learner to form a conceptual understanding of the topic (Carr & Alexeev, 2011; Geary, 2005; Gersten et al., 2009; National Mathematics Advisory Panel, 2008; National Research Council & Mathematics Learning Study Committee, 2001; Powell, Fuchs, & Fuchs, 2013). The National Research Council (2001) breaks math proficiency into five intertwined strands: Conceptual Understanding, Procedural Fluency, Strategic Competence, Adaptive Reasoning, and Productive Disposition. No strand can be taught in isolation, and each strand comprises further component parts. This project is focused on math facts, an aspect of the Procedural Fluency strand. The National Research Council (2001) describes Procedural Fluency as “skill in carrying out procedures flexibly, accurately, efficiently, and appropriately.” Fast and accurate access to single digit addition and multiplication facts is an important part of that overarching Procedural Fluency strand.

Although building fluency in single digit addition and multiplication facts should not be the sole focus of math instruction, several sources underscore the importance of fluent access to those skills (Gersten et al., 2009; National Mathematics Advisory Panel, 2008; Powell et al., 2013). The literature indicates that increased fluency with single digit math facts facilitates learning of more complex processes. To that end, the goal of this project is to compare the learning of single digit math facts across three practice schedules. This comparison is intended to add to a body of literature that will help practitioners make teaching decisions that are more effective and efficient for their students.

Single digit math facts play an important role in the context of math instruction.
Within the context of this project, they are also relevant as a target skill in that they are discrete, important for learners, and have a finite set (and thus have a learning endpoint). These are all properties of a specific skill that can be learned through practice. As mentioned, the interleaved literature has a deep base in cognitive psychology and motor learning. The foundational Shea and Morgan (1979) study used barrier knockdown tasks that were discrete, had specific solutions, drew on a deeper latent coordination skill, and had potential to transfer to other related tasks. Similarly, single digit math facts are discrete, benefit from a latent numeracy skill, and are useful in their application to more complex and diverse mathematics skills. A more efficient path to accurate and fluent math facts frees time for instruction, and increases the opportunity for learning more complex skills (National Research Council & Mathematics Learning Study Committee, 2001). Single digit math facts are also a part of the Common Core State Standards (2010) for grades 1-3. For these reasons, single digit math facts are an ideal skill for this extension of the interleaved practice literature base.

Single digit addition and multiplication facts are useful within the context of this study because of some important attributes. First, there is a finite and convenient number of single digit addition and multiplication facts. From 0:0 to 9:9 (leaving out reciprocals) there are 55 single digit addition and 55 single digit multiplication problems. Sets of 55 are easy to divide into subsets, and an exhaustive item base can be used within those parameters. Second, each of those 110 items has a sum or product that is a non-negative integer. Subtraction and division of single digits can result in negative and fractional numbers, respectively.
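As a quick check on the size of the item pool, the facts can be enumerated directly. The sketch below is illustrative Python rather than study code, and it assumes that "leaving out reciprocals" means treating commuted pairs such as 3 x 5 and 5 x 3 as a single fact; the variable names are hypothetical.

```python
from itertools import combinations_with_replacement

# Unordered pairs of the digits 0-9, so (3, 5) and (5, 3) count once.
# This reading of "leaving out reciprocals" is an assumption.
pairs = list(combinations_with_replacement(range(10), 2))

addition_facts = {f"{a} + {b}": a + b for a, b in pairs}
multiplication_facts = {f"{a} x {b}": a * b for a, b in pairs}

print(len(addition_facts), len(multiplication_facts))
print(min(addition_facts.values()), max(addition_facts.values()))
print(min(multiplication_facts.values()), max(multiplication_facts.values()))
```

Under that assumption the enumeration gives 55 facts per operation, with sums running from 0 to 18 and products from 0 to 81, so every answer is a whole number.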
As the goal of this study is a simple generalization and comparison of interleaved practice, an item set that is manageable and results in simple responses is most desirable.

Study Purpose and Research Questions

The goal of the proposed research is to compare acquisition, retention, and efficiency in learning across three practice schedules: 1) a repetitive schedule that acts as a control, 2) an incremental rehearsal schedule that has been demonstrated to improve retention of learning targets, and 3) an interleaved practice schedule that has only recently been examined in K-12 academic learning. For the purposes of this study, acquisition is defined as accuracy within a single practice session, retention is defined as accuracy at an assessment opportunity outside of a practice session, and efficiency is defined as the amount of practice divided by the amount of time spent in the intervention (average of x problems retained per y minutes of practice).

Specifically, my research questions are:
1) How is acquisition of target math facts influenced by practice schedule (repetitive, incremental rehearsal, and interleaved)?
2) Does retention of target math facts differ by practice schedule?
3) Does efficiency of learning differ by practice schedule in terms of time investment per math fact?

Hypotheses

Research Question 1: Likelihood of a correct response will increase at the fastest rate for the Repetitive schedule, but will asymptote over the course of several sessions. Likelihood of a correct response will increase the next most quickly for incremental rehearsal. Interleaved practice will yield the slowest change. Repetitive practice has demonstrated a link with fast acquisition across the literature (Magill & Hall, 1990).

Research Question 2: Likelihood of a correct response at retention trials will be highest for incremental rehearsal and interleaved practice and will be almost indistinguishable between the two.
Interleaved and incremental rehearsal practice are both associated with established records of high retention.

Research Question 3: Interleaved practice should be associated with a much better efficiency rate than incremental rehearsal. The nature of the schedules dictates this difference. Seven exposures to each of three targets require 21 trials in an interleaved schedule. The same number of exposures in an incremental rehearsal schedule requires 105 trials. If 42 trials (seven trials in each of six bundles) is enough to achieve high rates of delayed retention with a Repetitive schedule, then it should have an efficiency score similar to interleaved practice.

Structure of Dissertation

This paper describes a study that examines acquisition, retention, and student efficiency in learning single digit math facts. The analytical framework is that of mixed-effects logistic regression models for the first two research questions, and linear regression modeling for the third. No published articles have used this analytical framework in the context of comparing practice schedules in K-12 academic skills. Chapter 2 provides a context for interleaving in cognitive literature followed by a systematic literature review of interleaved practice in K-12 settings. Chapter 3 describes, in detail, the methods alluded to above, and outlines reasoning for using those methods. Results of the study are in Chapter 4, and a discussion of the results, limitations of the project, and future research directions are in Chapter 5.

Chapter 2
LITERATURE REVIEW

Time and practice are requisite for learners to acquire skills (Cooper & Pantle, 1967). Over time, the science of learning has developed and the Total Time Hypothesis (the idea that only the amount of practice influences skill acquisition) has been discarded as practices such as distributed learning, or increasing the amount of time between practice opportunities, have been explored (Cepeda et al., 2006).
The finite nature of the temporal resources available to teachers underscores a need for the development of still more efficient methods of practice. Within special education, specifically, finding ways to intensify instruction that address specific student needs is an important pursuit (Fuchs, Fuchs, & Malone, 2017). While there is a clear foundation in the literature for various methods of increasing the effectiveness of student practice, for example, using distributed practice or incremental rehearsal (Benjamin & Tullis, 2010; Burns, 2005; Codding, Archer, & Connell, 2010; Fishman, Keller, & Atkinson, 1968; Gettinger, Bryant, & Fayne, 1982; Schutte et al., 2015; Varma & Schleisman, 2014), this review will focus on interleaved practice, and the specific benefits and parameters associated with it. Interleaved practice, defined as the interleaving of a target skill among other tasks, has been shown to produce a retention benefit compared to traditional repetitive practice above and beyond the benefit of distributed practice (Kang & Pashler, 2012; Lee & Magill, 1983; Magill & Hall, 1990; Rohrer, 2012; Taylor & Rohrer, 2010). The goal of this review is to describe the context for research on interleaved practice, discuss possible parameters that limit or enhance the effect of interleaved practice, review current literature related to the effectiveness of interleaved practice in academic skills, and suggest a program of research for the future.

Conceptual Framework and Relevant Constructs

For the purposes of this review, I adopt a model in which learning is broken into three outcomes: acquisition (sometimes called induction) (Kornell, Castel, Eich, & Bjork, 2010; Pashler, Rohrer, Cepeda, & Carpenter, 2007; Sorensen & Woltz, 2016), retention, and transfer (Healy et al., 2014). Acquisition is conceptualized as gaining facility in a task in the short term. Retention is maintenance of that facility over some period of time.
Transfer is applying some increased facility in one skill to another. This model can be illustrated with the following example: Someone new to the game of golf might practice to improve in all three aspects of learning. At the driving range, this novice golfer may take lessons from an instructor, and over the course of an hour improve a swing such that the ball consistently (80% of the time) lands within 15 feet of a target at 50 yards. The golfer has demonstrated acquisition of the skill of accurately hitting a golf ball at a 50 yard target. However, upon returning to the range later, the novice’s accuracy has fallen back to the pre-lesson level of 5%. After more lessons, this golfer might be able to maintain 80% accuracy across multiple sessions on the range, demonstrating retention. If the burgeoning golfer is able to apply this newly acquired and retained skill to other tasks, such as hitting targets at 20 yards or 100 yards, he or she has demonstrated transfer.

Research related to interleaved practice tends to focus on retention (Magill & Hall, 1990). When acquisition is measured and compared between practice schedules, researchers have often found that interleaved practice does not lead to more efficient skill acquisition (Blandin, Proteau, & Alain, 1994; Magill & Hall, 1990; Pollatou, Kioumourtzoglou, Agelousis, & Mavromatis, 1997; C. H. Shea et al., 1990; J. B. Shea & Morgan, 1979). The acquisition detriment can appear paradoxical when compared to the retention benefit, because it seems odd that practice that leads to higher accuracy in fewer trials is also associated with poorer retention. Transfer is less often addressed in the literature. The focus on retention, rather than transfer, in the literature is reflected in the studies covered in the rest of this review.
Transfer is a topic that is certainly worthy of examination; however, the dearth of extant literature addressing the effects of interleaved practice on transfer means that such a review will have to wait.

Within the contextual interference and interleaved practice literature, researchers have often compared blocked practice schedules with interleaved, or high contextual interference, practice schedules (Birnbaum et al., 2013; Kang & Pashler, 2012; Lee & Magill, 1983; Magill & Hall, 1990; Taylor & Rohrer, 2010). Blocked practice refers to a practice schedule in which the same task is repeated until all practice trials for that task have been completed before moving to another task (Magill & Hall, 1990; C. H. Shea et al., 1990). An interleaved or high contextual interference schedule introduces distractor tasks between target tasks to create interference between iterations of the target task (Kang & Pashler, 2012; Magill & Hall, 1990; J. B. Shea & Morgan, 1979). Many times the distractor tasks differ from the target task in some way. For example, in Shea and Morgan’s (1979) study, participants learned three similar, but different, tasks that required them to knock down small barriers set up in different patterns. In another study (Landin & Hebert, 1997) participants practiced basketball shots from different positions on the court. For the purposes of this paper, practice schedules that are high in contextual interference or in which a target task is interleaved with other tasks will be called interleaved schedules. A high contextual interference schedule may not always be strictly interleaved; many researchers, particularly in the motor learning literature, have used pseudo-random schedules that might place two identical trials next to each other (Blandin et al., 1994; Lee & Magill, 1983; Magill & Hall, 1990; Pollatou et al., 1997; J. B. Shea & Morgan, 1979).
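The three kinds of schedule distinguished in this paragraph can be illustrated with a brief sketch. It is only an illustration under stated assumptions: the function names and the task labels A, B, and C are hypothetical, the strictly interleaved schedule is modeled as a simple rotation through the tasks, and the pseudo-random schedule is modeled as an unconstrained shuffle of the same trials. None of this reproduces the procedure of any study cited here.

```python
import random


def repetitive_schedule(tasks, trials_per_task):
    """Blocked order: finish every trial of one task before the next."""
    return [task for task in tasks for _ in range(trials_per_task)]


def interleaved_schedule(tasks, trials_per_task):
    """Strict interleaving: rotate through the tasks so that no task
    appears twice in a row (assumes at least two tasks)."""
    return [task for _ in range(trials_per_task) for task in tasks]


def pseudo_random_schedule(tasks, trials_per_task, seed=0):
    """Shuffle the same trials; identical trials may end up adjacent."""
    schedule = repetitive_schedule(tasks, trials_per_task)
    random.Random(seed).shuffle(schedule)
    return schedule


tasks = ["A", "B", "C"]
print(repetitive_schedule(tasks, 3))
# ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C']
print(interleaved_schedule(tasks, 3))
# ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C']
print(pseudo_random_schedule(tasks, 3))
```

All three functions return the same nine trials; only the order differs. In the shuffled version nothing prevents two identical trials from landing side by side, which is the sense in which such a schedule is not strictly interleaved.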
However, the characteristics of interest are that interleaved schedules provide built-in distribution of practice trials, and create an environment in which different tasks are presented together. Further, to avoid confusion around the terms blocked, blocking, block, and repetitive, any practice schedule in which the main feature is that each trial of a target task is practiced at one time before shifting to another task will be referred to as a repetitive or repeating schedule. Some researchers (Kulasegaram et al., 2015; Sorensen & Woltz, 2016) refer to the control condition as blocked when stimuli that are different, but highly similar, are presented together, while interleaved practice refers to the interleaving of items from different categories. Other authors (J. B. Shea & Morgan, 1979; Stambaugh, 2011) use the term blocked to refer to identical stimuli presented one after the other. Again, to reduce confusion, these schedules will be referred to as repetitive as they repeat the salient dimension of the task.

Background and Groundwork

There is a robust foundation for interleaved practice in the motor learning literature (Blandin et al., 1994; Lee & Magill, 1983; Magill & Hall, 1990; Pollatou et al., 1997; J. B. Shea & Morgan, 1979). There is also a body of literature that examines the benefits of interleaved practice in learning to discriminate between artists (Kornell & Bjork, 2008), bird identification (Birnbaum et al., 2013), learning clarinet music (Carter & Grahn, 2016), and mirror-drawing (Desmottes, Maillart, & Meulemans, 2017). The literature represents convincing evidence for the utility of interleaved practice for improving retention when compared to blocked or repetitive practice.

Benefits of interleaving.
In a foundational motor learning study, Shea and Morgan (1979) used three different barrier knockdown tasks, and compared the results of participants in an interleaved practice condition with the performance of participants in a repetitive practice condition. The authors found that, during acquisition, the interleaved schedule appeared to lead to worse performance than the repetitive schedule; however, the interleaved (random) schedule led to a retention benefit on the tasks that were learned. They also found a transfer benefit for tasks that were similar, but not identical, to the tasks targeted in practice. The retention benefit derives in part from the natural spacing of the target tasks, an advantage conveyed by any distributed practice schedule (Benjamin & Tullis, 2010), but interleaving has been shown to convey an additional benefit, which has been attributed in part to the learner’s increased opportunity to compare the tasks that have been presented (Carpenter, Cepeda, Rohrer, Kang, & Pashler, 2012), and, in motor skills, the opportunity to reconstruct the action plan of the target tasks (Magill & Hall, 1990).

One possible explanation for the retention benefit of interleaved practice over repetitive practice is that trials that interleave target tasks have a built-in distribution of practice. The benefits of distributed practice schedules to retention are well documented (Benjamin & Tullis, 2010; Burns, 2005; Codding et al., 2010; Fishman et al., 1968; Gettinger et al., 1982; Schutte et al., 2015; Varma & Schleisman, 2014). While increasing the time between study instances (the inter-study interval) can lead to better retention (Cepeda et al., 2006), there is evidence of a benefit of interleaved practice above and beyond the distribution of practice that occurs as an artifact of interleaving (Mitchell, Nash, & Hall, 2008). Mitchell et al.
(2008) controlled for trial spacing and found a retention benefit for interleaved practice in a discrimination task that asked participants (n=24 undergraduate students) to engage in pattern recognition. Their second experiment (with n=32 undergraduate students) added an interrupter condition in which the inter-study interval was filled by a distractor task. Participants in the interleaved condition performed better even when the intra-task interval was held constant between conditions. This finding suggests there is something about interleaving similar tasks that provides a learning benefit beyond the mechanisms of distribution. These findings were replicated in a later study (Zulkiply & Burt, 2013) that also controlled for the temporal interval between exposures.

To explain the retention disparity between interleaved and distributed practice, Birnbaum et al. (2013) describe a “discriminative-contrast hypothesis,” in which part of the benefit of interleaved practice is derived from comparing two similar but different stimuli that are presented consecutively. In a pair of studies, participants (n=102) recruited through Amazon’s Mechanical Turk were asked to learn to pair pictures of birds with their species via either a contiguous interleaving schedule or an interleaving schedule that interspersed unrelated trivia questions. The discriminative-contrast hypothesis would predict that the contiguous schedule would lead to better performance on a retention task than the schedule with interspersed trivia questions. Because the contiguously interleaved condition led to better performance, the authors concluded that the discriminative-contrast hypothesis was supported. Birnbaum et al. (2013) raised questions related to how task characteristics influence the effect of interleaved practice on learning outcomes.

The transfer benefit of variable practice was further explored in a study that used the Tower of Hanoi (ToH) puzzle (Vakil & Heled, 2016).
In this study, 84 participants practiced the puzzle with either the same ToH configuration or a combination of ToH configurations. Skill acquisition through practice for both conditions was quite similar. Transfer cost, both in terms of number of moves needed to solve and time before first move, was lower for the variable condition. This finding provides promising evidence for the benefit of an interleaved practice schedule in learning abstract, procedural tasks.

Task characteristics and levels of interleaving. Some researchers have focused on the nature of the target task. Within the field of motor learning, in which interleaved practice has its foundation, tasks can have various characteristics (Schmidt & Lee, 2011). For example, walking is a continuous task in that it links a specific right foot/left foot cycle over and over again (you would likely not consider a two-step sequence to be walking outside of some very specific circumstances), whereas throwing a ball at a target is a discrete task. Cognitive and academic tasks can be broken down into various dimensions. Several studies (Birnbaum et al., 2013; Carvalho & Goldstone, 2014a, 2014b, 2015; Kornell & Bjork, 2008) used discrimination tasks where participants were asked to distinguish between artists, shapes, birds, or patterns. Other studies (Rohrer et al., 2014; Rohrer & Taylor, 2007; Sana, Yan, & Kim, 2017) focused on procedural skills like solving area problems in geometry. Another dimension to consider is the presentation of the task. Fractions can be presented as numbers or shaded portions of a shape (Rau, Aleven, & Rummel, 2010; Rau et al., 2013). Whether procedural or discriminatory, tasks can fall along a continuum of similarity (Carvalho & Goldstone, 2014a, 2014b, 2015) where one could conceivably interleave different multiplication math facts (very similar) or interleave finding the area of three-dimensional shapes with naming the capitals of African countries (very different).
Further, it may be helpful to consider task characteristics. While the literature hasn’t formalized such a model, tasks seem to exist along continua in four dimensions: similar to dissimilar, easy to difficult discriminability, simple to complex, and discrete to continuous (see Figure 1). The next two paragraphs describe tasks along the similarity and discriminability dimensions.

Figure 1. Representation of task dimensions.

Task similarity appears to have an influence on the interleaved practice effect. Carvalho and Goldstone (2014a, 2014b, 2015) conducted a series of studies in which they asked participants (n=290 undergraduates, n=241 undergraduates, and n=211 undergraduates, respectively) to discriminate between stimuli that varied on two levels. Shapes were generated for the studies. All shapes were at least slightly different from each other (level 1), but might have features in common within a category (level 2). Thus, several shapes from category A would differ only slightly from one another, but differ greatly from all the shapes in category B. From this series of studies, the authors concluded that there was a retention benefit for interleaved practice when the participants were studying similar stimuli; however, that benefit disappeared when the stimuli were dissimilar. There was also a transfer benefit for interleaved practice with high similarity stimuli when participants were asked to learn novel stimuli. The authors (Carvalho & Goldstone, 2015) also included conditions in which participants had to generate the category to which the stimuli belonged. In this study, participants in the generative condition performed better when using an interleaved schedule than when using a repetitive schedule. The opposite was true of the passive condition in which participants were told the category.

Another facet of task representation is discriminability.
Zulkiply and Burt (2013) varied discriminable load by increasing the number of distractors present in a discrimination task. The more distractors present on a stimulus, the less discriminable it is. They found that participants (n=125 undergraduates) performed better on a task with interleaved practice when discriminability was low (several distractors), and the retention benefit was reversed when discriminability was high.

Within a practice session, tasks can be arranged such that interleaving can take place at multiple levels or dimensions. For example, interleaving could happen at the individual task level, with clusters of tasks, at task type, or with practice sessions. Interleaving could also take place in the presentation of the material in that the same task type, or even the same basic task, could be presented in several ways. Table 2 depicts examples of how interleaving can occur across dimensions such as task representation or task type, as well as across levels of individual tasks, clusters or practice sessions. Even within the levels described, one could interleave different subjects. A spelling task could be interleaved with a math problem and a novel sight word. Given the discriminative-contrast hypothesis, intersubject interleaving may not afford much benefit above a distributed schedule, but the example is still useful in conceptualizing the many ways interleaving can be applied.

Table 2
Examples of Interleaving at Different Levels and Dimensions

Dimension
  Task representation: 1/2 + 1/4, .5 + .25, 1/2 + 1/4, .5 + .25
  Task type: 1/2 + 1/4, 15 + 17, 11 - 4, 1/2 + 1/4, 15 + 17, 11 - 4
Level
  Individual task: 1/2 + 1/4, 15 + 17, 11 - 4, 1/2 + 1/4, 15 + 17, 11 - 4
  Task cluster: 1/2 + 1/4, 1/3 + 2/6, 3/8 + 1/4, 2 x 12, 9 x 6, 21 x 2
  Practice session: 10min math, 10min reading, 10min writing, 10min math, 10min reading, 10min writing

Kulasegaram et al. (2015) extended the idea of mixing repetitive and interleaved practice into yet another application.
They had 42 undergraduate students study physiological concepts in either an interleaved schedule (all concepts read before practicing) or repetitive schedule (practice for a concept followed the reading). The second level of the design manipulated whether the practice itself was repetitive or interleaved. Participants completed practice problems that pertained to one (repetitive) or two (interleaved) different organs. Results showed no main effect of practice type or number of organs practiced. Results of near and far transfer tests revealed a benefit of studying multiple organs. Results of far transfer tests revealed benefits of both interleaved learning and practicing multiple organs.

The article “Mixing topics while studying does not enhance learning” (Hausman & Kornell, 2014) is another example of the importance of attending to the level of interleaving in a practice schedule. This article describes a series of experiments that asked participants (n=55, n=79, n=77, and n=133, respectively) to learn English translations of Indonesian words and anatomical definitions. There was no significant benefit for interleaved practice in the first two experiments, as well as the fourth. A retention benefit was found for the repetitive condition in experiment 3. These results appear to run counter to the evidence described above. However, in this study, the repetitive condition interleaved material at the individual task level, while the mixed condition interleaved at the cluster level. The relative retention benefit or detriment of the schedule depends in part on the similarity of the task. Hausman and Kornell’s (2014) findings parallel those of Carvalho and Goldstone (2014a, 2014b, 2015).

Sorensen and Woltz (2016) asked participants (n=160) to memorize which non-words were associated with which non-word categories, and varied the amount of repetitive and interleaved practice for the participants.
The authors found acquisition and retention benefits for the most repetitive schedule. The authors had four non-words in each of six categories. The most repetitive schedule presented all four words in a given category in sequence before moving to the next category (“A1 A2 A3 A4 B1 B2 B3 B4 …” (Sorensen & Woltz, 2016)) while the high interleaved schedule combined single exemplars from each of the categories together in a practice block. Looking at that practice schedule through the lens of the Carvalho and Goldstone (2014a, 2014b, 2015) studies, it appears that Sorensen and Woltz (2016) replicated the finding that interleaved practice is ideal for studying different, but highly similar, stimuli.

Synthesis of the background. Interleaved practice has a growing body of literature with boundaries and parameters that are gaining definition. Interleaved practice is fertile ground for research into remaining questions. A consideration that was not addressed in any studies described above is the importance of the underlying structure of tasks and practice. Sorensen and Woltz’s (2016) findings appear counter to much of the foundational research in this area, perhaps due to the level of interleaving, as I discussed above. Their contradictory findings may also be related to the nature of the task. Sorensen and Woltz (2016) asked participants to learn which non-words belonged in specific categories that were named with other non-words. Questions remain as to whether it matters that everything about the task was arbitrary, and whether the similarity dimension interacted with learners’ prior knowledge of the “language.” A series of experiments by Carvalho and Goldstone (2014a, 2014b, 2015) placed great emphasis on task similarity, indicating that a complete lack of structure may influence any effect interleaved practice has on learning outcomes.
It may be important to consider the complexity of the task (Blasiman, 2017), and how complexity, and these other task dimensions, might interact with individual differences in the learner.

Another important consideration may be the combination and degree of interleaving. A constant throughout the literature on interleaved practice is the relative acquisition benefit of repetitive practice. Given the apparent acquisition benefit of a repetitive schedule, and retention and transfer benefits of interleaved schedules (at least within the parameters of tasks that benefit from interleaved practice), it may be that there is a way to leverage both schedules.

The preceding introduction into interleaved practice provides some context for the focus of the remainder of this review, which examines the extension of interleaved practice research into academic learning. The beginning of this paper alluded to the importance of finding more efficient methods of learning so they can be applied to the specific environment of a classroom. As described below, interleaving during practice provides a promising path towards developing more efficient learning environments for students.

Purpose of the Present Review

Despite its potential for improving retention as demonstrated above, interleaved practice is not well known among educators (Morehead, Rhodes, & DeLozier, 2016). This section will describe several studies that explored the potential for interleaved practice in academic settings. The purpose of this review is to summarize current literature pertaining to the use of interleaved practice schedules in academic skills, and provide some brief methodological comments with the aim of guiding future research.
Method

I searched for articles for this review in PsycINFO and ERIC electronic databases, using the following parameters: variations on “contextual interference,” “interleave,” “contingent switching,” or “win-shift, lose-stay” (the last two to capture specific variations of high contextual interference schedules; Simon, Lee, & Cullen, 2008), combined with variations on “math,” “writing,” and “reading.” Articles about interleaved practice in academic skills, specifically the application of an interleaved practice schedule to learning materials in a K-12 classroom, were included. Articles that referenced academic skills in post-secondary environments were excluded, as were articles that focused on K-12 academic skills but did not use an interleaved practice schedule. References from articles obtained in the search were added to the review if they pertained to interleaved practice schedules with academic skills.

Results

My search returned seven studies that examined the effect of interleaved practice on academic skills with K-12 populations. Of those, six studies targeted math (Ostrow, Heffernan, Heffernan, & Peterson, 2015; Rau, Aleven, Rummel, & Pardos, 2013; Rau et al., 2013; Rohrer et al., 2014; Rohrer, Dedrick, & Stershic, 2015; Taylor & Rohrer, 2010), and one focused on handwriting (Ste-Marie, Clark, Findlay, & Latimer, 2004). An additional study evaluated a math intervention in which interleaved practice was a feature but was not examined specifically (Booth et al., 2015). See Table 3 for a breakdown of participants, measures and results of the studies cited below.

Table 3
Summary of Results for Interleaved Practice Schedule Studies

Study: Ste-Marie et al., 2004
  Participants: 44 1st grade students from three classrooms in two schools
  Learned skill: Three novel symbols
  Measure: Reproducing three novel symbols scored on 3 point scale
  Acquisition results: Acquisition benefit for Blocked practice p.08. Random practice did not improve to level of Blocked practice
  Retention results (brief delay): Benefit for Random practice p=.0467
  Retention results (long delay): N/A

Study: Ste-Marie et al., 2004 (continued)
  Participants: 50 6-7 year old students from two schools
  Learned skill: Three novel symbols
  Measure: Reproducing three novel cursive letters scored
  Acquisition results: Benefit for Random practice in trial blocks 2, 3, and 4
  Retention results (brief delay): Interleaved benefit for a and h. Repetitive benefit for y.
  Retention results (long delay): Interleaved benefit for a and y. Repetitive benefit for h.

Study: Ste-Marie et al., 2004 (continued)
  Participants: 68 5.5-7 year old students from five schools
  Learned skill: Three novel symbols
  Measure: Reproducing three novel cursive letters scored
  Acquisition results: Main effect for trial block
  Retention results (brief delay): Benefit for Random practice p=.10, d=.65
  Retention results (long delay): Benefit for Random practice d=1.03

Study: Rohrer et al., 2014
  Participants: 140 12 year old students taught by three teachers in eight classes
  Learned skill: Four different kinds of mathematics problems
  Measure: Two week delay test of three novel problems of each of the four types
  Acquisition results: N/A
  Retention results (brief delay): N/A
  Retention results (long delay): Retention benefit for Interleaved group. t(139)=10.49, p<.001, d=1.05

Study: Rohrer et al., 2015
  Participants: 126 middle school students
  Learned skill: Math problems related to graphing and slope
  Measure: 1 and 30 day delayed retention tests
  Acquisition results: N/A
  Retention results (brief delay): Benefit for interleaving p=.02, d=.42
  Retention results (long delay): Benefit for interleaving p<.001, d=.79

Study: Taylor and Rohrer, 2010
  Participants: 22 fourth graders from Florida
  Learned skill: Solving four types of math problems
  Measure: Tests of each problem type
  Acquisition results: Practice benefit for Blocked group. t(22)=4.94, p<.01, d=2.02
  Retention results (brief delay): Retention benefit for Interleaved group. t(22)=2.96, p<.01, d=1.21
  Retention results (long delay): No significant benefit t(22)=1.19

Study: Ostrow et al., 2015
  Participants: 146 High and Low skill seventh grade students
  Learned skill: Geometry
  Measure: Posttest at 2-5 days
  Acquisition results: N/A
  Retention results (brief delay): N/A
  Retention results (long delay): Interleaved main effect for the Low skill group p<.05, g=.6, but not High group (p>.05)

Study: Rau et al., 2013
  Participants: 230 4th and 5th grade students
  Learned skill: Fraction related math skills
  Measure: Computer based math assessment
  Acquisition results: No benefit for any group during acquisition phase.
  Retention results (brief delay): No main effect of condition
  Retention results (long delay): No main effect of condition

Study: Rau et al., 2013 (Operational Results)
  Participants: 101 5th and 6th grade students
  Learned skill: Fractions
  Measure: Computer based math assessment
  Acquisition results: N/A
  Retention results (brief delay): No significant effects for efficiency or effectiveness
  Retention results (long delay): No significant effects for efficiency or effectiveness

Study: Rau et al., 2013 (Representational Effectiveness)
  Participants: 101 5th and 6th grade students
  Learned skill: Fractions
  Measure: Computer based math assessment
  Acquisition results: N/A
  Retention results (brief delay): Benefit for interleaved types t(100)=2.03, p<.05, d=.09
  Retention results (long delay): Benefit for interleaved types t(100)=4.74, p<.01, d=.21

Study: Rau et al., 2013 (Representational Efficiency)
  Participants: 101 5th and 6th grade students
  Learned skill: Fractions
  Measure: Computer based math assessment
  Acquisition results: N/A
  Retention results (brief delay): Benefit for interleaved types t(100)=2.34, p<.05, d=.37
  Retention results (long delay): Benefit for interleaved types t(100)=5.55, p<.01, d=.88

In a series of three experiments Ste-Marie et al. (2004) demonstrated the benefit of a high contextual interference practice schedule for learning handwritten symbols (letters from the phonemic alphabet and cursive letters). In the first experiment, 44 first grade students were taught three novel symbols in high and low interleaved conditions. The findings of the first experiment converged with previous research in other domains with regards to both retention and acquisition. Students in the high interleaved condition performed worse during acquisition and better in retention. The subsequent experiments extended the findings of experiment 1 by introducing a 24-hour retention assessment, and a measure of transfer. Interleaved practice led to better scores on both the 24-hour retention and transfer measures (Ste-Marie et al., 2004). The series of three experiments provides converging evidence for the robust nature of the contextual interference effect.

A group of studies co-authored by Doug Rohrer (Rohrer et al., 2014, 2015; Taylor & Rohrer, 2010) provides substantial evidence for the retention benefits of an interleaved practice schedule in the context of middle school mathematics tasks. In the first study (Taylor & Rohrer, 2010) the authors asked students (n=24) to solve for various aspects of prisms (edge, angle, face, and corners). Throughout the acquisition phase, performance in the interleaved condition was worse than in the repetitive condition (Cohen’s d = 2.02, 1.20, and 1.01). However, in a retention test, there was an accuracy benefit for students in the interleaved practice condition (Cohen’s d = 1.21). In Rohrer et al. (2014), students (n=140) were asked to solve equations and word-problem based proportion problems, as well as problems asking students to draw lines for slopes and solve for slopes. The authors reported a significant retention benefit for interleaved practice (p < .001, d = 1.05). Rohrer et al. (2015) used two related but different types of math problems: drawing a line on a graph based on a slope equation, and finding the slope of a line on a graph. Participants (n=126) were asked to solve math problems and were assessed at 1- or 30-day delays. The authors found a retention benefit for the interleaved practice condition on 1- and 30-day delayed retention tests. Effect sizes for these results ranged from d=.47 to d=.87. These findings are consistent with previous research on interleaved practice schedules. Interleaved practice appears to be associated with improved retention, though not necessarily with faster, more accurate acquisition.

Ostrow et al. (2015) studied interleaved practice schedules within the context of a computer-based tutoring system by randomly assigning 146 seventh-grade students into two groups. One group practiced skills related to geometry and probability in a repetitive schedule. The other group studied the same concepts in an interleaved schedule.
The authors split the students into low- and high-skill groups for the analysis. In an interaction between individual learner differences and practice effect, the interleaved practice group displayed a higher (p < .05, Hedges’ g = .60) posttest score than the repetitive practice group for the low-skill students, an effect that was not found for the high-skill students. This finding indicates that individual differences could play an important role in the effectiveness of interleaved practice for retention.

Rau et al. (2013) examined the interleaving of graphical representations (GR) of fractions. In this study, students (n=230) were randomly assigned into one of four conditions: Blocked, Moderate, Fully Interleaved, and Increased. The conditions referred to the degree to which the graphical representations were interleaved. The researchers’ two hypotheses were: (1) students would improve from pretest to posttest on all measures, and (2) students in the interleaved condition would outperform students in the blocked condition. They determined that students benefited from the tutoring system on each of the four areas measured, regardless of the practice condition. While the researchers found no significant effect of practice condition, they did find a significant interaction between pretest score and practice condition. Students with different skill sets at the outset of the learning had differential retention rates in different practice schedules. Specifically, students who scored below 25% on the pretest received more benefit from a fully interleaved practice schedule on the conceptual transfer measure. This finding further demonstrates the importance of individual difference and relative task difficulty.
Another aspect of this study that merits further investigation stems from interleaved practice schedules having a deep body of literature demonstrating the benefit they have on delayed retention (Broadbent, Causer, Ford, & Mark Williams, 2014; Lee & Magill, 1983; Magill & Hall, 1990; Rohrer et al., 2014; Rohrer & Taylor, 2007; C. H. Shea et al., 1990; Ste-Marie et al., 2004; Taylor & Rohrer, 2010). This study did not find the same effect, other than the pretest skill-level interaction mentioned above. This failure to find converging results could be an artifact of the dimensions interleaved in this study. By blocking task type in groups of six while varying the level of contextual interference as a predictor variable, there could have been an unforeseen dimensional interaction. It could also mean that the main effect of interleaved practice on retention is not as robust as other literature would seem to indicate, and that further research is warranted.

Another Rau et al. (2013) study investigated whether interleaving should take place along task or representational dimensions. One hundred one fifth- and sixth-graders used a web-based intelligent tutoring program to study fractions. Representation was varied by depicting fractions as line segments, partitions of a circle, or sets. Task type varied by what students were asked to do: identifying fractions, comparing fractions, adding fractions, and so on. Students were randomly assigned to two groups in which either representation or task type was interleaved. The authors predicted that interleaving on the task type dimension would lead to a combined effect of “more effective representational knowledge (hypothesis 1a); more efficient representational knowledge (hypothesis 1b); more effective operational knowledge (hypothesis 2a); and more efficient operational knowledge (hypothesis 2b)” (Rau et al., 2013, p. 101). They concluded that evidence supported hypotheses 1a and 1b, but not hypotheses 2a and 2b.
In other words, interleaving led to improvements in representational knowledge, but not operational knowledge. It is difficult to discern why there was an effect for representational, but not operational knowledge. One possible explanation could be related to some of the learner characteristics that have been mentioned. For instance, it is possible that the students were at a stage of their learning that was more optimal for representational learning than operational learning.

Discussion

The body of literature that extends the findings of interleaved practice schedules into the domain of K-12 academic skills is small. Despite the relative dearth of research, the strength of the evidence is impressive. The research presented above demonstrates an effect that is maintained across writing (Ste-Marie et al., 2004) and math (Ostrow et al., 2015; Rau et al., 2013; Rau et al., 2013; Rohrer et al., 2014, 2015; Taylor & Rohrer, 2010). Given that there is only one writing study, replication in that area would strengthen the evidence for interleaved practice as a strategy that is robust across domains. The research has also demonstrated that the effect persists across skills that are at least as dissimilar as those used in Rohrer et al. (2014), and has made inroads regarding the possible benefit of interleaving on multiple dimensions (Rau et al., 2013; Rau et al., 2014). Finally, the researchers who authored these studies have started to ask important questions related to interactions between individual learner differences and the mechanisms and parameters that influence the effect of interleaved practice.

The studies in this review had a clear focus on retention as the target aspect of learning. Each of the seven studies reviewed used a retention measure. Only three (Rau et al., 2013; Ste-Marie et al., 2004; Taylor & Rohrer, 2010) directly addressed acquisition, and one (Ste-Marie et al., 2004) mentioned transfer.
The studies comparing interleaved practice to repetitive practice (Rau et al., 2013; Rohrer et al., 2014, 2015; Ste-Marie et al., 2004; Taylor & Rohrer, 2010) found results consistent with the literature in cognitive psychology and motor learning. Interleaved practice tends to lead to better retention of the target skills. Studies addressing acquisition (Rau et al., 2013; Ste-Marie et al., 2004; Taylor & Rohrer, 2010) yielded results that were mixed. Some results (Taylor & Rohrer, 2010) were consistent with the background literature and found higher accuracy as a result of repetitive practice, while others (Rau et al., 2013; Ste-Marie et al., 2004) showed an acquisition benefit for interleaved practice, or no significant difference. The only study to address transfer (Ste-Marie et al., 2004) found a benefit for students who studied with an interleaved schedule.

One article did briefly address the idea of the discrete versus continuous nature of tasks, and also task complexity. The authors of Rau et al. (2013) posited that the lack of between-group differences in operational outcomes may be due to practice schedules having greater impact on conceptual knowledge than on procedural knowledge. Considering that quite a bit of practice schedule research comes from the field of motor learning (D. I. Anderson, Magill, & Sekiya, 2001; Guadagnoli & Lee, 2004; Lee & Magill, 1983; C. H. Shea et al., 1990; Wulf & Schmidt, 1997), where it could be argued that procedural skills are the primary area of focus, it raises the question of whether procedural knowledge in motor learning is qualitatively different than in other cognitive areas. The authors suggest that task complexity is inversely related to the effectiveness of interleaved practice schedules (M. A. Rau et al., 2013). If so, future research examining that relation, and how task complexity might vary relative to student skill and task type, would be useful.

Limitations of the Review

This review is limited by the lack of literature focusing on interleaved practice in K-12 academic settings. I was only able to find seven studies addressing the topic, and six of them addressed the same academic area. While the results are consistent, both within the set and with the foundational work in other fields, the implications for practice must be couched in general terms. Interleaving appears to be a beneficial practice, at least for mathematics, and has promise for the classroom. The studies reviewed demonstrated effects in geometry, fractions, and solving equations. Questions remain about the utility of interleaved schedules for simpler mathematics skills. Fact fluency, for example, might be quite amenable to such a schedule, and is an important skill for later math success. There is also some evidence that students with lower skill levels might benefit more from interleaving, but there is still a lot of work that should be done regarding how to conceptualize what low skill means, and how skill level interacts with practice effects, task characteristics, and dimensions of interleaving.

Very little attention was paid to task characteristics. Task characteristics and dimensions are not really addressed in the literature addressing K-12 academics. Given the paucity of published literature in the area, this lack of attention could be an artifact of researchers making initial inroads with broad strokes before focusing on fine details. It may also be that the task characteristics of similarity and discriminability are difficult to manipulate or quantify in academic skills. Further, while Rau et al. (2013) addressed which dimension to interleave, there was very little attention paid to level or dimension of interleaving outside of that reference. See Table 4 for a breakdown of tasks addressed and levels and dimensions of interleaving.
The experiment described in the following chapters does not manipulate task characteristics, but does stay within the arena of simple, similar, discrete, and discriminable, about which more will be said in Chapter 3.

Table 4
Summary of Task Characteristics and How Studies Interleaved

| Study | Task Characteristics | Subject | Dimension of Interleaving | Level of Interleaving |
| (Rohrer et al., 2014) | Discrete and similar | Math | Task Type | Problem |
| (Rohrer et al., 2015) | Discrete and similar | Math | Task Type | Problem |
| (Taylor & Rohrer, 2010) | Discrete and similar | Math | Task Type | Problem |
| (Ostrow et al., 2015) | Discrete and similar | Math | Task Type | Problem |
| (Rau et al., 2013) | Discrete and similar | Math | Task Type and Task Representation | Problem |
| (Ste-Marie et al., 2004) | Discrete and similar | Writing | Task Type | Problem |
| (Rau et al., 2013b) | Discrete and similar | Math | Task Type and Task Representation | Problem |

Methodologically, these studies were generally sound, though no study is perfect. For example, more attention could be paid to operationally defining what is being interleaved and how. Also, there was an apparent error in the analysis for one study (Rohrer et al., 2014) in which the authors did not match the unit of assignment to condition (classroom level) with the unit of analysis (individual level), threatening internal validity (Shadish et al., 2002). It is difficult to discern the actual t and p values for the analysis, as standard deviations among the classes were not reported. However, this is a young field in which definitions are still being formed, the direction of the results is consistent across studies, and the papers should be considered a relevant part of the body of evidence pertaining to interleaved practice in academic settings.

Future Research

Future research on interleaved practice schedules has a solid foundation upon which it can expand. Moving forward, I envision two main research tracks that interact as they evolve.
The first track could be called a Basic Research Track (BRT). Along this path, research will clarify questions surrounding mechanisms and parameters of interleaved schedules. What are the limits of “similar” tasks as described by Carvalho (2014a, 2014b, 2015)? The same questions could be asked of discriminability (Zulkiply & Burt, 2013). How do those two concepts interact? What learner traits influence the effects of task dimension and level of interleaving? Where does prior knowledge factor in, and how does the underlying structure of the task interact with all of those factors? In short, the BRT will be aimed at operationalizing the dimensions and facets that might influence the possible effects of interleaved practice.

The second track could be called a Translational track. The translational track would take the theory and results produced by the BRT and transform them into something that can be implemented in actual learner environments. The translational track may be more focused on specific, real-world task characteristics and how they can be mapped onto the theories developed by the BRT. Further, the translational track will need to find ways to scale up procedures in ways that are practical for schools, software, and learners.

For both programs of research, researchers may benefit from using a mixed effects framework in their experimental design and analysis. Using a multi-level approach would allow researchers to measure rates of learning and retention effects while accounting for correlation of data points within subjects. For example, given the evidence presented above, it is easy to imagine a model in which Level-1 main effects of time and practice condition (interleaving vs repetitive) interact with a Level-2 variable like learner skill, while accounting for random effects.

There is a lot of variety in both learners and the skills they are trying to acquire.
The current state of the literature seems to indicate that interleaved practice is a useful tool for those students. However, as useful as interleaved practice schedules may be, there is a lot of room to build on the research foundation that has been created thus far. In the future, researchers and practitioners may be able to use what they know about learners and their target skills to build individualized practice schedules that leverage the strengths of an interleaved schedule.

Chapter 3
METHODS

Research Questions Revisited

The broad purpose of this project was to compare an interleaved practice schedule to an established improved practice method (incremental rehearsal), and a control (repetitive practice). Further, the goal was to compare performance in a meaningful and practical context. To those ends, I compared the acquisition, retention, and efficiency of learning single-digit math facts in a sample population of third- and fourth-grade students across three practice schedules. Specific Research Questions are as follows.

1) How is acquisition of target math facts influenced by practice schedule (repetitive, incremental rehearsal, and interleaved)?
2) Does retention of target math facts differ by practice schedule?
3) Does efficiency of learning differ by practice schedule in terms of targets learned per unit of time?

Setting and Participants

This study was conducted at a charter school in an urban Midwestern city. The school serves pre-kindergarten through sixth grade with approximately 250 students. Because working with educational software and getting extra math help is something that is part of their typical school day, informed consent occurred via parental notification and an opt-out procedure as per IRB #STUDY00000162. The sample for this project is 74 third- (n = 34) and fourth-grade (n = 40) students from a charter school in an urban district in a Midwestern metropolitan area. All students were African-American.
Special education and free and reduced lunch status were not provided for individual students; however, the school has 76.2% of its students on free and reduced lunches, and 16% of students have IEPs. Table 5 presents a specific demographic breakdown of participants by sex and grade.

Table 5
Participant Characteristics

|  | Male | Female |
| Third Grade | 19 | 15 |
| Fourth Grade | 18 | 22 |
| Total | 37 | 37 |

The range of effect sizes of the studies in the literature review was from .09 to 1.21 (with a weighted mean of .41) for the studies that reported an effect size at the immediate posttest, and from .21 to 1.05 (with a weighted mean of .72) for studies that reported an effect size at the delayed posttest. Power analyses using trends from the literature as guidelines for effect sizes (Ostrow et al., 2015; M. Rau et al., 2010, 2014; Rohrer & Taylor, 2007; Taylor & Rohrer, 2010) were conducted with G-Power (Faul, Erdfelder, Buchner, & Lang, 2009) and Optimal Design (Spybrook et al., 2011) software. Setting effect size at .41, alpha at .05, and power at .8, G-Power returns a sample size of 39. Setting effect size at .72, alpha at .05, and power at .8, G-Power returns a sample size of 14. A sample of 74 should be sufficient to detect a difference between interleaved and repetitive practice at either an immediate or delayed posttest. There are no published studies comparing interleaved and incremental rehearsal groups, so estimating effect sizes for this comparison is difficult.

Minimum sample size guidelines for mixed-effects models follow a general rule of 20 clusters with five observations per cluster (Raudenbush & Bryk, 2002). In this study, observations (up to 405) are clustered within students (74). Given the nature of the analyses (described below), the relative lack of risk and burden on the part of the participants, and the benefits of technology regarding ease of data collection, 74 participants is a reasonable sample size.
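As a rough cross-check on how the required sample size shrinks as the assumed effect size grows, the sketch below uses the normal approximation n ≈ ((z_alpha + z_power) / d)^2. It assumes a one-tailed paired comparison, which is an assumption made here for illustration only; the text does not state which test family was selected in G-Power. The function name `approx_paired_n` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def approx_paired_n(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a one-tailed paired comparison.

    Assumption for illustration: the test family is not stated in the text.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)  # critical value for a one-tailed test
    z_power = std_normal.inv_cdf(power)      # quantile matching the desired power
    return ceil(((z_alpha + z_power) / d) ** 2)


for effect_size in (0.41, 0.72):
    print(effect_size, approx_paired_n(effect_size))
```

Because the normal distribution stands in for the t distribution, this approximation runs slightly below the values reported above, but it shows the same pattern: the larger delayed-posttest effect size requires far fewer participants.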
Seventy-four participants is more than three times the rule of thumb mentioned above, and more than double the number suggested to detect a retention difference between repetitive and interleaved practice.

Design

This study employed a crossover, within-subjects design in which an attempt was made to expose each participant to every practice condition. Every permutation of practice condition order was implemented with six groups of students in each grade. Each student was randomly assigned to one of the six permutations. See Table 6 for a depiction of the Latin Square and Table 7 for a depiction of the student experience. Students had three targets to practice seven times per session across six sessions in each practice condition. The pretest followed by six practice sessions and two posttests is referred to as a bundle in this study. Students took a pretest, and then three problems that they answered incorrectly during the pretest were assigned as learning targets for a bundle. The students were then administered six practice sessions, followed by an immediate posttest of all items from the pretest, and then a delayed posttest administered at least 10 days after the last practice session. The plan for the study was for each student to proceed through three practice schedules across three consecutive bundles. Each target was practiced 42 times within a bundle in addition to exposures at pretest, immediate posttest, and delayed posttest, for a total of 45 exposures per target per student per practice schedule, 135 exposures per student per practice schedule, and 405 exposures per student if a student was administered all practice sessions and took the pretest and both posttests for each bundle.
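The counterbalancing and the exposure counts above can be checked with a short sketch. The variable names are illustrative, and the abbreviations Rep, IL, and IR are taken to stand for the repetitive, interleaved, and incremental rehearsal schedules.

```python
from itertools import permutations

# Rep = repetitive, IL = interleaved, IR = incremental rehearsal
CONDITIONS = ("Rep", "IL", "IR")

# Every order of the three practice conditions: 3! = 6 permutations.
orders = list(permutations(CONDITIONS))

TARGETS_PER_BUNDLE = 3
PRACTICES_PER_SESSION = 7
SESSIONS_PER_BUNDLE = 6
TESTS_PER_BUNDLE = 3  # pretest, immediate posttest, delayed posttest
BUNDLES_PER_STUDENT = 3

practice_per_target = PRACTICES_PER_SESSION * SESSIONS_PER_BUNDLE    # 7 * 6
exposures_per_target = practice_per_target + TESTS_PER_BUNDLE        # plus 3 tests
exposures_per_bundle = exposures_per_target * TARGETS_PER_BUNDLE     # 3 targets
exposures_per_student = exposures_per_bundle * BUNDLES_PER_STUDENT   # 3 bundles

print(len(orders), practice_per_target, exposures_per_target,
      exposures_per_bundle, exposures_per_student)
```

The totals are planned maxima; a student who missed sessions or left the study early would have fewer exposures.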

Table 6
Latin Square for Counterbalancing

| 1 | Rep | IL | IR |
| 2 | Rep | IR | IL |
| 3 | IR | Rep | IL |
| 4 | IR | IL | Rep |
| 5 | IL | IR | Rep |
| 6 | IL | Rep | IR |

Table 7
Student Experience of Study

| Bundle 1 | Bundle 2 | Bundle 3 |
| Pretest A | Pretest B | Pretest C |
| Practice Schedule 1 Session 1 | Practice Schedule 2 Session 1 | Practice Schedule 3 Session 1 |
| Practice Schedule 1 Session 2 | Practice Schedule 2 Session 2 | Practice Schedule 3 Session 2 |
| Practice Schedule 1 Session 3 | Practice Schedule 2 Session 3 | Practice Schedule 3 Session 3 |
| Practice Schedule 1 Session 4 | Practice Schedule 2 Session 4 | Practice Schedule 3 Session 4 |
| Practice Schedule 1 Session 5 | Practice Schedule 2 Session 5 | Practice Schedule 3 Session 5 |
| Practice Schedule 1 Session 6 | Practice Schedule 2 Session 6 | Practice Schedule 3 Session 6 |
| Immediate Posttest | Immediate Posttest | Immediate Posttest |
| Delayed Posttest | Delayed Posttest | Delayed Posttest |

To address research questions 1 and 2 it was necessary to designate each exposure to a learning target as either an acquisition trial or a retention trial. An acquisition trial was any exposure in which the participant had seen the target that day. A retention trial was any exposure in which the participant had not seen the target for at least two days. Table 8 shows how acquisition and retention trials were distributed among the pretest, posttests, and six practice sessions as the study was planned. Due to school closures, student absences, and other factors, the spacing between practice was often different from planned (see Table 9). However, the days since practice was a part of several candidate models, and these variations were accounted for. Note that the schedules depicted in Tables 8 and 9 are for one target only. Depending on schedule, these exposures would be (a) followed by another target in the same pattern (repetitive schedule), (b) interleaved with other targets (interleaved schedule), or (c) interspersed with known targets (incremental rehearsal schedule).
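The acquisition and retention definitions can be sketched as a small labeling routine. This is an illustration under stated assumptions, not the procedure used in the study: the function names are hypothetical, and the handling of a target's first-ever exposure and of a one-day gap (which neither definition covers) is an assumption, labeled here as "unclassified".

```python
def classify_trial(days_since_last_exposure):
    """Label one exposure from the gap (in days) since the target was last seen."""
    if days_since_last_exposure is None:
        # Assumption: a first-ever exposure has no prior exposure to compare against.
        return "unclassified"
    if days_since_last_exposure == 0:
        return "acquisition"   # target already seen earlier the same day
    if days_since_last_exposure >= 2:
        return "retention"     # target not seen for at least two days
    # Assumption: a one-day gap matches neither definition given in the text.
    return "unclassified"


def label_exposures(exposure_days):
    """Label a chronological list of exposure days for a single target."""
    labels = []
    last_day = None
    for day in exposure_days:
        gap = None if last_day is None else day - last_day
        labels.append(classify_trial(gap))
        last_day = day
    return labels


# Hypothetical target seen on day 1, three times on day 3, and twice on day 10.
print(label_exposures([1, 3, 3, 3, 10, 10]))
```

Under these rules, the first exposure in a session that follows a gap of two or more days counts as a retention trial, and the remaining exposures in that session count as acquisition trials.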
All possible items for a bundle are presented during the pretests and posttests.

Table 8
Breakdown and Planned Spacing of Acquisition vs Retention Trials Across Practice Sessions

Days: Day 1, Day 3, Day 5, Day 7, Day 9, Day 11, Day 25

| Pretest | Session 1 | Session 2 | Session 3 | Session 4 | Session 5 | Session 6 | Immediate Posttest | Delayed Posttest |
| 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 |
|  | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 |  |  |
|  | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 |  |  |
|  | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 |  |  |
|  | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 |  |  |
|  | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 |  |  |
|  | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 |  |  |

Note: Items in bold are retention trials while items in italics are acquisition trials.

Table 9
Breakdown and Example of Altered Spacing of Acquisition vs Retention Trials Across Practice Sessions

Days: Day 1, Day 3, Day 10, Day 12, Day 22, Day 24, Day 34

| Pretest | Session 1 | Session 2 | Session 3 | Session 4 | Session 5 | Session 6 | Immediate Posttest | Delayed Posttest |
| 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 |
|  | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 |  |  |
|  | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 |  |  |
|  | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 |  |  |
|  | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 |  |  |
|  | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 |  |  |
|  | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 | 3 x 5 |  |  |

Note: Items in bold are retention trials while items in italics are acquisition trials.

Materials

This project was implemented in collaboration with FastBridge Learning (FBL) using software that is under development to help experimenters build and run studies. FBL is an educational software company that specializes in assessment and progress monitoring. Using the FBL infrastructure, a new experimenter user interface was developed with the purpose of investigating new instructional methods for students. This interface was used to create the practice conditions for students. The student interface mirrored typical item interfaces within the FastBridge system.

Student interface. The student interface is based on math fact items already developed for the FBL systems.
Students accessed the interface using tablet devices (in this study, they used classroom iPads). In each session, students were presented with an introductory set of two problems with instructional narration. They were then instructed to click a start button when they were ready to begin their practice session. Each problem was presented, and after each response, students received feedback about whether their response was correct or incorrect. If their response was incorrect, they were shown the correct answer. See Figure 2 for examples of the problem presentation and feedback received by the students. This basic interface has been used by thousands of students around the country as a part of the FBL suite of assessments.

Figure 2. Example of student interface.

Learning targets. As mentioned in Chapter 1, single digit addition and multiplication facts were chosen as the learning targets for this study. All third-grade students were given addition problems and all fourth-grade students were given multiplication problems. The operator split was based on consultation with a math expert before the start of the study. The single digit addition and multiplication problems were split into three problem sets each that were of approximately similar difficulty. See Table 10 for lists of problems by pretest. All students regardless of schedule encountered the problems in the same order, Set A followed by B, and then C. Items were presented in the order given in Table 10. Students received the appropriate set as a pretest, as an immediate posttest, and as a delayed posttest.

Table 10
Problem Organization

| Multiplication A | Multiplication B | Multiplication C | Addition A | Addition B | Addition C |
| 0x0 | 0x1 | 0x2 | 0+0 | 0+1 | 0+2 |
| 0x3 | 0x5 | 0x4 | 0+3 | 0+5 | 0+4 |
| 0x6 | 0x7 | 0x8 | 0+6 | 0+7 | 0+8 |
| 0x9 | 1x3 | 1x2 | 0+9 | 1+3 | 1+2 |
| 1x1 | 1x6 | 1x5 | 1+1 | 1+6 | 1+5 |
| 1x4 | 1x9 | 1x8 | 1+4 | 1+9 | 1+8 |
| 1x7 | 2x3 | 2x2 | 1+7 | 2+3 | 2+2 |
| 2x4 | 2x5 | 2x7 | 2+4 | 2+5 | 2+7 |
| 2x6 | 2x8 | 3x5 | 2+6 | 2+8 | 3+5 |
| 2x9 | 3x4 | 3x8 | 2+9 | 3+4 | 3+8 |
| 3x3 | 3x6 | 4x4 | 3+3 | 3+6 | 4+4 |
| 4x5 | 3x7 | 4x6 | 4+5 | 3+7 | 4+6 |
| 4x7 | 3x9 | 4x8 | 4+7 | 3+9 | 4+8 |
| 5x7 | 4x9 | 5x5 | 5+7 | 4+9 | 5+5 |
| 5x9 | 5x6 | 6x7 | 5+9 | 5+6 | 6+7 |
| 6x6 | 5x8 | 6x9 | 6+6 | 5+8 | 6+9 |
| 6x8 | 7x7 | 7x8 | 6+8 | 7+7 | 7+8 |
| 8x8 | 7x9 | 9x9 | 8+8 | 7+9 | 9+9 |
| 8x9 |  |  | 8+9 |  |  |

Social validity questionnaire. A two-question questionnaire was administered as a social validity measure. This questionnaire was not aligned to a specific research question but was used to gauge student perceptions of the intervention. The questions were (1) How helpful was this practice for learning math? and (2) How fun was this math practice?

Procedure

Students used an iPad for each session. In their first session, students took the pretest, which included one third of the possible addition or multiplication items. For the purposes of this experiment, the terms “known” and “unknown” are used to refer to items to which students responded correctly or incorrectly, respectively. This naming convention is consistent with the incremental rehearsal literature (Burns, 2005; Varma & Schleisman, 2014). The software tracked correct and incorrect responses, and randomly selected three unknown targets for the subsequent practice bundle. The software also selected seven known items for use in the incremental rehearsal schedule. In cases where there were not seven known items, known items were recycled as needed. In cases where there were fewer than three unknown items, targets were recycled. Students had six seconds to respond to each item. After six seconds the trial was scored incorrect. Other math fact practice research (Burns, 2005) has used a two-second response time for oral responses.
The four extra seconds in this procedure were added to allow for extra time needed to respond via typing on the iPad, which can be cumbersome. During each exposure, students were given feedback about the correctness of their response. If incorrect, students were shown the correct answer.

A bundle included a pretest, six practice sessions in which the student was exposed to each target seven times in the appropriate schedule, a posttest immediately following the last practice session, and a delayed posttest. Ideally, these sessions would be evenly spaced over two weeks; however, student absences, school schedules, and other factors necessitated altered schedules. Some students received the intervention each school day. In other cases, the interval between sessions was longer than the intended delay for the delayed retention test. For example, due to inclement weather many students did not interact with the experiment between January 23rd and February 4th (or later). Data were collected each day class was in session and all students who were available and willing were asked to participate. The same process was repeated for each bundle. A questionnaire was attempted at the end of each bundle.

It should be noted that the intention in this study was to hold the total practice opportunities equal across students and practice schedules. However, there were some cases in which students accidentally navigated away from the intervention, or the system froze. In those cases, the practice session was restarted. The nature of the system is such that students were started from the beginning of the session, and thus had more practice opportunities than students who did not have issues. Further, when a student only had one or two potential targets to choose, the software chose one (or two) at random and substituted it/them for missing targets in the schedules. This inadvertent overexposure is controlled in the analysis with the practice opportunities predictor variable.
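The item selection and scoring rules described in this section can be sketched as follows. This is a minimal illustration, not the FastBridge implementation: the function names, the dictionary format for pretest results, and the exact way short pools are recycled (every available item is kept, then random repeats fill the remaining slots) are assumptions.

```python
import random

TIME_LIMIT_SECONDS = 6  # responses slower than this are scored incorrect


def fill_with_recycling(pool, n, rng):
    """Pick n items at random; if the pool is too small, repeat items to fill (assumed rule)."""
    if not pool:
        return []
    if len(pool) >= n:
        return rng.sample(pool, n)
    chosen = list(pool)
    while len(chosen) < n:
        chosen.append(rng.choice(pool))
    return chosen


def select_items(pretest_results, n_targets=3, n_known=7, rng=None):
    """Split pretest items into unknown targets and known items.

    pretest_results maps each item (e.g. "3x5") to True if answered correctly.
    """
    rng = rng or random.Random()
    unknown = [item for item, correct in pretest_results.items() if not correct]
    known = [item for item, correct in pretest_results.items() if correct]
    targets = fill_with_recycling(unknown, n_targets, rng)
    known_items = fill_with_recycling(known, n_known, rng)
    return targets, known_items


def score_trial(response, correct_answer, elapsed_seconds):
    """A trial is correct only if the answer matches and arrives within the time limit."""
    return elapsed_seconds <= TIME_LIMIT_SECONDS and response == correct_answer


# Hypothetical pretest with a single unknown item, so the target is recycled.
pretest = {"2x3": True, "2x4": False, "3x3": True}
targets, known_items = select_items(pretest, rng=random.Random(0))
print(targets, known_items)
```

Recycling of this kind is what produces the extra exposures that the practice opportunities predictor is meant to absorb.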

Measures

Outcome variables. Accurate responses are the main outcomes used for the first and second research questions. The log-odds of an accurate response on any given observation are modeled in the case of acquisition, and the log-odds of an accurate response on retention observations are modeled in the case of retention. Retention observations are defined as any observations that are at least two days after the previous practice opportunity. Originally, the plan was to model only responses on the delayed posttest. However, a combination of many canceled school days and student absences during data collection periods led to gaps of greater than 10 days between practice sessions within a bundle. Thus, I decided to change the definition of a retention observation as described above. Specifically, a retention trial is any trial of a target in which the student has not seen that target for at least two days. This change allows for models that include number of practice opportunities and days since last practice opportunity. Thus, there were a varying number of retention observations for each student. Further, attrition (typically in the form of unwillingness to participate further, but also in lack of attendance) led to not every student being exposed to every practice condition, or receiving all practice or posttest sessions for a bundle. The change in retention observation definition allows observations from those students to be used in the models.

For Research Question 3, a rate was calculated for each set of math facts associated with a particular practice schedule for each student. This rate was the number of target facts learned at the immediate or delayed posttests divided by the amount of time the student spent in the intervention (including the pretest and posttests).
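The efficiency rate defined above amounts to a single division. The sketch below uses hypothetical numbers and a hypothetical function name, and assumes time is recorded in minutes.

```python
def learning_rate(targets_learned, minutes_in_intervention):
    """Targets learned per minute spent in the intervention (pretest and posttests included)."""
    if minutes_in_intervention <= 0:
        raise ValueError("time in intervention must be positive")
    return targets_learned / minutes_in_intervention


# Hypothetical student: 2 of 3 targets learned across 50 minutes in one bundle.
rate = learning_rate(2, 50)      # targets per minute
minutes_per_target = 1 / rate    # the same quantity expressed as minutes per target
print(rate, minutes_per_target)
```

Expressing the outcome as a rate lets bundles of different lengths be compared on one scale.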
For example, if a student learned all three target facts in the incremental rehearsal bundle and spent an hour doing so, then the outcome rate for that student for that bundle was 1 target learned per 20 minutes of practice.

Predictor variables. For Research Question 1, predictor variables of interest include pretest score, practice schedule, and number of exposures to the target (referred to as practice opportunities). Interactions between predictor variables were also included in the models. Predictor variables of interest for Research Question 2 include those for Research Question 1, as well as the number of days since the last practice of the target. Predictor variables of interest for Research Question 3 include pretest score and practice schedule.

The number of exposures to the target, or practice opportunities, accounts in some ways for the Total Time Hypothesis (Cooper & Pantle, 1967; Underwood, 1970). Pretest score is included to determine whether the effects of practice or practice schedule interact with student skill level before the intervention (Ostrow, Heffernan, Heffernan, & Peterson, 2015; Rau, Aleven, & Rummel, 2013). Pretest score is the percent correct on the pretest for a bundle. The role of practice schedule as a predictor variable is clear, as it is the driving feature of the primary research questions. The number of days since practice should give an idea of retention over differing lengths of time. Previous research in interleaved practice (Magill & Hall, 1990) indicates that an asymptote is a possibility; as such, a quadratic form of the number of exposures to the target was included in the modeling process. Also, there is a potential effect of day of intervention, which corresponds to the number of calendar days since the start of the study.
While there is an expectation that day of intervention will share a lot of variance with number of exposures, there is a possibility of some anomaly related to the day of a practice session (e.g., related to the school calendar, the day of the week, and so forth). To account for such a possibility, day of intervention was included as a predictor.

Analysis Plan

First, I decided whether to analyze third-grade (addition) and fourth-grade (multiplication) students with the same or separate analyses. This decision was based on whether third- and fourth-grade students performed significantly differently on the first pretest of the study. If they did, then they should be analyzed separately, because they respond differently enough to be treated as two different groups. Once that decision was made, the following analysis plan was followed for each research question. Table 11 lists variables of interest for answering all three research questions, additional details of which are described below. Appendix A lists candidate models for Research Questions 1 and 2.

Research Question 1. How is acquisition of target math facts influenced by practice schedule (repetitive, incremental rehearsal, and interleaved)? In this study multiple observations were made for each student. Thus, the assumption of independence of observations was violated. A mixed-effects regression model accounts for autocorrelation between within-student observations (Long, 2012; Raudenbush & Bryk, 2002). Further, this question has a binary (0 = incorrect and 1 = correct) outcome variable and cannot have normally distributed residuals. Logistic regression uses a generalized linear model and maximum likelihood estimation to model the log-odds of a response of 1 (Hilbe, 2009). A logistic mixed-effects regression model is able to model the log-odds of the binary outcome with nested data. Thus, it is a suitable tool for approaching Research Question 1.
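For reference, a generic random-intercept logistic model of the kind described here can be written as follows. This is a sketch under assumed notation, not one of the study's candidate models: $i$ indexes observations, $j$ indexes students, $X_{ij}$ stands in for any single predictor, and $\tau_{00}$ (a symbol introduced here for illustration) is the variance of the student-level random intercept.

```latex
\operatorname{logit}\bigl(\Pr(Y_{ij} = 1)\bigr)
  = \ln\!\left(\frac{\Pr(Y_{ij} = 1)}{1 - \Pr(Y_{ij} = 1)}\right)
  = \gamma_{00} + \gamma_{10} X_{ij} + u_{0j},
\qquad u_{0j} \sim N(0, \tau_{00})
```

The random intercept $u_{0j}$ is what allows repeated observations from the same student to be correlated, which is the dependence that an ordinary logistic regression would ignore.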
Unfortunately, the mixed-effects models would not converge, and could not be used. Using a logistic model that does not include an estimate of random effects was deemed not appropriate, because it would not be able to account for interdependence of the repeated measures data, violating the assumption of independence. Thus, this research question could only be addressed with descriptive statistics.

Table 11
Variable List

| Variable Type | Variable | Research Question | Operationalization |
| --- | --- | --- | --- |
| Predictor | Pretest score | 1, 2, and 3 | Percent correct on first pretest |
| Predictor | Number of practice opportunities | 1 and 2 | Number of times target is seen |
| Predictor | Practice schedule | 1, 2, and 3 | IL, IR, or Rep |
| Predictor | Days since practice | 2 | Number of days since target has appeared |
| Predictor | Day of Intervention | 1 and 2 | Calendar days since start of study |
| Predictor | Practice Schedule x Number of opportunities | 1 and 2 | Two-way Interaction |
| Predictor | Practice Schedule x Pretest score | 1, 2, and 3 | Two-way Interaction |
| Predictor | Practice Schedule x Grade | 1, 2, and 3 | Two-way Interaction |
| Predictor | Practice Schedule x Number of opportunities x Pretest Score | 1 and 2 | Three-way Interaction |
| Predictor | Practice Schedule x Days since practice x Pretest Score | 1 and 2 | Three-way Interaction |
| Predictor | Practice Schedule x Days since practice | 2 | Interaction |
| Outcome | Accuracy of response at acquisition observation | 1 | Correct or incorrect response at observations in which the target has been seen that day |
| Outcome | Accuracy of response at retention observations | 2, 3 | Correct or incorrect response at observations in which the target has not been seen for at least two days |
| Outcome | Time spent in intervention | 3 | Sum of reaction times on trials to date (including on pretests and posttests) |

Note: Quadratic and cubic forms of practice opportunities will be included as well to capture asymptotes and reversals of trends. IL, IR, and Rep refer to interleaved, incremental rehearsal, and repetitive schedules, respectively.
Research Question 2. Does retention of target math facts differ by practice schedule? A mixed-effects logistic regression model is the most appropriate analytical tool for this question. Unlike the models for Research Question 1, the mixed-effects models for this question converged. Only one random effect was estimated for two reasons: (1) models with more than one random parameter were not able to converge, and (2) output summaries for models with more than one random parameter assigned a small proportion of the variance to parameters other than the random intercept. See Table A1 for a list of candidate models for Research Question 2.

Further, an information-theoretic (IT) model selection framework (D. Anderson, 2008; Burnham & Anderson, 2002; Burnham, Anderson, & Huyvaert, 2011) was used to select a final model. With this approach, a set of variables of interest is used to create an a priori set of candidate models. All models are run, and information criterion scores (Akaike Information Criterion [AIC], Bayesian Information Criterion [BIC]) are calculated based on the log-likelihood of the models and compared using both the criterion scores and the Akaike weights. An Akaike weight is the probability that a model is the best model in the candidate set given the data (D. Anderson, 2008). A model with a weight of .9 has a 90% probability of being the best model given the data. The model with the lowest value of the chosen information criterion, and therefore the highest weight, is chosen as the final model. In this case, given the sample size involved, the corrected Akaike Information Criterion (AICc) was used to pick the best fitting model. Parameter estimates were interpreted through a lens of Beta averaging (Burnham & Anderson, 2002), in which the estimates for each parameter (Beta) are multiplied by the weight of the model and summed across models.
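The step from information criterion scores to Akaike weights can be sketched as follows. This is a minimal illustration using the standard formulas (AICc as AIC plus a small-sample correction, and weights proportional to exp(-Δ/2)); the log-likelihoods, parameter counts, and sample size below are hypothetical values, and treating the number of observations as n is an assumption rather than a description of how the study's software computed AICc.

```python
import math


def aicc(log_likelihood: float, k: int, n: int) -> float:
    """Small-sample corrected AIC for a model with k estimated parameters and n observations."""
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert information-criterion scores into weights that sum to 1."""
    best = min(scores)
    # Relative likelihood of each model given its distance from the best score.
    relative = [math.exp(-0.5 * (score - best)) for score in scores]
    total = sum(relative)
    return [value / total for value in relative]


# Hypothetical candidate set: (log-likelihood, number of parameters).
n_obs = 500
candidates = [(-250.0, 4), (-249.5, 6), (-255.0, 3)]
scores = [aicc(ll, k, n_obs) for ll, k in candidates]
weights = akaike_weights(scores)

for score, weight in zip(scores, weights):
    print(f"AICc = {score:.2f}, weight = {weight:.3f}")
```

Because the weights depend only on differences between scores, the model with the lowest AICc always receives the largest weight.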
Further details about Beta averaging are presented in the results section.

Research Question 3. Does efficiency of learning differ by practice schedule in terms of targets learned per unit of time? For this question, I ran three sets of two linear regression models. Models were run for the immediate and delayed posttest for each of the following outcomes: total targets answered correctly on the posttest, total time spent in a specific practice condition in the intervention by the end of each posttest, and targets correct per 20 minutes spent in the intervention. Predictors included practice schedule and centered pretest score. These three sets of models allowed for comparisons of the practice schedules in terms of total targets learned and time spent doing the intervention individually, as well as the matter of interest, efficiency.

The decision to divide the number of correct responses to targets at posttest by 20-minute intervals (rather than milliseconds, seconds, minutes, or hours) was made for three primary reasons. First, it was a number that resulted in a reasonably interpretable result. For example, the referent group averaged about 1.8 correct responses per 20 minutes of practice. That is easier to interpret than the referent group averaging about .09 correct responses per minute. Second, each schedule took at least 20 minutes to complete on average. And third, only one practice condition lasted for greater than an hour, so the hour unit makes little sense for the other two groups.

Model Checking

Research Question 1 could not be addressed via the planned modeling approach, so there were no models to check. Research Question 2 was addressed with a binary mixed-effects logistic regression model. A normal distribution of residuals is impossible with a binary outcome.
Further, overdispersion is not a concern with a binary outcome (Hilbe, 2009); thus, the main check on model adequacy was a visual inspection of extreme residuals in a half-normal plot and of uniform variation of deviance residuals plotted against the linear predictor (Faraway, 2005). Visual inspections of half-normal and deviance residual plots are discussed in the results sections in reference to model checking and model adequacy. Distributions of numeric predictors were checked for normality. Distributions are noted in the results section.

Research Question 3 employs ordinary least squares linear regression models to determine learning efficiency differences between the three practice schedules. In this case, residuals were plotted on a frequency distribution to check for normally distributed residuals, and on a scatterplot with a loess smoother to check the appropriateness of a linear model.

Chapter 4
RESULTS

Random Assignment

Due to absences, snow days and extreme cold days on which school was cancelled, student refusal, and other unforeseen events, not every student received every condition. However, because all conditions were counterbalanced, and all students were randomly assigned, each condition had a similar number of participants within each grade/operation and in total. See Table 12 for the number of participants in each condition. There were 34 unique students in third grade and 38 unique students in fourth grade. A total of 72 students of the original 74 were included in these analyses. Two fourth-grade students were not included because they earned perfect scores on the pretests.

Table 12
Number of Participants in Each Condition

| | Addition | Multiplication | Total |
| --- | --- | --- | --- |
| Interleaved | 29 | 22 | 51 |
| Incremental Rehearsal | 27 | 22 | 52 |
| Repetitive | 30 | 23 | 53 |

Combining Addition and Multiplication for the Analysis

Recall that if third- and fourth-grade students performed similarly on the first addition and multiplication pretests, respectively, then they could be combined in the final analyses.
Table 13 shows descriptive statistics for pretest scores for each group. A two-sample t-test was run to determine if there was a statistically significant difference between pretest scores for addition and multiplication. A modified Levene's test was used to make sure assumptions of homoscedasticity were met to run the two-sample t-test. The null hypothesis for the Levene's test was not rejected, indicating that the variances could be treated as equal ($F_{1,70} = 1.76$, p = .19), and the t-test was run. The null hypothesis for the t-test was not rejected ($t_{66} = 1.62$, p = .11), indicating that there did not appear to be a statistically significant difference between the performance of the two groups on their respective pretests. However, the effect size (g = 0.37) for the difference between the two groups was large enough to warrant controlling for group differences in the models (What Works Clearinghouse, 2017, p. 14). Student scores on the first pretest were used as a predictor in the candidate models.

Table 13
Addition and Multiplication First Pretest Descriptive Statistics for Proportion Correct

| | N | Mean | SD | Median | Min | Max | Skew | Kurtosis |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Addition | 34 | .70 | .13 | .74 | .42 | .84 | -0.67 | -0.94 |
| Multiplication | 38 | .63 | .19 | .68 | .16 | .89 | -0.72 | -0.30 |

Research Question 1

How is acquisition of target math facts influenced by practice schedule (repetitive, incremental rehearsal, and interleaved)?

The analyses performed for Research Question 1 were completed with data from acquisition trials. An acquisition trial is an observation of a target to which the participant has already been exposed that day. Any practice opportunity in a practice session that is not the first exposure is an acquisition trial (with the exception of the first practice session in which the participant has seen the target in the pretest).
The original analysis plan called for logistic mixed-effects regression models to be built to account for the binary outcome variable and autocorrelation of the data structure. However, even relatively simple logistic mixed-effects regression models would not converge. This failure to converge could be an artifact of a lack of variation in responses. Acquisition observations included in the model were only observations in which the participant had already seen the problem, and the accuracy across all observations was quite high (79%). There might not have been enough variance to model random effects.

Descriptive statistics are outlined below. The accuracy for acquisition trials was 75%, 80%, and 81% for interleaved practice, incremental rehearsal, and repetitive practice, respectively. The accuracy rate was between 72% and 84% across both addition and multiplication for all acquisition trials and practice conditions. See Table 14 for the percent of correct responses broken down by operation and practice schedule.

Table 14
Accuracy Rates by Operation and Practice Schedule

| | Addition | Multiplication | Total |
| --- | --- | --- | --- |
| Interleaved | 72% | 79% | 75% |
| Incremental Rehearsal | 79% | 81% | 80% |
| Repetitive | 79% | 84% | 81% |
| Total | 77% | 81% | 79% |

While the proportion of correct responses is generally higher for incremental rehearsal than interleaved practice, and generally higher for repetitive practice than incremental rehearsal, these rates tend to be high across all schedules. Incremental rehearsal does match repetitive practice for addition. A similar pattern can be seen in a histogram of responses by practice opportunity (Figure 3). In general, accuracy greatly increases within the first several acquisition trials, and maintains a high rate throughout the remaining practice. These data show that acquisition trials tend to asymptote relatively quickly and maintain a high level of accuracy across practice. In acquisition trials, student errors reduce dramatically and stay low.
While I hypothesized such a pattern for the repetitive schedule, it appears that for math facts, the trend is the same for the incremental rehearsal and interleaved practice schedules as well. The pattern is similar for both addition and multiplication. In the figure there are bins for practice opportunities beyond the 45 expected in the design. This is due to the issue of session restarts and duplicated targets mentioned in the methods section. Rather than truncate the figure based on an ideal experimental situation that did not exist, all observations were graphed.

Figure 3. Histogram of correct and incorrect response frequencies by Incremental Rehearsal (IR), Interleaved (IL), and Repetitive (Rep) practice schedules.

Research Question 2

Does retention of target math facts differ by practice schedule?

Retention observations were defined as any trials in which the student had not seen that item for at least two days. These outcomes were modeled with logistic mixed-effects regression. Models with more than one random effect estimated would not converge, so models were run with only a random intercept estimated. Numeric predictors for these models included practice opportunities, score at first pretest, and days since practice. All three numeric predictors had approximately normal distributions. See Table 15 for skew and kurtosis for the three predictors.

Table 15
Skew and Kurtosis for Three Numeric Predictors Used in Models for Research Question 2

| | Mean | SD | Min | Max | Skew | Kurtosis |
| --- | --- | --- | --- | --- | --- | --- |
| Practice Opportunities | 18 | 15 | 1 | 65 | .55 | -.91 |
| Pretest Score | .64 | .17 | .16 | .95 | -.52 | -.48 |
| Days Since Practice | 13 | 11 | 2 | 59 | 1.9 | 4.6 |

Note: n = 1852 observations of 72 students.

Forty models were fit to attempt to answer Research Question 2. These models were created from various combinations of the variables of interest listed above, and can be found in Appendix A. An AICc weight of .005 was used as a conservative cut-off for models listed in these results.
A model with an AICc weight lower than .005 has less than a .5% chance of being the best model given the data. Nine models in the retention analysis converged and had an AICc weight of .005 or higher, and they are the only models discussed further in this analysis. The cumulative AICc weight for all models in this analysis through the .005 cut-off rounds to 1.00 at three significant digits. See Table 16 for a comparison of the nine selected models by number of parameters estimated and AICc metrics.

Beta averaging is a way to weight parameter estimates by model likelihood and get a holistic picture of the relative influence of variables estimated by the models examined within the information-theoretic framework (Burnham & Anderson, 2002). To obtain beta averages, parameter estimates are multiplied by the AICc weight of the model and summed across models. Table 17 displays the parameter estimates for each of the nine selected models and the beta averages for each parameter. It also shows the odds ratio for each beta average, which is the exponentiation of the averaged log-odds estimate for each parameter. Repetitive practice is the referent group for this analysis, so all intercepts are for the repetitive practice condition at time point zero for a student who got no items correct on the pretest.
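The beta-averaging arithmetic can be illustrated with the interleaved (IL) term, using the AICc weights reported in Table 16 and the two-digit IL estimates reported in Table 17. The sketch assumes that models without the IL term contribute zero to the sum, which is consistent with the reported beta average; the function and variable names are illustrative only.

```python
import math

# AICc weights for Models 1-5 (the models that include the IL term), from Table 16.
weights = [0.436, 0.274, 0.109, 0.089, 0.045]
# IL estimates for the same five models, rounded to two digits, from Table 17.
il_estimates = [0.15, 0.16, 0.43, 0.15, 0.43]


def beta_average(estimates, model_weights):
    """Weight each model's estimate by its AICc weight and sum across models."""
    return sum(b * w for b, w in zip(estimates, model_weights))


il_beta = beta_average(il_estimates, weights)
il_odds_ratio = math.exp(il_beta)  # exponentiate the averaged log-odds estimate

print(round(il_beta, 2), round(il_odds_ratio, 2))  # 0.19 1.21
```

The result reproduces the beta average (0.19) and odds ratio (1.21) reported for the interleaved term in Table 17.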
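Table 16
Summary of the Nine Selected Retention Models Ordered by AICc Weight

| Model | Equation | Parameters | AICc | AICc Weight |
| --- | --- | --- | --- | --- |
| 1 | $\hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{PO}) + \gamma_{30}(\text{IL}) + \gamma_{40}(\text{IR}) + \gamma_{50}(\text{IL} \times \text{PO}) + \gamma_{60}(\text{IR} \times \text{PO}) + \gamma_{70}(\text{Pretest Score}) + \gamma_{80}(\text{IL} \times \text{Pretest Score} \times \text{PO}) + \gamma_{90}(\text{IR} \times \text{Pretest Score} \times \text{PO}) + u_{0j} + r_{ij}$ | 11 | 2288 | 0.436 |
| 2 | $\hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{PO}) + \gamma_{30}(\text{IL}) + \gamma_{40}(\text{IR}) + \gamma_{50}(\text{IL} \times \text{PO}) + \gamma_{60}(\text{IR} \times \text{PO}) + u_{0j} + r_{ij}$ | 8 | 2289 | 0.274 |
| 3 | $\hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{PO}) + \gamma_{30}(\text{IL}) + \gamma_{40}(\text{IR}) + u_{0j} + r_{ij}$ | 6 | 2291 | 0.109 |
| 4 | $\hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{PO}) + \gamma_{30}(\text{IL}) + \gamma_{40}(\text{IR}) + \gamma_{50}(\text{IL} \times \text{PO}) + \gamma_{60}(\text{IR} \times \text{PO}) + \gamma_{70}(\text{Pretest Score}) + \gamma_{80}(\text{Days Since Practice}) + \gamma_{90}(\text{Days Since Practice} \times \text{Pretest Score}) + u_{0j} + r_{ij}$ | 11 | 2291 | 0.089 |
| 5 | $\hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{PO}) + \gamma_{30}(\text{IL}) + \gamma_{40}(\text{IR}) + \gamma_{50}(\text{Pretest Score}) + u_{0j} + r_{ij}$ | 7 | 2293 | 0.045 |
| 6 | $\hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{PO}) + u_{0j} + r_{ij}$ | 4 | 2295 | 0.013 |
| 7 | $\hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{PO}) + u_{0j} + r_{ij}$ | 4 | 2295 | 0.013 |
| 8 | $\hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{PO}) + \gamma_{30}(\text{Days Since Practice}) + u_{0j} + r_{ij}$ | 5 | 2297 | 0.006 |
| 9 | $\hat{Y}_{ij} = \gamma_{00} + \gamma_{10}(\text{Day}) + \gamma_{20}(\text{PO}) + \gamma_{30}(\text{Pretest Score}) + u_{0j} + r_{ij}$ | 5 | 2297 | 0.005 |

Note: Abbreviations above are as follows: Practice Opportunities (PO), Interleaved (IL), Incremental Rehearsal (IR), and Repetitive (Rep).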
Table 17
Parameter Estimates for the Nine Selected Models Ordered by AICc Weight With Odds Ratios Calculated from Beta Averages

| Parameter | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 | Beta Average | Odds Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Intercept (repetitive) | 0.26 | 0.23 | 0.05 | -0.32 | -0.18 | 0.25 | 0.25 | 0.27 | 0.07 | 0.15 | 1.16 |
| Day | -0.01 | -0.01 | -0.01 | -0.01 | -0.01 | -0.01 | -0.01 | -0.01 | -0.01 | -0.01 | .99 |
| IL | 0.15 | 0.16 | 0.431 | 0.15 | 0.43 | NA | NA | NA | NA | 0.19 | 1.21 |
| IR | -0.20 | -0.17 | 0.17 | -0.16 | 0.16 | NA | NA | NA | NA | -0.12 | .89 |
| PO | 0.01 | 0.01 | 0.02 | 0.01 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 1.02 |
| Pretest Score | -0.06 | NA | NA | 0.90 | 0.34 | NA | NA | NA | 0.27 | 0.07 | 1.07 |
| Days Since Practice | NA | NA | NA | 0.02 | NA | NA | NA | 0.00 | NA | 0.00 | 1.00 |
| IL x PO | 0.00 | 0.02 | NA | 0.02 | NA | NA | NA | NA | NA | 0.01 | 1.01 |
| IR x PO | -0.05 | 0.02 | NA | 0.02 | NA | NA | NA | NA | NA | -0.02 | .98 |
| Days Since Practice x Pretest Score | NA | NA | NA | -0.05 | NA | NA | NA | NA | NA | 0.00 | 1.00 |
| IL x Pretest Score x PO | 0.02 | NA | NA | NA | NA | NA | NA | NA | NA | 0.01 | 1.01 |
| IR x Pretest Score x PO | 0.10 | NA | NA | NA | NA | NA | NA | NA | NA | 0.04 | 1.07 |

Note: All values are rounded to two digits. Abbreviations above are as follows: Practice Opportunities (PO), Interleaved (IL), Incremental Rehearsal (IR), and Repetitive (Rep).

One way to consider the relative importance of different predictors in candidate models is to look at the relative probability of the models that include that predictor compared to models that do not (Anderson, 2008; Burnham & Anderson, 2002). Practice schedule variables do not appear in four of the nine top models; however, the combined model weight of the five models that include practice schedule variables is .953. Thus, there is a greater than 95% probability that one of those five models is the best model given the data. By comparison, the combined model weight of the models that do not include practice schedule variables is .037, indicating that models that include practice schedule variables are about 26 times more likely than models that do not.
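The comparison of combined model weights can be reproduced directly from the AICc weights in Table 16. The following sketch is illustrative only; the flag marking which models contain practice schedule terms is read off the model equations in that table, and the variable names are assumptions.

```python
# (AICc weight, model includes practice schedule terms) for Models 1-9 in Table 16.
models = [
    (0.436, True), (0.274, True), (0.109, True), (0.089, True), (0.045, True),
    (0.013, False), (0.013, False), (0.006, False), (0.005, False),
]

# Sum the weights separately for models with and without schedule terms.
with_schedule = sum(weight for weight, has_schedule in models if has_schedule)
without_schedule = sum(weight for weight, has_schedule in models if not has_schedule)

# Ratio of the combined weights: how much more support the schedule models receive.
evidence_ratio = with_schedule / without_schedule

print(f"{with_schedule:.3f} {without_schedule:.3f} {evidence_ratio:.1f}")  # 0.953 0.037 25.8
```

The ratio of roughly 25.8 is the source of the "about 26 times more likely" statement.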
Another way to consider the relative effects of the variables is to examine the odds ratios. Some variables that are part of the top models do not appear to appreciably influence outcomes. These variables, including the Days Since Practice term for example, had odds ratios of 1.00, indicating that the number of days since practice did not increase or decrease the likelihood of a correct response after accounting for other variables in the model. Also, parameter estimates were largely stable across models. The predictor with the largest odds ratio is the interleaved practice schedule, which is associated with an increased probability of a correct response. Specifically, there is a 13.6% greater chance of getting a response correct when using an interleaved schedule than when using a repetitive schedule, and a 23% greater chance of a correct answer with an interleaved schedule compared to incremental rehearsal. Conversely, there is an 8.5% lower chance of getting an item correct when using an incremental rehearsal schedule than when using a repetitive schedule. These relative probabilities change over time, as demonstrated by the interaction variables. Students using an incremental rehearsal schedule with high pretest scores have increased odds of a correct response over time when compared to students using a repetitive schedule.

Visual analysis of the predicted probability of a correct response on retention trials from Model 1 makes the differences between practice schedules clearer. See Figure 4 for predicted probabilities faceted across four quartiles based on the first pretest score. The model includes pretest accuracy as a continuous variable, with the results presented in quartiles in the figure for ease of interpretation. The largest difference is evident in the lowest quartile, in which predicted probability of retention accuracy approaches 80% for only the interleaved practice condition.
In that group of lower performing students, there is also a pattern of repetitive practice leading to a higher predicted probability of a correct response than incremental rehearsal across all practice opportunities. In the second and third quartiles, interleaved and incremental rehearsal conditions are more similar.

The predicted probability of a correct response in repetitive practice schedules is stable across all ability levels. Incremental rehearsal schedules are associated with the sharpest change in predicted probability of a correct response across ability levels. Low ability students have a very flat trajectory with incremental rehearsal practice, while higher ability students reach a high predicted accuracy in the same or fewer exposures than interleaved practice. The pattern of predicted accuracy for students in interleaved schedules appears quite stable across ability levels.

Another pattern that emerged was that the retention benefit for interleaved practice appears relatively early in the practice and grows steadily as practice opportunities accumulate. Conversely, incremental rehearsal practice appears to have a lower retention benefit early in the intervention before accelerating more than the other schedules later in the intervention for the highest ability learners.

The predicted probabilities of a correct response in interleaved schedules when compared to repetitive schedules across all ability levels are consistent with the original hypothesis. The relative similarity in the predicted probability of correct responses of students in interleaved and incremental rehearsal schedules is consistent with what was expected among students in the higher quartiles. The disparity between interleaved and incremental rehearsal schedules in the lower two quartiles was not expected.

Figure 4.
Graph of probability of a correct response on retention trials by practice schedule (Interleaved (IL), Incremental Rehearsal (IR), and Repetitive (Rep)) based on Model 1 and split into four quartiles from the first pretest.

Receiver Operating Characteristic (ROC) curves can be used to quantify and visualize the specificity and sensitivity of a classification model (Hilbe, 2009). Binary logistic models are classification models, as the model predicts the likelihood of a response falling into one of two categories (correct or incorrect). The ROC analysis looks at the model predictions, compares them to reality, and plots a curve along the X and Y axes (specificity and sensitivity, respectively). The curve is compared to a model with no predictive value, and the area under the curve is evaluated for predictive fit. An area under the curve of 1.0 is a perfect fit. A value of .5 predicts no better than chance, and a value of less than .5 predicts worse than chance. The pROC package (Robin et al., 2011) in R was used to run ROC curve analyses. The area under the ROC curve for the top retention model was .65. The ROC curve is shown in Figure 5.

Figure 5. Model 1 retention ROC curve.

Graphing the deviance residuals for the top retention model (Figure 6) shows similar variation across the linear predictor. It also shows approximately equal scatter above and below the 0 line, which indicates that a linear model is appropriate. This distribution supports the top model as an adequate one (Faraway, 2016). Graphing the data on a half-normal plot (Figure 7) shows that there is no need to be concerned about extreme or unusual cases, as there are no points that are split far out from the others (Faraway, 2016).

Figure 6. Deviance residuals for retention model.

Figure 7. Half-normal plot for retention model.

Research Question 3

Does efficiency of learning differ by practice schedule in terms of targets learned per unit of time?
Sixty-one individual students took an immediate posttest in at least one practice schedule, and 52 took a delayed posttest in at least one practice schedule. A breakdown of the number of students in each practice condition at the immediate and delayed posttests is given in Table 18. Three sets of models were run to look at learning efficiency. First, models were run to compare the three practice schedules at immediate and delayed posttests with respect to the number of targets answered correctly. Second, models were run to compare the three practice schedules based on the amount of time students spent in practice. Finally, models were run comparing the practice schedules based on a learning efficiency rate. All models included practice schedule and mean-centered pretest score as covariates. Pretest scores were mean centered to aid in interpretation.

A per-condition learning rate was calculated for a student by computing the total time the student spent across all trials for a condition, determining the number of 20-minute intervals represented by the total time, and dividing the total number of learning targets answered correctly at the posttest (immediate or delayed) by the number of 20-minute intervals. This process gives an items learned per 20 minutes of practice outcome that can be modeled and predicted by practice schedule.

Table 18
Number of Students in Each Practice Schedule at Immediate and Delayed Posttests

| | Immediate Posttest | Delayed Posttest |
| --- | --- | --- |
| Repetitive | 37 | 27 |
| Interleaved | 47 | 21 |
| Incremental Rehearsal | 33 | 24 |
| Total | 117 | 72 |

Targets Correct at Immediate and Delayed Posttests

The linear model predicting the number of items correct at the immediate posttest by practice schedule indicates that the effect of practice schedule is significant at α = .05 ($F_{5,111} = 2.89$, p = .017, $R^2 = .08$); however, the model does not account for much variance. See Table 19 for a summary of model estimates.
The linear model predicting the number of items correct at the delayed posttest by practice schedule indicates that the effect of practice schedule was not significant at α = .05 ($F_{5,66} = 1.02$, p = .41, $R^2 = .07$). See Table 20 for a summary of model estimates for the delayed posttest. At the immediate posttest, there is no significant difference in targets learned between the three practice schedules, and only a small interaction between interleaved practice and pretest score. A main effect of interleaved practice approaches significance at the delayed posttest, and at an average increase of .6 targets correct, has a meaningful magnitude.

Table 19
Summary Table for Model Predicting Correct Targets by Practice Schedule at Immediate Posttest

| Parameter | β | SE | t | p |
| --- | --- | --- | --- | --- |
| Intercept (repetitive) | 1.74 | 0.13 | 13.43 | <.001 |
| Interleaved Schedule | 0.02 | 0.20 | 0.12 | .91 |
| Incremental Rehearsal Schedule | 0.49 | 0.20 | 2.40 | .19 |
| Pretest (Mean Centered) | 0 | 0.01 | -0.13 | .90 |
| Pretest x Interleaved | 0.02 | 0.01 | -2.00 | .05 |
| Pretest x Incremental Rehearsal | -0.01 | 0.01 | -0.90 | .37 |

Adj. $R^2$ = .08, $F_{5,111}$ = 2.89, p = .017

Table 20
Summary Table for Model Predicting Correct Targets by Practice Schedule at Delayed Posttest

| Parameter | β | SE | t | p |
| --- | --- | --- | --- | --- |
| Intercept (repetitive) | 1.45 | 0.21 | 6.86 | <.001 |
| Interleaved Schedule | 0.60 | 0.32 | 1.88 | .06 |
| Incremental Rehearsal Schedule | 0.16 | 0.31 | 0.51 | .61 |
| Pretest (Mean Centered) | 0 | 0.01 | 0.26 | .79 |
| Pretest x Interleaved | 0 | 0.02 | -0.20 | .42 |
| Pretest x Incremental Rehearsal | 0.01 | 0.02 | 0.81 | .84 |

Adj. $R^2$ = .07, $F_{5,66}$ = 1.02, p = .41

Figures 8 and 9 show boxplots of the number of targets learned by students in each condition faceted on pretest quartiles. In the figures, the solid line represents the median while the dashed line is the mean. The figures show that there is a lot of overlap between practice schedules among the highest ability learners. Also, interleaved practice appears to be associated with more targets learned among the lower ability learners at the delayed posttest.
Specifically, more than half of students in the interleaved condition in the lower two quartiles remembered all targets at the delayed posttest. This was also true of the middle two quartiles at the immediate posttest. Another important pattern is the difference between the immediate and delayed posttest for incremental rehearsal practice. At the immediate posttest, over half the participants in the lower two quartiles earned perfect scores on learning targets. That was not true of any quartile at the delayed posttest.

Figure 8. Targets correct at immediate posttest

Figure 9. Targets correct at delayed posttest

Time in Practice by Practice Condition

Models of time in practice at immediate and delayed posttest are both significant at α = .05 (Adj. $R^2$ = .80, $F_{5,111}$ = 93.78, p < .001 and Adj. $R^2$ = .79, $F_{5,66}$ = 55.37, p < .001, respectively). Given the nature of the practice conditions, it is no surprise that both the immediate posttest and delayed posttest models show a statistically significant effect of practice schedule on time in practice (see Tables 21 and 22). Note that average time in practice is different at the immediate and delayed posttests. The time students spent taking the posttests was included in the total time students spent in practice. The rationale for this inclusion is twofold. First, the format for individual exposures is identical between pretests, posttests, and practice. Students received feedback as to the correctness of their response, and corrective feedback in the case of incorrect responses. Second, the posttests were bundled with the practice and are a part of the practice experience.

Incremental rehearsal took much longer in both models. There was not a statistically significant difference between repetitive and interleaved practice at either the immediate or the delayed posttest. There was a significant difference between repetitive and incremental rehearsal practice at both the immediate and delayed posttests.
Incremental rehearsal practice took an average of approximately 80 minutes longer than repetitive practice at the immediate posttest, and 81 minutes longer than repetitive practice at the delayed posttest. In addition to illustrating the much higher average time spent, Figures 10 and 11 show how much more spread there is in the amount of time spent in incremental rehearsal practice. In the figures, the solid line represents the median while the dashed line is the mean. Pretest score does not appear to appreciably influence the amount of time students spend in practice. At both immediate and delayed posttests, a 10% improvement in pretest accuracy above the mean translates to less than a 30 second difference in practice time.

Table 21
Summary Table for Model Predicting Minutes in Practice by Practice Schedule and Pretest at Immediate Posttest

Parameter                          β       SE     t       p
Intercept (repetitive)             23.06   2.75   8.37    <.001
Interleaved Schedule               0.62    4.15   0.15    .88
Incremental Rehearsal Schedule     80.63   4.30   18.75   <.001
Pretest (Mean Centered)            -0.26   0.16   -1.78   .78
Pretest x Interleaved              0       0.23   0.02    .99
Pretest x Incremental Rehearsal    -1.56   0.29   -5.47   <.001
Adj. R² = .80, F(5, 111) = 93.78, p < .001

Table 22
Summary Table for Model Predicting Minutes in Practice by Practice Schedule and Pretest at Delayed Posttest

Parameter                          β       SE     t       p
Intercept (repetitive)             27.61   3.84   7.19    <.001
Interleaved Schedule               -0.08   5.81   -0.01   .99
Incremental Rehearsal Schedule     81.14   5.67   14.31   <.001
Pretest (Mean Centered)            -0.37   0.21   -1.79   .07
Pretest x Interleaved              0.17    0.36   0.47    .63
Pretest x Incremental Rehearsal    -1.57   0.34   -4.52   <.001
Adj. R² = .79, F(5, 66) = 55.37, p < .001

Figure 10. Time in practice at immediate posttest
Figure 11. Time in practice at delayed posttest

Correct Targets per 20 Minutes of Practice

There was a significant effect of practice schedule in both the immediate posttest and delayed posttest models.
In both cases, correct responses per 20 minutes of practice were similar for repetitive and interleaved practice, and were much lower for incremental rehearsal practice. See Tables 23 and 24 for summaries of model estimates. A large amount of variance was accounted for by the models at both immediate and delayed posttests (Adj. R² = .51 and Adj. R² = .37, respectively).

Table 23
Summary Table for Model Predicting Targets Learned per 20 Minutes of Practice by Practice Schedule and Pretest at Immediate Posttest

Parameter                          β        SE     t       p
Intercept (repetitive)             1.86     0.13   14.06   <.001
Interleaved Schedule               0.07     0.20   0.37    .71
Incremental Rehearsal Schedule     -1.35    0.21   -6.51   <.001
Pretest (Mean Centered)            0.03     0.01   4.00    <.001
Pretest x Interleaved              0.03     0.01   2.50    .01
Pretest x Incremental Rehearsal    -0.018   0.01   -1.27   .21
Adj. R² = .51, F(5, 111) = 25.03, p < .001

Table 24
Summary Table for Model Predicting Targets Learned per 20 Minutes of Practice by Practice Schedule and Pretest at Delayed Posttest

Parameter                          β        SE     t       p
Intercept (repetitive)             1.48     0.17   8.74    <.001
Interleaved Schedule               0.31     0.26   1.23    .22
Incremental Rehearsal Schedule     -1.13    0.25   -4.56   <.001
Pretest (Mean Centered)            0.03     0.01   3.49    <.001
Pretest x Interleaved              -0.02    0.02   -1.44   .15
Pretest x Incremental Rehearsal    -0.02    0.02   -1.42   .16
Adj. R² = .37, F(5, 111) = 9.31, p < .001

The large difference in the time it takes to complete each practice schedule has an important influence on the efficiency of the schedules. It is clear from both the model outputs and Figures 12 and 13 that interleaved and repetitive practice schedules are much more efficient than incremental rehearsal schedules. In the figures, the solid line represents the median while the dashed line is the mean. For students in the lowest quartile of pretest scores, the three schedules are the most even at the immediate posttest. However, as student ability at pretest increases, the gap between incremental rehearsal and the other schedules widens.
Also, the variability in posttest scores decreases, and the predicted effects of interleaved practice begin to separate from those of repetitive practice, with interleaved practice looking more efficient. At the delayed posttest, interleaved practice appears to have a retention benefit that is larger among students who scored poorly on the pretest, and as student ability increases, repetitive practice begins to look more like interleaved practice when learning math facts. In general, the number of targets learned per 20 minutes of practice is quite similar between interleaved and repetitive practice. The similarity between repetitive and interleaved practice with respect to efficiency is interesting in that it appears to run counter to much of the established literature on the effects of interleaving practice (Magill & Hall, 1990; Rohrer et al., 2014; Rohrer & Taylor, 2007; Shea et al., 1990). As with the total items correct outcome, it would be interesting to see whether more differentiation would emerge if there were more potential targets to learn.

Figure 12. Number of targets learned per 20 minutes of practice.
Figure 13. Number of targets learned per 20 minutes of practice.

Model Checking for Research Question 3

Residuals for these models were approximately symmetrical and normally distributed. Table 25 shows descriptive statistics for all six model residuals. Overall, assumptions for normal distribution of model residuals appear acceptable. See Figure 14 for density plots of residuals for all six models graphed against a normal distribution and Figure 15 for a scatter plot of the residuals with a loess smoother. These figures indicate that the residuals are approximately normally distributed and that a linear model is appropriate. Model residuals for the models predicting time in practice are more leptokurtic than is ideal; however, all models were generally symmetrical, and did not show any problematic skew.
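As an illustration of the kind of residual summary described above, the following is a minimal sketch in Python. It is not the analysis code used in this study: the function name `moment_summary` is hypothetical, and the skew and kurtosis estimators are the simple moment-based versions, which is an assumption, since the software and estimator type used for the reported values are not stated here. Positive excess kurtosis corresponds to the leptokurtic pattern noted for the time-in-practice models.

```python
import math


def moment_summary(residuals):
    """Summarize residuals with mean, SD, median, skew, and excess kurtosis.

    Skew and kurtosis are the simple moment estimators:
        skew     = m3 / m2**1.5
        kurtosis = m4 / m2**2 - 3   (0 for a normal distribution)
    where m_k is the k-th central moment. Statistical packages often apply
    small-sample corrections, so their values may differ slightly.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments (divide by n, not n - 1).
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    # Sample standard deviation uses the n - 1 denominator.
    sd = math.sqrt(m2 * n / (n - 1))
    ordered = sorted(residuals)
    mid = n // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "median": median,
        "skew": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,
    }


if __name__ == "__main__":
    # Toy residuals for illustration only; these are not study data.
    toy_residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
    print(moment_summary(toy_residuals))
```

In practice, the list passed to the function would be the residuals extracted from each fitted model, giving one summary row per model.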
Table 25
Descriptive Statistics for Research Question 3 Model Residuals

Model                                        N     Mean   SD      Median   Skew   Kurtosis
Targets Correct
  Immediate                                  117   0      .87     .23      -.19   -1.08
  Delayed                                    72    0      1.06    -.03     -.16   -1.1
Time in Practice
  Immediate                                  117   0      18.46   .33      .56    3.88
  Delayed                                    72    0      19.21   1.49     1.01   4.29
Targets Correct per 20 Minutes of Practice
  Immediate                                  117   0      .89     -.06     .69    2.31
  Delayed                                    72    0      .85     -.02     .15    .23

Figure 14. Density plot of residuals from models used to address research question 3.
Figure 15. Scatter plot of residuals from models used to address research question 3 with a loess smoother.

Social Validity Measure and Observations

Responses to the survey showed little variation and a negative skew. See Table 26 for descriptive statistics of both survey questions. Both questions were asked after the immediate posttest for a practice schedule. Questions were framed on a Likert-type scale ranging from 1 (Not at all helpful/fun) to 7 (Extremely helpful/fun). Based on the mean rating for each survey question, students tended to find the practice to be helpful and fun across all practice schedules. See Figures 16 and 17 for histograms of the proportion of responses in each of the 7 response categories for each question. For both survey items, more than 70% of the responses were 5 or higher. Linear models were used to look for differences in student responses predicted by practice schedule. The models did not account for significant variance in student responses (p = .28 and p = .94 for helpful and fun, respectively).

Table 26
Descriptive Statistics for Survey Questions

Question   N     Mean   SD     Median   Skew    Kurtosis
Helpful    112   5.79   1.82   7        -1.58   1.46
Fun        112   5.51   2.07   7        -1.18   -.02

Figure 16. Proportion of responses in each of seven responses for first survey question: “How helpful was this practice?”
Figure 17. Proportion of responses in each of seven responses for second survey question: “How fun was this practice?”

Anecdotally, students generally enjoyed the math practice activity. They tended to be on-task and cooperative throughout data collection. Some students liked the activity less than others did. On occasion, some students would scream and hide under desks to avoid participating on a given day. This kind of avoidance behavior was fairly rare and, despite the survey results, appeared to be more typical during incremental rehearsal practice. Teachers generally had positive things to say about the program and what they saw their students doing. Specifically, teachers commented on the high level of engagement (in general) and the need for more math fact drill practice. Throughout the two months of the experiment, teachers typically continued their lessons as scheduled, but also took several opportunities to watch their students engaging with the intervention. They reported that they liked the interface, speed of presentation, and feedback mechanisms.

Chapter 5
DISCUSSION

Goal and Design

Learning math facts is an important component of mathematics preparation (National Research Council, 2001). To ensure that students have opportunities to practice math facts in the most efficient and effective way, teachers need to know how to optimize this practice time. Researchers have demonstrated that distributed practice is superior to massed practice (Cepeda et al., 2006; Pashler, Bain, et al., 2007; Pashler, Rohrer, et al., 2007). Further, interleaving learning targets has been shown to convey benefits to retention beyond what would be expected from the distribution inherent in the schedule (Kang & Pashler, 2012; Lee & Magill, 1983; Magill & Hall, 1990; Rohrer, 2012; Taylor & Rohrer, 2010).
The goal of this project was twofold: first, to determine whether a promising practice schedule (interleaved practice), which has been shown to be effective for motor learning (Magill & Hall, 1990), generalizes to a skill such as math facts; and second, to compare that promising schedule to a dosage control schedule (repetitive practice) and to a practice schedule that has more support in the literature and use in schools (incremental rehearsal). This project is the first experimental comparison of interleaved and incremental rehearsal practice, and one of only a handful to examine interleaved practice in an academic context. The design of this study was counterbalanced and within subjects. Students were exposed to multiple practice schedules across practice bundles. The order of schedule presentation was randomized, and a direct comparison of schedules was possible without being obscured by order effects. Analyses included logistic regression, logistic mixed-effects regression, and linear regression. In the following sections, I review each research question along with a brief statement about the results. The review of the research questions is followed by a discussion of the limitations of the study, implications for practice, and the potential for future research.

Research Question 1

Question: How is acquisition of target math facts influenced by practice schedule (repetitive, incremental rehearsal, and interleaved)?

Hypothesis: Likelihood of a correct response will increase at a faster rate for the repetitive schedule, but will asymptote over the course of several sessions. Likelihood of a correct response will increase the next most quickly for incremental rehearsal. Interleaved practice will yield the slowest change. Repetitive practice has demonstrated a link with fast acquisition across the literature.

Across all three schedules, accuracy increased towards an asymptote relatively quickly and was maintained throughout practice.
There was less deviation between practice schedules than was predicted in the hypothesis. The acquisition pattern for repetitive practice was consistent with what is seen in motor learning and academic skills literature (Magill & Hall, 1990; Taylor & Rohrer, 2010). Accuracy for both incremental rehearsal and interleaved practice closely mirrored the pattern seen in repetitive practice. This finding is not congruent with what is typically seen with interleaved practice in the motor learning literature, but is not very different from studies of novel symbol and letter writing (Ste-Marie et al., 2004). It may be that memorizing and reproducing letter symbols and memorizing and responding to single digit math problems share characteristics that serve to moderate the acquisition benefit typically found in repetitive schedules. It may also be that the binary nature of the outcomes in this study is not sensitive enough to capture any differences. More research is needed to determine whether the similarity of acquisition across schedules is an artifact of random error or of the nature of the task chosen for the experiment.

Research Question 2

Question: Does retention of target math facts differ by practice schedule?

Hypothesis: Likelihood of a correct response at retention trials will be highest for incremental rehearsal and interleaved practice and will be almost indistinguishable between the two.

The difference in retention rates between the interleaved and incremental rehearsal conditions among the lower scoring students was an important finding of this study. As these schedules have not been compared before, it was difficult to predict the relative effects of the two schedules. The difference in the averaged odds ratios when compared to the referent group (1.21 for interleaved and .89 for incremental rehearsal) indicates that, efficiency aside, interleaved practice can be a more effective way to learn math facts than incremental rehearsal.
Particularly in the lowest quartile, incremental rehearsal showed lower predicted retention accuracy than repetitive practice. Among students who scored average and above on the pretest, the results reflected the hypothesis. Predictions for accuracy for both interleaved and incremental rehearsal conditions were quite similar, with the repetitive condition performing much worse. This finding is consistent with the extant literature indicating that distributed and interleaved practice tend to lead to improved retention outcomes compared to repetitive practice (Benjamin & Tullis, 2010; Magill & Hall, 1990; Varma & Schleisman, 2014).

Research Question 3

Question: Does efficiency of learning differ by practice schedule in terms of time investment per math fact?

Hypothesis: Interleaved practice should be associated with a much better efficiency rate than incremental rehearsal.

In general, incremental rehearsal and interleaved practice were associated with better retention outcomes than repetitive practice, which is consistent with the bulk of the literature available (Joseph, 2006; Magill & Hall, 1990; Rohrer et al., 2014; Rohrer & Taylor, 2007; J. B. Shea & Morgan, 1979; Taylor & Rohrer, 2010; Varma & Schleisman, 2014). The largest difference between the practice conditions emerges in the results from the analysis related to learning efficiency. The mean number of targets answered correctly per 20 minutes of practice was about 1.86 for the repetitive condition, 1.93 for the interleaved condition, and approximately .52 for the incremental rehearsal condition at the immediate posttest, and approximately 1.48, 1.78, and .34, respectively, at the delayed posttest. Ability, as measured by pretest score, also seemed to have an impact on efficiency.
These differences are not artifacts of accuracy alone, as indicated by the models that predict the number of targets answered correctly regardless of time, but are a result of the incremental rehearsal schedule taking approximately 80 minutes longer than the repetitive or interleaved schedules. These results generally follow the predictions made in the hypothesis, with the notable exception that retention for students in the repetitive condition appeared to be closer to retention in the other schedules than might be expected. Three potential explanations for the lack of differentiation between schedules come to mind. First, it may be that the number of practice trials at the posttest was sufficient to retain the target skills regardless of schedule. The results from research question two, which model data collected throughout the experiment, certainly seem to indicate that there is a tangible difference between the three schedules in the accuracy of students during retention trials. Second, practice was delivered over six sessions that were at least a day apart. It may be that there was sufficient distribution of practice to impart some extra retention benefit to students engaged in a repetitive schedule. Third, it may be that the ceiling of three potential correct responses per practice bundle at each pretest did not allow for the demonstration of a substantial difference. There is no precedent in the literature for comparing the efficiency of interleaved and incremental rehearsal practice. The result that interleaved practice is more efficient than incremental rehearsal was expected given that the logistics of incremental rehearsal necessitate a longer schedule. While researchers have examined differing efficiency between different ratios of incremental rehearsal schedules (Swehla et al., 2016), this is the first study that compares efficiency for incremental rehearsal to interleaved practice.
The discovery of the efficiency benefit for interleaved practice over incremental rehearsal is an important addition to the literature as well as being directly applicable for teachers in the classroom.

Limitations

Findings of this study should be interpreted in light of the following limitations. First, the sample recruited for this project was neither diverse nor representative of students from the broader population of interest (mid-elementary). The students who participated were a very specific subset of the general population: African-American students in an urban charter school setting, many of whom were on free and reduced lunch plans. The field would benefit from similar experiments conducted with more diverse participant samples to determine if the findings replicate. Second, dosage was not controlled as tightly as originally planned. Due to the limitations of the technology used, students with interrupted sessions were restarted within the session and had more trials than others. Fortunately, the practice opportunity variable accounted for dosage in the model and allowed for comparison controlling for dosage. Third, it was not possible to run the desired model type for the acquisition data. Technically, clustered data with a binary response should be modeled with logistic mixed-effects regression. However, the logistic mixed-effects models would not converge. Because technical best practices are important (Odenkirk & Ervin, 2000), descriptive statistics were used to provide some insight about the influence of practice schedules rather than reporting suspect models or violating assumptions of independence. A future analysis might define acquisition differently to take advantage of the more appropriate modeling technique. For example, a researcher might model reaction times for acquisition responses and use accuracy as a predictor variable. Finally, this study used single digit addition and multiplication facts as the learning targets.
From the results obtained, how different practice schedules might perform within this specific context is now clearer, but it is unclear whether these findings would generalize to other academic skills. This study is not intended to provide a comprehensive statement on the appropriateness of an interleaved schedule when compared to incremental rehearsal or repetitive practice in all practice situations. It simply provides another piece of evidence that suggests the potential for interleaved practice to help optimize academic temporal resources, particularly in drill and memorization scenarios. Further research is warranted to continue to explore the conditions in which these results hold true, and what mechanisms might drive practice decisions in different scenarios.

Implications for Practice

Within the context of providing practice opportunities for students to learn single digit math facts, the results of this study can be used to derive some clear expectations for practitioners. First, both incremental rehearsal and interleaved practice would be expected to lead to high predicted rates of accuracy over similar timelines. Second, predicted accuracy for those schedules is higher than for repetitive practice. Thus, not all practice is equal. Third, for the same outcome, interleaved practice can be implemented for less than one third of the temporal resources needed for incremental rehearsal. Thus, all things considered, practitioners might strongly consider using interleaved practice to support single-digit math fact learning. As instructional technology is developed to drill math facts, creators can start tailoring tools to maximize retention for a given amount of instructional time. Developers of instructional software, educational games, and intervention tools can leverage the results of this study by organizing practice in a way that interleaves learning targets.
Further, although this project was implemented with a technology component, the same principles can be applied with low-tech instructional tools. A teacher could administer a pretest and, based on the results, give each student a set of three unknown math fact flash cards. Approximately two minutes of individual practice per day would closely replicate the conditions of this study, with a high probability of leading to long-term retention of the target facts.

Implications for Future Research

The study described in this paper can be used as a starting point both for further analyses of the data collected and for future studies. As with most research, this study has highlighted some promising results that require further investigation to fully illuminate the potential costs, benefits, and implications of the constructs examined. Data collected throughout this experiment include reaction times and responses to non-target items, both within incremental rehearsal practice and at pretests and posttests. These data could be used to examine the relation between accuracy and reaction time, with the potential to create a metric for mastery that reflects both. It may also be worth looking at skill transfer by using student responses to non-target items in the pretests and posttests. Does practice of math facts lead to increased accuracy of non-practiced facts that use some of the same numbers? Do skills transfer beyond similar facts? Does number of facts retained at the first posttest predict scores on the next pretest, or future learning rates? How does practice condition influence these relationships? The results described above also create a jumping-off point for future studies. Direct replications with different sample populations or target skills could continue the generalization of interleaved practice research results and perhaps outline boundaries of effectiveness.
Extending research to include different dimensions of interleaving is also a natural avenue for further examination. This study interleaved individual targets within a very specific skill. Could there be a benefit to interleaving between operations or different mathematical processes entirely? A novel line of research could look at the blending of practice schedules. Is it possible that practice that starts repetitive (or incrementally rehearsed) and shifts to interleaved works over a broader range of students? Could student characteristics dictate which schedule or blend of schedules should be used or what dimension is interleaved? Practice schedule manipulation and practice optimization are fertile ground for future follow-up. There is great potential along these lines of inquiry to improve classroom outcomes, both in absolute learning outcomes and in learning efficiency.

Conclusion

The purpose of this study was to compare an interleaved practice schedule to an established practice technique and a dosage control along the dimensions of acquisition accuracy, retention accuracy, and learning efficiency. Results support the use of interleaved practice in single digit math fact drill practice as a method that leads to a high probability of accuracy in retention trials while conserving temporal resources. This study has taken steps to generalize previous findings related to interleaved practice to this specific domain. It has also introduced a novel comparison into the scientific corpus. Future research should continue efforts of generalization and also dive deeper into the mechanisms of interleaving that may allow for further differentiation and utility.

REFERENCES

Anderson, D. (2008). Model based inference in the life sciences: A primer on evidence. New York; London: Springer.
Anderson, D. I., Magill, R. A., & Sekiya, H. (2001). Motor Learning as a Function of KR Schedule and Characteristics of Task-Intrinsic Feedback.
Journal of Motor Behavior, 33(1), 59–66. https://doi.org/10.1080/00222890109601903
Benjamin, A. S., & Tullis, J. (2010). What makes distributed practice effective? Cognitive Psychology, 61(3), 228–247. https://doi.org/10.1016/j.cogpsych.2010.05.004
Birnbaum, M. S., Kornell, N., Bjork, E. L., & Bjork, R. A. (2013). Why Interleaving Enhances Inductive Learning: The Roles of Discrimination and Retrieval. Memory & Cognition, 41(3), 392–402. https://doi.org/10.3758/s13421-012-0272-7
Blandin, Y., Proteau, L., & Alain, C. (1994). On the cognitive processes underlying contextual interference and observational learning. Journal of Motor Behavior, 26(1), 18–26. https://doi.org/10.1080/00222895.1994.9941657
Blasiman, R. N. (2017). Distributed Concept Reviews Improve Exam Performance. Teaching of Psychology, 44(1), 46–50. https://doi.org/10.1177/0098628316677646
Booth, J. L., Cooper, L. A., Donovan, M. S., Huyghe, A., Koedinger, K. R., & Paré-Blagoev, E. J. (2015). Design-Based Research Within the Constraints of Practice: AlgebraByExample. Journal of Education for Students Placed at Risk (JESPAR), 20(1–2), 79–100. https://doi.org/10.1080/10824669.2014.986674
Broadbent, D. P., Causer, J., Ford, P. R., & Williams, A. M. (2014). Contextual Interference Effect in Perceptual-Cognitive Skills Training. Medicine and Science in Sports and Exercise. https://doi.org/10.1249/MSS.0000000000000530
Burnham, K. P., & Anderson, D. R. (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach (2nd ed.). New York: Springer.
Burnham, K. P., Anderson, D. R., & Huyvaert, K. P. (2011). AIC model selection and multimodel inference in behavioral ecology: Some background, observations, and comparisons. Behavioral Ecology and Sociobiology, 65(1), 23–35. Retrieved from http://journals.sagepub.com/doi/10.1177/0888406417730112
Burns, M. K. (2005).
Using Incremental Rehearsal to Increase Fluency of Single-Digit Multiplication Facts With Children Identified as Learning Disabled in Mathematics Computation. Education and Treatment of Children, 28(3), 237–249. Retrieved from https://pantherfile.uwm.edu/dermer/public/courses/620/Articles/peter_platten_first.pdf
Carpenter, S. K., Cepeda, N. J., Rohrer, D., Kang, S. H. K., & Pashler, H. (2012). Using Spacing to Enhance Diverse Forms of Learning: Review of Recent Research and Implications for Instruction. Educational Psychology Review, 24(3), 369–378. https://doi.org/10.1007/s10648-012-9205-z
Carr, M., & Alexeev, N. (2011). Fluency, accuracy, and gender predict developmental trajectories of arithmetic strategies. Journal of Educational Psychology, 103(3), 617–631. https://doi.org/10.1037/a0023864
Carter, C. E., & Grahn, J. A. (2016). Optimizing music learning: Exploring how blocked and interleaved practice schedules affect advanced performance. Frontiers in Psychology, 7(AUG), 1–10. https://doi.org/10.3389/fpsyg.2016.01251
Carvalho, P. F., & Goldstone, R. L. (2014a). Effects of Interleaved and Blocked Study on Delayed Test of Category Learning Generalization. Frontiers in Psychology, 5(AUG). https://doi.org/10.3389/fpsyg.2014.00936
Carvalho, P. F., & Goldstone, R. L. (2014b). Putting category learning in order: Category structure and temporal arrangement affect the benefit of interleaved over blocked study. Memory and Cognition, 42, 481–495. https://doi.org/10.3758/s13421-013-0371-0
Carvalho, P. F., & Goldstone, R. L. (2015). The benefits of interleaved and blocked study: Different tasks benefit from different schedules of study. Psychonomic Bulletin & Review, 22, 281–288.
Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed Practice in Verbal Recall Tasks: A Review and Quantitative Synthesis. Psychological Bulletin, 132(3), 354–380. https://doi.org/10.1037/0033-2909.132.3.354
Codding, R. S., Archer, J., & Connell, J. (2010).
A systematic replication and extension of using incremental rehearsal to improve multiplication skills: An investigation of generalization. Journal of Behavioral Education, 19(1), 93–105. https://doi.org/10.1007/s10864-010-9102-9
National Research Council, Mathematics Learning Study Committee. (2001). Adding It Up. National Academies Press. https://doi.org/10.17226/9822
Cooper, E. H., & Pantle, A. J. (1967). The total-time hypothesis in verbal learning. Psychological Bulletin, 68(4), 221–234.
Desmottes, L., Maillart, C., & Meulemans, T. (2017). Mirror-drawing skill in children with specific language impairment: Improving generalization by incorporating variability into the practice session. Child Neuropsychology, 23(4), 463–482. https://doi.org/10.1080/09297049.2016.1170797
Faraway, J. (2005). Linear models in R, 56(5).
Faraway, J. (2016). Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models (2nd ed.). Boca Raton: CRC Press.
Faul, F., Erdfelder, E., Buchner, A., & Lang, A.-G. (2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41(4), 1149–1160. https://doi.org/10.3758/BRM.41.4.1149
Fishman, E. J., Keller, L., & Atkinson, R. C. (1968). Massed versus distributed practice in computerized spelling drills. Journal of Educational Psychology, 59(4), 290–296. https://doi.org/10.1037/h0020055
Fuchs, L. S., Fuchs, D., & Malone, A. S. (2017). The Taxonomy of Intervention Intensity. Teaching Exceptional Children, 50(1), 35–43. https://doi.org/10.1177/0040059917703962
Geary, D. C. (2005). Role of cognitive theory in the study of learning disability in mathematics. Journal of Learning Disabilities, 38(4), 305–307. https://doi.org/10.1177/00222194050380040401
Gersten, R., Beckmann, S., Clarke, B., Foegen, A., Marsh, L., Star, J. R., & Witzel, B. (2009). Assisting Students Struggling with Mathematics: Response to Intervention (RtI) for elementary and middle schools.
What Works Clearinghouse, 1–98. https://doi.org/10.1016/j.jhazmat.2011.04.026
Gettinger, M., Bryant, N. D., & Fayne, H. R. (1982). Designing Spelling Instruction for Learning-Disabled Children: An Emphasis on Unit Size, Distributed Practice, and Training for Transfer. The Journal of Special Education, 16(4), 439–448. https://doi.org/10.1177/002246698201600407
Guadagnoli, M. A., & Lee, T. D. (2004). Challenge Point: A Framework for Conceptualizing the Effects of Various Practice Conditions in Motor Learning. Journal of Motor Behavior, 36(2), 212–224. https://doi.org/10.3200/JMBR.36.2.212-224
Hausman, H., & Kornell, N. (2014). Mixing topics while studying does not enhance learning. Journal of Applied Research in Memory and Cognition, 3(3), 153–160. https://doi.org/10.1016/j.jarmac.2014.03.003
Healy, A. F., Kole, J. A., & Bourne, L. E. (2014). Training principles to advance expertise. Frontiers in Psychology. https://doi.org/10.3389/fpsyg.2014.00131
Hilbe, J. M. (2009). Logistic Regression Models (1st ed.). New York: Chapman and Hall/CRC. https://doi.org/10.1201/9781420075779
Joseph, L. M. (2006). Incremental Rehearsal: A Flashcard Drill Technique for Increasing Retention of Reading Words. The Reading Teacher, 59(8), 803–807. https://doi.org/10.1598/RT.59.8.8
Kang, S. H. K., & Pashler, H. (2012). Learning Painting Styles: Spacing is Advantageous when it Promotes Discriminative Contrast. Applied Cognitive Psychology, 103(May 2011), 97–103.
Kornell, N., & Bjork, R. A. (2008). Learning concepts and categories: Is spacing the “enemy of induction”? Psychological Science, 19(6), 585–592. https://doi.org/10.1111/j.1467-9280.2008.02127.x
Kornell, N., Castel, A. D., Eich, T. S., & Bjork, R. A. (2010). Spacing as the friend of both memory and induction in young and older adults. Psychology and Aging, 25(2), 498–503. https://doi.org/10.1037/a0017807
Kulasegaram, K., Min, C., Howey, E., Neville, A., Woods, N., Dore, K., & Norman, G. (2015).
The mediating effect of context variation in mixed practice for transfer of basic science. Advances in Health Sciences Education : Theory and Practice, 20(4), 953–968. https://doi.org/10.1007/s10459-014-9574-9 105 Landin, D., & Hebert, E. P. (1997). Comparison of Three Practice Schedules along the Contextual Interference Continuum. Research Quarterly for Exercise and Sport, 68(4), 357–361. https://doi.org/10.1080/02701367.1997.10608017 Lee, T. D., & Magill, R. a. (1983). The Locus of Contextual Interference in Motor-Skill Acquisition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9(4), 730–746. https://doi.org/10.1037//0278-7393.9.4.730 Long, J. D. (2012). Longitudinal Data Analysis for the Behavioral Sciences Using R. Los Angeles: Sage. MacQuarrie, L. L., Tucker, J. a., Burns, M. K., & Hartman, B. (2002). Comparison of Retntion Rates Using Traditional, Drill Sanwich, and Incremental Rehearsal Flash Card Methods. School Psychology Review. Magill, R. A., & Hall, K. G. (1990). A Review of the Contextual Interference Effect in Motor Skill Acquisition. Human Movement Science, 9, 241–289. Mitchell, C., Nash, S., & Hall, G. (2008). The intermixed-blocked effect in human perceptual learning is not the consequence of trial spacing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(1), 237–242. https://doi.org/10.1037/0278-7393.34.1.237 Morehead, K., Rhodes, M. G., & DeLozier, S. (2016). Instructor and student knowledge of study strategies. Memory, 24(2), 257–271. https://doi.org/10.1080/09658211.2014.1001992 106 National Mathematics Advisory Panel. (2008). The Final Report of the National Mathematics Advisory Panel. Foundations, 37(9), 595–601. https://doi.org/10.3102/0013189X08329195 Odenkirk, B. (Writer), & Ervin, M. (Director). (2000, April 2). How Hermes Requisitioned His Groove Back. [Television Series Episode] In D. Cohen, M. Groening & C. Katz (Producers), Futurama. Los Angeles. 
Ostrow, K., Heffernan, N., Heffernan, C., & Peterson, Z. (2015). Blocking vs. Interleaving: Examining Single-Session Effects Within Middle School Math Homework. In International Conference on Artificial Intelligence in Education (pp. 338–347). https://doi.org/10.1007/978-3-319-19773-9 Pashler, H., Bain, P. M., Bottge, B. A., Graesser, A., Koedinger, K., McDaniel, M., & Metcalfe, J. (2007). Organizing Instruction and Study to Improve Student Learning. US Department of Education National Center for Education Research. Pashler, H., Rohrer, D., Cepeda, N., & Carpenter, S. (2007). Enhancing learning and retarding forgetting : Choices and consequences. Psychonomic Bulletin & Review, 14(2), 187–193. Pollatou, E., Kioumourtzoglou, E., Agelousis, N., & Mavromatis, G. (1997). Contextual Interference Effects in Learning Novel Motor Skills. Perceptual and Motor Skills, 84, 487–496. Powell, S. R., Fuchs, L. S., & Fuchs, D. (2013). Reaching the mountaintop: Addressing the common core standards in mathematics for students with mathematics 107 difficulties. Learning Disabilities Research and Practice, 28(1), 38–48. https://doi.org/10.1111/ldrp.12001 Rau, M. A., Aleven, V., & Rummel, N. (2013). Interleaved practice in multi-dimensional learning tasks: Which dimension should we interleave? Learning and Instruction, 23(1), 98–114. https://doi.org/10.1016/j.learninstruc.2012.07.003 Rau, M., Aleven, V., & Rummel, N. (2010). How to Schedule Multiple Graphical Representations? A Classroom Experiment with an Intelligent Tutoring System for Fractions. In Intelligent Tutoring Systems (pp. 413–422). Rau, M., Aleven, V., & Rummel, N. (2013). How to use multiple graphical representations to support conceptual learning? research-based principles in the fractions tutor. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 7926 LNAI, 762–765. 
https://doi.org/10.1007/978-3-642-39112-5-107 Rau, M., Aleven, V., Rummel, N., & Pardos, Z. (2014). How Should Intelligent Tutoring Systems Sequence Multiple Graphical Representations of Fractions? A Multi- Methods Study. International Journal of Artificial Intelligence in Education, 24(2), 125–161. https://doi.org/10.1007/s40593-013-0011-7 Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods. SAGE Publications. Retrieved from https://books.google.com/books?id=uyCV0CNGDLQC&pgis=1 Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchex, J.-C., & Muller, M. 108 (2011). pROC : an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. Rohrer, D. (2012). Interleaving Helps Students Distinguish among Similar Concepts. Educational Psychology Review, 24(3), 355–367. https://doi.org/10.1007/s10648- 012-9201-3 Rohrer, D., Dedrick, R., & Burgess, K. (2014). The benefit of interleaved mathematics practice is not limited to superficially similar kinds of problems. Psychonomic Bulletin & Review. Retrieved from http://link.springer.com/article/10.3758/s13423- 014-0588-3 Rohrer, D., Dedrick, R. F., & Stershic, S. (2015). Interleaved practice improves mathematics learning. Journal of Educational Psychology, 107(3), 900–908. https://doi.org/10.1037/edu0000001 Rohrer, D., & Taylor, K. (2007). The shuffling of mathematics problems improves learning. Instructional Science, 35(6), 481–498. https://doi.org/10.1007/s11251-007- 9015-8 Sana, F., Yan, V. X., & Kim, J. A. (2017). Study sequence matters for the inductive learning of cognitive concepts. Journal of Educational Psychology, 109(1), 84–98. https://doi.org/10.1037/edu0000119 Schmidt, R. A., & Lee, T. D. (2011). Motor control and learning : a behavioral emphasis. Human Kinetics. 109 Schutte, G. M., Duhon, G. J., Solomon, B. G., Poncy, B. C., Moore, K., & Story, B. (2015). 
A Comparative Analysis of Massed vs Distributed Practice on Basic Math Fact Fluency Growth Rates. Journal of School Psychology, 53, 149–159. Shea, C. H., Kohl, R., & Indermill, C. (1990). Contextual interference: Contributions of practice. Acta Psychologica, 73(2), 145–157. https://doi.org/10.1016/0001- 6918(90)90076-R Shea, J. B., & Morgan, R. L. (1979). Contextual interference effects on the acquisition, retention, and transfer of a motor skill. Journal of Experimental Psychology: Human Learning & Memory, 5(2), 179–187. https://doi.org/10.1037//0278-7393.5.2.179 Simon, D. A., Lee, T. D., & Cullen, J. D. (2008). Win-shift, lose-stay: contingent switching and contextual interference in motor learning. Percept Mot Skills, 107(2), 407–418. https://doi.org/10.2466/pms.107.2.407-418 Sorensen, L. J., & Woltz, D. J. (2016). Blocking as a friend of induction in verbal category learning. MEMORY & COGNITION, 44(7), 1000–1013. https://doi.org/10.3758/s13421-016-0615-x Spybrook, J., Bloom, H., Congdon, R., Hill, C., Martinez, a, & Raudenbush, S. (2011). Optimal design for longitudinal and multilevel research: Documentation for the “Optimal Design” software. Survey Research …, 1–215. https://doi.org/10.1037/h0065543 Stambaugh, L. a. (2011). When Repetition Isn’t the Best Practice Strategy: Effects of Blocked and Random Practice Schedules. Journal of Research in Music Education, 110 58(4), 368–383. https://doi.org/10.1177/0022429410385945 Ste-Marie, D. M., Clark, S. E., Findlay, L. C., & Latimer, A. E. (2004). High levels of contextual interference enhance handwriting skill acquisition. Journal of Motor Behavior, 36(1), 115–126. https://doi.org/10.3200/JMBR.36.1.115-126 Swehla, J., Burns, M., Zaslofsky, A., Hall, M., Varma, S., & Volpe, R. (2016). Examining the Use of Spacing Effect to Increase the Efficiency of Incremental Rehearsal. Psychology in the Schools, 53(4), 404–415. https://doi.org/10.1002/pits.21909 Taylor, K., & Rohrer, D. (2010). 
The Effects of Interleaved Practice. Applied Cognitive Psychology, 848(July 2009), 837–848. https://doi.org/10.1002/acp Underwood, B. J. (1970). A breakdown of the total-time law in free-recall learning. Journal of Verbal Learning and Verbal Behavior, 9(5), 573–580. https://doi.org/10.1016/S0022-5371(70)80104-9 Vakil, E., & Heled, E. (2016). The effect of constant versus varied training on transfer in a cognitive skill learning task: The case of the Tower of Hanoi Puzzle. Learning and Individual Differences, 47, 207–214. https://doi.org/10.1016/j.lindif.2016.02.009 Varma, S., & Schleisman, K. B. (2014). The Cognitive Underpinnings of Incremental Rehearsal. School Psychology Review, 43(2), 222–228. What Works Clearinghouse. (2017). Standards Handbook (4th ed.). Princeton, NJ. Retrieved from http://ies.ed.gov/ncee/wwc 111 Wulf, G., & Schmidt, R. a. (1997). Variability of practice and implicit motor learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23(4), 987–1006. https://doi.org/10.1037//0278-7393.23.4.987 Zulkiply, N., & Burt, J. S. (2013). The exemplar interleaving effect in inductive learning: moderation by the difficulty of category discriminations. Memory & Cognition, 41(1), 16–27. 
https://doi.org/10.3758/s13421-012-0238-9 112 Appendices Appendix A: Candidate Models for Research Question 2 Table A1 Candidate Models for Research Question 2 Model ?̂?10 = 𝛾00 + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝐼𝐿) + 𝛾40(𝐼𝑅) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝐼𝐿) + 𝛾40(𝐼𝑅) + 𝛾50(𝐼𝐿 𝑥 𝑃𝑂) + 𝛾60(𝐼𝑅 𝑥 𝑃𝑂) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝐼𝐿) + 𝛾40(𝐼𝑅) + 𝛾50(𝐼𝐿 𝑥 𝑃𝑂) + 𝛾60(𝐼𝑅 𝑥 𝑃𝑂) + 𝛾70(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒)+ 𝛾80(𝐼𝐿 𝑥 𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒) + 𝛾90(𝐼𝑅 𝑥 𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝐼𝐿) + 𝛾40(𝐼𝑅) + 𝛾50(𝐼𝐿 𝑥 𝑃𝑂) + 𝛾60(𝐼𝑅 𝑥 𝑃𝑂) + 𝛾70(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝐼𝐿) + 𝛾40(𝐼𝑅) + 𝛾50(𝐼𝐿 𝑥 𝑃𝑂) + 𝛾60(𝐼𝑅 𝑥 𝑃𝑂) + 𝛾70(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒)+ 𝛾80(𝐼𝐿 𝑥 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒 𝑥 𝑃𝑂) + 𝛾90(𝐼𝑅 𝑥 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒 𝑥 𝑃𝑂) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝐼𝐿) + 𝛾40(𝐼𝑅) + 𝛾50(𝐼𝐿 𝑥 𝑃𝑂) + 𝛾60(𝐼𝑅 𝑥 𝑃𝑂) + 𝛾70(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒)+ 𝛾80(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒) + 𝛾90(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑥 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝐼𝐿) + 𝛾40(𝐼𝑅) + 𝛾50(𝐼𝐿 𝑥 𝑃𝑂) + 𝛾60(𝐼𝑅 𝑥 𝑃𝑂) + 𝛾70(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒)+ 𝛾80(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒) + 𝛾90(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑥 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒)+ 𝛾10 0(𝐼𝐿 𝑥 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒 𝑥 𝑃𝑂) + 𝛾11 0(𝐼𝑅 𝑥 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒 𝑥 𝑃𝑂)+ 𝛾12 0(𝐼𝐿 𝑥 𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑥 𝑃𝑂) + 𝛾13 0(𝐼𝑅 𝑥 𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑥 𝑃𝑂) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝑢0𝑗 + 𝑟𝑖𝑗 113 Table A1 (continued) Model ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40(𝐼𝐿) + 𝛾50(𝐼𝑅) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40(𝐼𝐿) + 𝛾50(𝐼𝑅) + 𝛾60(𝐼𝐿 𝑥 𝑃𝑂 2) + 𝛾70(𝐼𝑅 𝑥 𝑃𝑂 2) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 
𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40(𝐼𝐿) + 𝛾50(𝐼𝑅) + 𝛾60(𝐼𝐿 𝑥 𝑃𝑂 2) + 𝛾70(𝐼𝑅 𝑥 𝑃𝑂 2) + 𝛾80(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒)+ 𝛾90(𝐼𝐿 𝑥 𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒) + 𝛾10 0(𝐼𝑅 𝑥 𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40(𝐼𝐿) + 𝛾50(𝐼𝑅) + 𝛾60(𝐼𝐿 𝑥 𝑃𝑂 2) + 𝛾70(𝐼𝑅 𝑥 𝑃𝑂 2) + 𝛾80(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40(𝐼𝐿) + 𝛾50(𝐼𝑅) + 𝛾60(𝐼𝐿 𝑥 𝑃𝑂 2) + 𝛾70(𝐼𝑅 𝑥 𝑃𝑂 2) + 𝛾80(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒)+ 𝛾90(𝐼𝐿 𝑥 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒 𝑥 𝑃𝑂 2) + 𝛾10 0(𝐼𝑅 𝑥 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒 𝑥 𝑃𝑂 2) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40(𝐼𝐿) + 𝛾50(𝐼𝑅) + 𝛾60(𝐼𝐿 𝑥 𝑃𝑂 2) + 𝛾70(𝐼𝑅 𝑥 𝑃𝑂 2) + 𝛾80(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒)+ 𝛾90(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒) + 𝛾10 0(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑥 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40(𝐼𝐿) + 𝛾50(𝐼𝑅) + 𝛾60(𝐼𝐿 𝑥 𝑃𝑂 2) + 𝛾70(𝐼𝑅 𝑥 𝑃𝑂 2) + 𝛾80(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒)+ 𝛾90(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒) + 𝛾10 0(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑥 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒)+ 𝛾11 0(𝐼𝐿 𝑥 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒 𝑥 𝑃𝑂 2) + 𝛾12 0(𝐼𝑅 𝑥 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒 𝑥𝑃𝑂 2)+ 𝛾13 0(𝐼𝐿 𝑥 𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑥 𝑃𝑂 2) + 𝛾14 0(𝐼𝑅 𝑥 𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑥 𝑃𝑂 2) + 𝑢0𝑗 + 𝑟𝑖𝑗 114 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 3) + 𝑢0𝑗 + 𝑟𝑖𝑗 Table A1 (continued) Model ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 3) + 𝛾30(𝐼𝐿) + 𝛾40(𝐼𝑅) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 3) + 𝛾50(𝐼𝐿) + 𝛾60(𝐼𝑅) + 𝛾70(𝐼𝐿 𝑥 𝑃𝑂 3) + 𝛾80(𝐼𝑅 𝑥 𝑃𝑂 3) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 3) + 𝛾50(𝐼𝐿) + 𝛾60(𝐼𝑅) + 𝛾70(𝐼𝐿 𝑥 𝑃𝑂 3) + 𝛾80(𝐼𝑅 𝑥 𝑃𝑂 3) + 𝛾90(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒)+ 𝛾10 0(𝐼𝐿 𝑥 𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒) + 𝛾11 0(𝐼𝑅 𝑥 𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 
𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 3) + 𝛾50(𝐼𝐿) + 𝛾60(𝐼𝑅) + 𝛾70(𝐼𝐿 𝑥 𝑃𝑂 3) + 𝛾80(𝐼𝑅 𝑥 𝑃𝑂 3) + 𝛾90(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 3) + 𝛾50(𝐼𝐿) + 𝛾60(𝐼𝑅) + 𝛾70(𝐼𝐿 𝑥 𝑃𝑂 3) + 𝛾80(𝐼𝑅 𝑥 𝑃𝑂 3) + 𝛾90(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒)+ 𝛾10 0(𝐼𝐿 𝑥 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒 𝑥 𝑃𝑂 3) + 𝛾110(𝐼𝑅 𝑥 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒 𝑥 𝑃𝑂 3) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 3) + 𝛾50(𝐼𝐿) + 𝛾60(𝐼𝑅) + 𝛾70(𝐼𝐿 𝑥 𝑃𝑂 3) + 𝛾80(𝐼𝑅 𝑥 𝑃𝑂 3) + 𝛾90(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒)+ 𝛾10 0(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒) + 𝛾11 0(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑥 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 3) + 𝛾50(𝐼𝐿) + 𝛾60(𝐼𝑅) + 𝛾70(𝐼𝐿 𝑥 𝑃𝑂 3) + 𝛾80(𝐼𝑅 𝑥 𝑃𝑂 3) + 𝛾90(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒)+ 𝛾10 0(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒) + 𝛾11 0(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑥 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒)+ 𝛾12 0(𝐼𝐿 𝑥 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒 𝑥 𝑃𝑂 3) + 𝛾13 0(𝐼𝑅 𝑥 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒 𝑥𝑃𝑂 2)+ 𝛾14 0(𝐼𝐿 𝑥 𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑥 𝑃𝑂 3) + 𝛾15 0(𝐼𝑅 𝑥 𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑥 𝑃𝑂 3) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒) + 𝑢0𝑗 + 𝑟𝑖𝑗 115 Table A1 (continued) Model ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒)+ 𝛾40(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒 𝑥 𝑃𝑂) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒)+ 𝛾40(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒) + 𝛾50(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑥 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30( 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒 𝑥 𝑃𝑂)+ 𝛾40(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑥 𝑃𝑂) ?̂?𝑖𝑗 = (𝛾00 + 𝑢10) + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾30(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒) 
?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒)+ 𝛾50(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒 𝑥 𝑃𝑂 2) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒)+ 𝛾50(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒) + 𝛾60(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑥 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40( 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒 𝑥 𝑃𝑂 2)+ 𝛾50(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑥 𝑃𝑂 2) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾50(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒) + 𝑢0𝑗 + 𝑟𝑖𝑗 116 Table A1 (continued) Model ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 3) + 𝛾30(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 3) + 𝛾50(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒)+ 𝛾60(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒 𝑥 𝑃𝑂 2) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 3) + 𝛾50(𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒)+ 𝛾60(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒) + 𝛾70(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑥 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒) + 𝑢0𝑗 + 𝑟𝑖𝑗 ?̂?𝑖𝑗 = 𝛾00 + 𝛾10(𝐷𝑎𝑦) + 𝛾20(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠) + 𝛾30(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 2) + 𝛾40(𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑂𝑝𝑝𝑜𝑟𝑡𝑢𝑛𝑖𝑡𝑖𝑒𝑠 3) + 𝛾50( 𝑃𝑟𝑒𝑡𝑒𝑠𝑡 𝑆𝑐𝑜𝑟𝑒 𝑥 𝑃𝑂 2)+ 𝛾60(𝐷𝑎𝑦𝑠 𝑆𝑖𝑛𝑐𝑒 𝑃𝑟𝑎𝑐𝑡𝑖𝑐𝑒 𝑥 𝑃𝑂 2) + 𝑢0𝑗 + 𝑟𝑖𝑗
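To make the structure of the candidate models concrete, the sketch below evaluates the fixed part of one model from Table A1, the linear model that adds the IL × PO and IR × PO interactions. It assumes that IL and IR are 0/1 indicator variables for interleaved practice and incremental rehearsal, with repetitive practice (labeled "RP" here) as the reference condition; the table itself does not state the coding. The class name, function names, and every coefficient value are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
from dataclasses import dataclass


@dataclass
class Coefs:
    """Hypothetical fixed-effect coefficients (illustrative values only)."""
    g00: float  # intercept
    g10: float  # Day
    g20: float  # Practice Opportunities (PO)
    g30: float  # IL indicator
    g40: float  # IR indicator
    g50: float  # IL x PO interaction
    g60: float  # IR x PO interaction


def indicators(condition: str) -> tuple[float, float]:
    """Return (IL, IR) under the ASSUMED dummy coding with RP as reference."""
    if condition not in ("RP", "IL", "IR"):
        raise ValueError(f"unknown condition: {condition}")
    return (1.0 if condition == "IL" else 0.0,
            1.0 if condition == "IR" else 0.0)


def predict(c: Coefs, day: float, po: float, condition: str,
            u0j: float = 0.0) -> float:
    """Linear predictor: fixed part plus an optional student intercept u0j."""
    il, ir = indicators(condition)
    return (c.g00 + c.g10 * day + c.g20 * po
            + c.g30 * il + c.g40 * ir
            + c.g50 * il * po + c.g60 * ir * po
            + u0j)


def po_slope(c: Coefs, condition: str) -> float:
    """Change in the predictor per additional practice opportunity."""
    il, ir = indicators(condition)
    return c.g20 + c.g50 * il + c.g60 * ir


# Made-up coefficients, used only to show how the terms combine.
demo = Coefs(g00=0.5, g10=0.01, g20=0.1, g30=0.2, g40=-0.1, g50=0.05, g60=0.02)

for cond in ("RP", "IL", "IR"):
    print(cond, predict(demo, day=3, po=2, condition=cond), po_slope(demo, cond))
```

Under this coding, the interaction coefficients are differences in the practice-opportunity slope relative to the reference condition, which is why `po_slope` adds the IL × PO or IR × PO coefficient to the Practice Opportunities coefficient only when the matching indicator is 1.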