Bayesian Hierarchical Modeling based on Multi-Source Exchangeability A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Alexander Mark Kaizer IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy Advised by Joseph S. Koopmeiners, Ph.D. July, 2017 c© Alexander Mark Kaizer 2017 ALL RIGHTS RESERVED Acknowledgements I have been extremely fortunate to have worked with so many professionals and mentors who have inspired me and helped me during my academic development. I am eternally grateful to my thesis adviser, Joe Koopmeiners, who has served as an excellent role model, provided a seemingly never ending supply of patience as I learned how to be a biostatis- tician, and was always willing to help me connect the dots when things did not seem to make sense. I am also extremely appreciative of all the support and insights from Brian Hobbs which helped to make this dissertation possible from weekly conference calls in Joe’s office to extensive email chains. Additionally, my love for collaborative work with medical professionals would not have been possible without the guidance and encouragement of Kyle Rudser. I am very grateful to my committee members, Brad Carlin, Haitao Chu, and Julia Drew for taking the time to review my thesis and for asking questions which challenged me, but ultimately led to a stronger dissertation. i Dedication This thesis is dedicated to my family who have supported me from both near and far, and to my housemates who have become a second family here in Minnesota. ii Abstract Progress in medical practice traditionally takes place over a sequence of clinical stud- ies which are designed to establish clinical efficacy, identify the safety profile, and seek regulatory approval for a novel treatment strategy. These trials in humans can be ex- pensive and present numerous challenges in their implementation. While some challenges may be addressed by the development of innovative trial designs, it may also be advan- tageous to incorporate supplemental sources of information, which are typically ignored in traditional approaches to analysis. In this dissertation, we introduce Multi-Source Ex- changeability Models (MEMs), a general Bayesian hierarchical approach that integrates supplemental data arising from multiple, possibly non-exchangeable, sources into the anal- ysis of a primary source. We first describe the proposed framework and prove some desir- able asymptotic properties that show the consistency of posterior estimation. Simulation results illustrate that MEMs incorporate more supplemental information in the presence of homogeneous supplemental sources and exhibit reduced bias in the presence of heteroge- neous supplemental sources relative to competing Bayesian hierarchical modeling strategies. Next, we illustrate how MEMs can be used to design a more efficient sequential platform de- sign for Ebola virus disease by sharing information across trial segments. When compared to the standard platform design, we demonstrate that MEMs with adaptive randomization improved power by as much as 51% with limited type-I error inflation. We conclude by extending our work with model averaging to the estimation of multiple mixture distribu- tions in the presence of a hypothesized biological relationship between groups to identify non-compliance in a regulatory tobacco clinical trial. The results of this dissertation illus- trate that MEMs yield favorable characteristics across a variety of scenarios and motivates further research to extend the MEM framework to other settings, as well. iii Contents Acknowledgements i Dedication ii Abstract iii List of Tables viii List of Figures xiii 1 Introduction 1 1.1 Incorporating supplemental sources of data into a primary source . . . . . . 1 1.1.1 Previous approaches to incorporating supplemental sources of infor- mation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 Motivating examples and plan of dissertation . . . . . . . . . . . . . . . . . 3 1.2.1 A regulatory tobacco clinical trial . . . . . . . . . . . . . . . . . . . 3 1.2.2 An adaptive platform trial design for emerging infectious disease epi- demics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.2.3 Dissertation objectives . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2 Bayesian Hierarchical Modeling based on Multi-Source Exchangeability 8 2.1 Multi-source Exchangeability Models . . . . . . . . . . . . . . . . . . . . . . 8 2.2 Estimation and theoretical results . . . . . . . . . . . . . . . . . . . . . . . 12 2.2.1 Specification of model-specific prior weights . . . . . . . . . . . . . . 13 2.2.2 Asymptotic properties . . . . . . . . . . . . . . . . . . . . . . . . . . 15 iv 2.3 Simulation studies to establish small sample properties . . . . . . . . . . . . 18 2.3.1 Simulation design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.3.2 Dynamic borrowing . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.3.3 Bias and coverage . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.4 Application to a regulatory tobacco clinical trial . . . . . . . . . . . . . . . 22 2.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3 A Multi-source Adaptive Platform Design for Emerging Infectious Dis- eases 30 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 3.2 Standard design of PREVAIL II master protocol . . . . . . . . . . . . . . . 32 3.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3.3.1 General framework of multi-source adaptive designs . . . . . . . . . 35 3.3.2 Incorporating supplemental information with Bayesian modeling us- ing MEMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 3.3.3 MEM prior probability specification . . . . . . . . . . . . . . . . . . 40 3.4 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 3.4.1 Parameter calibration . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.4.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 4 A Fully Bayesian Mixture Model Approach for Identifying Non-Compliance in a Regulatory Tobacco Clinical Trial 54 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 4.2 Methods for estimating compliance within randomized groups . . . . . . . . 57 4.2.1 Prior distributions and posterior inference . . . . . . . . . . . . . . . 59 4.2.2 Model averaging with RJMCMC . . . . . . . . . . . . . . . . . . . . 61 4.3 Simulation studies to establish small sample properties . . . . . . . . . . . . 64 4.4 Application to a regulatory tobacco clinical trial . . . . . . . . . . . . . . . 74 4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 v 5 Conclusion 84 5.1 Summary of developments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 5.2 Significance of the work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 5.3 Future work and considerations . . . . . . . . . . . . . . . . . . . . . . . . . 86 Bibliography 88 Appendix A. Proof of theorem from Chapter 2 95 A.1 Proofs for Convergence of Model Weights . . . . . . . . . . . . . . . . . . . 95 A.1.1 Convergence of the marginal likelihoods . . . . . . . . . . . . . . . . 95 A.1.2 Convergence of the ratio of marginal likelihoods . . . . . . . . . . . . 97 A.1.3 Convergence of the priors . . . . . . . . . . . . . . . . . . . . . . . . 98 Appendix B. Additional simulation results for Chapter 2 100 B.1 Simulation Operating Characteristics . . . . . . . . . . . . . . . . . . . . . . 100 Appendix C. Additional simulation results for Chapter 3 104 C.1 Additional simulation results for proposed multi-source adaptive platform design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 C.1.1 Both PP thresholds . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 C.1.2 RR=0.5 results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 C.1.3 Exploring different parameter values . . . . . . . . . . . . . . . . . . 109 C.1.4 Two effective therapeutics . . . . . . . . . . . . . . . . . . . . . . . . 113 Appendix D. R Code for Chapter 4 116 D.1 Functions to implement Gibbs samplers without model averaging . . . . . . 117 D.1.1 Gibbs sampler for approach not assuming a relationship . . . . . . . 117 D.1.2 Gibbs sampler for approach assuming a relationship . . . . . . . . . 119 D.1.3 Implement Gibbs samplers for posterior estimation . . . . . . . . . . 121 D.2 Reversible-jump MCMC algorithm functions . . . . . . . . . . . . . . . . . . 125 D.2.1 Gibbs sampler to flexibly estimate with either approach for RJMCMC125 D.2.2 RJMCMC update step . . . . . . . . . . . . . . . . . . . . . . . . . . 128 D.2.3 Function keep track of RJMCMC model state . . . . . . . . . . . . . 131 vi D.2.4 Implement entire RJMCMC algorithm . . . . . . . . . . . . . . . . . 131 Appendix E. Acronyms 135 vii List of Tables 2.1 Summary statistics for change in cigarettes smoked daily since baseline by group and posterior model estimates and ESSS for no borrowing, MEM pie, MEM pin, CP, and SHM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.1 Posterior model weights for each multi-source exchangeability under various priors to demonstrate the impact different priors can have on the resulting weights. P represents the current segments, Sh represents the supplemental segments potentially available for incorporation (h = 1, 2, 3). . . . . . . . . . 41 3.2 Posterior probability thresholds for MEMs with empirical Bayesian prior (piEB10), MEMs with fully Bayesian uniform prior (pie), and the naive pool- ing (POOL) approach presented to optimize to maintain average type-1 error rate across scenarios for constant underlying mortality scenario only (“Constant”) or to minimize trade-off in inflation of type-1 error in constant underlying mortality scenario while maintaining power equal to, or greater, than the PREVAIL II scenario under the varying underlying mortality sce- nario (“Both”). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 viii 3.3 Operating characteristics and trial properties for the utilized platform de- sign as well as alternative adaptive platform designs. 25,000 simulations for the constant underlying mortality case (p = 0.4 for all segments) with RR=0.7 for non-null segments for the PREVAIL II (P-II) master protocol; MEMs incorporating adaptive randomization with the constrained empiri- cal Bayes, c = .10 prior (piEB10) and the fully Bayesian uniform prior (pie); and the naive pooling (POOL) of all supplemental information incorporat- ing adaptive randomization using posterior probability thresholds optimized for the constant mortality case. Results provided for power/type-I error for each segment, average (sd) total sample size (N) across entire trial, aver- age (sd) proportion allocated to treatment arm in segments 2-5, and average (sd) proportion surviving in the non-null segments (for Trt=S2-S5) or across segments 2-5 (for Trt=S0). . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 3.4 Operating characteristics and trial properties for the utilized platform de- sign as well as alternative adaptive platform designs. 25,000 simulations for the varying underlying mortality case (p = (0.74, 0.61, 0.48, 0.36, 0.23) for segments 1-5, respectively) with RR=0.7 for non-null segments for the PREVAIL II (P-II) master protocol; MEMs incorporating adaptive random- ization with the constrained empirical Bayes, c = .10 prior (piEB10) and the fully Bayesian uniform prior (pie); and the naive pooling (POOL) of all sup- plemental information incorporating adaptive randomization using posterior probability thresholds optimized for the constant mortality case. Results provided for power/type-I error for each segment, average (sd) total sample size (N) across entire trial, average (sd) proportion allocated to treatment arm in segments 2-5, and average (sd) proportion surviving in the non-null segments (for Trt=S2-S5) or across segments 2-5 (for Trt=S0). . . . . . . . 50 ix 4.1 Scenario 1 simulation results where all dose-levels follow the assumed rela- tionship. Bias and MSE reported for proportion compliant (pˆc), compliant mean (µˆ1), non-compliant mean(µˆ2), shared standard deviation (σˆ), and the estimated 95th percentile for exp(µ1) for IND model, REL model, and model averaging between the two approaches with equal model priors (RJ) and priors favoring IND (RJ95). Estimated standard deviation (SD*) pro- vided for the IND approach with the ratio of the SDSDIND for REL, RJ, and RJ95. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 4.2 Scenario 2 simulation results where dose-level 4 does not follow the assumed relationship. Bias and MSE reported for proportion compliant (pˆc), com- pliant mean (µˆ1), non-compliant mean(µˆ2), shared standard deviation (σˆ), and the estimated 95th percentile for exp(µ1) for IND model, REL model, and model averaging between the two approaches with equal model priors (RJ) and priors favoring IND (RJ95). Estimated standard deviation (SD*) provided for the IND approach with the ratio of the SDSDIND for REL, RJ, and RJ95. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 4.3 Scenario 3 simulation results where dose-levels 3 and 4 do not follow the assumed relationship. Bias and MSE reported for proportion compliant (pˆc), compliant mean (µˆ1), non-compliant mean(µˆ2), shared standard deviation (σˆ), and the estimated 95th percentile for exp(µ1) for IND model, REL model, and model averaging between the two approaches with equal model priors (RJ) and priors favoring IND (RJ95). Estimated standard deviation (SD*) provided for the IND approach with the ratio of the SDSDIND for REL, RJ, and RJ95. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 x 4.4 Scenario 4 simulation results where dose-levels 2, 3, and 4 do not follow the assumed relationship. Bias and MSE reported for proportion compliant (pˆc), compliant mean (µˆ1), non-compliant mean(µˆ2), shared standard deviation (σˆ), and the estimated 95th percentile for exp(µ1) for IND model, REL model, and model averaging between the two approaches with equal model priors (RJ) and priors favoring IND (RJ95). Estimated standard deviation (SD*) provided for the IND approach with the ratio of the SDSDIND for REL, RJ, and RJ95. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 4.5 Scenario 5 simulation results where no dose-level follows the assumed rela- tionship. Bias and MSE reported for proportion compliant (pˆc), compliant mean (µˆ1), non-compliant mean(µˆ2), shared standard deviation (σˆ), and the estimated 95th percentile for exp(µ1) for IND model, REL model, and model averaging between the two approaches with equal model priors (RJ) and priors favoring IND (RJ95). Estimated standard deviation (SD*) pro- vided for the IND approach with the ratio of the SDSDIND for REL, RJ, and RJ95. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 4.6 Scenario 6 simulation results the dose-level relationship underestimates the compliant means. Bias and MSE reported for proportion compliant (pˆc), compliant mean (µˆ1), non-compliant mean(µˆ2), shared standard deviation (σˆ), and the estimated 95th percentile for exp(µ1) for IND model, REL model, and model averaging between the two approaches with equal model priors (RJ) and priors favoring IND (RJ95). Estimated standard deviation (SD*) provided for the IND approach with the ratio of the SDSDIND for REL, RJ, and RJ95. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 4.7 Summary statistics at week 6 by CENIC-p1 groups for all subjects and restricted to those self-reporting compliance at week 6. Mean (sd) for con- tinuous measures, N (%) for categorical measures. . . . . . . . . . . . . . . 75 4.8 Mean (95% HPD interval) of proportion in compliant group (pˆc), compliant component mean log(TNE) (µˆ1), non-compliant component mean log(TNE) (µˆ2), shared standard deviation (σˆ), 90/95th percentile of µˆ1, and proportion of the chain spent in the REL approach for the RJMCMC. . . . . . . . . . 79 xi C.1 Constant underlying mortality scenario results using posterior probability thresholds calibrated for both constant and varying mortality cases with RR=0.7. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 C.2 Varying underlying mortality scenario results using posterior probability thresholds calibrated for both constant and varying mortality cases with RR=0.7. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 C.3 Constant underlying mortality scenario results with RR=0.5 for the PRE- VAIL II design (P-II) and piEB10 with posterior probability thresholds cali- brated under the constant mortality only (-C) or both constant and varying mortality (-B). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 C.4 Varying underlying mortality scenario results with RR=0.5 for the PRE- VAIL II design (P-II) and piEB10 with posterior probability thresholds cali- brated under the constant mortality only (-C) or both constant and varying mortality (-B). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 C.5 Constant underlying mortality scenario results assuming RR=0.7 with piEB10 results provided for direct comparison with increased and decreased values for nburn, increased c to 0.20, no AR, and interim monitoring (IM) only for AR without possibility of terminating early. . . . . . . . . . . . . . . . . . . 111 C.6 Varying underlying mortality scenario results assuming RR=0.7 with piEB10 results provided for direct comparison with increased and decreased values for nburn, increased c to 0.20, no AR, and interim monitoring (IM) only for AR without possibility of terminating early. . . . . . . . . . . . . . . . . . . 112 C.7 Constant underlying mortality scenario results, RR=0.7, assuming two ef- fective therapeutics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 C.8 Varying underlying mortality scenario results, RR=0.7, assuming two effec- tive therapeutics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 E.1 Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 xii List of Figures 2.1 Each MEM is a combination of supplemental sources assumed exchangeable with the primary cohort in order to estimate the parameters of interest, θp, and is contained within each box for Ωk. Within a box the solid arrows θp and the observables, yh, represent which supplemental sources are as- sumed exchangeable with the primary cohort within the given MEM. The dashed arrows represent that the posterior model weights for each MEM, ωk, are used in calculating the weighted average of each MEM’s posterior distribution, p(θp|Ωk, D), to be used for posterior inference, p(θp|D). . . . 10 2.2 Large-sample properties for posterior estimation of model weights for (a) MEM with pin approach as only n→∞ and (b) n, n1, n2, n3 →∞ assuming S1 exchangeable (M2). Large-sample properties for posterior estimation of model weights with pin under (c) BMA and (d) MEM as n, n1, n2, n3 →∞ assuming S1 and S2 exchangeable (M5). Model labels are imposed along the convergence trajectory to help with identification of overlapping lines. . . . 17 2.3 Median effective supplemental sample size using MEM with pie, pin priors, CP, and SHM under each scenario. Dashed vertical gray lines are used to represent assumed observed values of the supplemental group means for each scenario. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.4 Plots demonstrating bias versus shrinkage trade-offs using the methods of CP, SHM, and MEM with pie, pin source-inclusion priors. Note that CP overlaps for all four scenarios. . . . . . . . . . . . . . . . . . . . . . . . . . . 22 xiii 2.5 Plot comparing percent change in mean estimation from the standard model with no borrowing and the percent reduction in the posterior standard de- viation for the difference between the treatment and control groups in Table 2.1 for the standard approach with no borrowing to the CP, SHM, and MEM with pie, pin source-inclusion priors approaches. . . . . . . . . . . . . . . . . . 26 3.1 Example comparing three segments of a trial with (I) PREVAIL II mas- ter protocol which only compares contemporaneously enrolled subjects with equal allocation to the study arms, τ = 0.5, versus (II) framework with methods to potentially incorporate non-contemporaneous data and ability to adaptively alter the randomization ratio as a function of the effective supplemental sample size, τ(t) = f(ESSS). Equally sized triangles further indicate segments with equal allocation versus smaller oSOC triangles which indicate the potential for greater allocation to the experimental arm in the presence of supplemental information. . . . . . . . . . . . . . . . . . . . . . 33 3.2 Plots demonstrating power versus average type-1 error rate across scenarios for segments 2-5 for PREVAIL II, MEMs with piEB10 and pie priors, and the naive pooling case. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 3.3 Proportion assigned to treatment arm across segments 2-5 (left) and pro- portion surviving across segments 2-5 (Null scenario) or within the non-null segment (Scenarios 2-5) (right) under the constant mortality scenario (top) and the varying mortality scenario (bottom) for the PREVAIL II design, MEM-based designs, and naive pooling. . . . . . . . . . . . . . . . . . . . . 47 4.1 Histograms for distribution of self-reported compliers of log(TNE) in each CENIC-p1 treatment grouping at week 6. . . . . . . . . . . . . . . . . . . . 76 4.2 Histograms for distribution of self-reported compliers of log(TNE) in each CENIC-p1 treatment arm at week 6 with mixture densities for the RJ ap- proach. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 4.3 Predicted compliance (Pr(C=1)) as a function of observed TNE by study group for IND approach and model averaging with RJMCMC. . . . . . . . 81 B.1 Bias for Each Scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 B.2 MSE for Each Scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 xiv B.3 Coverage for Each Scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 xv Chapter 1 Introduction 1.1 Incorporating supplemental sources of data into a pri- mary source Advances in medical practice arise from evaluating therapeutic interventions over a se- quence of clinical studies devised to establish the clinical efficacy and safety profile of a novel treatment strategy. Conduct of clinical trials in humans is expensive and inher- ently challenging. Furthermore, the current paradigm for evaluating novel interventions, whereby therapies are screened one-at-a-time in phases, remains inefficient with each in- vestigational drug requiring a sequence of disjointed trials designed to acquire and evaluate different types of information. Moreover, design, review, and initiation of a single study, a period often referred to as operational “whitespace,” is a gradual process. After initi- ation, a successful trial usually requires several years to achieve the targeted enrollment. Additionally, a considerable proportion of studies fail due to low recruitment (Williams et al., 2015). This system for clinical testing produces redundancy, whereby similar treat- ment strategies are replicated, either as experimental or standard-of-care therapy, across multiple studies and development phases. Some of these limitations to traditional clinical trial progression may be addressed by the development of new designs for clinical trials, such as sequential platform trials, which incorporate and test multiple treatments sequentially within a single master proto- col. However, these newer designs present some potential redundancies as well. Within 1 2a sequential platform trial there may be supplemental sources of information from previ- ously completed segments of the trial which are ignored in favor of the traditional approach to analysis which only utilizes contemporaneously collected information. This represents a potential inefficiency, as data from previous segments may provide useful information about the current segment, as well. Conventional approaches to statistical inference, which assume exchangeable data sam- pling models, are inappropriate for integrating data from disparate studies because of their failure to account for between-study heterogeneity, which yields statistical estimators that are sensitive to inter-cohort bias. Thus, while ignoring relevant, supplemental information reduces the efficiency of any study, supplemental data acquired from broadly similar ther- apeutic interventions, patient cohorts, previous investigations, or biological processes is often excluded from statistical analysis, in practice (U.S. Food and Drug Administration, 2001). 1.1.1 Previous approaches to incorporating supplemental sources of in- formation Statistical methods for integrating information from commensurate trials that relax the assumption of inter-cohort data exchangeability and leverage inter-trial redundancy have been developed. Pocock (1976) was first to propose using Bayesian models to incorporate supplemental information into the analysis of a primary data source through static, data- independent shrinkage estimators that require the extent of between-source variability to be pre-specified. Numerous models have been discussed since, which involve pre-specification of the amount of borrowing under different paradigms related to the power prior (Ibrahim and Chen, 2000; Hobbs et al., 2011; De Santis, 2006; Rietbergen et al., 2011) or inflating the standard error to down-weight supplemental cohorts (Goodman and Sladky, 2005; French et al., 2012; Whitehead et al., 2008). Hierarchical linear models and models which include adaptive down-weighting of data from supplemental cohorts have been extensively explored, as well. For these models, the extent of shrinkage towards the supplemental sources is not predetermined but is estimated from the data. More strength is borrowed in the absence of evidence for inter-trial effects, 3which controls the extent of bias induced from using the supplemental information. One ap- proach is the power prior of Ibrahim and Chen (2000) which can be constructed to discount supplemental sources relative to the primary data. Bayesian (Smith et al., 1995) and fre- quentist (Doi et al., 2011) methods which utilize hierarchical modeling have been developed to estimate between-source variability with univariate observables or repeated measures. Other authors have considered hyperprior specifications for Bayesian hierarchical models (Daniels, 1999; Natarajan and Kass, 2000; Spiegelhalter, 2001; Gelman, 2006; Browne and Draper, 2006; Kass and Natarajan, 2006). Recently, approaches using Bayesian hierarchi- cal modeling to leverage supplemental controls in data analysis (Neuenschwander et al., 2010; Pennello and Thompson, 2008; Chen et al., 2011; Neelon and O’Malley, 2010) and trial design (Hobbs et al., 2013) have been explored. Furthermore, dynamic approaches to incorporating supplemental information using hierarchical modeling with sparsity inducing spike-and-slab hyperpriors and empirical Bayesian inference have been described (Hobbs et al., 2011, 2012; Murray et al., 2014). 1.2 Motivating examples and plan of dissertation 1.2.1 A regulatory tobacco clinical trial The Center for the Evaluation of Nicotine in Cigarettes, project 1 (CENIC-p1), was a 6-week randomized multi-center trial designed to evaluate the effect of nicotine reduction on tobacco use and dependence (Donny et al., 2015). 839 current smokers underwent randomization, with 780 completing the 6-week study after being equally randomized to seven treatment groups, including a usual brand control condition or one of six experimental cigarette conditions with nicotine content ranging from 15.8 mg per gram of tobacco (15.8 mg/g group; approximately equivalent to the nicotine content of commercial cigarettes) to 0.4 mg per gram of tobacco (0.4 mg/g group; also referred to as very low nicotine content (VLNC) cigarettes). At the end of six weeks, participants randomly assigned to the lowest nicotine condition had significantly reduced tobacco use, dependence, and nicotine exposure relative to the normal nicotine controls. In Chapter 2, the data from CENIC-p1 are used to illustrate the benefit of using MEMs to borrow information from a supplemental data source compared to standard analytical 4approaches. For purposes of illustration, our analysis focuses on the comparison of the change in cigarettes smoked per day from baseline to week 6 between the 15.8mg/g group and the 0.4 mg/g group. There are multiple supplemental sources of data that could be combined with our primary data source (the 15.8 mg/g and 0.4 mg/g groups from CENIC-p1) to obtain a more precise estimate of the effect of nicotine reduction on cigarette consumption. Specifically, data are available from two previous trials of VLNC cigarettes (Hatsukami et al., 2013, 2010). In addition, CENIC-p1 also included a 0.4 mg/g group with a higher-than-normal tar yield, which could be viewed as supplemental data for the 0.4 mg/g group. Finally, the 15.8 mg/g group could be combined with the usual brand control group from CENIC-p1 to achieve a more precise estimate of cigarette consumption in the control condition. However, assuming that the primary and supplemental sources are exchangeable may be inappropriate for a number of reasons and we desire a statistical method that is flexible enough to down-weight individual supplemental sources that are not consistent with the primary data source, while achieving increased precision when the supplemental sources are consistent with the primary source. In Chapter 4, model averaging is used to develop a mixture modeling approach for identifying non-compliance in CENIC-p1. Participants self-reported high rates of non- compliance to study product (i.e., smoking cigarettes other than those provided by the study) and, while the primary analysis was completed following the intent-to-treat princi- ple, investigators are interested in estimating the effect of nicotine reduction if all subjects had complied to the treatment to understand nicotine reduction as a potential tobacco regulatory strategy. In addition to high rates of self-reported non-compliance, it has also been noted that self-reported compliance to study product is not a reliable measure of com- pliance (Nardone et al., 2016) an biomarkers of nicotine exposure, such as cotinine or total nicotine equivalents, have been proposed as an approach for identifying non-compliance in randomized trials of VLNC cigarettes (Benowitz et al., 2015). However, while data from an auxiliary study provide information about the distribution of biomarkers for the VLNC group (Denlinger et al., 2016), no data are available for the intermediate dose levels, which limits the utility of biomarkers for identifying non-compliance in intermediate dose lev- els. To address this, we develop an approach to identify non-compliance in intermediate dose levels, which utilizes model averaging to potentially share information across nicotine 5dose levels based on our biological understanding of the relationship between the nicotine content of the cigarettes and biomarkers of nicotine exposure. 1.2.2 An adaptive platform trial design for emerging infectious disease epidemics The outbreak of the highly infectious Ebola virus disease (EVD) in West Africa beginning in March 2014 through 2016 resulted in more cases and deaths from EVD than all previous outbreaks combined (WHO, 2016). The outbreak had initial case mortality estimates as high as 74% in some areas and represented a dire situation with no known effective thera- peutics (Schieffelin et al., 2014). The West African outbreak called for novel platform-based trial designs which sequentially consider multiple treatments, potentially in combination, within a single trial in order to most effectively identify beneficial therapeutics and quickly incorporate them into the standard of care for EVD to reduce morbidity and mortality. Additional difficulties in designing a trial for the EVD outbreak were the relatively sparse knowledge of this EVD strain, the potential for disease evolution or changes in the course of the trial, the urgent need for identifying any potentially beneficial treatments or combi- nations of treatments as quickly as possible, and the need to maintain traditional clinical trial benchmarks as much as possible for outcomes such as activity, safety, and efficacy (Dodd et al., 2016). In response to this need, the National Institutes of Health (NIH) launched the Part- nership for Research on Ebola Vaccines in Liberia II (PREVAIL II) trial, a randomized clinical trial to evaluate medical countermeasures against EVD, in March of 2015 (Dodd et al., 2016; PREVAIL II Writing Group, for the Multi-National PREVAIL II Study Team, 2016). PREVAIL II was designed as a modified platform trial, as defined by Renfro and Sargent (2016), in order to improve the efficiency of testing multiple potentially beneficial therapeutics, accelerate clinical development, and maintain flexibility in the context of an emerging infectious disease epidemic. The trial had two unique characteristics driven by the urgent need for new treatments. First, the objective of PREVAIL II was to evalu- ate multiple treatments within a single, master protocol rather than multiple independent studies. Treatments would be evaluated sequentially against the optimal standard of care (oSOC) which, initially, consisted only of supportive care (intravenous fluids, hemodynamic 6monitoring, etc.). As the trial progressed, any treatment which demonstrated a significant improvement over oSOC would be added to the oSOC for future comparisons. Second, PREVAIL II utilized frequent interim monitoring to allow very early termination if a new treatment exhibited sufficient statistical evidence for a decline in the mortality rate with respect to the concurrent oSOC. 1.2.3 Dissertation objectives In this dissertation, we introduce multi-source exchangeability models (MEM), a general Bayesian approach that integrates supplemental data arising from multiple, possibly non- exchangeable, sources into the analysis of a primary source, while reducing the dimension- ality of the prior space by enabling prior specification on supplemental sources and avoiding the limiting assumption of exchangeability among the supplemental sources. Our model- ing framework effectuates source-specific smoothing parameters that can be estimated from the data to facilitate dynamic multi-resolution smoothing. We will demonstrate that our proposed approach yields asymptotically consistent posterior estimates, while achieving more desirable small sample properties when compared to competing Bayesian hierarchical modeling strategies. In Chapter 2 we introduce the general framework for MEMs and apply it to the context of Gaussian-distributed data with an unknown mean and known precision. Asymptotic properties are presented showing the consistency of the posterior MEM estimates with simulation studies demonstrating that desirable small sample properties can be obtained through carefully selected prior distributions. When compared to competing Bayesian hierarchical modeling strategies, the simulation results demonstrate that MEMs achieve up to a 56% reduction in bias when there is heterogeneity among the supplemental sources. When applied to the data from CENIC-p1, MEMs resulted in a 30% improvement in efficiency compared to a standard analysis without borrowing. In Chapter 3 we propose a multi-source adaptive platform design for EVD using the MEM framework to facilitate the borrowing of information across trial segments. Since the incorporation of supplemental information can induce an imbalance between the arms 7within a segment, we also incorporate adaptive randomization to balance overall infor- mation within a segment, which improves power relative to a fixed randomization strat- egy that results in information imbalances across treatment groups. Our proposed design demonstrates improvements of up to 51% in power with limited type-1 error inflation while randomizing more participants to promising treatment arms when compared to the design used in PREVAIL II. In Chapter 4 we focus on the estimation of mixture distributions to identify non- compliance in CENIC-p1 using a fully Bayesian approach. A model averaging approach, which incorporates our biological understanding regarding the relationship between the nicotine content and biomarkers of nicotine exposure, is proposed to borrow information across dose-levels and achieve more precise estimates of the mixture components. There exist multiple approaches to specifying the MCMC samplers, so model averaging techniques over multiple proposed specifications are explored via reversible-jump MCMC (RJMCMC). Our proposed method results in more precise estimates of the mixture components when the proposed relationship is appropriate, while minimizing bias when the relationship does not hold. Chapter 2 Bayesian Hierarchical Modeling based on Multi-Source Exchangeability This chapter introduces the proposed MEM framework, which enables the incorporation of supplemental sources of information into a primary data source. MEMs are developed in the context of Gaussian-distributed outcomes with unknown means and known precision, with an extension to binary outcomes in Chapter 3 and to estimation of mixture distributions in Chapter 4. The remainder of Chapter 2 proceeds as follows. We first introduce MEMs, in generality, in Section 2.1. We then discuss estimation and investigate the asymptotic properties of MEMs in the Gaussian case in Section 2.2. Simulation results comparing the small-sample properties of MEMs to existing methods are presented in Section 2.3 and we apply our proposed method to CENIC-p1 in Section 2.4. Finally, we conclude with a brief discussion in Section 2.5. 2.1 Multi-source Exchangeability Models Consider the general case where there is a single primary cohort, P , with n observables represented by yp, and a total of H independent supplemental cohorts considered for incorporation into the analysis with nh observables each, represented by yh, h = 1, ...,H. 8 9Together these observables represent the data, D. Our primary goal is to estimate θp, which represents the parameters for the primary cohort. θh represents the same parameters from supplemental cohort h. For our framework, exchangeability is defined between the primary cohort and a supplemental cohort h to be where θp = θh. Since it is likely that the supplemental sources may not be exchangeable with the primary cohort or that the supplemental sources may be heterogeneous, themselves, let Sh denote an indicator function of whether or not supplementary source h is assumed exchangeable with the primary cohort. A MEM, noted generally as Ωk, is defined by con- sidering a set of source-specific indicators, (S1 = s1,k, ..., SH = sH,k), where sh,k ∈ {0, 1} with indices for source h and model k representing all K = 2H possible configurations of assumptions regarding exchangeability between the primary and supplemental sources. A conceptual diagram representing these K configurations of exchangeability is depicted by Figure 2.1 with each box representing a potential configuration of exchangeability for a MEM, Ωk. Solid arrows indicate sh,k = 1 and graphically demonstrate what observables are used to estimate θp and therefore indicate which supplemental sources are exchange- able with the primary cohort. If there is no solid arrow connecting the observables, then sh,k = 0. The dashed arrows with posterior model weights, ωk, visualize that posterior inference is completed by a weighted average of each MEM posterior. To quickly identify the sources assumed exchangeable in the K MEMs, the k subscript can be represented as the supplemental sources assumed exchangeable with the primary source such that Ω represents the MEM which assumes no supplemental information and Ω1,2,...,H assumes all supplemental sources are exchangeable. 10 Conceptual Diagram of Multi-source Exchangeability Models p(θp|D) = K∑ k=1 ωkp(θp|Ωk, D) Ω Ω1 Ω1,2Ω1,2,...,H . . . θp θp θpθp yp yp ypyp y1 y2 ... yH y1 y2 ... yH y1 y2 .. . yH y1 y2 .. . yH ω ω1 ω1,2ω1,2,...,H Figure 2.1: Each MEM is a combination of supplemental sources assumed exchangeable with the primary cohort in order to estimate the parameters of interest, θp, and is contained within each box for Ωk. Within a box the solid arrows θp and the observables, yh, represent which supplemental sources are assumed exchangeable with the primary cohort within the given MEM. The dashed arrows represent that the posterior model weights for each MEM, ωk, are used in calculating the weighted average of each MEM’s posterior distribution, p(θp|Ωk, D), to be used for posterior inference, p(θp|D). A natural approach to estimating model-specific weights is through Bayesian model averaging (BMA) (Raftery, 1995; Raftery et al., 1997; Hoeting et al., 1999). BMA accounts for uncertainty in the model specification by facilitating posterior inference that averages over a collection of posterior distributions that are obtained from a set of candidate models. BMA describes the Bayesian framework for using conditional probability to estimate a weight for each candidate model in the presence of the data. In our case, BMA would be used to average over the K = 2H multi-source exchangeability models representing all possible assumptions regarding the exchangeability of the supplementary data with the 11 primary data. A model weight given the data, ωk, which is often random, is determined for each possible model as laid out by Hoeting et al. (1999) and used for posterior inference. Let L represent the likelihood given the data for Ωk, Θ = (θp, θ1, ..., θH), and pi(Θ|Ωk) denote the prior density of Θ under Ωk. In the context of MEMs, the integrated marginal likelihood for a particular multi-source exchangeability model given the data is obtained by averaging the likelihood over the posterior distribution for the vector of all model pa- rameters of interest, p(D|Ωk) = ∫ L(Θ|D,Ωk)pi(Θ|Ωk)dΘ. (2.1) The posterior model weights for each MEM are given by ωk = p(Ωk|D) = p(D|Ωk)pi(Ωk)∑K j=1 p(D|Ωj)pi(Ωj) , (2.2) where pi(Ωk) is the prior probability that Ωk is the true model. The marginal posterior distribution given the observable data D to be used for inference on θp is the weighted average using the posterior model weights of the K multi-source exchangeability model posteriors, p(θp|Ωk, D): p(θp|D) = K∑ k=1 ωkp(θp|Ωk, D). (2.3) Unfortunately, BMA quickly becomes highly parameterized as the number of models grows exponentially with the number of supplementary sources (K = 2H), and prior speci- fication over a model space of large size is problematic. Ferna´ndez et al. (2001) noted that posterior model weights can be very sensitive to the specification of priors in the model, especially in the absence of strong prior knowledge. In the analysis of limited data obtained from a clinical study, these issues with the conventional BMA approach become critical and motivate our proposed approach. With MEMs the supplemental sources are assumed to be distinct and independent, therefore we can specify priors with respect to sources instead of models, pi(Ωk) = pi(S1 = s1,k, ..., SH = sH,k) = pi(S1 = s1,k)× · · · × pi(SH = sH,k). This results in drastic dimension reduction in that it necessitates the specification of only H source-specific prior inclusion probabilities in place of 2H prior model probabilities com- prising the entire model space. In Section 2.2.2, we propose prior weights for the source- inclusion probabilities that result in consistent posterior model weights and yield desirable 12 small sample properties, as evaluated by simulation in Section 2.3. In contrast, simi- larly constructed prior weights on the models did not result in consistent posterior model weights. 2.2 Estimation and theoretical results We now describe posterior inference using MEMs in the Gaussian case and investigate the asymptotic properties of our proposed approach for two classes of source-specific prior model weights. For our primary cohort (P ), let x1,p, x2,p, ..., xn,p, denote a sample of i.i.d. and normally distributed observables with mean µ and variance σ2. Similarly, for supplemental cohorts h = 1, ...,H, let x1,h, ..., xnh,h, denote a sample of i.i.d. normally distributed samples with source-specific mean µh and variance σ 2 h. Throughout this section, we assume σ2 and σ2h are known and define v = σ2 n and vh = σ2h nh for notational simplicity. Using the MEM approach, the likelihood is a weighted average of multi-source ex- changeability models representing all possible assumptions regarding exchangeability: K∑ k=1 ωkL(µ,σ 2|(s1,k, ..., sH,k)) = K∑ k=1 ωkN (µ, σ2) H∏ h=1 {N (µsh,k + µh(1− sh,k), σ2h)} . (2.4) Then, assuming a flat prior on the Gaussian mean, µk, as described by (Gelman, 2006), pi(µk|Ωk) ∝ 1 in (2.1), the Gaussian conditional marginal likelihood can be generally written for any MEM as: p(D|Ωk) = ( √ 2pi)(H+1)− ∑H h=1{sh,k}√( 1 v + ∑H i=1 { si,k vi })(∏H j=1 {[ 1 vj ]1−sj,k})× exp ( −1 2 [ H∑ l=1 { sl,k(x¯− x¯l)2 v + vl + vvl( ∑ m6=l{sm,kv−1m }) + H∑ l 0, potentially specified for each source, such that the extent of shrinkage is determined by µ = µh + h. This model, however, would not be identifiable without inducing sparsity on the domain of h, perhaps with a spike-and-slab prior. This would require additional hyperparameter specification beyond that of the MEM framework, which would complicate the calibration to control frequentist error in practical application. Another limitation is that the type of study, retrospective/observational versus prospec- tive/randomized, affects the quality and reliability of the resulting estimates, and there- fore consideration of design type should be taken into account. For example, as conveyed in Pocock’s “Acceptability Criteria,” (1976) decisions pertaining to which supplemental sources should be considered for integrative analysis should be based on the study objec- tives, as well as controlling sources of potential bias arising from inconsistent eligibility criteria, application of the interventions, or outcome ascertainment. Similarly, when using 29 data arising from non-randomized designs, confounding due to selection bias needs to be accounted for in some manner, such as through existing methods for causal inference using propensity scoring, inverse-probability weighting, or matching. We endeavor to extend the methodology to facilitate integration of multiple trials while accounting for confounding from non-randomized designs as future work. Chapter 3 A Multi-source Adaptive Platform Design for Emerging Infectious Diseases 3.1 Introduction In the context of the dire, rapidly developing Ebola outbreak introduced in Chapter 1, the NIH launched the Partnership for Research on Ebola Vaccines in Liberia II (PREVAIL II) trial, a randomized clinical trial to evaluate medical countermeasures against Ebola virus disease (EVD). PREVAIL II was designed as a sequential platform design, where multiple experimental therapeutics were evaluated sequentially compared to the optimal standard of care (oSOC). Treatments that showed a significant improvement relative to the oSOC were added to the oSOC for all future comparisons. This allowed PREVAIL II the potential to rapidly evaluate multiple treatments, accelerate clinical development, and maintain flexibility in the context of an emerging infectious disease epidemic within one master protocol. However, particular aspects of the platform design methodology could be enhanced for future outbreaks. One major limitation is that the proposed design only allowed the use of contemporaneous controls. For instance, if the initial drug (drug A) represented a significant improvement over the oSOC in the initial segment of the trial, the second segment would evaluate drug B in addition to drug A in a randomized design, 30 31 i.e. oSOC + drug A vs. oSOC + drug A + drug B, but only subjects from segment two would be used to evaluate drug B. This ignores subjects randomized to receive oSOC + drug A in the initial segment of the trial. In this chapter, we illustrate how MEMs and adaptive randomization (AR) can be incor- porated into the PREVAIL II master protocol to achieve improved operating characteristics relative to the original PREVAIL II design. In Chapter 2, we illustrated that MEMs can be used to integrate information arising from potentially non-exchangeable normal populations while minimizing bias. Integrating supplemental information using MEMs dynamically de- termines if the non-contemporaneous segments in PREVAIL II are exchangeable (e.g., if the segments have equivalent mortality rates) and thereby shares information across seg- ments, if appropriate, to achieve more precise estimates of the disease-response rate and increase power. In addition, we will incorporate AR through an extension of the dynamic allocation procedure proposed by Hobbs et al. (2013), which targets information balance across treat- ment groups. Borrowing information from previous segments using MEMs may lead to imbalances in statistical information across treatment groups within a segment because supplemental information are only available for the control group, but, by adapting the randomization ratio to allocate more subjects to the treatment group, we are able to bal- ance statistical information within a segment. Balancing information by boosting allocation to the novel treatment arm improves statistical power to detect effective treatments. As effective novel therapies emerge in the platform, boosting allocation to the treatment arm in the presence of evidence for inter-segment exchangeability among controls also has the potential to improve outcomes for trial participants. It is important to note that the proposed multi-source AR differs fundamentally from conventional response- or outcome-AR methods. A recent article by Thall et al. (2015) evaluated the impact of outcome-AR, concluding that designs that use outcome-AR attain diminished power, often to a considerable extent, necessitating a larger overall sample size. Moreover, outcome-AR risks large imbalances in sample size in the wrong direction, assigning more patients to inferior treatments in the presence of the small to moderate effect sizes observed in practice. In contrast, our proposed AR scheme endeavors to balance total effective information in the presence of potentially non-exchangeable supplemental cohorts, 32 which maximizes power for comparing treatment groups. The remainder of the chapter proceeds as follows. First, the standard design of the PREVAIL II master protocol is introduced in Section 3.2, followed by our proposed design which incorporates MEMs and AR in Section 3.3. The scenarios considered for the simu- lation studies, the process for design calibration, and results for the simulation studies are presented in Section 3.4. We conclude with a brief discussion in Section 3.5. 3.2 Standard design of PREVAIL II master protocol The initial objective of PREVAIL II was to sequentially evaluate multiple candidate ther- apies for the treatment of EVD. Each treatment was to be evaluated in a separate trial segment and each segment to consist of a separate randomized trial to compare the new treatment versus the current oSOC. Treatments found to offer a significant survival benefit compared to the standard of care would be added to the standard of care for all future segments. Figure 3.1(I) graphically depicts an example with 3 segments and four differ- ent treatment combinations represented by color. In segment 1, the oSOC arm (blue) is compared to an experimental arm of drug A + oSOC (yellow) with the proportion ran- domized to the experimental arm fixed at τ = 0.5 (also represented by the equally sized triangles signifying equal enrollment throughout the segment). If the experimental arm was determined to provide significant improvement over the standard of care, then the next trial segment would consist of a comparison between the updated oSOC, drug A + oSOC (yellow), and a new experimental arm, drug B + drug A + oSOC (green). However, if the drug B + drug A + oSOC does not demonstrate a significant improvement, drug A + oSOC is carried forward to the next segment where drug A + oSOC (yellow) is compared to drug C + drug A + oSOC (maroon). 33 𝜏 = 0.5 𝜏 = 0.5 𝜏 = 0.5 Segment 1 Segment 2 Segment 3 Trial Time (𝑡) Arms oSOC Experimental LEGEND Study arm currently enrolling: I) PREVAIL II oSOC Experimental oSOC Experimental 𝜏 = 0.5 𝜏(𝑡) = 𝑓(𝐸𝑆𝑆𝑆) 𝜏(𝑡) = 𝑓(𝐸𝑆𝑆𝑆) Segment 1 Segment 2 Segment 3 Trial Time (𝑡) Arms oSOC Experimental LEGEND Study arm currently enrolling: Supplemental information from previously enrolled segments: II) Multi-Source Adaptive Platform oSOC Experimental oSOC Experimental Figure 3.1: Example comparing three segments of a trial with (I) PREVAIL II master protocol which only compares contemporaneously enrolled subjects with equal allocation to the study arms, τ = 0.5, versus (II) framework with methods to potentially incorporate non-contemporaneous data and ability to adaptively alter the randomization ratio as a function of the effective supplemental sample size, τ(t) = f(ESSS). Equally sized trian- gles further indicate segments with equal allocation versus smaller oSOC triangles which indicate the potential for greater allocation to the experimental arm in the presence of supplemental information. 34 The PREVAIL II master protocol allowed for the rapid evaluation of multiple candidate treatments. Within a segment, PREVAIL II used frequent sequential monitoring in the Bayesian paradigm to allow early termination in the event that a treatment provided a substantial survival benefit over the standard of care (Dodd et al., 2016; Proschan et al., 2016). The primary outcome for PREVAIL II was a binary indicator of 28-day mortality. Let xA be the number of deaths, nA be the total number of subjects randomized, and pA be the 28-day mortality rate for hypothetical treatment arm A. Define xB, nB and pB analogously for hypothetical control arm B. Assuming independent beta(α = 1, β = 1) priors for pA and pB results in independent beta posteriors with α = 1+xA, β = 1+nA−xA for pA and α = 1 + xB, β = 1 + nB − xB for pB. Formal inference on the 28-day mortality rate was based on the posterior distribution. The posterior probability that the 28-day mortality rate in arm A is less than arm B is: P (pA < pB|xA, xB) = nA+1∑ k=xA+1 ( nA+1 k )( nB+1 xB ) (nB − xB + 1)( nA+nB+2 k+xB ) (nA + nB − k − xB + 2) . (3.1) PREVAIL II was planned with a within-segment maximum sample size of 100 subjects per arm. If the maximum sample size were reached, treatment A would be declared a significant improvement over the control if P (pA < pB|xA, xB) ≥ 0.975. PREVAIL II enabled aggressive interim monitoring to stop the trial early if the new treatment represented a substantial improvement over the standard of care. Interim analy- ses were first completed after six subjects were randomized to each arm and were completed after every 2 subjects until data were available for 40 subjects. After the first 40 subjects, interim monitoring was carried out after every 40 subjects until a maximum of 200 subjects were enrolled (100 per arm). The trial would stop and declare treatment A a significant improvement over the control if P (pA < pB|xA, xB) ≥ 0.999. Simulation results demon- strated that this design had an overall within-segment type-I error rate near 0.03 and 86% power to detect significant difference assuming a relative risk of 0.5 for the new treatment (Dodd et al., 2016). 35 3.3 Methods A limitation of the PREVAIL II design is that only contemporaneous information from the current segment is included in the analysis. Thus, relevant data from prior segments was ignored. In fact, after the initial segment, non-contemporaneous, supplemental data is always available for the control arm from one or more previous segments. While avoiding the introduction of inter-cohort bias, the PREVAIL II design makes inefficient use of the data. Utilizing Bayesian methods that estimate partial exchangeability across segments, however, overcomes this inefficiency, resulting in increased power (or decreased total sam- ple size), while protecting against bias in the presence of systematic differences between supplemental and contemporaneous controls. 3.3.1 General framework of multi-source adaptive designs To address this limitation, we propose the general conceptual design graphically represented in Figure 3.1(II). Segment 1 is identical to the original design proposed for PREVAIL II; patients are randomized with a fixed allocation ratio of τ = 0.5 to the oSOC (blue) or experimental drug A + oSOC (yellow). However, after the first segment, there will always be supplemental, non-contemporaneous control arm information acquired from past segments which can be integrated into future comparisons using a dynamic Bayesian model. The figure depicts data acquired from prior segments by rectangles with diagonal lines placed in the segment of observation. For example, in Segment 2, supplemental data for the controls are available from Segment 1, which in our figure comes from the yellow study arm. In Segment 3, supplemental data for the controls are available from the first two segments. Incorporating supplemental information from previous segments can potentially result in imbalances in the total effective information between treatment groups within a seg- ment if a fixed allocation to the experimental arm of τ = 0.5 is maintained. Extending the AR method proposed by Hobbs et al. (2013) to the setting of a sequential platform design attenuates this imbalance and maximizes power. This is achieved by allowing the allocation ratio to vary as a function of the effective supplemental sample size (ESSS). Recall, ESSS is a measure reflecting the extent of relative gain in the posterior precision 36 obtained from a Bayesian model when compared to a model that neglects the supplemental sources. The measure is intended to characterize the effective number of samples incorpo- rated from supplemental sources. By defining the allocation ratio as a function of ESSS (τ(t) = f(ESSS)), the proposed AR methodology aims to balance total information across treatment groups within a segment. Within Figure 3.1(II) this potentially unequal allo- cation to the treatment arms is represented by the differing slopes, which are adjusted in relation to the extent of estimated exchangeable data contributed by concordant treatment regimes during segments 2 and 3. The remainder of this subsection presents notation to explain the general framework for multi-source AR. During the initial period of a segment, equal allocation between arms is used until sufficient information is acquired to facilitate estimation of inter-segment exchangeability, after which block-randomization is used to update the allocation ratios according to estimates of ESSS. Let nburn represent the number of patients which must be observed for the “burn-in” period at the start of any segment where supplemental information is available, B represent the total number of blocks to adaptively randomize patients after the burn-in period, and tb be the “time” of the b th interim analysis at the start of a block. Additionally, define nA(tb) and nB(tb) as the total sample size assigned in the current segment to the treatment and control arms at tb, respectively, ESSS(tb) as the estimated effective supplemental sample size from the data at tb, and R(tb) as the number of subjects left to be randomized at tb assuming the maximum sample size of the segment is achieved. Recall, the objective is to balance total effective information at trial completion such that, at tb, nA(tb) = ESSS(tb) + nB(tb). Thus, allocation is needed in relation to nA(tb) + τR = ESSS(tb) + nB(tb) + (1 − τ)R. Therefore, under the the aim of balanced allocation, assignments to treatment for the next block of patients attains the following formulation τ(tb) = 1 2 ( ESSS(tb) + nB(tb)− nA(tb) R(tb) + 1 ) . (3.2) τ(tb) can range between 0 and 1 depending on the extent of shrinkage to supplemental information with a value of 0 implying all patients are randomized to the control arm, a value of 0.5 implying a 1:1 allocation ratio, and a value of 1 implying all patients are randomized to the treatment arm. 37 3.3.2 Incorporating supplemental information with Bayesian modeling using MEMs Our proposed multi-source adaptive design as described thus far is general and can be im- plemented using any method for incorporating supplemental information. While there are many potential methods available for incorporating supplemental information, the MEM framework is specifically considered herein on the basis of recent efforts demonstrating its desirable properties for yielding shrinkage estimators in the presence of non- or partially exchangeable cohorts while avoiding highly parameterized models (which are computation- ally infeasible for implementation and calibration of sequential design). Recall from Chapter 2, the MEM framework takes the H supplemental segments avail- able for incorporation and maps them to 2H = K multi-source exchangeability models, denoted Ωk, which represent all possible combinations of assumptions of exchangeability between the current segment and the H supplemental segments. For an example in the context of our proposed design, refer back to Figure 3.1(II) where analyses at segment 3 would have four possible MEMs: no supplemental segments assumed exchangeable with segment 3 (Ω), only segment 1 assumed exchangeable with segment 3 (Ω1), only segment 2 assumed exchangeable with segment 3 (Ω2), and both segments 1 and 2 assumed ex- changeable with segment 3 (Ω1,2). The MEM framework produces a posterior estimate over these K models using posterior model weights, ωk, such that ∑K k=1 ωk = 1. MEMs comprising non-exchangeable supplemental data receive smaller posterior weights whereas models contributing only exchangeable sources carry more influence in the posterior distri- bution. A resultant smoothed posterior estimator synthesizing all possible exchangeability relationships is used for inference. If the standard beta-binomial model is updated to accommodate MEMs in the control arm, a similar structure to the current PREVAIL II master protocol can be utilized which is able to incorporate supplemental data from previous segments, making more efficient use of available evidence and potentially improving the power of the trial. In the setting of PREVAIL II, we have supplementary data for the control arm, arm B, but not for the treatment arm, arm A. Therefore, we will model arm A using the beta-binomial model and model arm B using MEMs. Introducing formal notation, the marginal posterior distribution of pB given the observable data, D, from the current segment’s controls and the observable 38 data from H supplemental segments is derived as the weighted average of the posterior distributions for the K multi-source exchangeability models, q(pB|Ωk, D): q(pB|D) = K∑ k=I ωkq(pB|Ωk, D). (3.3) The posterior model weight, ωk, for each MEM is given by ωk = pr(Ωk|D) = p(D|Ωk)pi(Ωk)∑K j=I p(D|Ωj)pi(Ωj) , (3.4) where p(D|Ωk) is the integrated marginal likelihood for Ωk and pi(Ωk) is the prior probabil- ity that Ωk is the true model. The formulation of posterior model weights in (3.4) utilizes a framework similar to Bayesian model averaging (BMA), however the MEM framework re- duces the dimension of the prior weight space by enabling specification on the supplemental sources rather than models as described in Chapter 2. Using the notation from Section 3.2 for arm A and the current segment for arm B, let xB,h be the number of deaths observed in arm B for non-contemporaneous supplemental segment h (h = 1, ...,H), nB,h be the number of subjects randomized to arm B in segment h, and pB,h be the 28-day mortality rate for arm B in segment h. Let Sh denote an indicator function of whether or not the non-contemporaneous supplementary source h is assumed exchangeable (i.e., if Sh = 1, pB,h = pB). A model, Ωk, is then defined by considering a set of source-specific indicators, (S1 = s1,k, ..., SH = sH,k), where sh,k is a binary indicator of whether or not source h is assumed exchangeable with the primary data in Ωk. Assuming independent beta(α, β) priors on pB and pB,1,...pB,H , the integrated marginal likelihood for each MEM can be written as follows: p(D|Ωk) = B ( x+ α+ ∑H h=1 sh,kxh, n+ β − x+ ∑H j=1 sj,k(nj − xj) ) B(α, β) × H∏ i=1 ( B(xi + αi, ni + βi − xi) B(αi, βi) )1−si,k , (3.5) where B() represents the beta function. The marginal likelihood results in the following 39 MEM-specific posterior distribution used to calculate the marginal posterior distribution in (3.3): q(pB|D,Ωk) = Beta x+ α+ H∑ h=1 sh,kxh, n+ β − x+ H∑ j=1 sj,k(nj − xj)  . (3.6) Therefore the posterior distribution of (3.3) for the 28-day mortality rate for the MEM estimator is a mixture of beta distributions encompassing all possible exchangeability re- lationships. Since supplementary data are only available for the control arm (arm B), the marginal posterior probability that pA < pB is a weighted average of the conditional posterior probability that pA < pB for all possible assumptions about exchangeability: PMEM (pA < pB|xA, xB) = K∑ i=1 ωiP (pA < pB,Ωi |xA, xB,Ωi). (3.7) In the context of AR with MEMs, a 1:1 allocation ratio is assumed during the burn-in period. The specific calculations used for ESSS in (3.2) are defined as follows. For each individual MEM, in the context of the beta-binomial model, the posterior effective sample size (ESS) can be generally derived as ESS(Ωk) = α+ β + nB + H∑ h=1 sh,knB,h. (3.8) The posterior ESSS for the overall MEM estimate is then calculated as the weighted average of the difference from each individual MEM’s ESS and the current control arm’s sample size: ESSS = ∑K k=1 ωk[ESS(Ωk) − nB]. Further, it should be noted that the beta(α, β) prior in the beta-binomial model confers the effective information of α + β subjects in (3.8). Therefore, the MEM facilitates a non-zero ESSS of α + β when assuming the prior probability of 1 on the independence model (e.g., the model which does not borrow strength across segments). 40 3.3.3 MEM prior probability specification As with any Bayesian model, the properties of MEMs depend on the prior specification assumed for the model weights, with more flexible choices imparting robustness for poste- rior inference. Since supplemental sources are assumed independent in the MEM frame- work, the prior model weight formulation can be specified as the product of the source- specific prior inclusion probabilities: pi(Ωk) = pi(S1 = s1,k, ..., SH = sH,k) = pi(S1 = s1,k)× · · · × pi(SH = sH,k) (Kaizer et al., 2017). While there are numerous strategies that could be used to identify potential priors for each source, this section considers specific fully Bayesian and empirical Bayesian approaches which were found to achieve desirable operating characteristics in our simulation study. Our proposed fully Bayesian prior, denoted by pie, assumes equal prior weight for in- clusion and exclusion for all supplementary sources: pie(Sh = 1) = 1 2 . This prior provides impartiality to which supplemental segments should be considered exchangeable with the primary segment. In contrast to the fully Bayesian approach, an empirical Bayesian (EB) approach utilizes the data collected to inform the prior distribution by maximizing the marginal likelihood with respect to the prior weights. For the proposed MEM model for binary data discussed above, the marginal likelihood is maximized by placing a prior inclusion weight of 1 on sources assumed exchangeable, while all other supplemental sources receive a prior inclusion weight of 0. This induces posterior weights of 1 for the model which maximizes the marginal density and 0 for all other models. The proposed EB prior is denoted by piEB. However, placing all of the weight on a single MEM may induce less than ideal operating characteristics under circumstances where the marginal density of multiple MEMs may be close to the maximum marginal density. Therefore, a constrained EB prior is proposed, denoted piEBc , where 0 ≤ c ≤ 1, such that the marginal density is maximized under the constraint that the prior source inclusion probabilities must be less than c. This results in a prior inclusion probability of c for segments assumed exchangeable in the MEM that maximizes the marginal density, with all other segments receiving a prior inclusion probability of 0. When c = 1, the standard EB formulation is achieved, and when c = 0, there is no borrowing of supplemental information. Constraining the optimization over Ωk with piEBc attenuates bias and avoids over smoothing in the presence of limited evidence 41 for exchangeability. As an illustrative example of the behavior of the posterior weights for these priors, consider the case where there are three supplemental segments. Let nB, nB,1, nB,2, nB,3 = 100, xB = 50, xB,1 = 52, xB,2 = 45, and xB,3 = 65. The calculated posterior weights for each MEM are provided in Table 3.1. We see that besides pie, no other prior gives any weight to S3 which has an estimated mortality of 0.65 and is the most different from our primary segment with its 0.50 estimated mortality. Further, as c decreases for the piEBc priors we see that Ω5, which assumes both S1 and S2 are exchangeable, receives less weight while Ω1, which assumes no supplemental sources are exchangeable, receives increasing amounts of the posterior weight. Sources in Ωk MEM P S1 S2 S3 pie piEB1 piEB.9 piEB.5 piEB.1 piEB0 Ω1 1 0 0 0 0.025 0.000 0.000 0.030 0.420 1.000 Ω2 1 1 0 0 0.136 0.000 0.026 0.165 0.256 0.000 Ω3 1 0 1 0 0.111 0.000 0.021 0.134 0.208 0.000 Ω4 1 0 0 1 0.015 0.000 0.000 0.000 0.000 0.000 Ω5 1 1 1 0 0.556 1.000 0.953 0.672 0.116 0.000 Ω6 1 1 0 1 0.064 0.000 0.000 0.000 0.000 0.000 Ω7 1 0 1 1 0.012 0.000 0.000 0.000 0.000 0.000 Ω8 1 1 1 1 0.081 0.000 0.000 0.000 0.000 0.000 Table 3.1: Posterior model weights for each multi-source exchangeability under various priors to demonstrate the impact different priors can have on the resulting weights. P rep- resents the current segments, Sh represents the supplemental segments potentially available for incorporation (h = 1, 2, 3). 3.4 Simulation Study Simulation was used to evaluate and compare the operating characteristics of the PREVAIL II master protocol and the proposed multi-source AR approach. Data were generated assuming an underlying mortality rate, poSOC , for oSOC alone, and the mortality rate for each potential drug combination was defined through a multiplicative model utilizing the relative risk (RR) of each drug and assuming no interactions. For example, the mortality 42 rates for the various combinations of oSOC and two potential treatments are: oSOC =poSOC , oSOC+Drug A =poSOC × RRA, oSOC+Drug B =poSOC × RRB, oSOC+Drug A+Drug B =poSOC × RRA × RRB. The multi-source AR approach assumes independent beta(α = 1, β = 1) priors for pA and pB and hypothesis testing at the interim and final analyses is based on the marginal posterior probability that pA is less than pB. As in PREVAIL II, our proposed design will stop and declare the experimental arm to be a significant improvement over the control if P (pA < pB|xA, xB) > 0.999 for all interim analyses. The posterior probability thresholds used at the final analysis will be calibrated for each prior to achieve the desired operating characteristics, as described in Section 3.4.1. Operating characteristics for the multi-source AR approach using MEMs with pie and piEBc were compared to the naive approach of pooling all available supplemental information regardless of exchangeability, which will maximize the amount of supplemental information available, but introduces a prohibitive extent of bias in the presence of non-exchangeable supplementary segments. Rather than adopting the aggressive interim monitoring of PRE- VAIL II, we propose interim monitoring after the enrollment of every 40 subjects until the end of the burn-in period, where it will then follow the practical schedule of interim anal- yses at the start of each block, at tb, with nburn = 60 and B = 5. This implies interim analyses after 40, 60, 95, 130, and 165 patients were observed, but updating the allocation ratio using (3.2) only occurs after 60, 95, 130, and 165 patients are observed. If the trial does not terminate early for superiority it will proceed to enroll a total of 200 patients in the current segment and conduct the final analysis after all information is collected. Additionally, bounds are placed on (3.2) to ensure τ(tb) ∈ [0, 1]. We considered the sequential testing of 5 potential therapeutics in the context of our platform design with two scenarios for the underlying oSOC mortality rate: (1) a constant mortality rate for all segments: p = 0.40 and (2) a decreasing mortality rate by segment, p = (0.74, 0.61, 0.48, 0.36, 0.23). These decreasing values reflect observed mortality rates as 43 the Ebola epidemic progressed from May 2014 to December 2014 in Sierra Leone (Dodd et al., 2016). The varying mortality scenario is more challenging for MEMs than the constant mortality case since the supplemental controls are not exchangeable, in which case minimal borrowing is preferred. Further, five different therapeutic RR profiles are examined: (1) all drugs have a null effect (RR=1) and (2-5) one drug in the treatment pipeline has a moderate effect in segment 2, 3, 4, or 5 with RR=0.7. In a rapidly evolving epidemic it may very well be that no or very few of the included therapeutics demonstrate an improvement. Moreover, location in the pipeline may impact the platform’s operating characteristics. 3.4.1 Parameter calibration To provide context, the original PREVAIL II design had a type-I error rate of around 0.03 with 86% power to reject the null hypothesis assuming a RR of 0.5, a baseline 28-day mortality rate of 0.4, and a posterior probability threshold of 0.975 at the final analysis within a segment if the trial did not terminate early. While the same posterior probability threshold could be used for MEMs, performance will be optimized if the posterior proba- bility threshold is optimized to achieve the desired operating characteristics. Furthermore, optimal characteristics may be achieved with a threshold that varies by segment because more supplemental information will be available at later segments as compared to earlier segments. Given the two scenarios for the underlying mortality rate, two potential processes to calibrate the posterior probability thresholds are considered. First, thresholds can be cal- ibrated to achieve a type-I error rate of approximately 0.025 within a segment for the constant mortality scenario and the operating characteristics under the varying mortality scenario can be evaluated to determine if the inflation in the type-I error rate is within acceptable levels. Alternatively, thresholds can be calibrated to limit the inflation of the type-I error rates in the varying mortality scenario while maintaining similar power in the constant mortality scenario to that observed for the PREVAIL II design. To address the first case, potential thresholds are identified via simulation without interim monitoring us- ing a gradient descent algorithm with nburn = 60 and B = 5 until the average segment-wise type-I error rate is between 0.024 and 0.026. These estimates are used as the initial values 44 Segment Scenario Approach 1 2 3 4 5 Constant piEB10 0.975 0.97125 0.96625 0.95875 0.95750 pie 0.975 0.96375 0.95875 0.94375 0.93250 POOL 0.975 0.97375 0.96375 0.95375 0.94000 Both piEB10 0.975 0.97500 0.97750 0.97500 0.97500 pie 0.975 0.98150 0.98500 0.98750 0.98750 POOL 0.975 0.99500 0.99900 0.99750 0.99900 Table 3.2: Posterior probability thresholds for MEMs with empirical Bayesian prior (piEB10), MEMs with fully Bayesian uniform prior (pie), and the naive pooling (POOL) approach presented to optimize to maintain average type-1 error rate across scenarios for constant underlying mortality scenario only (“Constant”) or to minimize trade-off in inflation of type-1 error in constant underlying mortality scenario while maintaining power equal to, or greater, than the PREVAIL II scenario under the varying underlying mortality scenario (“Both”). to further refine thresholds to achieve desired performance. In the second case, thresholds are identified by increasing the posterior threshold segment-by-segment to achieve equal or greater power compared to the PREVAIL II design in the constant mortality scenario while attempting to minimize the inflation of the type-I error rate in the varying mor- tality scenario as compared to the PREVAIL II design. The latter approach may result in a trade-off between power and type-1 error but, in the context of an emerging disease outbreak, this is beneficial due to the importance of maintaining power to detect effective treatments. 3.4.2 Results 25,000 simulated trials were completed for each scenario. Operating characteristics are presented for the PREVAIL II master protocol, MEMs with the fully Bayesian uniform prior (pie), MEMs with the constrained EB prior (piEBc), and naive pooling. Results are presented for thresholds calibrated to achieve the desired type-I error rate in the constant mortality scenario as described in Section 3.4.1, with the thresholds identified under each approach to calibration presented in Table 3.2. The value of c = 0.10 was selected for piEBc based on extensive sensitivity analyses (not presented) that considered values of c from 0.05 to 0.50. 45 The operating characteristics presented for each scenario include the probability of attaining a positive test within a segment based on the Bayesian posterior probability thresholds, the mean (sd) total number of subjects (N) treated throughout all segments as a measure of early termination, the mean (sd) proportion randomized to the treatment arm in segments 2-5 as a measure of adaptive randomization performance, and the mean (sd) proportion who survived either across segments 2-5 in the null case or in the specific non- null segment in scenarios with an efficacious therapy. When considering a drug under the null case it is ideal to rarely attain a positive test within a segment (i.e., have a probability of rejecting near 0), whereas it is desirable to attain a positive test within segments that include efficacious treatment (i.e., have a probability of rejecting near 1). Further, the proportion surviving in the null scenario is expected to be identical in PREVAIL II and the proposed AR design, but an improvement in survivorship is expected in non-null segments when the AR design effectuates more allocation to the treatment arm. Table 3.3 presents simulation results for the constant mortality scenario. The average type-I error rate across segments for MEMs is similar to or less than the average type- I error for the PREVAIL II master protocol. However, the power to detect an effective drug is higher in every non-null segment for piEB10 and pie compared to PREVAIL II, with increases in power ranging from 9% to 27% and 29% to 69% for piEB10 and pie, respectively. Naive pooling, which represents the best-case upper bound on performance in the presence of a constant mortality rate, results in similar or reduced type-I error rates compared to those observed in PREVAIL II with increases in power of 34% to 76% across all segments. Operating characteristics are summarized visually in the left panel of Figure 3.2, where open triangles represent the results for the PREVAIL II design, closed shapes represent approaches incorporating supplemental information, and the different colors identify the segment. In the presence of a constant mortality rate, all 8 MEM-based designs have increased power compared to PREVAIL II while 6 also have lower average type-1 error rates. 46 0.020 0.022 0.024 0.026 0.028 0.030 0. 4 0. 5 0. 6 0. 7 0. 8 Average Type−1 Error Rate Across Scenarios Po w e r Constant Mortality PREVAIL II MEM piEB10 MEM pie Naive Pooling Segment 2 Segment 3 Segment 4 Segment 5 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0. 2 0. 4 0. 6 0. 8 1. 0 Average Type−1 Error Rate Across Scenarios Po w e r Varying Mortality PREVAIL II MEM piEB10 MEM pie Naive Pooling Segment 2 Segment 3 Segment 4 Segment 5 Figure 3.2: Plots demonstrating power versus average type-1 error rate across scenarios for segments 2-5 for PREVAIL II, MEMs with piEB10 and pie priors, and the naive pooling case. While a moderate RR=0.7 results in minimal early termination, as observed by the average sample size estimates near 1000 for each overall trial, AR balances the information available for evaluating the control and experimental arms, with the positive byproduct of more patients receiving a potentially beneficial treatment in the MEM-based designs than the PREVAIL design. Across segments, we observed an absolute maximum increase of 15.5% and 29.7% in the proportions assigned to the treatment arm for piEB10 and pie, respectively. This also corresponds to increases in the proportion surviving within non-null segments for the MEM-based designs compared to the PREVAIL II design with improve- ments observed for both piEB10 and pie. Figure 3.3(a) presents the proportion randomized to the treatment group and Figure 3.3(b) presents the proportion surviving by segment for all designs in the constant mortality scenario. pie and naive pooling randomize a similar number to the control group, while piEB10 is more conservative. The MEM-based designs and naive pooling show clear increases in the median proportion surviving in each non-null scenario compared to the PREVAIL II design. 47 l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l ll l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll ll l l l l l l l l l l l l l l l l l l l l l ll l ll l l l ll llll ll l l lll l l l l l l l l l ll l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll llll l l l l l l l l l l l l l l l l l l ll l l l l l l l l ll l l l l l l l l l l ll l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l ll l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l 0. 50 0. 55 0. 60 0. 65 0. 70 0. 75 0. 80 Pr op or tio n R an do m ize d to T re a tm en t Null Scenario 2 Scenario 3 Scenario 4 Scenario 5 PREVAIL II MEM piEB10 MEM pie Naive Pooling (a) l ll lll l l l l l l ll lll l l l l l l l ll lll l l l l l l ll lll l l l l l l l l l l l l l l l l ll l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l ll l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l ll l l l ll l l l ll l l ll l l l l l l l ll l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l ll l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l ll l l l l l l ll l l ll l ll l l l l l l l l l l l l l l l l l l l l ll l l l ll l l l l l l l l l l l l l l l l l l l 0. 4 0. 5 0. 6 0. 7 0. 8 Pr op or tio n Su rv iv in g Null Scenario 2 Scenario 3 Scenario 4 Scenario 5 PREVAIL II MEM piEB10 MEM pie Naive Pooling (b) l ll l l lll l l l l l l l l l l ll l l l l ll l lll l l l l l llll l l l l ll ll l l l ll l l ll ll l ll ll l l llll l ll l l l l l l l l l l ll l l l l l llll l l lll l l l lll ll l l l ll l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l ll l l l l l l l l l l ll l ll l l l l ll l l l l l l l l l l l l ll l l l l l ll l ll ll l lll ll l l ll ll l l ll ll ll l l l l l l ll l l l l l 0. 50 0. 55 0. 60 0. 65 0. 70 0. 75 0. 80 Pr op or tio n R an do m ize d to T re a tm en t Null Scenario 2 Scenario 3 Scenario 4 Scenario 5 PREVAIL II MEM piEB10 MEM pie Naive Pooling (c) l l l l l l l l ll ll ll l l l l l l l l l l ll ll l l l l l l l llll l l l l l l l ll ll l l l ll ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l lll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l ll l l l l l l l l l l l l l ll l l l l l l l l l l lll l l l l l l ll l l l l l l l l ll l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l ll l ll ll l l l l l l ll l l l l l l ll l l ll l ll l l l l ll l l l l l l l l l l ll l l l l l ll l l l l l l l l l l l l l l ll l ll l l l l l l l l l l l l l l l ll l l l l l l l l ll lll l l l l l l l l l l l l l l l ll l l ll l l l l l l l l ll ll l ll ll l l l l l l l l l l ll l l l ll l l ll l l l l l l l ll l l l l 0. 2 0. 4 0. 6 0. 8 Pr op or tio n Su rv iv in g Null Scenario 2 Scenario 3 Scenario 4 Scenario 5 PREVAIL II MEM piEB10 MEM pie Naive Pooling (d) Figure 3.3: Proportion assigned to treatment arm across segments 2-5 (left) and proportion surviving across segments 2-5 (Null scenario) or within the non-null segment (Scenarios 2- 5) (right) under the constant mortality scenario (top) and the varying mortality scenario (bottom) for the PREVAIL II design, MEM-based designs, and naive pooling. Table 3.4 presents simulation results for the varying mortality scenario using the thresh- olds calibrated for the constant mortality scenario. Recall that this scenario is motivated by the mortality rate observed in Sierra Leone during the West Africa Ebola outbreak and is more challenging than the constant mortality scenario. The PREVAIL II design 48 maintains a type-I error rate between approximately 0.025 to 0.03 since no supplemental information is incorporated across segments, but the power steadily decreases with each segment as the underlying mortality rate continues to drop and the absolute difference in the mortality rate due to an effective treatment shrinks. Figure 3.2 clearly identifies that the MEM-based design with pie (filled-in triangles) and naive pooling (filled-in diamonds) result in drastic inflation to the average type-I error rates, which negate any benefit due to increased power compared to the PREVAIL II design. However, the more conserva- tive piEB10 prior (filled-in squares) demonstrates a more acceptable trade-off, where the largest inflation of the type-I error rate occurs in segment 5, increasing from 0.026 for the PREVAIL II design (open triangles) to 0.064 for piEB10 , while power increases from 2% to 51% across non-null segments. Figures 3.3 (c) and (d) demonstrate similar operating characteristics to the constant mortality scenario, where the MEM-based designs and naive pooling assign a higher proportion of subjects to the treatment arm with increases in the proportion surviving the non-null segments as compared to the PREVAIL II design. When considering Tables 3.3 and 3.4 together, the MEM-based design with piEB10 offers perhaps the best performance when taking into account all operating characteristics and trial properties. While there is an inflation of the type-I error rate in the varying mortality case, we believe it is within acceptable levels when you consider the context of a rapidly developing disease outbreak and that there are clear improvements to power, the proportion randomized to the treatment arms, and the proportion surviving within non-null segments compared to the PREVAIL II design. Section B of the Supplementary Materials provides additional simulation results. Sec- tion B.1 provides results when thresholds are calibrated to control the type-I error rates in the varying mortality scenario while maintaining the power for the PREVAIL II design in the constant mortality scenario. Section B.2 provides results for scenarios with RR=0.5. Section B.3 provides results with different parameter values for nburn, B, and c. Section B.4 provides results with two effective therapeutics. 49 Probability Reject in Segment Mean (sd) Mean (sd) Mean (sd) Trt 1 2 3 4 5 N Prop Trt Prop Surv P-II S0 0.032 0.028 0.029 0.031 0.029 996 (25.58) 0.5 (0) 0.600 (0.017) S2 0.032 0.432 0.028 0.029 0.028 988 (38.41) 0.5 (0) 0.659 (0.036) S3 0.032 0.028 0.431 0.029 0.030 988 (38.89) 0.5 (0) 0.659 (0.036) S4 0.032 0.028 0.029 0.434 0.028 988 (38.79) 0.5 (0) 0.659 (0.036) S5 0.032 0.028 0.029 0.031 0.441 988 (38.78) 0.5 (0) 0.659 (0.036) piEB10 S0 0.027 0.026 0.026 0.030 0.026 998 (14.97) 0.655 (0.029) 0.600 (0.017) S2 0.027 0.470 0.025 0.028 0.024 990 (32.03) 0.642 (0.032) 0.669 (0.035) S3 0.027 0.026 0.509 0.031 0.025 990 (31.49) 0.634 (0.035) 0.676 (0.035) S4 0.027 0.026 0.026 0.548 0.028 990 (31.09) 0.638 (0.032) 0.682 (0.035) S5 0.027 0.026 0.026 0.030 0.560 991 (30.96) 0.654 (0.029) 0.686 (0.036) pie S0 0.027 0.027 0.026 0.033 0.037 998 (14.63) 0.797 (0.021) 0.600 (0.017) S2 0.027 0.556 0.017 0.020 0.026 990 (32.08) 0.785 (0.032) 0.683 (0.035) S3 0.027 0.027 0.611 0.021 0.024 989 (32.85) 0.782 (0.038) 0.696 (0.035) S4 0.027 0.027 0.026 0.694 0.027 988 (33.67) 0.784 (0.034) 0.701 (0.035) S5 0.027 0.027 0.026 0.033 0.745 987 (34.80) 0.794 (0.023) 0.701 (0.035) POOL S0 0.027 0.022 0.030 0.036 0.040 998 (15.38) 0.825 (0.009) 0.600 (0.017) S2 0.027 0.581 0.011 0.017 0.025 984 (37.24) 0.817 (0.024) 0.691 (0.036) S3 0.027 0.022 0.682 0.018 0.024 982 (38.71) 0.814 (0.030) 0.701 (0.036) S4 0.027 0.022 0.030 0.731 0.028 981 (38.51) 0.814 (0.028) 0.702 (0.036) S5 0.027 0.022 0.030 0.036 0.778 981 (38.75) 0.821 (0.013) 0.702 (0.036) Table 3.3: Operating characteristics and trial properties for the utilized platform design as well as alternative adaptive platform designs. 25,000 simulations for the constant under- lying mortality case (p = 0.4 for all segments) with RR=0.7 for non-null segments for the PREVAIL II (P-II) master protocol; MEMs incorporating adaptive randomization with the constrained empirical Bayes, c = .10 prior (piEB10) and the fully Bayesian uniform prior (pie); and the naive pooling (POOL) of all supplemental information incorporating adaptive randomization using posterior probability thresholds optimized for the constant mortality case. Results provided for power/type-I error for each segment, average (sd) total sample size (N) across entire trial, average (sd) proportion allocated to treatment arm in segments 2-5, and average (sd) proportion surviving in the non-null segments (for Trt=S2-S5) or across segments 2-5 (for Trt=S0). 50 Probability Reject in Segment Mean (sd) Mean (sd) Mean (sd) Prior Trt 1 2 3 4 5 N Prop Trt Prop Surv P-II S0 0.028 0.030 0.032 0.027 0.025 997 (22.73) 0.5 (0) 0.580 (0.017) S2 0.028 0.764 0.027 0.025 0.027 972 (51.38) 0.5 (0) 0.482 (0.041) S3 0.028 0.030 0.555 0.026 0.026 984 (42.33) 0.5 (0) 0.591 (0.038) S4 0.028 0.030 0.032 0.386 0.026 990 (35.35) 0.5 (0) 0.693 (0.035) S5 0.028 0.030 0.032 0.027 0.235 994 (27.80) 0.5 (0) 0.804 (0.029) piEB10 S0 0.027 0.040 0.048 0.058 0.061 998 (16.20) 0.543 (0.016) 0.580 (0.017) S2 0.027 0.781 0.042 0.063 0.070 971 (47.98) 0.557 (0.019) 0.487 (0.038) S3 0.027 0.040 0.637 0.052 0.069 983 (39.13) 0.551 (0.018) 0.597 (0.036) S4 0.027 0.040 0.048 0.511 0.057 990 (31.19) 0.546 (0.016) 0.699 (0.032) S5 0.027 0.040 0.048 0.058 0.354 994 (24.99) 0.542 (0.016) 0.807 (0.027) pie S0 0.027 0.102 0.165 0.213 0.262 995 (23.33) 0.665 (0.044) 0.580 (0.017) S2 0.027 0.854 0.094 0.236 0.326 961 (52.95) 0.689 (0.042) 0.500 (0.038) S3 0.027 0.102 0.768 0.134 0.307 972 (46.86) 0.673 (0.042) 0.611 (0.036) S4 0.027 0.102 0.165 0.717 0.198 980 (41.23) 0.665 (0.043) 0.712 (0.032) S5 0.027 0.102 0.165 0.213 0.637 986 (36.15) 0.663 (0.043) 0.816 (0.027) POOL S0 0.027 0.342 0.729 0.561 0.736 926 (54.93) 0.785 (0.038) 0.576 (0.023) S2 0.027 0.997 0.171 0.486 0.674 878 (48.89) 0.749 (0.034) 0.510 (0.047) S3 0.027 0.342 0.986 0.199 0.648 884 (39.12) 0.740 (0.029) 0.621 (0.047) S4 0.027 0.342 0.729 0.947 0.376 893 (44.84) 0.752 (0.036) 0.720 (0.039) S5 0.027 0.342 0.729 0.561 0.955 884 (56.21) 0.773 (0.039) 0.824 (0.033) Table 3.4: Operating characteristics and trial properties for the utilized platform design as well as alternative adaptive platform designs. 25,000 simulations for the varying underlying mortality case (p = (0.74, 0.61, 0.48, 0.36, 0.23) for segments 1-5, respectively) with RR=0.7 for non-null segments for the PREVAIL II (P-II) master protocol; MEMs incorporating adaptive randomization with the constrained empirical Bayes, c = .10 prior (piEB10) and the fully Bayesian uniform prior (pie); and the naive pooling (POOL) of all supplemental information incorporating adaptive randomization using posterior probability thresholds optimized for the constant mortality case. Results provided for power/type-I error for each segment, average (sd) total sample size (N) across entire trial, average (sd) proportion allocated to treatment arm in segments 2-5, and average (sd) proportion surviving in the non-null segments (for Trt=S2-S5) or across segments 2-5 (for Trt=S0). 51 3.5 Discussion In the context of rapidly developing disease outbreaks, there is a need for flexible, dynamic methods to identify effective therapeutics as quickly as possible. In the EVD outbreak from 2014–16, the PREVAIL II study was designed to test multiple therapies within one overarching trial while incorporating aggressive interim monitoring to identify effective treatments as early as possible. However, the PREVAIL II trial design did not utilize previous segments of the control arm to improve the efficiency of the analyses and is per- haps suboptimal given the availability of adaptive design features and Bayesian hierarchical modeling techniques. To address this shortcoming we proposed incorporating MEMs with AR to estimate the extent of exchangeability across segments administering identical treat- ment regimes and balance allocation in relation to ESSS. Note that the modification to the design is general and can be applied with other methods for incorporating supplemental information. There are many approaches to AR which could have been incorporated to our proposed design. For example, Berry and Eick (1995) compare four different response adaptive approaches to a standard equal randomization scheme and identify improvements using AR in many scenarios. More recently, Thall and Wathen (2007) explored the use of a positive constraint on Bayesian AR in order to limit the extent of adjusting the allocation ratio. Methods also exist which use biomarker information to adaptively randomize individuals during a study to more advantageous trial arms based on an individual’s biomarker profile (Zhou et al., 2008) or to adjust patient allocation in trials with binary outcomes to address covariate imbalances such that more patients can access the superior treatments identified in the study (Ning and Huang, 2010). Other methods utilize predictive probability and Bayesian AR to treat more patients with the more efficacious treatment while enabling early termination if superiority or equivalence can be demonstrated before trial completion (Yin et al., 2012). However, outcome-AR techniques, which can lead to an imbalance in the sample size or poor operating characteristics, are controversial (Thall et al., 2015). Our proposed de- sign utilizes AR to target information balance, which results in additional allocation to experimental therapies in the presence of exchangeable information contributed by supple- mental controls. In the context of the EVD outbreak, where concerns were raised about 52 the appropriateness of randomized trials due to ethical or practical concerns (Adebamowo et al., 2014; Ippolito et al., 2016), the modified multi-source adaptive platform design offers perhaps an ideal trade-off: controlling for cohort bias with the potential to randomize more study participants to emerging therapies. The methods presented herein represent a useful tool for designing platform trials to address future infectious disease outbreaks, such as the Zika virus or future Ebola outbreaks. It can be noted that 1:1 allocation is maintained in the absence of evidence supporting exchangeable controls. Further, while AR has historically been shown to result in poten- tially undesirable operating characteristics, MEMs with AR under piEB10 maintained type-I error in the constant mortality scenario with reasonable inflation in the varying underlying mortality case, while increasing the power to detect an effective drug in both scenarios. This approach to AR addresses the concerns of sample size imbalance and potentially undesirable operating characteristics raised by Thall et al. (2015). Even though the simulations and designs presented in this chapter are unique to the EVD outbreak, they can be generalized to other settings of clinical study, such as screening platform for intermediate-phased drug trials as well as sequential experimental designs of biomarker assays. In addition, a number of other parameters can be adjusted (e.g., priors on the MEM weights, the components of the adaptive randomization, and the frequency of interim monitoring) to achieve the desired operating characteristics in other settings where incorporating supplemental control information is desired. The number of different parameters, while demonstrating the flexibility of our design, also represents a potential limitation. It may be challenging to identify the most appropri- ate parameters to use given the many unknowns of a rapidly developing disease outbreak. However, this can be moderated by assuming conservative priors on the MEM weights, such as c = 0.1 for piEBc , which ensures that posterior model weight is given to the MEM which assumes no exchangeable segments while incorporating information in a manner that results in an improvement to the power, survivorship, and proportion randomized to the treatment arm. The value chosen for c reflects a boundary imposed on the probability of exchangeability and should be evaluated in the context of each trial. One should explore values between 0 and 1 to identify values of c with better performance, but in our experi- ence it should be set at a low value in order to moderate the influence of piEB which assigns 53 all prior weight to a single source, and as a result, posterior, weight to a single MEM. Further, while calibration may be challenging, the proposed piEBc prior is based on a single hyperparameter, a benefit compared to Bayesian non-parametric density estimation with finite- or infinite-mixtures or the traditional BMA framework which requires sets of priors for all 2K models. Another potential limitation is that simulation results presented in this chapter only consider the scenario with one effective treatment across all segments. However, in the context of a rapidly emerging infectious disease, it is unlikely that an effective treatment will be present in all segments, with many therapies potentially nominated for inclusion based on pharmacokinetic studies, laboratory research, or hypothetical causal drug effects by simulation rather than more rigorous clinical trials. Results for cases with two out of five effective treatments are presented in Section B.4 of the Supplementary Materials and have similarly encouraging operating characteristics and trial properties as presented in Section 3.4.2. Chapter 4 A Fully Bayesian Mixture Model Approach for Identifying Non-Compliance in a Regulatory Tobacco Clinical Trial 4.1 Introduction The Center for the Evaluation of Nicotine in Cigarettes, project 1 (CENIC-p1), was a 6- week randomized multi-center trial designed to evaluate the effect of nicotine reduction on tobacco use behavior (Donny et al., 2015). 839 current smokers underwent randomization, with 780 completing the 6-week study after being equally randomized to one of seven groups, including a usual brand control condition and one of six experimental cigarettes with nicotine content ranging from 15.8 mg per gram of tobacco (normal nicotine controls) to 0.4 mg per gram of tobacco (very low nicotine content (VLNC) cigarettes). At the end of six weeks, participants randomly assigned to the lowest nicotine condition had significantly reduced tobacco use, dependence, and nicotine exposure relative to the normal nicotine controls. In 2009, Congress passed the Family Smoking Prevention and Tobacco Control Act (FSPTCA), which gave the Food and Drug Administration (FDA) the authority to regulate 54 55 the content, marketing, and sale of tobacco products. In particular, the FSPTCA gives the FDA the authority to reduce the nicotine content of cigarettes to non-addictive levels (but not zero) if it can be shown that this will improve public health. CENIC-p1 provides key scientific evidence in support of a new nicotine standard for cigarettes, but understanding the impact of mandated nicotine reduction on tobacco use behavior is difficult due to the presence of substantial non-compliance to randomized treatment assignment. In CENIC-p1, non-compliance is defined as smoking any commercially available, non- study cigarettes in place of, or in addition to, the study cigarettes provided by the trial. Identifying non-compliance in CENIC-p1 is difficult because the most direct measure of non-compliance, self-reported smoking of non-study cigarettes, is known to be unreliable (Nardone et al., 2016). An alternative strategy is to use biomarkers of nicotine exposure (i.e., cotinine or total nicotine equivalents (TNEs)) to identify non-compliance among self- reported compliers (Benowitz et al., 2015). For example, a recent study characterized the distribution of biomarkers of nicotine exposure in participants who were sequestered in a hotel for five days to ensure that they only smoked cigarettes with 0.4 mg of nicotine per gram of tobacco (Denlinger et al., 2016). In this study, the 95th percentile for TNEs in fully compliant subjects was estimated at 6.41 nmol/ml, which has been used as a threshold for identifying non-compliance in secondary analyses of CENIC-p1 (Tidey et al., 2017; Rup- precht et al., 2017). Alternatively, Boatman et al. (2017) used a mixture model approach, which incorporated the data from Denlinger et al. (2016), to estimate the probability that a subject randomized to the VLNC groups was compliant conditional on their biomarker values, which facilitated estimation of the causal effect of nicotine reduction on cigarettes smoked per day. While the results of Denlinger et al. (2016) provide thresholds for identifying non- compliance in the VLNC group, they can not be used to identify non-compliance at the intermediate dose-levels (i.e., nicotine dose-levels between 0.4 mg/g and 15.8 mg/g). The mixture model approach proposed by Boatman et al. (2017) could be used to identify non-compliance at the intermediate dose-levels but fitting mixture models is known to be challenging due to label switching or overlap between the mixture components (Marin et al., 2005; Jasra et al., 2005; Stephens, 2000). Alternatively, we could leverage our understanding of the biological relationship between the nicotine content of cigarettes and 56 biomarkers of nicotine exposure to better estimate the mixture components (Benowitz et al., 2015). We expect the nicotine content of the cigarettes to have an additive effect on biomarkers of nicotine exposure and we could specify a model that incorporates this relationship, in which case the mean of the compliant mixture component is estimated relative to the control or usual brand groups. This could improve estimation of the mixture components, but is also susceptible to model misspecification. Having two different modeling frameworks represents a challenge given the uncertainty as to which model is most appropriate for the data. The Bayesian framework is naturally able to account for the uncertainty around selecting the “best” approach by considering multiple models simultaneously and averaging the results, thus avoiding the need to select the “best” model. For example, the posterior estimates for each model can be estimated using Markov chain Monte Carlo (MCMC) techniques which account for the different assumptions of each approach, such as the proposed biological relationship or various pa- rameter constraints, and model averaging can smooth over the posteriors from each model. However, model averaging in our context is made more difficult because the dimensionality of the parameter space differs between the two proposed approaches. The reversible-jump MCMC (RJMCMC) algorithm addresses this difficulty by providing the flexibility to ac- count for the different dimensionalities while implementing model averaging within each dose-level (Green, 1995). This allows us to “mix and match” which approach is most ap- propriate for each dose-level so that any gains in efficiency from the biological relationship can be used within the applicable dose-levels while avoiding the introduction of bias when it is inappropriate. The remainder of this chapter proceeds as follows. In Section 4.2 we define finite Gaus- sian mixture models with two components (compliant and non-compliant) and discuss how our model specification can be adapted to account for the hypothesized relationship be- tween the nicotine content of cigarettes and biomarkers of nicotine exposure. A RJMCMC algorithm for averaging over various models is described in Section 4.2.2. Simulation re- sults evaluating our modeling approach when the hypothesized relationship does and does not hold are presented in Section 4.3 and we apply our proposed method to CENIC-p1 in Section 4.4. Finally, we conclude with a brief discussion in Section 4.5. 57 4.2 Methods for estimating compliance within randomized groups We first introduce the notation that will be used throughout the remainder of this chapter. Let i = 1, ..., n be an index for each observation, j = 1, ..., J index the different randomized groups, and k = 1, 2 represent the two components of the mixture distribution within a level j, where k = 1 implies compliance and k = 2 implies non-compliance. Further, let y represent the observed outcome, such that yij represents participant i who belongs to randomization group j. The goal of this Chapter is to identify non-compliance to randomized treatment as- signment. As discussed previously, two approaches to this problem utilizing biomarkers of nicotine exposure have been proposed in the literature. Denlinger et al. (2016) identified quantiles of the biomarker distribution (e.g., 90th, 95th, etc.) that can be used as thresh- olds for identifying non-compliant individuals, while Boatman et al. (2017) estimated the probability of individual compliance conditional on the observed biomarkers through an application of Bayes theorem. Regardless of which approach is used, both require at least estimating the distribution of the biomarkers in fully compliant subjects. This requires es- timation of a mixture distribution because we do not know which subjects were compliant, which is challenging in the absence of auxiliary data from known compliers. We first specify a two component mixture distribution for an arbitrary treatment group j, assuming no dose-response relationship across dose-levels, which we will label the “IND model”. For group j, let µj1 be the mean of the compliant component (k = 1), µj2 be the mean of the non-compliant component (k = 2), τj be the common precision of the Gaussian distributions for compliant and non-compliant subjects, and pj be the mixture weight, which represents the proportion of non-compliant subjects in group j. This results in the following likelihood for group j: p(y|µj1, µj2, τj , pj) = n∏ i=1 [(1− pj)N (yi|µj1, τj) + pjN (yi|µj2, τj)] , (4.1) whereN is a Gaussian density with the given mean and precision. While intuitive, this form of the likelihood is susceptible to the issue of label-switching (i.e., the peaks of the posterior 58 distribution for the two components switch such that µj1 is actually estimating the mean for the second component and µj2 is estimating the first component mean) and makes deriving the conditional posterior distributions under conjugate priors very challenging. To address these issues, the likelihood can be reparameterized as follows. First, to address the potential for label-switching, redefine µj1 as µj and µj2 as µj + θj , where θj > 0. This implies that µj < µj + θj and ensures that the non-compliant component mean will be larger than the compliant component mean. Second, to address the challenges in deriving the conditional posteriors, we introduce a latent indicator variable z through data augmentation (Chib, 1996). Define zijk = 1 when yi belongs to both group j and component k, and 0 otherwise. These changes induce a complete data likelihood considering both our known y and unknown z of p(y, z|µj , θj , τj , pj) = n∏ i=1 [(1− pj)×N (yi|µj , τj)]zij1 [pj ×N (yi|µj + θj , τj)]zij2 . (4.2) In the setting where J = 1, the notation can be simplified by replacing zij2 = 1− zij1, however with multiple j this substitution implies that 1−zij1 = 1 not only when participant i in group j belongs to k = 2, as desired, but also when a participant does not belong to group j. Extending (4.2) to multiple j, let Θ = (p,µ,θ, τ ), where the bold notation represents vectors including the corresponding parameter for each of the j groups. The complete data likelihood for all subjects can be written as: p(y, z|Θ) = n∏ i=1  J∏ j=1 [(1− pj)N (yi|µj , τj)]zij1 [pjN (yi|µj + θj , τj)]zij2  . (4.3) The complete data likelihood can be modified to incorporate the hypothesized biological relationship discussed earlier, which we refer to as the “REL model” for the remainder of the Chapter. First, within CENIC-p1, assume that group J represents the combined usual brand control and 15.8 mg/g groups, where all participants are assumed to be compliant. Then assume that some relationship exists between µj and µJ that can be defined by an arbitrary function h: µ∗j = hj(µJ). In this case we no longer estimate each µj , j 6= J , but only µJ . Estimates for each µj are based on the proposed relationship, hj(µJ). Note, data are now utilized across all levels to estimate µJ , but estimates of θj are derived within each 59 level to account for differences in the mean for the non-compliers across dose-levels due to differences in the prevalence of partial compliance or smoking behavior across dose-levels. Specifically in the context of CENIC-p1 for the log-normally distributed TNE biomarker, we will consider the following proposed relationship. Letting wj represent the nicotine content for dose level j, we propose the following function, hj , that maps the ratio of the nicotine contents to the mean of the biomarker distribution: µ∗j = hj(µJ) = log ( wj wJ ) + µJ . (4.4) Biomarkers of nicotine exposure are thought to be linearly related to the nicotine con- tent of the cigarettes and the proposed relationship imposes a linear relationship between the nicotine content of the cigarettes and the median on the original scale. Given this relationship, the complete data likelihood is very similar, but with µ∗j in place of µj : p(y, z|Θ) = n∏ i=1  J∏ j=1 [(1− pj)N (yi|µ∗j , τj)]zij1 [pjN (yi|µ∗j + θj , τj)zij2 ]  . (4.5) When considering all j levels, it may be the case that some levels are better estimated with the IND approach and others the REL approach. In this case, we can also average over intermediate models where some dose-levels are estimated assuming the IND model, while others follow the REL model. For example, within CENIC-p1 there are a total of 16 potential models, m = 1, ..., 16, considering all the possible combinations of modeling frameworks assumed for the four non-control groups where the state of the model can be denoted by a vector, ξm, where 0 indicates the IND model and 1 the REL model. For example, ξm = (1, 0, 0, 0) indicates that group 1 is estimated from the REL model, while groups 2-4 are estimated from the IND model. The RJMCMC algorithm can traverse the model space of 16 models with different parameter dimensionality to average over all possible combinations. 4.2.1 Prior distributions and posterior inference In this section, we discuss prior distributions and posterior inference for the IND and REL approaches. In general, the posterior will not be available in closed form and must be 60 approximated using MCMC. For treatment group j, we assume the following prior distributions for the various model parameters: µj ∼N (0, sµj), θj ∼N (0, sθj), τj ∼Γ(aj , bj), pj ∼Beta(αj , βj). (4.6) Letting njk represent the number of observations in component k of group j and nj =∑K k=1 njk, the joint posterior for the IND model is proportional to the following: p(Θ|y, z) ∝ J∏ j=1 { (1− pj)nj1+βj−1pnj2+αj−1j × τ nj 2 +aj−1 j exp (− bjτj)× exp(−sµj 2 µ2j ) × exp ( −sθj 2 θ2j ) × exp ( −τj 2 [ n∑ i=1 zij1(yi − µj)2 + zij2(yi − [µj + θj ])2 ])} . (4.7) From (4.7), the conditional posterior for each parameter in the IND model within group j is: p(µj |θj , τj ,y, z) ∝ N ( τj(nj1y¯j1 + nj2(y¯j2 − θj)) njτj + sµj , njτj + sµj ) , (4.8) p(θj |µj , τj ,y, z) ∝ N ( nj2τj(y¯j2 − µj) nj2τj + sθj , nj2τj + sθj ) , (4.9) p(τj |µj , θj ,y, z) ∝ Γ (nj 2 + aj , 1 2 [ (nj1 − 1)s2j1 + (nj2 − 1)s2j2+ nj1(y¯j1 − µj)2 + nj2(y¯j2 − (µj + θj))2 + 2bj ]) , (4.10) p(pj |y, z) ∝ Beta(nj2 + αj , nj1 + βj), (4.11) p(zij2 = 1|yij ,Θ) ∝ Ber ( pjN (yij |µj + θj , τj) (1− pj)N (yij |µj , τj) + pjN (yij |µj + θj , τj) ) , (4.12) where y¯jk is the mean of yijk for group j and component k. The previous derivation can be easily extended to the REL model where hj(µJ) is 61 specified as in (4.4). The conditional posteriors for the REL model, with some additional notation, where µ∗j = hj(µJ) is dj = log ( wj wJ ) , (4.13) gj =τj(njdj + nj2θj − nj1y¯j1 − nj2y¯j2), (4.14) p(µJ |Θ,y, z) ∝ N nJτJ y¯J −∑J−1j=1 gj∑J j=1(njτj) + sµJ , J∑ j=1 (njτj) + sµJ  , (4.15) p(θj |µj , τj ,y, z) ∝ N ( nj2τj(y¯j2 − µ∗j ) nj2τj + sθj , nj2τj + sθj ) , (4.16) p(τj |µj , θj ,y, z) ∝ Γ (nj 2 + aj , 1 2 [ (nj1 − 1)s2j1 + (nj2 − 1)s2j2+ nj1(y¯j1 − µ∗j )2 + nj2(y¯j2 − (µ∗j + θj))2 + 2bj ]) , (4.17) p(pj |y, z) ∝ Beta(nj2 + αj , nj1 + βj), (4.18) p(zij2 = 1|yij ,Θ) ∝ Ber ( pjN (yij |µ∗j + θj , τj) (1− pj)N (yij |µ∗j , τj) + pjN (yij |µ∗j + θj , τj) ) . (4.19) Note that the previously derived conditional posteriors for θ, p, τ , and z are un- changed from the IND model, except that estimates for µj are derived from the proposed relationship. 4.2.2 Model averaging with RJMCMC We now discuss how we will average over the IND, REL, and intermediate models using RJMCMC. RJMCMC accounts for the varying dimensionality of the parameter space dur- ing model averaging. For example, the IND model estimates µj from the group j observed data only, but in the REL model µj is derived from the proposed relationship with µJ , representing a change in the dimensionality of the parameter space due to the assumed relationship between µj and µJ . As noted above, we will consider intermediate models, in addition to the IND and REL models, to account for variability in the appropriateness of the hypothesized relationship between nicotine content and biomarkers of nicotine exposure across dose-levels. That is, the hypothesized relationship may be appropriate for some dose levels but not others. For 62 example, if ξm = (1, 0, 0, 1), µJ is updated using only groups j = 1, 4 and the control group j = 5, while groups j = 2 and 3, are not included because only data from their own group are used to derive parameter estimates. While described in greater detail below, the general steps of the RJMCMC algorithm are: 1. Update the parameters conditional on the current model using the Gibbs sampler discussed in Section 4.2.1. 2. Update the model, m, conditional on the current parameter values using the RJM- CMC algorithm which includes the following steps: 2.a Randomly select a group j′ from j = 1, ..., J − 1 with equal probability as a candidate to change states from REL to IND or vice versa. 2.b Accept this proposed move with some probability. We introduce additional notation for RJMCMC. Let q(·) be an (arbitrary) proposal density for parameters that must be generated as the result of proposed moves that increase the dimensionality of the parameter space, f(y|Θ,m) represent the likelihood of our data given Θ in model m, and p(m) represent the prior probability of model m. Let Ij = 1 be an indicator that group j is estimated using the IND model, with Ij = 0 indicating that group j is estimated using the REL model and define p(Ij = 1) to be the prior probability that Ij = 1 with p(Ij = 0) = 1 − p(Ij = 1), the prior probability that Ij = 0. We define p(m) as the product of these priors: ∏J−1 j=1 [p(Ij = 1)] Ij [1 − p(Ij = 1)]1−Ij . If p(Ij = 1) = 0.5, all models receive equal prior weight, whereas p(Ij = 1) > 0.5 favors IND for group j and p(Ij = 1) < 0.5 favors REL for group j. In Step 2a, the intermediate approaches can be conceptualized as a set of nested models of the REL approach where all groups are estimated assuming the relationship, so the proposal scheme suggested by Green and Hastie (2009) for scenarios with nested models is implemented. In the scheme for nested models, at each iteration we propose a move from (Θ,m) to (Θ′,m′) by randomly selecting j′ ∈ 1, ..., J − 1 with equal probability as the candidate dose-level to flip from IND to REL or vice versa. The change in the state of j′ will increase or decrease the dimension of the parameter space by 1. For example, 63 if ξm = (1, 0, 0, 0) and j ′ = 3, then ξm′ = (1, 0, 1, 0) with group 3 switching from IND to REL and the parameter dimensionality decreases by 1. We note that, while we are considering a RJMCMC algorithm that only considers state-changes one-dose-level-at-a- time, an alternative proposal framework would be to consider jumping to any of the 15 other permutations of the model states for m′ rather than flipping one group at each iteration. In greater detail, if the proposed move changes group j′ from the REL model to the IND model (increasing dimensionality), we define the bijective functions (i.e., one-to-one and onto functions that map to only a single value in the range of possible values) for our proposed values to be: p′ = p, τ ′ = τ , θ′ = θ, µ′j 6=j′ = µj 6=j′ , µ′j′ = u, (4.20) where u ∼ q(·), such that q(·) is some (arbitrary) proposal distribution for µ′j′ . We define q(·) as the corresponding conditional posterior distribution from (4.8). Conversely, the reverse move for group j′ from IND to REL (decreasing dimensionality) is determined completely by the move from REL to IND described above: p = p′, τ = τ ′, θ = θ′, µj 6=j′ = µ′j 6=j′ , u = µ′j′ . (4.21) We note that when decreasing the dimensionality we do not have to simulate from a proposal distribution, but set µj′ at the value determined by the proposed relationship to µJ . 64 In step 2b, the probability of accepting a proposed move is equal to min(1, A), where: A = p(Θ′,m′|y)P (m|m′)q′(u′) p(Θ,m|y)P (m′|m)q(u) ∣∣∣∣∂(Θ′,u′)∂(Θ,u) ∣∣∣∣ , (4.22) where P (m|m′) denotes the probability of proposing to move from m to m′, ∣∣∣∂(Θ′,u′)∂(Θ,u) ∣∣∣ is the Jacobian, and p(Θ,m|y) is the joint distribution of Θ and the model m: p(Θm,m|y, z) ∝ p(y, z|Θm,m)p(Θm|m)p(m). In our implementation, each group j has an equal chance of being selected and flipped within each iteration, P (m′|m) = P (m|m′). Therefore, the A term simplifies to Aj′:REL→IND = p(Θ′,m′|y, z)P (m|m′)q′(u′) p(Θ,m|y, z)P (m′|m)q(u) ∣∣∣∣∂(Θ′,u′)∂(Θ,u) ∣∣∣∣ = p(Θ′,m′|y, z) p(Θ,m|y, z)q(u) = f(y, z|Θ′,m′)p(Θ′|m′)p(m′) f(y, z|Θ,m)p(Θ|m)p(m)q(u) . (4.23) The reverse move from the model where estimation of group j′ is changed from IND to REL can be described by the inverse of the stated acceptance probability such that min(1, Aj′:IND→REL) = min(1, A−1j′:REL→IND). The final step in 2b is to simulate a single value v ∼ Unif(0, 1) and accept the move to the proposed state if v ≤ min(1, A). If the proposed move is not accepted, the original model m continues to the next iteration with its original parameter values. 4.3 Simulation studies to establish small sample properties In this section, we present the results of a small simulation study to evaluate the small sample properties of the model averaging approach discussed in Section 4.2.2. Our sim- ulation study will mimic the data collected in CENIC-p1; we consider 5 dose-levels, with dose-level 5 treated as the normal nicotine condition, dose-levels 1-4 treated as varying levels of reduced nicotine content conditions and nj = 100 for all dose-levels. Data for the control condition are drawn from a single component, N (3, 1). Data for the four treatment conditions are simulated from a two-component mixture distribution with true probability 65 of non-compliance, pj , equal to 0.70, which is approximately equal to the estimated prob- ability of non-compliance for the 0.4 mg/g group reported in Nardone et al. (2016) and Boatman et al. (2017). Data for the non-compliant component were drawn from a N (3, 1) distribution for all dose-levels. Data for the compliant components were drawn from a normal distribution with τ = 1 and a mean that varied as a function of dose-level. For the remainder of this section, we specify the mean for the reduced nicotine content groups by the effect size, defined as ES= 3−µj σ . That is, we specify the effect size, which in turn specifies µj . For purposes of estimation, the hypothesized dose-response relationship for the REL model assumes reductions of 98.1%, 95.0%, 86.5%, and 63.2% in the mean of the biomarker for groups 1 through 4 relative to the control condition, corresponding to ES of 4, 3, 2, and 1 for groups 1 through 4, respectively. For all j dose-levels, assume “non- informative” prior specifications of αj = βj = 1 for the beta prior on pj , aj = bj = 0.001 for the gamma prior on τj , and sµj = sθj = 0.00001 for the normal priors on µj and θj . We consider six scenarios for the true relationship between the mean in the compliant component and the dose-level. The first scenario assumes that the hypothesized relation- ship in the REL model is correct, in which case ES=(4,3,2,1) and µj1 = (−1, 0, 1, 2) for dose-levels 1-4, respectively. The second scenario assumes that dose-levels 1 through 3 follow the hypothesized relationship but dose-level 4 does not: ES=(4,3,2,0.5) and µj1 = (−1, 0, 1, 2.5). The third scenario assumes that dose-levels 1 and 2 follow the hypothesized relationship but dose-levels 3 and 4 do not: ES=(4,3,1,0.5) and µj1 = (−1, 0, 2, 2.5). The fourth scenario assumes that only dose-level 1 follows the relationship: ES=(4,1.5,1,0.5) and µj1 = (−1, 1.5, 2, 2.5). The fifth scenario assumes that the effect size is less than the hypothesized effect size for all dose-levels: ES=(2,1.5,1,0.5) and µj1 = (1, 1.5, 2, 2.5). Fi- nally, the sixth scenario assumes that the effect size is more than the hypothesized effect size for all dose-levels: ES=(6,4.5,3,1.5) and µj1 = (−3,−1.5, 0, 1.5). 1,000 simulated studies were completed for each scenario. Simulation results are pre- sented for the IND model, the REL model with our hypothesized relationship, and for our RJMCMC approach that averages over the IND, REL, and intermediate approaches. Two different model priors are chosen for the RJMCMC simulations: assuming each model specification is equally likely (RJ) with p(Ij = 1) = 0.5 for each group j, versus favoring models assuming IND for each level (RJ95) with p(Ij = 1) = 0.95 for each group j. Within 66 each MCMC, a chain with 10,000 total iterations was used, with the first 1,000 iterations excluded for the burn-in period. The performance of each model is summarized by bias, standard deviation (SD), and mean square error (MSE) within each dose-level j for the proportion compliant (pc), the mean of the compliant component (µ1), the mean of the non-compliant component (µ2), their shared standard deviation (σ), and the 95th per- centile of the distribution of the biomarker on the original scale (i.e., ey), which is included as this is a quantity of interest for the analysis of the data from CENIC-p1. We note that the SD for the REL, RJ, and RJ95 models are presented as a ratio relative to the SD for the IND model in order to illustrate the gain in efficiency due to borrowing. Addition- ally, the RJMCMC results also provide the mean (standard deviation) of the proportion of iterations spent in states where the hypothesized relationship is true for each dose level. Simulation results for Scenario 1 are presented in Table 4.1. In Scenario 1, the assumed relationship is correct for all dose-levels and we expect that model averaging will lead to increased efficiency compared to the model that estimates all components. All four models are mostly unbiased across the posterior estimates, with the exception of the proportion compliant and the non-compliant component mean for the group with ES=1. Additionally, the IND approach shows greater bias in the estimation of the compliant component mean for the group with ES=2 and its corresponding 95th percentile of the TNE distribution. Both RJMCMC models strongly favor the REL model with each level spending more than 84% of the chain in states that assume the hypothesized relationship. The REL model is more efficient than IND, as is evident by the SD ratio values below 1, and, because the two RJMCMC models strongly favor REL, they are also much more efficient than IND. For example, we observe a 50% to 80% decrease in the standard deviation of the mean of the compliant component and as much as an 85% decrease in the MSE of the 95th percentile. The results in Tables 4.2-4.6 (Scenarios 2-6) demonstrate that model averaging down- weights the proportion of the chain estimated by the REL approach if a dose-level is not appropriately defined by the hypothesized relationship. However, large effect sizes are needed to downweight the influence of the REL approach. For example, when there was a one ES difference between the two and hypothesized mean for the compliant component, the RJ model picked models that assume the hypothesized relationship approximately 75% 67 of the time, while the RJ95 picked models that assume the hypothesized relationship ap- proximately 45% of the time (Tables 4.3-4.5). This demonstrates that calibration of model priors in the RJMCMC algorithm is important and can have a direct impact on the pro- portion of time spent in a given state. The direction of misspecification (i.e., effect sizes greater or less than the hypothesized relationship) influences the amount of downweighting of the REL approach. This can be be observed in Table 4.6 (Scenario 6), which has effect sizes that are larger than the hypothesized relationship, where the RJ and RJ95 models were less likely to pick models that assumed the hypothesized relationship than in Scenario 5, even though the absolute difference of the ES from the hypothesized mean relationship was the same for both scenarios. Interestingly, the true value for one group does not seem to have a large impact on whether borrowing from the REL model occurs. For example, the proportion of time a group spends in REL when misspecified is fairly constant across other scenarios, even though different groups are misspecified in each scenario. pˆc µˆ1 µˆ2 σˆ e µ1 95th Percentile Proportion Model ES Bias SD* MSE Bias SD* MSE Bias SD* MSE Bias SD* MSE Bias SD* MSE in REL IND 4 0.00 0.02 0.00 0.00 0.20 0.04 -0.01 0.13 0.02 0.02 0.08 0.01 0.21 0.61 0.42 - 3 0.01 0.04 0.00 0.06 0.30 0.09 -0.03 0.16 0.03 0.06 0.13 0.02 3.47 6.53 54.75 - 2 0.06 0.09 0.01 0.16 0.41 0.19 0.00 0.18 0.03 0.09 0.12 0.02 14.51 13.01 379.89 - 1 0.17 0.13 0.05 -0.05 0.40 0.16 0.30 0.29 0.17 -0.04 0.09 0.01 4.93 12.43 178.90 - 0 - - - 0.00 0.10 0.01 - - - 0.00 0.07 0.01 - - - - REL 4 0.00 0.93 0.00 0.01 0.42 0.01 -0.01 0.99 0.02 0.02 0.98 0.01 0.14 0.58 0.15 - 3 0.00 0.80 0.00 0.01 0.29 0.01 -0.03 0.98 0.03 0.04 0.83 0.01 0.69 0.19 2.05 - 2 -0.01 0.62 0.00 0.01 0.21 0.01 -0.04 0.93 0.03 0.04 0.89 0.01 1.80 0.25 13.51 - 1 0.11 0.64 0.02 0.01 0.21 0.01 0.22 0.59 0.07 -0.06 1.04 0.01 -1.82 0.53 46.49 - 0 - - - 0.01 0.85 0.01 - - - 0.00 1.00 0.01 - - - - RJ 4 0.00 0.94 0.00 0.01 0.44 0.01 -0.01 0.99 0.02 0.02 0.98 0.01 0.14 0.59 0.15 0.98 (0.06) 3 0.00 0.78 0.00 0.01 0.32 0.01 -0.03 0.97 0.03 0.04 0.83 0.01 0.77 0.22 2.73 0.98 (0.05) 2 -0.01 0.59 0.00 0.02 0.24 0.01 -0.04 0.92 0.03 0.04 0.90 0.01 2.34 0.29 20.11 0.97 (0.05) 1 0.12 0.65 0.02 0.01 0.25 0.01 0.22 0.61 0.08 -0.06 1.03 0.01 -1.58 0.53 45.45 0.97 (0.05) 0 - - - 0.01 0.85 0.01 - - - 0.00 1.00 0.01 - - - - RJ95 4 0.00 0.94 0 0.01 0.51 0.01 -0.01 1 0.02 0.02 0.98 0.01 0.14 0.62 0.16 0.91 (0.12) 3 0.00 0.77 0.00 0.02 0.43 0.02 -0.03 0.97 0.03 0.04 0.85 0.01 1.10 0.37 7.05 0.90 (0.12) 2 0.00 0.60 0.00 0.05 0.41 0.03 -0.03 0.91 0.03 0.04 0.92 0.01 4.18 0.46 53.12 0.84 (0.13) 1 0.13 0.69 0.02 0.00 0.46 0.03 0.24 0.68 0.10 -0.06 1.03 0.01 -0.85 0.59 54.77 0.85 (0.12) 0 - - - 0.01 0.86 0.01 - - - 0.00 1.00 0.01 - - - - Table 4.1: Scenario 1 simulation results where all dose-levels follow the assumed relationship. Bias and MSE reported for proportion compliant (pˆc), compliant mean (µˆ1), non-compliant mean(µˆ2), shared standard deviation (σˆ), and the estimated 95th percentile for exp(µ1) for IND model, REL model, and model averaging between the two approaches with equal model priors (RJ) and priors favoring IND (RJ95). Estimated standard deviation (SD*) provided for the IND approach with the ratio of the SDSDIND for REL, RJ, and RJ95. 68 pˆc µˆ1 µˆ2 σˆ e µ1 95th Percentile Proportion Model ES Bias SD* MSE Bias SD* MSE Bias SD* MSE Bias SD* MSE Bias SD* MSE in REL IND 4 0.00 0.02 0.00 0.00 0.20 0.04 -0.01 0.13 0.02 0.02 0.09 0.01 0.21 0.61 0.41 - 3 0.01 0.04 0.00 0.06 0.29 0.09 -0.03 0.16 0.03 0.06 0.13 0.02 3.45 6.42 53.08 - 2 0.06 0.09 0.01 0.16 0.42 0.20 0.00 0.19 0.04 0.09 0.12 0.02 14.46 12.92 376.00 - 0.5 0.20 0.14 0.06 -0.29 0.35 0.21 0.47 0.32 0.33 -0.1 0.08 0.02 -14.83 12.50 376.26 - 0 - - - 0.00 0.10 0.01 - - - 0.00 0.07 0.01 - - - - REL 4 0.00 0.93 0.00 0.02 0.42 0.01 -0.01 0.99 0.02 0.02 0.98 0.01 0.16 0.60 0.16 - 3 0.00 0.81 0.00 0.02 0.29 0.01 -0.03 0.98 0.03 0.04 0.83 0.01 0.75 0.20 2.18 - 2 -0.01 0.61 0.00 0.02 0.21 0.01 -0.04 0.91 0.03 0.04 0.89 0.01 1.98 0.25 14.54 - 0.5 0.03 0.61 0.01 -0.48 0.25 0.24 0.26 0.47 0.09 -0.13 1.03 0.02 -30.65 0.42 966.94 - 0 - - - 0.02 0.85 0.01 - - - 0.00 1 0.01 - - - - RJ 4 0.00 0.93 0.00 0.02 0.45 0.01 -0.01 0.99 0.02 0.02 0.98 0.01 0.16 0.60 0.16 0.98 (0.06) 3 0.00 0.80 0.00 0.02 0.32 0.01 -0.03 0.98 0.03 0.04 0.83 0.01 0.81 0.23 2.80 0.98 (0.05) 2 -0.01 0.59 0.00 0.03 0.24 0.01 -0.04 0.90 0.03 0.04 0.90 0.01 2.45 0.29 20.17 0.97 (0.05) 0.5 0.05 0.61 0.01 -0.46 0.30 0.22 0.29 0.52 0.11 -0.13 1.03 0.02 -29.47 0.48 904.06 0.95 (0.09) 0 - - - 0.02 0.85 0.01 - - - 0.00 1 0.01 - - - - RJ95 4 0.00 0.94 0.00 0.01 0.51 0.01 -0.01 0.99 0.02 0.02 0.98 0.01 0.15 0.64 0.17 0.91 (0.12) 3 0.00 0.79 0.00 0.03 0.43 0.02 -0.03 0.97 0.03 0.04 0.85 0.01 1.13 0.36 6.64 0.90 (0.12) 2 0.00 0.59 0.00 0.05 0.39 0.03 -0.03 0.88 0.03 0.04 0.92 0.01 4.35 0.46 54.54 0.84 (0.13) 0.5 0.09 0.72 0.02 -0.41 0.47 0.20 0.35 0.71 0.17 -0.13 1.03 0.02 -26.27 0.65 757.01 0.78 (0.17) 0 - - - 0.02 0.87 0.01 - - - 0.00 1.00 0.01 - - - - Table 4.2: Scenario 2 simulation results where dose-level 4 does not follow the assumed relationship. Bias and MSE reported for proportion compliant (pˆc), compliant mean (µˆ1), non-compliant mean(µˆ2), shared standard deviation (σˆ), and the estimated 95th percentile for exp(µ1) for IND model, REL model, and model averaging between the two approaches with equal model priors (RJ) and priors favoring IND (RJ95). Estimated standard deviation (SD*) provided for the IND approach with the ratio of the SDSDIND for REL, RJ, and RJ95. 69 pˆc µˆ1 µˆ2 σˆ e µ1 95th Percentile Proportion Model ES Bias SD* MSE Bias SD* MSE Bias SD* MSE Bias SD* MSE Bias SD* MSE in REL IND 4 0.00 0.02 0.00 0.00 0.20 0.04 -0.01 0.13 0.02 0.02 0.09 0.01 0.21 0.63 0.44 - 3 0.01 0.04 0.00 0.06 0.29 0.09 -0.03 0.16 0.03 0.06 0.13 0.02 3.41 6.27 50.91 - 1 0.16 0.13 0.04 -0.06 0.42 0.18 0.30 0.29 0.18 -0.04 0.09 0.01 4.39 12.46 174.58 - 0.5 0.19 0.14 0.06 -0.3 0.35 0.21 0.47 0.32 0.32 -0.1 0.08 0.02 -14.93 12.45 377.74 - 0 - - - 0.00 0.10 0.01 - - - 0.00 0.07 0.01 - - - - REL 4 0.00 0.93 0.00 0.04 0.43 0.01 -0.01 0.99 0.02 0.02 0.98 0.01 0.19 0.59 0.17 - 3 0.00 0.81 0.00 0.04 0.30 0.01 -0.03 0.97 0.03 0.04 0.83 0.01 0.84 0.21 2.40 - 1 -0.18 0.30 0.04 -0.96 0.21 0.93 -0.12 0.39 0.03 -0.03 0.91 0.01 -23.87 0.18 574.96 - 0.5 0.04 0.62 0.01 -0.46 0.25 0.22 0.26 0.47 0.09 -0.13 1.03 0.02 -30.15 0.43 937.94 - 0 - - - 0.04 0.87 0.01 - - - 0.00 1 0.01 - - - - RJ 4 0.00 0.93 0.00 0.03 0.46 0.01 -0.01 0.99 0.02 0.02 0.98 0.01 0.18 0.59 0.17 0.98 (0.07) 3 0.00 0.80 0.00 0.03 0.34 0.01 -0.03 0.97 0.03 0.04 0.83 0.01 0.89 0.24 3.04 0.98 (0.05) 1 -0.08 0.93 0.02 -0.71 0.75 0.60 0.02 0.79 0.05 -0.04 0.93 0.01 -16.66 0.74 363.57 0.77 (0.24) 0.5 0.05 0.63 0.01 -0.45 0.31 0.21 0.29 0.55 0.12 -0.13 1.04 0.02 -29.14 0.49 886.92 0.95 (0.09) 0 - - - 0.03 0.89 0.01 - - - 0.00 1 0.01 - - - - RJ95 4 0.00 0.93 0.00 0.02 0.52 0.01 -0.01 0.99 0.02 0.02 0.98 0.01 0.16 0.61 0.17 0.91 (0.12) 3 0.00 0.79 0.00 0.03 0.44 0.02 -0.03 0.97 0.03 0.04 0.85 0.01 1.15 0.38 7.14 0.90 (0.12) 1 0.04 1.14 0.03 -0.39 0.97 0.32 0.17 1.02 0.12 -0.05 0.95 0.01 -6.78 1.02 206.56 0.44 (0.26) 0.5 0.09 0.72 0.02 -0.41 0.49 0.20 0.34 0.72 0.17 -0.13 1.03 0.02 -26.27 0.65 755.77 0.79 (0.17) 0 - - - 0.02 0.89 0.01 - - - 0.00 1.00 0.01 - - - - Table 4.3: Scenario 3 simulation results where dose-levels 3 and 4 do not follow the assumed relationship. Bias and MSE reported for proportion compliant (pˆc), compliant mean (µˆ1), non-compliant mean(µˆ2), shared standard deviation (σˆ), and the estimated 95th percentile for exp(µ1) for IND model, REL model, and model averaging between the two approaches with equal model priors (RJ) and priors favoring IND (RJ95). Estimated standard deviation (SD*) provided for the IND approach with the ratio of the SDSDIND for REL, RJ, and RJ95. 70 pˆc µˆ1 µˆ2 σˆ e µ1 95th Percentile Proportion Model ES Bias SD* MSE Bias SD* MSE Bias SD* MSE Bias SD* MSE Bias SD* MSE in REL IND 4 0.00 0.02 0.00 0.00 0.20 0.04 -0.01 0.13 0.02 0.02 0.09 0.01 0.21 0.62 0.43 - 1.5 0.12 0.12 0.03 0.11 0.41 0.18 0.13 0.23 0.07 0.04 0.10 0.01 14.17 12.89 367.02 - 1 0.16 0.14 0.04 -0.06 0.42 0.18 0.30 0.3 0.18 -0.04 0.09 0.01 4.37 12.49 175.13 - 0.5 0.20 0.14 0.06 -0.3 0.35 0.21 0.47 0.32 0.33 -0.1 0.08 0.02 -14.82 12.44 374.27 - 0 - - - 0.00 0.10 0.01 - - - 0.00 0.07 0.01 - - - - REL 4 0.00 0.92 0.00 0.06 0.46 0.01 -0.01 0.98 0.02 0.02 0.98 0.01 0.24 0.62 0.21 - 1.5 -0.23 0.19 0.06 -1.44 0.23 2.09 -0.32 0.47 0.12 0.11 0.84 0.02 -16.48 0.08 272.66 - 1 -0.18 0.30 0.03 -0.94 0.22 0.90 -0.11 0.39 0.03 -0.03 0.91 0.01 -23.63 0.19 563.73 - 0.5 0.04 0.65 0.01 -0.44 0.27 0.21 0.27 0.49 0.10 -0.13 1.03 0.02 -29.46 0.45 899.40 - 0 - - - 0.06 0.93 0.01 - - - 0.01 1 0.01 - - - - RJ 4 0.00 0.93 0.00 0.04 0.49 0.01 -0.01 0.99 0.02 0.02 0.98 0.01 0.20 0.62 0.19 0.98 (0.06) 1.5 -0.05 1.28 0.03 -0.67 1.47 0.82 -0.07 1.11 0.07 0.06 0.95 0.01 -1.74 1.02 174.71 0.55 (0.31) 1 -0.07 0.96 0.02 -0.68 0.77 0.57 0.04 0.85 0.06 -0.04 0.92 0.01 -16.02 0.80 357.47 0.76 (0.25) 0.5 0.06 0.65 0.01 -0.44 0.31 0.20 0.30 0.57 0.12 -0.13 1.03 0.02 -28.75 0.51 866.84 0.95 (0.09) 0 - - - 0.04 0.94 0.01 - - - 0.00 1 0.01 - - - - RJ95 4 0.00 0.94 0.00 0.02 0.54 0.01 -0.01 0.99 0.02 0.02 0.98 0.01 0.17 0.63 0.18 0.91 (0.12) 1.5 0.05 1.18 0.02 -0.19 1.30 0.32 0.06 1.07 0.06 0.04 0.97 0.01 8.00 1.04 242.01 0.24 (0.22) 1 0.04 1.12 0.03 -0.38 0.96 0.31 0.18 0.98 0.12 -0.05 0.95 0.01 -6.66 1.00 200.63 0.44 (0.26) 0.5 0.09 0.74 0.02 -0.40 0.51 0.20 0.35 0.72 0.18 -0.13 1.03 0.02 -25.99 0.67 744.54 0.79 (0.17) 0 - - - 0.03 0.94 0.01 - - - 0.00 1.00 0.01 - - - - Table 4.4: Scenario 4 simulation results where dose-levels 2, 3, and 4 do not follow the assumed relationship. Bias and MSE reported for proportion compliant (pˆc), compliant mean (µˆ1), non-compliant mean(µˆ2), shared standard deviation (σˆ), and the estimated 95th percentile for exp(µ1) for IND model, REL model, and model averaging between the two approaches with equal model priors (RJ) and priors favoring IND (RJ95). Estimated standard deviation (SD*) provided for the IND approach with the ratio of the SDSDIND for REL, RJ, and RJ95. 71 pˆc µˆ1 µˆ2 σˆ e µ1 95th Percentile Proportion Model ES Bias SD* MSE Bias SD* MSE Bias SD* MSE Bias SD* MSE Bias SD* MSE in REL IND 2 0.06 0.09 0.01 0.15 0.39 0.18 -0.01 0.18 0.03 0.10 0.12 0.02 14.41 12.64 367.49 - 1.5 0.12 0.12 0.03 0.12 0.41 0.18 0.14 0.23 0.07 0.04 0.10 0.01 14.26 12.82 367.68 - 1 0.16 0.14 0.04 -0.07 0.43 0.19 0.30 0.3 0.18 -0.04 0.09 0.01 4.22 12.54 175.04 - 0.5 0.19 0.14 0.06 -0.29 0.34 0.20 0.47 0.32 0.33 -0.1 0.08 0.02 -14.86 12.51 377.29 - 0 - - - 0.00 0.10 0.01 - - - 0.00 0.07 0.01 - - - - REL 2 -0.25 0.17 0.06 -1.91 0.26 3.67 -0.49 0.55 0.25 0.26 0.66 0.07 -10.79 0.04 116.70 - 1.5 -0.23 0.20 0.05 -1.41 0.26 2.01 -0.32 0.47 0.11 0.11 0.85 0.02 -16.28 0.09 266.15 - 1 -0.18 0.32 0.03 -0.91 0.24 0.84 -0.11 0.39 0.02 -0.03 0.92 0.01 -23.21 0.20 544.83 - 0.5 0.06 0.68 0.01 -0.41 0.30 0.18 0.29 0.51 0.11 -0.13 1.03 0.02 -28.20 0.49 833.09 - 0 - - - 0.09 1.03 0.02 - - - 0.01 1.01 0.01 - - - - RJ 2 -0.03 1.4 0.02 -0.52 1.9 0.84 -0.14 1.34 0.08 0.13 1.05 0.03 5.43 0.94 170.84 0.35 (0.31) 1.5 -0.06 1.23 0.02 -0.68 1.45 0.81 -0.08 1.07 0.07 0.06 0.97 0.01 -2.07 0.98 162.24 0.56 (0.31) 1 -0.07 0.97 0.02 -0.67 0.76 0.56 0.04 0.83 0.06 -0.04 0.93 0.01 -15.99 0.79 352.75 0.76 (0.25) 0.5 0.06 0.66 0.01 -0.42 0.33 0.19 0.30 0.53 0.12 -0.13 1.03 0.02 -28.19 0.51 835.49 0.95 (0.08) 0 - - - 0.06 1.05 0.01 - - - 0.01 1.01 0.01 - - - - RJ95 2 0.03 1.13 0.01 -0.06 1.34 0.28 -0.05 1.12 0.04 0.10 1.01 0.03 11.62 0.97 286.01 0.11 (0.15) 1.5 0.05 1.18 0.02 -0.20 1.32 0.33 0.06 1.07 0.06 0.04 0.99 0.01 7.73 1.04 238.01 0.24 (0.22) 1 0.04 1.09 0.02 -0.40 0.95 0.33 0.17 0.95 0.11 -0.05 0.96 0.01 -7.09 0.99 203.47 0.45 (0.26) 0.5 0.09 0.75 0.02 -0.40 0.52 0.19 0.35 0.72 0.18 -0.13 1.03 0.02 -25.79 0.67 734.85 0.79 (0.17) 0 - - - 0.04 1.04 0.01 - - - 0.00 1.00 0.01 - - - - Table 4.5: Scenario 5 simulation results where no dose-level follows the assumed relationship. Bias and MSE reported for proportion compliant (pˆc), compliant mean (µˆ1), non-compliant mean(µˆ2), shared standard deviation (σˆ), and the estimated 95th percentile for exp(µ1) for IND model, REL model, and model averaging between the two approaches with equal model priors (RJ) and priors favoring IND (RJ95). Estimated standard deviation (SD*) provided for the IND approach with the ratio of the SDSDIND for REL, RJ, and RJ95. 72 pˆc µˆ1 µˆ2 σˆ e µ1 95th Percentile Proportion Model ES Bias SD* MSE Bias SD* MSE Bias SD* MSE Bias SD* MSE Bias SD* MSE in REL IND 6 0.00 0.00 0.00 0.00 0.19 0.03 -0.01 0.12 0.01 0.01 0.07 0.01 0.02 0.06 0.00 - 4.5 0.00 0.01 0.00 0.01 0.19 0.04 0.00 0.13 0.02 0.01 0.08 0.01 0.11 0.32 0.12 - 3 0.01 0.04 0.00 0.06 0.30 0.09 -0.03 0.16 0.03 0.06 0.13 0.02 3.25 6.18 48.73 - 1.5 0.12 0.12 0.03 0.11 0.43 0.19 0.13 0.22 0.07 0.04 0.10 0.01 14.45 13.03 378.69 - 0 - - - 0.00 0.10 0.01 - - - 0.00 0.07 0.01 - - - - REL 6 0.02 2.72 0.00 1.40 0.91 2.00 0.02 0.98 0.01 0.30 1.84 0.11 1.68 13.41 3.54 - 4.5 0.03 1.48 0.00 0.90 0.87 0.84 0.05 0.99 0.02 0.16 1.62 0.04 3.02 5.68 12.44 - 3 0.03 0.82 0.00 0.40 0.57 0.19 0.02 0.98 0.03 0.09 1.00 0.02 4.76 0.61 37.10 - 1.5 -0.01 0.65 0.01 -0.10 0.39 0.04 0.01 0.70 0.02 -0.01 0.97 0.01 -1.07 0.43 32.15 - 0 - - - -0.60 1.67 0.39 - - - 0.18 1.73 0.05 - - - - RJ 6 0.00 1.01 0.00 0.00 1.00 0.03 -0.01 1.00 0.01 0.01 1.00 0.01 0.02 1.00 0.00 0.00 (0.00) 4.5 0.00 1.01 0.00 0.01 1.03 0.04 0.00 1.00 0.02 0.01 1.01 0.01 0.13 1.21 0.17 0.00 (0.03) 3 0.04 1.20 0.00 0.40 1.57 0.37 -0.03 1.03 0.03 0.14 1.36 0.05 9.24 1.48 169.07 0.48 (0.38) 1.5 0.18 0.73 0.04 0.46 0.33 0.23 0.14 0.77 0.05 0.07 0.98 0.01 21.00 0.74 533.41 0.96 (0.07) 0 - - - -0.02 1.08 0.01 - - - 0.00 1.00 0.01 - - - - RJ95 6 0.00 1.01 0.00 0.00 1.00 0.03 -0.01 1.00 0.01 0.01 1.00 0.01 0.02 1.00 0.00 0.00 (0.00) 4.5 0.00 1.00 0.00 0.01 1.01 0.04 0.00 1.00 0.02 0.01 1.00 0.01 0.12 1.02 0.12 0.00 (0.01) 3 0.02 1.13 0.00 0.22 1.36 0.21 -0.03 1.00 0.03 0.09 1.18 0.03 5.76 1.21 88.66 0.26 (0.29) 1.5 0.17 0.79 0.04 0.39 0.54 0.21 0.14 0.79 0.05 0.06 0.98 0.01 19.67 0.76 484.76 0.83 (0.15) 0 - - - -0.01 1.04 0.01 - - - 0.00 1.00 0.01 - - - - Table 4.6: Scenario 6 simulation results the dose-level relationship underestimates the compliant means. Bias and MSE reported for proportion compliant (pˆc), compliant mean (µˆ1), non-compliant mean(µˆ2), shared standard deviation (σˆ), and the estimated 95th percentile for exp(µ1) for IND model, REL model, and model averaging between the two approaches with equal model priors (RJ) and priors favoring IND (RJ95). Estimated standard deviation (SD*) provided for the IND approach with the ratio of the SDSDIND for REL, RJ, and RJ95. 73 74 4.4 Application to a regulatory tobacco clinical trial This section provides a case study of our model averaging approach applied to the data from CENIC-p1. As mentioned in Section 4.1, previous research has identified TNE cut- offs that can be used to identify non-compliance among subjects randomized to the 0.4 mg/g groups. However, the investigators are also interested in understanding the effect of nicotine reduction among smokers that complied to the intervention in the intermediate nicotine content groups where biomarker thresholds for identifying non-compliance are not currently available. Our analysis will focus on total nicotine equivalents (TNE; a biomarker of nicotine exposure that measures most nicotine metabolites) as our biomarker for identifying non- compliance. For the purposes of this analysis, we will analyze TNE on the log scale and assume that all mixture components for the log-transformed biomarker values are normally distributed. CENIC-p1 included two control conditions: a usual brand (UB) condition, who received their preferred brand of commercially available cigarettes, and a 15.8 mg/g study cigarette (roughly equivalent to the nicotine content of commercial cigarettes). In addition, CENIC-p1 included five experimental conditions: a 5.2 mg/g group, a 2.4 mg/g group, a 1.3 mg/g group, a 0.4 mg/g group, and a 0.4 mg/g condition with elevated tar yield (0.4 mg/g (HT) group) to understand the impact of tar yield on the effect of nicotine reduction. For the purposes of this analysis, we combine the two controls conditions and the two 0.4 mg/g groups because these groups have the same nicotine content, respectively, which is the primary factor influencing the distribution of the biomarkers. The proposed relationship from Section 4.2.1 is used with wj = (0.4, 1.3, 2.4, 5.2, 15.8), corresponding to a 97.5%, 91.8%, 85.0%, and 66.7% reduction (effect size of -3.7, -2.5, -1.9, and -1.1) in the average biomarker value for the VLNC group, 1.3 mg/g group, 2.4 mg/g group, and 5.2 mg/g group, respectively, relative to the combined usual brand/15.8 mg/g control group. Table 4.7 provides summary statistics at week 6 by treatment group for all participants and restricted to those who self-reported compliance. Figure 4.1 provides histograms of log(TNE) by treatment group for self-reported compliers at trial completion (week 6 for CENIC-p1). A bimodal distribution can be observed for the 1.3 mg/g group and the combined VLNC group whereas a bimodal pattern is not easily identified for the 2.4 and 5.2 mg/g groups. This suggests that, while it may not be difficult to identify the compliant 75 and non-compliant subjects in the 1,3 mg/g and VLNC groups, there may not be any truly compliant subjects for the 2.4 and 5.2 mg/g groups, or, more likely, the distributions of log(TNE) for compliant and non-compliant subjects have sufficient overlap to make distinct bimodal patterns difficult to identify. Covariate UB 15.8 5.2 2.4 1.3 0.4 HT 0.4 All Subjects: (N=118) (N=119) (N=122) (N=119) (N=119) (N=123) (N=119) CPD Available 113 (95.8%) 111 (93.3%) 114 (93.4%) 107 (89.9%) 110 (92.4%) 116 (94.3%) 109 (91.6%) TNE Data Available 109 (92.4%) 108 (90.8%) 107 (87.7%) 107 (89.9%) 109 (91.6%) 113 (91.9%) 107 (89.9%) Study CPD 21.9 (13.0) 20.9 (12.4) 19.9 (14.6) 14.7 (10.5) 14.7 (10.7) 13.9 (9.65) 13.5 (10.1) Non-Study CPD 0.28 (1.4) 0.41 (1.2) 0.92 (2.47) 1.83 (3.97) 1.63 (4.16) 1.94 (5.05) 1.35 (3.45) TNE (nmol/mL) 59.5 (53.9) 49.9 (33.0) 39.6 (33.6) 34.8 (33.3) 35.2 (32.3) 36.1 (39.8) 34.7 (32.1) Self-Reported Compliers: (N=102) (N=80) (N=74) (N=57) (N=64) (N=67) (N=69) CPD Available 102 (100.0%) 80 (100.0%) 74 (100.0%) 57 (100.0%) 64 (100.0%) 67 (100.0%) 69 (100.0%) TNE Data Available 100 (98.0%) 78 (97.5%) 69 (93.2%) 56 (98.2%) 64 (100.0%) 65 (97.0%) 68 (98.6%) Study CPD 22.7 (13.1) 21.3 (12.6) 22.2 (15.6) 16.3 (10.9) 17.0 (10.8) 14.9 (9.82) 15.5 (10.7) Non-Study CPD 0.0 (0.0) 0.0 (0.0) 0.0 (0.0) 0.0 (0.0) 0.0 (0.0) 0.0 (0.0) 0.0 (0.0) TNE (nmol/mL) 59.3 (53.4) 48.0 (31.9) 33.2 (29.6) 30.6 (36.2) 30.8 (31.5) 27.3 (36.8) 30.0 (33.7) Table 4.7: Summary statistics at week 6 by CENIC-p1 groups for all subjects and restricted to those self-reporting compliance at week 6. Mean (sd) for continuous measures, N (%) for categorical measures. 76 Usual Brand+15.8 mg/g log(TNE) D en si ty −2 0 2 4 6 0. 0 0. 1 0. 2 0. 3 0. 4 0. 5 0. 6 0. 7 5.2 mg/g log(TNE) D en si ty −2 0 2 4 6 0. 0 0. 1 0. 2 0. 3 0. 4 0. 5 0. 6 0. 7 2.4 mg/g log(TNE) D en si ty −2 0 2 4 6 0. 0 0. 1 0. 2 0. 3 0. 4 0. 5 0. 6 0. 7 1.3 mg/g log(TNE) D en si ty −2 0 2 4 6 0. 0 0. 1 0. 2 0. 3 0. 4 0. 5 0. 6 0. 7 0.4 mg/g+0.4 mg/g (HT) log(TNE) D en si ty −2 0 2 4 6 0. 0 0. 1 0. 2 0. 3 0. 4 0. 5 0. 6 0. 7 Figure 4.1: Histograms for distribution of self-reported compliers of log(TNE) in each CENIC-p1 treatment grouping at week 6. For all j dose-levels, assume “non-informative” prior specifications of αj = βj = 1 for the beta prior on pj , aj = bj = 0.001 for the gamma prior on τj , and sµj = sθj = 0.00001 for the normal priors on µj and θj . The REL and IND models were fit on three chains with different, overdispersed initial values of length 20,000 with a burn-in of 2,000 iterations. 77 Chain length, burn-in, and MCMC convergence of the parameters were established by Gelman-Rubin diagnostic values below 1.05 (Gelman and Rubin, 1992), autocorrelation plots with a lag up to 50 iterations, and trace plots. The RJMCMC models were fit on three chains with a length of 400,000 with a burn-in of 200,000 iterations. Convergence is challenging for reversible jump algorithms since other methods for MCMC diagnostics are not applicable due to the changing model space (Green and Hastie, 2009), but utilizing longer chains and discarding the first half of the observations provide a greater opportunity to explore the total model space of 16 potential models. The results of applying both approaches and the RJMCMC algorithm assuming p(Ij) = 0.5 for each group j are presented in Table 4.8. Presented are the posterior means and 95% highest posterior density (HPD) intervals for the proportion compliant (pˆc), the compliant component mean for log(TNE) (µˆ1), the non-compliant component mean for log(TNE) (µˆ2), the shared standard deviation between the two components (σˆ), and the 90th and 95th percentile of the biomarker distribution for compliant subjects. The last quantities have been suggested as thresholds for identifying non-compliance in trials of reduced nicotine content cigarettes (Denlinger et al., 2016). Results for the IND and REL models are similar, in general, suggesting that the hy- pothesized relationship between nicotine dose and biomarkers of nicotine exposure may be reasonable. As a result, our RJMCMC algorithm chooses the relationship over 90% of the time for all dose-levels. Figure 4.2 presents histograms of log(TNE) by group and the fitted distributions from the RJMCMC algorithm. We see that our approach results in a reason- able fit to the data, with the possible exception of the 2.4 mg/g group, where it is difficult to identify the two components. The gains in efficiency from using the REL approach as compared to the IND approach can be seen in more precise estimates of the 90th and 95th percentiles across all groups, with reductions in the width of the HPD intervals ranging from 37-55% and 24-51% for the 90th and 95th percentile, respectively. In our simulation study, we observed that substantial overlap in the mixture components can lead to bias in the estimated marginal probability of compliance, which may explain why the estimated probability of compliance is higher for the 5.2 mg/g group than the other groups. Similarly, the 2.4 mg/g group lacks a clear bimodal distribution which makes estimation for the IND approach challenging and may introduce bias. 78 Boatman et al. (2017) showed that the probability of a subject being compliant, Cij , can be determined as a function of the observed biomarker values and written as a function of the mixture components: P (Cij |yij) = (1− pj)N (yij |µj , τj) (1− pj)N (yij |µj , τj) + pjN (yij |µj + θj , τj) . Figure 4.3 presents the estimated probability of compliance as a function of the biomarkers from the IND and RJMCMC models. With the exception of the 2.4 mg/g group, the curves for the RJMCMC approach are similar to the IND approach, suggesting that any bias due to borrowing is minimal and outweighed by the increased precision, as noted in Table 4.8. There is a substantial difference between the two curves for the 2.4 mg/g group, but it should be noted that both curves are below the curve for the 0.4 mg/g group for low biomarker values, which is inconsistent with our understanding of the underlying mechanism. Further work is needed to understand the discrepancy and to determine the validity of these results. Percentile for µ1 Proportion Group Approach pˆc µˆ1 µˆ2 σˆ 90 th 95th in REL VLNC IND 0.36 (0.27,0.45) 0.20 (-0.12,0.51) 3.44 (3.21,3.67) 0.92 (0.78,1.05) 4.03 (2.62,5.54) 5.63 (3.67,8.03) REL 0.34 (0.25,0.43) -0.09 (-0.25,0.06) 3.38 (3.15,3.60) 0.93 (0.80,1.08) 3.03 (2.40,3.71) 4.26 (3.23,5.39) RJ 0.34 (0.26,0.43) -0.07 (-0.28,0.14) 3.39 (3.16,3.62) 0.93 (0.80,1.07) 3.10 (2.35,3.95) 4.35 (3.17,5.72) 0.935 1.3 mg/g IND 0.27 (0.11,0.44) 0.77 (-0.01,1.48) 3.37 (2.95,3.76) 0.94 (0.70,1.25) 8.21 (2.53,15.45) 11.86 (3.57,23.10) REL 0.31 (0.16,0.45) 1.09 (0.93,1.24) 3.45 (3.07,3.80) 0.93 (0.71,1.20) 9.95 (6.83,14.12) 14.09 (8.74,21.46) RJ 0.31 (0.16,0.46) 1.08 (0.92,1.25) 3.44 (3.05,3.80) 0.93 (0.71,1.21) 9.97 (6.61,14.42) 14.15 (8.48,22.00) 0.990 2.4 mg/g IND 0.15 (0.01,0.35) 0.40 (-1.03,2.26) 3.10 (2.74,3.46) 1.00 (0.77,1.29) 8.51 (0.31,29.45) 12.72 (0.36,45.17) REL 0.27 (0.04,0.48) 1.70 (1.54,1.85) 3.21 (2.73,3.67) 1.10 (0.83,1.38) 23.03 (14.46,32.89) 34.81 (19.39,53.59) RJ 0.26 (0.04,0.47) 1.58 (0.22,1.94) 3.20 (2.74,3.66) 1.09 (0.82,1.38) 21.65 (1.61,32.60) 32.68 (2.11,53.04) 0.901 5.2 mg/g IND 0.43 (0.13,0.75) 2.35 (1.91,3.02) 3.69 (3.21,4.03) 0.63 (0.43,0.90) 25.41 (12.84,56.96) 32.56 (14.82,76.61) REL 0.44 (0.22,0.65) 2.47 (2.32,2.62) 3.71 (3.29,4.05) 0.64 (0.45,0.88) 27.49 (18.82,38.67) 35.02 (21.70,52.25) RJ 0.45 (0.23,0.66) 2.47 (2.31,2.63) 3.72 (3.30,4.06) 0.64 (0.45,0.88) 27.31 (18.39,38.94) 34.74 (21.55,52.93) 0.989 UB/15.8 mg/g IND - 3.57 (3.40,3.75) - 1.21 (1.09,1.34) - - REL - 3.58 (3.43,3.73) - 1.21 (1.08,1.34) - - RJ - 3.58 (3.42,3.74) - 1.21 (1.09,1.34) - - - Table 4.8: Mean (95% HPD interval) of proportion in compliant group (pˆc), compliant component mean log(TNE) (µˆ1), non-compliant component mean log(TNE) (µˆ2), shared standard deviation (σˆ), 90/95th percentile of µˆ1, and proportion of the chain spent in the REL approach for the RJMCMC. 79 80 RJMCMC, VLNC log(TNE) D en si ty −2 0 2 4 6 0. 0 0. 1 0. 2 0. 3 0. 4 Complier Component Non−complier Component Mixture Density RJMCMC, 1.3 mg/g log(TNE) D en si ty −2 0 2 4 6 0. 0 0. 1 0. 2 0. 3 0. 4 0. 5 Complier Component Non−complier Component Mixture Density RJMCMC, 2.4 mg/g log(TNE) D en si ty −2 0 2 4 6 0. 0 0. 2 0. 4 0. 6 0. 8 Complier Component Non−complier Component Mixture Density RJMCMC, 5.2 mg/g log(TNE) D en si ty −2 0 2 4 6 0. 0 0. 2 0. 4 0. 6 Complier Component Non−complier Component Mixture Density Figure 4.2: Histograms for distribution of self-reported compliers of log(TNE) in each CENIC-p1 treatment arm at week 6 with mixture densities for the RJ approach. 81 0 5 10 15 20 25 30 35 0. 0 0. 2 0. 4 0. 6 0. 8 1. 0 Estimated Compliance as a Function of TNE TNE (nmol/mL) Pr (C =1 ) 0.4 IND 0.4 RJ 1.3 IND 1.3 RJ 2.4 IND 2.4 RJ 5.2 IND 5.2 RJ IND 0.4 1.3 2.4 5.2 RJ 0.4 1.3 2.4 5.2 Figure 4.3: Predicted compliance (Pr(C=1)) as a function of observed TNE by study group for IND approach and model averaging with RJMCMC. 4.5 Discussion The ability to identify non-compliance has important implications for interpreting the re- sults of CENIC-p1. Previous work to identify non-compliance in CENIC-p1 leveraged the availability of auxiliary data to determine cut-offs for identifying non-compliance and to 82 estimate the probability of compliance conditional on the observed biomarker values for the VLNC group, only (Boatman et al., 2017). However, auxiliary data is not available for the intermediate dose-levels. The method proposed by Boatman et al. (2017) could be used to estimate the probability of non-compliance for the intermediate dose-levels, but the estimates will be inefficient in the absence of auxiliary data from known compliers. We propose a fully Bayesian approach to estimating the mixture components across all groups simultaneously by averaging over the model space, which included different assumptions regarding the association between the nicotine content of the cigarettes and biomarkers of nicotine exposure. Simulation results were encouraging, with model averaging achieving more precise estimates of the mixture components by choosing the state assuming the hy- pothesized relationship when the relationship was true and downweighting the hypothesized relationship when the relationship was misspecified. One of the objectives of CENIC-p1 was to collect data that would inform the FDA as they consider new product standards for cigarettes, including a potential reduction in the nicotine content of cigarettes. The intention-to-treat analysis provides an estimate of the effect of an intervention, in practice, which may include non-compliance to the intervention, but it does not reflect the potential future reality where policy choices may lead to previous options disappearing from the market due to regulation (i.e., the FDA using its power to lower nicotine content of cigarettes). This motivates the desire to estimate the causal effect of nicotine reduction, which requires that investigators are able to accurately identify non- compliance to randomized treatment assignment. Previously, investigators were only able to identify non-compliance in the VLNC group, but the method proposed in this chapter provides a framework for identifying non-compliance at other dose-levels, as well. This is particularly important because a number of randomized trials of reduced nicotine content cigarettes are currently under way and some use dose-levels other than the 0.4 mg/g dose. Mixture models can be challenging to estimate in practice, with problems such as label switching or unstable estimates. In the Bayesian context these can be addressed through assumptions regarding the relationships between components (i.e., forcing one component to have a larger mean than the other) and by monitoring the behavior of the resulting MCMC chains to ensure convergence to a steady state. While our Gaussian mixture model is for a finite, two-component mixture model, our approach could be extended to 83 any finite number of components. For example, our analysis only considered subjects that self-report compliance, in which case there are two groups: subjects that honestly self-report compliance and subjects that self-report compliance but were actually non- compliant. Alternately, we could analyze all subjects, which would result in a third mixture component for subjects that honestly self-report non-compliance. The proposed Bayesian framework represents a flexible approach to identifying non- compliance in regulatory tobacco trials, but could be applied to other settings as well. For instance, the approach may be beneficial in therapeutic clinical trials where multiple doses are considered, but pharmacokinetic data are only available for one dose. In addition, future work will extend our proposed modeling framework to a regression setting where biomarkers can be modeled as a function of other covariates of interest that may impact the observed biomarker values, such as cigarettes per day. Additionally, the results of this chapter could be used to estimate graphical compliance networks, which can be combined with causal inference techniques to estimate causal effect for the intermediate dose levels. Chapter 5 Conclusion 5.1 Summary of developments In this thesis we developed multi-source exchangeability models, a general Bayesian frame- work for integrating multiple, potentially non-exchangeable, supplemental data sources into the analysis of a primary source. MEMs yield source-specific smoothing parameters that can be estimated by the data to facilitate a dynamic multi-resolution smoothed estimator that is asymptotically consistent while reducing the dimensionality of the prior space. The general MEM framework, as developed in Chapter 2, was first applied to the context of Gaussian-distributed data with a known mean and unknown variance. The asymptotic consistency of MEM posterior weights, and consequently the marginal posterior distributions, were presented along with extensive simulation studies to exhibit the small sample properties. An application to estimate the reduction in cigarettes smoked per day in a regulatory tobacco clinical trial demonstrated the potential increase in efficiency that results from incorporating supplemental sources into the analysis of a clinical trial. Chapter 3 extended the MEM framework to binary outcome data in the context of a proposed multi-source adaptive platform design. Motivated by the recent Ebola epidemic, the MEM framework is incorporated into the standard platform trial design to enable the incorporation of information from previous segments into the analysis of the current segment to improve efficiency. Maintaining a fixed randomization ratio may induce an imbalance of information between study groups when supplemental information is used 84 85 during data analysis. To address this potential for information imbalance, an adaptive randomization scheme was presented which adapts the allocation ratio for future partic- ipants and targets balanced information across treatment groups within a segment. We demonstrated via simulation that our proposed multi-source adaptive platform design us- ing MEMs resulted in increased power and reasonable type-I error rates as compared to a standard design with no borrowing. We returned to the Gaussian setting in Chapter 4 and considered the situation where there are multiple proposed models for estimating the components of mixture distribu- tions in the context of identifying non-compliance in a regulatory tobacco clinical trial, and then applied the RJMCMC algorithm to induce model averaging over multiple candi- date models. Simulation studies indicated that the RJMCMC algorithm properly weights the proposed models and results in more efficient estimates of the mixture component pa- rameters when borrowing is appropriate. The application to CENIC-p1 was able to more efficiently estimate thresholds for identifying non-compliance at intermediate dose-levels, which were not previously available. 5.2 Significance of the work While a number of methods for adaptively incorporating supplemental information into data analysis have been proposed in the literature, existing approaches typically rely on the data to inform a single parameter, which determines the extent of influence or shrink- age from all sources, risking considerable bias or minimal borrowing in the presence of heterogeneous supplemental data sources. The MEM framework is a general Bayesian hierarchical approach which effectuates source-specific smoothing parameters that can be estimated from the data. By estimating source-specific smoothing parameters we are able to accommodate multiple, potentially non-exchangeable sources without pre-specifying the smoothing parameters or relying on a single parameter to inform borrowing. The multi-source adaptive platform design proposed in Chapter 3 addresses some of the concerns over the “traditional” drug development process which was seen by some as too time consuming and unable to quickly respond to a potentially dynamic, fast- changing disease outbreak. Simulations demonstrated increased power with reasonable inflation of the type-I error rate as compared to a standard platform design, which only 86 considered contemporaneous controls. Our proposed multi-source adaptive platform trial was strongly motivated by the recent Ebola outbreak in West Africa, but future infectious disease epidemics are inevitable and our proposed design represents an advancement in the tools available to respond to future epidemics. The model averaging approach described in Chapter 4 represents an intuitive approach to borrowing information across multiple sources of information in the presence of a hypoth- esized relationship across sources. If the relationship is true, incorporating our biological understanding of the relationship between the groups can increase efficiency, while model averaging allows the flexibility to downweight the information from other groups when the relationship does not hold. Additionally, posterior model weights can provide evidence for the appropriateness of the hypothesized relationship separately for each group. Finally, the use of effective supplemental sample size and posterior model weights pro- vides a convenient way to convey the amount of supplemental information incorporated into a primary data analysis. Further, the fact that the posterior weights are constrained to be positive and sum to one provides an intuitive measure which can promote a greater understanding of how supplemental information is utilized in posterior calculations that may not be as apparent in other approaches. This can be viewed as a beneficial feature of our proposed framework when working with collaborators who may be wary of incor- porating supplemental information or who want to be informed of how influential each supplemental source is on the analysis. 5.3 Future work and considerations Throughout this dissertation, we only considered models that evaluate exchangeability based on a single endpoint, but source inclusion probabilities could also be determined by considering multiple endpoints, simultaneously. In practice, clinical trials in a common disease area will often have multiple endpoints in common. For example, the CENIC-p1 study in Chapters 2 and 4 is part of a larger network of Tobacco Centers of Regulatory Science (TCORS), a joint initiative of the FDA’s Center for Tobacco Products (CTP) and the NIH. These centers are planning to conduct multiple trials relating to nicotine reduction and are actively working to identify a common set of outcome measures for evaluating the impact of tobacco product regulation. 87 Expanding MEMs to multiple endpoints is methodologically interesting in that different types of endpoints may be collected across multiple trials. For example, endpoints could be related to binary, count, or continuous data and all could be beneficial for evaluating exchangeability. In accounting for multiple endpoints one could evaluate each endpoint individually and then utilize the separate source inclusion probabilities, but jointly model- ing endpoints to evaluate exchangeability could provide stronger evidence that sources are exchangeable rather than evaluating endpoints one-by-one. Accounting for these correlated endpoints could be achieved through many different approaches. One approach would be to induce correlation on the outcomes via copula models and use the resulting density to calculate the posterior model weights (Nelsen, 2006; Embrechts et al., 2001; Durrleman et al., 2000). If the multiple endpoints are all Gaussian distributed, a Gaussian copula would provide an intuitive structure. Alternatively, if there are non-Gaussian endpoints or a mixture of endpoints with different distributions, more flexible copula classes, such as Archimedean copulas, could account for these different distributions while enabling the ability to model the dependence of the endpoints through one parameter. An alternate approach would be to use generalized linear mixed models for each outcome where de- pendence across the outcomes is accounted for by specifying a joint distribution of the random effects in place of fitting separate univariate models for each outcome (Koma´rek and Koma´rkova´, 2013). References Adebamowo, C., Bah-Sow, O., Binka, F., Bruzzone, R., Caplan, A., Delfraissy, J.-F., Heymann, D., Horby, P., Kaleebu, P., Tamfum, J.-J. M., et al. (2014). Randomised controlled trials for ebola: practical and ethical issues. Lancet 384, 1423. Benowitz, N. L., Nardone, N., Hatsukami, D. K., and Donny, E. C. (2015). Biochemical estimation of noncompliance with smoking of very low nicotine content cigarettes. Cancer Epidemiology and Prevention Biomarkers 24, 331–335. Berry, D. A. and Eick, S. G. (1995). Adaptive assignment versus balanced randomization in clinical trials: A decision analysis. Statistics in Medicine 14, 231–246. Boatman, J. A., Vock, D. M., Koopmeiners, J. S., and Donny, E. C. (2017). Estimating causal effects from a randomized clinical trial when noncompliance is measured with error. Biostatistics . Browne, W. J. and Draper, D. (2006). A comparison of bayesian and likelihood-based methods for fitting multilevel models. Bayesian Analysis 1, 473–514. Chen, M.-H., Ibrahim, J. G., Lam, P., Yu, A., and Zhang, Y. (2011). Bayesian design of noninferiority trials for medical devices using historical data. Biometrics 67, 1163–1170. Chib, S. (1996). Calculating posterior distributions and modal estimates in markov mixture models. Journal of Econometrics 75, 79–97. Daniels, M. J. (1999). A prior for the variance in hierarchical models. The Canadian Journal of Statistics 27, 567–578. 88 89 De Santis, F. (2006). Power priors and their use in clinical trials. The American Statistician 60, 122–129. Denlinger, R. L., Smith, T. T., Murphy, S. E., Koopmeiners, J. S., Benowitz, N. L., Hatsukami, D. K., Pacek, L. R., Colino, C., Cwalina, S. N., and Donny, E. C. (2016). Nicotine and anatabine exposure from very low nicotine content cigarettes. Tobacco Regulatory Science 2, 186–203. Dodd, L. E., Proschan, M. A., Neuhaus, J., Koopmeiners, J. S., Neaton, J., Beigel, J. D., Barrett, K., Lane, H. C., and Davey Jr., R. T. (2016). Design of a randomized controlled trial for ebola virus disease medical countermeasures: PREVAIL II, the ebola MCM study. The Journal of Infectious Diseases 213, 1906–1913. Doi, S., Barendregt, J., and Mozurkewich, E. (2011). Meta-analysis of heterogeneous clinical trials: An empirical example. Contemporary Clinical Trials 32, 288–298. Donny, E. C., Denlinger, R. L., Tidey, J. W., Koopmeiners, J. K., et al. (2015). Randomized trial of reduced-nicotine standard for cigarettes. New England Journal of Medicine 373, 1340–1349. Durrleman, V., Nikeghbali, A., and Roncalli, T. (2000). Which copula is the right one? Eicher, T. S., Papageorgiou, C., and Raftery, A. E. (2011). Default priors and predic- tive performance in bayesian model averaging, with application to growth determinants. Journal of Applied Econometrics 26, 30–55. Embrechts, P., Lindskog, F., and McNeil, A. (2001). Modelling dependence with copulas. Rapport technique, De´partement de mathe´matiques, Institut Fe´de´ral de Technologie de Zurich, Zurich . Ferna´ndez, C., Ley, E., and Steel, M. F. (2001). Benchmark priors for bayesian model averaging. Journal of Econometrics 100, 381–427. French, J. L., Thomas, N., and Wang, C. (2012). Using historical data with bayesian methods in early clinical trial monitoring. Statistics in Biopharmaceutical Research 4, 384–394. 90 Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis 1, 515–534. Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical science pages 457–472. Goodman, S. N. and Sladky, J. T. (2005). A bayesian approach to randomized controlled trials in children utilizing information from adults: the case of guillain-barre. Clinical Trials 2, 305–310. Green, P. J. (1995). Reversible jump markov chain monte carlo computation and bayesian model determination. Biometrika 82, 711–732. Green, P. J. and Hastie, D. I. (2009). Reversible jump mcmc. Genetics 155, 1391–1403. Hatsukami, D. K., Hertsgaard, L. A., Vogel, R. I., Jensen, J. A., Murphy, S. E., Hecht, S. S., Carmella, S. G., al’Absi, M., Joseph, A. M., and Allen, S. S. (2013). Reduced nicotine content cigarettes and nicotine patch. Cancer Epidemiology, Biomarkers and Prevention 22, 1015–1024. Hatsukami, D. K., Kotlyar, M., Hertsgaard, L. A., Zhang, Y., Carmella, S. G., Jensen, J. A., Allen, S. S., Shields, P. G., Murphy, S. E., Stepanov, I., and Hecht, S. S. (2010). Reduced nicotine content cigarettes: Effects on toxicant exposure, dependence, and cessation. Addiction 105, 343–355. Hobbs, B. P., Carlin, B. P., Mandrekar, S. J., and Sargent, D. J. (2011). Hierarchical commensurate and power prior models for adaptive incorporation of historic information in clinical trials. Biometrics 67, 1047–1056. Hobbs, B. P., Carlin, B. P., and Sargent, D. J. (2013). Adaptive adjustment of the ran- domization ratio using historical control data. Clinical Trials 10, 430–440. Hobbs, B. P., Sargent, D. J., and Carlin, B. P. (2012). Commensurate priors for incorpo- rating historical information in clinical trials using general and generalized linear models. Bayesian Analysis 7, 639–674. 91 Hoeting, J. A., Madigan, D., Raftery, A. E., and Volinsky, C. T. (1999). Bayesian model averaging: A tutorial. Statistical Science 14, 382–417. Ibrahim, J. G. and Chen, M.-H. (2000). Power prior distributions for regression models. Statistical Science 15, 46–60. Ippolito, G., Lanini, S., Brouqui, P., Di Caro, A., Vairo, F., Fusco, F. M., Krishna, S., Capobianchi, M. R., Kyobe-Bosa, H., Puro, V., et al. (2016). Non-randomised ebola trialslessons for optimal outbreak research. The Lancet Infectious Diseases 16, 407–408. Jasra, A., Holmes, C. C., and Stephens, D. A. (2005). Markov chain monte carlo methods and the label switching problem in bayesian mixture modeling. Statistical Science pages 50–67. Kaizer, A. M., Koopmeiners, J. S., and Hobbs, B. P. (2017). Bayesian hierarchical modeling based on multi-source exchangeability. Biostatistics . Kass, R. E. and Natarajan, R. (2006). A default conjugate prior for variance components in generalized linear mixed models. Bayesian Analysis 1, 535–542. Koma´rek, A. and Koma´rkova´, L. (2013). Clustering for multivariate continuous and dis- crete longitudinal data. The Annals of Applied Statistics 7, 177–200. Madigan, D., York, J., and Allard, D. (1995). Bayesian graphical models for discrete data. International Statistical Review 63, 215–232. Marin, J.-M., Mengersen, K., and Robert, C. P. (2005). Bayesian modelling and inference on mixtures of distributions. Handbook of statistics 25, 459–507. Morita, S., Thall, P. F., and Mu¨ller, P. (2008). Determining the effective sample size of a parametric prior. Biometrics 64, 595–602. Murray, T. A., Hobbs, B. P., Lystig, T. C., and Carlin, B. P. (2014). Semiparametric bayesian commensurate survival model for post-market medical device surveillance with non-exchangeable historical data. Biometrics 70, 185–191. 92 Nardone, N., Donny, E. C., Hatsukami, D. K., Koopmeiners, J. S., Murphy, S. E., Strasser, A. A., Tidey, J. W., Vandrey, R., and Benowitz, N. L. (2016). Estimations and predictors of non-compliance in switchers to reduced nicotine content cigarettes. Addiction 111, 2208–2216. Natarajan, R. and Kass, R. E. (2000). Reference bayesian methods for generalized linear mixed models. Journal of the American Statistical Association 95, 227–237. Neelon, B. and O’Malley, A. J. (2010). Bayesian analysis using power priors with applica- tion to pediatric quality of care. Journal of Biometrics and Biostatistics 1,. Nelsen, R. B. (2006). An Introduction to Copulas. Springer, 2 edition. Neuenschwander, B., Capkun-Niggli, G., Branson, M., and Spiegelhalter, D. J. (2010). Summarizing historic information on controls in clinical trials. Clinical Trials 7, 5–18. Ning, J. and Huang, X. (2010). Response-adaptive randomization for clinical trials with adjustment for covariate imbalance. Statistics in Medicine 29, 1761–1768. Pennello, G. and Thompson, L. (2008). Experience with reviewing bayesian medical device trials. Journal of Biopharmaceutical Statistics 18, 81–115. Pocock, S. J. (1976). The combination of randomized and historical controls in clinical trials. Journal of Chronic Diseases 29, 175–188. PREVAIL II Writing Group, for the Multi-National PREVAIL II Study Team (2016). A randomized, controlled trial of zmapp for ebola virus infection. The New England Journal of Medicine 375, 1448. Proschan, M. A., Dodd, L. E., and Price, D. (2016). Statistical considerations for a trial of ebola virus disease therapeutics. Clinical Trials 13, 39–48. R Core Team (2013). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Raftery, A. E. (1995). Bayesian model selection in social research. Sociological Methodology 25, 111–164. 93 Raftery, A. E., Madigan, D., and Hoeting, J. A. (1997). Bayesian model averaging for linear regression models. Journal of the American Statistical Association 92, 179–191. Renfro, L. and Sargent, D. (2016). Statistical controversies in clinical research: basket trials, umbrella trials, and other master protocols: a review and examples. Annals of Oncology 28, 34–43. Rietbergen, C., Klugkist, I., Janssen, K., Moons, K., and Hoijtink, H. (2011). Incorporation of historical data in the analysis of randomized therapeutic trials. Contemporary Clinical Trials 32, 848–855. Rupprecht, L. E., Koopmeiners, J. S., Dermody, S. S., Oliver, J. A., Al’Absi, M., Benowitz, N. L., Denlinger-Apte, R., Drobes, D. J., Hatsukami, D., McClernon, F. J., et al. (2017). Reducing nicotine exposure results in weight gain in smokers randomised to very low nicotine content cigarettes. Tobacco Control pages e43–e48. Schieffelin, J. S., Shaffer, J. G., Goba, A., Gbakie, M., Gire, S. K., Colubri, A., Sealfon, R. S., Kanneh, L., Moigboi, A., Momoh, M., et al. (2014). Clinical illness and outcomes in patients with ebola in sierra leone. New England Journal of Medicine 371, 2092–2100. Smith, T. C., Spiegelhalter, D. J., and Thomas, A. (1995). Bayesian approaches to random- effects meta-analysis: A comparative study. Statistics in Medicine 14, 2685–2699. Spiegelhalter, D. J. (2001). Bayesian methods for cluster randomized trials with continuous responses. Statistics in Medicine 20, 435–452. Spiegelhalter, D. J., Abrams, K. R., and Myles, J. P. (2004). Bayesian Approaches to Clinical Trials and Health-Care Evaluation, volume 13. John Wiley & Sons. Stephens, M. (2000). Dealing with label switching in mixture models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 62, 795–809. Thall, P., Fox, P., and Wathen, J. (2015). Statistical controversies in clinical research: Sci- entific and ethical problems with adaptive randomization in comparative clinical trials. Annals of Oncology 26, 1621–1628. 94 Thall, P. F. and Wathen, J. K. (2007). Practical bayesian adaptive randomisation in clinical trials. European Journal of Cancer 43, 859–866. Tidey, J. W., Pacek, L. R., Koopmeiners, J. S., Vandrey, R., Nardone, N., Drobes, D. J., Benowitz, N. L., Dermody, S. S., Lemieux, A., Denlinger, R. L., et al. (2017). Effects of 6-week use of reduced-nicotine content cigarettes in smokers with and without elevated depressive symptoms. Nicotine & Tobacco Research 19, 59–67. U.S. Food and Drug Administration (2001). Guidance for industry: E 10 choice of control group and related issues in clinical trials. Whitehead, J., Valde´s-Ma´rquez, E., Johnson, P., and Graham, G. (2008). Bayesian sample size for exploratory clinical trials incorporating historical data. Statistics in Medicine 27, 2307–2327. WHO (2016). Ebola virus disease. http://apps.who.int/mediacentre/factsheets/ fs103/en/index.html. Accessed: 2017-01-04. Williams, R. J., Tse, T., DiPiazza, K., and Zarin, D. A. (2015). Terminated trials in the clinicaltrials.gov results database: Evaluation of availability of primary outcome data and reasons for termination. PLoS One 10,. Yin, G., Chen, N., and Lee, J. J. (2012). Phase II trial design with bayesian adaptive randomization and predictive probability. Journal of the Royal Statistical Society: Series C (Applied Statistics) 61, 219–235. Zhou, X., Liu, S., Kim, E. S., Herbst, R. S., and Lee, J. J. (2008). Bayesian adaptive design for targeted therapy development in lung cancer–a step toward personalized medicine. Clinical Trials 5, 181–193. Appendix A Proof of theorem from Chapter 2 A.1 Proofs for Convergence of Model Weights Proof of Theorem 1: WLOG assume n = n1 = . . . = nh. First, note that we can re-write the model-specific posterior inclusion probabilities as follows: ωk = p(Ωk|D) = p(D|Ωk)pi(Ωk)∑K j=I p(D|Ωj)pi(Ωj) (A.1) = 1 +∑ j 6=k p(D|Ωj)pi(Ωj) p(D|Ωk)pi(Ωk) −1 . (A.2) We will show that ωk → 1 when Ωk = Ωk∗ and ωk → 0 when Ωk 6= Ωk∗ as n → ∞. This can be demonstrated by showing that the ratios of the marginal likelihoods in (A.2) converge to 0 or ∞, as desired, and by showing that pi(Ωj)pi(Ωk) converges to a constant for the two priors under consideration for the MEM framework for all j 6= k. A.1.1 Convergence of the marginal likelihoods When Ωk = Ωk∗ we must show that all p(D|Ωj) p(D|Ωk) terms in (A.2) converge to 0 as n → ∞. To demonstrate the convergence of the model weight when Ωk 6= Ωk∗ we must show that at least one term of p(D|Ωj) p(D|Ωk) in (A.2) converges to ∞ and therefore ωk → 0 as n→∞. As long as one term converges to ∞, the convergence of the other terms is trivial since it can 95 96 be shown that p(D|Ωj) p(D|Ωk) must converge within the range of 0 to ∞. It is helpful to first describe the convergence of the integrated marginal likelihoods. Rewriting (3.5) we have p(D|Ωk) = ( √ 2pi)(H+1)− ∑H h=1{sh,k}√( 1 v + ∑H i=1 { si,k vi })(∏H j=1 {[ 1 vj ]1−sj,k}) × H∏ l=1 H∏ l nburn){ zj.sum <- lapply(1:j, function(x) zj[[x]] + zj.sum[[x ]]) #add this iterations zj values to sum } draws[iter,] <- c(pj,mu_compj,mu_noncj,sigmaj) } 125 ret <- list( samp=draws[-(1:nburn),], zj.sum=zj.sum ) return(ret) } D.2 Reversible-jump MCMC algorithm functions D.2.1 Gibbs sampler to flexibly estimate with either approach for RJM- CMC gibbs_normalmixture_rjmcmc <- function(yj,nj,j,pvecj,muj,thetaj,tauj,hpj, model,nic){ ###This function provides the code for updating the Gibbs sampler for a mixture of two normal distributions where all components are updated across j-levels, where each level may be estimated using either Gibbs sampler approach j.all <- which( model==0 ) #identify levels estimated using model not assuming the relationship j.rel <- which( model==1 ) #identify levels estimated using model assuming the relationship dj <- log(nic/nic[length(nic)]) zj <- lapply(1:j, function(x) c(0,rbinom(n=(nj[x]-2), size=1, pvecj [[x]][2:(nj[x]-1)]),1) ) #generate new latent indicators from Bernoulli with smallest observation fixed as compliant and largest fixed as non-compliant zj[[j]] <- rep(0, length(zj[[j]])) #if highest group/level is only compliant, convert all indicators to 0 ncj <- sapply(1:j, function(x) sum(zj[[x]]==0) ) 126 nnj <- sapply(1:j, function(x) sum(zj[[x]]==1) ) ycj <- lapply(1:j, function(x) yj[[x]][which(zj[[x]]==0)] ) ynj <- lapply(1:j, function(x) yj[[x]][which(zj[[x]]==1)] ) ###calculate component means and variances for each group #compliant components ycm <- sapply(1:j, function(x) mean(ycj[[x]]) ) ycv <- sapply(1:j, function(x) if( length(ycj[[x]])==1 ){ 0 }else{ var(ycj[[x]]) }) #place variance of 0 if only one observation to avoid errors #non-compliant components ynm <- sapply(1:j, function(x) if( length(ynj[[x]])==0 ){0}else{ mean(ynj[[x]]) }) ynv <- sapply(1:j, function(x) if( length(ynj[[x]]) == 0 | length( ynj[[x]]) == 1 ){ 0 }else{ var(ynj[[x]]) }) #place variance of 0 if only one observation to avoid errors # sampling p: probability of membership in group 2 (non-compliers) pj <- rbeta(j, nnj + hpj$p.alpha, ncj + hpj$p.beta) # sampling tau shape.val <- (nj/2) + hpj$tau.a rate.val <- 0.5 * ((ncj-1)*ycv + (nnj-1)*ynv + ncj*(ycm-muj)^2 + nnj*(ynm-(muj+thetaj))^2 + 2*hpj$tau.b ) tauj <- rgamma(j,shape=shape.val,rate=rate.val) sigmaj <- 1/sqrt(tauj) ### sampling mu #levels estimating compliant mean with own information only 127 m.mean_all <- -( tauj * (nnj*(thetaj - ynm) - ncj*ycm) ) / ( nj* tauj + hpj$mu.prec ) m.prec_all <- ( nj*tauj + hpj$mu.prec ) muj_all <- rnorm( length(j.all) , mean=m.mean_all[j.all], sd=sqrt (1/m.prec_all[j.all])) #groups calculating compliant mean based on relationship to mu_J g <- tauj[j.rel]*(nj[j.rel]*dj[j.rel] + nnj[j.rel]*thetaj[j.rel] - ncj[j.rel]*ycm[j.rel] - nnj[j.rel]*ynm[j.rel]) H1 <- sum(tauj[j.rel]*nj[j.rel]) + hpj$mu.prec[j] H2 <- sum(g[0:(length(j.rel)-1)]) - tauj[j]*nj[j]*ycm[j] m.mean_rel <- -H2/H1 m.prec_rel <- H1 muJ <- rnorm(1, mean=m.mean_rel, sd=sqrt(1/m.prec_rel)) muj_rel <- muJ+dj[j.rel] #combine muj estimates from both approaches muj[j.all] <- muj_all muj[j.rel] <- muj_rel # sampling theta t.mean <- -( nnj*tauj*(muj - ynm) ) / (nnj*tauj + hpj$theta.prec) t.prec <- (nnj*tauj + hpj$theta.prec) thetaj <- abs( rnorm(j, mean=t.mean, sd=sqrt(1/t.prec))) thetaj[j] <- 0 #replace largest group theta with 0 if notmix is TRUE ret <- list(mu_comp=muj, mu_nonc=muj+thetaj, mu=muj, theta=thetaj, tau=tauj, sigma=sigmaj, z=zj, p=pj) } 128 D.2.2 RJMCMC update step updatemodel <- function(yj,nj,j,pj,tauj,zj,muj,thetaj,model,hpj,prob,nic){ ###Function to complete the "reversible jump" portion of the RJMCMC to move between our two proposed models #pj: current estimate for proportion in non-compliant group for each level j #zj: latent variable classifying y_i subject in compliant or non-compliant group #prob: prior probability for level j using all components model (model 0) ncj <- sapply(1:j, function(x) sum(zj[[x]]==0) ) nnj <- sapply(1:j, function(x) sum(zj[[x]]==1) ) ycj <- sapply(1:j, function(x) yj[[x]][which(zj[[x]]==0)] ) ynj <- sapply(1:j, function(x) yj[[x]][which(zj[[x]]==1)] ) ###calculate component means and variances for each group #compliant components ycm <- sapply(1:j, function(x) mean(ycj[[x]]) ) ycv <- sapply(1:j, function(x) if( length(ycj[[x]])==1 ){ 0 }else{ var(ycj[[x]]) }) #place variance of 0 if only one observation to avoid errors #non-compliant components ynm <- sapply(1:j, function(x) if( length(ynj[[x]])==0 ){0}else{ mean(ynj[[x]]) }) ynv <- sapply(1:j, function(x) if( length(ynj[[x]])==0 | length(ynj [[x]])==1 ){ 0 }else{ var(ynj[[x]]) }) #place variance of 0 if only one observation to avoid errors ###Identify current model state and propose new variables for other model 129 #Determine which level to propose different model for r <- sample(1:(j-1),1) r.state <- model[r] muj_prop <- muj #create proposal muj to update with value below depending on current r.state model_prop <- model; if(r.state==0){model_prop[r] <- 1}else{model_ prop[r] <- 0} m.mean_r <- (-( tauj * (nnj*(thetaj - ynm) - ncj*ycm) ) / ( nj*tauj + hpj$mu.prec ))[r] m.prec_r <- ( nj*tauj + hpj$mu.prec )[r] if(r.state==0){ #model 0 is all components, so propose values under relationship assumption muj_prop[r] <- muj[j] + log(nic[r]/nic[length(nic)]) }else{ #model 1 is relationship, so propose values under all components model muj_prop[r] <- rnorm(1, mean=m.mean_r, sd=sqrt(1/m.prec_r)) } modprob.num <- prod( ( prob^abs(model_prop-1) * (1-prob)^model_prop )[1:(j-1)] ) modprob.den <- prod( ( prob^abs(model-1) * (1-prob)^model)[1:(j-1)] ) ###Calculate acceptance probability #Calculate log-likelihoods loglikj <- sapply(1:j, function(x) sum( log( (1-pj[x])*dnorm(yj[[x ]], mean=muj[x], sd=sqrt(1/tauj[x])) + pj[x]*dnorm(yj[[x]], mean =muj[x]+thetaj[x], sd=sqrt(1/tauj[x])) ) ) ) loglik <- sum(loglikj) 130 newloglikj <- sapply(1:j, function(x) sum( log( (1-pj[x])*dnorm(yj [[x]], mean=muj_prop[x], sd=sqrt(1/tauj[x])) + pj[x]*dnorm(yj[[x ]], mean=muj_prop[x]+thetaj[x], sd=sqrt(1/tauj[x])) ) ) ) newloglik <- sum(newloglikj) #if currently in model 0 (est all), look at num being proposed move for level r to model 1 (relationship assumed) if(r.state == 0){ num <- newloglik + log(dnorm(muj_prop[r], mean=m.mean_r, sd= sqrt(1/m.prec_r))) + log(modprob.num) den <- loglik + log(dnorm(muj[r], mean=0, sd=sqrt(1/hpj$mu. prec[r]))) + log(modprob.den) }else{ #if currently in model 1 (relationship), look at num being proposed move for level r to model 0 (estimate level with level’ s data) num <- newloglik + log(dnorm(muj_prop[r], mean=0, sd=sqrt(1/ hpj$mu.prec[r]))) + log(modprob.num) den <- loglik + log(dnorm(muj[r], mean=m.mean_r, sd=sqrt(1/m .prec_r))) + log(modprob.den) } ###Acceptance probability A <- min(1, exp(num-den)) u <- runif(1) if(u <= A){ if(r.state == 0){model[r] <- 1}else{model[r] <- 0} muj <- muj_prop } ret <- list(model=model, muj=muj) 131 } D.2.3 Function keep track of RJMCMC model state baseKto10 <- function(x,K=2){ ###Function to convert base K vector of values into base 10 number #x: vector with values in base K (e.g., base 2 has 0/1) #K: base to convert from, default is base 2 sum(x * K^((length(x)-1):0)) } D.2.4 Implement entire RJMCMC algorithm rjmcmc_2normals = function(y,inits,hp,prob,M,seed,nburn,grp,nic){ ###Gibbs samplers for mixture of 2 normal distributions assuming equal variance #inits: list of initial values to use for lambda, theta, tau, p [note: tau is precision] since proposal distribution is conditional posterior of mu_j) #prob: prior probability for level j using all components model (model 0) set.seed(seed) yj <- lapply( split(y, grp), sort) #split data and sort in one step j <- length(yj) #number of groups nj <- sapply( yj, length) #calculate number of observations within each group draws=matrix(NA,M,(7*j+1)) colnames(draws) <- c(’model’,paste0(’p’,1:j),paste0(’mu_comp’,1:j), paste0(’mu_nonc’,1:j),paste0(’sigma’,1:j),paste0(’c80_’,1:j), paste0(’c90_’,1:j),paste0(’c95_’,1:j)) 132 #initial values (if only one given, assume it is initial value for each level/group) if( length(inits$mu_comp)==1 ){ mu_compj <- rep(inits$mu_comp, j) } else{ mu_compj <- inits$mu_comp } if( length(inits$mu_nonc)==1 ){ mu_noncj <- rep(inits$mu_nonc, j) } else{ mu_noncj <- inits$mu_nonc } if( length(inits$tau)==1 ){ tauj <- rep(inits$tau, j) }else{ tauj <- inits$tau } if( length(inits$p)==1 ){ pj <- rep(inits$p, j) }else{ pj <- inits$ p } if( length(inits$model)==1 ){model <- c(rep(inits$model, (j-1)),1)} else{ model <- c(inits$model[1:(j-1)],1) } #always assume Jth model is 1 for relationship if( (length(mu_compj)!=j | length(mu_noncj)!=j | length(tauj)!=j | length(pj)!=j | length(model)!=j ) == TRUE ){ stop(’Please either give inits of length 1 or length j. At least one is not equal to 1 or j!’) } #break from function if inits are not of correct length #Convert hyperparameter values to be for each group if separate values not provided for each group (if only one given, assume it is used for each level/group) if( length(hp$tau.a)==1 ){ hp$tau.a <- rep(hp$tau.a,j) } if( length(hp$tau.b)==1 ){ hp$tau.b <- rep(hp$tau.b,j) } if( length(hp$p.alpha)==1 ){ hp$p.alpha <- rep(hp$p.alpha,j) } if( length(hp$p.beta)==1 ){ hp$p.beta <- rep(hp$p.beta,j) } if( length(hp$mu.prec)==1 ){ hp$mu.prec <- rep(hp$mu.prec,j) } if( length(hp$theta.prec)==1 ){ hp$theta.prec <- rep(hp$theta.prec, j) } 133 if( (length(hp$tau.a)!=j | length(hp$tau.b)!=j | length(hp$p.alpha) !=j | length(hp$p.beta)!=j | length(hp$mu.prec)!=j | length(hp$ theta.prec)!=j) == TRUE ){ stop(’Please give hp of length 1 or length j. At least one is not equal to 1 or j!’)} #intialize latent indicator variable zj <- lapply(1:length(yj), function(x) c(0, rep(NA, nj[x]-2), 1) ) #initialize latent indicator for each group zj.sum <- lapply(1:j, function(x) rep(0, nj[x]) ) #list keeping track of number of iterations after burn-in where y_ij is classified as k=1 for (iter in 1:M){ #Update MCMC according to model state currently in pvecj <- lapply(1:length(yj), function(x) pj[x]*dnorm(yj[[x ]], mean=mu_noncj[x], sd=sqrt(1/tauj[x])) / ( (1-pj[x])* dnorm(yj[[x]], mean=mu_compj[x], sd=sqrt(1/tauj[x])) + pj[x]*dnorm(yj[[x]], mean=mu_noncj[x], sd=sqrt(1/tauj[x ])) ) ) pvecj[[j]] <- rep(0, nj[j] ) #replace group J with all compliant indicators #update step with Gibbs sampler for given model assumed for each level gibbs.step <- gibbs_normalmixture_rjmcmc(yj=yj,nj=nj,pvecj= pvecj,muj=mu_compj,thetaj=mu_noncj-mu_compj,tauj=tauj, hpj=hp,j=j,model=model,nic=nic) mu_compj <- gibbs.step$mu_comp mu_noncj <- gibbs.step$mu_nonc 134 tauj <- gibbs.step$tau sigmaj <- gibbs.step$sigma zj <- gibbs.step$z pj <- gibbs.step$p if(iter > nburn){ zj.sum <- lapply(1:j, function(x) zj[[x]] + zj.sum[[x ]]) #add this iterations zj values to sum } #Complete RJ step to determine if we switch models and update rjstep <- updatemodel(yj=yj,pj=pj,j=j,nj=nj,tauj=tauj,zj=zj, muj=mu_compj,thetaj=mu_noncj-mu_compj,model=model,hpj=hp ,prob=prob,nic=nic) model <- rjstep$model mu_comp <- rjstep$muj mod10 <- baseKto10(x=model[1:(j-1)],K=2) draws[iter,] <- c(mod10,pj,mu_compj,mu_noncj,sigmaj) } ret <- list( samp=draws[-(1:nburn),], zj.sum=zj.sum ) return(ret) } Appendix E Acronyms Care has been taken in this thesis to minimize the use of acronyms, but this cannot always be achieved. This appendix contains a table of some of the more frequently occurring acronyms and their meaning. Table E.1: Acronyms Acronym Meaning AR Adaptive randomization BMA Bayesian model averaging CENIC-p1 Center for the Evaluation of Nicotine in Cigarettes, project 1 CP Commensurate prior CPD Cigarettes (smoked) per day EB Empirical Bayes(ian) ESSS Effective supplemental sample size EVD Ebola virus disease MEM Multi-source exchangeability model(s) PREVAIL II Partnership for Research on Ebola Vaccines in Liberia SHM Standard hierarchical model SOC/oSOC Standard of care/optimal standard of care VLNC Very low nicotine content 135