Zhang, Yuan
2021-10-13
2021-10-13
2021-08
https://hdl.handle.net/11299/225006

University of Minnesota Ph.D. dissertation. 2021. Major: Biostatistics. Advisors: Thomas Murray, David Vock. 1 computer file (PDF); xiii, 103 pages.

With an emerging interest in personalized medicine and quality healthcare, clinical trial designs that incorporate multiple stages of randomization and intervention, for example, the sequential multiple assignment randomized trial (SMART), have become a popular choice for investigators because they facilitate the construction and analysis of dynamic treatment regimes (DTRs). There exists a comprehensive body of literature on statistical methods to analyze data collected from such trials and estimate the optimal DTR for an individual subject, among which Q-learning with linear regression is widely used due to its simplicity and ease of interpretation. This thesis discusses three important challenges that cause problems in the implementation of Q-learning and proposes multiple modifications of Q-learning to address them.

The first challenge arises when the outcome of interest is monitored repeatedly, both at intermediate stages of randomization and at longer follow-up intervals after the final stage of randomization. Clinical investigators are usually interested in identifying the optimal DTR and estimating the outcome trajectory under the optimal DTR. However, in the presence of stagewise repeated-measures outcomes, standard Q-learning fails to provide point estimates of the optimal trajectory with time-specific heterogeneous causal effects. To address this problem, we propose a modified Q-learning algorithm that uses a generalized estimating equation to estimate each Q-function. The second challenge is model misspecification. Model misspecification is a common problem in Q-learning, but little attention has been given to its impact when treatment effects are heterogeneous across subjects.
We describe the integrative impact of two possible types of model misspecification related to treatment effect heterogeneity: unexplained early-stage treatment effects in the late-stage main-effects model, and misspecified linearity between pseudo-outcomes and predictors as a result of the optimization operation. The proposed method, which aims to deal with both types of misspecification concomitantly, builds interactive models into residual-modified parametric Q-learning. The third challenge is generalizing modified Q-learning to dichotomous outcomes. It is difficult to include informative residuals from the estimation of late-stage models in early-stage pseudo-outcomes because of the non-identity link function. We propose a modification based on monotonicity of preferences to address model misspecification in Q-learning with probit regression. The improvement in robustness from the proposed modification depends on the extent of model misspecification and can be limited. Thus, we take a latent variable approach and propose a novel algorithm using sampled surrogates of the underlying continuous outcome conditional on the binary observations. The methods proposed in this thesis are assessed via simulations and illustrated using the M-bridge study, a SMART with embedded tailoring that develops and evaluates adaptive interventions for preventing binge drinking among college students.

en
Dynamic treatment regimes
Q-learning
Sequential multiple assignment randomized trials
Modifications of Q-learning to Optimize Dynamic Treatment Regimes
Thesis or Dissertation