Modifications of Q-learning to Optimize Dynamic Treatment Regimes
2021-08
Title
Modifications of Q-learning to Optimize Dynamic Treatment Regimes
Authors
Zhang, Yuan
Published Date
2021-08
Type
Thesis or Dissertation
Abstract
With growing interest in personalized medicine and quality healthcare, clinical trial designs that incorporate multiple stages of randomization and intervention, such as the sequential multiple assignment randomized trial (SMART), have become a popular choice for investigators because they facilitate the construction and analysis of dynamic treatment regimes (DTRs). A comprehensive body of literature covers statistical methods for analyzing data from such trials and estimating the optimal DTR for an individual subject; among these, Q-learning with linear regression is widely used because of its simplicity and ease of interpretation. This thesis discusses three challenges that arise in implementing Q-learning and proposes modifications of Q-learning to address them.
The first challenge arises when the outcome of interest is monitored repeatedly, both at intermediate stages of randomization and at longer follow-up intervals after the final stage of randomization. Clinical investigators are usually interested in identifying the optimal DTR and in estimating the outcome trajectory under that DTR. However, in the presence of stagewise repeated-measures outcomes, standard Q-learning fails to provide point estimates of the optimal trajectory with time-specific heterogeneous causal effects. To address this problem, we propose a modified Q-learning algorithm that uses a generalized estimating equation to estimate each Q-function.
The second challenge is model misspecification. Model misspecification is a common problem in Q-learning, but little attention has been given to its impact when treatment effects are heterogeneous across subjects. We describe the combined impact of two possible types of model misspecification related to treatment effect heterogeneity: unexplained early-stage treatment effects in the late-stage main-effect model, and misspecified linearity between pseudo-outcomes and predictors as a result of the optimization operation. The proposed method, which aims to address both types of misspecification simultaneously, builds interactive models into residual-modified parametric Q-learning.
The third challenge is generalizing modified Q-learning to dichotomous outcomes. Because the link function is not the identity, it is difficult to carry informative residuals from the estimation of late-stage models into early-stage pseudo-outcomes. We propose a modification based on monotonicity of preferences to address model misspecification in Q-learning with probit regression. The resulting gain in robustness depends on the extent of model misspecification and can be limited. We therefore take a latent variable approach and propose a novel algorithm that uses sampled surrogates of the underlying continuous outcome, conditional on the binary observations.
The methods proposed in this thesis are assessed via simulations and illustrated using the M-bridge study, a SMART with embedded tailoring that develops and evaluates adaptive interventions for preventing binge drinking among college students.
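For readers unfamiliar with the baseline that the abstract refers to, the following is a minimal sketch of standard two-stage Q-learning with linear regression. It is not the thesis's modified method. The variable names, the -1/+1 treatment coding, the choice of working models, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def fit_ols(design, y):
    """Ordinary least-squares coefficients for y ~ design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def q_learning_two_stage(x1, a1, x2, a2, y):
    """Standard two-stage Q-learning with linear working models.

    Treatments a1, a2 are coded -1/+1 (an assumption of this sketch).
    Returns (beta1, beta2), the stage-1 and stage-2 Q-function coefficients.
    """
    ones = np.ones(len(y))

    # Stage 2: Q2 = main effects + a2 * (psi0 + psi1 * x2).
    main2 = np.column_stack([ones, x1, a1, x2])
    design2 = np.column_stack([main2, a2, a2 * x2])
    beta2 = fit_ols(design2, y)

    # Pseudo-outcome: predicted outcome under the best stage-2 treatment.
    # With -1/+1 coding, the maximum over a2 of a2 * c is |c|.
    contrast2 = beta2[4] + beta2[5] * x2
    pseudo = main2 @ beta2[:4] + np.abs(contrast2)

    # Stage 1: regress the pseudo-outcome on stage-1 history and treatment.
    design1 = np.column_stack([ones, x1, a1, a1 * x1])
    beta1 = fit_ols(design1, pseudo)
    return beta1, beta2


def optimal_rules(beta1, beta2):
    """Estimated decision rules: treat with +1 when the fitted contrast is positive."""
    def rule1(x1):
        return np.where(beta1[2] + beta1[3] * x1 > 0, 1, -1)

    def rule2(x2):
        return np.where(beta2[4] + beta2[5] * x2 > 0, 1, -1)

    return rule1, rule2


# Hypothetical simulated SMART with randomized -1/+1 treatments at both stages.
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
a1 = rng.choice([-1, 1], size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
a2 = rng.choice([-1, 1], size=n)
y = x1 + 0.5 * x2 + 0.2 * a1 + a2 * (0.5 + 1.0 * x2) + rng.normal(size=n)

beta1, beta2 = q_learning_two_stage(x1, a1, x2, a2, y)
rule1, rule2 = optimal_rules(beta1, beta2)
```

In this simulated example the stage-2 model is correctly specified, but the pseudo-outcome contains an absolute-value term that is not linear in the stage-1 predictors. This is the kind of misspecified linearity caused by the optimization step that the abstract lists as the second challenge.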
Description
University of Minnesota Ph.D. dissertation. 2021. Major: Biostatistics. Advisors: Thomas Murray, David Vock. 1 computer file (PDF); xiii, 103 pages.
Suggested citation
Zhang, Yuan. (2021). Modifications of Q-learning to Optimize Dynamic Treatment Regimes. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/225006.