Modifications of Q-learning to Optimize Dynamic Treatment Regimes

With growing interest in personalized medicine and quality healthcare, clinical trial designs that incorporate multiple stages of randomization and intervention, such as the sequential multiple assignment randomized trial (SMART), have become a popular choice for investigators because they facilitate the construction and analysis of dynamic treatment regimes (DTRs). A comprehensive body of literature covers statistical methods for analyzing data collected from such trials and estimating the optimal DTR for an individual subject; among these methods, Q-learning with linear regression is widely used because of its simplicity and ease of interpretation. This thesis discusses three important challenges in the implementation of Q-learning and proposes multiple modifications of Q-learning to address them.

The first challenge arises when the outcome of interest is monitored repeatedly, both at intermediate stages of randomization and at longer follow-up intervals after the final stage of randomization. Clinical investigators are usually interested in identifying the optimal DTR and estimating the outcome trajectory under the optimal DTR. However, in the presence of stagewise repeated-measures outcomes, standard Q-learning fails to provide point estimates of the optimal trajectory with time-specific heterogeneous causal effects. To address this problem, we propose a modified Q-learning algorithm that uses a generalized estimating equation to estimate each Q-function.

The second challenge is model misspecification. Model misspecification is a common problem in Q-learning, but little attention has been given to its impact when treatment effects are heterogeneous across subjects. We describe the integrative impact of two possible types of model misspecification related to treatment effect heterogeneity: unexplained early-stage treatment effects in the late-stage main-effect model, and misspecified linearity between pseudo-outcomes and predictors as a result of the optimization operation. The proposed method, which aims to deal with both types of misspecification concomitantly, builds interactive models into residual-modified parametric Q-learning.

The third challenge is generalizing modified Q-learning to dichotomous outcomes. Because the link function is not the identity, it is difficult to include informative residuals from the estimation of late-stage models in early-stage pseudo-outcomes. We propose a modification based on monotonicity of preferences to address model misspecification in Q-learning with probit regression. The improvement in robustness from this modification depends on the extent of model misspecification and can be limited. We therefore take a latent variable approach and propose a novel algorithm that uses sampled surrogates of the underlying continuous outcome conditional on the binary observations.

The methods proposed in this thesis are assessed via simulations and illustrated using the M-bridge study, a SMART with embedded tailoring that develops and evaluates adaptive interventions for preventing binge drinking among college students.
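For orientation, the standard approach that the abstract takes as its starting point, Q-learning with linear regression, can be sketched for a two-stage SMART as follows. This is a minimal illustration of the textbook backward-induction procedure, not the modified algorithms proposed in the thesis. The data-generating model, the variable names (`x1`, `a1`, `x2`, `a2`, `y`), the choice of covariates in each Q-function, and the helper names (`simulate_smart`, `fit_ols`, `q_learning`) are all assumptions made for the example.

```python
import numpy as np


def simulate_smart(n, seed=0):
    """Generate a toy two-stage SMART (hypothetical data-generating model)."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)                      # baseline covariate
    a1 = rng.choice([-1.0, 1.0], size=n)         # stage-1 randomized treatment
    x2 = 0.5 * x1 + 0.3 * a1 + rng.normal(size=n)  # intermediate covariate
    a2 = rng.choice([-1.0, 1.0], size=n)         # stage-2 randomized treatment
    # Final outcome; the stage-2 treatment effect (0.5 + x2) varies by subject.
    y = 1.0 + x1 + 0.4 * a1 + x2 + a2 * (0.5 + 1.0 * x2) + rng.normal(size=n)
    return x1, a1, x2, a2, y


def fit_ols(design, response):
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


def q_learning(x1, a1, x2, a2, y):
    """Standard two-stage Q-learning with linear working models."""
    ones = np.ones_like(y)

    # Stage 2: regress y on main effects plus treatment-by-covariate terms.
    main2 = np.column_stack([ones, x1, a1, x2])
    blip2 = np.column_stack([ones, x2])
    beta2 = fit_ols(np.column_stack([main2, a2[:, None] * blip2]), y)

    # Pseudo-outcome: predicted outcome under the best stage-2 treatment.
    # With a2 coded -1/+1, max over a2 of a2 * blip equals |blip|.
    pseudo = main2 @ beta2[:4] + np.abs(blip2 @ beta2[4:])

    # Stage 1: regress the pseudo-outcome on stage-1 history and treatment.
    main1 = np.column_stack([ones, x1])
    beta1 = fit_ols(np.column_stack([main1, a1[:, None] * main1]), pseudo)
    return beta1, beta2


beta1, beta2 = q_learning(*simulate_smart(5000))
# Estimated rules: treat with +1 when the fitted treatment contrast is positive.
rule2 = lambda x2: np.sign(beta2[4] + beta2[5] * x2)
rule1 = lambda x1: np.sign(beta1[2] + beta1[3] * x1)
```

In this sketch the stage-2 working model happens to match the simulated data, so its treatment-effect coefficients should land near the simulated values. The stage-1 pseudo-outcome, however, contains an absolute-value term produced by the maximization step, so a linear stage-1 model is only an approximation; this is one way to see the second type of misspecification the abstract describes.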

Keywords

Dynamic treatment regimes

Q-learning

Sequential multiple assignment randomized trials

Description

University of Minnesota Ph.D. dissertation. 2021. Major: Biostatistics. Advisors: Thomas Murray, David Vock. 1 computer file (PDF); xiii, 103 pages.

Collections

Dissertations

Suggested citation

Zhang, Yuan. (2021). Modifications of Q-learning to Optimize Dynamic Treatment Regimes. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/225006.

