Contextual Bandits With Delayed Feedback Using Randomized Allocation

Published Date

2020-05

Type

Thesis or Dissertation

Abstract

Contextual bandit problems are important for sequential learning in practical settings that require balancing exploration and exploitation to maximize total reward. Motivated by applications in health care, we consider a multi-armed bandit setting with covariates and allow for delays in observing the rewards (treatment outcomes), as would most likely be the case in a medical setting. We focus on developing randomized allocation strategies that incorporate delayed rewards, using nonparametric regression methods to estimate the mean reward functions. Although there has been substantial work on handling delays in standard multi-armed bandit problems, contextual bandits with delayed feedback, especially with nonparametric estimation tools, remain largely unexplored. In the first part of the dissertation, we study a simple randomized allocation strategy that incorporates delayed feedback and establish its strong consistency. Our setup is widely applicable because we allow delays to be random and unbounded under mild assumptions, an important setting that previous work usually does not consider. We study how the hyperparameters controlling the amount of exploration and exploitation in a randomized allocation strategy should be updated, based on the extent of the delays and the underlying complexity of the problem, to improve the overall performance of the strategy. We provide theoretical guarantees for the proposed methodology by establishing asymptotic strong consistency and finite-time regret bounds, and we conduct simulations and real-data evaluations to illustrate the performance of the proposed strategies. In addition, we consider the problem of integrating expert opinion into a randomized allocation strategy for contextual bandits. This is also motivated by applications in health care, where a doctor's opinion is crucial in the treatment decision-making process. Therefore, although contextual bandit algorithms have been shown to work both theoretically and empirically in many practical settings, it is crucial to incorporate the doctor's judgment when building an adaptive bandit strategy. We propose a randomized allocation strategy that incorporates the doctor's interventions and show that it is strongly consistent.
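
To make the setting in the abstract concrete, the following is a minimal sketch and not the dissertation's algorithm. It assumes a one-dimensional covariate in [0, 1], a Nadaraya-Watson kernel estimator as the nonparametric regression method, an epsilon-greedy-style rule as a stand-in for the randomized allocation strategy, and delays that are bounded purely for simplicity (the dissertation allows unbounded delays). All names (`nw_estimate`, `run_delayed_bandit`, `max_delay`, `bandwidth`) and the exploration schedule are hypothetical choices made for illustration.

```python
import math
import random


def nw_estimate(x, xs, ys, bandwidth):
    """Nadaraya-Watson estimate of the mean reward at x (Gaussian kernel)."""
    if not xs:
        return 0.0  # no rewards observed yet for this arm
    weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # guard against numerical underflow of all weights
    return sum(w * y for w, y in zip(weights, ys)) / total


def run_delayed_bandit(mean_fns, horizon, max_delay=5, bandwidth=0.1, seed=0):
    """Simulate randomized allocation when each reward arrives after a random delay."""
    rng = random.Random(seed)
    n_arms = len(mean_fns)
    # Per-arm (covariates, rewards) whose outcomes have already been observed.
    observed = [([], []) for _ in range(n_arms)]
    # Outcomes still in transit: (arrival_time, arm, covariate, reward).
    pending = []
    choices = []
    for t in range(1, horizon + 1):
        # Move rewards whose delay has elapsed into the observed data.
        still_pending = []
        for arrival, arm, x, r in pending:
            if arrival <= t:
                observed[arm][0].append(x)
                observed[arm][1].append(r)
            else:
                still_pending.append((arrival, arm, x, r))
        pending = still_pending

        x_t = rng.random()  # covariate of the current subject
        epsilon = min(1.0, 1.0 / math.sqrt(t))  # assumed exploration schedule
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick an arm uniformly at random
        else:
            # Exploit: pick the arm with the largest estimated mean reward,
            # using only rewards that have arrived so far.
            estimates = [nw_estimate(x_t, xs, ys, bandwidth) for xs, ys in observed]
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = mean_fns[arm](x_t) + rng.gauss(0.0, 0.1)
        delay = rng.randint(0, max_delay)  # bounded here only for simplicity
        # A delay of 0 means the reward is available at the next round.
        pending.append((t + 1 + delay, arm, x_t, reward))
        choices.append((x_t, arm))
    return choices, observed, pending
```

For example, `run_delayed_bandit([lambda x: x, lambda x: 1.0 - x], horizon=200)` simulates two arms whose mean rewards cross at x = 0.5. The design point the sketch is meant to show is that the estimator at time t only sees rewards that have already arrived, so longer delays leave less data for the exploitation step and may call for more exploration.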

Description

University of Minnesota Ph.D. dissertation. May 2020. Major: Statistics. Advisor: Yuhong Yang. 1 computer file (PDF); ix, 154 pages.

Suggested citation

Arya, Sakshi. (2020). Contextual Bandits With Delayed Feedback Using Randomized Allocation. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215062.
