Contextual Bandits With Delayed Feedback Using Randomized Allocation
2020-05
Authors
Arya, Sakshi
Published Date
2020-05
Type
Thesis or Dissertation
Abstract
Contextual bandit problems are important for sequential learning in various practical settings that require balancing the exploration-exploitation trade-off to maximize total rewards. Motivated by applications in health care, we consider a multi-armed bandit setting with covariates and allow for delay in observing the rewards (treatment outcomes), as would most likely be the case in a medical setting. We focus on developing randomized allocation strategies that incorporate delayed rewards, using nonparametric regression methods to estimate the mean reward functions. Although there has been substantial work on handling delays in standard multi-armed bandit problems, the field of contextual bandits with delayed feedback, especially with nonparametric estimation tools, remains largely unexplored. In the first part of the dissertation, we study a simple randomized allocation strategy incorporating delayed feedback and establish strong consistency. Our setup is widely applicable, as we allow the delays to be random and unbounded under mild assumptions, an important setting that is usually not considered in previous work. We study how the hyperparameters controlling the amounts of exploration and exploitation in a randomized allocation strategy should be tuned based on the extent of the delays and the underlying complexity of the problem, in order to enhance the overall performance of the strategy. We provide theoretical guarantees for the proposed methodology by establishing asymptotic strong consistency and finite-time regret bounds. We also conduct simulations and real-data evaluations to illustrate the performance of the proposed strategies.
In addition, we consider the problem of integrating expert opinion into a randomized allocation strategy for contextual bandits. This, too, is motivated by applications in health care, where a doctor's opinion is crucial in the treatment decision-making process. Although contextual bandit algorithms have been shown to work both theoretically and empirically in many practical settings, it is crucial to incorporate a doctor's judgment when building an adaptive bandit strategy. We propose a randomized allocation strategy incorporating a doctor's interventions and show that it is strongly consistent.
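The core ingredients the abstract describes can be sketched as follows: explore with a decaying probability, otherwise pull the arm with the highest nonparametrically estimated mean reward, and fold a reward into the estimates only once its random delay has elapsed. This is a minimal illustration under assumed ingredients (two arms, a one-dimensional covariate, a k-NN regression estimator, linear true mean reward functions, and a bounded uniform delay), not the dissertation's actual algorithm or assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_estimate(X, y, x, k=5):
    """Nonparametric k-NN estimate of the mean reward at context x."""
    if len(X) == 0:
        return 0.0  # no feedback observed yet for this arm
    d = np.abs(np.asarray(X) - x)
    idx = np.argsort(d)[:k]
    return float(np.mean(np.asarray(y)[idx]))

def delayed_randomized_allocation(T=2000, n_arms=2, eps0=1.0, max_delay=20):
    # Illustrative true mean reward functions: arm 0 is better for small x,
    # arm 1 for large x (unknown to the learner).
    means = [lambda x: 0.6 - 0.4 * x, lambda x: 0.2 + 0.5 * x]
    hist_X = [[] for _ in range(n_arms)]  # observed contexts per arm
    hist_y = [[] for _ in range(n_arms)]  # observed rewards per arm
    pending = []                          # (reveal_time, arm, context, reward)
    total_reward = 0.0
    for t in range(1, T + 1):
        # Reveal rewards whose random delay has elapsed by round t.
        still_pending = []
        for (rt, a, x, r) in pending:
            if rt <= t:
                hist_X[a].append(x)
                hist_y[a].append(r)
            else:
                still_pending.append((rt, a, x, r))
        pending = still_pending
        x = rng.uniform()            # covariate for this round
        eps = eps0 / np.sqrt(t)      # decaying exploration probability
        if rng.uniform() < eps:
            a = int(rng.integers(n_arms))  # explore uniformly at random
        else:
            est = [knn_estimate(hist_X[a], hist_y[a], x) for a in range(n_arms)]
            a = int(np.argmax(est))        # exploit current estimates
        r = means[a](x) + 0.1 * rng.normal()
        total_reward += r
        delay = int(rng.integers(1, max_delay + 1))  # random delay before feedback
        pending.append((t + delay, a, x, r))
    return total_reward / T

avg = delayed_randomized_allocation()
```

Because the delayed rewards are only incorporated once revealed, early estimates are based on little data; the decaying exploration rate plays the role of the tunable hyperparameter the abstract discusses, trading off how quickly the strategy commits to its (delayed) estimates.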
Description
University of Minnesota Ph.D. dissertation. May 2020. Major: Statistics. Advisor: Yuhong Yang. 1 computer file (PDF); ix, 154 pages.
Suggested citation
Arya, Sakshi. (2020). Contextual Bandits With Delayed Feedback Using Randomized Allocation. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215062.