Arya, Sakshi
Deposited: 2020-08-25. Issued: 2020-05.
https://hdl.handle.net/11299/215062
University of Minnesota Ph.D. dissertation. May 2020. Major: Statistics. Advisor: Yuhong Yang. 1 computer file (PDF); ix, 154 pages.

Abstract: Contextual bandit problems are central to sequential learning in practical settings that require balancing the exploration-exploitation trade-off to maximize total reward. Motivated by applications in health care, we consider a multi-armed bandit setting with covariates and allow for delay in observing the rewards (treatment outcomes), as is typically the case in medical settings. We focus on developing randomized allocation strategies that incorporate delayed rewards, using nonparametric regression methods to estimate the mean reward functions. Although there has been substantial work on handling delays in standard multi-armed bandit problems, contextual bandits with delayed feedback, especially with nonparametric estimation tools, remain largely unexplored. In the first part of the dissertation, we study a simple randomized allocation strategy that incorporates delayed feedback and establish its strong consistency. Our setup is widely applicable: under mild assumptions, the delays may be random and unbounded, a setting usually not considered in previous work. We study how the hyperparameters controlling the amount of exploration and exploitation in a randomized allocation strategy should be updated based on the extent of the delays and the underlying complexity of the problem, in order to enhance the overall performance of the strategy. We provide theoretical guarantees for the proposed methodology by establishing asymptotic strong consistency and finite-time regret bounds, and we conduct simulations and real-data evaluations to illustrate the performance of the proposed strategies.
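The general scheme described above — randomized allocation with decaying exploration, nonparametric mean-reward estimates, and rewards that arrive after random delays — can be sketched as follows. This is an illustrative toy, not the dissertation's exact algorithm: the function names, the k-NN choice of nonparametric estimator, and the specific exploration schedule are all assumptions made for the example.

```python
import math
import random

def knn_estimate(history, x, k=5):
    """Nonparametric (k-NN) estimate of the mean reward at covariate x,
    using only (covariate, reward) pairs whose rewards have already arrived.
    A stand-in for the dissertation's nonparametric regression step."""
    if not history:
        return 0.0
    nearest = sorted(history, key=lambda cr: abs(cr[0] - x))[:k]
    return sum(r for _, r in nearest) / len(nearest)

def run_delayed_bandit(reward_fns, delay_fn, horizon, seed=0):
    """Epsilon-greedy-style randomized allocation with random reward delays.
    reward_fns: per-arm mean-reward functions of the covariate (toy stand-ins).
    delay_fn: draws a random delay (in rounds) for each pulled arm's reward.
    Returns the number of times each arm was pulled."""
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    observed = [[] for _ in range(n_arms)]  # arrived (covariate, reward) pairs per arm
    pending = []                            # (arrival_round, arm, covariate, reward)
    pulls = [0] * n_arms
    for t in range(1, horizon + 1):
        # Deliver rewards whose random delay has elapsed; keep the rest pending.
        arrived = [p for p in pending if p[0] <= t]
        pending = [p for p in pending if p[0] > t]
        for _, a, x, r in arrived:
            observed[a].append((x, r))
        x_t = rng.random()                      # covariate for this round
        eps = min(1.0, 2.0 / math.sqrt(t))      # decaying exploration probability
        if rng.random() < eps:
            a = rng.randrange(n_arms)           # explore: uniform random arm
        else:                                   # exploit: best estimated arm
            a = max(range(n_arms), key=lambda i: knn_estimate(observed[i], x_t))
        reward = reward_fns[a](x_t) + rng.gauss(0.0, 0.1)
        pending.append((t + delay_fn(rng), a, x_t, reward))
        pulls[a] += 1
    return pulls
```

With two well-separated arms and geometric-style delays of 1-9 rounds, the exploitation steps concentrate on the better arm once enough delayed rewards have arrived, even though each reward is only usable after its delay elapses.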
In addition, we consider the problem of integrating expert opinion into a randomized allocation strategy for contextual bandits. This, too, is motivated by applications in health care, where a doctor's opinion is crucial to the treatment decision-making process. Although contextual bandit algorithms have been shown to work both theoretically and empirically in many practical settings, incorporating a doctor's judgment is essential for building an adaptive bandit strategy. We propose a randomized allocation strategy that incorporates a doctor's interventions and show that it is strongly consistent.

Title: Contextual Bandits With Delayed Feedback Using Randomized Allocation
Type: Thesis or Dissertation
Language: en
Keywords: contextual; multi-armed bandit problem; nonparametric regression; regret; sequential analysis; strong consistency