Nonparametric Estimation and Model Combination in a Bandit Problem with Covariates
2014-07
Loading...
Persistent link to this item
Statistics
View StatisticsJournal Title
Journal ISSN
Volume Title
Title
Nonparametric Estimation and Model Combination in a Bandit Problem with Covariates
Alternative title
Authors
Published Date
2014-07
Publisher
Type
Thesis or Dissertation
Abstract
Multi-armed bandit problem is an important optimization game that requires an exploration-exploitation tradeoff to achieve optimal total reward. Motivated from industrial applications such as online advertising and clinical research, we consider a setting where the rewards of bandit machines are associated with covariates. Under a flexible problem setup, we focus on a sequential randomized allocation strategy, under which, the plug-in regression methods for the estimation of mean reward functions play an important role in the algorithm performance. In the first part of the dissertation, we study the kernel estimation based randomized allocation strategy, and establish asymptotic strong consistency and finite-time regret analysis. In addition, although many nonparametric and parametric estimation methods in supervised learning may be applied to the randomized allocation strategy in a convenient plug-in fashion, guidance on how to choose among these estimation methods is generally unavailable. In the second part of the dissertation, we study a model combining allocation strategy for adaptive performance, and establish its asymptotic strong consistency. Simulations and a real data evaluation are conducted to illustrate the performance of the proposed combining strategy. In the existing literature of nonparametric bandit problem with covariates, it is generally assumed that the smoothness parameters of a Holder condition for the reward functions are known. Also, the finite-time regret analysis in the first part of the dissertation remains minimax sub-optimal. In the third part of the dissertation, we address these two issues by proposing a multi-stage randomized allocation strategy with arm elimination. In particular, when the smoothness parameter is unknown, we equip the algorithm with a smoothness parameter selector based on Lepski's method, and show that the regret minimax rate is achieved up to a logarithmic factor.
Keywords
Description
University of Minnesota Ph.D. dissertation. July 2014. Major: Statistics. Advisor: Yuhong Yang. 1 computer file (PDF); vii, 73 pages.
Related to
Replaces
License
Collections
Series/Report Number
Funding information
Isbn identifier
Doi identifier
Previously Published Citation
Other identifiers
Suggested citation
Qian, Wei. (2014). Nonparametric Estimation and Model Combination in a Bandit Problem with Covariates. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/175307.
Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.