Nonparametric Estimation and Model Combination in a Bandit Problem with Covariates

Loading...
Thumbnail Image

Persistent link to this item

Statistics
View Statistics

Journal Title

Journal ISSN

Volume Title

Title

Nonparametric Estimation and Model Combination in a Bandit Problem with Covariates

Published Date

2014-07

Publisher

Type

Thesis or Dissertation

Abstract

Multi-armed bandit problem is an important optimization game that requires an exploration-exploitation tradeoff to achieve optimal total reward. Motivated from industrial applications such as online advertising and clinical research, we consider a setting where the rewards of bandit machines are associated with covariates. Under a flexible problem setup, we focus on a sequential randomized allocation strategy, under which, the plug-in regression methods for the estimation of mean reward functions play an important role in the algorithm performance. In the first part of the dissertation, we study the kernel estimation based randomized allocation strategy, and establish asymptotic strong consistency and finite-time regret analysis. In addition, although many nonparametric and parametric estimation methods in supervised learning may be applied to the randomized allocation strategy in a convenient plug-in fashion, guidance on how to choose among these estimation methods is generally unavailable. In the second part of the dissertation, we study a model combining allocation strategy for adaptive performance, and establish its asymptotic strong consistency. Simulations and a real data evaluation are conducted to illustrate the performance of the proposed combining strategy. In the existing literature of nonparametric bandit problem with covariates, it is generally assumed that the smoothness parameters of a Holder condition for the reward functions are known. Also, the finite-time regret analysis in the first part of the dissertation remains minimax sub-optimal. In the third part of the dissertation, we address these two issues by proposing a multi-stage randomized allocation strategy with arm elimination. In particular, when the smoothness parameter is unknown, we equip the algorithm with a smoothness parameter selector based on Lepski's method, and show that the regret minimax rate is achieved up to a logarithmic factor.

Keywords

Description

University of Minnesota Ph.D. dissertation. July 2014. Major: Statistics. Advisor: Yuhong Yang. 1 computer file (PDF); vii, 73 pages.

Related to

Replaces

License

Collections

Series/Report Number

Funding information

Isbn identifier

Doi identifier

Previously Published Citation

Suggested citation

Qian, Wei. (2014). Nonparametric Estimation and Model Combination in a Bandit Problem with Covariates. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/175307.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.