Nonparametric Box-Cox model in high-dimensional regression

Abstract

Mainstream theory for high-dimensional regression typically assumes that the true underlying model is a low-dimensional linear regression model, an assumption that can be overly restrictive in practical applications. Drawing inspiration from the traditional Box-Cox techniques used to mitigate anomalies such as non-additivity and heteroscedasticity in regression analysis, we introduce a more flexible framework, a nonparametric Box-Cox model with an unspecified monotone transformation function, to address model mis-specification in high-dimensional regression. Model fitting and computation are much more challenging than for the usual penalized regression methods. We introduce a two-step methodology for estimating the nonparametric Box-Cox model in high-dimensional settings. First, we propose a novel technique called composite probit regression (CPR) and use the folded concave penalized CPR to estimate the regression parameters. The strong oracle property of the estimator is established without knowledge of the nonparametric transformation function. Next, the nonparametric function is estimated by univariate monotone regression. The computation is done efficiently using a coordinate-majorization-descent algorithm. Extensive simulation studies show that the proposed method performs well in various settings. Our analysis of the supermarket data demonstrates the superior performance of the proposed method over the standard high-dimensional regression method.

In high-dimensional linear hypothesis testing, model mis-specification can lead to misleading conclusions. The nonparametric Box-Cox model offers a flexible framework to partially address this issue while preserving the interpretability of regression coefficients. We develop composite likelihood inference theories for this model. Specifically, we propose the constrained partial penalized composite probit regression under the null hypothesis and investigate its statistical properties.
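
As a rough illustration of why probit regression is a natural tool for this model, the sketch below works under assumptions that the abstract does not state: the transformation g is strictly increasing and the error is standard normal. The notation (g, β, ε, Φ, the thresholds y_k, the intercepts b_k, the indicators z_ik, and the weights w_k) is introduced here for illustration only, and the dissertation's exact formulation may differ.

```latex
% Assumed model (illustrative notation, not taken from the dissertation):
% g is an unknown strictly increasing function, the error is standard normal.
g(Y) = x^{\top}\beta + \varepsilon, \qquad \varepsilon \sim N(0, 1).

% Since g is strictly increasing, {Y \le y_k} and {g(Y) \le g(y_k)} are the same event:
P(Y \le y_k \mid x)
  = P\bigl(\varepsilon \le g(y_k) - x^{\top}\beta\bigr)
  = \Phi\bigl(g(y_k) - x^{\top}\beta\bigr), \qquad k = 1, \dots, K.

% One plausible weighted composite probit log-likelihood over K thresholds,
% with z_{ik} = 1\{Y_i \le y_k\} and intercepts b_k = g(y_k):
\ell_c(\beta, b) = \sum_{k=1}^{K} w_k \sum_{i=1}^{n}
  \Bigl[ z_{ik} \log \Phi\bigl(b_k - x_i^{\top}\beta\bigr)
       + (1 - z_{ik}) \log\bigl\{1 - \Phi\bigl(b_k - x_i^{\top}\beta\bigr)\bigr\} \Bigr].
```

Under these assumptions each dichotomized response follows a probit model with its own intercept and a common slope vector, so β can be estimated without specifying g. A sparsity-inducing penalty on β would be added to the negative of this objective in the high-dimensional setting.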
We derive the partial penalized composite likelihood ratio test, composite likelihood score test, and Wald test, and show that their limiting distributions under the null and under local alternatives are generalized chi-squared and noncentral generalized chi-squared distributions, respectively, with the same degrees of freedom. For efficient implementation, we use augmented Lagrangian and coordinate majorization descent to compute the test statistics. Extensive simulation studies are conducted to examine the finite sample performance of the three proposed Box-Cox tests. We use supermarket data to illustrate the benefits of using the Box-Cox tests over the existing tests.

Furthermore, we address the selection of optimal weights within the general framework of composite likelihood methods, which remains mostly an open question in the literature. We define the optimal weights as those yielding the smallest total asymptotic variance. We propose an optimization-based and completely data-driven technique for deriving the optimal weights for general composite likelihood estimation. For the optimization part, we employ the mirror descent algorithm with the Kullback-Leibler divergence as the Bregman distance, which provides explicit and computationally efficient updates. To illustrate the performance of our proposal, we focus on the composite probit regression method for estimating the nonparametric Box-Cox regression model. Numerical studies show that the optimally weighted composite estimator significantly reduces mean squared error by reducing variance. The improved estimation accuracy increases the power of the composite likelihood ratio test in linear hypothesis testing. Empirical analyses of supermarket data demonstrate prediction improvement due to the optimal weights.
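
The explicit updates mentioned above can be seen in a minimal sketch of mirror descent on the probability simplex with the Kullback-Leibler divergence as the Bregman distance, which reduces to a multiplicative (exponentiated-gradient) update followed by renormalization. The objective below is a hypothetical stand-in, a quadratic form in the weights, and not the total asymptotic variance criterion of the dissertation. The function name, step size, and matrix `S` are assumptions made for illustration.

```python
import numpy as np


def mirror_descent_simplex(grad, w0, step=0.1, n_iter=500):
    """Entropic mirror descent (exponentiated gradient) on the probability simplex."""
    w = np.asarray(w0, dtype=float)
    w = w / w.sum()
    for _ in range(n_iter):
        g = grad(w)
        # KL-divergence Bregman step: multiply by exp(-step * gradient), then renormalize.
        # Subtracting g.min() rescales every entry by the same factor, which the
        # renormalization cancels; it only guards against overflow.
        w = w * np.exp(-step * (g - g.min()))
        w = w / w.sum()
    return w


# Hypothetical stand-in objective f(w) = w' S w (NOT the dissertation's criterion).
S = np.diag([1.0, 2.0, 4.0])


def objective(w):
    return float(w @ S @ w)


def gradient(w):
    return 2.0 * S @ w


w_uniform = np.full(3, 1.0 / 3.0)
w_opt = mirror_descent_simplex(gradient, w_uniform)
```

For this diagonal stand-in the minimizer over the simplex has weights proportional to the reciprocals of the diagonal entries, so the iterates can be checked against a closed form. The weights stay strictly positive and sum to one at every iteration without any projection step, which is the practical appeal of the entropic geometry.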

Description

University of Minnesota Ph.D. dissertation. July 2024. Major: Statistics. Advisor: Hui Zou. 1 computer file (PDF); x, 167 pages.

Suggested Citation

Zhou, He. (2024). Nonparametric Box-Cox model in high-dimensional regression. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/277417.
