Variable Selection for Sparse Logistic Regression Model with Errors in Covariates

Zanhua Yin; Zhichao Wang

American Journal of Applied Mathematics and Statistics. 2025, 13(2), 24-29
DOI: 10.12691/ajams-13-2-1

Zanhua Yin¹ and Zhichao Wang¹

¹School of Mathematical and Computer Sciences, Gannan Normal University, Ganzhou, People’s Republic of China

Pub. Date: April 08, 2025

Cite this paper:
Zanhua Yin and Zhichao Wang. Variable Selection for Sparse Logistic Regression Model with Errors in Covariates. American Journal of Applied Mathematics and Statistics. 2025; 13(2):24-29. doi: 10.12691/ajams-13-2-1

Abstract

This paper addresses the variable selection problem in sparse logistic regression models with errors in covariates. We propose a corrected score Lasso method, which combines the weighted score Lasso approach with a projected gradient descent algorithm, to handle the challenges posed by measurement error. The weighted score Lasso introduces a score function that is amenable to correction, so it extends directly to the measurement error setting through a subsequent score correction. The method bridges the gap between rigorous measurement error correction and practical high-dimensional implementation, and it establishes a framework that can be extended to other generalized linear models with exponential family structure. Numerical studies show that the corrected score Lasso performs well when covariates are measured with error, highlighting its potential as a robust tool for high-dimensional data analysis with measurement error.
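The projected gradient descent component named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the form of the weighted or corrected score, so the ordinary (uncorrected) logistic score stands in for it here, and the constraint is assumed to be an l1-ball handled by a standard sort-based Euclidean projection. The function names `project_l1_ball`, `logistic_score`, and `projected_gradient_descent`, as well as the `radius`, `step`, and `n_iter` parameters, are hypothetical.

```python
import numpy as np


def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {w : ||w||_1 <= radius} (sort-based)."""
    if np.abs(v).sum() <= radius:
        return v.copy()  # already feasible, nothing to do
    u = np.sort(np.abs(v))[::-1]           # magnitudes in decreasing order
    cssv = np.cumsum(u)                    # running sums of the magnitudes
    ks = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * ks > cssv - radius)[0][-1]  # last index kept nonzero
    theta = (cssv[rho] - radius) / (rho + 1.0)       # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def logistic_score(beta, X, y):
    """Gradient of the average negative log-likelihood of logistic regression.

    Assumption: a corrected score for error-prone covariates would replace
    this function; the plain score is used only as a placeholder.
    """
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return X.T @ (p - y) / len(y)


def projected_gradient_descent(X, y, radius, step=0.5, n_iter=500):
    """Alternate a gradient step with a projection back onto the l1-ball."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta = project_l1_ball(beta - step * logistic_score(beta, X, y), radius)
    return beta
```

Calling `projected_gradient_descent(X, y, radius=2.0)` on an `n × p` design matrix `X` and a 0/1 response vector `y` returns a coefficient vector whose l1 norm does not exceed `radius`. Because the projection soft-thresholds the coefficients, small entries are set exactly to zero, which is what makes this kind of iteration usable for variable selection.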

Keywords:
Corrected score Lasso; logistic regression model; measurement error; sparse

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
