American Journal of Applied Mathematics and Statistics
ISSN (Print): 2328-7306
ISSN (Online): 2328-7292
Website: https://www.sciepub.com/journal/ajams
Editor-in-chief: Mohamed Seddeek
Open Access
American Journal of Applied Mathematics and Statistics. 2025, 13(2), 24-29
DOI: 10.12691/ajams-13-2-1
Open Access Article

Variable Selection for Sparse Logistic Regression Model with Errors in Covariates

Zanhua Yin¹ and Zhichao Wang¹

¹ School of Mathematical and Computer Sciences, Gannan Normal University, Ganzhou, People’s Republic of China

Pub. Date: April 08, 2025

Cite this paper:
Zanhua Yin and Zhichao Wang. Variable Selection for Sparse Logistic Regression Model with Errors in Covariates. American Journal of Applied Mathematics and Statistics. 2025; 13(2):24-29. doi: 10.12691/ajams-13-2-1

Abstract

This paper addresses variable selection in sparse logistic regression models with errors in covariates. We propose a corrected score Lasso method that combines the weighted score Lasso approach with a projected gradient descent algorithm to handle the challenges posed by measurement error. The weighted score Lasso uses a score function that is amenable to correction, so the method extends directly to measurement error settings once the score is corrected. The method bridges the gap between rigorous measurement error correction and practical high-dimensional implementation, and it establishes a framework that can be extended to other generalized linear models with exponential family structure. Numerical studies demonstrate the superior performance of the corrected score Lasso when covariates are measured with error, highlighting its potential as a robust tool for high-dimensional data analysis with measurement error.
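To make the projected gradient descent component concrete, the following is a minimal sketch, not the authors' implementation. It assumes an l1-ball constraint of radius `radius` and uses the ordinary, uncorrected logistic score as a stand-in; the corrected score proposed in the paper is not reproduced here and would replace `logistic_score`. All function names, the toy data, the step size, and the iteration count are hypothetical choices made for illustration.

```python
import math


def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {w : ||w||_1 <= radius} (sort-based)."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    if sum(abs(x) for x in v) <= radius:
        return list(v)  # already feasible, nothing to do
    # Sort magnitudes in decreasing order and find the soft-threshold level.
    u = sorted((abs(x) for x in v), reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - radius) / k
        if u_k - candidate > 0:
            theta = candidate
    # Soft-threshold each coordinate; small entries become exactly zero.
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def logistic_score(beta, X, y):
    """Gradient of the average negative log-likelihood (uncorrected score)."""
    n, p = len(X), len(beta)
    grad = [0.0] * p
    for x_i, y_i in zip(X, y):
        resid = sigmoid(sum(b * x for b, x in zip(beta, x_i))) - y_i
        for j in range(p):
            grad[j] += resid * x_i[j] / n
    return grad


def projected_gradient_descent(X, y, radius, score=logistic_score,
                               step=0.5, n_iter=500):
    """Gradient step on the score, then projection back onto the l1 ball."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = score(beta, X, y)
        beta = project_l1_ball(
            [b - step * g for b, g in zip(beta, grad)], radius
        )
    return beta


# Toy data (hypothetical): the first covariate drives the response,
# the second is noise.
X_demo = [[2.0, 0.1], [1.5, -0.2], [-1.0, 0.3],
          [-2.0, -0.1], [1.0, 0.2], [-1.5, -0.3]]
y_demo = [1, 1, 0, 0, 1, 0]
beta_hat = projected_gradient_descent(X_demo, y_demo, radius=1.0)
print(beta_hat)
```

Passing the score as an argument keeps the optimization loop separate from the estimating function, so a measurement-error-corrected score with the same signature could be substituted without changing the projection step.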

Keywords:
corrected score, Lasso, logistic regression model, measurement error, sparse

Creative Commons: This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
