American Journal of Applied Mathematics and Statistics
ISSN (Print): 2328-7306 ISSN (Online): 2328-7292 Website: http://www.sciepub.com/journal/ajams Editor-in-chief: Mohamed Seddeek
Open Access
American Journal of Applied Mathematics and Statistics. 2018, 6(4), 126-134
DOI: 10.12691/ajams-6-4-2

Evaluating Methods of Assessing “Optimism” in Regression Models

Daniel Thoya1, Antony Waititu1, Thomas Magheto1 and Antony Ngunyi2

1Department of Statistics and Actuarial Science, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya

2Department of Statistics and Actuarial Science, Dedan Kimathi University of Science and Technology, Nyeri, Kenya

Pub. Date: July 20, 2018

Cite this paper:
Daniel Thoya, Antony Waititu, Thomas Magheto and Antony Ngunyi. Evaluating Methods of Assessing “Optimism” in Regression Models. American Journal of Applied Mathematics and Statistics. 2018; 6(4):126-134. doi: 10.12691/ajams-6-4-2

Abstract

The purpose of this study was to evaluate the methods used to assess “optimism” in regression models. In particular, the study used the Cox & Snell and Nagelkerke pseudo-R² values to identify the best statistic for measuring “optimism” in regression models, to assess model performance using “optimism” through cross-validation, and to determine the relationship between “optimism” and overfitting. Different data sets call for different models that fit them accurately. However, a fitted regression model usually fits the data on which it is based better than it fits new data; this difference is what we call “optimism”. The study focused on three models (Cox regression, logistic regression and linear regression), and a bootstrap procedure was used.
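As an illustration of the ideas in the abstract, the sketch below computes the Cox & Snell and Nagelkerke pseudo-R² for a logistic regression and estimates their “optimism” with a bootstrap. It is not the authors' procedure: the simulated data, the Newton-Raphson fitting routine, the function names (`fit_logistic`, `pseudo_r2`, `bootstrap_optimism`) and the choice of 200 resamples are all assumptions made for this example, and the bootstrap scheme shown is the commonly used one in which a model refitted on each resample is scored on both the resample and the original data.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Maximum-likelihood logistic fit by Newton-Raphson (X includes an intercept column)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        # Tiny ridge term only guards against a singular Hessian.
        hessian = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    return beta

def pseudo_r2(X, y, beta):
    """Return (Cox & Snell, Nagelkerke) pseudo-R² of coefficients beta on data (X, y)."""
    n = len(y)
    eta = X @ beta
    ll_model = np.sum(y * eta - np.logaddexp(0.0, eta))
    p0 = y.mean()  # intercept-only (null) model fitted to the same y
    ll_null = n * (p0 * np.log(p0) + (1.0 - p0) * np.log(1.0 - p0))
    cox_snell = 1.0 - np.exp(2.0 * (ll_null - ll_model) / n)
    nagelkerke = cox_snell / (1.0 - np.exp(2.0 * ll_null / n))
    return np.array([cox_snell, nagelkerke])

def bootstrap_optimism(X, y, n_boot=200, seed=0):
    """Apparent pseudo-R², estimated optimism, and optimism-corrected pseudo-R²."""
    rng = np.random.default_rng(seed)
    n = len(y)
    apparent = pseudo_r2(X, y, fit_logistic(X, y))
    gaps = np.empty((n_boot, 2))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample rows with replacement
        beta_b = fit_logistic(X[idx], y[idx])     # refit on the bootstrap sample
        # Optimism of this resample: fit on its own data minus fit on the original data.
        gaps[b] = pseudo_r2(X[idx], y[idx], beta_b) - pseudo_r2(X, y, beta_b)
    optimism = gaps.mean(axis=0)
    return apparent, optimism, apparent - optimism

# Simulated data (an assumption for illustration): one informative predictor, four noise predictors.
rng = np.random.default_rng(42)
n = 150
Z = rng.normal(size=(n, 5))
X = np.column_stack([np.ones(n), Z])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-0.8 * Z[:, 0]))).astype(float)

apparent, optimism, corrected = bootstrap_optimism(X, y)
print("apparent  (Cox & Snell, Nagelkerke):", apparent)
print("optimism  (Cox & Snell, Nagelkerke):", optimism)
print("corrected (Cox & Snell, Nagelkerke):", corrected)
```

Including noise predictors is a deliberate choice: a model with more parameters than the signal warrants is more overfitted, so the gap between apparent and corrected values should widen, which is the link between optimism and overfitting that the abstract refers to.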

Keywords:
optimism, pseudo-R-square, Cox & Snell, Nagelkerke

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
