American Journal of Applied Mathematics and Statistics
ISSN (Print): 2328-7306 ISSN (Online): 2328-7292 Website: https://www.sciepub.com/journal/ajams Editor-in-chief: Mohamed Seddeek
American Journal of Applied Mathematics and Statistics. 2022, 10(1), 28-38
DOI: 10.12691/ajams-10-1-5
Open Access Article

Analysis of Mixed Discrete and Heavy Tailed Longitudinal Data with Non-random Missingness Using Stochastic Variants of the EM Algorithm

Abdallah S. A. Yaseen¹

¹ The National Centre for Social and Criminological Research, Cairo, Egypt

Pub. Date: April 10, 2022

Cite this paper:
Abdallah S. A. Yaseen. Analysis of Mixed Discrete and Heavy Tailed Longitudinal Data with Non-random Missingness Using Stochastic Variants of the EM Algorithm. American Journal of Applied Mathematics and Statistics. 2022; 10(1):28-38. doi: 10.12691/ajams-10-1-5

Abstract

Interstitial cystitis (IC) is a chronic inflammatory condition that results in recurring discomfort or pain in the bladder and the surrounding pelvic region. In the Interstitial Cystitis Data Base (ICDB) cohort study, the main goal is to determine the influence of covariates, such as the demographic and clinical characteristics of patients, on the longitudinal outcomes, including the pain score (p), urinary urgency (u) and urinary frequency (f), which are the three main indices of IC symptoms. The ICDB data are mixed (discrete and continuous) longitudinal data. In longitudinal studies, the continuous response may be non-normal, for example heavy tailed. The analysis of mixed longitudinal data is challenging because of two inherent features: (1) more than one outcome is followed for each subject over a period of time, and (2) the longitudinal outcomes are subject to missingness that may be missing not at random (MNAR). This article proposes two alternative algorithms, both stochastic variants of the EM algorithm, for analyzing mixed discrete and heavy-tailed longitudinal outcomes subject to MNAR missingness. The continuous outcome is assumed to follow a non-normal, heavy-tailed distribution. The proposed methodology extends the approaches of [1] and [2]. The proposed techniques are applied to the ICDB data, and three simulation studies are conducted to validate them.
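As a minimal illustration of how a stochastic variant of the EM algorithm handles heavy-tailed data, the sketch below fits a single Student-t sample. It is not the author's method: it assumes one continuous outcome, no discrete outcome, no repeated measures, no missing values, and a known number of degrees of freedom `nu`. The t distribution is written as a scale mixture of normals (the representation studied in [26]), so the E-step of EM is replaced by one random draw of the latent weights (the S-step of stochastic EM) followed by a closed-form M-step. The function name `sem_student_t` and all tuning values are hypothetical.

```python
import numpy as np


def sem_student_t(y, nu=4.0, n_iter=500, burn_in=200, seed=0):
    """Hypothetical stochastic EM (SEM) sketch for the location mu and the
    scale sigma^2 of a univariate Student-t sample with known df nu.

    Latent-variable representation (scale mixture of normals):
        y_i | tau_i ~ N(mu, sigma^2 / tau_i),  tau_i ~ Gamma(nu/2, rate=nu/2)
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size
    mu, sigma2 = np.median(y), np.var(y)  # crude starting values
    trace = []
    for _ in range(n_iter):
        # S-step: draw each weight from its full conditional,
        # tau_i | y_i ~ Gamma((nu + 1)/2, rate=(nu + d_i)/2).
        # NumPy parameterizes the gamma by scale = 1 / rate.
        d = (y - mu) ** 2 / sigma2
        tau = rng.gamma(shape=(nu + 1) / 2, scale=2.0 / (nu + d))
        # M-step: closed-form weighted estimates given the imputed weights.
        mu = np.sum(tau * y) / np.sum(tau)
        sigma2 = np.sum(tau * (y - mu) ** 2) / n
        trace.append((mu, sigma2))
    # SEM iterates form a Markov chain; average them after the burn-in.
    mu_hat, sigma2_hat = np.mean(trace[burn_in:], axis=0)
    return float(mu_hat), float(sigma2_hat)


if __name__ == "__main__":
    # Simulated heavy-tailed data with mu = 10, sigma^2 = 4, nu = 4.
    rng = np.random.default_rng(1)
    y = 10.0 + 2.0 * rng.standard_t(df=4, size=5000)
    mu_hat, sigma2_hat = sem_student_t(y, nu=4.0)
    print(f"mu_hat = {mu_hat:.3f}, sigma2_hat = {sigma2_hat:.3f}")
```

Because SEM produces a Markov chain of estimates rather than a monotone sequence, the point estimate here is the average of the iterates after a burn-in period; the burn-in length of 200 is an arbitrary choice for this sketch.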

Keywords:
stochastic expectation maximization; parametric fractional imputation; interstitial cystitis; longitudinal data; maximum likelihood; missing data

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

References:

[1]  Yang, Y. and J. Kang, Joint analysis of mixed Poisson and continuous longitudinal data with nonignorable missing values. Computational Statistics & Data Analysis, 2010. 54(1): p. 193-207.
 
[2]  Yaseen, A.S.A. and A.M. Gad, A stochastic variant of the EM algorithm to fit mixed (discrete and continuous) longitudinal data with nonignorable missingness. Communications in Statistics - Theory and Methods, 2019.
 
[3]  Olkin, I. and R.F. Tate, Multivariate correlation models with mixed discrete and continuous variables. The Annals of Mathematical Statistics, 1961: p. 448-465.
 
[4]  Celeux, G. and J. Diebolt, The SEM algorithm: a probabilistic teacher algorithm derived from the EM algorithm for the mixture problem. Computational Statistics Quarterly, 1985. 2(1): p. 73-82.
 
[5]  Gad, A.M. and A.S. Ahmed, Analysis of longitudinal data with intermittent missing values using the stochastic EM algorithm. Computational Statistics & Data Analysis, 2006. 50(10): p. 2702-2714.
 
[6]  Gad, A.M. and A.S. Ahmed, Sensitivity analysis of longitudinal data with intermittent missing values. Statistical Methodology, 2007. 4(2): p. 217-226.
 
[7]  Gad, A.M. and N.I. EL-Zayat, Fitting multivariate linear mixed model for multiple outcomes longitudinal data with non-ignorable dropout. International Journal of Probability and Statistics, 2018. 7(4): p. 97-105.
 
[8]  Kim, J.K. and W. Fuller, Parametric fractional imputation for missing data analysis. Joint Statistical Meeting Proceedings, 2008: p. 158-169.
 
[9]  Kim, J.K., Parametric fractional imputation for missing data analysis. Biometrika, 2011. 98(1): p. 119-132.
 
[10]  Kim, J.Y. and J.K. Kim, Parametric fractional imputation for nonignorable missing data. Journal of the Korean Statistical Society, 2012. 41(3): p. 291-303.
 
[11]  Kim, J.K. and M. Hong, Imputation for statistical inference with coarse data. Canadian Journal of Statistics, 2012. 40(3): p. 604-618.
 
[12]  Yang, S., J.-K. Kim, and Z. Zhu, Parametric fractional imputation for mixed models with nonignorable missing data. Statistics and Its Interface, 2013. 6(3): p. 339-347.
 
[13]  Yaseen, A.S., A.M. Gad, and A.S. Ahmed, Maximum likelihood approach for longitudinal models with nonignorable missing data mechanism using fractional imputation. American Journal of Applied Mathematics and Statistics, 2016. 4(3): p. 59-66.
 
[14]  Shen, S., C. Beunckens, C. Mallinckrodt, and G. Molenberghs, A local influence sensitivity analysis for incomplete longitudinal depression data. Journal of Biopharmaceutical Statistics, 2006. 16(3): p. 365-384.
 
[15]  Pinheiro, J.C., C. Liu, and Y.N. Wu, Efficient algorithms for robust estimation in linear mixed-effects models using the multivariate t distribution. Journal of Computational and Graphical Statistics, 2001. 10(2): p. 249-276.
 
[16]  Wang, W.-L. and T.-H. Fan, Estimation in multivariate t linear mixed models for multiple longitudinal data. Statistica Sinica, 2011: p. 1857-1880.
 
[17]  Luo, S., J. Ma, and K.D. Kieburtz, Robust Bayesian inference for multivariate longitudinal data by using normal/independent distributions. Statistics in Medicine, 2013. 32(22): p. 3812-3828.
 
[18]  Wang, W.-L., T.-I. Lin, and V.H. Lachos, Extending multivariate-t linear mixed models for multiple longitudinal data with censored responses and heavy tails. Statistical Methods in Medical Research, 2018. 27(1): p. 48-64.
 
[19]  Achcar, J.A., E.A. Coelho-Barros, J.R.T. Cuevas, and J. Mazucheli, Use of Lévy distribution to analyze longitudinal data with asymmetric distribution and presence of left censored data. Communications for Statistical Applications and Methods, 2018. 25(1): p. 43-60.
 
[20]  Lee, D., Y. Lee, M.C. Paik, and M.G. Kenward, Robust inference using hierarchical likelihood approach for heavy-tailed longitudinal outcomes with missing data: An alternative to inverse probability weighted generalized estimating equations. Computational Statistics & Data Analysis, 2013. 59: p. 171-179.
 
[21]  Wang, W.-L., Mixture of multivariate t nonlinear mixed models for multiple longitudinal data with heterogeneity and missing values. TEST, 2018. 28: p. 1-27.
 
[22]  Wang, W.-L. and T.-H. Fan, Bayesian analysis of multivariate t linear mixed models using a combination of IBF and Gibbs samplers. Journal of Multivariate Analysis, 2012. 105(1): p. 300-310.
 
[23]  Wang, W.L. and T.I. Lin, Multivariate t nonlinear mixed-effects models for multi-outcome longitudinal data with missing values. Statistics in Medicine, 2014. 33(17): p. 3029-3046.
 
[24]  Peel, D. and G.J. McLachlan, Robust mixture modelling using the t distribution. Statistics and Computing, 2000. 10(4): p. 339-348.
 
[25]  McLachlan, G. and T. Krishnan, The EM algorithm and extensions. Vol. 382. 2007: John Wiley & Sons.
 
[26]  Andrews, D.F. and C.L. Mallows, Scale mixtures of normal distributions. Journal of the Royal Statistical Society. Series B (Methodological), 1974: p. 99-102.
 
[27]  Meza, C., F. Osorio, and R. De la Cruz, Estimation in nonlinear mixed-effects models using heavy-tailed distributions. Statistics and Computing, 2012. 22(1): p. 121-139.
 
[28]  Forbes, F. and D. Wraith, A new family of multivariate heavy-tailed distributions with variable marginal amounts of tailweight: application to robust clustering. Statistics and Computing, 2014. 24(6): p. 971-984.
 
[29]  Kenward, M.G., Selection models for repeated measurements with non-random dropout: an illustration of sensitivity. Statistics in Medicine, 1998. 17(23): p. 2723-2732.
 
[30]  Diggle, P. and M.G. Kenward, Informative drop-out in longitudinal data analysis. Applied Statistics, 1994: p. 49-93.
 
[31]  Little, R.J. and D.B. Rubin, Statistical analysis with missing data. 1987, New York: Wiley.
 
[32]  Meng, X.-L. and D.B. Rubin, Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika, 1993. 80(2): p. 267-278.
 
[33]  National Research Council, Principles and methods of sensitivity analyses, in The Prevention and Treatment of Missing Data in Clinical Trials. 2010, National Academies Press (US).
 
[34]  Daniels, M.J., D. Jackson, W. Feng, and I.R. White, Pattern mixture models for the analysis of repeated attempt designs. Biometrics, 2015. 71(4): p. 1160-1167.
 
[35]  Jennrich, R.I. and M.D. Schluchter, Unbalanced repeated-measures models with structured covariance matrices. Biometrics, 1986. 42: p. 805-820.
 
[36]  Propert, K.J., A.J. Schaeffer, C.M. Brensinger, J.W. Kusek, L.M. Nyberg, and J.R. Landis, A prospective study of interstitial cystitis: results of longitudinal followup of the interstitial cystitis data base cohort. The Journal of Urology, 2000. 163(5): p. 1434-1439.
 
[37]  Karlis, D. and L. Meligkotsidou, Multivariate Poisson regression with covariance structure. Statistics and Computing, 2005. 15(4): p. 255-265.
 
[38]  Shi, P. and E.A. Valdez, Multivariate negative binomial models for insurance claim counts. Insurance: Mathematics and Economics, 2014. 55: p. 18-29.