American Journal of Applied Mathematics and Statistics
ISSN (Print): 2328-7306 ISSN (Online): 2328-7292 Website: http://www.sciepub.com/journal/ajams Editor-in-chief: Mohamed Seddeek
Open Access
American Journal of Applied Mathematics and Statistics. 2019, 7(6), 196-204
DOI: 10.12691/ajams-7-6-2
Open Access Article

Predictive Modelling of Benign and Malignant Tumors Using Binary Logistic, Support Vector Machine and Extreme Gradient Boosting Models

Peter Gachoki1, Moses Mburu2 and Moses Muraya3

1Department of Physical Sciences, Chuka University, P.O Box 109-60400, Chuka, Kenya

2KEMRI-Wellcome Trust Kilifi, P.O Box 230-80108, Kilifi, Kenya

3Department of Plant Sciences, Chuka University, P.O Box 109-60400, Chuka, Kenya

Pub. Date: November 26, 2019

Cite this paper:
Peter Gachoki, Moses Mburu and Moses Muraya. Predictive Modelling of Benign and Malignant Tumors Using Binary Logistic, Support Vector Machine and Extreme Gradient Boosting Models. American Journal of Applied Mathematics and Statistics. 2019; 7(6):196-204. doi: 10.12691/ajams-7-6-2

Abstract

Breast cancer is the leading type of cancer among women worldwide, with about 2 million new cases and 627,000 deaths every year. Breast tumors can be malignant or benign. Medical screening can be used to determine the type of a diagnosed tumor. Alternatively, predictive modelling can be used to predict whether a tumor is malignant or benign. The accuracy of the prediction algorithm is critical: a false negative means that a patient with a malignant tumor is not put on treatment, which can lead to death, while a false positive may subject an individual to unnecessary stress and medication. Therefore, this study sought to develop and validate predictive models based on binary logistic, support vector machine and extreme gradient boosting models in order to improve the prediction accuracy for cancer tumors. The study used the Breast Cancer Wisconsin data set available on Kaggle. The dependent variable was whether a tumor is malignant or benign. The regressors were tumor features such as radius, texture, area, perimeter, smoothness, compactness, concavity, concave points, symmetry and fractal dimension. Data analysis was done using the R statistical software and involved generation of descriptive statistics, data reduction, feature selection and model fitting. Before model fitting, the reduced data was split into a train set and a validation set. The binary logistic, support vector machine and extreme gradient boosting models had predictive accuracies of 96.97%, 98.01% and 97.73%, respectively, an improvement over already existing models. These results show that support vector machine and extreme gradient boosting have better predictive power for cancer tumors than binary logistic. This study recommends the use of support vector machine and extreme gradient boosting in cancer tumor prediction, and further investigation of other algorithms that can improve prediction.
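
The abstract's concern with false negatives and false positives can be made concrete with a confusion matrix, since overall accuracy alone does not show which kind of error a model makes. The following is a minimal Python sketch, not the authors' R analysis: the labels "M" (malignant) and "B" (benign), the names `confusion_counts` and `summarize`, and the toy labels are all assumptions introduced for illustration, and none of the numbers are results from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # malignant tumors predicted malignant
    fp: int  # benign tumors predicted malignant (false positives)
    tn: int  # benign tumors predicted benign
    fn: int  # malignant tumors predicted benign (false negatives)


def confusion_counts(y_true, y_pred, positive="M"):
    """Count the four outcomes, treating `positive` as the malignant class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth != positive and pred != positive:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a class is absent from the data.
    return numerator / denominator if denominator else float("nan")


def summarize(c):
    """Accuracy, sensitivity (recall on malignant) and specificity."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # lowered by false negatives
        "specificity": _ratio(c.tn, c.tn + c.fp),  # lowered by false positives
    }


if __name__ == "__main__":
    # Hypothetical validation-set labels, for illustration only.
    y_true = ["M", "M", "M", "B", "B", "B", "B", "B"]
    y_pred = ["M", "M", "B", "B", "B", "B", "B", "M"]
    counts = confusion_counts(y_true, y_pred)
    print(counts)
    print(summarize(counts))
```

Reporting sensitivity and specificity alongside accuracy separates the two error types the abstract distinguishes: a missed malignant tumor reduces sensitivity, while a benign tumor flagged as malignant reduces specificity.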

Keywords:
benign, malignant, binary logistic, support vector machine, extreme gradient boosting

This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
