Journal of Computer Sciences and Applications
ISSN (Print): 2328-7268; ISSN (Online): 2328-725X; Website: https://www.sciepub.com/journal/jcsa; Editor-in-chief: Minhua Ma, Patricia Goncalves
Open Access
Journal of Computer Sciences and Applications. 2024, 12(1), 17-24
DOI: 10.12691/jcsa-12-1-3
Open Access Article

Facing the Clinical Trial Annotation Problem on Breast Cancer: Natural Language Processing & Machine Learning Models Selection

Pablo Eliseo Reynoso-Aguirre¹ and Pedro Flores-Pérez²

¹ Computer Science Department, Universitat Politècnica de Catalunya (UPC), Barcelona, Spain

² Mathematics Department, University of Sonora, Hermosillo, México

Pub. Date: August 25, 2024

Cite this paper:
Pablo Eliseo Reynoso-Aguirre and Pedro Flores-Pérez. Facing the Clinical Trial Annotation Problem on Breast Cancer: Natural Language Processing & Machine Learning Models Selection. Journal of Computer Sciences and Applications. 2024; 12(1):17-24. doi: 10.12691/jcsa-12-1-3

Abstract

The clinical trial classification problem (CTCP) is one of the cutting-edge real-world applications in biomedical informatics, particularly in the domain considered in this paper, breast cancer. The task consists of developing models that can discriminate a patient's eligibility profile for breast cancer trials based on performance status (PS) labels. The task has gained relevance in medical research and practice within the framework of decision support systems. It is also regarded as a meaningful instrument for accurately selecting trial participants, with the aim of avoiding health and behavioral drug side effects in those participants.
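
To make the task concrete, the following is a minimal sketch of how eligibility-criteria text might be assigned a PS label with a multinomial naive Bayes classifier, one of the model families named in the keywords. It is not the authors' implementation: the training snippets, the label names ("PS 0-1", "PS 0-2"), and the helper functions `tokenize`, `train`, and `predict` are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy eligibility-criteria snippets with made-up PS labels.
# They are illustrative only and are NOT taken from the paper's dataset.
TRAIN = [
    ("ECOG performance status 0 or 1", "PS 0-1"),
    ("fully active, ECOG 0-1 required", "PS 0-1"),
    ("Karnofsky performance status at least 80", "PS 0-1"),
    ("ECOG performance status 0 to 2 allowed", "PS 0-2"),
    ("ambulatory patients, ECOG 2 or lower", "PS 0-2"),
    ("Karnofsky performance status at least 60", "PS 0-2"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words and digit runs."""
    return re.findall(r"[a-z]+|\d+", text.lower())


def train(examples):
    """Collect per-class document counts, per-class token counts, and the vocabulary."""
    doc_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        doc_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return doc_counts, token_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    doc_counts, token_counts, vocab = model
    n_docs = sum(doc_counts.values())
    # Tokens never seen in training carry no evidence, so they are skipped.
    tokens = [tok for tok in tokenize(text) if tok in vocab]
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        total = sum(token_counts[label].values())
        score = math.log(doc_counts[label] / n_docs)  # log prior
        for tok in tokens:
            # Laplace-smoothed log likelihood of the token given the class.
            score += math.log((token_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict(model, "ECOG performance status 0-2"))
```

A real pipeline would need a far larger labeled corpus and a held-out evaluation; the sketch only shows the shape of the text-to-label mapping.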

Keywords:
ECOG, KPS, performance status, eligibility criteria, clinical trial classification, multinomial linear regression, multinomial naive Bayes, multilayer perceptron, support vector machines

Creative Commons: This work is licensed under a Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

