Article Information

Title: Cardiotocography Data Analysis to Predict Fetal Health Risks with Tree-Based Ensemble Learning
Authors: Pankaj Bhowmik; Pulak Chandra Bhowmik; U. A. Md. Ehsan Ali et al.
Journal: International Journal of Information Technology and Computer Science
Print ISSN: 2074-9007
Online ISSN: 2074-9015
Publication year: 2021
Volume: 13
Issue: 5
DOI: 10.5815/ijitcs.2021.05.03
Language: English
Publisher: MECS Publisher
Abstract: A sizeable number of women face difficulties during pregnancy, which can eventually lead to serious health problems for the fetus. Early detection of these risks, however, can save the lives of both infants and mothers. Cardiotocography (CTG) data, obtained by monitoring the fetal heart rate signal, provides sophisticated information that is used to predict potential risks to fetal wellbeing and to draw clinical conclusions. This paper analyzes the antepartum CTG data available in the UCI Machine Learning Repository and develops an efficient tree-based ensemble learning (EL) classifier model to predict fetal health status. The EL model follows the Stacking approach, of which a concise overview is given. The study also applies several distinct machine learning algorithms to the CTG dataset and evaluates their performance. The Stacking EL technique uses four tree-based machine learning algorithms as base learners: the Random Forest, Decision Tree, Extra Trees, and Deep Forest classifiers. The CTG dataset contains 21 features, of which only the 10 most important are selected with the Chi-square method for this experiment; the selected features are then normalized with Min-Max scaling. Grid Search is then applied to tune the hyperparameters of the base algorithms, and 10-fold cross-validation is performed to select the meta learner of the EL classifier model. A comparative assessment of the individual base learners and the EL classifier model shows the superiority of the EL classifier in predicting fetal health risks, with an accuracy of about 96.05%. The study concludes that the Stacking EL approach can be a substantial paradigm in machine learning studies for improving model accuracy and reducing the error rate.
Keywords: Ensemble Learning; Stacking; Cardiotocography; Hyperparameter Tuning; Feature Selection; Cross Validation; Random Forest classifier
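
The two preprocessing steps named in the abstract, Chi-square feature selection followed by Min-Max scaling, can be illustrated with a minimal pure-Python sketch. It is not the authors' code: the helper names (`chi2_score`, `select_k_best`, `min_max_scale`) and the toy data are hypothetical, and the Chi-square statistic shown is one common formulation for non-negative features (per-class feature sums compared with their expected values under independence), which may differ in detail from what the paper used. The stacking, Grid Search, and cross-validation steps are not covered here.

```python
def chi2_score(feature, labels):
    """Chi-square statistic between one non-negative feature and the class labels.

    Observed = sum of the feature over the samples of each class.
    Expected = feature total * class frequency (independence assumption).
    """
    total = sum(feature)
    n = len(labels)
    score = 0.0
    for c in set(labels):
        observed = sum(v for v, y in zip(feature, labels) if y == c)
        expected = total * labels.count(c) / n
        if expected > 0:
            score += (observed - expected) ** 2 / expected
    return score


def select_k_best(rows, labels, k):
    """Return the column indices of the k features with the highest score."""
    columns = list(zip(*rows))
    scores = [chi2_score(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def min_max_scale(column):
    """Rescale a column linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Toy data (hypothetical): 4 samples, 3 features, 2 classes.
rows = [
    [1.0, 5.0, 0.0],
    [2.0, 5.0, 0.0],
    [8.0, 5.0, 1.0],
    [9.0, 5.0, 0.0],
]
labels = [0, 0, 1, 1]

kept = select_k_best(rows, labels, k=2)  # → [0, 2]
columns = list(zip(*rows))
# Scale each kept column, then transpose back to one list per sample.
scaled_rows = [list(r) for r in zip(*(min_max_scale(columns[j]) for j in kept))]
print(kept, scaled_rows)
```

On the study's data the same idea would be applied with `k=10` over the 21 CTG features. The second toy feature is constant across classes, so it scores zero and is dropped.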