
Article Information

  • Title: Identification of Malignancies from Free-Text Histopathology Reports Using a Multi-Model Supervised Machine Learning Approach
  • Authors: Victor Olago; Mazvita Muchengeti; Elvira Singh
  • Journal: Information
  • Electronic ISSN: 2078-2489
  • Year: 2020
  • Volume: 11
  • Issue: 9
  • Article number: 455
  • DOI: 10.3390/info11090455
  • Publisher: MDPI Publishing
  • Abstract: We explored various machine learning (ML) models to evaluate how each performs when classifying histopathology reports. We trained, optimized, and performed classification with Stochastic Gradient Descent (SGD), Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbor (KNN), Adaptive Boosting (AB), Decision Trees (DT), Gaussian Naïve Bayes (GNB), Logistic Regression (LR), and a Dummy classifier. We started with 60,083 histopathology reports, which were reduced to 60,069 after pre-processing. The F1-scores for SVM, SGD, KNN, RF, DT, LR, AB, and GNB were 97%, 96%, 96%, 96%, 92%, 96%, 84%, and 88%, respectively, while the misclassification rates were 3.31%, 5.25%, 4.39%, 1.75%, 3.5%, 4.26%, 23.9%, and 19.94%, respectively. The approximate run times were 2 h, 20 min, 40 min, 8 h, 40 min, 10 min, 50 min, and 4 min, respectively. RF had the longest run time but the lowest misclassification rate on the labeled data. Our study demonstrated the feasibility of applying ML techniques to free-text pathology reports so that cancer registries can report cancer incidence in a Sub-Saharan African setting. This is an important consideration for resource-constrained environments, which can leverage ML techniques to reduce workloads and improve the timeliness of cancer statistics reporting.
  • Keywords: machine learning; multi-model supervised machine learning; text mining; text classification; natural language processing; cancer coding; flagging malignant reports
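The abstract compares models on two metrics, F1-score and misclassification rate, and includes a Dummy classifier as a baseline. The sketch below shows how those quantities are conventionally computed. It is an illustration under stated assumptions, not the authors' code. It assumes binary labels (1 = malignant, 0 = not malignant), uses the standard binary F1 formula 2TP / (2TP + FP + FN), treats misclassification rate as the fraction of wrong predictions (1 minus accuracy), and assumes a "most frequent class" strategy for the dummy baseline. The helper names are hypothetical. The abstract does not say how the paper averaged F1 or which dummy strategy it used.

```python
from collections import Counter

# Hypothetical helpers for illustration; assumes binary labels, 1 = malignant.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for the given positive label."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def f1_score(y_true, y_pred, positive=1):
    """Binary F1 = 2TP / (2TP + FP + FN); 0.0 when undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def misclassification_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true label (1 - accuracy)."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)

def dummy_most_frequent(y_train, n):
    """Assumed dummy baseline: always predict the majority training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return [majority] * n

if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # made-up predictions from some model
    print(f1_score(y_true, y_pred), misclassification_rate(y_true, y_pred))
    baseline = dummy_most_frequent([0, 0, 1], len(y_true))
    print(f1_score(y_true, baseline), misclassification_rate(y_true, baseline))
```

On the made-up labels above, the model scores F1 = 0.75 with a misclassification rate of 0.25. A majority-class dummy that always predicts "not malignant" scores F1 = 0.0 on the malignant class and gets half the examples wrong. This gap is why a dummy baseline is useful: it shows how much of a model's score comes from learning rather than from the class distribution.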