文章基本信息

标题：Empirical Oversampling Threshold Strategy for Machine Learning Performance Optimisation in Insurance Fraud Detection
其他标题：Empirical Oversampling Threshold Strategy for Machine Learning Performance Optimisation in Insurance Fraud Detection
本地全文：下载
作者：Bouzgarne Itri ; Youssfi Mohamed ; Bouattane Omar 等
期刊名称：International Journal of Advanced Computer Science and Applications(IJACSA)
印刷版ISSN：2158-107X
电子版ISSN：2156-5570
出版年度：2020
卷号：11
期号：10
DOI：10.14569/IJACSA.2020.0111054
出版社：Science and Information Society (SAI)
摘要：Insurance fraud is one of the most practiced frauds in the sectors of the economy. Faced with increasingly imaginative underwriters to create fraud scenarios and the emergence of organized crime groups, the fraud detection process based on artificial intelligence remains one of the most effective approaches. Real world datasets are usually unbalanced and are mainly composed of "no-fraudulent" class with a very small percentage of "fraudulent" examples to train our model, thus prediction models see their performance severely degraded when the target class appears so poorly represented. Therefore, the present work aims to propose an approach that improves the relevance of the results of the best-known machine learning algorithms and deals with imbalanced classes in classification problems for prediction against insurance fraud. We use one of the most efficient approaches to re-balance training data: SMOTE. We adopted the supervised method applied to automobile claims dataset "carclaims.txt". We compare the results of the different measurements and question the results and relevance of the measurements in the field of study of unbalanced and labeled datasets. This work shows that the SMOTE Method with the KNN Algorithm can achieve better classifier performance in a True Positive Rate than the previous research. The goal of this work is to lead a study of algorithm selections and performance evaluation among different ML classification algorithms, as well as to propose a new approach TH-SMOTE for performance improvement using the SMOTE method by defining the optimum oversampling threshold according to the G-mean measure.
关键词：Machine learning; oversampling; SMOTE; insurance fraud