Journal: International Journal of Computer Science and Network Security
Print ISSN: 1738-7906
Year: 2018
Volume: 18
Issue: 4
Pages: 36-43
Publisher: International Journal of Computer Science and Network Security
Abstract: Outliers are unusual data points that are inconsistent with the other observations in a dataset. Outlier detection methods have been studied in diverse application domains, and it has recently been recognized that outliers in data map directly to real-world anomalies. Outlier detection is important because outliers in data sometimes translate to significant information in a wide variety of application domains (Chandola et al. 2007). Several types of outlier detection methods have been developed, and a number of surveys and reviews have been carried out to identify their advantages and disadvantages. Outlier detection methods are highly domain oriented, so an evaluation is needed to find an appropriate one for the intended domain. In this study we evaluate three widely used families of multivariate outlier detection methods, namely distance-based, statistical-based, and clustering-based, on medical datasets. Five benchmark medical datasets are used in the experiments: Heart Disease, Breast Cancer, Pima Indian Diabetes, Liver Disorders, and Thyroid Gland. To assess the effectiveness of these outlier detection methods, the datasets are classified and their total variances are calculated before and after outlier detection. Eight well-known individual and ensemble classifiers are used for classification. Finally, a comparative review identifies the advantages and disadvantages of each method and its effect on classifier accuracy.
Keywords: Outlier Detection; Data Mining; Machine Learning; Data Clustering; Pattern Recognition
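
The abstract's evaluation step, comparing total variance before and after outlier detection, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "total variance" means the sum of per-feature sample variances (the trace of the covariance matrix), and it uses a Mahalanobis-distance rule with an empirical quantile cutoff as one possible statistical-based detector. The function names `total_variance` and `mahalanobis_outlier_mask`, the 0.95 cutoff, and the synthetic data are all illustrative assumptions, not taken from the paper.

```python
import numpy as np


def total_variance(X):
    """Sum of per-feature sample variances (trace of the covariance matrix)."""
    return float(np.var(X, axis=0, ddof=1).sum())


def mahalanobis_outlier_mask(X, quantile=0.95):
    """Flag rows whose squared Mahalanobis distance exceeds an empirical quantile.

    The quantile cutoff is an assumed, illustrative rule, not the paper's.
    """
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates a singular covariance
    diff = X - mean
    # Squared distance per row: diff_i^T * inv_cov * diff_i
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return d2 > np.quantile(d2, quantile)


# Synthetic stand-in data (NOT one of the medical datasets named in the abstract).
rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(200, 4))
extremes = rng.normal(0.0, 1.0, size=(10, 4)) + 8.0  # shifted cluster of anomalies
X = np.vstack([inliers, extremes])

mask = mahalanobis_outlier_mask(X, quantile=0.95)
X_clean = X[~mask]

var_before = total_variance(X)
var_after = total_variance(X_clean)
print(f"flagged rows: {int(mask.sum())}")
print(f"total variance before: {var_before:.3f}, after: {var_after:.3f}")
```

In a fuller comparison along the lines the abstract describes, the same before-and-after measurement would be repeated for a distance-based and a clustering-based detector, and each classifier's accuracy would be measured on `X` and on `X_clean`.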