Article information

Title: A New Approach for Detecting Concept Drift and Measuring its Intensity in Large Datasets
Authors: Hisham Ogbah; Abdallah Alashqur
Journal: International Journal of Computer Science and Network Security
Print ISSN: 1738-7906
Publication year: 2016
Volume: 16
Issue: 12
Pages: 109-116
Publisher: International Journal of Computer Science and Network Security
Abstract: The importance of data mining in general, and classification in particular, has increased in recent years due to the overwhelming amount of digital data produced worldwide on a daily basis. In classification, data tuples are mapped to a limited number of classes. The classifier learns (or derives) a classification model from a pre-classified dataset. The learned classification model can be represented in different forms, such as a decision tree, a set of rules, or a support vector machine, to name a few. After the classifier completes the learning phase, it can predict the class of newly added data based on the model it learned. Quite often, a concept drift may occur due to changes in the environment, style, or trend, or for many other reasons. Data that used to map to, say, class_a before the drift now maps to class_b. But based on the knowledge embodied in the model, the system will still wrongly predict class_a for the same data. This difference between what the model predicts and the actual classification is a sign that a concept drift has occurred and the classification model has become obsolete. In this case, a new model needs to be generated. In this paper, we introduce a new efficient algorithm for detecting the occurrence of a concept drift and introduce a way of measuring the intensity of the drift. Measuring the intensity of the drift is important because it affects how we may choose to deal with it going forward.
Keywords: Classification; Concept Drift; Drift detection; Big Data
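
Illustration: The abstract describes drift as a gap between what the learned model predicts and how new data is actually classified. The sketch below illustrates that general idea only; it is not the algorithm proposed in the paper, whose details are not given in this record. The function names (`drift_intensity`, `detect_drift`), the use of the mismatch fraction as an "intensity" value, and the `threshold` parameter are all assumptions made for this example.

```python
from typing import Callable, Hashable, Sequence, Tuple


def drift_intensity(
    model: Callable[[object], Hashable],
    records: Sequence[object],
    actual_labels: Sequence[Hashable],
) -> float:
    """Return the fraction of records whose actual class differs from the
    class the (old) model predicts. Assumed stand-in for 'intensity'."""
    if len(records) != len(actual_labels):
        raise ValueError("records and actual_labels must have the same length")
    if not records:
        return 0.0
    mismatches = sum(
        1 for record, label in zip(records, actual_labels) if model(record) != label
    )
    return mismatches / len(records)


def detect_drift(
    model: Callable[[object], Hashable],
    records: Sequence[object],
    actual_labels: Sequence[Hashable],
    threshold: float = 0.2,  # hypothetical cut-off, not taken from the paper
) -> Tuple[bool, float]:
    """Flag a drift when the mismatch fraction exceeds the threshold."""
    intensity = drift_intensity(model, records, actual_labels)
    return intensity > threshold, intensity


# Toy model learned before the drift: small values map to class_a.
def old_model(x: int) -> str:
    return "class_a" if x < 5 else "class_b"


# Newly arrived, already-labelled data: values 1 and 2 now belong to class_b.
records = [1, 2, 3, 4, 6, 7]
actual = ["class_b", "class_b", "class_a", "class_a", "class_b", "class_b"]

drifted, intensity = detect_drift(old_model, records, actual, threshold=0.2)
print(drifted, round(intensity, 3))
```

In this toy data, two of the six records disagree with the old model, so the mismatch fraction exceeds the assumed threshold and a drift is flagged. A larger fraction would suggest rebuilding the model sooner, which is the kind of decision the abstract says an intensity measure should inform.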