Article Information

Title: A New Incremental Genetic Algorithm Based Classification Model to Mine Data with Concept Drift
Authors: P. Vivekanandan; Dr. R. Nedunchezhian
Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Year of publication: 2010
Volume: 21
Issue: 01
Publisher: Journal of Theoretical and Applied
Abstract:
In a database, data concepts change over time; this is called concept drift. Genetic algorithms are widely used for mining classification rules. If a data set is three or four years old, the mined rules may not reflect the current concept because of concept drift. Applying the incremental genetic algorithm (IGA) in batch mode can mine accurate rules that reflect the current concept. However, repeatedly applying the genetic algorithm to arriving data in an incremental manner, without monitoring for change, unnecessarily increases the learning cost. A second problem is that, because of changes in the data distribution, some previously generated rules may be lost when the genetic algorithm is applied incrementally. This paper proposes a new incremental genetic algorithm to rectify these problems. The new IGA applies the genetic algorithm iteration step only when required, so that the learning cost may be reduced. It also keeps track of rules that were generated earlier and would otherwise have been lost because of a change in data distribution. In the proposed method, each record of the incoming data set is monitored. Correctly classified records are dropped, and misclassified records are added to a window. When the window is full, the genetic algorithm is applied to the records in the window, and new rules are generated based only on the misclassified examples and on examples of new classes. Invalid rules are replaced with the newly generated valid rules. The new method ensures that the next iteration of the genetic algorithm is invoked only when there is concept drift or a change in the data distribution and a sufficient number of records is available. This reduces the learning cost, particularly when there is no concept drift or when the drift is slow, and also ensures that no rule is lost because of a change in data distribution.
Keywords: Classification; Incremental Genetic Algorithm; Concept Drift
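
The monitoring loop described in the abstract (drop correctly classified records, buffer misclassified ones, relearn only when the window is full) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `RuleBase`, `DriftMonitor`, `window_size`, and `demo` are hypothetical, rules are simplified to an exact attribute-to-class lookup, and a majority vote stands in for the genetic algorithm step, which the abstract does not specify in detail.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Tuple

Attrs = Tuple[int, ...]            # attribute values of one record
Record = Tuple[Attrs, Hashable]    # (attributes, class label)


class RuleBase:
    """Toy rule set: one rule per attribute tuple (an assumption for brevity)."""

    def __init__(self) -> None:
        self.rules: Dict[Attrs, Hashable] = {}

    def classify(self, attrs: Attrs) -> Optional[Hashable]:
        # Returns None when no rule covers the record (e.g. a new class).
        return self.rules.get(attrs)

    def relearn(self, window: List[Record]) -> None:
        # Placeholder for the genetic algorithm step: a majority vote per
        # attribute tuple, computed only from the misclassified records.
        votes: Dict[Attrs, Counter] = {}
        for attrs, label in window:
            votes.setdefault(attrs, Counter())[label] += 1
        for attrs, counter in votes.items():
            # An invalid rule is replaced; rules not contradicted by the
            # window are left untouched, so they are not lost.
            self.rules[attrs] = counter.most_common(1)[0][0]


class DriftMonitor:
    """Buffers misclassified records and triggers learning when the window fills."""

    def __init__(self, rule_base: RuleBase, window_size: int) -> None:
        self.rule_base = rule_base
        self.window_size = window_size
        self.window: List[Record] = []
        self.learner_calls = 0

    def observe(self, record: Record) -> bool:
        """Process one incoming record; return True if learning was triggered."""
        attrs, label = record
        if self.rule_base.classify(attrs) == label:
            return False                      # correctly classified: drop
        self.window.append(record)            # misclassified: keep
        if len(self.window) < self.window_size:
            return False
        self.rule_base.relearn(self.window)   # window full: run the learner
        self.learner_calls += 1
        self.window = []
        return True


def demo() -> DriftMonitor:
    # Initial concept: (0,) -> "a", (1,) -> "b"; then (0,) drifts to "b".
    stream: List[Record] = [
        ((0,), "a"), ((1,), "b"), ((0,), "a"), ((1,), "b"),
        ((0,), "b"), ((0,), "b"), ((1,), "b"), ((0,), "b"),
    ]
    monitor = DriftMonitor(RuleBase(), window_size=2)
    for record in stream:
        monitor.observe(record)
    return monitor
```

On this toy stream the learner runs twice, once to bootstrap the empty rule base and once after the drift, while the six records that do not fill a window cost only a classification lookup. A larger `window_size` would delay relearning further, trading slower reaction to drift for fewer learner invocations.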