Journal: International Journal of Innovative Research in Science, Engineering and Technology
Print ISSN: 2347-6710
Electronic ISSN: 2319-8753
Publication year: 2014
Issue: ICETS
Pages: 473
Publisher: S&S Publications
Abstract: Data stream classification poses many challenges to the data mining community. In this paper, we address four such major challenges, namely, infinite length, concept-drift, concept-evolution, and feature-evolution. Since a data stream is theoretically infinite in length, it is impractical to store and use all the historical data for training. Concept-drift is a common phenomenon in data streams, which occurs as a result of changes in the underlying concepts. Concept-evolution occurs as a result of new classes evolving in the stream. Feature-evolution is a frequently occurring process in many streams, such as text streams, in which new features (i.e., words or phrases) appear as the stream progresses. Most existing data stream classification techniques address only the first two challenges and ignore the latter two. In this paper, we propose an ensemble classification framework, where each classifier is equipped with a novel class detector, to address concept-drift and concept-evolution. To address feature-evolution, we propose a feature set homogenization technique. We also enhance the novel class detection module by making it more adaptive to the evolving stream and enabling it to detect more than one novel class at a time. Comparison with state-of-the-art data stream classification techniques establishes the effectiveness of the proposed approach.
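
The abstract does not say how feature set homogenization is carried out. As a rough illustration of the problem it targets, the Python sketch below assumes one simple interpretation: two sparse feature vectors with different vocabularies are projected onto the union of their feature names, with absent features filled with zero, so that they can be compared dimension by dimension. The function name `homogenize` and the sample data are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def homogenize(
    a: Dict[str, float], b: Dict[str, float]
) -> Tuple[List[str], List[float], List[float]]:
    """Project two sparse feature vectors onto the union of their features.

    Assumption (not from the paper): a feature missing from one vector
    is treated as having value 0.0 in that vector.
    """
    features = sorted(set(a) | set(b))          # shared, ordered feature space
    va = [a.get(f, 0.0) for f in features]      # dense view of a
    vb = [b.get(f, 0.0) for f in features]      # dense view of b
    return features, va, vb


# Hypothetical example: an older model summary and a newer text instance
# whose vocabulary contains a word ("evolution") the model has never seen.
model_centroid = {"stream": 0.6, "drift": 0.4}
new_instance = {"drift": 0.5, "evolution": 0.5}

features, va, vb = homogenize(model_centroid, new_instance)
```

After this step both vectors have the same dimensionality, so a distance or similarity measure can be applied to them even though the stream introduced a feature after the model was built.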