Journal: International Journal of Data Mining & Knowledge Management Process
Print ISSN: 2231-007X
Electronic ISSN: 2230-9608
Year of publication: 2014
Volume: 4
Issue: 6
Pages: 15
DOI: 10.5121/ijdkp.2014.4602
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: Recently, stream data mining applications have drawn vital attention from several research communities. Stream data is a continuous form of data distinguished by its online nature. Traditionally, the machine learning area has developed learning algorithms that make certain assumptions about the underlying distribution of the data, such as that the data should have a predetermined distribution. Such constraints on the problem domain pave the way for the development of smart learning algorithms whose performance is theoretically verifiable. Real-world situations differ from this restricted model. Applications usually suffer from problems such as unbalanced data distribution. Additionally, data drawn from non-stationary environments are also common in real-world applications, resulting in the “concept drift” associated with data stream examples. These issues have been addressed separately by researchers; it is also observed that the joint problem of class imbalance and concept drift has received relatively little research attention. If the final objective of clever machine learning techniques is to address a broad spectrum of real-world applications, then the necessity for a universal framework for learning from, and adapting to, environments where concept drift may occur and unbalanced data distribution is present can hardly be exaggerated. In this paper, we first present an overview of the issues observed in stream data mining scenarios, followed by a complete review of recent research dealing with each of these issues.
Keywords: Incremental learning; class imbalance; concept class; concept drift; missing features