Journal: International Journal of Computer Science and Network Security
Print ISSN: 1738-7906
Publication year: 2007
Volume: 7
Issue: 8
Pages: 226-231
Publisher: International Journal of Computer Science and Network Security
Abstract: Emerging technologies for semi-structured data have attracted wide attention in networking, e-commerce, information retrieval, and databases. In these applications, the data are modeled not as static collections but as transient data streams, where the data source is an unbounded stream of individual data items. It is becoming increasingly common to send heterogeneous and ill-structured data over networks. Since traditional database technologies are not directly applicable to such data streams, it is important to study efficient information extraction methods for semi-structured data. Hence there is increasing demand for automatic methods that extract useful information, in particular methods that discover rules or patterns in large collections of semi-structured data, that is, semi-structured data mining. In this survey paper, we begin by reviewing popular data mining techniques such as association rules, clustering, and prediction for semi-structured data. We briefly describe each technique and the efficient algorithms for implementing it. We then discuss applications of semi-structured data. Finally, we conclude by listing research challenges that need to be addressed in the area of semi-structured data mining.
Keywords: semi-structured data mining; association; clustering; prediction; graph-based data structure