Journal: International Journal of Computer Science and Network Security
Print ISSN: 1738-7906
Year: 2013
Volume: 13
Issue: 10
Pages: 119-127
Publisher: International Journal of Computer Science and Network Security
Abstract: The growing number of XML datasets available to casual users increases the need for techniques that extract knowledge from these data. Data mining is widely applied in database research to extract frequent values from both structured and semi-structured datasets. Extracting information from semi-structured documents is a hard task, and it will become more critical as the amount of digital information available on the Internet grows. Documents are often so large that the data set returned as the answer to a query may be too big to convey the required knowledge. This paper therefore presents an approach that mines Tree-based Association Rules (TARs) from XML documents. TARs provide information on both the structure and the content of XML documents. Moreover, they can be stored in XML format and queried later. The mined knowledge is approximate, intensional knowledge that provides quick, approximate answers to queries, as well as information about structural regularities that can serve as data guides for document querying. Association rules are useful for discovering interesting relationships hidden in datasets. An indexing mechanism improves the speed of data retrieval operations; accordingly, an approach called Path Based Indexing is used to answer queries efficiently.
Keywords: XML; approximate query-answering; Data Mining; TAR; Path Based Indexing.
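
The abstract names Path Based Indexing but gives no detail of how the index is built. As a rough illustration of the general idea only, and not the paper's algorithm, the sketch below maps each root-to-node label path in an XML document to the text values found at that path, so that a path query becomes a dictionary lookup instead of a tree traversal. The function name `build_path_index` and the sample document are hypothetical, introduced here for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_path_index(xml_text):
    """Map each root-to-node label path to the text values found at that path.

    Hypothetical helper: a generic path index, not the method from the paper.
    """
    index = defaultdict(list)

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        text = (node.text or "").strip()
        if text:
            # Record only nodes that carry non-whitespace text.
            index[path].append(text)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return dict(index)


# Invented sample document, used only to exercise the sketch.
SAMPLE = """<library>
  <book><title>XML Basics</title><author>Lee</author></book>
  <book><title>Data Mining</title><author>Kim</author></book>
</library>"""

index = build_path_index(SAMPLE)
titles = index["/library/book/title"]
```

Under these assumptions, looking up `/library/book/title` returns every title in document order without walking the tree again. A real index would likely also store node positions or identifiers so that matches can be joined back to their parent elements.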