摘要:In recent times, the mining of association rules from XML databases has received attention because of its wide applicability and flexibility. Many mining methods have been proposed. Because of the inherent flexibility of the structures and the semantics of the documents, however, these methods are challenging to use. In order to accomplish the mining, an XML document must first be converted into a relational dataset, and an index table with node encoding is created to extract transactions and interesting items. In this paper, we propose a new method to mine association rules from XML documents using a new type of node encoding scheme that employs a Unique Identifier (UID) to extract the important items. The node scheme modified with UID encoding speeds up the mining process. A significance measure is used to identify the important rules found in the XML database. Finally, the mining procedure calculates the confidence that the identified rules are indeed meaningful. Experiments are conducted using XML databases available in the XML data repository. The results illustrate that the proposed method is efficient in terms of computation time and memory usage.
关键词:XML mining; XML documents; Semantics; Node encoding; Rule mining