Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Year: 2017
Volume: 95
Issue: 17
Page: 4221
Publisher: Journal of Theoretical and Applied
Abstract: The XML data model is considered the most dominant document type on the web, accounting for more than 60% of the total; nevertheless, the quality of XML data is not as expected. XML integrity constraints, like their relational counterparts, play an important role in keeping XML datasets as consistent as possible, but their ability to solve data quality issues has yet to be demonstrated. The main reason is that traditional data dependencies were introduced mainly to keep the schema consistent rather than the data. In this paper, a conditional version of XML inclusion dependencies (XCIND) is proposed for data quality issues, and the use of inclusion dependencies for this purpose is justified. The XCIND notation will extend XIND and shift its mission from schema design to data quality by providing pattern tableaus. Moreover, a set of minimal XCIND dependencies will be discovered and learned using a set of mining algorithms. Finally, the ability of XCIND to detect data inconsistencies will be examined using denial queries between the mined rules and the XML tree.
Keywords: XML; Data Quality; Data Cleaning; Integrity Constraints