Journal title: International Journal of Computer Science & Technology
Print ISSN: 2229-4333
Electronic ISSN: 0976-8491
Publication year: 2012
Volume: 3
Issue: 2
Pages: 1118-1122
Language: English
Publisher: Ayushmaan Technologies
Abstract: Poor data quality is a major problem in many organizations. Erroneous and inconsistent data have cost US businesses hundreds of billions of dollars through the poor business decisions that result from them [1]. Recently, Conditional Functional Dependencies (CFDs) have shown great potential for detecting and repairing inconsistent data in relational data sets. In this paper, we study the problem of discovering the minimal set of constant CFDs that hold in a given data set. As in previous work, we exploit two observations: constant CFDs are essentially association rules with 100% confidence, and the minimal set of CFDs can be produced from the set of minimal generators and their closures. We propose new pruning criteria that further reduce the search space by removing unnecessary generators and closures. We design an efficient algorithm based on these criteria and evaluate it on real data sets. According to the results, the proposed algorithm is faster than the currently most efficient constant CFD discovery algorithm. We also show how the chi-square statistic can be used to measure the interestingness of CFDs.
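To make the link between constant CFDs and association rules concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It checks whether a constant rule holds with 100% confidence on a toy relation, and scores the rule with the standard Pearson chi-square statistic for a 2x2 contingency table. The relation `ROWS`, its attribute names, and the helpers `matches`, `confidence`, and `chi_square` are all hypothetical, and the 2x2 formulation is an assumption about how the statistic might be applied, since the abstract does not give the details.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical toy relation; attribute names and values are illustrative only.
ROWS: List[Row] = [
    {"cc": "44", "ac": "131", "city": "EDI"},
    {"cc": "44", "ac": "131", "city": "EDI"},
    {"cc": "01", "ac": "908", "city": "MH"},
    {"cc": "01", "ac": "212", "city": "NYC"},
    {"cc": "44", "ac": "020", "city": "LDN"},
]


def matches(row: Row, pattern: Dict[str, str]) -> bool:
    """True if the row carries every constant in the pattern."""
    return all(row.get(attr) == value for attr, value in pattern.items())


def confidence(rows: List[Row], lhs: Dict[str, str], rhs: Dict[str, str]) -> float:
    """Fraction of rows matching lhs that also match rhs (0.0 if none match lhs)."""
    covered = [r for r in rows if matches(r, lhs)]
    if not covered:
        return 0.0
    return sum(matches(r, rhs) for r in covered) / len(covered)


def chi_square(rows: List[Row], lhs: Dict[str, str], rhs: Dict[str, str]) -> float:
    """Pearson chi-square of the 2x2 table (lhs / not lhs) x (rhs / not rhs)."""
    a = sum(matches(r, lhs) and matches(r, rhs) for r in rows)
    b = sum(matches(r, lhs) and not matches(r, rhs) for r in rows)
    c = sum(not matches(r, lhs) and matches(r, rhs) for r in rows)
    d = len(rows) - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no association signal
    return len(rows) * (a * d - b * c) ** 2 / denom


# A rule qualifies as a constant CFD on this data only if its confidence is 1.0.
rule_lhs = {"cc": "44", "ac": "131"}
rule_rhs = {"city": "EDI"}
print(confidence(ROWS, rule_lhs, rule_rhs))
print(chi_square(ROWS, rule_lhs, rule_rhs))
```

In this sketch a rule with confidence below 1.0, such as `{"cc": "01"}` implying `{"city": "NYC"}`, would be rejected as a constant CFD, while the chi-square score could be used to rank the rules that do hold.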