Journal: Bulletin of the Technical Committee on Data Engineering
Year: 2016
Volume: 39
Issue: 2
Page: 21
Publisher: IEEE Computer Society
Abstract: Completeness and consistency are two important dimensions for the quality of data, in particular relational data. This is true because most data sets found in practice are both incomplete and inconsistent. The simplest yet arguably most important integrity constraints are keys. Recently, certain keys were introduced for incomplete relations. Certain keys can efficiently manage the integrity of entities while still permitting incompleteness in columns of the key. It is therefore an important task to discover the set of certain keys that hold in a given incomplete relation. However, if the given incomplete relation is also inconsistent with respect to some meaningful certain keys, algorithms that discover keys cannot succeed. As meaningful keys are likely to have a small number of violations, we propose an algorithm that discovers certain keys that do not exceed a given number of violations. We illustrate the effectiveness and efficiency of our algorithm in discovering meaningful certain keys from publicly available data sets.
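
The idea in the abstract can be illustrated with a small sketch. It is not the authors' algorithm, which this record does not describe. It assumes the usual reading of a certain key: a column set is a certain key when no two distinct rows could coincide on it however the nulls are filled in, that is, no two rows are "weakly similar" on those columns. The violation measure (the number of weakly similar row pairs), the brute-force level-wise search, and all function names and sample data are assumptions made for illustration.

```python
from itertools import combinations


def weakly_similar(t1, t2, attrs):
    """True if, on every attribute, the two rows agree or at least one is null (None)."""
    return all(t1[a] is None or t2[a] is None or t1[a] == t2[a] for a in attrs)


def count_violations(rows, attrs):
    """Assumed violation measure: number of row pairs that are weakly similar on attrs."""
    return sum(1 for t1, t2 in combinations(rows, 2) if weakly_similar(t1, t2, attrs))


def approximate_certain_keys(rows, columns, max_violations):
    """Brute-force search for minimal column sets with at most max_violations violations.

    Adding columns can only remove weakly similar pairs, so any superset of a
    reported key also qualifies and is skipped to keep the result minimal.
    """
    found = []
    for size in range(1, len(columns) + 1):
        for attrs in combinations(columns, size):
            if any(set(key) <= set(attrs) for key in found):
                continue  # a subset already qualifies, so attrs is not minimal
            if count_violations(rows, attrs) <= max_violations:
                found.append(attrs)
    return found


# Hypothetical incomplete and inconsistent relation; None marks a missing value.
rows = [
    {"name": "Ann", "dept": "IT", "room": None},
    {"name": "Bob", "dept": "IT", "room": 101},
    {"name": "Ann", "dept": None, "room": 102},
    {"name": "Cal", "dept": "HR", "room": 101},
]
columns = ["name", "dept", "room"]

strict_keys = approximate_certain_keys(rows, columns, 0)
tolerant_keys = approximate_certain_keys(rows, columns, 1)
```

On this sample no column set is an exact certain key, because the two "Ann" rows are weakly similar on every combination of columns. Allowing one violation lets the search report `name`, which matches the abstract's argument that a meaningful key is likely to have only a small number of violations.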