Journal: Bulletin of the Technical Committee on Data Engineering
Publication year: 2018
Volume: 41
Issue: 2
Pages: 91-103
Publisher: IEEE Computer Society
Abstract: Entity resolution (ER) seeks to identify which records in a data set refer to the same real-world entity. Given the diversity of ways in which entities can be represented, ER is a challenging task for automated strategies, but relatively easier for expert humans. We abstract the knowledge of experts with the notion of a boolean oracle that can answer questions of the form “do records u and v refer to the same entity?”, and formally address the problem of maximizing progressive recall and F-measure in an online setting.