
Article Information

  • Title: Human-in-the-loop Rule Learning for Data Integration
  • Authors: Ju Fan; Guoliang Li
  • Journal: Bulletin of the Technical Committee on Data Engineering
  • Year: 2018
  • Volume: 41
  • Issue: 2
  • Pages: 104-115
  • Publisher: IEEE Computer Society
  • Abstract: Rule-based data integration approaches are widely adopted because of their better interpretability and effective interactive debugging. However, it is very challenging to generate high-quality rules for data integration tasks. Hand-crafted rules from domain experts are usually reliable, but they are not scalable: it is time- and effort-consuming to handcraft many rules with large coverage over the data. On the other hand, weak-supervision rules automatically generated by machines, such as distant supervision rules, can cover most of the items; however, they may be so noisy that they produce many wrong results. To address the problem, we propose a human-in-the-loop rule learning approach with high coverage and high quality. The approach first generates a set of candidate rules and applies a machine-based method, built on generative adversarial networks, to learn a confidence for each rule. Then, it devises a game-based crowdsourcing framework to refine the rules, and develops a budget-constraint crowdsourcing algorithm for rule refinement at affordable cost. Finally, it applies the rules to produce high-quality data integration results.