Journal: Bulletin of the Technical Committee on Data Engineering
Year: 2018
Volume: 41
Issue: 2
Pages: 2-2
Publisher: IEEE Computer Society
Abstract: Data integration has been a long-standing problem for the past few decades. Because more than 80% of the time in a data science project is spent on data integration, it has become an indispensable part of data analysis. In the past few years, tremendous progress on data integration has been made, from systems to algorithms. In this issue, we review the challenges of data integration, survey data integration systems (e.g., Tamr, BigGorilla, Trifacta, PyData), report recent progress (e.g., explaining data integration, data discovery on open data, a big data integration pipeline for product specifications, integrated querying of table data and S3 data, crowd-based entity resolution, and human-in-the-loop rule learning), and discuss the future of this field.