Journal: Bulletin of the Technical Committee on Data Engineering
Year: 2018
Volume: 41
Issue: 2
Pages: 35-46
Publisher: IEEE Computer Society
Abstract: We argue that the data integration (DI) community should devote far more effort to building systems, in order to truly advance the field. We discuss the limitations of current DI systems, and point out that there is already an existing popular DI “system” out there, which is PyData, the open-source ecosystem of 138,000+ interoperable Python packages. We argue that rather than building isolated monolithic DI systems, we should consider extending this PyData “system”, by developing more Python packages that solve DI problems for the users of PyData. We discuss how extending PyData enables us to pursue an integrated agenda of research, system development, education, and outreach in DI, which in turn can position our community to become a key player in data science. Finally, we discuss ongoing work at Wisconsin, which suggests that this agenda is highly promising and raises many interesting challenges.