Journal: Bulletin of the Technical Committee on Data Engineering
Publication year: 2017
Volume: 40
Issue: 3
Page: 42
Publisher: IEEE Computer Society
Abstract: Data management and machine learning are two important tasks in data science. However, they have been independently studied so far. We argue that they should be complementary to each other. On the one hand, machine learning requires data management techniques to extract, integrate, and clean the data, and to support scalable and usable machine learning, making it user-friendly and easily deployable. On the other hand, data management relies on machine learning techniques to curate data and improve its quality. This requires database systems to treat machine learning algorithms as their basic operators, or at the very least, optimizable stored procedures. It poses new challenges as machine learning tasks tend to be iterative and recursive in nature, and some models have to be tweaked and retrained. This calls for a reexamination of database design to make it machine learning friendly. In this position paper, we present a preliminary design of a graph model for supporting both data management and usable machine learning. To make machine learning usable, we provide a declarative query language that extends SQL to support data management and machine learning operators, and provide visualization tools. To optimize data management procedures, we devise graph optimization techniques to support a finer-grained optimization than the traditional tree-based optimization model. We also present a workflow to support machine learning (ML) as a service to facilitate model reuse and implementation, making it more usable, and discuss emerging research challenges in unifying data management and machine learning.