Journal: Bulletin of the Technical Committee on Data Engineering
Year: 2018
Volume: 41
Issue: 4
Pages: 39-45
Publisher: IEEE Computer Society
Abstract: Machine learning development creates multiple new challenges that are not present in a traditional software development lifecycle. These include keeping track of the myriad inputs to an ML application (e.g., data versions, code and tuning parameters), reproducing results, and production deployment. In this paper, we summarize these challenges from our experience with Databricks customers, and describe MLflow, an open source platform we recently launched to streamline the machine learning lifecycle. MLflow covers three key challenges: experimentation, reproducibility, and model deployment, using generic APIs that work with any ML library, algorithm and programming language. The project has a rapidly growing open source community, with over 50 contributors since its launch in June 2018.