Journal: Bulletin of the Technical Committee on Data Engineering
Year: 2018
Volume: 41
Issue: 2
Pages: 23-34
Publisher: IEEE Computer Society
Abstract: It is widely accepted that the majority of time in any data analysis project is devoted to preparing the data [25]. In 2012, noted data science leader DJ Patil put the fraction of time spent on data preparation at 80%, based on informal discussions in his team at LinkedIn [28]. Analysts we interviewed in an academic study around the same time put the percent time “munging” or “wrangling” data at “greater than half” [19]. Judging from these user stories, the inefficiency of data preparation is the single biggest problem in data analytics.