Article Information

  • Title: Exploring What not to Clean in Urban Data: A Study Using New York City Taxi Trips
  • Authors: Juliana Freire; Aline Bessa; Fernando Chirigati
  • Journal: Bulletin of the Technical Committee on Data Engineering
  • Publication year: 2016
  • Volume: 39
  • Issue: 2
  • Page: 63
  • Publisher: IEEE Computer Society
  • Abstract: Traditionally, data cleaning has been performed as a pre-processing task: after all data are selected for a study (or application), they are cleaned and loaded into a database or data warehouse. In this paper, we argue that data cleaning should be an integral part of data exploration. Especially for complex, spatio-temporal data, it is only by exploring a dataset that one can discover which constraints should be checked. In addition, in many instances, seemingly erroneous data may actually reflect interesting features. Distinguishing a feature from a data quality issue requires detailed analyses, which often include bringing in new datasets. We present a series of case studies using the NYC taxi data that illustrate data cleaning challenges that arise for spatio-temporal urban data and suggest methodologies to address these challenges.