摘要:AbstractData analysis has become an everyday business and advancements of data management routines open up new opportunities. Nevertheless, transforming and assembling newly acquired data into a suitable form remains tedious. It is often stated, that data cleaning is a critical part of the overall process, but also consumes sublime amounts of time and resources. Data Wrangling is not only about transforming and cleaning procedures. Many other aspects like data quality, merging of different sources, reproducible processes, and managing data provenance have to be considered. Although various tools designed for specific tasks are available, software solutions accompanying the whole process are still rare.In this paper, some aspects of this first phase of most data driven projects, also known as data wrangling, data munging or janitorial work are described. Beginning with an overview on the topic and current problems, concrete common tasks as well as selected software solutions and techniques are discussed.
关键词:KeywordsData acquisitionDatabasesBad data identificationData wrangling