Journal: Case Studies in Business, Industry and Government Statistics
Print ISSN: 2152-372X
Year of publication: 2007
Volume: 1
Issue: 2
Pages: 130-138
Publisher: Bentley University
Abstract: Statistics education reformers have for years called for the use of real data in teaching introductory statistics (Ballman, 1997; Garfield et al., 2004; Hogg, 1991). Instructors now have ready access to cases, textbook problems, and other exercises accompanied by well-documented sets of real or realistic data. Online portals and data libraries provide a huge array of real datasets keyed variously to substantive topics and statistical techniques suitable for introductory students. The vast majority of these datasets have already been cleaned up by their preparers. As enriching as these resources are, relatively few of them offer students first-hand experience with the essential messiness of “real” real data. There is a good case to be made that data cleaning and preparation belong in introductory courses (Burger & Leopold, 2001). Certainly, problems of missing, dirty, and incomplete data are important topics within the field (Hoyle, 1971; Rubin, 1976; Wagner, 2002). Using field data from the Wright Brothers’ 1904 experiments, this case leads introductory or intermediate students through a process of data preparation, illustrating five common steps in data preparation and cleaning: standardizing the format of data records, deciding how to treat ambiguously recorded data, converting measurements to a single standard unit, detecting and resolving issues with outliers, and imputing missing data.