
Article Information

  • Title: Tales of Huffman: An Exercise in Dealing with Messy Data
  • Author: Robert H. Carver
  • Journal: Case Studies in Business, Industry and Government Statistics
  • Print ISSN: 2152-372X
  • Year of publication: 2007
  • Volume: 1
  • Issue: 2
  • Pages: 130-138
  • Publisher: Bentley University
  • Abstract: Statistics education reformers have for years called for the use of real data in teaching introductory statistics (Ballman, 1997; Garfield et al., 2004; Hogg, 1991). Instructors now have ready access to cases, textbook problems and other exercises with accompanying well-documented sets of real or realistic data. On-line portals and data libraries provide a huge array of real data sets keyed variously to substantive topics and statistical techniques suitable for introductory students. The vast majority of these real datasets tend to have already been cleaned up by their preparers. As enriching as these resources are, relatively few of them offer students first-hand experience with the essential messiness of “real” real data. There is a good case to be made that data cleaning and preparation belong in introductory courses (Burger & Leopold, 2001). Certainly, problems of missing, dirty, and incomplete data are important topics within the field (Hoyle, 1971; Rubin, 1976; Wagner, 2002). Using field data from the Wright Brothers’ 1904 experiments, this case leads introductory or intermediate students through a process of data preparation, illustrating five common steps in data preparation and cleaning: standardizing the format of data records, deciding how to treat ambiguously recorded data, converting measurements to a single standard unit, detecting and resolving issues with outliers, and imputing missing data.
  • Keywords: Statistical Reasoning; Missing Data; Messy Data; Wright Brothers
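
The five preparation steps named in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the case study: the records below are invented (they are not the Wright Brothers' 1904 data), the names `RAW_RECORDS`, `parse_distance` and `clean` are hypothetical, reading "about 150 ft" as 150 ft is one possible treatment of an ambiguous entry, and the median-absolute-deviation outlier rule and median imputation are stand-ins chosen for brevity.

```python
from statistics import median

FEET_PER_METER = 3.28084  # 1 m is approximately 3.28084 ft

# Hypothetical raw records for illustration only; NOT the Wright Brothers' data.
# Each record is (trial id as written, distance as written).
RAW_RECORDS = [
    ("1", "120 ft"),
    ("2", " 45 m "),        # different unit, stray whitespace
    ("3", "about 150 ft"),  # ambiguously recorded
    ("4", ""),              # missing
    ("5", "1300 ft"),       # suspiciously large
    ("6", "130 FT"),        # inconsistent capitalization
    ("7", "140 ft"),
]


def parse_distance(text):
    """Steps 1-3: standardize the text, resolve ambiguity, convert to feet.

    Returns the distance in feet, or None if the entry is missing or unparseable.
    """
    tokens = text.strip().lower().split()  # step 1: standardize the format
    if tokens and tokens[0] == "about":    # step 2: assumed treatment of "about"
        tokens = tokens[1:]
    if len(tokens) != 2:
        return None
    try:
        value = float(tokens[0])
    except ValueError:
        return None
    unit = tokens[1]
    if unit == "ft":                       # step 3: convert to a single unit
        return value
    if unit == "m":
        return value * FEET_PER_METER
    return None


def clean(records, mad_multiplier=5.0):
    """Steps 4-5: flag outliers with a median-absolute-deviation rule, then
    impute missing and flagged values with the median of the remaining values."""
    ids = [int(trial.strip()) for trial, _ in records]
    feet = [parse_distance(text) for _, text in records]
    notes = ["as recorded" if v is not None else "imputed: missing" for v in feet]

    known = [v for v in feet if v is not None]
    center = median(known)
    mad = median([abs(v - center) for v in known])

    for i, v in enumerate(feet):
        if v is not None and abs(v - center) > mad_multiplier * mad:
            feet[i] = None                 # step 4: treat the outlier as missing
            notes[i] = "imputed: outlier"

    fill = median([v for v in feet if v is not None])
    return [(t, fill if v is None else v, n)  # step 5: impute
            for t, v, n in zip(ids, feet, notes)]


if __name__ == "__main__":
    for row in clean(RAW_RECORDS):
        print(row)
```

Keeping a note alongside every value records which entries were altered, so each judgment call in the cleaning process stays visible and can be revisited.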