Article information

  • Title: Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R
  • Authors: Michael Hahsler; Matthew Bolaños; John Forrest
  • Journal: Journal of Statistical Software
  • Print ISSN: 1548-7660
  • Electronic ISSN: 1548-7660
  • Year: 2017
  • Volume: 76
  • Issue: 1
  • Pages: 1-50
  • Language: English
  • Publisher: University of California, Los Angeles
  • Abstract: In recent years, data streams have become an increasingly important area of research for the computer science, database and statistics communities. Data streams are ordered and potentially unbounded sequences of data points created by a typically non-stationary data generating process. Common data mining tasks associated with data streams include clustering, classification and frequent pattern mining. New algorithms for these types of data are proposed regularly, and it is important to evaluate them thoroughly under standardized conditions. In this paper we introduce stream, a research tool that includes modeling and simulating data streams as well as an extensible framework for implementing, interfacing and experimenting with algorithms for various data stream mining tasks. The main advantage of stream is that it seamlessly integrates with the large existing infrastructure provided by R. In addition to data handling, plotting and easy scripting capabilities, R also provides many existing algorithms and enables users to interface code written in many programming languages popular among data mining researchers (e.g., C/C++, Java and Python). In this paper we describe the architecture of stream and focus on its use for data stream clustering research. stream was implemented with extensibility in mind and will be extended in the future to cover additional data stream mining tasks like classification and frequent pattern mining.
  • Keywords: data streams; data mining; clustering
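The abstract characterizes a data stream as an ordered, potentially unbounded sequence of points from a non-stationary process, and names clustering as a core task studied with stream. The Python sketch below illustrates both ideas; it is not the stream package's API (stream is an R package). The names `drifting_stream` and `SequentialKMeans` are hypothetical, and the fixed-learning-rate online k-means is a textbook technique swapped in for illustration, not an algorithm taken from the paper.

```python
import itertools
import random
from typing import Iterator, List, Tuple

Point = Tuple[float, float]


def drifting_stream(centers: List[Point], drift: float, sd: float,
                    seed: int = 0) -> Iterator[Point]:
    """Hypothetical, unbounded generator of 2-D points.

    Each point is drawn from a randomly chosen Gaussian cluster; after every
    point all cluster centers move by `drift` along the x-axis, which makes
    the data generating process non-stationary (concept drift).
    """
    rng = random.Random(seed)
    moving = [list(c) for c in centers]
    while True:  # potentially unbounded: the consumer decides when to stop
        cx, cy = rng.choice(moving)
        yield (rng.gauss(cx, sd), rng.gauss(cy, sd))
        for c in moving:
            c[0] += drift


class SequentialKMeans:
    """Online k-means with a fixed learning rate (illustrative variant).

    Each point is processed once and then discarded: its nearest center is
    moved a fraction `rate` of the way toward it.
    """

    def __init__(self, init: List[Point], rate: float = 0.05):
        self.centers = [list(c) for c in init]
        self.rate = rate

    def update(self, p: Point) -> int:
        # Assign the point to the nearest center (squared Euclidean distance).
        j = min(range(len(self.centers)),
                key=lambda i: (self.centers[i][0] - p[0]) ** 2
                + (self.centers[i][1] - p[1]) ** 2)
        # Pull that center toward the point; older points fade exponentially.
        self.centers[j][0] += self.rate * (p[0] - self.centers[j][0])
        self.centers[j][1] += self.rate * (p[1] - self.centers[j][1])
        return j


# Usage: cluster the first 5000 points of a drifting two-cluster stream.
stream = drifting_stream([(0.0, 0.0), (10.0, 10.0)], drift=0.001, sd=0.5, seed=42)
model = SequentialKMeans([(1.0, 1.0), (9.0, 9.0)], rate=0.05)
for point in itertools.islice(stream, 5000):
    model.update(point)
print(model.centers)  # should be close to the final true centers (5, 0) and (15, 10)
```

A fixed learning rate makes the influence of older points decay exponentially, so the centers can follow the drift; a 1/n rate would instead converge toward the long-run average of all points seen and lag behind the moving clusters.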