Article Information

  • Title: Discretization of Continuous Valued Dimensions in OLAP Data Cubes
  • Authors: Sellappan Palaniappan; Tan Kim Hong
  • Journal: International Journal of Computer Science and Network Security
  • Print ISSN: 1738-7906
  • Publication year: 2008
  • Volume: 8
  • Issue: 11
  • Pages: 116-126
  • Publisher: International Journal of Computer Science and Network Security
  • Abstract:

    Continuous valued dimensions in OLAP data cubes are usually grouped into countable disjoint intervals using naïve methods such as equal width binning, histogram analysis, or splitting into intervals defined by domain experts according to their understanding of the data. This paper explores the integration of ‘intelligent’ discretization techniques currently available in data mining research into the construction of a SEER breast cancer survivability data cube with a continuous dimension. Observational and empirical evaluations of the resulting cube with discretized intervals show that ‘intelligent’ discretization methods provide the same benefits to OLAP data cubes as they do to data mining algorithms: they simplify the data representation with minimal or no loss of information. Additionally, an unsupervised discretization method using the k-means algorithm was found to perform on par with its supervised counterparts, namely the entropy-based (ID3) and χ²-based (CHAID) methods.

  • Keywords:

    OLAP, data mining, discretization, entropy, ID3, CHAID, k-means
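
The abstract contrasts equal width binning with k-means based discretization of a continuous dimension. The following is a minimal sketch of that contrast, not the authors' implementation: the function names, the deterministic initialization, and the sample values are all assumptions made for illustration, and the numbers are not SEER data.

```python
from typing import List, Sequence


def equal_width_edges(values: Sequence[float], k: int) -> List[float]:
    """Return the k-1 interior cut points of k equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def kmeans_1d_edges(values: Sequence[float], k: int, iters: int = 100) -> List[float]:
    """Return k-1 cut points from a plain 1-D k-means (Lloyd's algorithm)."""
    xs = sorted(values)
    # Assumed deterministic start: spread the initial centers over the sorted data.
    centers = [xs[(2 * i + 1) * len(xs) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    centers.sort()
    # Cut points sit midway between neighbouring centers.
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


def assign_bin(x: float, edges: Sequence[float]) -> int:
    """Index of the interval that x falls into (0 .. len(edges))."""
    return sum(x >= e for e in edges)


# Hypothetical values of a continuous dimension with three natural groups.
sample = [1, 2, 3, 20, 21, 22, 60, 61, 62]
ew = equal_width_edges(sample, 3)
km = kmeans_1d_edges(sample, 3)
print("equal width:", [assign_bin(x, ew) for x in sample])
print("k-means:    ", [assign_bin(x, km) for x in sample])
```

On this toy sample the equal-width cut points depend only on the minimum and maximum, so they split the middle group across two intervals, whereas the k-means cut points fall in the gaps between groups.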
