Basic Article Information

Title: Discretization of Continuous Valued Dimensions in OLAP Data Cubes
Authors: Sellappan Palaniappan ; Tan Kim Hong
Journal: International Journal of Computer Science and Network Security
Print ISSN: 1738-7906
Year of publication: 2008
Volume: 8
Issue: 11
Pages: 116-126
Publisher: International Journal of Computer Science and Network Security
Abstract:
Continuous valued dimensions in OLAP data cubes are usually grouped into countable disjoint intervals using naïve methods such as equal width binning, histogram analysis, or splitting into intervals defined by domain experts according to their understanding of the data. This paper explores an integration of ‘intelligent’ discretization techniques currently available in data mining research into the construction of a SEER breast cancer survivability data cube with continuous dimension. Observational and empirical evaluations on the resulting cube with discretized intervals show that ‘intelligent’ discretization methods provide the same benefits to OLAP data cubes as in data mining algorithms, that is, they are able to simplify the data representation with minimal or no loss of information. Additionally, it was found that an unsupervised discretization method using k-means algorithm had exhibited equivalent performance as the supervised counterparts, namely, the entropy-based (ID3) and χ²-based (CHAID) methods.
Keywords:
OLAP, data mining, discretization, entropy, ID3, CHAID, k-means