Continuous-valued dimensions in OLAP data cubes are usually grouped into countable disjoint intervals using naïve methods such as equal-width binning, histogram analysis, or splitting into intervals defined by domain experts according to their understanding of the data. This paper explores the integration of ‘intelligent’ discretization techniques from current data mining research into the construction of a SEER breast cancer survivability data cube with a continuous dimension. Observational and empirical evaluations of the resulting cube with discretized intervals show that ‘intelligent’ discretization methods provide the same benefit to OLAP data cubes as they do to data mining algorithms: they simplify the data representation with minimal or no loss of information. Additionally, an unsupervised discretization method based on the k-means algorithm performed equivalently to its supervised counterparts, namely the entropy-based (ID3) and χ²-based (CHAID) methods.
Keywords: OLAP, data mining, discretization, entropy, ID3, CHAID, k-means