Journal: International Journal of Computer Trends and Technology
Online ISSN: 2231-2803
Year: 2014
Volume: 18
Issue: 6
Pages: 272-275
DOI: 10.14445/22312803/IJCTT-V18P157
Publisher: Seventh Sense Research Group
Abstract: Preparing a normalized data set from a relational database for analysis requires significant effort and is a time-consuming task. The main reason is that, in general, a database grows to contain many tables and views that must be joined, aggregated, and transformed to build the required data set. As a result, most SQL queries are written independently, multiple times, and in a disorganized manner, which creates problems for database evolution and software maintenance. To address this issue, we propose simple methods to generate SQL code that returns aggregated columns in a horizontal tabular layout, where every row corresponds to an observation, instance, or point (possibly varying over time) and every column is associated with one variable or dimension. This new class of functions is called horizontal aggregations. Horizontal aggregations build data sets with a horizontal denormalized layout (e.g., point-dimension, observation-variable, instance-feature), which is the standard layout required by most data mining algorithms. By providing this standard normalized data set as input to a decision tree generation algorithm, we can generate a decision tree; similarly, we can generate an extended ER model.
Keywords: Data mining; Transformation; Aggregation; Data preparation; Pivoting
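The abstract describes generating SQL that returns aggregated columns in a horizontal layout, with one row per observation and one column per variable. The minimal Python sketch below illustrates the idea under stated assumptions. It uses CASE-based pivoting, which is one common way to express a horizontal aggregation in standard SQL, though the abstract does not say which strategy the authors use. The table `sales`, its columns, and the functions `horizontal_aggregation_sql` and `demo` are hypothetical. Because pivot values are interpolated directly into the SQL text, the sketch is only suitable for trusted identifiers.

```python
import sqlite3


def horizontal_aggregation_sql(table, group_by, pivot_col, pivot_values, measure, agg="SUM"):
    """Build a query that returns one row per group_by value and one
    aggregated column per pivot value (CASE-based pivoting sketch).

    Hypothetical helper: identifiers and values are assumed trusted and
    pivot values are assumed usable as column-name suffixes.
    """
    cols = []
    for v in pivot_values:
        # Rows whose pivot value differs produce NULL, which SQL aggregates ignore.
        cols.append(
            f"{agg}(CASE WHEN {pivot_col} = '{v}' THEN {measure} END) AS {measure}_{v}"
        )
    return (
        f"SELECT {group_by}, " + ", ".join(cols)
        + f" FROM {table} GROUP BY {group_by} ORDER BY {group_by}"
    )


def demo():
    """Run the generated query on a small in-memory table with hypothetical data."""
    conn = sqlite3.connect(":memory:")
    # Vertical (normalized) input: one row per store, quarter, and sale.
    conn.execute("CREATE TABLE sales (store_id INTEGER, quarter TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(1, "Q1", 10.0), (1, "Q1", 5.0), (1, "Q2", 7.0), (2, "Q2", 3.0)],
    )
    sql = horizontal_aggregation_sql("sales", "store_id", "quarter", ["Q1", "Q2"], "amount")
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(horizontal_aggregation_sql("sales", "store_id", "quarter", ["Q1", "Q2"], "amount"))
    for row in demo():
        print(row)  # one horizontal row per store: (store_id, amount_Q1, amount_Q2)
```

Each output row corresponds to one store (the observation), and each column after `store_id` holds the aggregate for one quarter (the variable). A store with no sales in a quarter gets NULL in that column, because SQLite's `SUM` over only NULL inputs returns NULL. A data set in this shape can be passed directly to a data mining algorithm such as a decision tree learner.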