Journal: International Journal of Computer Technology and Applications
Electronic ISSN: 2229-6093
Year: 2012
Volume: 3
Issue: 3
Pages: 1137-1149
Publisher: Technopark Publications
Abstract: Information retrieval plays an important role on the web, and many researchers have presented techniques for retrieving information from databases. Previous work presented an extended tree pattern clustering process for massive XML storage. This paper presents a new technique, termed semantic data clustering (SDC), that combines data warehouse (DW) data and web data for OnLine Analytical Processing (OLAP) by retrieving semantic data from the DW. XML technologies are commonly used to store, retrieve, integrate, and combine web data and applications in a data warehouse. With SDC, semantic data repositories are retrieved from the DW, which involves the design of multidimensional databases for XML data sources and XML extensions of OLAP techniques. SDC efficiently handles the information retrieval process in a DW so that text-rich document collections can be used. For XML data sources, SDC builds a tree pattern for a clustered XML schema in order to retrieve massive stored data for OLAP. It uses a clustering technique to build the tree-pattern framework, so that massive XML databases can be brought into the data warehouse for OLAP. We also show the advantages of using semantic data clustering to build the tree pattern when handling large numbers of XML documents for OLAP in a data warehouse. Compared to an existing ETC technique for XML storage, the proposed SDC achieves a reliable performance improvement when moving from XML database to data warehouse, in terms of building time, query execution time for deriving semantic data from the DW, and effectiveness of the clustering process.
Keywords: Data warehouse; Repository; XML; OLAP; Tree pattern; Clustering