Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Year: 2019
Volume: 97
Issue: 11
Pages: 2942-2956
Publisher: Journal of Theoretical and Applied
Abstract: Integrating data from various sources is an important step in building a data warehouse that supports decision-making applications. The problem is how to find and optimally integrate data from distributed, heterogeneous database sources. This heterogeneity has several causes, including databases stored in different formats, different software and hardware used for database storage systems, and designs based on different semantic data models. There are currently two approaches to data integration, Global as View (GAV) and Local as View (LAV), but both have performance limitations and need to be optimized. Key factors in optimizing data integration are query response time and an understanding of the structure of the data source (the source schema). Query response time is important because timely access to information is a basic requirement of a successful business application. A data warehouse uses multiple materialized views (MV) to process a given set of queries efficiently. Query processing requires particular attention to the source schema, because the cost of cost-based query processing (access costs and stored costs) depends on the number of attributes involved and the number of sites visited. This paper presents the results of a proposed MV selection algorithm for query processing that minimizes attribute involvement. First, MVs are selected by clustering the query workload. A query is decomposed into sub-queries, each of which operates on a separate database, so that the exact order of site access can be determined. With this access sequence, the operating cost of query processing is minimized. When a query is processed in a distributed database, its operations look for data across the attributes of scattered database tables, even though queries often do not require all the attributes of those tables.
Therefore, optimizing a query requires minimizing its operating cost (access costs and stored costs) by separating out unnecessary attributes. Second, we use a join index that is specifically adapted to the multidimensional architecture of warehouses. It eliminates join operations while preserving the information contained in the original warehouse. This approach also minimizes the cost of a request by separating out attributes that the request does not need, thereby reducing storage and access time. Attributes must not be separated indiscriminately, because doing so results in greater access costs and ultimately reduces query processing performance. Such attribute separation can be performed with the Vertical Fragmentation method. To validate this study, we measured the response times of a set of decision support queries on the DBD data warehouse, with and without our optimization techniques. Our experimental results show that the techniques are efficient, even when queries are complex and the data is relatively large.
Keywords: Global as View; Local as View; Materialized View; Access Costs; Stored Costs; Data Warehouse
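
The attribute separation described in the abstract can be illustrated with a minimal sketch of usage-based vertical fragmentation. This is not the paper's algorithm: the grouping rule (attributes used by the same set of queries share a fragment), the function names `vertical_fragments` and `attributes_read`, and the sample table and workload are all assumptions made for illustration.

```python
from collections import defaultdict


def vertical_fragments(attributes, key, workload):
    """Group non-key attributes by the set of queries that use them.

    Assumed rule (not taken from the paper): attributes with the same
    usage signature are placed in the same fragment.
    """
    groups = defaultdict(list)
    for attr in attributes:
        if attr == key:
            continue
        # The signature is the set of query names that reference this attribute.
        signature = frozenset(q for q, used in workload.items() if attr in used)
        groups[signature].append(attr)
    # Every fragment carries the key so rows can be rebuilt by joining on it.
    return {sig: [key] + attrs for sig, attrs in groups.items()}


def attributes_read(query, fragments):
    """Count the columns touched when `query` reads each fragment it needs."""
    return sum(len(cols) for sig, cols in fragments.items() if query in sig)


# Hypothetical table and query workload.
attributes = ["sale_id", "product", "amount", "region", "notes"]
workload = {
    "q1": {"product", "amount"},
    "q2": {"region", "amount"},
}
fragments = vertical_fragments(attributes, "sale_id", workload)
```

In this sketch, `q1` reads only the fragments holding `product` and `amount`, and never touches `region` or `notes`. The `notes` attribute, which no query uses, ends up in its own fragment under the empty signature.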