The base testing activities proposal.
Tanuska, Pavol; Moravcik, Oliver; Vazan, Pavel et al.
1. INTRODUCTION
Definition 1 states our understanding of a test suite $T$ that can be
used to assess the quality of an application under test. We use $\Delta_f$
to denote the externally visible state of the application under test.
Informally, $\Delta_f$ can be viewed as a set of pairs where the first value
of each pair is a variable name and the second value is a value for that
variable. Eq. 1 formally defines $\Delta_f$, the externally visible state
after the execution of $T_f$. In this equation, we use $var_\Delta$ and
$val_\Delta$ to denote a variable name and a variable value in an external
test state, respectively. Furthermore, we use $U_\Delta$ and $V_\Delta$ to
denote, respectively, the universe of valid variable names and variable
values for externally visible test states. Finally, we require
$value(var_\Delta, f)$ to be a function that maps a variable name to the
value of that variable in a specified $\Delta_f$. An external test state
$\Delta_f$ would contain the global variable values within the program under
test, and any variable values that are made accessible by live object
instances (Kapfhammer, 2004).
Definition 1. A test suite $T$ is a triple $\langle \Delta_0, \langle T_1,
\ldots, T_e \rangle, \langle \Delta_1, \ldots, \Delta_e \rangle \rangle$,
consisting of an initial external test state $\Delta_0$, a test case
sequence $\langle T_1, \ldots, T_e \rangle$ for state $\Delta_0$, and
expected external test states $\langle \Delta_1, \ldots, \Delta_e \rangle$,
where $\Delta_f = T_f(\Delta_{f-1})$ for $f = 1, \ldots, e$ (Kapfhammer, 2004).
$\Delta_f = \{ (var_\Delta, val_\Delta) \in U_\Delta \times V_\Delta \mid value(var_\Delta, f) = val_\Delta \}$ (1)
Definition 2 notes that a specific test case $T_f \in \langle T_1, \ldots,
T_e \rangle$ can be viewed as a sequence of test operations that cause the
application under test to enter states that are only visible to $T_f$. We
use $\delta_h$ to denote the internal test state that is created after the
execution of $T_f$'s test case operation $o_h$. Intuitively, $\delta_h$ can
also be viewed as a set of pairs where the first value is a variable name
and the second value is a value for that variable. Eq. 2 formally defines
$\delta_h$ in a similar fashion to the definition of $\Delta_f$ in Eq. 1. An
internal test state $\delta_h$ would contain the expected and actual values
for the test operation $o_h$, the return value from the program method under
test, and the values of any temporary testing variables.
Definition 2. A test case $T_f \in \langle T_1, \ldots, T_e \rangle$ is a
triple $\langle \delta_0, \langle o_1, \ldots, o_g \rangle, \langle \delta_1,
\ldots, \delta_g \rangle \rangle$, consisting of an initial internal test
state $\delta_0$, a test operation sequence $\langle o_1, \ldots, o_g \rangle$
for state $\delta_0$, and expected internal test states $\langle \delta_1,
\ldots, \delta_g \rangle$, where $\delta_h = o_h(\delta_{h-1})$ for
$h = 1, \ldots, g$ (Kapfhammer, 2004).
$\delta_h = \{ (var_\delta, val_\delta) \in U_\delta \times V_\delta \mid value(var_\delta, h) = val_\delta \}$ (2)
In Definition 3, we describe a restricted type of test suite where each test
case returns the application under test to the initial state, $\Delta_0$,
before it terminates. If a test suite $T$ is not independent, we do not
place any restrictions upon the $\langle \Delta_1, \ldots, \Delta_e \rangle$
produced by the test cases and we simply refer to it as a non-restricted
test suite.
Definition 3. A test suite $T$ is independent if and only if for all
$\gamma \in \{1, \ldots, e\}$, $\Delta_\gamma = \Delta_0$ (Kapfhammer, 2004).
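The formalism above lends itself to a compact executable illustration. The
following is a minimal sketch, not taken from the cited source, in which both
external and internal test states are represented as Python dictionaries; the
function names and the counter example are purely hypothetical.

  # Illustrative sketch only (not from Kapfhammer, 2004): test states are modelled
  # as plain dictionaries mapping variable names to values, for both the external
  # states Delta and the internal states delta.

  def run_test_case(delta_0, operations):
      """Apply the test operations o_1..o_g to the initial internal state delta_0
      and collect the resulting internal states delta_1..delta_g."""
      states, state = [], dict(delta_0)
      for op in operations:
          state = op(state)            # delta_h = o_h(delta_{h-1})
          states.append(dict(state))
      return states

  def is_independent(Delta_0, produced_states):
      """Definition 3: the suite is independent iff every produced external state
      Delta_gamma equals the initial external state Delta_0."""
      return all(s == Delta_0 for s in produced_states)

  # Hypothetical usage: one operation increments a counter, the next resets it,
  # so this single-test-case suite leaves the state unchanged and is independent.
  initial = {"counter": 0}
  ops = [lambda s: {**s, "counter": s["counter"] + 1},
         lambda s: {**s, "counter": 0}]
  final_states = run_test_case(initial, ops)
  print(final_states)                                  # [{'counter': 1}, {'counter': 0}]
  print(is_independent(initial, [final_states[-1]]))   # True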
2. THE BASE TESTING ACTIVITIES PROPOSAL
There are many testing methods and approaches available, yet the individual
standards do not go into detail, thus leaving room for various
misinterpretations (Cooper & Arbuckle, 2002). As for data warehouse design,
it should be stated that the existing standards and guidelines do not cover
the activities relevant for building a data warehouse (Inmon, 2002). There
is no particular procedure or activity related to the process of
multidimensional database design, the ETT process, or the optimization of
scripts for OLAP reports. The following part therefore deals with the
proposal of basic data warehouse testing activities as the final part of a
data warehouse testing methodology. The testing activities that must be
implemented in the process of database testing can be split into four
logical units concerning multidimensional database testing, data pump
testing, metadata, and OLAP. Splitting them further, we arrive at the
following base activities:
Revision of the multidimensional database schema
To achieve the best possible efficiency of SQL statements, it is necessary
to follow these rules (a sketch of an automated check follows the list):
--make sure that the columns of each hierarchy level (fact table) are NOT
NULL and that hierarchical integrity is maintained,
--columns of a hierarchy level cannot be associated with more than one
dimension,
--the structure of the columns in the dimension table should be in
denormalized form,
--hierarchy levels cannot be mutually interconnected; recursion must not
occur.
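As an illustration of the first two rules, the following hedged sketch shows
how such a revision could be automated; the table name dim_time, the columns
day_id and month_id, and the SQLite database file are assumptions, not part
of the original proposal.

  # Hypothetical revision check for hierarchy-level columns (assumed schema:
  # dimension table dim_time with hierarchy levels day_id -> month_id).
  import sqlite3

  checks = {
      # hierarchy-level columns must be populated (NOT NULL)
      "null_level_columns":
          "SELECT COUNT(*) FROM dim_time WHERE day_id IS NULL OR month_id IS NULL",
      # hierarchical integrity: every day must roll up to exactly one month
      "days_with_multiple_parents":
          "SELECT COUNT(*) FROM (SELECT day_id FROM dim_time "
          "GROUP BY day_id HAVING COUNT(DISTINCT month_id) > 1)",
  }

  conn = sqlite3.connect("warehouse.db")       # assumed database file
  for name, sql in checks.items():
      (violations,) = conn.execute(sql).fetchone()
      print(f"{name}: {'OK' if violations == 0 else str(violations) + ' violations'}")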
Testing the batch processing response
The efficiency of batch processing should be tested by simulating the batch
run on a system loaded with real data, on the real infrastructure, and
operating concurrently with the real system.
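A minimal sketch of such a batch response test is given below, assuming the
batch load can be started from a shell script; the script name and the
acceptable batch window are hypothetical.

  # Hypothetical batch-response test: run the (assumed) ETT batch script against a
  # realistically loaded system and check that it finishes within the agreed window.
  import subprocess, time

  BATCH_WINDOW_SECONDS = 3600                            # assumed acceptable window
  start = time.monotonic()
  subprocess.run(["./run_nightly_ett.sh"], check=True)   # hypothetical batch script
  elapsed = time.monotonic() - start
  verdict = "within" if elapsed <= BATCH_WINDOW_SECONDS else "over"
  print(f"batch finished in {elapsed:.0f} s ({verdict} the window)")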
Optimization of the number of fact tables
In designing a multidimensional database, it is very important to decide
whether to implement a data warehouse as a whole or to start with the
implementation of a smaller data warehouse and data marts. Regarding the
efficiency and the response speed of the whole system, it is more favorable
to propose a smaller number of fact tables corresponding to a data mart or a
smaller data warehouse. The current trend in building data warehouses is to
design a smaller compact data warehouse and several satellite data marts.
Problem of data explosion
The basic problem is that the size of a database is not equal to
the amount of information stored in it.
A database explosion is primarily due to high data sparsity and a high
number of derived members and aggregated dimensions in the consolidation
hierarchies. It typically occurs with designs that do not consider a higher
number of fact and dimension tables. The problem can be eliminated by
designing several smaller compact data marts; a back-of-the-envelope
illustration follows.
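The calculation below uses invented dimension sizes to show why sparsity and
consolidation hierarchies cause the explosion: the number of potential cells
is the product over all dimensions of the member counts (plus the derived
"all" member), while the number of real facts stays comparatively tiny.

  # Worked example with assumed dimension sizes (not measurements from any system).
  dimensions = {"time": 365, "product": 10_000, "store": 200, "customer": 50_000}

  potential_cells = 1
  for members in dimensions.values():
      potential_cells *= members + 1    # +1 for the derived/consolidated "all" member

  stored_facts = 2_000_000              # assumed number of real base-level facts
  print(f"potential cells: {potential_cells:.2e}")                  # ~3.7e+13
  print(f"data sparsity:   {stored_facts / potential_cells:.2e}")   # ~5.4e-08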
User-triggered vs. system-triggered processing
Most production system testing is the processing of individual transactions,
which are driven by some input from the users (application forms, servicing
requests). There are very few test cycles which cover the system-triggered
scenarios (such as billing or valuation). In a data warehouse, most of the
testing is system-triggered, for example the scripts for ETT, the view
refresh scripts, etc. (BiPM, 2009).
Testing the time shift (up-to-dateness)
It is necessary to pay attention to a possible time shift in processing the
data from the data warehouse to the OLAP server.
Testing ETT processes
The testing of these processes (sometimes called the data pump) represents a
very important step in the process of building a data warehouse; it is the
most complex and demanding part of the effort. The testing itself comprises
the testing of each script, program and module, module integrity, as well as
the consistency of the data being transformed into the data warehouse
(Elias & Stremy, 2008).
The testing stage of the ETT (Extraction, Transformation and Transport)
processes also involves the following activities:
--Testing for correctness of aggregation and summation of data
In this testing stage, it is necessary to check the correctness of forming
the data aggregations. After the reverse transformation, all the aggregated
data filled into the data warehouse should regain their original values (a
sketch of this check follows the list).
--Check for reversibility of data from the data warehouse into OLTP systems
This kind of test is closely connected with the previous activity. It is a
control process of reverse transformation of the aggregated and summarized
data into the operational databases.
--Check of distributed processing
With distributed operational systems, it is necessary to test the data for
recency, i.e. whether the replica in question will be transformed into the
warehouse in the correct time horizon.
--Testing data types and metric units
This testing must be implemented at the design stage of the ETT process.
--Testing of relationships
In the transformation of heterogeneous files into a data warehouse, it is
necessary to check the correctness of the design of relationships which were
not carried over from the sources (e.g. in DBF and XLS file transformations).
--Revision of converted data
Once stored in the system, the data must be checked for multidimensionality,
such as consistency of data types, the number of excluded rows and the
reasons for their exclusion, as well as any logical errors in the process
which might result in logical inconsistency of the data.
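The aggregation-correctness and reversibility checks described above can be
sketched as a comparison between totals computed in the operational (OLTP)
source and the aggregates loaded into the warehouse. The database files,
table names (orders, fact_sales) and columns below are assumptions for
illustration only.

  # Hypothetical aggregation check: the daily total in the OLTP source must match
  # the corresponding aggregate loaded into the warehouse.
  import sqlite3

  oltp = sqlite3.connect("oltp.db")            # assumed operational database
  dwh = sqlite3.connect("warehouse.db")        # assumed warehouse database

  source_total = oltp.execute(
      "SELECT SUM(amount) FROM orders WHERE order_date = '2009-01-31'"
  ).fetchone()[0]

  warehouse_total = dwh.execute(
      "SELECT SUM(sales_amount) FROM fact_sales WHERE date_key = 20090131"
  ).fetchone()[0]

  assert abs(source_total - warehouse_total) < 0.01, (
      f"aggregation mismatch: source={source_total}, warehouse={warehouse_total}")
  print("daily aggregate matches the source system")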
Testing the on-line time response
The testing should take place by issuing predefined queries that simulate
the anticipated daily load on the data warehouse while measuring the time
response.
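A minimal sketch of such a response-time test is shown below; the queries
and the response limit are assumptions standing in for the predefined
demands mentioned above.

  # Hypothetical on-line response-time test: time each predefined query against an
  # agreed limit. Queries and the limit are assumptions, not part of the proposal.
  import sqlite3, time

  RESPONSE_LIMIT_SECONDS = 5.0                 # assumed service level
  queries = [
      "SELECT date_key, SUM(sales_amount) FROM fact_sales GROUP BY date_key",
      "SELECT store_key, COUNT(*) FROM fact_sales GROUP BY store_key",
  ]

  conn = sqlite3.connect("warehouse.db")       # assumed warehouse database
  for sql in queries:
      start = time.monotonic()
      conn.execute(sql).fetchall()
      elapsed = time.monotonic() - start
      status = "OK" if elapsed <= RESPONSE_LIMIT_SECONDS else "TOO SLOW"
      print(f"{elapsed:6.2f} s  {status}  {sql[:60]}")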
Testing the data backup and recovery
Prior to starting a warehouse, the data backup and recovery strategy must be
configured. These tests must be executed with a volume of data equal to that
of the real system, in order to examine the possible effects during a full
data recovery.
Possible number of testing scenarios
If a transaction system has a hundred different scenarios, the number of
valid and possible combinations of those scenarios is not unlimited. In the
case of a data warehouse, however, the number of combinations that could
possibly be tested is virtually unlimited, because the core objective of a
data warehouse is to allow all possible views of the data. In other words,
'there is no possibility to fully test a data warehouse' (BiPM, 2009).
Metadata administration
It is necessary to correctly define the metadata at the beginning of
building a data warehouse and to administer them over the course of running
the data warehouse. Metadata enable us to identify exactly the errors which
may occur while the warehouse is used in practice, as well as to determine
when individual processes such as ETT can be started. Prior to testing, all
the related documents, approved test specifications and testing procedures
must be available.
Testing time consistency
One of the most important factors in a data warehouse is time consistency,
which allows the user to obtain correct responses to their SQL statements.
It is therefore necessary to decide on the right granularity, which is
closely related to time consistency.
3. CONCLUSION
The testing phase, as one of the stages of the DW development lifecycle, is
very important, since the cost of eliminating a potential error or defect in
a running data warehouse is much higher. The aim of this article has been to
suggest basic data warehouse testing activities as the final part of a data
warehouse testing methodology.
This contribution, as part of project No. 1/4078/07, was supported by VEGA
(Vedecka a Edukacna Grantova Agentura), the grant agency of the Slovak
Republic Ministry of Education.
4. REFERENCES
Cooper, R. & Arbuckle, S. (2002). How to Thoroughly Test a Data Warehouse.
Proceedings of the 10th International Conference on Software Testing
(STAREAST), May 13-17, Orlando, Florida, USA
Elias, A. & Stremy, M. (2008). Usage of communication dispatcher for virtual
devices. Proceedings of the 8th International Scientific-Technical
Conference, Kouty, Czech Republic, ISBN 978-80-7395-077-4
Inmon, W. H. (2002). Building the Data Warehouse. John Wiley and Sons,
ISBN 0-471-08130-2, London
Kapfhammer, G. M. (2004). The Computer Science Handbook. Available from:
http://www.mendeley.com Accessed: 2009-02-10
*** BiPM (2009). Building Intelligent and Performing Enterprises. Available
from: http://www.bipminstitute.com/data-warehouse Accessed: 2009-03-18