The base testing activities proposal.
Tanuska, Pavol; Moravcik, Oliver; Vazan, Pavel et al.
1. INTRODUCTION
Definition 1 states our understanding of a test suite $T$ that can be
used to assess the quality of an application under test. We use $\Delta_f$
to denote the externally visible state of the application under test.
Informally, $\Delta_f$ can be viewed as a set of pairs where the first value
of each pair is a variable name and the second value is a value for that
variable. Eq. 1 formally defines $\Delta_f$, the externally visible state
after the execution of $T_f$. In this equation, we use $var_\Delta$ and
$val_\Delta$ to denote a variable name and a variable value in an external
test state, respectively. Furthermore, we use $U_\Delta$ and $V_\Delta$ to
denote, respectively, the universe of valid variable names and variable
values for externally visible test states. Finally, we require
$value(var_\Delta, f)$ to be a function that maps a variable name to the
value of that variable in a specified $\Delta_f$. An external test state
$\Delta_f$ would contain the global variable values within the program under
test, and any variable values that are made accessible by live object
instances (Kapfhammer, 2004).
Definition 1. A test suite $T$ is a triple $\langle \Delta_0, \langle T_1,
\ldots, T_e \rangle, \langle \Delta_1, \ldots, \Delta_e \rangle \rangle$,
consisting of an initial external test state $\Delta_0$, a test case
sequence $\langle T_1, \ldots, T_e \rangle$ for state $\Delta_0$, and
expected external test states $\langle \Delta_1, \ldots, \Delta_e \rangle$,
where $\Delta_f = T_f(\Delta_{f-1})$ for $f = 1, \ldots, e$ (Kapfhammer, 2004).
$\Delta_f = \{ (var_\Delta, val_\Delta) \in U_\Delta \times V_\Delta \mid value(var_\Delta, f) = val_\Delta \}$ (1)
Definition 2 notes that a specific test case $T_f \in \langle T_1, \ldots,
T_e \rangle$ can be viewed as a sequence of test operations that cause the
application under test to enter states that are only visible to $T_f$. We
use $\delta_h$ to denote the internal test state that is created after the
execution of $T_f$'s test case operation $o_h$. Intuitively, $\delta_h$ can
also be viewed as a set of pairs where the first value is a variable name
and the second value is a value for that variable. Eq. 2 formally defines
$\delta_h$ in a similar fashion to the definition of $\Delta_f$ in Eq. 1. An
internal test state $\delta_h$ would contain the expected and actual values
for the test operation $o_h$, the return value from the program method under
test, and the values of any temporary testing variables.
Definition 2. A test case $T_f \in \langle T_1, \ldots, T_e \rangle$ is a
triple $\langle \delta_0, \langle o_1, \ldots, o_g \rangle, \langle \delta_1,
\ldots, \delta_g \rangle \rangle$, consisting of an initial internal test
state $\delta_0$, a test operation sequence $\langle o_1, \ldots, o_g \rangle$
for state $\delta_0$, and expected internal test states $\langle \delta_1,
\ldots, \delta_g \rangle$, where $\delta_h = o_h(\delta_{h-1})$ for
$h = 1, \ldots, g$ (Kapfhammer, 2004).
$\delta_h = \{ (var_\delta, val_\delta) \in U_\delta \times V_\delta \mid value(var_\delta, h) = val_\delta \}$ (2)
In Definition 3, we describe a restricted type of test suite where each test
case returns the application under test to the initial state, $\Delta_0$,
before it terminates. If a test suite $T$ is not independent, we do not
place any restrictions upon the $\langle \Delta_1, \ldots, \Delta_e \rangle$
produced by the test cases and we simply refer to it as a non-restricted
test suite.
Definition 3. A test suite $T$ is independent if and only if for all
$\gamma \in \{1, \ldots, e\}$, $\Delta_\gamma = \Delta_0$ (Kapfhammer, 2004).
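The formalism above lends itself to a compact executable illustration. The
following is a minimal sketch, not taken from the cited source, in which both
external and internal test states are represented as Python dictionaries; the
function names and the counter example are purely hypothetical.

  # Illustrative sketch only (not from Kapfhammer, 2004): test states are modelled
  # as plain dictionaries mapping variable names to values, for both the external
  # states Delta and the internal states delta.

  def run_test_case(delta_0, operations):
      """Apply the test operations o_1..o_g to the initial internal state delta_0
      and collect the resulting internal states delta_1..delta_g."""
      states, state = [], dict(delta_0)
      for op in operations:
          state = op(state)            # delta_h = o_h(delta_{h-1})
          states.append(dict(state))
      return states

  def is_independent(Delta_0, produced_states):
      """Definition 3: the suite is independent iff every produced external state
      Delta_gamma equals the initial external state Delta_0."""
      return all(s == Delta_0 for s in produced_states)

  # Hypothetical usage: one operation increments a counter, the next resets it,
  # so this single-test-case suite leaves the state unchanged and is independent.
  initial = {"counter": 0}
  ops = [lambda s: {**s, "counter": s["counter"] + 1},
         lambda s: {**s, "counter": 0}]
  final_states = run_test_case(initial, ops)
  print(final_states)                                  # [{'counter': 1}, {'counter': 0}]
  print(is_independent(initial, [final_states[-1]]))   # True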
2. THE BASE TESTING ACTIVITIES PROPOSAL
There are many testing methods and approaches available, yet the individual
standards do not go into detail, thus leaving room for various
misinterpretations (Cooper & Arbuckle, 2002). As for data warehouse design,
it should be stated that the existing standards and guidelines do not cover
the activities relevant for building a data warehouse (Inmon, 2002). There
is no particular procedure or activity related to the process of
multidimensional database design, the ETT process, or the optimization of
scripts for OLAP reports. The following part therefore deals with the
proposal of basic data warehouse testing activities as the final part of a
data warehouse testing methodology. The testing activities that must be
implemented in the process of database testing can be split into four
logical units concerning multidimensional database testing, data pump
testing, metadata, and OLAP. Splitting them further, we arrive at the
following base activities:
Revision of the multidimensional database schema
To achieve the best possible efficiency of SQL statements, it is necessary
to follow these rules (a sketch of an automated check follows the list):
--make sure that the columns of each hierarchy level (fact table) are NOT
NULL and that hierarchical integrity is maintained,
--columns of a hierarchy level cannot be associated with more than one
dimension,
--the structure of the columns in the dimension table should be in
denormalized form,
--hierarchy levels cannot be mutually interconnected; recursion must not
occur.
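As an illustration of the first two rules, the following hedged sketch shows
how such a revision could be automated; the table name dim_time, the columns
day_id and month_id, and the SQLite database file are assumptions, not part
of the original proposal.

  # Hypothetical revision check for hierarchy-level columns (assumed schema:
  # dimension table dim_time with hierarchy levels day_id -> month_id).
  import sqlite3

  checks = {
      # hierarchy-level columns must be populated (NOT NULL)
      "null_level_columns":
          "SELECT COUNT(*) FROM dim_time WHERE day_id IS NULL OR month_id IS NULL",
      # hierarchical integrity: every day must roll up to exactly one month
      "days_with_multiple_parents":
          "SELECT COUNT(*) FROM (SELECT day_id FROM dim_time "
          "GROUP BY day_id HAVING COUNT(DISTINCT month_id) > 1)",
  }

  conn = sqlite3.connect("warehouse.db")       # assumed database file
  for name, sql in checks.items():
      (violations,) = conn.execute(sql).fetchone()
      print(f"{name}: {'OK' if violations == 0 else str(violations) + ' violations'}")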
Testing the batch processing response
The efficiency of batch processing should be tested by simulating the batch
run on a system loaded with real data, on the real infrastructure, and
operating concurrently with the real system.
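A minimal sketch of such a batch response test is given below, assuming the
batch load can be started from a shell script; the script name and the
acceptable batch window are hypothetical.

  # Hypothetical batch-response test: run the (assumed) ETT batch script against a
  # realistically loaded system and check that it finishes within the agreed window.
  import subprocess, time

  BATCH_WINDOW_SECONDS = 3600                            # assumed acceptable window
  start = time.monotonic()
  subprocess.run(["./run_nightly_ett.sh"], check=True)   # hypothetical batch script
  elapsed = time.monotonic() - start
  verdict = "within" if elapsed <= BATCH_WINDOW_SECONDS else "over"
  print(f"batch finished in {elapsed:.0f} s ({verdict} the window)")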
Optimization of the number of fact tables
In designing a multidimensional database, it is very important to decide
whether to implement a data warehouse as a whole or to start with the
implementation of a smaller data warehouse and data marts. Regarding the
efficiency and the response speed of the whole system, it is more favorable
to propose a smaller number of fact tables corresponding to a data mart or a
smaller data warehouse. The current trend in building data warehouses is to
design a smaller compact data warehouse and several satellite data marts.
Problem of data explosion
The basic problem is that the size of a database is not equal to
the amount of information stored in it.
A database explosion is primarily due to high data sparsity and a high
number of derived members and aggregated dimensions in the consolidation
hierarchies. It typically occurs with designs that do not consider a higher
number of fact and dimension tables. The problem can be eliminated by
designing several smaller compact data marts; a back-of-the-envelope
illustration follows.
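The calculation below uses invented dimension sizes to show why sparsity and
consolidation hierarchies cause the explosion: the number of potential cells
is the product over all dimensions of the member counts (plus the derived
"all" member), while the number of real facts stays comparatively tiny.

  # Worked example with assumed dimension sizes (not measurements from any system).
  dimensions = {"time": 365, "product": 10_000, "store": 200, "customer": 50_000}

  potential_cells = 1
  for members in dimensions.values():
      potential_cells *= members + 1    # +1 for the derived/consolidated "all" member

  stored_facts = 2_000_000              # assumed number of real base-level facts
  print(f"potential cells: {potential_cells:.2e}")                  # ~3.7e+13
  print(f"data sparsity:   {stored_facts / potential_cells:.2e}")   # ~5.4e-08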
User-triggered vs. system-triggered processing
Most production system testing is the processing of individual transactions,
which are driven by some input from the users (application forms, servicing
requests). There are very few test cycles which cover the system-triggered
scenarios (such as billing or valuation). In a data warehouse, most of the
testing is system-triggered, for example the scripts for ETT, the view
refresh scripts, etc. (BiPM, 2009).
Testing the time shift (up-to-dateness)
It is necessary to pay attention to a possible time shift in processing the
data from the data warehouse to the OLAP server.
Testing ETT processes
The testing of these processes (sometimes called the data pump) represents a
very important step in the process of building a data warehouse; it is the
most complex and demanding part of the effort. The testing itself comprises
the testing of each script, program and module, module integrity, as well as
the consistency of the data being transformed into the data warehouse
(Elias & Stremy, 2008).
The testing stage of the ETT (Extraction, Transformation and Transport)
processes also involves the following activities:
--Testing for correctness of aggregation and summation of data
In this testing stage, it is necessary to check the correctness of forming
the data aggregations. After the reverse transformation, all the aggregated
data filled into the data warehouse should regain their original values (a
sketch of this check follows the list).
--Check for reversibility of data from the data warehouse into OLTP systems
This kind of test is closely connected with the previous activity. It is a
control process of reverse transformation of the aggregated and summarized
data into the operational databases.
--Check of distributed processing
With distributed operational systems, it is necessary to test the data for
recency, i.e. whether the replica in question will be transformed into the
warehouse in the correct time horizon.
--Testing data types and metric units
This testing must be implemented at the design stage of the ETT process.
--Testing of relationships
In the transformation of heterogeneous files into a data warehouse, it is
necessary to check the correctness of the design of relationships which were
not carried over from the sources (e.g. in DBF and XLS file transformations).
--Revision of converted data
Once stored in the system, the data must be checked for multidimensionality,
such as consistency of data types, the number of excluded rows and the
reasons for their exclusion, as well as any logical errors in the process
which might result in logical inconsistency of the data.
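The aggregation-correctness and reversibility checks described above can be
sketched as a comparison between totals computed in the operational (OLTP)
source and the aggregates loaded into the warehouse. The database files,
table names (orders, fact_sales) and columns below are assumptions for
illustration only.

  # Hypothetical aggregation check: the daily total in the OLTP source must match
  # the corresponding aggregate loaded into the warehouse.
  import sqlite3

  oltp = sqlite3.connect("oltp.db")            # assumed operational database
  dwh = sqlite3.connect("warehouse.db")        # assumed warehouse database

  source_total = oltp.execute(
      "SELECT SUM(amount) FROM orders WHERE order_date = '2009-01-31'"
  ).fetchone()[0]

  warehouse_total = dwh.execute(
      "SELECT SUM(sales_amount) FROM fact_sales WHERE date_key = 20090131"
  ).fetchone()[0]

  assert abs(source_total - warehouse_total) < 0.01, (
      f"aggregation mismatch: source={source_total}, warehouse={warehouse_total}")
  print("daily aggregate matches the source system")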
Testing the on-line time response
The testing should take place by issuing predefined queries that simulate
the anticipated daily load on the data warehouse while measuring the time
response.
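A minimal sketch of such a response-time test is shown below; the queries
and the response limit are assumptions standing in for the predefined
demands mentioned above.

  # Hypothetical on-line response-time test: time each predefined query against an
  # agreed limit. Queries and the limit are assumptions, not part of the proposal.
  import sqlite3, time

  RESPONSE_LIMIT_SECONDS = 5.0                 # assumed service level
  queries = [
      "SELECT date_key, SUM(sales_amount) FROM fact_sales GROUP BY date_key",
      "SELECT store_key, COUNT(*) FROM fact_sales GROUP BY store_key",
  ]

  conn = sqlite3.connect("warehouse.db")       # assumed warehouse database
  for sql in queries:
      start = time.monotonic()
      conn.execute(sql).fetchall()
      elapsed = time.monotonic() - start
      status = "OK" if elapsed <= RESPONSE_LIMIT_SECONDS else "TOO SLOW"
      print(f"{elapsed:6.2f} s  {status}  {sql[:60]}")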
Testing the data backup and recovery
Prior to starting a warehouse, the data backup and recovery strategy must be
configured. These tests must be executed with a volume of data equal to that
of the real system, in order to examine the possible effects during a full
data recovery.
Possible number of testing scenarios
If a transaction system has a hundred different scenarios, the number of
valid and possible combinations of those scenarios is not unlimited. In the
case of a data warehouse, however, the number of combinations that could
possibly be tested is virtually unlimited, because the core objective of a
data warehouse is to allow all possible views of the data. In other words,
'there is no possibility to fully test a data warehouse' (BiPM, 2009).
Metadata administration
It is necessary to correctly define the metadata at the beginning of
building a data warehouse and to administer them over the course of running
the data warehouse. Metadata enable us to identify exactly the errors which
may occur while the warehouse is used in practice, as well as to determine
when individual processes such as ETT can be started. Prior to testing, all
the related documents, approved test specifications and testing procedures
must be available.
Testing time consistency
One of the most important factors in a data warehouse is time consistency,
which allows the user to obtain correct responses to their SQL statements.
It is therefore necessary to decide on the right granularity, which is
closely related to time consistency.
3. CONCLUSION
The testing phase, as one of the stages of the DW development lifecycle, is
very important, since the cost of eliminating a potential error or defect in
a running data warehouse is much higher. The aim of this article has been to
suggest basic data warehouse testing activities as the final part of a data
warehouse testing methodology.
This contribution, as part of project No. 1/4078/07, was supported by VEGA
(Vedecka a Edukacna Grantova Agentura), the grant agency of the Slovak
Republic Ministry of Education.
4. REFERENCES
Cooper, R. & Arbuckle, S. (2002). How to Thoroughly Test a Data Warehouse.
Proceedings of the 10th International Conference on Software Testing
(STAREAST), May 13-17, Orlando, Florida, USA
Elias, A. & Stremy, M. (2008). Usage of communication dispatcher for virtual
devices. Proceedings of the 8th International Scientific-Technical
Conference, Kouty, Czech Republic, ISBN 978-80-7395-077-4
Inmon, W. H. (2002). Building the Data Warehouse. John Wiley and Sons,
ISBN 0-471-08130-2, London
Kapfhammer, G. M. (2004). The Computer Science Handbook. Available from:
http://www.mendeley.com Accessed: 2009-02-10
*** BiPM (2009). Building Intelligent and Performing Enterprises. Available
from: http://www.bipminstitute.com/data-warehouse Accessed: 2009-03-18