Possible particular abstract approach to validation.
Masar, Alojz; Tanuska, Pavol; Masarova, Renata et al.
1. INTRODUCTION
Validation can be looked at from many points of view.
Almost every computer program, computer system or software framework
deals with validation in its own manner. The reason why validation is
investigated determines the meaning of the words used to define it.
Words such as sensibility, reasonability, correctness and accuracy are
usually used to define validation, but there is no commonly accepted
definition. Generally, validation is a process of
ensuring that the system operates on clean, correct and useful data and
that its operations are correct and useful too. Validation rules are implemented
in the system to check the correctness and meaningfulness of data that
are input to the system, or the viability of operations. The rules may be
implemented through the automated facilities of a data dictionary, or by the
inclusion of explicit (application program) validation logic.
A suitable formalism for UML models was introduced by the OMG. The Object
Constraint Language (OCL) is a formal language used to describe
expressions (OMG, 2006). It can be used to describe invariants, pre-
and post-conditions, guards and constraints on UML models. Another attempt
has been made by The Apache Software Foundation. Commons Validator is a
framework which provides a configurable (typically XML) validation engine
and reusable simple validation methods (ASF, 2006). The Java Community
Process standardises data validation for Java in JSR-303 (JCP, 2009).
The proposed final draft has been published. A reference implementation of
this standard is Hibernate Validator 4.0.0 (Hibernate, 2009).
These efforts solve the validation problem only partially and usually
focus on data validation. There is no common background which
would permit their orchestration. This paper deals with validation in
systems based on computer programs. The following sections set out a possible
approach to exploring validation in a particular but sufficiently
abstract manner.
2. EXAMINATION
Validation is an automatic computer check to ensure that the data
entered or the operations executed are sensible and reasonable
according to the rules defined in the system and the data held in
the system or reachable by the system.
Conversely, validation does not check the accuracy of data or
operations against rules defined outside the system or against
data unreachable by the system. Validation can be designed into a
system in several different ways, e.g. in user interface code,
application code, or database constraints.
2.1 Data validation
First, a few words about data types are necessary. It is
assumed in this paper that data have a type. In a broad sense, a data
type defines a set of values and the allowable operations on those
values. A sufficient overview of common types is published on Wikipedia.
A more detailed description is given in (Wirth, 1985).
Basic data validation is based on rules concerned with the
features of the particular data item itself. There are two basic types of data
validation.
* Range checking (in a broad sense)--e.g. less than 3, in <2.5),
isUpper. A special case of range checking is data type validation--e.g.
isNumber, isChar, isTypeOf.
* Presence checking--e.g. obligatory (isRequired), existence of
value (notNull)
Simply put, the evaluation of these rules is based only on a check of
the value of the particular data item; no other pieces of data are
needed.
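The basic rules listed above can be sketched as predicates that receive nothing but the value under test. This is only a minimal illustration in Python: the helper names `less_than` and `in_half_open` and the interval bounds used with them are assumptions made for the example, while `is_upper`, `is_number` and `not_null` mirror the rule names mentioned in the text.

```python
from typing import Any, Callable

# A basic rule inspects only the value being checked, nothing else.
Rule = Callable[[Any], bool]


def less_than(bound: float) -> Rule:
    """Range check: the value must be strictly below `bound`."""
    return lambda value: value < bound


def in_half_open(low: float, high: float) -> Rule:
    """Range check: the value must lie in the half-open interval [low, high)."""
    return lambda value: low <= value < high


def is_upper(value: Any) -> bool:
    """Range check in a broad sense: a string written in upper case."""
    return isinstance(value, str) and value.isupper()


def is_number(value: Any) -> bool:
    """Data type check; bool is excluded because it subclasses int in Python."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def not_null(value: Any) -> bool:
    """Presence check: some value exists."""
    return value is not None
```

Each predicate can be evaluated in isolation, which is exactly what separates basic validation from the complex validation discussed next.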
Complex data validation is based on more complex rules. Another
piece of data is required to evaluate these rules. Complex data
validation may also be based on the result of another validation. This
piece of data has to be obtainable at the moment of evaluation. There
are two ways in which the additional piece of data can be reached.
* It is derived from data already present in the system--the use
of an unmodified value (or values) of other data is a special case of
this.
* It is obtained (and possibly derived) from outside the system--the
obtained data have to be temporarily present in the system (during
evaluation). This sounds like the preceding point, but in fact interaction
outside the system is a fundamentally different activity from processing
internal data.
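The two ways of reaching the additional piece of data might look as follows. This is a hedged sketch, not part of the paper: the record fields (`start`, `end`, `customer`, `amount`) and the `fetch_limit` callback standing in for an external system are hypothetical.

```python
from typing import Any, Callable, Mapping


def end_after_start(record: Mapping[str, Any]) -> bool:
    """Complex rule using another value already present in the system."""
    return record["end"] > record["start"]


def within_credit_limit(
    record: Mapping[str, Any],
    fetch_limit: Callable[[str], int],
) -> bool:
    """Complex rule needing data obtained from outside the system.

    `fetch_limit` is a hypothetical stand-in for the external interaction;
    the fetched value exists in the system only while the rule is evaluated.
    """
    limit = fetch_limit(record["customer"])
    return record["amount"] <= limit
```

Passing the external lookup in as a parameter keeps the interaction with the outside visible in the rule's signature, which reflects the distinction drawn in the second bullet.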
2.2 Operation validation
There are two possible approaches. The first approach disallows
adding or removing operations. An operation, in contrast to data, is a
fixed part of the system. The system changes neither the set of its
functions (represented by its operations) nor their nature. Therefore
the validation of a particular operation is reduced to the detection of
the operation's viability. The viability of an operation means whether the
system could execute the particular operation (at the given moment).
It does not express whether the operation is runnable or not. The viability
of an operation depends on the current set of data values at the moment of
validation. Systems of this type are called static.
The second approach allows the set of
operations to be modified. An operation can be added to the system or
removed from it. In spite of this, the deterministic nature of operations
guarantees that the function of a particular operation cannot be
changed (operations are added or removed in their entirety). The viability
of an operation depends on the current set of data values at the moment
of validation and on the current set of operations of the system. This
approach allows the presence of a particular operation to be examined.
Systems of this type are called dynamic.
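The difference between static and dynamic systems can be illustrated with a small sketch. It assumes, purely for illustration, that each operation is registered under a name together with a guard predicate over the current data values; the class and method names are hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, Mapping

Data = Mapping[str, object]
Guard = Callable[[Data], bool]


class StaticSystem:
    """The set of operations is fixed when the system is created."""

    def __init__(self, guards: Dict[str, Guard]) -> None:
        self._guards = dict(guards)  # private copy, never modified here

    def is_viable(self, name: str, data: Data) -> bool:
        # Viability depends only on the current data values.
        return self._guards[name](data)


class DynamicSystem(StaticSystem):
    """Operations may be added or removed, always as a whole."""

    def add_operation(self, name: str, guard: Guard) -> None:
        self._guards[name] = guard

    def remove_operation(self, name: str) -> None:
        del self._guards[name]

    def is_viable(self, name: str, data: Data) -> bool:
        # Viability also depends on the current set of operations.
        guard = self._guards.get(name)
        return guard is not None and guard(data)
```

In the dynamic variant an absent operation is simply not viable, so the presence of an operation becomes something a check can examine.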
3. DEFINITION
3.1 Validation rule
A validation rule determines the validity condition of data (and
their values) or operations in the system. The evaluation of a condition is
called a check. A check is triggered by some event in the system or by an
interaction of the system with something outside it (e.g. another system or a
user), and determines whether the data or the operations (or
both) of the system are valid.
Definition: Let $S$ be the set of all possible data and their values in the
system or reachable by the system, together with all operations of the system. Let
$D$ be a subset of $S$. Then a function $v: D \to \{0,1\}$ is a
primitive validation function.
The primitive validation function represents a simple condition at a
particular moment. The evaluation of this function represents the check.
The state of the system is determined by the data values and by the set of
operations at that moment. It is evident that the state can be
represented by $D$ at some moment.
Let $S'$ be a subset of $S$. The set $S'$ consists of those
elements of $S$ which are present in the system or reachable by the
system at a concrete moment. A reaction of the system to an event can
cause a change of the system state and so a change of the set
$S'$, which may therefore differ after the event has been triggered. For this
reason the domain (in the mathematical sense) of some primitive
validation function need not be a subset of the current state, and the
function becomes partially or wholly undefined for this state (for some or
all elements of the set $S'$). To avoid this awkward situation we extend
the codomain of the primitive validation function to the set $\{0, 1,
\varepsilon\}$ and define the validation function for each element of the
complement of $S' \cap D$ in $S'$. These elements are
mapped to the value $\varepsilon$.
Definition: Let $S$ be the set of all possible data and their values in the
system or reachable by the system, together with all operations of the system. Let
$D$ and $S'$ be subsets of $S$. Let $v: D \to \{0,1\}$ be a primitive validation
function. Let ${}^{c}S = S' \setminus (S' \cap D)$. The function
$\bar{v}: S' \to \{0, 1, \varepsilon\}$ defined by $\bar{v}(x) = v(x)$ for all
$x \in S' \cap D$ and $\bar{v}(y) = \varepsilon$ for all $y \in {}^{c}S$ is the
validation function in $S'$.
There is a second motivation for bringing $\varepsilon$ into play. The
value $\varepsilon$ represents an exception in real systems. A well designed system
ought to deal with exceptions, and the value $\varepsilon$ simply
reflects this fact.
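The extension of a primitive validation function to the current state can be sketched in a few lines. The sketch assumes that elements of the sets are hashable Python values (here, pairs of identifier and value) and that the sets are finite; the names `Verdict` and `extend` are hypothetical.

```python
from enum import Enum
from typing import Callable, Hashable, Set


class Verdict(Enum):
    INVALID = 0
    VALID = 1
    EPSILON = "epsilon"  # the primitive function is undefined for the element


def extend(
    v: Callable[[Hashable], bool],
    domain: Set[Hashable],
    current: Set[Hashable],
) -> Callable[[Hashable], Verdict]:
    """Build the validation function on `current` (S') from `v` defined on `domain` (D)."""

    def v_bar(x: Hashable) -> Verdict:
        if x not in current:
            # The extended function is defined on S' only.
            raise KeyError(f"{x!r} is not part of the current state")
        if x in domain:
            # On the intersection of S' and D the result agrees with v.
            return Verdict.VALID if v(x) else Verdict.INVALID
        # Elements of S' outside D are mapped to epsilon.
        return Verdict.EPSILON

    return v_bar
```

For example, with `domain = {("age", 17), ("age", 30)}`, `current = {("age", 30), ("name", "Eva")}` and a rule `lambda x: x[1] >= 18`, the element `("age", 30)` is valid, while `("name", "Eva")` lies outside the domain of the rule and is mapped to epsilon.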
3.2 State
As mentioned earlier, the state consists of data, their values and
operations. We have silently assumed that the system is "alive" and
can actively deal with events coming to it or arising in it as a
consequence of its own activity. This assumption permits us to deal with the
dynamic behaviour of the system. The activity of the system can lead
to interaction outside the system or to a change of the system
itself (or both). The only way the system can be changed is
through a change of its state. However, the assumption of a live system is
not necessary. The state can be changed from outside the system. A system
can use data of another system, activate functions of another
system, or use operations prepared for it elsewhere (e.g. plugins). The
system operates only in its own internal space. Therefore it must have
some knowledge about external data or operations. There are two
approaches to this fact. The stricter one considers this knowledge an
integral part of the system and therefore does not allow
"temporary" states. Simply put, every state is a "regular"
state. The less strict one pays no attention to the temporary states
between an event and the reaction to it. Neither is better. Which approach ought
to be used depends on the specific conditions under which the system is examined.
One way or the other, the state of the system consists of the sets of
system data and their values, temporary data and their values, system
operations and temporary operations.
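The four components of the state named above can be written down as a simple record. This is an illustrative sketch only: the field names, the representation of operations by their names, and the `is_regular` helper (one possible reading of the stricter approach, under which no temporary parts exist) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet


@dataclass(frozen=True)
class State:
    """State of the system at one moment, split into its four parts."""

    system_data: Dict[str, Any] = field(default_factory=dict)
    temporary_data: Dict[str, Any] = field(default_factory=dict)
    system_operations: FrozenSet[str] = frozenset()
    temporary_operations: FrozenSet[str] = frozenset()

    def data(self) -> Dict[str, Any]:
        """All data values visible now; identifiers are assumed to be unique."""
        return {**self.system_data, **self.temporary_data}

    def operations(self) -> FrozenSet[str]:
        """All operations available now."""
        return self.system_operations | self.temporary_operations

    def is_regular(self) -> bool:
        """True when the state carries no temporary data or operations."""
        return not self.temporary_data and not self.temporary_operations
```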
3.3 Data
Data represent a piece of information in the system. Data must be
identifiable and characterised by a type in order to be usable by the
system. The type is a tuple T = [M, O], where M is the set of allowed values
and O is the set of operations on these values. Each type must be
recognizable by the system. The data is a tuple [identifier, T]
where the identifier is unique in the system and recognizable by the
system. Finally, the data value is a tuple [data, x] where x is an
element of the set M. This notation is somewhat cumbersome, and the
types used are usually well known. Therefore we sometimes use the notation
identifier:T for data and identifier:T=x for its value. Where no
confusion can arise we use only identifier for data and
identifier = x for its value.
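The three tuples (type, data, data value) translate directly into small record types. In this sketch the set M is represented by a membership predicate, since M may be infinite, and O by a set of operation names; both choices, as well as the class and field names, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet


@dataclass(frozen=True)
class Type:
    """T = [M, O]: allowed values and operations on them."""

    name: str
    allows: Callable[[Any], bool]  # membership test standing in for the set M
    operations: FrozenSet[str]     # names of the operations in O


@dataclass(frozen=True)
class Data:
    """[identifier, T]: the identifier is assumed unique in the system."""

    identifier: str
    type: Type


@dataclass(frozen=True)
class DataValue:
    """[data, x] where x must be an element of M."""

    data: Data
    value: Any

    def __post_init__(self) -> None:
        if not self.data.type.allows(self.value):
            raise ValueError(
                f"{self.value!r} is not an allowed value of type "
                f"{self.data.type.name}"
            )
```

With a type `NATURAL` whose predicate accepts non-negative integers, `DataValue(Data("age", NATURAL), 30)` corresponds to the shorthand age:NATURAL=30, whereas a value of -1 is rejected when the tuple is constructed.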
It is not important how the system identifies the types, data or
values. This assumption respects the fact that a running software program
(in binary form) does not address data in the same way as a programmer does
in the source form of the same program. Indeed, the programmer can
create the program without this knowledge. This definition deals with
primitive types, data and values from the system's point of view.
This means that the system knows them and they are elementary.
4. CONCLUSION
The approach outlined here allows the validation problem to be mapped to
well known results of set theory and algebra. On the one hand, the degree of
abstraction enables us to direct our attention to the principal
characteristics of the system; on the other hand, it provides
mechanisms to deal with details. The verbal mapping of this formalism to the
(more or less) vague terms of software systems is a very important part of
this approach. It enables the interpretation of results in a more convenient
form and simplifies practical use.
We have presented some fundamentals, but have left many questions
unanswered. Operations and events of the system have not been treated at
all. Defining these terms, constructing more complicated validation
functions and values, investigating the dynamics of the system and the
interaction of checks and operations, and mapping the results to real
problems could be tasks for further work.
5. REFERENCES
ASF (2006), http://commons.apache.org/validator/ -- Commons Validator
version 1.3.0, Accessed: 2006-06-07
Hibernate (2009), http://www.hibernate.org/hib_docs/validator/ --
Hibernate Validator 4.0.0 Beta1, Accessed: 2009-05-05
JCP (2009), http://jcp.org/en/jsr/detail?id=303 -- JSR 303: Bean
Validation, Accessed: 2006-04-12
OMG (2006), http://www.omg.org/docs/formal/06-05-01.pdf -- Object
Constraint Language Version 2.0, Accessed: 2006-06-07
Wirth, N. (1985), Algorithms and Data Structures, Prentice Hall,
ISBN: 978-0130220059