Modeling a Semantic Web management system.
Carstoiu, Dorin; Cernian, Alexandra; Spanoae, Gavrila et al.
1. INTRODUCTION
The Web was designed as an information space, which should be
useful not only for human-human communication; machines should be able
to participate and help. One of the major obstacles to achieving this
goal has been the fact that most information on the Web is designed for
human consumption. Even if a company has one system for
finance management, another for human resources and perhaps a
separate one for process management, and even if the information is
derived from a database with well-defined meanings for its columns, the
structure of the data is not obvious to a machine. Even for an employer
it is difficult to find out where information is located. The Semantic
Web approach develops languages for expressing information in a form
that can be processed by a machine (Buraga, 1998).
Metadata (meta data, or sometimes meta information) is "data
about other data" of any sort. An item of metadata may describe an
individual content item or a collection of data including multiple
content items and hierarchical levels, for example a database schema. In
data processing, metadata is data that provides information about the
data managed within an application.
When structured into a hierarchical arrangement, metadata is more
properly called an ontology or schema. Both terms describe "what
exists" for some purpose. For instance, the arrangement of subject
headings in a library catalog serves not only as a guide to finding
books on a particular subject in the stacks, but also as a guide to what
subjects "exist" in the library's own ontology and how
more specialized topics are related to or derived from the more general
subject headings. Metadata is frequently stored in a central location
and used to help organizations standardize their data.
Therefore, if we define a system which combines the benefits of the
Resource Description Framework (RDF) with database metadata, we expect
better performance when retrieving information in real time.
The Semantic Web Management System (SWMS) we propose in this paper
enables the users to achieve their goals by transforming disparate,
fragmented data into viable information capable of answering key
questions. Data from across the web, enterprise, business units and
functional areas is reorganized and recombined to report key information
about information (for example, where the useful information is
located). It provides real-time analysis capabilities that increase
"the speed to insight". The SWMS architecture is also intended to be
more efficient to deploy than a conventional application.
2. RELATED WORK
We have found different interesting approaches for creating an
infrastructure for the Semantic Web in (Dokulil et al., 2008) and
(Vitvar et al., 2007). In (Dokulil et al., 2008), the authors propose
Trisolda, a Semantic Web infrastructure built around a semantic
repository with importing, querying and data processing interfaces. The
authors also propose the TriQ RDF query language, designed especially
for complex semantic querying. In (Vitvar et al., 2007), the authors
present the WSMX environment, concerned with semantic web services. Its
purpose is to allow mediation, invocation and inter-operation of the
services.
Our approach is more closely related to the Trisolda approach, since
both create a semantic repository where RDF files are stored. In
fact, storing RDF files in a database is not a novelty and there are
already many systems which allow this sort of operation. The novelty we
propose is to create a meta-model of the RDF files stored in the
repository. Thus, we generate a map of the information contained in
these files, which should improve performance when browsing through
the data. This meta-model will also be stored in the database.
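The idea of a meta-model acting as a map over the stored files can be illustrated with a minimal sketch. It assumes the triples have already been parsed into (subject, predicate, object) tuples; the function name `build_meta_model`, the file names and the `ex:` identifiers are hypothetical and are not part of the proposed system.

```python
from collections import defaultdict

def build_meta_model(files):
    """Map each predicate to the set of files in which it occurs.

    `files` maps a file name to a list of (subject, predicate, object) triples.
    """
    index = defaultdict(set)
    for name, triples in files.items():
        for _subject, predicate, _obj in triples:
            index[predicate].add(name)
    return index

# Hypothetical sample data standing in for two imported RDF files.
files = {
    "hr.rdf": [("ex:alice", "ex:worksIn", "ex:finance")],
    "finance.rdf": [
        ("ex:finance", "ex:hasBudget", "100"),
        ("ex:bob", "ex:worksIn", "ex:finance"),
    ],
}
meta_model = build_meta_model(files)

# The map answers "where is this kind of information?" without scanning the files.
files_with_works_in = sorted(meta_model["ex:worksIn"])
```

With such an index, a lookup touches only the meta-model rather than every stored file, which is the kind of browsing shortcut the meta-model is meant to provide.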
3. SYSTEM ARCHITECTURE
From an architectural point of view, the Semantic Web Management
System (SWMS) is composed of different modules working together.
The main goal of this concept is to integrate the Semantic Web with
the current needs of a research, development or even corporate process.
The architecture consists of three main layers:
* a Consumer Layer which interfaces the resources (the RDF source
files)
* a Middle Layer which holds certain resources. This layer is the
mediator which enables a common communication language between the
source of information and the SWMS
* the Warehousing Component which stores and prepares the
information for future browsing or updates.
In our architecture, the Warehousing Component and the Consumer
Layer are permanently connected, and certain jobs run on a schedule
to keep the process automated.
The system has the following components:
A. Importing Interface
This sub-component interprets the input files and, once the RDF
format is identified, creates the corresponding tables in the
database.
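The format-identification step can be sketched as follows. This is only an illustration under a simplifying assumption: the serialization is guessed from the file extension, whereas a real importer would usually inspect the file content as well. The names `RDF_FORMATS`, `detect_rdf_format` and `accept_for_import` are hypothetical.

```python
from pathlib import PurePath

# Extension-to-serialization table; an assumption made for this sketch.
RDF_FORMATS = {
    ".rdf": "RDF/XML",
    ".owl": "RDF/XML",
    ".ttl": "Turtle",
    ".nt": "N-Triples",
}

def detect_rdf_format(filename):
    """Return the serialization name for a file, or None if it is not recognized."""
    return RDF_FORMATS.get(PurePath(filename).suffix.lower())

def accept_for_import(filenames):
    """Split candidate files into (accepted, rejected) based on the detected format."""
    accepted, rejected = {}, []
    for name in filenames:
        fmt = detect_rdf_format(name)
        if fmt is None:
            rejected.append(name)
        else:
            accepted[name] = fmt
    return accepted, rejected

accepted, rejected = accept_for_import(["hr.rdf", "Budget.TTL", "notes.txt"])
```

Only the accepted files would proceed to table creation; rejected files could be reported back to the user.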
B. Data Integration
Here, the information is integrated into the Semantic Meta Model and
the Semantic data store. For this operation, the system uses a number
of views, such as the Data Access View from the Importing Interface
layer of the database, which, based on triggering events and data
composition, determines how to integrate the information into the
database.
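Event-driven integration of this kind can be illustrated with a small SQLite sketch. It is an assumption of this example that the triggering event is a row insert and that the meta-model simply counts predicate occurrences per file; the paper does not specify either. The table names `triples` and `meta_model` and the trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE triples (source_file TEXT, subject TEXT, predicate TEXT, object TEXT);
CREATE TABLE meta_model (
    source_file TEXT,
    predicate   TEXT,
    occurrences INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (source_file, predicate)
);
-- Triggering event: every newly stored triple updates the meta-model.
CREATE TRIGGER integrate_triple AFTER INSERT ON triples
BEGIN
    INSERT OR IGNORE INTO meta_model (source_file, predicate)
    VALUES (NEW.source_file, NEW.predicate);
    UPDATE meta_model SET occurrences = occurrences + 1
    WHERE source_file = NEW.source_file AND predicate = NEW.predicate;
END;
""")

# Hypothetical triples from one imported file.
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?, ?)",
    [
        ("hr.rdf", "ex:alice", "ex:worksIn", "ex:finance"),
        ("hr.rdf", "ex:bob", "ex:worksIn", "ex:sales"),
    ],
)
rows = conn.execute(
    "SELECT source_file, predicate, occurrences FROM meta_model"
).fetchall()
```

Keeping the integration logic in the database means the meta-model stays consistent with the data store regardless of which interface performs the insert.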
C. Data Access Layer
This sub-component is the communication bridge between all the
application interfaces and database layers. It is also the component
which helps the SWMS application communicate with the end user.
D. Query Interface
The Query Interface component works in collaboration with the
Application Interface Semantic Meta Model layer of the database in order
to facilitate the management of data. The user will be able to easily
launch complex queries based on SPARQL in order to satisfy their
reporting needs.
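To illustrate how a query over triples stored in relational tables might be answered, the sketch below translates a single triple pattern into SQL. It is not a SPARQL implementation; it only shows, under the assumption of a single `triples` table, the kind of lookup that one SPARQL triple pattern implies. The function `pattern_to_sql` and the sample data are hypothetical.

```python
import sqlite3

def pattern_to_sql(subject=None, predicate=None, obj=None):
    """Translate one triple pattern into SQL; None plays the role of a variable."""
    conditions, params = [], []
    for column, value in (("subject", subject), ("predicate", predicate), ("object", obj)):
        if value is not None:
            conditions.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT subject, predicate, object FROM triples"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql + " ORDER BY subject", params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:alice", "ex:worksIn", "ex:finance"),
    ("ex:bob", "ex:worksIn", "ex:sales"),
    ("ex:finance", "ex:hasBudget", "100"),
])

# Roughly corresponds to: SELECT ?s WHERE { ?s ex:worksIn ex:finance }
sql, params = pattern_to_sql(predicate="ex:worksIn", obj="ex:finance")
matches = conn.execute(sql, params).fetchall()
```

Bound terms become parameterized equality conditions, while unbound positions are left unconstrained and returned as results.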
E. Data Export Interface
When a set of RDF files is imported into the database, the Semantic
Meta Model and the Exporting Interface layers help create a third-party
RDF file that combines all the information contained in the ontologies
from the data store into one large ontology RDF file, which can be
exported and used by another semantic application or even published on
the web.
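A minimal sketch of such an export is given below. It assumes that each stored ontology is available as a list of (subject, predicate, object) tuples, that N-Triples is an acceptable output serialization, and that any term starting with `http://` or `https://` is an IRI while everything else is a plain literal. These are simplifications for illustration; the function names and the example.org data are hypothetical.

```python
def to_ntriples_term(term):
    """Serialize a term: IRIs in angle brackets, anything else as a plain literal."""
    if term.startswith(("http://", "https://")):
        return f"<{term}>"
    escaped = (
        term.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'

def export_merged(graphs):
    """Merge several triple lists, drop duplicates, and emit one N-Triples document."""
    merged = set()
    for triples in graphs:
        merged.update(triples)
    lines = [
        " ".join(to_ntriples_term(t) for t in triple) + " ."
        for triple in sorted(merged)
    ]
    return "\n".join(lines) + "\n"

# Hypothetical content of two stored ontologies sharing one triple.
hr = [("http://example.org/alice", "http://example.org/worksIn", "http://example.org/finance")]
finance = [
    ("http://example.org/finance", "http://example.org/budget", "100"),
    ("http://example.org/alice", "http://example.org/worksIn", "http://example.org/finance"),
]
document = export_merged([hr, finance])
```

Because the merge is a set union, a triple that appears in several source files is written only once in the exported file.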
F. SWMS Database
The Database contains different layers of Data Access:
* Importing Interface--It contains a number of Data Access Views
that, based on triggering events and/or business logic, integrate the
information from the RDF files into the database;
* Semantic Meta Model--It is a generic database model. Here,
information about the data that is stored in the data store section is
recorded, together with information regarding user enrolments and
report definitions.
* Semantic data store--This is the most dynamic part of the system
because it grows with any new file that is imported. For each file in
the data store a minimum number of tables is required: a table that
replicates the file in the database, a table that represents the RDF
entities and another table linking the entities in order to define the
ontology graph or even parallel graphs.
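The three per-file tables mentioned for the Semantic data store could look like the following SQLite sketch. The column layout, the naming scheme (`raw_`, `entity_`, `link_` plus a file identifier) and the function `create_store_tables` are assumptions made for illustration; the paper only states which three kinds of tables are needed.

```python
import sqlite3

def create_store_tables(conn, file_id):
    """Create the three per-file tables for one imported file."""
    # The identifier is interpolated into SQL, so restrict it to a safe form.
    if not file_id.isidentifier():
        raise ValueError(f"unsafe table suffix: {file_id!r}")
    conn.executescript(f"""
    -- Replica of the imported file, one row per line.
    CREATE TABLE raw_{file_id} (line_no INTEGER PRIMARY KEY, content TEXT NOT NULL);
    -- RDF entities found in the file.
    CREATE TABLE entity_{file_id} (id INTEGER PRIMARY KEY, uri TEXT UNIQUE NOT NULL);
    -- Links between entities; together they form the ontology graph.
    CREATE TABLE link_{file_id} (
        source_id INTEGER NOT NULL REFERENCES entity_{file_id}(id),
        predicate TEXT NOT NULL,
        target_id INTEGER NOT NULL REFERENCES entity_{file_id}(id)
    );
    """)

conn = sqlite3.connect(":memory:")
create_store_tables(conn, "hr")
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

Since every import adds its own set of tables, the data store grows with each file, which matches the description of it as the most dynamic part of the system.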
G. SWMS Application
The Semantic Web Management System Application is a simple and
user-friendly application, which is currently in the design phase. It
will consist of six screens: Main Menu Console, Admin Console, Importing
Console, Reporting Console, Exporting Console and Help Window. The
application has a simple design and uses the open source libraries of
the Jena Semantic Web Framework, as well as other open source and
in-house libraries.
The scenario for which this system was conceived is the following:
in a company, each department uses dedicated software applications
in order to manage data and documents. Nevertheless, some pieces of
information might be interconnected. For a certain task, somebody may
need to know exactly where to find specific information, as well as the
links and relationships between different information entities. Without a
centralized and efficient repository, this is rather difficult to
accomplish.
[FIGURE 1 OMITTED]
4. CONCLUSION
Semantic technologies are being added to enterprise solutions to
accommodate new techniques for discovering relationships across
different databases, business applications and Web services. RDF triples
are persisted, indexed, and queried, similar to other object-relational
data types.
To conclude, the benefits of combining semantic web concepts and
database metadata in one Information Management Tool are: improved data
access, scalability and platform independence.
As future work, we plan to implement and validate the architecture
presented above.
5. REFERENCES
Buraga, C. (1998). An Advanced Concurrent Teleconferencing System,
Proceedings of the 6th International Symposium on Automatic Control and
Computer Science--SACCS'98, Bucharest, Romania.
Dokulil, J., Yaghob, J. and Zavoral, F. (2008). Trisolda: The
Environment for Semantic Data Processing, International Journal On
Advances in Software, vol. 1, no. 1, pp. 43-58.
Vitvar, T., Mocan, A., Kerrigan, M., Zaremba, M., Moran, M.,
Cimpian, E., Haselwanter, T. and Fensel, D. (2007). Semantically-enabled
service oriented architecture: Concepts, technology and application,
Journal of Service Oriented Computing and Applications.
*** W3C Semantic Web, http://www.w3.org/2001/sw/, Accessed on:
2009-06-01
*** RDF, http://www.w3.org/RDF/, Accessed on: 2009-06-01
*** SPARQL, http://www.w3.org/TR/rdf-sparql-query/, Accessed on:
2009-06-01