Publisher: Centro Latinoamericano de Estudios en Informática
Abstract: Every day a huge number of new information resources are linked to the
web. As a result the web is growing very fast, making search tasks more and
more difficult and their results worse. To address this problem several initiatives
were undertaken and a new area of research and development emerged: the
Semantic Web. When we refer to the Semantic Web we are thinking about
a network of concepts. Each concept has a group of related resources and can
be related to other concepts; we can then use this concept network to
navigate among web resources or, more generally, among information resources. Of the
initiatives undertaken, one became an ISO standard: Topic Maps (ISO/IEC 13250). The
aim of this paper is to introduce a Topic Map (TM) Builder, that is, a
processor that extracts topics and relations from instances of a family of
XML documents. A TM-Builder depends strongly on the structure of the resources.
So, to extract a topic map from different collections of information resources
(sets of documents with different structures) we have to implement several
TM-Builders, one for each collection, which is a laborious task. To overcome this
inconvenience we have created an XML abstraction layer for TM-Builders that
enables us to specify the topic map we want to build from a concrete family
of resources, so that the intended extractor can be generated automatically. To
describe that process, i.e. the extraction of knowledge from XML documents to
produce a TM, we present a language for specifying topic maps for a class of XML
documents, which we call XSTM (XML Specification for Topic Maps). We also
discuss XSTM-P, an XSL processor that automatically generates the extractor from its
formal specification written in XSTM.
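
The idea of driving extraction from a declarative specification, rather than hand-coding one TM-Builder per collection, can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the abstract does not show XSTM syntax, so the SPEC dictionary, its keys, the build_topic_map function and the sample document below are all hypothetical, and the sketch interprets the specification directly in Python instead of generating an extractor with XSL as XSTM-P does.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification (NOT real XSTM syntax): it states which XML
# elements yield topics and which element pairs yield associations.
SPEC = {
    "topics": [
        {"type": "author", "path": ".//book/author"},
        {"type": "book", "path": ".//book/title"},
    ],
    "associations": [
        # Within each <book>, relate the <author> child to the <title> child.
        {"type": "wrote", "scope": ".//book", "from": "author", "to": "title"},
    ],
}


def build_topic_map(xml_text, spec):
    """Apply a declarative spec to one XML document and collect topics and associations."""
    root = ET.fromstring(xml_text)

    topics = set()
    for rule in spec["topics"]:
        for node in root.findall(rule["path"]):
            if node.text and node.text.strip():
                topics.add((rule["type"], node.text.strip()))

    associations = set()
    for rule in spec["associations"]:
        for context in root.findall(rule["scope"]):
            source = context.findtext(rule["from"])
            target = context.findtext(rule["to"])
            if source and target:
                associations.add((rule["type"], source.strip(), target.strip()))

    return {"topics": sorted(topics), "associations": sorted(associations)}


# Hypothetical sample document from one "family" of XML resources.
SAMPLE = """<library>
  <book><title>Dom Casmurro</title><author>Machado de Assis</author></book>
  <book><title>Quincas Borba</title><author>Machado de Assis</author></book>
</library>"""

topic_map = build_topic_map(SAMPLE, SPEC)
for topic in topic_map["topics"]:
    print("topic:", topic)
for association in topic_map["associations"]:
    print("association:", association)
```

Supporting a collection with a different structure would then mean writing a new specification rather than a new extractor, which is the benefit the abstract attributes to the XSTM layer.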