Abstract: This research aims to improve the relevance of Web search results by classifying Web snippets into socially constructed hierarchical search concepts, such as those of the Open Directory Project (ODP), the most comprehensive human-edited knowledge structure. The semantics of the search concepts (categories) in such socially constructed hierarchical knowledge repositories are extracted from the associated textual information contributed by the communities that maintain them. This textual information is analyzed to construct a category-document set, which is then used to represent the semantics of the socially constructed search concepts. The Simple API for XML (SAX), a component of the Java API for XML Processing (JAXP), is used to read and parse the two RDF-format ODP data files, structure.rdf and content.rdf. A k-nearest-neighbor (kNN) classifier, trained on the constructed category-document set, is used to categorize the Web search results. The categorized results are then ontologically filtered based on the interactions of Web information seekers. Initial experimental results demonstrate that the proposed approach can improve precision by 23.5%.
Keywords: HTML, SAX, Web search, ontology, semantic analysis, socially constructed knowledge repository, Open Directory Project