
Article Information

  • Title: Lexical-semantic SLVM for XML Document Classification
  • Authors: Long, Jun; Wang, Luda; Li, Zude
  • Journal: Journal of Software
  • Print ISSN: 1796-217X
  • Publication year: 2014
  • Volume: 9
  • Issue: 12
  • Pages: 3028-3034
  • DOI: 10.4304/jsw.9.12.3028-3034
  • Language: English
  • Publisher: Academy Publisher
  • Abstract: The structured link vector model (SLVM) and its improved version rely on statistical term measures to represent XML documents. As a result, they ignore the lexical semantics of terms and their mutual information, which leads to text classification errors. This paper proposes an XML document representation method, WordNet-based lexical-semantic SLVM, to solve this problem. Using WordNet, the method constructs a data structure that characterizes the lexical-semantic content of an XML document and adjusts EM modeling to disambiguate word stems. A synset matrix of the lexical-semantic content is then built in the lexical-semantic feature space to represent the XML document, and lexical-semantic relations are marked on it to construct the feature matrix of the lexical-semantic SLVM. On a categorized Wikipedia XML dataset, using the NWKNN classification algorithm, the experimental results show that the feature matrix of this method achieves a better F1 measure than the original SLVM and the frequent sub-tree SLVM based on TF-IDF.
  • Keywords: Semi-structured document; SLVM; Lexical semantics; Classification; Feature matrix
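
The abstract contrasts term-level statistics with a synset-level feature space. As a rough illustration of that idea only, and not the authors' implementation, the sketch below counts synsets instead of raw terms, so that synonyms fall into one feature. The `TOY_SYNSETS` lexicon and the function names `synset_counts` and `synset_matrix` are hypothetical; a real system would look stems up in WordNet and disambiguate them (the paper uses an adjusted EM model for that step), and the XML structure and lexical-semantic relations of the full SLVM feature matrix are left out here.

```python
from collections import Counter

# Hypothetical toy lexicon: word stem -> synset identifier.
# A real system would query WordNet and disambiguate each stem.
TOY_SYNSETS = {
    "car": "car.n.01",
    "automobile": "car.n.01",
    "auto": "car.n.01",
    "road": "road.n.01",
    "street": "street.n.01",
}


def synset_counts(tokens):
    """Count synset occurrences, skipping tokens missing from the toy lexicon."""
    return Counter(TOY_SYNSETS[t] for t in tokens if t in TOY_SYNSETS)


def synset_matrix(docs):
    """Return (sorted synset ids, one row of counts per document)."""
    counts = [synset_counts(doc) for doc in docs]
    features = sorted(set().union(*counts))
    rows = [[c[f] for f in features] for c in counts]
    return features, rows


docs = [["car", "road"], ["automobile", "auto", "street"]]
features, rows = synset_matrix(docs)
```

In this toy input, "car", "automobile", and "auto" all map to the same column, whereas a purely term-based representation would treat them as three unrelated features.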