Basic article information

  • Title: Feature Weighting Improvement of Web Text Categorization Based on Particle Swarm Optimization Algorithm
  • Authors: Yonghe Lu; Yanhong Peng
  • Journal: Journal of Computers
  • Print ISSN: 1796-203X
  • Year of publication: 2015
  • Volume: 10
  • Issue: 4
  • Pages: 260-267
  • DOI: 10.17706/jcp.10.4.260-267
  • Publisher: Academy Publisher
  • Abstract: Structural elements such as the title often express the main content of a text, and they can affect the effectiveness of text categorization. However, the most common feature weighting algorithm, term frequency-inverse document frequency (TF-IDF), does not take the structural information of texts into account. To solve this problem, a new feature weighting algorithm based on the Particle Swarm Optimization algorithm is proposed, which considers the structural information (i.e., HTML tags) of web pages. First, web pages are crawled and pre-processed, and the content of four HTML tags is retained. Second, the Chi-squared statistic (CHI) is used to select features. Third, a new feature weighting algorithm, called the feature tag weighting algorithm, is proposed, in which particle swarm optimization (PSO) is used to calculate the tag weighting coefficients. Finally, k-nearest neighbor (kNN) is used as the classifier for web text categorization. The experimental results show that the feature tag weighting algorithm outperforms TF-IDF in web text categorization.
  • Other keywords: Text categorization, TF-IDF, PSO, web text, HTML tag.
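
The abstract describes weighting features by the HTML tag they appear in and tuning the tag coefficients with PSO, but this record does not give the formulas. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation: the tag names in `TAGS`, the `log(N / df)` form of IDF, the function names `tag_weighted_tfidf` and `pso`, and all PSO parameters (inertia `w`, acceleration constants `c1` and `c2`, swarm size, search bounds) are illustrative choices that do not come from the paper.

```python
import math
import random
from collections import Counter

# Hypothetical tag set: the abstract says four HTML tags are kept
# but does not name them.
TAGS = ("title", "h1", "b", "body")


def tag_weighted_tfidf(doc, doc_freq, n_docs, coeffs):
    """Weight each term by the coefficients of the tags it occurs in.

    doc      : dict mapping tag -> list of tokens found inside that tag
    doc_freq : dict mapping term -> number of documents containing it
    n_docs   : total number of documents
    coeffs   : dict mapping tag -> weighting coefficient (assumed form)
    """
    weighted_tf = Counter()
    for tag, tokens in doc.items():
        for term in tokens:
            # Each occurrence counts coeffs[tag] instead of 1.
            weighted_tf[term] += coeffs[tag]
    # Assumed IDF form: log(N / df); terms with unknown df are skipped.
    return {
        term: tf * math.log(n_docs / doc_freq[term])
        for term, tf in weighted_tf.items()
        if doc_freq.get(term)
    }


def pso(fitness, dim, n_particles=20, iters=50, lo=0.0, hi=1.0,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO that maximizes `fitness` over [lo, hi]^dim."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_fit = [fitness(p) for p in pos]      # personal best fitness values
    g = max(range(n_particles), key=best_fit.__getitem__)
    g_pos, g_fit = best_pos[g][:], best_fit[g]  # global best so far

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Clamp the new position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f > best_fit[i]:
                best_pos[i], best_fit[i] = pos[i][:], f
                if f > g_fit:
                    g_pos, g_fit = pos[i][:], f
    return g_pos, g_fit
```

In a pipeline like the one the abstract outlines, `fitness` would presumably take a candidate coefficient vector (one value per tag), build the weighted vectors with `tag_weighted_tfidf`, run a kNN classifier on held-out pages, and return its accuracy. Calling `pso(fitness, dim=len(TAGS))` would then return the best coefficients found and their score.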