Basic Article Information

  • Title: Set of Frequent Word Sequence (SFWS) as Document Model for Feature Based Document Clustering
  • Author: Gusti Ayu Putri Saptawati
  • Journal: International Journal on Electrical Engineering and Informatics
  • Print ISSN: 2085-6830
  • Year of publication: 2019
  • Volume: 11
  • Issue: 4
  • Pages: 822-832
  • DOI: 10.15676/ijeei.2019.11.4.13
  • Publisher: School of Electrical Engineering and Informatics
  • Abstract: Word sequences have been considered an appropriate text representation because text is inherently sequential. Such representations include Frequent Word Sequence (FWS), Set of Frequent Word Sequence (SFWS), and Frequent Word Itemsets (FWI). Moreover, Maximal Frequent Sequence (MFS) is a text feature that exploits the sequential property of textual data. In this paper, we propose SFWS as the best text representation for document clustering. SFWS treats a document as a set of sentences, since the sentence is the highest level of the grammatical hierarchy and conveys a complete thought. Consequently, document clustering is expected to yield more accurate results. The main contribution of this work is the data pre-processing, feature extraction, and feature selection based on SFWS. Since SFWS works at the sentence level, we construct a sequence database of sentences from the sentences of all documents. Sequential pattern mining is then applied to extract the set of frequent sentence sequences. Finally, we select features using the maximal set of frequent sequences (MSFS). We conducted experiments on the Twenty News Group Text Data (TNTD). To do so, we developed a feature-based clustering (FBC) algorithm with MSFS as the text feature, based on the SFWS representation. The experimental results showed that document clustering based on SFWS achieved the highest accuracy compared with FWS and FWI.
  • Keywords: Frequent Word Sequence (FWS); Set of Frequent Word Sequence (SFWS); Frequent Word Itemset (FWI); Maximal Frequent Sequence (MFS); document clustering; Feature Base Clustering
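
The pipeline outlined in the abstract (sentence-level sequence database, sequential pattern mining, selection of maximal frequent sequences) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the regex-based sentence splitting, the definition of support as the number of sentences containing a pattern as a (gapped) subsequence, and the `max_len` cap are all assumptions made here for illustration only.

```python
import re
from typing import Dict, List, Set, Tuple

WordSeq = Tuple[str, ...]  # one sentence, or one pattern, as a tuple of words


def build_sequence_database(docs: List[str]) -> List[WordSeq]:
    """Split every document into sentences; each sentence becomes a word tuple.

    Assumption: sentences end at '.', '!' or '?'; words are lowercase
    alphanumeric tokens. Real pre-processing would likely be richer.
    """
    database = []
    for doc in docs:
        for sentence in re.split(r"[.!?]+", doc):
            words = tuple(re.findall(r"[a-z0-9]+", sentence.lower()))
            if words:
                database.append(words)
    return database


def is_subsequence(pattern: WordSeq, sentence: WordSeq) -> bool:
    """True if the words of `pattern` occur in `sentence` in order (gaps allowed)."""
    remaining = iter(sentence)
    return all(word in remaining for word in pattern)


def mine_frequent_sequences(
    database: List[WordSeq], min_support: int, max_len: int = 4
) -> Dict[WordSeq, int]:
    """Level-wise mining: grow frequent patterns by appending one frequent word."""

    def support(pattern: WordSeq) -> int:
        return sum(is_subsequence(pattern, s) for s in database)

    vocabulary = sorted({w for s in database for w in s})
    frequent_words = [w for w in vocabulary if support((w,)) >= min_support]

    frequent: Dict[WordSeq, int] = {}
    level = [(w,) for w in frequent_words]
    while level and len(level[0]) <= max_len:
        next_level = []
        for pattern in level:
            sup = support(pattern)
            if sup >= min_support:
                frequent[pattern] = sup
                # Only frequent patterns are extended (anti-monotone pruning).
                next_level.extend(pattern + (w,) for w in frequent_words)
        level = next_level
    return frequent


def maximal_sequences(frequent: Dict[WordSeq, int]) -> Set[WordSeq]:
    """Keep patterns that are not a subsequence of any other frequent pattern."""
    return {
        p
        for p in frequent
        if not any(p != q and is_subsequence(p, q) for q in frequent)
    }


if __name__ == "__main__":
    docs = ["The cat sat on the mat. The cat ate fish.", "A cat sat on a mat."]
    db = build_sequence_database(docs)
    patterns = mine_frequent_sequences(db, min_support=2)
    print(sorted(maximal_sequences(patterns)))
```

Because of the `max_len` cap, the patterns returned are maximal only among sequences up to that length. The resulting maximal sequences could then serve as features for a clustering step; how the paper's FBC algorithm uses them is not specified in the abstract, so that step is not sketched here.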