Journal: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Online ISSN: 2320-9801
Year: 2019
Volume: 7
Issue: 8
Pages: 3669-3674
DOI: 10.15680/IJIRCCE.2019.0708003
Publisher: S&S Publications
Abstract: In the text mining field, prominent techniques use bag-of-words models, which represent a document as a vector. These methods ignore word sequence information, and their good clustering results have been limited to certain specialized domains. This paper reviews a new similarity measure based on the suffix tree model of text documents. The measure extracts word sequence information and then computes the similarity between the text documents of a corpus by applying a suffix tree similarity combined with the TF-IDF weighting scheme. Experimental results on a standard document benchmark corpus show that the new text similarity measure is effective. Compared with the results of two other frequent word sequence based methods, the proposed system achieves an improvement of about 15% in average F-Measure score.
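The abstract does not spell out how the suffix tree and TF-IDF weighting are combined, so the following is only a minimal Python sketch under stated assumptions, not the paper's method. It assumes an uncompressed, word-level generalized suffix trie in which every node (that is, every word phrase of up to a hypothetical `max_len` words) is a feature; the feature's weight in a document is tf × idf, with tf the number of occurrences of the phrase in the document and idf = log(N / df); documents are then compared by cosine similarity. The names `build_suffix_trie`, `phrase_vectors`, `cosine`, and `max_len` are illustrative only.

```python
import math
from collections import defaultdict


class TrieNode:
    """Node of an uncompressed, word-level generalized suffix trie (assumption)."""

    def __init__(self):
        self.children = {}
        # doc_id -> number of times the phrase ending at this node occurs in that doc
        self.doc_counts = defaultdict(int)


def build_suffix_trie(docs, max_len=4):
    """Insert every word suffix of every document, truncated to max_len words."""
    root = TrieNode()
    for doc_id, words in enumerate(docs):
        for start in range(len(words)):
            node = root
            for word in words[start:start + max_len]:
                node = node.children.setdefault(word, TrieNode())
                node.doc_counts[doc_id] += 1
    return root


def phrase_vectors(docs, max_len=4):
    """Map each document to {phrase: tf * idf}, treating every trie node as a feature."""
    root = build_suffix_trie(docs, max_len)
    n_docs = len(docs)
    vectors = [{} for _ in docs]
    stack = [((), root)]
    while stack:
        phrase, node = stack.pop()
        if phrase:  # the root represents the empty phrase and is skipped
            # idf = log(N / df), where df = number of documents containing the phrase
            idf = math.log(n_docs / len(node.doc_counts))
            for doc_id, tf in node.doc_counts.items():
                vectors[doc_id][phrase] = tf * idf
        for word, child in node.children.items():
            stack.append((phrase + (word,), child))
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(p, 0.0) for p, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "the cat sat on the rug",
        "stock prices fell sharply today",
    ]
    vecs = phrase_vectors([text.split() for text in corpus])
    print(cosine(vecs[0], vecs[1]))  # positive: shared phrases such as "the cat sat"
    print(cosine(vecs[0], vecs[2]))  # no shared phrases, so 0.0
```

Unlike a bag-of-words vector, shared multi-word phrases such as "the cat sat" contribute their own weighted features here, which is how word sequence information enters the similarity. The uncompressed trie keeps the sketch short but creates one node per phrase position; a true suffix tree (for example one built with Ukkonen's algorithm) merges non-branching paths into single edges, so its feature set and size differ from this illustration.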
Keywords: Crawler; Term Frequency; Inverse Document Frequency; Clustering; Document Model; Similarity Measure