
Article Information

  • Title: Punjabi Documents Clustering System
  • Authors: Sharma, Saurabh; Gupta, Vishal
  • Journal: Journal of Emerging Technologies in Web Intelligence
  • Print ISSN: 1798-0461
  • Publication year: 2013
  • Volume: 5
  • Issue: 2
  • Pages: 171-187
  • DOI: 10.4304/jetwi.5.2.171-187
  • Language: English
  • Publisher: Academy Publisher
  • Abstract: Text document clustering draws on Natural Language Processing, Machine Learning and Information Retrieval. It is an effective method for unsupervised document organization, automatic topic extraction, and fast, accurate information filtering and retrieval. Many clustering algorithms are available for unsupervised document organization and retrieval. Traditional approaches to clustering treat the documents merely as an assortment of words. The semantic relationships between words should form the basis for clustering, but this information, although vital, is generally ignored. A new method is discussed for generating frequent phrases by analyzing the semantic relations between the words in a sentence. The Karaka list, a grammatical connector that links nouns, pronouns and verbs in a sentence, captures these semantic relations. The new clustering method combines the theories behind the Karaka Analyzer, Frequent Itemsets and Frequent Word Sequences. The results indicate that the new hybrid approach performs better in terms of number of clusters, meaningfulness of cluster labels, and clustering effectiveness for documents whose desired information is not contained in frequent phrases. The use of semantic features is the key to the better results.
  • Keywords: Punjabi Document Clustering; Karaka Theory; Frequent Phrases