
Article Information

  • Title: Document Clustering Using K-Means vide Hadoop
  • Authors: Manisha Agrawal; Nisha Pandey
  • Journal: International Journal of Innovative Research in Computer and Communication Engineering
  • Print ISSN: 2320-9798
  • Online ISSN: 2320-9801
  • Publication year: 2018
  • Volume: 6
  • Issue: 5
  • Pages: 5613-5618
  • DOI: 10.15680/IJIRCCE.2018.0605023
  • Publisher: S&S Publications
  • Abstract: Clustering is a useful data mining technique that groups data points so that points within a single group have similar characteristics, while points in different groups are dissimilar. Partitioning methods such as the k-means algorithm are among the most widely used clustering algorithms. As a growing number of applications must deal with vast amounts of data, clustering such big data is a challenging problem. Recently, partitioning clustering algorithms that run on a large cluster of commodity machines using the MapReduce framework have received a lot of attention. The traditional way of clustering text documents is the vector space model, in which TF-IDF weights are fed to the k-means algorithm together with a supporting similarity measure. This paper presents an approach to clustering text documents. Results obtained by running a MapReduce k-means algorithm on a single-node Hadoop cluster show that the performance of the algorithm improves as the text corpus grows, yielding non-redundant results and appropriate information.
  • Keywords: Big Data; Hadoop; Yet Another Resource Negotiator (YARN); K-Means
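
The abstract names three building blocks: TF-IDF vectors, a similarity measure, and k-means expressed as MapReduce steps. The following is a minimal single-process sketch of how one k-means iteration could be split into a map step (assign each document to its most similar centroid) and a reduce step (average the vectors of each cluster). It is not the authors' implementation and does not use Hadoop. The function names, the smoothed IDF formula, the choice of cosine similarity, and the sample documents are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF (an assumption) keeps every weight positive.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def map_assign(vector, centroids):
    """Map step: emit (index of the most similar centroid, vector)."""
    best = max(range(len(centroids)),
               key=lambda i: cosine(vector, centroids[i]))
    return best, vector


def reduce_centroid(vectors):
    """Reduce step: average all vectors that share a cluster key."""
    total = defaultdict(float)
    for vector in vectors:
        for term, weight in vector.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def kmeans_iteration(vectors, centroids):
    """Run one map/shuffle/reduce round and return (centroids, assignments)."""
    grouped = defaultdict(list)  # plays the role of the shuffle phase
    assignments = []
    for vector in vectors:
        key, value = map_assign(vector, centroids)
        grouped[key].append(value)
        assignments.append(key)
    new_centroids = [
        reduce_centroid(grouped[i]) if i in grouped else centroids[i]
        for i in range(len(centroids))
    ]
    return new_centroids, assignments


# Hypothetical sample corpus, used only to exercise the functions.
docs = [
    "hadoop mapreduce cluster node",
    "mapreduce hadoop yarn node",
    "football match goal score",
    "goal score football league",
]
vectors = tf_idf(docs)
centroids, assignments = kmeans_iteration(vectors, [vectors[0], vectors[2]])
print(assignments)
```

In a real Hadoop job the map and reduce functions would run on separate tasks and the current centroids would be distributed to every mapper between iterations; the in-memory `grouped` dictionary here only stands in for that shuffle.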