Article Information

Title: Research of Performance of Distributed Platforms Based on Clustering Algorithm
Authors: Di Jian; Yanfeng Peng
Journal: Journal of Computers
Print ISSN: 1796-203X
Publication year: 2016
Volume: 11
Issue: 3
Pages: 195-200
DOI: 10.17706/jcp.11.3.195-200
Publisher: Academy Publisher
Abstract: With the continued development and application of Internet technology, the volume of data to be processed keeps growing. Spark is a versatile, high-performance parallel computing framework that can be applied to data mining. This paper parallelizes the K-means algorithm on two distributed platforms, builds a YARN cluster environment, and runs experiments to compare the platforms' performance. The results show that the combination of Spark and YARN produces better clustering results and takes less time to execute programs, making it more suitable for cluster analysis of big data.
Other keywords: Clustering algorithm, distributed platforms, research of performance.