Abstract: In the era of hybrid cloud computing, hybrid clusters composed of both virtual machines and physical machines are becoming increasingly common. Such clusters need careful organization to support fine-grained resource allocation. A further problem of the big data era is that database systems cannot handle semi-structured and unstructured data well. MapReduce is well suited to the large and rapidly growing data volumes produced by social computing and multimedia computing. One of the biggest challenges in a hybrid MapReduce cluster is the I/O bottleneck, which is aggravated under big data workloads. In this paper, we take data locality into consideration and group slave nodes with low intra-group communication and high inter-group communication. After introducing the architecture and implementation of our grouped hybrid MapReduce cluster (GHMC), we present our k-means-based method for grouping nodes in the GHMC system and evaluate it in a real environment. The results show a performance improvement of nearly 34.9% for a GHMC system deployed with our k-means algorithm. The system also shows good scalability.
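
As a rough illustration of grouping slave nodes with k-means, the following minimal sketch clusters nodes by a feature vector. It is not the authors' implementation. The abstract does not say which features GHMC uses, so the 2-D coordinates (imagined here as positions derived from measured network latency), the node names, and the farthest-first seeding are all assumptions made for the example.

```python
from math import dist


def farthest_first_seeds(points, k):
    """Pick k spread-out seeds: start at points[0], then repeatedly take
    the point whose nearest already-chosen seed is farthest away."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(dist(p, s) for s in seeds)))
    return seeds


def kmeans(points, k, max_iter=50):
    """Plain Lloyd's k-means. Returns (labels, centroids)."""
    centroids = farthest_first_seeds(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each node joins the group with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty group keeps its previous centroid
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Hypothetical slave nodes with made-up 2-D feature vectors.
nodes = {
    "vm1": (1.0, 1.0),
    "vm2": (1.2, 0.8),
    "pm1": (8.0, 8.0),
    "pm2": (8.5, 7.5),
}
labels, centroids = kmeans(list(nodes.values()), k=2)
groups = dict(zip(nodes, labels))
```

On this toy input the two virtual machines end up in one group and the two physical machines in the other. A real deployment would need to choose the features and the number of groups from measurements of the actual cluster.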