Abstract: Data load balancing is one of the key problems in big data technology. Hadoop, a widely used big data platform, has been applied successfully in many settings. Its distributed file system, HDFS (Hadoop Distributed File System), includes a load-balancing procedure that evens out the storage load across machines. However, this procedure does not give priority to overloaded racks, so overloaded machines remain at risk of breakdown. In this paper, we focus on overloaded machines and propose an improved algorithm that balances overloaded racks first. The improved method builds three lists from several factors: PriorBalanceList, which contains the overloaded machines, ForBalanceList, and NextForBalanceList. It then balances among the racks selected from these lists before any others. Experiments show that the improved method balances overloaded racks in a timely manner and reduces the likelihood that these racks break down.
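To make the three-list idea concrete, the following is a minimal sketch of how racks might be sorted into PriorBalanceList, ForBalanceList, and NextForBalanceList. The abstract does not state the actual selection factors, so everything here is an assumption made for illustration: the `Rack` type, the `classify_racks` function, the use of storage utilization as the only factor, and the `threshold` margin around the cluster average.

```python
from dataclasses import dataclass


@dataclass
class Rack:
    """Hypothetical rack summary; not a type from HDFS or from the paper."""
    name: str
    used: float      # storage used, in any consistent unit
    capacity: float  # total storage, in the same unit

    @property
    def utilization(self) -> float:
        return self.used / self.capacity


def classify_racks(racks, threshold=0.10):
    """Split racks into three lists by utilization relative to the cluster average.

    Assumed rule (illustrative only, the paper uses "many factors"):
      - prior:            utilization > average + threshold (overloaded)
      - for_balance:      utilization < average - threshold (room to receive data)
      - next_for_balance: everything else
    """
    total_used = sum(r.used for r in racks)
    total_capacity = sum(r.capacity for r in racks)
    average = total_used / total_capacity

    prior, for_balance, next_for_balance = [], [], []
    for rack in racks:
        if rack.utilization > average + threshold:
            prior.append(rack)
        elif rack.utilization < average - threshold:
            for_balance.append(rack)
        else:
            next_for_balance.append(rack)

    # Handle the most overloaded racks first, and fill the emptiest racks first.
    prior.sort(key=lambda r: r.utilization, reverse=True)
    for_balance.sort(key=lambda r: r.utilization)
    return prior, for_balance, next_for_balance
```

Under these assumptions, a balancer would move blocks off the racks in the first list before touching any other rack, which is the "overloaded racks first" behavior the abstract describes.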