Abstract: Deduplication is widely used in backup and archive systems to improve storage utilization. However, traditional deduplication can only eliminate images that are exactly identical; it cannot handle duplicate images that look the same to a viewer but are encoded differently. To address this problem, this paper proposes a high-precision deduplication approach for duplicate images. The approach eliminates duplicate images in five stages: feature extraction, high-dimensional indexing, accuracy optimization, centroid selection, and deduplication evaluation. Experimental results on a real dataset show that the proposed approach not only saves storage space effectively but also significantly improves the retrieval precision of duplicate images. In addition, the selected centroid images are consistent with human perception.
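
The gap between exact and perceptual duplicates can be illustrated with a minimal sketch. It is not the approach proposed in this paper: a simple average hash stands in for the feature extraction stage purely as an assumption, and the function names (`exact_fingerprint`, `average_hash`, `hamming`) and the 4x4 sample image are hypothetical. The sketch compares a byte-level fingerprint, as conventional deduplication would use, with a perceptual fingerprint for an image and a slightly brightened copy.

```python
import hashlib


def exact_fingerprint(pixels):
    """Byte-level fingerprint of the raw pixel values (conventional deduplication)."""
    data = bytes(v for row in pixels for v in row)
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels):
    """Perceptual fingerprint (assumed stand-in, not the paper's feature):
    one bit per pixel, set when the pixel is brighter than the image mean."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    return [1 if v > mean else 0 for v in flat]


def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 4x4 grayscale image and a copy brightened by 3 levels.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
brightened = [[min(255, v + 3) for v in row] for row in original]

# The byte-level fingerprints differ, so exact deduplication keeps both copies.
same_bytes = exact_fingerprint(original) == exact_fingerprint(brightened)

# The perceptual fingerprints are compared by Hamming distance instead.
distance = hamming(average_hash(original), average_hash(brightened))

print("identical bytes:", same_bytes)
print("perceptual Hamming distance:", distance)
```

In this toy case the two images have different byte-level fingerprints while their perceptual fingerprints coincide, which is the situation a perception-aware deduplication method has to detect. A real system would use more robust features and an index over them, as the stages listed in the abstract suggest.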