摘要:The function of the comparing fingerprints algorithm was to judge whether a new partitioned data chunk was in a storage system a decade ago. At present, in the most de-duplication backup system the fingerprints of the big data chunks are huge and cannot be stored in the memory completely. The performance of the system is unavoidably retarded by data chunks accessing the storage system at the querying stage. Accordingly, a new query mechanism namely Two-stage Bloom Filter (TBF) mechanism is proposed. Firstly, as a representation of the entirety for the first grade bloom filter, each bit of the second grade bloom filter in the TBF represents the chunks having the identical fingerprints reducing the rate of false positives. Secondly, a two-dimensional list is built corresponding to the two grade bloom filter for the absolute addresses of the data chunks with the identical fingerprints. Finally, a new hash function class with the strong global random characteristic is set up according to the data fingerprints’ random characteristics. To reduce the comparing data greatly, TBF decreases the number of accessing disks, improves the speed of detecting the redundant data chunks, and reduces the rate of false positives which helps the improvement of the overall performance of system.