Journal: International Journal of Software Engineering and Its Applications
Print ISSN: 1738-9984
Year: 2014
Volume: 8
Issue: 3
Pages: 339-350
DOI: 10.14257/ijseia.2014.8.3.31
Publisher: SERSC
Abstract: In this paper, we propose a probabilistic algorithm for detecting duplicated data blocks over low-bandwidth networks. The algorithm identifies regions of the destination file that are duplicated and sends only the non-duplicated regions of data. The proposed system builds two index tables for a file, with chunk sizes of 4MB and 32KB, respectively. At the first level, the client rapidly detects large identical data blocks with the byte-index chunking approach, using the 4MB-chunk index table. At the second level, byte-index chunking is applied with the 32KB index table to the non-duplicated regions left by the first-level similarity detection. This yields more accurate deduplication without consuming much additional time, because the second-level work is restricted to the non-duplicated regions. Experimental results show that the proposed approach reduces processing time significantly compared to fixed-size chunking, while achieving a deduplication rate as high as that of variable-size chunking.
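For illustration, the sketch below captures the two-level idea under simplifying assumptions: fixed-offset SHA-1 chunk hashing stands in for the paper's byte-index chunking, and the helper names (build_index, find_unique, two_level_dedup) are hypothetical, not from the paper. A coarse 4MB pass marks large duplicated blocks, and a fine 32KB pass re-examines only the regions the first pass left as non-duplicated.

    import hashlib

    CHUNK_L1 = 4 * 1024 * 1024   # level-1 chunk size: 4 MB
    CHUNK_L2 = 32 * 1024         # level-2 chunk size: 32 KB

    def build_index(data: bytes, chunk_size: int) -> set:
        """Hash every fixed-size chunk of the destination data into an index table."""
        return {hashlib.sha1(data[off:off + chunk_size]).hexdigest()
                for off in range(0, len(data), chunk_size)}

    def find_unique(data: bytes, index: set, chunk_size: int, regions):
        """Within each (start, end) region, return the byte ranges whose
        chunk hash is absent from the index, i.e. data still to be sent."""
        unique = []
        for start, end in regions:
            for off in range(start, end, chunk_size):
                hi = min(off + chunk_size, end)
                if hashlib.sha1(data[off:hi]).hexdigest() not in index:
                    unique.append((off, hi))
        return unique

    def two_level_dedup(src: bytes, dst: bytes):
        """Level 1: coarse 4 MB matching over the whole source file.
        Level 2: fine 32 KB matching restricted to the regions that
        level 1 left as non-duplicated."""
        level1 = find_unique(src, build_index(dst, CHUNK_L1), CHUNK_L1,
                             [(0, len(src))])
        return find_unique(src, build_index(dst, CHUNK_L2), CHUNK_L2, level1)

    if __name__ == "__main__":
        dst = bytes(8 * 1024 * 1024)              # 8 MB of zeros at the receiver
        src = dst[:4 * 1024 * 1024] + b"x" * 40_000 \
              + dst[4 * 1024 * 1024 + 40_000:]    # 40 KB modified in the source
        print(two_level_dedup(src, dst))          # only the touched 32 KB chunks

Restricting the second pass to the level-1 misses is what keeps the fine-grained pass cheap; unlike this fixed-offset simplification, the paper's byte-index approach can also match blocks whose positions have shifted between the two files.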