Journal: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Print ISSN: 2278-1323
Year of publication: 2012
Volume: 1
Issue: 8
Pages: 298-301
Publisher: Shri Pannalal Research Institute of Technology
Abstract: Copy-pasted code is very common in large software and product line software because programmers prefer reusing code via copy-paste in order to reduce programming effort. Copy-pasted code is prone to introducing errors. Unfortunately, it is challenging to efficiently identify copy-pasted code in large software. Existing copy-paste detection tools are either not scalable to large software or cannot handle small modifications in copy-pasted code. In this paper, we propose an enhanced CloSpan algorithm for the CP-Miner tool, which uses data mining techniques to efficiently identify copy-pasted code in large software, including operating systems. Specifically, it takes less than 20 minutes for CP-Miner with the enhanced CloSpan algorithm to identify 190,000 copy-pasted segments in Linux and 150,000 in FreeBSD.
Keywords: software product lines; code reuse; code duplication; data mining.