Journal title: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2016
Volume: 83
Issue: 1
Publisher: Journal of Theoretical and Applied
Abstract: Cross-Language Plagiarism Detection (CLPD) is used to automatically identify and extract plagiarism among documents in different languages. The main challenge of cross-language plagiarism detection is the difference between the languages of the texts; the original source can be analysed and translated, and plagiarism can then be detected automatically by comparing the suspected text with the original text. This paper proposes an Arabic-English cross-language plagiarism detection method that automatically detects the semantic relatedness between the words of two suspect files. The proposed method consists of four phases: the first is pre-processing, the second involves key phrase extraction and translation, the third applies plagiarism detection techniques, and the fourth is classification using Linear Logistic Regression (LLR). The method is evaluated using precision and recall measurements on a dataset of Wikipedia articles. The experimental results achieved 96% precision, 85% recall and 90.16% F-measure. The results show that the LLR algorithm can be used effectively to detect Arabic-English cross-language plagiarism.
Keywords: Cross-Language Plagiarism Detection; Linear Logistic Regression; Arabic-English Cross-Language Plagiarism; Plagiarism Detection; Wikipedia Articles
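
Note on the reported metrics: the abstract does not say which F-measure variant was used. The following minimal sketch assumes the balanced F1 score, the harmonic mean of precision and recall, and checks it against the reported 96% precision and 85% recall. The function name `f_measure` and the `beta` parameter are illustrative choices, not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    With beta == 1 this is the balanced F1 score (an assumption here;
    the abstract does not name the variant it reports).
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    denominator = beta ** 2 * precision + recall
    if denominator == 0.0:
        # Both inputs are zero, so the score is defined as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Values reported in the abstract.
reported_precision = 0.96
reported_recall = 0.85

f1 = f_measure(reported_precision, reported_recall)
print(f"F-measure: {f1:.4f}")
```

Under this assumption the score comes out at about 0.9017, which agrees with the reported 90.16% if that figure was truncated rather than rounded.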