Journal title: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2021
Volume: 99
Issue: 17
Language: English
Publisher: Journal of Theoretical and Applied
Abstract: The growing number of Internet users makes it easier to access information, even in different languages. Translation applications also let users translate ideas or documents and reuse them without proper citation or acknowledgment of the original author. As a result, plagiarism is increasing not only in academia but also in industry. Many researchers have proposed methods for detecting plagiarism, but mostly for European languages. Previous research on Indonesian-English plagiarism has proposed some methods, but they still depend on machine translation. In this research, we propose a model that can detect cross-language plagiarism without depending on machine translation. The proposed model combines canonical correlation analysis with paragraph vectors (paragraph to vector). The model is evaluated on a monolingual task and on cross-language plagiarism detection. It performs well on monolingual word similarity and also when detecting cross-language plagiarism without machine translation. Compared with a benchmark that uses the Fingerprint Method with machine translation, the proposed method detects paraphrased plagiarism more accurately. Although the improvement over the benchmark is not very significant, the proposed method can detect cross-language plagiarism between Indonesian and English without depending on machine translation. Future work should enlarge the Indonesian-English parallel corpus to improve the accuracy of the proposed method.
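
As a rough illustration of how canonical correlation analysis can place paragraph vectors from two languages in a shared space, the following minimal Python sketch may help. It is not the authors' implementation. The paragraph vectors are replaced here by synthetic NumPy arrays, and the names fit_cca, project, and cosine_rows, the vector dimensions, the noise level, and the regularization value are all assumptions made for this example.

```python
import numpy as np


def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return (V / np.sqrt(w)) @ V.T


def fit_cca(X, Y, k, reg=1e-6):
    """Fit linear CCA on row-aligned matrices X (n x dx) and Y (n x dy).

    Returns the two means, the two projection matrices, and the top-k
    canonical correlations. `reg` is a small ridge term (an assumption)
    that keeps the covariance matrices invertible.
    """
    n = X.shape[0]
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # The SVD of the whitened cross-covariance gives the canonical directions.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    A = Kx @ U[:, :k]
    B = Ky @ Vt.T[:, :k]
    return mx, my, A, B, s[:k]


def project(V, mean, W):
    """Map vectors into the shared CCA space."""
    return (V - mean) @ W


def cosine_rows(P, Q):
    """Cosine similarity between corresponding rows of P and Q."""
    num = np.sum(P * Q, axis=1)
    den = np.linalg.norm(P, axis=1) * np.linalg.norm(Q, axis=1)
    return num / den


# Synthetic stand-ins for paragraph vectors of a parallel corpus:
# row i of `vec_id` and row i of `vec_en` describe the same paragraph.
rng = np.random.default_rng(0)
n, latent_dim = 200, 3
Z = rng.normal(size=(n, latent_dim))  # shared "meaning" of each paragraph
vec_id = Z @ rng.normal(size=(latent_dim, 8)) + 0.01 * rng.normal(size=(n, 8))
vec_en = Z @ rng.normal(size=(latent_dim, 6)) + 0.01 * rng.normal(size=(n, 6))

mean_id, mean_en, A, B, corrs = fit_cca(vec_id, vec_en, k=latent_dim)
P_id = project(vec_id, mean_id, A)
P_en = project(vec_en, mean_en, B)

# Similarity of true translation pairs versus deliberately mismatched pairs.
aligned = cosine_rows(P_id, P_en)
mismatched = cosine_rows(P_id, np.roll(P_en, 1, axis=0))

print("canonical correlations:", corrs)
print("mean similarity, aligned pairs:   ", aligned.mean())
print("mean similarity, mismatched pairs:", mismatched.mean())
```

In a setting like the one the abstract describes, the synthetic arrays would presumably be replaced by paragraph vectors trained separately on Indonesian and English text, with the projections fitted on a parallel corpus. A suspicious passage and a candidate source could then be compared directly in the shared space, with no translation step; the similarity threshold that separates plagiarized from unrelated pairs would have to be chosen empirically.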