Basic Article Information

Title: Comparative Analysis of Machine Learning Techniques for Splitting Identifiers within Source Code
Authors: Abeer Abdulsalam; Nazre Abdul Rashid
Journal: Webology
Print ISSN: 1735-188X
Year of publication: 2020
Volume: 17
Issue: 2
Pages: 776-787
DOI: 10.14704/WEB/V17I2/WEB17066
Publisher: University of Tehran
Abstract: Feature location is the process of extracting identifiers within source code. In software engineering, upgrading software by adding new features is a common procedure. To facilitate this process for developers, feature location has been proposed as a way to extract the significant components of the source code, namely the identifiers. One of the challenging issues facing the feature location task is handling multi-word identifiers, in which developers may use different conventions to separate the words. Previous studies have applied various techniques to this problem; however, recent studies have shown growing interest in Machine Learning Techniques (MLTs) because of their substantial performance. Given the diversity of MLTs, there is a vital need to identify the one that splits identifiers most accurately. Therefore, this study provides a comparative analysis of several MLTs: Naïve Bayes, Support Vector Machine and J48. The dataset used in the experiment is a benchmark dataset containing a large amount of source code with numerous identifiers. The results showed that the J48 classifier achieved the best accuracy, with an F-measure of 66%.
Keywords: Feature Location; Split Identifiers; Source Code; Naïve Bayes; Support Vector Machine; J48
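The abstract describes splitting multi-word identifiers whose words are separated in different ways, and reports results as an F-measure. As a hedged illustration only (this is not the authors' method, which compares Naïve Bayes, SVM and J48 classifiers), the Python sketch below uses a hypothetical rule-based splitter (`heuristic_split`) that handles explicit separators (underscore, hyphen, `$`), camelCase and acronym boundaries, and letter/digit changes. It then scores the predicted split points against a gold split with a per-identifier F-measure (`f_measure`, also hypothetical). How the paper aggregates its reported 66% F-measure is not stated in the abstract. Identifiers with no explicit boundary, such as `getnextitem`, are left whole by these rules; that is the kind of case where a learned model is needed.

```python
import re
from typing import List, Set


def split_points(words: List[str]) -> Set[int]:
    """Character offsets in the joined string where one word ends and the next begins."""
    points, offset = set(), 0
    for word in words[:-1]:
        offset += len(word)
        points.add(offset)
    return points


def heuristic_split(identifier: str) -> List[str]:
    """Hypothetical rule-based splitter (an assumption, not the paper's technique)."""
    words: List[str] = []
    # First split on explicit separator characters: _, -, $
    for chunk in re.split(r"[_\-$]+", identifier):
        # Then split camelCase, acronyms and digit runs:
        # "HTTPServer" -> "HTTP", "Server"; "getItem2" -> "get", "Item", "2"
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [w.lower() for w in words]


def f_measure(predicted: List[str], gold: List[str]) -> float:
    """F1 over split points; both word lists must spell the same lowercased string."""
    if "".join(predicted) != "".join(gold):
        raise ValueError("predicted and gold splits spell different strings")
    pred, true = split_points(predicted), split_points(gold)
    if not pred and not true:
        return 1.0  # both agree the identifier is a single word
    tp = len(pred & true)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    examples = [
        ("getItem2", ["get", "item", "2"]),
        ("MAX_VALUE", ["max", "value"]),
        ("getnextitem", ["get", "next", "item"]),
    ]
    for ident, gold in examples:
        pred = heuristic_split(ident)
        print(ident, pred, round(f_measure(pred, gold), 2))
    # getItem2 ['get', 'item', '2'] 1.0
    # MAX_VALUE ['max', 'value'] 1.0
    # getnextitem ['getnextitem'] 0.0
```

Scoring split points rather than whole identifiers gives partial credit: predicting `get` + `nextitem` for a gold split of `get` + `next` + `item` yields precision 1.0, recall 0.5, and an F-measure of about 0.67.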