Journal: International Journal of Software Engineering and Its Applications
Print ISSN: 1738-9984
Year: 2015
Volume: 9
Issue: 1
Pages: 183-190
DOI: 10.14257/ijseia.2015.9.1.16
Publisher: SERSC
Abstract: Analyzing the unstructured information in source code (that is, its comments and identifiers) rests on the idea that this information reveals, to some extent, concepts from the software's problem domain. It adds a new layer of semantic information to the source code and captures the domain semantics of the software. Developers use identifiers, method names, and comments to incorporate elements of the software's solution domain. Topic models uncover topics in a corpus by analyzing words that frequently co-occur, and these topics embody real-world concepts. Such topics have been found to be effective for describing the major themes spanning a corpus. Software engineering researchers have recently shown that topic models can be effective for structuring various software artifacts, such as bug reports and requirements documents. In this paper, we extract topic models from the textual content of source code through a case study of four Java-based open-source systems: ArgoUML, Checkstyle, JHotDraw, and jEdit. The paper investigates the effectiveness of Latent Dirichlet Allocation (LDA) for comprehending large open-source software systems.
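The abstract describes two steps: collecting the unstructured text (comments and identifiers) of each source file, then fitting a topic model that groups frequently co-occurring words. The following is a minimal sketch of that idea, not the paper's pipeline. It assumes naive regular expressions for Java comments and string literals, camelCase/snake_case identifier splitting, a small hypothetical stop-word list, hypothetical hyperparameters (alpha=0.1, beta=0.01), and LDA fitted by collapsed Gibbs sampling; the abstract does not state which preprocessing or inference method the authors used, and all function names here are illustrative.

```python
import random
import re

# Naive patterns (assumption): they ignore edge cases such as "//" inside string literals.
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)
STRING_RE = re.compile(r'"(?:\\.|[^"\\\n])*"')
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
CAMEL_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")
# Hypothetical stop list: a few Java keywords/types plus common English words.
STOP_WORDS = {
    "public", "private", "protected", "class", "interface", "void", "int",
    "boolean", "string", "return", "new", "static", "final", "import",
    "package", "extends", "implements", "this", "null", "true", "false",
    "for", "while", "else", "the", "and", "with", "from", "that",
}


def split_identifier(name):
    """Split camelCase and snake_case identifiers into lowercase words."""
    words = []
    for part in name.split("_"):
        words.extend(w.lower() for w in CAMEL_RE.findall(part))
    return words


def extract_terms(java_source):
    """Bag of words from the comments and identifiers of one Java source file."""
    comments = COMMENT_RE.findall(java_source)
    code = STRING_RE.sub(" ", COMMENT_RE.sub(" ", java_source))  # drop comments, then string literals
    terms = [w.lower() for c in comments for w in re.findall(r"[A-Za-z]+", c)]
    for ident in IDENT_RE.findall(code):
        terms.extend(split_identifier(ident))
    return [t for t in terms if len(t) > 2 and t not in STOP_WORDS]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling; return (vocab, topic-word counts)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]        # tokens per (document, topic)
    nkw = [[0] * V for _ in range(K)]    # tokens per (topic, word)
    nk = [0] * K                         # tokens per topic
    z = []                               # current topic of every token
    for d, doc in enumerate(docs):       # random initial assignment
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], z[d][i]
                # Remove the token, then resample its topic from the full conditional:
                # p(z=t | rest) is proportional to (ndk+alpha) * (nkw+beta) / (nk+V*beta).
                ndk[d][k] -= 1; nkw[k][wid] -= 1; nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][wid] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][wid] += 1; nk[k] += 1
    return vocab, nkw


def top_words(vocab, nkw, n=5):
    """The n most frequent words of each topic."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
            for row in nkw]


if __name__ == "__main__":
    sources = [
        "/** Text buffer for the editor. */ class TextBuffer { void insertText(String text) {} }",
        "// Draw UML diagram figures\nclass DiagramFigure { void drawFigure() {} }",
    ]
    docs = [extract_terms(src) for src in sources]
    vocab, nkw = lda_gibbs(docs, num_topics=2, iterations=100)
    for k, words in enumerate(top_words(vocab, nkw, n=3)):
        print(f"topic {k}: {words}")
```

Here each file is treated as one document, which is itself an assumption; a library implementation of LDA would normally replace the hand-written sampler, which is spelled out only to show how topics emerge from word co-occurrence across documents.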