
Article Information

  • Title: Automatic Acquisition of Annotated Training Corpora for Test-Code Generation
  • Authors: Magdalena Kacmajor; John D. Kelleher
  • Journal: Information
  • eISSN: 2078-2489
  • Year of publication: 2019
  • Volume: 10
  • Issue: 2
  • Article number: 66
  • DOI: 10.3390/info10020066
  • Language: English
  • Publisher: MDPI Publishing
  • Abstract: Open software repositories make large amounts of source code publicly available. Potentially, this source code could be used as training data to develop new, machine learning-based programming tools. For many applications, however, raw code scraped from online repositories does not constitute an adequate training dataset. Building on the recent and rapid improvements in machine translation (MT), one potentially very interesting application is code generation from natural language descriptions. One of the bottlenecks in developing these MT-inspired systems is the acquisition of the parallel text-code corpora required for training code-generative models. This paper addresses the problem of automatically synthesizing parallel text-code corpora in the software testing domain. Our approach is based on the observation that self-documentation through descriptive method names is widely adopted in test automation, in particular for unit testing. Therefore, we propose synthesizing parallel corpora composed of parsed test function names, serving as code descriptions, aligned with the corresponding function bodies. We present the results of applying one of the state-of-the-art MT methods to such a generated dataset. Our experiments show that a neural MT model trained on our dataset can generate syntactically correct and semantically relevant short Java functions from quasi-natural language descriptions of functionality.
  • Keywords: test automation; code generation; neural machine translation; naturalness of software; statistical semantics
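The abstract's core idea is to turn a descriptive test method name into a quasi-natural-language description and pair it with the method body. The Python sketch below illustrates that idea under stated assumptions. It is not the authors' pipeline. It assumes JUnit-style camelCase or snake_case names with an optional leading `test` token. It also assumes that method names and bodies have already been extracted from the Java source; that extraction step is not shown. The helpers `name_to_description` and `to_parallel_pair` are hypothetical.

```python
import re
from typing import Tuple

# Hypothetical tokenizer for identifier pieces (an assumption, not the paper's rule):
# an uppercase run before a capitalized word ("HTTP" in "HTTPResponse"),
# an optionally capitalized lowercase word, a trailing uppercase run, or digits.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def name_to_description(method_name: str) -> str:
    """Split a camelCase/snake_case test name into lowercase words.

    Assumes a JUnit-style convention in which a leading "test" token
    carries no meaning and is dropped.
    """
    words = []
    for part in method_name.split("_"):
        words.extend(w.lower() for w in _WORD.findall(part))
    if words and words[0] == "test":
        words = words[1:]
    return " ".join(words)


def to_parallel_pair(method_name: str, body: str) -> Tuple[str, str]:
    """Pair the parsed name (source side) with the method body (target side)."""
    return name_to_description(method_name), body.strip()


if __name__ == "__main__":
    pair = to_parallel_pair(
        "testReturnsEmptyListWhenInputIsNull",
        "  assertTrue(parser.parse(null).isEmpty());  ",
    )
    print(pair)
    # → ('returns empty list when input is null', 'assertTrue(parser.parse(null).isEmpty());')
```

Applied over many extracted test methods, each pair would become one line of a text-code parallel corpus. A real corpus would also need filtering for uninformative names such as `test1`, which this sketch does not attempt.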