
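Code Completion Approach Based on Combination of Syntax and Semantics
Citation: FU Shan-Qing, LI Zheng, ZHAO Rui-Lian, GUO Jun-Xia. Code Completion Approach Based on Combination of Syntax and Semantics[J]. Journal of Software (软件学报), 2022, 33(11): 3930-3943
Authors: FU Shan-Qing  LI Zheng  ZHAO Rui-Lian  GUO Jun-Xia
Affiliation: School of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
Funding: National Natural Science Foundation of China (61702029, 61672085, 61872026)
Abstract: In software engineering, code completion is one of the most useful features of an integrated development environment (IDE). It improves development efficiency and has become an important technique for accelerating modern software development. By predicting class names, method names, keywords, and similar elements, code completion improves adherence to coding conventions to some extent and reduces programmers' workload. In recent years, advances in artificial intelligence have driven progress in code completion. In general, intelligent code completion trains a deep learning network on source code, learns code features from a corpus, and makes recommendations and predictions from the features of the code context at the position to be completed. Most existing code feature representations are based on program syntax and do not reflect the program's semantic information. Moreover, the network architectures currently in use still handle long-distance dependencies poorly on long code sequences. This study therefore proposes representing code by combining program control dependencies with syntactic information, and treats code completion as an abstract syntax tree (AST) node prediction problem based on a temporal convolutional network (TCN), so that the model can better learn the syntactic and semantic information of programs and capture longer-range dependencies. Experimental results show that the method improves accuracy by about 2.8% over existing methods.

Keywords: code completion  program syntactic features  program semantic features  feature combination  long-distance dependency  deep learning
Received: 2020-11-08
Revised: 2020-12-15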
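The representation described in the abstract (AST nodes combined with control-dependency information, used for next-node prediction) can be illustrated roughly as follows. This is a minimal sketch, not the authors' pipeline: it assumes Python's standard `ast` module as the parser, uses "nearest enclosing control statement" as a simplified stand-in for control dependence, and the names `flatten_with_control_context` and `next_node_examples` are hypothetical.

```python
import ast

# Assumption: these statement types stand in for "control" nodes.
CONTROL_NODES = (ast.If, ast.For, ast.While, ast.Try)


def flatten_with_control_context(source):
    """Return (node_type, enclosing_control_type) pairs in depth-first pre-order.

    The second element is the type name of the nearest enclosing control
    statement, or "ROOT" if there is none. This is only a crude proxy for
    control dependence, not a full control-dependence analysis.
    """
    tree = ast.parse(source)
    pairs = []

    def visit(node, controller):
        pairs.append((type(node).__name__, controller))
        # Everything nested under a control statement is tagged with it.
        inner = type(node).__name__ if isinstance(node, CONTROL_NODES) else controller
        for child in ast.iter_child_nodes(node):
            visit(child, inner)

    visit(tree, "ROOT")
    return pairs


def next_node_examples(pairs, context_size=3):
    """Build (context, target_node_type) pairs for next-node prediction."""
    examples = []
    for i in range(1, len(pairs)):
        context = pairs[max(0, i - context_size):i]
        examples.append((context, pairs[i][0]))
    return examples


source = "if x:\n    y = 1\n"
pairs = flatten_with_control_context(source)
for context, target in next_node_examples(pairs):
    print(context, "->", target)
```

Each training example pairs a window of preceding (syntax, control-context) tokens with the type of the node that follows, which is the shape of input a sequence model would need for AST node prediction.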

Code Completion Approach Based on Combination of Syntax and Semantics
FU Shan-Qing, LI Zheng, ZHAO Rui-Lian, GUO Jun-Xia. Code Completion Approach Based on Combination of Syntax and Semantics[J]. Journal of Software, 2022, 33(11): 3930-3943
Authors: FU Shan-Qing  LI Zheng  ZHAO Rui-Lian  GUO Jun-Xia
Affiliation: School of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
Abstract: In the field of software engineering, code completion is one of the most useful technologies in the integrated development environment (IDE). It improves the efficiency of software development and has become an important technology for accelerating modern software development. Predicting class names, method names, keywords, and so on through code completion improves code conformance to some extent and reduces the workload of programmers. In recent years, the development of artificial intelligence has promoted the development of code completion. In general, intelligent code completion trains a deep learning network on source code to learn code features from a corpus, and makes recommendations and predictions based on the features of the context code at the location to be completed. Most existing code feature representations are based on program syntax and do not reflect the semantic information of the program. In addition, the network structures currently used are still weak at handling long-distance dependencies in long code sequences. Therefore, this study proposes a method that represents code by combining program control dependencies with syntax information, and treats code completion as an abstract syntax tree (AST) node prediction problem based on a temporal convolutional network (TCN). The resulting network model can better learn the syntactic and semantic information of the program and can capture longer-range dependencies. Experimental results show that the method is about 2.8% more accurate than existing methods.
Keywords: code completion  program syntax feature  program semantic feature  feature combination  long-distance dependency  deep learning
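The claim about capturing longer-range dependencies relates to the dilated causal convolutions that TCNs are generally built from. The following is a minimal sketch in plain Python, assuming one convolution per layer, hand-picked kernel weights, and dilations that double per layer; the abstract does not give the paper's actual architecture or hyperparameters, and the function names here are hypothetical.

```python
def causal_dilated_conv(x, weights, dilation):
    """Compute y[t] = sum_j weights[j] * x[t - j * dilation].

    Positions before the start of the sequence are treated as zero, so
    y[t] never depends on inputs later than t (causality).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input positions one output can see, one conv per layer."""
    return 1 + (kernel_size - 1) * sum(dilations)


def tcn_stack(x, weights, dilations):
    """Apply causal dilated convolutions in sequence (no nonlinearity)."""
    for d in dilations:
        x = causal_dilated_conv(x, weights, d)
    return x


# An impulse at position 0 spreads across exactly the receptive field.
impulse = [1.0] + [0.0] * 15
spread = tcn_stack(impulse, weights=[1.0, 1.0], dilations=[1, 2, 4])
print(receptive_field(2, [1, 2, 4]))  # 8
print(spread)
```

With kernel size 2 and dilations 1, 2, 4, three layers already cover 8 positions; doubling the dilation per layer makes the receptive field grow exponentially with depth, which is why this family of networks is used for long sequences.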
This article is indexed in Wanfang Data and other databases.