Method for Generating Code Comments Based on Structure-aware Hybrid Encoding Model

Citation: CAI Ruichu, ZHANG Shengqiang, XU Boyan. Method for Generating Code Comments Based on Structure-aware Hybrid Encoding Model[J]. Computer Engineering, 2023, 49(2): 61-69.
Authors: CAI Ruichu  ZHANG Shengqiang  XU Boyan
Affiliation: School of Computer Science, Guangdong University of Technology, Guangzhou 510006, China
Fund Program: National Natural Science Foundation of China (61876043); National Science Fund for Excellent Young Scholars (62122022); Science and Technology Program of Guangzhou (201902010058)
Abstract: Code comments improve the readability of program code, which raises software development efficiency and reduces costs. Existing code comment generation methods feed either the token-sequence representation or the Abstract Syntax Tree (AST) representation of a program into encoder networks of different structures, so they cannot fuse the structural characteristics of the program's different abstract forms, and the generated comments read poorly. This study proposes a Structure-aware Hybrid Encoding (SHE) model that considers both the sequence representation and the structure representation of program code: a sequence encoding layer and a graph encoding layer capture the program's sequence information and grammatical structure information, respectively, and an aggregation encoding step fuses the two kinds of information into the decoder. The study further designs a Structure-aware Graph Attention (SGAT) network that embeds the hierarchy and type information of the program's grammatical structure into the learning parameters of the graph attention network, effectively improving the SHE model's ability to learn the complex grammatical structure of program code. Experimental results show that, compared with the Structure-induced Transformer (SiT) baseline, the SHE model improves the Bi-Lingual Evaluation Understudy (BLEU), Recall-Oriented Understudy for Gisting Evaluation-Longest Common Subsequence (ROUGE-L), and Metric for Evaluation of Translation with Explicit ORdering (METEOR) scores by 2.68%, 1.47%, and 3.82% on the Python dataset and by 2.51%, 2.24%, and 3.55% on the Java dataset, respectively, generating more accurate code comments.
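To make the described architecture concrete, below is a minimal PyTorch-style sketch of the two components the abstract names: a graph-attention layer whose scores are biased by AST hierarchy levels and edge types, and a hybrid encoder that fuses a sequence encoding layer with that graph layer before decoding. Every module name, dimension, the additive-bias formulation, and the gated fusion here are illustrative assumptions made for this sketch, not the authors' released implementation; the paper's aggregation encoding may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

class StructureAwareGraphAttention(nn.Module):
    """One graph-attention layer whose scores are biased by each node's
    AST hierarchy level and by the edge type between nodes, mirroring the
    abstract's idea of embedding structural information into the learning
    parameters. The additive-bias formulation is an assumption."""

    def __init__(self, d_model, num_levels, num_edge_types):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.level_bias = nn.Embedding(num_levels, 1)      # per-node depth bias
        self.type_bias = nn.Embedding(num_edge_types, 1)   # per-edge type bias
        self.scale = d_model ** 0.5

    def forward(self, h, adj, levels, edge_types):
        # h: (N, d) node states; adj: (N, N) 0/1 adjacency (self-loops assumed
        # present so every softmax row has a finite score); levels: (N,) node
        # depths; edge_types: (N, N) edge-type ids.
        scores = self.q(h) @ self.k(h).t() / self.scale            # (N, N)
        scores = scores + self.level_bias(levels).view(1, -1)      # key-side depth bias
        scores = scores + self.type_bias(edge_types).squeeze(-1)   # edge-type bias
        scores = scores.masked_fill(adj == 0, float("-inf"))       # attend along AST edges only
        return F.softmax(scores, dim=-1) @ self.v(h)

class HybridEncoder(nn.Module):
    """Sequence encoding layer plus graph encoding layer over the same token
    positions (graph nodes are assumed to align one-to-one with code tokens),
    fused by a learned gate as one plausible form of aggregation encoding."""

    def __init__(self, vocab_size, d_model, num_levels, num_edge_types):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.seq_enc = nn.TransformerEncoder(layer, num_layers=2)  # sequence information
        self.graph_enc = StructureAwareGraphAttention(d_model, num_levels, num_edge_types)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, tokens, adj, levels, edge_types):
        h = self.embed(tokens)                              # (1, N, d)
        seq_h = self.seq_enc(h).squeeze(0)                  # (N, d) context states
        graph_h = self.graph_enc(h.squeeze(0), adj, levels, edge_types)
        g = torch.sigmoid(self.gate(torch.cat([seq_h, graph_h], dim=-1)))
        return g * seq_h + (1 - g) * graph_h                # fused memory for the comment decoder

# Toy usage with hypothetical sizes: a 5-token snippet whose AST aligns with tokens.
tokens = torch.randint(0, 100, (1, 5))
adj = torch.eye(5, dtype=torch.long)                        # self-loops only, for brevity
levels = torch.tensor([0, 1, 1, 2, 2])
edge_types = torch.zeros(5, 5, dtype=torch.long)
enc = HybridEncoder(vocab_size=100, d_model=64, num_levels=8, num_edge_types=4)
memory = enc(tokens, adj, levels, edge_types)               # (5, 64) fused encoder output

The gated sum above is only one plausible reading of the fusion step; concatenation followed by a linear projection, or decoder cross-attention over both memories, would serve the same aggregating role.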

Keywords: code comment generation; hybrid encoding model; graph attention network; deep self-attention network; natural language processing
Received: 2021-12-21
Revised: 2022-02-18
