
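Automatic Code Semantic Tag Generation Approach Based on Software Knowledge Graph
Cite this article:XING Shuang-Shuang (邢双双),LIU Ming-Wei (刘名威),PENG Xin (彭鑫).Automatic Code Semantic Tag Generation Approach Based on Software Knowledge Graph[J].Journal of Software (软件学报),2022,33(11):4027-4045.
Authors:XING Shuang-Shuang (邢双双)  LIU Ming-Wei (刘名威)  PENG Xin (彭鑫)
Affiliation:School of Computer Science, Fudan University, Shanghai 201203, China;Shanghai Key Laboratory of Data Science (Fudan University), Shanghai 201203, China
Funding:National Natural Science Foundation of China (61972098)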
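Abstract:Code snippets in open-source and enterprise software projects and on various software development websites are important software development resources. However, many developers' code search needs reflect the high-level intent and topic of the code, and information retrieval techniques based on code text struggle to serve such needs precisely. Semantic tags that reflect the overall intent and topic of code are therefore very important for improving code search and assisting code comprehension. Existing tag generation techniques mainly target textual content or rely on historical data, and cannot meet the needs of large-scale code semantic annotation or of assisting search and comprehension. To address this problem, a knowledge-graph-based approach to automatic code semantic tag generation, KGCodeTagger, is proposed. The approach constructs a software knowledge graph by extracting concepts and relations from API documentation and software development Q&A text, and uses the graph as the basis for code semantic tag generation. For a given piece of code, the approach identifies and extracts general API invocations and concept mentions and links them to the relevant concepts in the software knowledge graph. On this basis, it further identifies other concepts related to the linked concepts as candidates, then ranks them by diversity and representativeness to produce the final code semantic tags. Experiments evaluate each step of KGCodeTagger's software knowledge graph construction, and the quality of the generated code semantic tags is evaluated through comparison with several existing baseline approaches. The results show that the knowledge graph construction steps of KGCodeTagger are reasonable and effective, and that the generated code semantic tags are high-quality and meaningful and can help developers quickly understand the intent of the code.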

Keywords:program comprehension  code search  knowledge graph  semantic tag
Received:2020-12-25
Revised:2021-02-13

Automatic Code Semantic Tag Generation Approach Based on Software Knowledge Graph
XING Shuang-Shuang,LIU Ming-Wei,PENG Xin.Automatic Code Semantic Tag Generation Approach Based on Software Knowledge Graph[J].Journal of Software,2022,33(11):4027-4045.
Authors:XING Shuang-Shuang  LIU Ming-Wei  PENG Xin
Affiliation:School of Computer Science, Fudan University, Shanghai 201203, China;Shanghai Key Laboratory of Data Science (Fudan University), Shanghai 201203, China
Abstract:Code snippets in open-source and enterprise software projects and on various software development websites are important software development resources. However, developers' code search needs often reflect high-level intentions and topics, which are difficult to satisfy with code search techniques based on information retrieval over code text. It is thus highly desirable for code snippets to be accompanied by semantic tags that reflect their high-level intentions and topics, to facilitate code search and understanding. Existing tag generation techniques are mainly oriented to text content or rely on historical data, and cannot meet the needs of large-scale code semantic annotation or of assisting code search and understanding. To address this issue, this study proposes an approach based on a software knowledge graph, called KGCodeTagger, that automatically generates semantic tags for code snippets. KGCodeTagger constructs a software knowledge graph from concepts and relations extracted from API documentation and software development Q&A text, and uses the knowledge graph as the basis for code semantic tag generation. Given a code snippet, KGCodeTagger identifies and extracts API invocations and concept mentions, and then links them to the corresponding concepts in the software knowledge graph. On this basis, the approach identifies other concepts related to the linked concepts as candidates and ranks them by diversity and representativeness to produce the final semantic tags. The software knowledge graph construction steps of KGCodeTagger are evaluated experimentally, and the quality of the generated code tags is evaluated through comparison with several existing baseline approaches. The results show that the knowledge graph construction steps are reasonable and effective, and that the generated code semantic tags are high-quality and meaningful and can help developers quickly understand the intention of the code.
Keywords:program comprehension  code search  knowledge graph  semantic tag
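
The tagging pipeline outlined in the abstract (link API mentions to graph concepts, collect related concepts as candidates, rank by representativeness and diversity) might be sketched roughly as follows. This is only an illustration under stated assumptions, not KGCodeTagger's implementation: the toy graph `KG`, the lookup table `API_INDEX`, the function names, the token-matching linker, and the greedy scoring with a `penalty` weight are all hypothetical, since the abstract does not specify how linking or ranking is actually computed.

```python
import re

# Hypothetical toy knowledge graph: API concept -> related concepts.
# The real graph is built from API documentation and Q&A text.
KG = {
    "java.io.BufferedReader": {"file reading", "buffered io", "stream"},
    "java.io.FileReader": {"file reading", "file", "stream"},
    "java.util.ArrayList": {"list", "collection"},
}

# Hypothetical index from simple API names to graph concepts.
API_INDEX = {
    "BufferedReader": "java.io.BufferedReader",
    "FileReader": "java.io.FileReader",
    "ArrayList": "java.util.ArrayList",
}


def link_concepts(code):
    """Link identifiers in the code to known API concepts (naive token match)."""
    tokens = set(re.findall(r"[A-Za-z_]\w*", code))
    return sorted(API_INDEX[t] for t in tokens if t in API_INDEX)


def supporters(tag, linked):
    """Linked concepts that are related to the candidate tag."""
    return {c for c in linked if tag in KG[c]}


def select_tags(code, k=2, penalty=0.8):
    """Greedily pick k tags, trading representativeness against redundancy."""
    linked = link_concepts(code)
    # Candidates: every concept related to at least one linked concept.
    candidates = {tag for c in linked for tag in KG[c]}
    selected = []

    def gain(tag):
        sup = supporters(tag, linked)
        # Representativeness (assumed): how many linked concepts support the tag.
        rep = len(sup)
        # Redundancy (assumed): highest Jaccard overlap with an already chosen tag.
        overlap = max(
            (len(sup & supporters(s, linked)) / len(sup | supporters(s, linked))
             for s in selected),
            default=0.0,
        )
        return rep * (1 - penalty * overlap)

    while candidates and len(selected) < k:
        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(candidates), key=gain)
        selected.append(best)
        candidates.remove(best)
    return selected


snippet = (
    "BufferedReader r = new BufferedReader(new FileReader(path)); "
    "List<String> xs = new ArrayList<>();"
)
print(select_tags(snippet))
```

In this sketch, a tag supported by many linked concepts scores high, while a tag supported by the same concepts as an already selected tag is discounted, so the second pick tends to cover a different part of the snippet.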