Approach to Searching Software Source Code with Graph Embedding (基于图嵌入的软件项目源代码检索方法)
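Cite this article: LING Chun-Yang, ZOU Yan-Zhen, LIN Ze-Qi, XIE Bing, ZHAO Jun-Feng. Approach to Searching Software Source Code with Graph Embedding[J]. Journal of Software (软件学报), 2019, 30(5): 1481-1497.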
Authors: LING Chun-Yang (凌春阳), ZOU Yan-Zhen (邹艳珍), LIN Ze-Qi (林泽琦), XIE Bing (谢冰), ZHAO Jun-Feng (赵俊峰)
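Affiliations:
LING Chun-Yang, LIN Ze-Qi, XIE Bing: Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871, China; School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
ZOU Yan-Zhen, ZHAO Jun-Feng: Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871, China; School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China; Peking University Information Technology Institute (Tianjin Binhai), Tianjin 300450, China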
Funding: National Key Research and Development Program of China (2016YFB1000801); National Science Fund for Distinguished Young Scholars (61525201)
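Abstract: Source code search is an important research problem in software engineering; its main task is to retrieve and reuse the APIs (application program interfaces) of software projects. As software projects grow larger and more complex, source code search needs, on the one hand, to improve the accuracy of natural-language API queries and, on the other hand, to locate and present the associations between a target API and its related code, so that users can better understand the API's implementation logic and usage scenarios. To this end, this paper proposes a graph-embedding-based approach to searching software project source code. The approach automatically builds a code structure graph from a project's source code and represents the source code information through graph embedding. On this basis, a user can enter a natural-language question, and the approach retrieves and returns a connected code subgraph made up of the relevant APIs and their associated information, improving the efficiency of API search and reuse. In search experiments on the open-source projects Apache Lucene and POI, the F1 score of the approach's results was 10% higher than that of an existing shortest-path-based approach, and the average response time was significantly shorter.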
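The retrieval step described above can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the code graph (`GRAPH`), the 2-D node embeddings (`EMB`), the query vector and the helper names `top_k`, `guided_path` and `answer_subgraph` are all hypothetical, whereas the paper builds its graph from Lucene/POI source code and learns the embeddings. Candidate APIs are ranked by cosine similarity to a query vector, and each candidate is then joined to the top-ranked one by a greedy best-first search that uses embedding distance as its heuristic, so the answer is a connected subgraph that may contain connector nodes the query did not match directly. The abstract does not specify how the paper encodes a question or selects the subgraph, so this is one plausible reading only.

```python
import heapq
import math

# Hypothetical toy code graph (undirected adjacency lists). A real system would
# extract nodes (classes, methods) and edges (calls, member-of, ...) from source.
GRAPH = {
    "IndexWriter": ["Document", "Directory", "IndexWriterConfig"],
    "Document": ["IndexWriter", "Field"],
    "Field": ["Document"],
    "Directory": ["IndexWriter", "FSDirectory"],
    "FSDirectory": ["Directory"],
    "IndexWriterConfig": ["IndexWriter", "Analyzer"],
    "Analyzer": ["IndexWriterConfig"],
}

# Hypothetical 2-D node embeddings; a real system would learn larger vectors.
EMB = {
    "IndexWriter": (1.0, 0.0),
    "Document": (0.9, -0.1),
    "Field": (0.3, 0.95),
    "Directory": (0.7, -0.7),
    "FSDirectory": (0.4, -0.9),
    "IndexWriterConfig": (0.9, -0.3),
    "Analyzer": (0.2, -0.95),
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def top_k(query_vec, k):
    """Rank all API nodes by cosine similarity to the query vector."""
    ranked = sorted(EMB, key=lambda n: cosine(EMB[n], query_vec), reverse=True)
    return ranked[:k]


def guided_path(src, dst):
    """Greedy best-first search from src to dst: always expand the frontier
    node whose embedding is closest (Euclidean) to dst's embedding.
    The path found is not guaranteed to be the shortest one."""
    frontier = [(math.dist(EMB[src], EMB[dst]), src)]
    parent = {src: None}
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nb in GRAPH[node]:
            if nb not in parent:
                parent[nb] = node
                heapq.heappush(frontier, (math.dist(EMB[nb], EMB[dst]), nb))
    return None  # dst is unreachable from src


def answer_subgraph(query_vec, k=2):
    """Return (nodes, edges) of a connected subgraph linking the top-k APIs.
    Seeds unreachable from the top-ranked node are skipped."""
    seeds = top_k(query_vec, k)
    nodes, edges = {seeds[0]}, set()
    for target in seeds[1:]:
        path = guided_path(seeds[0], target)
        if path is None:
            continue
        nodes.update(path)
        edges.update(tuple(sorted(pair)) for pair in zip(path, path[1:]))
    return nodes, edges


if __name__ == "__main__":
    q = (1.0, 1.0)  # hypothetical embedding of a natural-language question
    print(answer_subgraph(q, k=2))
```

With the hypothetical query vector `(1.0, 1.0)`, the two top-ranked APIs are `Field` and `IndexWriter`. They are not adjacent in the toy graph, so the returned subgraph also includes `Document` as the connecting node, which is the kind of "API plus related code" answer the abstract describes.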

Keywords: API search; code search; code graph; graph embedding
Received: 2018-08-31
Revised: 2018-10-31

Approach to Searching Software Source Code with Graph Embedding
LING Chun-Yang, ZOU Yan-Zhen, LIN Ze-Qi, XIE Bing and ZHAO Jun-Feng. Approach to Searching Software Source Code with Graph Embedding[J]. Journal of Software, 2019, 30(5): 1481-1497.
Authors: LING Chun-Yang, ZOU Yan-Zhen, LIN Ze-Qi, XIE Bing and ZHAO Jun-Feng
Affiliations:
LING Chun-Yang, LIN Ze-Qi, XIE Bing: Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871, China; School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
ZOU Yan-Zhen, ZHAO Jun-Feng: Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871, China; School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China; Peking University Information Technology Institute (Tianjin Binhai), Tianjin 300450, China
Abstract: Searching software source code and locating a software project's APIs (application program interfaces) are important research issues in software engineering. As software projects become increasingly complex, existing search tools face two main challenges. First, search based on natural-language questions needs to return more accurate results. Second, the relationships between APIs need to be presented so that users can understand the APIs' underlying logic and usage scenarios more quickly. This study proposes a novel approach to searching a software project's APIs based on graph embedding, aimed at improving the accuracy of natural-language-based code graph search. The approach automatically builds a code graph from a software project's source code and represents it through graph embedding. For a natural-language question, a connected code subgraph composed of the relevant APIs and their associated relationships is returned as the best answer. In experiments, the Apache Lucene and POI projects were used to perform API search tasks. The results show that the proposed approach improves the F1-score by 10% over an existing shortest-path-based approach while significantly reducing the average response time.
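The abstract reports its results as F1 scores. For a single question, a returned set of APIs can be scored against a hand-labelled set of relevant APIs as in the sketch below; the API names are hypothetical, the function name `precision_recall_f1` is illustrative, and the abstract does not say how the paper aggregates scores across questions.

```python
def precision_recall_f1(returned, relevant):
    """Set-based precision, recall and F1 (harmonic mean of P and R)."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    if hits == 0:
        # No overlap (or an empty set): all three scores are zero.
        return 0.0, 0.0, 0.0
    precision = hits / len(returned)
    recall = hits / len(relevant)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 3 APIs returned, all relevant; 1 relevant API missed.
p, r, f1 = precision_recall_f1(
    {"Field", "Document", "IndexWriter"},
    {"Field", "Document", "IndexWriter", "Directory"},
)
print(f"P={p:.2f} R={r:.2f} F1={f1:.3f}")  # P=1.00 R=0.75 F1=0.857
```

Because F1 is a harmonic mean, a returned subgraph that is precise but misses relevant APIs (or one that finds them all but adds many irrelevant connector nodes) is penalized on both sides.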
Keywords: API search; code search; code graph; graph embedding