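Software Text Semantic Search Approach Based on Code Structure Knowledge
Cite this article: LIN Ze-Qi, ZOU Yan-Zhen, ZHAO Jun-Feng, CAO Ying-Kui, XIE Bing. Software Text Semantic Search Approach Based on Code Structure Knowledge[J]. Journal of Software, 2019, 30(12): 3714-3729.
Authors: LIN Ze-Qi  ZOU Yan-Zhen  ZHAO Jun-Feng  CAO Ying-Kui  XIE Bing
Affiliations: LIN Ze-Qi, CAO Ying-Kui, XIE Bing: Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871, China; School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China. ZOU Yan-Zhen, ZHAO Jun-Feng: Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871, China; School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China; Peking University Information Technology Institute (Tianjin Binhai), Tianjin 300450, China
Funding: National Key Research and Development Program of China (2016YFB1000801); National Science Fund for Distinguished Young Scholars (61525201)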
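Abstract: Documentation written as natural language text is an important part of a software project. Helping developers locate information efficiently and accurately in a large body of documents is an important research problem in software reuse. This paper proposes a semantic search approach for software documents based on code structure knowledge. The approach parses a code structure graph from the source code of a software project and uses it as domain-specific knowledge to help machines understand the semantics of natural language text. This semantic information is combined with information retrieval techniques to achieve semantic retrieval of software documents. Experiments on a StackOverflow question-and-answer document dataset show that, compared with several text retrieval methods, the approach improves mean average precision (MAP) by at least 13.77%.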

Keywords: software reuse  natural language text  code structure knowledge  information retrieval  semantic search
Received: 2017/10/9
Revised: 2018/5/7

Software Text Semantic Search Approach Based on Code Structure Knowledge
LIN Ze-Qi,ZOU Yan-Zhen,ZHAO Jun-Feng,CAO Ying-Kui and XIE Bing.Software Text Semantic Search Approach Based on Code Structure Knowledge[J].Journal of Software,2019,30(12):3714-3729.
Authors:LIN Ze-Qi  ZOU Yan-Zhen  ZHAO Jun-Feng  CAO Ying-Kui and XIE Bing
Affiliation: LIN Ze-Qi, CAO Ying-Kui, XIE Bing: Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871, China; School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China. ZOU Yan-Zhen, ZHAO Jun-Feng: Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871, China; School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China; Peking University Information Technology Institute (Tianjin Binhai), Tianjin 300450, China
Abstract: Natural language text is a common form of knowledge representation in various software artifacts. In the practice of software reuse, developers often need to search large amounts of textual resources. This paper presents a software text semantic search approach based on code structure knowledge. The approach extracts a code structure graph from software source code and leverages it as a domain-specific knowledge base to analyze the semantics of natural language texts. This semantic information is combined with information retrieval technology to re-rank text search results semantically. Experimental results on a StackOverflow dataset show that the approach achieves at least a 13.77% improvement in mean average precision (MAP) compared with several text retrieval approaches.
Keywords:software reuse  natural language text  code structure knowledge  information retrieval  semantic search
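The abstract reports its result in mean average precision (MAP). As a minimal sketch of how that metric is conventionally computed (this is the standard textbook definition, not code from the paper; the function names and the toy document IDs are assumptions made for illustration):

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked result list.

    Assumes `ranked` contains no duplicate document IDs.
    Returns 0.0 when the query has no relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[str], Set[str]]]
) -> float:
    """Mean of average precision over (ranked list, relevant set) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical toy queries, not data from the paper.
    queries = [
        (["a", "b", "c"], {"a", "c"}),
        (["x", "a"], {"a"}),
    ]
    print(round(mean_average_precision(queries), 4))
```

A re-ranking step such as the one described in the abstract changes only the order of `ranked` for each query, so the same function can score a baseline ranking and a re-ranked one for comparison.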