
Software Text Semantic Search Approach Based on Code Structure Knowledge

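Cite this article:	LIN Ze-Qi, ZOU Yan-Zhen, ZHAO Jun-Feng, CAO Ying-Kui, XIE Bing. Software Text Semantic Search Approach Based on Code Structure Knowledge[J]. Journal of Software, 2019, 30(12): 3714-3729.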

Authors:	LIN Ze-Qi, ZOU Yan-Zhen, ZHAO Jun-Feng, CAO Ying-Kui, XIE Bing

Affiliations:	Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871, China; School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China (all authors); Peking University Information Technology Institute (Tianjin Binhai), Tianjin 300450, China (ZOU Yan-Zhen and ZHAO Jun-Feng)

Funding:	National Key Research and Development Program of China (2016YFB1000801); National Science Fund for Distinguished Young Scholars (61525201)

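Abstract:	Documentation written in natural language is an important part of a software project. Helping developers locate information efficiently and accurately in large volumes of documentation is an important research problem in software reuse. This paper proposes a semantic search approach for software documentation based on code structure knowledge. The approach parses a code structure graph from a project's source code. It then uses this graph as domain-specific knowledge to help machines understand the semantics of natural language text. The resulting semantic information is combined with information retrieval techniques to re-rank text search results semantically, enabling semantic search over software documentation. Experiments on a StackOverflow Q&A dataset show that, compared with several text retrieval approaches, the approach improves mean average precision (MAP) by at least 13.77%.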
Keywords:	software reuse; natural language text; code structure knowledge; information retrieval; semantic search
Received:	2017/10/9
Revised:	2018/5/7
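
The abstract describes the approach only at a high level. The following Python snippet is therefore a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It makes four assumptions:

- The code structure graph is a hypothetical, hand-written dict that maps code-element names to structurally related names.
- Text is linked to graph nodes by exact identifier match.
- Semantic relevance is the Jaccard overlap between the query's entities (expanded by one hop in the graph) and the document's entities.
- Candidates are re-ranked by a linear blend with the initial retrieval score, using a hypothetical weight `alpha`.

The `average_precision` and `mean_average_precision` functions follow the standard definitions of AP and MAP, the metric reported in the evaluation.

```python
import re
from typing import Dict, List, Set, Tuple

# HYPOTHETICAL: a code structure graph maps each code element (class or
# method name) to the elements it is structurally linked to (e.g. declares,
# calls, returns). In practice such a graph would be parsed from source code.
CodeGraph = Dict[str, Set[str]]


def mentioned_entities(text: str, graph: CodeGraph) -> Set[str]:
    """Link text to the graph by exact identifier-token match (an assumption)."""
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text))
    return {node for node in graph if node in tokens}


def expand(entities: Set[str], graph: CodeGraph) -> Set[str]:
    """Add the one-hop structural neighbours of each entity."""
    expanded = set(entities)
    for entity in entities:
        expanded |= graph.get(entity, set())
    return expanded


def semantic_score(query: str, doc: str, graph: CodeGraph) -> float:
    """Jaccard overlap of the expanded query entities and the document entities."""
    q = expand(mentioned_entities(query, graph), graph)
    d = mentioned_entities(doc, graph)
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)


def rerank(query: str, candidates: List[Tuple[str, str, float]],
           graph: CodeGraph, alpha: float = 0.5) -> List[str]:
    """Re-rank (doc_id, text, ir_score) candidates by a linear blend of scores.

    Assumes ir_score is already normalised to [0, 1]; alpha is a hypothetical weight.
    """
    scored = [(alpha * ir + (1 - alpha) * semantic_score(query, text, graph), doc_id)
              for doc_id, text, ir in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [doc_id for _, doc_id in scored]


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Sum of precision@k at each rank k holding a relevant doc, divided by |relevant|."""
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: List[Tuple[List[str], Set[str]]]) -> float:
    """MAP: the mean of AP over all queries (one (ranking, relevant) pair per query)."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy example (all names and scores are made up for illustration).
TOY_GRAPH: CodeGraph = {
    "Tokenizer": {"tokenize", "Token"},
    "tokenize": {"Tokenizer", "Token"},
    "Token": {"Tokenizer"},
    "Parser": {"parse"},
    "parse": {"Parser"},
}
QUERY = "How do I split text with Tokenizer?"
CANDIDATES = [  # (doc_id, text, normalised IR score), in original IR order
    ("d1", "Parser.parse() reads a whole file.", 0.9),
    ("d2", "Each call to tokenize() returns a Token.", 0.6),
    ("d3", "Text splitting tips.", 0.5),
]
RELEVANT = {"d2"}

if __name__ == "__main__":
    baseline = [doc_id for doc_id, _, _ in CANDIDATES]
    reranked = rerank(QUERY, CANDIDATES, TOY_GRAPH)
    print(reranked)  # ['d2', 'd1', 'd3']
    print(mean_average_precision([(baseline, RELEVANT)]))  # 0.5
    print(mean_average_precision([(reranked, RELEVANT)]))  # 1.0
```

On the toy data, the candidate mentioning `tokenize` and `Token` moves from rank 2 to rank 1. AP for this single query therefore rises from 0.5 to 1.0. A linear blend is only one simple way to combine the two signals; the entity linking, graph construction, and scoring used in the paper may differ.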
