
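A Method for Automatically Constructing a Wikipedia-Based Semantic Knowledge Base of Domain Concepts*
Cite this article: Zhang Qiaoyan, Lin Min, Zhang Shujun. A method for automatically constructing a Wikipedia-based semantic knowledge base of domain concepts*[J]. Application Research of Computers (计算机应用研究), 2018, 35(1).
Authors: Zhang Qiaoyan, Lin Min, Zhang Shujun
Affiliation: College of Computer and Information Engineering, Inner Mongolia Normal University (all three authors)
Funding: National Natural Science Foundation of China (61562068); National Natural Science Foundation of China funded project (61640204); Natural Science Foundation of Inner Mongolia (2015MS0629); Natural Science Foundation of Inner Mongolia (2014MS0617); Inner Mongolia Ethnic Affairs Commission special support subproject for Mongolian-language informatization (MW-2014-MGYWXXH-01); Scientific Research Project of Higher Education Institutions of the Inner Mongolia Autonomous Region (NJZY028); Inner Mongolia Normal University research start-up fund for introduced talent (2014YJRC036); Inner Mongolia Normal University university-level fund (2015YBXM002).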
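Abstract: To address the incomplete and inaccurate content of semantic knowledge bases that serve retrieval, this paper proposes a Wikipedia-based method for constructing a semantic knowledge base of software engineering concepts. First, taking the concepts of SWEBOK V3 as the standard, the explanatory text of each concept is extracted from Wikipedia, and keywords are extracted from that text to represent the concept's semantics. Second, the concept semantic knowledge base is built from three kinds of relations: the hierarchical relations among concepts in Wikipedia, the link relations between concepts and the keywords of other concepts' explanatory texts, and the link relations between the keywords of different concepts' explanatory texts. Next, keywords are extracted by two methods that combine the LDA topic model with the TF-IDF algorithm and with the TextRank algorithm, respectively. Finally, a random walk algorithm is used to compute the semantic similarity between concepts on the constructed knowledge base. Comparing the experimental results with manual annotations shows that the semantic similarity accuracy of the knowledge base built by this method reaches over 84%, which verifies the effectiveness of the proposed method.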

Keywords: Wikipedia; semantic knowledge base; keyword extraction; semantic similarity computation; random walk
Received: 2016/8/14
Revised: 2017/11/20
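
The keyword-extraction step named in the abstract can be illustrated with a minimal TF-IDF sketch. It covers only the TF-IDF part; the combination with an LDA topic model that the abstract describes is omitted, and the abstract does not say how the two are combined. The function name `tfidf_keywords`, the `top_k` parameter, and the toy texts are assumptions made for illustration, not taken from the paper.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    keywords = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score, then alphabetically for a stable order.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_k])
    return keywords


# Hypothetical toy "explanatory texts" for three concepts (not from the paper).
docs = [
    "software testing verifies software behavior with test cases".split(),
    "software design defines architecture and components".split(),
    "software requirements describe needs and constraints".split(),
]
keywords = tfidf_keywords(docs)
```

Because "software" occurs in every toy text, its inverse document frequency is zero and it never ranks as a keyword, which is the behavior that makes TF-IDF useful for separating concept-specific terms from domain-wide ones.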

The research on automatic construction of domain concepts on Wikipedia semantic knowledge base
Zhang Qiaoyan, Lin Min and Zhang Shujun. The research on automatic construction of domain concepts on Wikipedia semantic knowledge base[J]. Application Research of Computers, 2018, 35(1).
Authors: Zhang Qiaoyan, Lin Min and Zhang Shujun
Affiliation: College of Computer and Information Engineering, Inner Mongolia Normal University, Hohhot, Inner Mongolia
Abstract: Semantic knowledge bases that serve retrieval suffer from incomplete and inaccurate content. To address this problem, this paper proposes a method for constructing a concept semantic knowledge base for the software engineering domain based on Wikipedia. First, taking the concepts of SWEBOK V3 as the standard, the explanatory text of each concept is extracted from Wikipedia, and keywords are extracted from it to represent the semantics of the concept. Second, the concept semantic knowledge base is built from the hierarchical relationships among concepts in Wikipedia, the link relationships between concepts and the keywords of other concepts' explanatory texts, and the link relationships between the keywords of different concepts' explanatory texts. Then, keywords are extracted by two methods that combine the LDA topic model with the TF-IDF algorithm and with the TextRank algorithm, respectively. Finally, the semantic similarity between concepts is calculated on the constructed knowledge base with a random walk algorithm. Comparing the experimental results with manual annotations shows that the semantic similarity accuracy of the knowledge base constructed by this method exceeds 84%, which verifies the effectiveness of the proposed method.
Keywords: Wikipedia; semantic knowledge base; keyword extraction; semantic similarity computation; random walk
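
The similarity step can be sketched as a random walk over a concept graph. The abstract does not specify which random walk variant is used, so the version below, a random walk with restart, is an assumption chosen for illustration. The function name, the `restart` and `iterations` parameters, and the toy graph of concepts are likewise hypothetical. The score of a node approximates how often a walker that keeps returning to the start concept visits it, and a higher score is read as higher semantic relatedness.

```python
def random_walk_with_restart(graph, start, restart=0.15, iterations=100):
    """Approximate the visiting distribution of a walker that restarts at `start`."""
    prob = {node: 0.0 for node in graph}
    prob[start] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, p in prob.items():
            neighbors = graph[node]
            if not neighbors:
                # Dangling node: send its walking mass back to the start.
                nxt[start] += (1 - restart) * p
                continue
            share = (1 - restart) * p / len(neighbors)
            for nb in neighbors:
                nxt[nb] += share
        # With probability `restart` the walker jumps back to the start node.
        nxt[start] += restart
        prob = nxt
    return prob


# Hypothetical undirected concept graph (adjacency lists, not from the paper).
graph = {
    "software testing": ["unit testing", "test case"],
    "unit testing": ["software testing", "test case"],
    "test case": ["software testing", "unit testing", "software design"],
    "software design": ["test case", "architecture"],
    "architecture": ["software design"],
}
scores = random_walk_with_restart(graph, "software testing")
```

In this toy graph, "unit testing" is directly linked to the start concept and receives a higher score than "architecture", which is three links away.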