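Research on an Ontology-Based Theme Extraction Method for Domain Documents
Cite this article: CHEN Jin-liang, LI Qing. Research on an ontology-based theme extraction method for domain documents[J]. Computer Development & Applications (电脑开发与应用), 2014, 0(9): 44-47
Authors: CHEN Jin-liang (陈金梁), LI Qing (李青)
Affiliation: School of Mechanical Engineering and Automation, Beijing University of Aeronautics and Astronautics, Beijing 100191
Abstract: To make extracted theme words better reflect the content of domain documents, an ontology-based theme extraction method for domain documents is proposed. Building on the characteristics of domain documents, the method filters the document vocabulary with a domain ontology. This excludes interference from frequent words outside the domain and lowers the dimensionality of the vocabulary, which improves both algorithm efficiency and extraction quality. In addition, a synonym/near-synonym dictionary is used to merge candidate theme words and their weights, reducing the influence of synonyms and near-synonyms on the results and making them more complete and accurate. Experiments show that the method achieves high precision and recall.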

Keywords: theme extraction; domain ontology; synonym/near synonym; TF-IDF
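The abstract names the first step (filter the document vocabulary with a domain ontology, then weight the remaining words) but not its exact formulas. The following is a minimal sketch under stated assumptions, not the authors' implementation: the ontology is reduced to a set of concept labels, documents are already tokenized, TF is normalized over the filtered tokens, and IDF uses a common smoothed form. The function name `tfidf_domain_terms` and the toy data are hypothetical.

```python
import math
from collections import Counter


def tfidf_domain_terms(doc_tokens, corpus, ontology_terms):
    """Score candidate theme words of one document with TF-IDF after
    filtering its vocabulary against a domain ontology.

    doc_tokens:     tokenized document (list of str)
    corpus:         list of token sets, one per document in the collection
    ontology_terms: set of concept labels taken from the domain ontology
    (all three representations are assumptions made for this sketch)
    """
    # Ontology filter: keep only tokens that name an ontology concept, which
    # drops frequent non-domain words and shrinks the vocabulary.
    filtered = [t for t in doc_tokens if t in ontology_terms]
    if not filtered:
        return {}

    counts = Counter(filtered)
    total = len(filtered)  # assumption: TF normalized over the filtered tokens
    n_docs = len(corpus)

    scores = {}
    for term, count in counts.items():
        tf = count / total
        df = sum(1 for doc in corpus if term in doc)
        # Assumption: smoothed IDF, so a term with df == 0 cannot divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[term] = tf * idf
    return scores


# Toy data (hypothetical): "the" and "method" are not ontology concepts.
ontology = {"ontology", "extraction", "synonym"}
doc = ["the", "ontology", "the", "extraction", "ontology", "method"]
corpus = [set(doc), {"ontology", "graph"}, {"database", "index"}]
print(tfidf_domain_terms(doc, corpus, ontology))
```

Because filtering happens before counting, the frequent word "the" never enters the score table, which is the dimensionality reduction the abstract describes.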

Research Based on Domain Ontology Document Theme Extraction Method
CHEN Jin-liang,LI Qing. Research Based on Domain Ontology Document Theme Extraction Method[J]. Computer Development & Applications, 2014, 0(9): 44-47
Authors: CHEN Jin-liang, LI Qing
Affiliation: School of Mechanical Engineering and Automation, Beijing University of Aeronautics and Astronautics, Beijing 100191, China
Abstract: To make the extracted keywords better reflect the content of domain documents, this paper proposes an ontology-based theme extraction method for domain documents. Exploiting the characteristics of domain documents, the method uses a domain ontology to filter the document vocabulary, which excludes interference from high-frequency words outside the domain and reduces the dimensionality of the vocabulary, thus improving both algorithm efficiency and extraction quality. The method also uses a synonym/near-synonym dictionary to merge candidate theme words and their weights, reducing the impact of synonyms and near-synonyms on the extraction results and making the results more comprehensive and accurate. Experiments show that the method achieves high precision and recall.
Keywords: theme extraction; domain ontology; synonym/near synonym; TF-IDF
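The abstract's second step merges candidate theme words and their weights using a synonym/near-synonym dictionary. A minimal sketch, assuming the dictionary can be modeled as a list of sets of equivalent terms, that a group's merged weight is the sum of its members' weights, and that the highest-weighted member represents the group; these rules, the names `merge_synonyms` and `top_k`, and the example weights are hypothetical, since the abstract does not state the paper's exact merging rule.

```python
from collections import defaultdict


def merge_synonyms(scores, synonym_groups):
    """Merge candidate theme-word weights within synonym/near-synonym groups.

    scores:         candidate term -> weight (e.g. a TF-IDF score)
    synonym_groups: list of sets of mutually equivalent terms, a stand-in
                    for the synonym/near-synonym dictionary (assumption)
    Returns one entry per group, keyed by a representative term.
    """
    # Index each dictionary term by the group it belongs to.
    group_of = {}
    for i, group in enumerate(synonym_groups):
        for term in group:
            group_of[term] = i

    # Collect candidates per group; terms missing from the dictionary stand alone.
    members = defaultdict(list)
    for term in scores:
        key = ("group", group_of[term]) if term in group_of else ("term", term)
        members[key].append(term)

    merged = {}
    for terms in members.values():
        # Assumptions: the representative is the highest-weighted member
        # (ties broken alphabetically) and the merged weight is the sum.
        rep = max(sorted(terms), key=lambda t: scores[t])
        merged[rep] = sum(scores[t] for t in terms)
    return merged


def top_k(scores, k):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


scores = {"theme": 0.4, "topic": 0.3, "ontology": 0.5, "method": 0.1}
groups = [{"theme", "topic", "subject"}]
print(top_k(scores, 2))                          # ['ontology', 'theme']
print(top_k(merge_synonyms(scores, groups), 2))  # ['theme', 'ontology']
```

Summing lets a concept that the text expresses with several wordings accumulate its evidence, so "theme" overtakes "ontology" once "topic" is folded into it; averaging or taking the maximum would be equally plausible alternatives under different assumptions.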
This article is indexed in CNKI, VIP (维普), Wanfang Data, and other databases.