
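Thematic words extraction from Chinese Web pages based on semantic relations
Cite this article:LI Fang-fang, GE Bin, MAO Xing-liang, TANG Da-quan. Thematic words extraction from Chinese Web pages based on semantic relations[J]. Application Research of Computers, 2011, 28(1): 105-107.
Authors:LI Fang-fang  GE Bin  MAO Xing-liang  TANG Da-quan
Affiliation:1. Key Laboratory of C4ISR Technology National Defense Science & Technology, National University of Defense Technology, Changsha 410073
2. Internet News Management Office of Publicity Department of Hunan Provincial CCP Committees, Changsha 410011
Funding:National Natural Science Foundation of China (60903225); Natural Science Foundation of Hubei Province (2008CDB388)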
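Abstract:This paper proposes a method for extracting thematic words from Chinese Web pages based on semantic relations. First, a sliding window and HowNet are used to compute the semantic similarity between words, forming a set of candidate noun pairs. Then, an undirected graph is generated from this set to represent the semantic links between words, and the weights of thematic words are modeled on this graph. Finally, the nouns with the highest weights are selected as thematic words. Experimental results show that, compared with thematic word extraction methods that do not build semantic relations, the proposed method achieves some improvement in precision, recall, and F1 measure. When the number of extracted thematic words is 7, the recall and F1 measure of the proposed method reach their maximum values, which exceed the maximum values of the traditional method by 12.5% and 9.53%, respectively.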

Keywords:semantic relations  Chinese Web pages  thematic words  weight
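
The pipeline outlined in the abstract (sliding window, similarity-filtered noun pairs, undirected graph, graph-based weights, top-ranked nouns) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the weighting formula, the window size, or the similarity threshold, so the weight used here (a noun's own frequency plus the similarity-weighted frequencies of its graph neighbours), `window`, and `threshold` are all hypothetical. `toy_similarity` is a made-up lookup table standing in for a HowNet-based similarity measure, and the input is assumed to be an already segmented, noun-only token list.

```python
from collections import Counter, defaultdict
from itertools import combinations

def candidate_pairs(nouns, similarity, window=5, threshold=0.5):
    """Slide a window over the noun sequence and keep the co-occurring
    pairs whose similarity reaches the threshold (both values are assumed)."""
    pairs = {}
    for start in range(max(1, len(nouns) - window + 1)):
        for a, b in combinations(nouns[start:start + window], 2):
            if a == b:
                continue
            key = tuple(sorted((a, b)))
            if key not in pairs:
                sim = similarity(a, b)
                if sim >= threshold:
                    pairs[key] = sim
    return pairs

def build_graph(pairs):
    """Undirected graph as an adjacency map: node -> {neighbour: similarity}."""
    graph = defaultdict(dict)
    for (a, b), sim in pairs.items():
        graph[a][b] = sim
        graph[b][a] = sim
    return graph

def thematic_words(nouns, similarity, k=7, window=5, threshold=0.5):
    """Rank nouns by an ASSUMED weight: own frequency plus the
    similarity-weighted frequencies of neighbours in the graph."""
    graph = build_graph(candidate_pairs(nouns, similarity, window, threshold))
    freq = Counter(nouns)
    weights = {
        w: freq[w] + sum(sim * freq[nb] for nb, sim in graph.get(w, {}).items())
        for w in freq
    }
    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(weights, key=lambda w: (-weights[w], w))
    return ranked[:k]

# Hypothetical stand-in for a HowNet-based similarity measure.
TOY_SIM = {("topic", "web"): 0.8, ("topic", "word"): 0.6}

def toy_similarity(a, b):
    return TOY_SIM.get(tuple(sorted((a, b))), 0.0)

nouns = ["web", "topic", "word", "web", "weather", "topic"]
print(thematic_words(nouns, toy_similarity, k=2, window=3))
```

In this toy run, "weather" has no sufficiently similar neighbour inside any window, so it receives no support from the graph and ranks below the semantically connected nouns. That is the general effect the abstract attributes to building semantic relations, although the actual weighting model in the paper may differ.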

Thematic words extraction from Chinese Web pages based on semantic relations
LI Fang-fang, GE Bin, MAO Xing-liang, TANG Da-quan. Thematic words extraction from Chinese Web pages based on semantic relations[J]. Application Research of Computers, 2011, 28(1): 105-107.
Authors:LI Fang-fang  GE Bin  MAO Xing-liang  TANG Da-quan
Affiliation:LI Fang-fang (1), GE Bin (1), MAO Xing-liang (2), TANG Da-quan (1) (1. Key Laboratory of C4ISR Technology National Defense Science & Technology, National University of Defense Technology, Changsha 410073, China; 2. Internet News Management Office of Publicity Department of Hunan Provincial CCP Committees, Changsha 410011, China)
Abstract:This paper proposes a new thematic words extraction method based on semantic relations. First, a sliding window and HowNet are used to calculate the semantic similarity between words and form the candidate noun pairs. Then, an undirected graph is generated from these noun pairs to represent the semantic links between them, and the weights of words are modeled on this graph. Finally, the terms with higher weights are selected as thematic words. Experimental results show that the proposed method substantially outperforms the traditi...
Keywords:semantic relations  Chinese Web pages  thematic words  weight
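
The evaluation measures named in the abstract (precision, recall, F1) have standard definitions, which the short sketch below spells out. It assumes that the extracted words are scored against a manually chosen reference set of thematic words for each page. The abstract does not describe the evaluation protocol, so that setup and the function name `precision_recall_f1` are assumptions made for illustration.

```python
def precision_recall_f1(extracted, reference):
    """Standard set-based precision, recall, and F1 for one page."""
    extracted_set, reference_set = set(extracted), set(reference)
    hits = len(extracted_set & reference_set)
    precision = hits / len(extracted_set) if extracted_set else 0.0
    recall = hits / len(reference_set) if reference_set else 0.0
    # F1 is the harmonic mean of precision and recall (0 when there are no hits).
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

p, r, f1 = precision_recall_f1(["a", "b", "c", "d"], ["a", "b", "e"])
print(p, r, f1)
```

Extracting more words can only keep or raise recall while it tends to lower precision, which is why an F1 peak at some intermediate number of extracted words (7 in the reported experiments) is plausible.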
This article is indexed in databases including CNKI and Wanfang Data.