
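Computing Lexical Semantic Relatedness with Chinese Wikipedia (基于中文维基百科的词语语义相关度计算)
Cite this article: WAN Fuqiang (万富强), WU Yunfang (吴云芳). Computing Lexical Semantic Relatedness with Chinese Wikipedia [J]. Journal of Chinese Information Processing (中文信息学报), 2013, 27(6): 31-38.
Authors: WAN Fuqiang (万富强), WU Yunfang (吴云芳)
Affiliation: Key Laboratory of Computational Linguistics (Ministry of Education), Peking University, Beijing 100871
Funding: National Natural Science Foundation of China (61371129); Humanities and Social Sciences Research Planning Fund of the Ministry of Education (13YJA740060); National Social Science Fund of China (12&ZD227)
Abstract: Semantic relatedness computation plays an important role in natural language processing tasks such as information retrieval, word sense disambiguation, automatic summarization, and spelling correction. This paper uses Wikipedia-based Explicit Semantic Analysis to compute the semantic relatedness between Chinese words. Using Chinese Wikipedia, each word is represented as a weighted concept vector, so that computing the relatedness of two words reduces to comparing their concept vectors. In addition, the prior probability of each page is introduced, and the link information among Wikipedia pages is used to adjust the value of each component of the concept vector. Experimental results show that, with this method, the Spearman rank correlation coefficient against the human-annotated gold standard reaches 0.52, a significant improvement in relatedness computation.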

Keywords: semantic relatedness; explicit semantic analysis; Chinese Wikipedia; prior probability; concept vector

Computing Lexical Semantic Relatedness with Chinese Wikipedia
WAN Fuqiang, WU Yunfang. Computing Lexical Semantic Relatedness with Chinese Wikipedia [J]. Journal of Chinese Information Processing, 2013, 27(6): 31-38.
Authors:WAN Fuqiang  WU Yunfang
Affiliation:Key Laboratory of Computational Linguistics (Peking University), Ministry of Education, Beijing 100871, China
Abstract: Lexical semantic relatedness plays an important role in natural language processing tasks such as information retrieval, word sense disambiguation, automatic text summarization, and spelling correction. In this paper, we employ Wikipedia-based Explicit Semantic Analysis to compute semantic relatedness between Chinese words. Based on Chinese Wikipedia, each word is represented as a weighted vector of concepts, so computing the semantic relatedness of two words amounts to comparing the corresponding concept vectors. Furthermore, we introduce the prior probability of each concept page and use the link information among Wikipedia pages to adjust the components of the concept vectors. The experimental results show that the Spearman's rank correlation coefficient between the computed relatedness and human judgments reaches 0.52, which significantly outperforms the baseline.
Keywords: semantic relatedness; explicit semantic analysis; Chinese Wikipedia; prior probability; concept vectors
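
The concept-vector idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy articles, the TF-IDF weighting, the in-link counts, and the way the prior is multiplied into each component are all assumptions made for illustration, since the abstract does not give the exact formulas.

```python
import math
from collections import Counter

# Toy stand-in for Wikipedia: concept title -> tokenized article text.
# All data here is hypothetical.
ARTICLES = {
    "Bank": ["bank", "money", "loan", "money", "account"],
    "River": ["river", "water", "bank", "flow"],
    "Finance": ["money", "loan", "market", "account"],
}
# Hypothetical in-link counts, used as a simple proxy for a page prior.
INLINKS = {"Bank": 6, "River": 3, "Finance": 1}


def concept_vector(word, articles):
    """Represent a word as {concept: TF-IDF weight} over all articles."""
    doc_freq = sum(1 for tokens in articles.values() if word in tokens)
    if doc_freq == 0:
        return {}
    # Smoothed IDF (an assumption) so widely shared words keep some weight.
    idf = math.log(len(articles) / doc_freq) + 1.0
    vec = {}
    for concept, tokens in articles.items():
        tf = Counter(tokens)[word] / len(tokens)
        if tf > 0:
            vec[concept] = tf * idf
    return vec


def apply_prior(vec, inlinks):
    """Scale each component by the page's share of in-links (assumed prior)."""
    total = sum(inlinks.values())
    return {c: w * inlinks[c] / total for c, w in vec.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def relatedness(w1, w2, articles, inlinks=None):
    """Relatedness of two words as the cosine of their concept vectors."""
    u, v = concept_vector(w1, articles), concept_vector(w2, articles)
    if inlinks is not None:
        u, v = apply_prior(u, inlinks), apply_prior(v, inlinks)
    return cosine(u, v)
```

Calling `relatedness("money", "loan", ARTICLES)` compares the two words through the concepts they share, while passing `INLINKS` as the fourth argument applies the assumed link-based prior before the comparison. Scores produced this way could then be ranked against human judgments with a Spearman rank correlation, as in the evaluation the abstract reports.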