
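A Search-Engine-Based Method for Computing Lexical Semantic Similarity
Cite this article: CHEN Hai-yan (陈海燕). A Search-Engine-Based Method for Computing Lexical Semantic Similarity [J]. Computer Science (计算机科学), 2015, 42(1): 261-267.
Author: CHEN Hai-yan (陈海燕)
Affiliation: Department of Computer Science and Technology, East China University of Political Science and Law, Shanghai 201620
Funding: Supported by the National Social Science Foundation of China (06BFX051) and the Shanghai Special Research Fund for the Selection and Training of Outstanding Young University Teachers (hzf05046).
Abstract: Computing the semantic similarity of words plays an important role in Web-related tasks such as Web browsing and query recommendation. Traditional taxonomy-based methods cannot handle the new words that continually emerge. Because Web data contains a large amount of noise and redundancy, robustness and accuracy remain a challenge; this paper therefore proposes a search-engine-based method for computing lexical semantic similarity. Semantic snippets and the page counts of retrieval results are used to remove noise and redundancy during the similarity computation. In addition, a method is proposed for integrating query-result page counts, semantic snippets, and the number of displayed search results; it requires no prior knowledge or ontology. Experimental results show that the proposed method achieves a correlation coefficient of 0.851 on the Rubenstein-Goodenough test set, outperforming existing Web-based methods for computing lexical semantic similarity, and that it performs reasonably well on the query expansion task for search engines.

Keywords: Semantic similarity  Information retrieval  Query suggestion  Web search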

Measuring Semantic Similarity between Words Using Web Search Engines
CHEN Hai-yan. Measuring Semantic Similarity between Words Using Web Search Engines [J]. Computer Science, 2015, 42(1): 261-267.
Authors: CHEN Hai-yan
Affiliation: Department of Computer Science and Technology, East China University of Political Science and Law, Shanghai 201620, China
Abstract: Semantic similarity measures play important roles in many Web-related tasks such as Web browsing and query suggestion. Because taxonomy-based methods cannot deal with continually emerging words, Web-based methods have recently been proposed to solve this problem. Because of the noise and redundancy hidden in Web data, robustness and accuracy are still challenges. We propose a method that integrates the page counts and snippets returned by Web search engines. Semantic snippets and the number of search results are first used to remove noise and redundancy from the Web snippets. A measure integrating page counts, semantic snippets, and the number of already displayed search results is then proposed. The proposed method does not need any human-annotated knowledge and can be applied to Web-related tasks easily. A correlation coefficient of 0.851 against the Rubenstein-Goodenough benchmark dataset shows that the proposed method outperforms existing Web-based methods by a wide margin. Moreover, the proposed semantic similarity measure significantly improves the quality of query suggestion compared with some page-count-based methods.
Keywords: Semantic similarity  Information retrieval  Query suggestion  Web search
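
The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea behind page-count-based similarity and its evaluation, not the author's method. It assumes a Jaccard coefficient computed from search-engine page counts (a measure commonly used in earlier Web-based work) and a Pearson correlation against human ratings such as those in the Rubenstein-Goodenough set. The function names, the `min_cooccurrence` cutoff, and all numbers in the usage lines are hypothetical.

```python
from math import sqrt


def web_jaccard(count_p, count_q, count_pq, min_cooccurrence=5):
    """Jaccard coefficient from page counts (assumed measure, not the paper's).

    count_p, count_q: page counts for the single-word queries P and Q.
    count_pq: page count for the conjunctive query "P AND Q".
    min_cooccurrence: hypothetical cutoff; very small co-occurrence
    counts are treated as noise and mapped to zero similarity.
    """
    if count_pq <= min_cooccurrence:
        return 0.0
    return count_pq / (count_p + count_q - count_pq)


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up page counts for three word pairs: (count_p, count_q, count_pq).
toy_counts = [(1000, 800, 400), (1000, 500, 100), (900, 700, 3)]
predicted = [web_jaccard(p, q, pq) for p, q, pq in toy_counts]

# Made-up human ratings for the same three pairs, for illustration only.
toy_human = [3.5, 1.8, 0.2]
agreement = pearson(predicted, toy_human)
```

A method of the kind described in the abstract would combine such a page-count score with evidence from snippets before computing the correlation; that combination is not specified here and is left out of the sketch.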
This article is indexed in Wanfang Data (万方数据) and other databases.