Measuring semantic similarity between words by removing noise and redundancy in web snippets期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

Measuring semantic similarity between words by removing noise and redundancy in web snippets

Authors:	Zheng Xu Xiangfeng Luo Jie Yu Weimin Xu

Abstract:	Semantic similarity measures play important roles in many Web‐related tasks such as Web browsing and query suggestion. Because taxonomy‐based methods can not deal with continually emerging words, recently Web‐based methods have been proposed to solve this problem. Because of the noise and redundancy hidden in the Web data, robustness and accuracy are still challenges. In this paper, we propose a method integrating page counts and snippets returned by Web search engines. Then, the semantic snippets and the number of search results are used to remove noise and redundancy in the Web snippets (‘Web‐snippet’ includes the title, summary, and URL of a Web page returned by a search engine). After that, a method integrating page counts, semantics snippets, and the number of already displayed search results are proposed. The proposed method does not need any human annotated knowledge (e.g., ontologies), and can be applied Web‐related tasks (e.g., query suggestion) easily. A correlation coefficient of 0.851 against Rubenstein–Goodenough benchmark dataset shows that the proposed method outperforms the existing Web‐based methods by a wide margin. Moreover, the proposed semantic similarity measure significantly improves the quality of query suggestion against some page counts based methods. Copyright © 2011 John Wiley & Sons, Ltd.

Keywords:	semantic similarity information retrieval query suggestion Web search

设为首页 | 免责声明 | 关于勤云 | 加入收藏