
Neighbourhood Language Model for Information Retrieval (面向信息检索的近邻语言模型)

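Cite this article:	HAN Zhongyuan, LI Sheng, QI Haoliang, YANG Muyun. Neighbourhood Language Model for Information Retrieval[J]. Journal of Chinese Information Processing (中文信息学报), 2011, 25(1): 66-71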

Author names:	HAN Zhongyuan (韩中元), LI Sheng (李生), QI Haoliang (齐浩亮), YANG Muyun (杨沐昀)

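Author affiliations:	1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001; 2. Department of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, Heilongjiang 150050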

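Funding:	Key Program of the National Natural Science Foundation of China (60736044); General Program of the National Natural Science Foundation of China (60873105); Science and Technology Research Project of the Heilongjiang Provincial Department of Education (11541287); Young Innovative Talent Project of the Harbin Science and Technology Bureau (2009RFQXG213)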

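Abstract (translated from the Chinese):	Language models for information retrieval build a language model from each single document and therefore suffer from a fairly severe data sparseness problem. This paper argues that a document's neighbour information reflects the distribution of words in the document more reasonably and so helps to address data sparseness. It therefore incorporates the document's neighbour information into the smoothing algorithm of the language model and proposes the Neighbourhood Language Model. The performance of the Neighbourhood Language Model under different sources of neighbour selection was tested on two typical TREC collections, the U.S. Department of Energy documents (DOE) and the Wall Street Journal (WSJ). The experimental results show that the Neighbourhood Language Model yields a certain improvement in retrieval performance.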
Keywords (translated from the Chinese):	information retrieval; language model; neighbour information
Neighbourhood Language Model for Information Retrieval

HAN Zhongyuan, LI Sheng, QI Haoliang, YANG Muyun. Neighbourhood Language Model for Information Retrieval[J]. Journal of Chinese Information Processing, 2011, 25(1): 66-71

Authors:	HAN Zhongyuan, LI Sheng, QI Haoliang, YANG Muyun

Affiliation:	1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China; 2. Department of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, Heilongjiang 150050, China

Abstract:	Data sparseness is a non-trivial issue for language-model-based information retrieval methods. The paper proposes a Neighbourhood Language Model that alleviates this issue by using the neighbour information of a document to smooth its word distribution. Experiments on the DOE and WSJ portions of the TREC data show that the Neighbourhood Language Model can improve information retrieval performance.

Keywords:	information retrieval; language model; neighbourhood information
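
Illustrative sketch (not taken from the paper): the abstract states only that neighbour information is added to the smoothing of the document language model; it does not give the formula. The Python below assumes one plausible form, a three-way linear interpolation of document, pooled-neighbour, and collection unigram estimates. The function names, the weights `w_doc` and `w_nbr`, and the toy data are all hypothetical, and how neighbours are selected is left outside the sketch.

```python
import math
from collections import Counter


def relative_freq(tokens):
    """Maximum-likelihood unigram estimate: count(w) / number of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def smoothed_prob(word, doc, neighbours, collection, w_doc=0.6, w_nbr=0.3):
    """Assumed smoothing form: interpolate document, neighbour and collection models.

    w_doc and w_nbr are hypothetical weights; the remaining mass goes to
    the collection model. Neighbour documents are simply pooled.
    """
    w_col = 1.0 - w_doc - w_nbr
    pooled = [t for n in neighbours for t in n]
    p_doc = relative_freq(doc).get(word, 0.0)
    p_nbr = relative_freq(pooled).get(word, 0.0)
    p_col = relative_freq(collection).get(word, 0.0)
    return w_doc * p_doc + w_nbr * p_nbr + w_col * p_col


def query_log_likelihood(query, doc, neighbours, collection):
    """Score a document by the log-probability of the query under its model."""
    score = 0.0
    for term in query:
        p = smoothed_prob(term, doc, neighbours, collection)
        if p > 0.0:  # terms unseen in the whole collection are skipped
            score += math.log(p)
    return score


# Toy data: "reactor" is absent from the document but present in a neighbour.
doc = ["energy", "policy"]
neighbours = [["energy", "reactor"]]
collection = doc + neighbours[0] + ["stock", "market"]

with_nbr = smoothed_prob("reactor", doc, neighbours, collection)
without_nbr = smoothed_prob("reactor", doc, [], collection)
```

In this toy case the neighbour raises the estimate for "reactor" well above what the collection model alone would give, which is the intuition the abstract describes; the actual estimator and weights used by the authors may differ.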
This article is indexed in CNKI, Wanfang Data, and other databases.
