
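A Language Model Based on Category Prior Information for Question Retrieval
Cite this article: JI Zongcheng, WANG Bin. A Language Model Based on Category Prior Information for Question Retrieval[J]. Journal of Chinese Information Processing, 2014, 28(4): 98-103
Authors: JI Zongcheng, WANG Bin
Affiliation: 1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190;
2. University of Chinese Academy of Sciences, Beijing 100049
Funding: National Natural Science Foundation of China (61070111); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030200)
Abstract: Community question answering systems have accumulated a large number of question-answer pairs organized in a hierarchical category structure. To reuse these valuable historical question-answer pairs, it is essential to design an effective question retrieval model. In this paper, we propose a new method, within the language modeling framework, that uses the category prior information of questions to improve the retrieval of similar questions. Specifically, we treat the leaf category language model as a Dirichlet hyperparameter that weights the parameters of the unigram language model, which yields a new language model based on category prior information. The method rests on a rigorous mathematical derivation. Experiments on a large real-world dataset from Yahoo! Answers show that the proposed method significantly outperforms the earlier approach of simple linear interpolation.

Keywords: community question answering; question retrieval; category; category prior information; language model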

A Language Model Based on Category Prior for Question Retrieval
JI Zongcheng, WANG Bin. A Language Model Based on Category Prior for Question Retrieval[J]. Journal of Chinese Information Processing, 2014, 28(4): 98-103
Authors: JI Zongcheng, WANG Bin
Affiliation: 1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China
Abstract: Community Question Answering (CQA) services have built up large archives of question-answer pairs, which are organized into a hierarchy of categories. To reuse these invaluable historical question-answer pairs, it is essential to develop effective Question Retrieval (QR) models. In this paper, we propose a novel approach that uses the category prior of questions within the language modeling framework to improve QR performance. Specifically, we propose a new Language Model based on category prior, which views the Leaf Category Language Model as the Dirichlet hyper-parameter that weights the parameters of the unigram Language Model. The approach has a solid mathematical foundation. Experiments conducted on a large-scale, real-world CQA dataset from Yahoo! Answers show that the proposed method significantly outperforms previous work, which simply combines the category information with the unigram Language Model linearly.
Keywords: Community Question Answering; Question Retrieval; category; category prior; language model
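
The abstract contrasts two ways of using category information: as a Dirichlet prior on the unigram model, and as a linear mixture with it. The abstract does not give the paper's formulas, so the sketch below is only one plausible reading, based on standard Dirichlet-prior smoothing. The function names, the parameters `mu` and `lam`, and the toy token lists are all hypothetical, and collection-level smoothing is left out for brevity.

```python
from collections import Counter


def _relative_freq(counts, total, word):
    """Maximum-likelihood estimate of P(word); 0.0 for an empty text."""
    return counts[word] / total if total else 0.0


def category_prior_lm(question_tokens, leaf_category_tokens, mu=100.0):
    """Unigram model of a question with the leaf-category model as Dirichlet prior.

    Assumed form (not taken from the paper):
        P(w | q) = (c(w, q) + mu * P(w | cat)) / (|q| + mu)
    """
    q_counts = Counter(question_tokens)
    c_counts = Counter(leaf_category_tokens)
    q_len = len(question_tokens)
    c_len = len(leaf_category_tokens)

    def prob(word):
        p_cat = _relative_freq(c_counts, c_len, word)
        return (q_counts[word] + mu * p_cat) / (q_len + mu)

    return prob


def linear_interpolation_lm(question_tokens, leaf_category_tokens, lam=0.5):
    """Baseline-style mixture with a fixed weight (again an assumed form):
        P(w | q) = (1 - lam) * P_ml(w | q) + lam * P(w | cat)
    """
    q_counts = Counter(question_tokens)
    c_counts = Counter(leaf_category_tokens)
    q_len = len(question_tokens)
    c_len = len(leaf_category_tokens)

    def prob(word):
        p_q = _relative_freq(q_counts, q_len, word)
        p_cat = _relative_freq(c_counts, c_len, word)
        return (1.0 - lam) * p_q + lam * p_cat

    return prob


# Toy data, invented for illustration only.
question = ["cheap", "hotel", "paris"]
leaf_category = ["hotel", "flight", "paris", "hotel"]

p_prior = category_prior_lm(question, leaf_category, mu=2.0)
p_linear = linear_interpolation_lm(question, leaf_category, lam=0.5)

for w in ["cheap", "hotel", "paris", "flight"]:
    print(w, round(p_prior(w), 4), round(p_linear(w), 4))
```

Under this reading, the Dirichlet form gives the category model an effective weight of `mu / (|q| + mu)`, so a short question leans more heavily on its leaf category than a long one does, whereas the linear mixture applies the same weight `lam` to every question.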
This article is indexed in CNKI and other databases.