
INFORMATION RETRIEVAL FOR SHORT DOCUMENTS
Authors: Qi Haoliang, Li Mu*, Gao Jianfeng**, Li Sheng
Affiliations: Qi Haoliang, Li Mu*, Gao Jianfeng**, Li Sheng (Ministry of Education - Microsoft Key Laboratory of Natural Language Processing and Speech (Harbin Institute of Technology), Harbin 150001, China); *(Microsoft Research Asia, Beijing 100080, China); **(Microsoft Research, Redmond, WA 98052, USA)
Funding: Supported by the Funds of Heilongjiang Outstanding Young Teacher (1151G037).
Abstract: I. Introduction. Most current Information Retrieval (IR) systems try to match query terms with document terms. One major problem with these approaches is that users want to retrieve documents according to content, while individual words provide unreliable evidence about the content of the texts [1-3]. When some parts of the text in the document collection are missing, e.g. only the abstract is available, the word-use variability problem will have a substantial impact on the IR per…

Keywords: information retrieval; short documents; reference document model; information theory
Received: 17 March 2006

Information Retrieval for short documents
Citation: Qi Haoliang, Li Mu, Gao Jianfeng, Li Sheng. Information Retrieval for Short Documents [J]. Journal of Electronics, 2006, 23(6): 933-936.
Authors: Haoliang Qi PhD, Mu Li, Jianfeng Gao, Sheng Li
Affiliation: 1. Ministry of Education - Microsoft Key Laboratory of Natural Language Processing and Speech (Harbin Institute of Technology), Harbin 150001, China
2. Microsoft Research Asia, Beijing 100080, China
3. Microsoft Research, Redmond, WA 98052, USA
Abstract: The major problem with most current approaches to information retrieval is that individual words provide unreliable evidence about the content of texts. When the document is short, e.g. only the abstract is available, the word-use variability problem has a substantial impact on Information Retrieval (IR) performance. To solve this problem, this letter puts forward a new approach to short document retrieval named the Reference Document Model (RDM). RDM obtains the statistical semantics of the query and the document through pseudo feedback, applied to both the query and the document, from reference documents. The contributions of this model are threefold: (1) pseudo feedback for both the query and the document; (2) building the query model and the document model from reference documents; (3) flexible indexing units, which can be any linguistic elements such as documents, paragraphs, sentences, n-grams, terms, or characters. For short document retrieval, RDM achieves significant improvements over the classical probabilistic models on the ad hoc retrieval task on Text REtrieval Conference (TREC) test sets. Results also show that the shorter the document, the better RDM performs.
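The abstract does not give RDM's formulas, so the following Python sketch only illustrates the general idea of describing both the query and the document through a shared set of reference documents. Everything in it is an assumption made for illustration: the function names (`reference_profile`, `rdm_score`), the term-overlap weighting, the additive smoothing constant, and the use of negative KL divergence as the matching score are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization; a stand-in for any indexing unit.
    return text.lower().split()


def reference_profile(text, reference_docs, smoothing=0.1):
    """Map a text to a probability distribution over reference documents.

    Assumption: the weight of a reference document is the number of
    occurrences of the text's terms in it, plus additive smoothing.
    """
    terms = tokenize(text)
    weights = []
    for ref in reference_docs:
        ref_counts = Counter(tokenize(ref))
        overlap = sum(ref_counts[t] for t in terms)
        weights.append(overlap + smoothing)
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q):
    # KL(p || q); q is strictly positive because of the smoothing above.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def rdm_score(query, document, reference_docs):
    """Higher is better: negative divergence between the two profiles."""
    q = reference_profile(query, reference_docs)
    d = reference_profile(document, reference_docs)
    return -kl_divergence(q, d)


if __name__ == "__main__":
    refs = [
        "car automobile engine motor vehicle repair",
        "fruit salad recipe apple kitchen",
        "stock market price trading",
    ]
    query = "car engine"
    # The first document shares no words with the query, yet both map
    # to the same reference document, so it should score higher.
    for doc in ["automobile motor repair", "fruit salad recipe"]:
        print(doc, rdm_score(query, doc, refs))
```

In this toy setup the query and the short document never need to share a term directly; they only need to resemble the same reference documents, which is one way to read the word-use variability argument in the abstract.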
Keywords: Information retrieval; Short documents; Reference Document Model (RDM)
This article is indexed in CNKI, Wanfang Data, SpringerLink, and other databases.