
面向微博搜索的时间敏感的排序学习方法
引用本文:王书鑫,卫冰洁,鲁 骁,王 斌. 面向微博搜索的时间敏感的排序学习方法[J]. 中文信息学报, 2015, 29(4): 175-182
作者姓名:王书鑫  卫冰洁  鲁 骁  王 斌
作者单位:1. 中国科学院大学 中国科学院计算技术研究所,北京 100190
2. 国家计算机网络应急技术处理协调中心,北京 100029;
3. 中国科学院信息工程研究所,北京 100093
基金项目:中国科学院先导专项课题(XDA06030200)
摘    要:近年来微博检索已经成为信息检索领域的研究热点。相关的研究表明,微博检索具有时间敏感性。已有工作根据不同的时间敏感性假设,例如,时间越新文档越相关,或者时间越接近热点时刻文档越相关,得到多种不同的检索模型,都在一定程度上提高了检索效果。但是这些假设主要来自于观察,是一种直观简化的假设,仅能从某个方面反映时间因素影响微博排序的规律。该文验证了微博检索具有复杂的时间敏感特性,直观的简化假设并不能准确地描述这种特性。在此基础上提出了一个利用微博的时间特征和文本特征,通过机器学习的方式来构建一个针对时间敏感的微博检索的排序学习模型(TLTR)。在时间特征上,考察了查询相关的全局时间特征以及查询-文档对的局部时间特征。在TREC Microblog Track 2011-2012数据集上的实验结果表明,TLTR模型优于现有的其他时间敏感的微博排序方法。

关 键 词:时间敏感  排序学习  微博搜索  

Temporal Sensitive Learning to Rank Method for Microblog Search
WANG Shuxin,WEI Bingjie,LU Xiao,WANG Bin. Temporal Sensitive Learning to Rank Method for Microblog Search[J]. Journal of Chinese Information Processing, 2015, 29(4): 175-182
Authors:WANG Shuxin  WEI Bingjie  LU Xiao  WANG Bin
Affiliation:1. University of Chinese Academy of Sciences, Institute of Computing Technology, CAS, Beijing 100190, China;
2. National Computer Network Emergency Response Technical Team/Coordination Center, Beijing 100029, China;
3. Institute of Information Engineering, CAS, Beijing 100093, China
Abstract:Microblog search has become an active research topic in information retrieval in recent years. Related work shows that most queries in microblog search are time-sensitive. Existing methods build on different time-sensitivity assumptions, such as “the newer a document is, the more relevant it is” or “the closer a document is to the peak time, the more relevant it is”, and each improves retrieval effectiveness to some extent. However, these assumptions come mainly from observation and are intuitive simplifications; each captures only one aspect of how time affects microblog ranking. In this paper, our study of the temporal distributions of relevant documents for different queries shows that the role of time in ranking is complex, so simple, straightforward assumptions cannot describe it accurately. We therefore propose a time-sensitive learning-to-rank model (TLTR) trained on the temporal and entity evidence of query-document pairs. For temporal features, we extract both global features of the query and local features of each query-document pair. Experimental results on the TREC Microblog Track 2011-2012 data set show that TLTR significantly improves retrieval effectiveness over existing time-aware ranking models.
Keywords:time-sensitive   learning to rank   microblog search  
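
The distinction the abstract draws between global (query-level) and local (query-document) temporal features can be illustrated with a minimal sketch. This is not the paper's feature set: the function names, the one-day bucket size, and the specific features (mean age, peak age, burstiness, exponential recency, distance to peak) are all hypothetical choices made for illustration. The sketch assumes timestamps in seconds, a non-empty list of initially retrieved posts, and no posts newer than the query time.

```python
from collections import Counter
from math import exp

DAY = 86400  # seconds per day; hypothetical bucket size


def global_temporal_features(query_time, result_times, bucket=DAY):
    """Query-level features from timestamps of initially retrieved posts."""
    ages = [query_time - t for t in result_times]
    # Histogram of post ages in whole buckets (0 = newest bucket).
    buckets = Counter(a // bucket for a in ages)
    # Most populated bucket; ties go to the more recent bucket.
    peak_bucket, peak_count = max(
        buckets.items(), key=lambda kv: (kv[1], -kv[0])
    )
    return {
        "mean_age_days": sum(ages) / len(ages) / bucket,
        "peak_age_days": peak_bucket,
        "burstiness": peak_count / len(ages),
    }


def local_temporal_features(query_time, doc_time, global_feats, bucket=DAY):
    """Features of one query-document pair, relative to the query's profile."""
    age_days = (query_time - doc_time) / bucket
    peak_center = global_feats["peak_age_days"] + 0.5  # middle of peak bucket
    return {
        "recency": exp(-age_days),  # encodes the "newer is better" assumption
        "dist_to_peak_days": abs(age_days - peak_center),  # "closer to peak"
    }


def feature_vector(query_time, doc_time, text_score, result_times):
    """Concatenate a text score with global and local temporal features."""
    g = global_temporal_features(query_time, result_times)
    l = local_temporal_features(query_time, doc_time, g)
    return [
        text_score,
        g["mean_age_days"],
        g["peak_age_days"],
        g["burstiness"],
        l["recency"],
        l["dist_to_peak_days"],
    ]
```

Feeding both the recency and the distance-to-peak signals to a learned ranker, instead of hard-coding one of them into the scoring function, lets the model weight each assumption per query. Training the ranker itself is omitted here.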