Similar Documents
19 similar documents found (search time: 171 ms)
1.
A Fusion Algorithm of Multiple Query-Dependent Ranking SVMs   Total citations: 3 (self-citations: 1, citations by others: 2)
Learning to rank is currently a hot research topic at the intersection of information retrieval and machine learning. Existing learning-to-rank algorithms treat all queries and their associated documents in the training set identically, ignoring the differences between queries, which limits the performance of the learned ranking model. This work characterizes the differences between queries and accounts for them during training, proposing a supervised method that fuses multiple query-dependent ranking sub-models. The method builds one ranking sub-model for each query and its associated documents, represents the sub-models' outputs as vectors, and thereby converts the multiple query-dependent ranking models into feature data that reflect the differences between queries, achieving an ensemble of ranking models. Taking Ranking SVM as an example, new loss functions are defined at the query level and at the sample level as the optimization objective, and these loss functions are used to adjust the weights of the losses incurred by different queries, yielding a fusion algorithm of multiple query-dependent Ranking SVMs. Experimental results on document retrieval and web page retrieval show that the proposed fusion algorithm outperforms conventional learning-to-rank models.
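The mechanism of weighting per-query losses in a pairwise Ranking SVM can be sketched in a few lines. The loss below is an illustration of the general idea only, not the paper's exact formulation; the data, weights, and the plain hinge form are all made up for the example.

```python
import numpy as np

def query_weighted_pairwise_hinge(scores, labels, qids, query_weights):
    """Pairwise hinge loss with a per-query weight, so that no single
    query's (possibly numerous) document pairs dominate training."""
    loss = 0.0
    for q in np.unique(qids):
        idx = np.where(qids == q)[0]
        w = query_weights[q]
        for i in idx:
            for j in idx:
                if labels[i] > labels[j]:  # document i should rank above j
                    loss += w * max(0.0, 1.0 - (scores[i] - scores[j]))
    return loss

# Two queries with two documents each; query 1 is given twice the weight.
scores = np.array([1.2, 1.0, 0.5, 0.8])
labels = np.array([1, 0, 0, 1])
qids = np.array([0, 0, 1, 1])
print(query_weighted_pairwise_hinge(scores, labels, qids, {0: 1.0, 1: 2.0}))
```

Scaling each query's pair losses by a weight is what lets the optimizer trade off easy and hard queries instead of treating every pair in the training set identically.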

2.
In many information retrieval tasks, retrieved documents are usually re-ranked to further improve retrieval performance. Existing learning-to-rank methods focus mainly on constructing loss functions and do not consider the relationships among features. This paper applies a multi-channel deep convolutional neural network to listwise learning to rank, yielding ListCNN, which performs accurate re-ranking for information retrieval. Because some of the many features extracted from documents exhibit local correlation and redundancy, a convolutional neural network is used to re-extract features and improve the performance of the listwise method. The ListCNN architecture accounts for the local correlation of the raw document features and can effectively re-extract representative features. Experiments on the public dataset LETOR 4.0 show that ListCNN outperforms existing listwise methods.
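The idea of re-extracting features by convolving over locally correlated raw features can be illustrated with a minimal 1-D convolution. This is plain NumPy with made-up kernel values; it sketches the principle only and implies nothing about the actual ListCNN architecture.

```python
import numpy as np

def conv1d_features(x, kernels):
    """Slide each kernel over the raw feature vector, producing new
    features that combine locally adjacent (correlated) raw features."""
    out = []
    for k in kernels:
        for i in range(len(x) - len(k) + 1):
            out.append(float(np.dot(x[i:i + len(k)], k)))
    return out

# A difference kernel turns adjacent raw features into local-contrast features.
print(conv1d_features([1, 2, 3, 4], [[1, -1]]))
```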

3.
Efficient retrieval is one of the core services of a digital library, and ranking is the central problem of efficient information retrieval: given a list of bibliographic records, a ranking model generates a ranked list of the target records. When learning-to-rank algorithms are applied to information retrieval, the common approach is to optimize the ranking model by minimizing a pairwise loss. However, existing results show that minimizing the pairwise loss does not necessarily yield the best ranking performance of listwise algorithms, and combining online learning to rank with listwise algorithms is also difficult. This paper proposes an online listwise learning-to-rank algorithm that preserves the performance advantages of listwise methods while reducing retrieval complexity. It first addresses how to combine online learning to rank with a listwise algorithm; it then optimizes the ranking model by minimizing a loss function defined over the predicted list and the ground-truth list; finally, it proposes an adaptive learning rate for the online-listwise algorithm. Experimental results show that the proposed algorithm achieves good retrieval performance and retrieval speed.
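One standard loss defined over a predicted list and a ground-truth list is the ListNet-style top-one cross-entropy. The sketch below illustrates that family of listwise losses; it is not claimed to be the paper's exact online variant, and the example score lists are invented.

```python
import numpy as np

def listwise_top1_loss(scores, labels):
    """Cross-entropy between the top-one probability distributions induced
    by the ground-truth labels and by the predicted scores (softmax of each)."""
    p_true = np.exp(labels - np.max(labels))
    p_true /= p_true.sum()
    p_pred = np.exp(scores - np.max(scores))
    p_pred /= p_pred.sum()
    return float(-(p_true * np.log(p_pred)).sum())

labels = np.array([2.0, 1.0, 0.0])
good = listwise_top1_loss(np.array([3.0, 1.5, 0.0]), labels)  # agrees with labels
bad = listwise_top1_loss(np.array([0.0, 1.5, 3.0]), labels)   # reversed order
print(good < bad)
```

Because the loss is differentiable in the scores, an online variant can apply a gradient step to the model after each observed list.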

4.
A cost-sensitive Ranking SVM converts the ranking problem over samples into a classification problem over sample pairs, adapting it to web information retrieval. However, the rapidly growing number of training pairs makes learning prohibitively slow. This paper therefore proposes a cost-sensitive smooth Ranking SVM with squared error (cs-sRSVM), which approximates the hinge loss with a piecewise-polynomial smoothing function, turning the optimization objective into an unconstrained problem, and then finds the unique optimum of the unconstrained problem with the Newton-YUAN algorithm. Experiments on the public learning-to-rank dataset LETOR show that, compared with existing cost-sensitive ranking algorithms, cs-sRSVM trains faster while achieving equally good retrieval performance.
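A common piecewise-polynomial smoothing of the hinge loss replaces the kink at margin 1 with a quadratic piece, making the objective differentiable so that Newton-type methods apply. This is one plausible instance of the smoothing the abstract describes, not necessarily the exact cs-sRSVM function; the band width eps is an illustrative parameter.

```python
def smooth_hinge(t, eps=0.5):
    """Quadratically smoothed hinge loss: equals the (shifted) hinge outside
    a width-eps band around margin 1, quadratic inside the band."""
    if t >= 1.0:
        return 0.0
    if t <= 1.0 - eps:
        return (1.0 - t) - eps / 2.0
    return (1.0 - t) ** 2 / (2.0 * eps)

# Away from the kink the smoothed loss tracks the hinge (up to eps/2):
print(smooth_hinge(2.0), smooth_hinge(-1.0))  # 0.0 and 2.0 - eps/2 = 1.75
```

The two outer pieces and the quadratic meet with matching values and slopes at t = 1 and t = 1 - eps, which is what makes the function continuously differentiable.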

5.
Hu Xiaosheng, Zhong Yong. Journal of Computer Applications, 2012, 32(12): 3331-3334
Current learning-to-rank algorithms treat all queries and their associated documents in the training set identically, ignoring the differences both between queries and among their associated documents, which limits the performance of the ranking model. This paper analyzes the differences between queries and, additionally considering that a document's rank position changes its probability of being examined, proposes a two-level weighted-fusion ranking method. The method builds a ranking sub-model for each query and its associated documents, assigning asymmetric weights to documents in the process; it then defines a new loss function as the optimization objective and uses it to adjust the weights of the losses incurred by different queries, finally producing a weighted fusion of the multiple query-dependent ranking models. Experimental results on the standard dataset LETOR OHSUMED show that the proposed method substantially improves ranking performance.

6.
Among learning-to-rank methods, those that learn the ranking model by directly optimizing information retrieval evaluation measures achieve good ranking results, but their loss functions still leave room for improvement in exploiting the information of all rank positions and in incorporating ranking diversity. This paper therefore proposes a diversity-aware document ranking algorithm based on reinforcement learning. First, reinforcement learning is applied to document ranking by modeling the ranking process as a Markov decision process: each iteration uses the information of all rank positions and repeatedly selects the best document for each position. Second, a diversity strategy is incorporated into ranking: highly similar documents are pruned according to a similarity threshold, guaranteeing the diversity of the ranked results. Finally, experimental results on public datasets show that the proposed algorithm improves the diversity of the ranking results while preserving ranking accuracy.
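The similarity-threshold pruning step can be sketched independently of the reinforcement-learning model. The greedy loop below, with a made-up precomputed similarity matrix, drops any candidate that is too similar to a document already placed in the ranking; it illustrates the pruning idea only.

```python
def diverse_rank(relevance, sim, threshold):
    """Rank documents by relevance, skipping any document whose similarity
    to an already-ranked document reaches the threshold."""
    order = sorted(range(len(relevance)), key=lambda i: -relevance[i])
    ranked = []
    for i in order:
        if all(sim[i][j] < threshold for j in ranked):
            ranked.append(i)
    return ranked

relevance = [0.9, 0.85, 0.3]
sim = [[1.0, 0.95, 0.1],
       [0.95, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
# Doc 1 is a near-duplicate of doc 0 and is pruned despite high relevance.
print(diverse_rank(relevance, sim, threshold=0.8))
```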

7.
Cao Ying, Miao Qiguang, Liu Jiachen, Gao Lin. Journal of Software, 2013, 24(11): 2584-2596
AdaBoost is an important ensemble-learning meta-algorithm, and its core property, Boosting, is also an effective way to attack cost-sensitive learning. However, the various cost-sensitive Boosting algorithms, such as AdaCost and the AdaC and CSB families, rely on heuristic strategies: they inject cost parameters into AdaBoost's weighted-voting coefficient formula or its weight-update rule to force the algorithm to focus on high-cost samples. These heuristics have not been validated by theoretical analysis, and their modifications break AdaBoost's most important property, Boosting. Whereas AdaBoost converges to the Bayes decision rule, these cost-sensitive Boosting algorithms do not converge to the cost-sensitive Bayes decision rule. To address this, this paper studies cost-sensitive Boosting algorithms that strictly follow the Boosting theoretical framework. First, the exponential loss and the logit loss of the classification margin are modified to be cost-sensitive, and the new losses are proved to be Fisher-consistent in the cost-sensitive sense, so that in the ideal case optimizing them converges to the cost-sensitive Bayes decision rule. Second, optimizing the new losses by functional gradient descent within the Boosting framework yields the algorithms AsyB and AsyBL. Experiments on two-dimensional Gaussian synthetic data show that, compared with existing cost-sensitive Boosting algorithms, AsyB and AsyBL effectively approximate the cost-sensitive Bayes decision rule; tests on UCI datasets further verify that AsyB and AsyBL produce cost-sensitive classifiers with lower misclassification costs, and that the misclassification cost decreases exponentially with the iterations.
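The cost-sensitive modification of the exponential margin loss can be written down directly, and its pointwise minimizer recovers the cost-sensitive Bayes rule. The sketch below uses illustrative costs and checks the closed-form minimizer numerically; it demonstrates the consistency idea, not the AsyB/AsyBL algorithms themselves.

```python
import math

def cs_exp_risk(f, p, c_pos, c_neg):
    """Expected cost-sensitive exponential loss at a point with P(y=+1)=p:
    c_pos * p * exp(-f) + c_neg * (1-p) * exp(f)."""
    return c_pos * p * math.exp(-f) + c_neg * (1.0 - p) * math.exp(f)

# Setting the derivative to zero gives f* = 0.5*log(c_pos*p / (c_neg*(1-p))).
p, c_pos, c_neg = 0.4, 3.0, 1.0
f_star = 0.5 * math.log(c_pos * p / (c_neg * (1.0 - p)))

# Although p < 0.5, the high cost of missing y=+1 pushes f* above zero,
# matching the cost-sensitive Bayes rule: predict +1 iff c_pos*p > c_neg*(1-p).
print(f_star > 0)
# Sanity check: f_star beats nearby values of f.
assert cs_exp_risk(f_star, p, c_pos, c_neg) <= min(
    cs_exp_risk(f_star - 0.1, p, c_pos, c_neg),
    cs_exp_risk(f_star + 0.1, p, c_pos, c_neg))
```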

8.
Ranking retrieved results is an important problem in information retrieval. Most current retrieval models reduce learning to rank to a binary classification problem, but experiments show that classification accuracy is not directly linked to retrieval performance: a classification algorithm may achieve high classification accuracy yet rank poorly. Building on existing ranking algorithms, this paper focuses on a margin-based risk-minimization framework for learning to rank, analyzes its loss function and computational complexity in depth, and derives a high-performance learning-to-rank algorithm, whose effectiveness is verified on the Corel image collection and the TRECVID 2005-2007 video datasets.

9.
Recommender systems have attracted increasing attention in recent years; by application scenario they fall mainly into rating prediction and Top-K recommendation. Because traditional rating-based and Top-K ranking recommenders consider only binary user-item rating information, they are inherently limited; this work therefore extends a matrix-factorization method based on listwise learning to rank. On one hand, it fully exploits the follow relationships among users: it first computes inter-user trust from the follow relationships, then uses that trust to add a social constraint term to the original model's loss function, pulling the preference vectors of mutually trusting users as close together as possible. On the other hand, it computes the weights of the tags attached to items, uses them to compute tag similarity between items, and adds an item-tag constraint term to the loss function. Experimental results on the real-world Epinions and Baidu Movies datasets show that the method improves NDCG over the original model, effectively increasing recommendation accuracy.

10.
Ranking is an important step in information retrieval. More than a hundred features have been proposed for constructing ranking functions, and how to use them to build more effective ranking functions has become a hot topic; learning to rank, an interdisciplinary field between information retrieval and machine learning, is thus receiving growing attention. From the way ranking features are constructed, it is clear that the features are not fully independent; yet existing learning-to-rank research has rarely built more effective ranking functions through feature analysis, recombination, and selection. To address this, the following framework is proposed: analyze the feature set used to construct the ranking function, recombine and select features, and then learn the ranking function with a learning-to-rank method. Within this framework, four feature-processing algorithms are proposed: feature recombination via principal component analysis, and feature selection based on MAP, on forward selection, and on the selection implicit in the learning-to-rank algorithm. Experimental results show that after feature processing, the ranking functions built by learning-to-rank algorithms generally outperform the original ones.

11.
Listwise approaches are an important class of learning-to-rank methods that use automatic learning techniques to discover useful information. Most previous research on listwise approaches has focused on optimizing ranking models via weights, using imprecisely labeled training data; optimizing ranking models via features was largely ignored, which hindered the continued performance improvement of these approaches. To address these limitations, we propose a quasi-KNN model to discover the ranking of features and employ a rank-addition rule to calculate the combination weights. On this basis, we propose three listwise algorithms: FeatureRank, BLFeatureRank, and DiffRank. The experimental results show that the proposed algorithms can be applied to strictly ordered ranking training sets and achieve better performance than state-of-the-art listwise algorithms.

12.
Source code examples are used by developers to implement unfamiliar tasks by learning from existing solutions. To better support developers in finding existing solutions, code search engines are designed to locate and rank code examples relevant to users' queries. Essentially, a code search engine provides a ranking schema, which combines a set of ranking features to calculate the relevance between a query and candidate code examples; the ranking schema then places relevant code examples at the top of the result list. However, it is difficult to determine the configuration of a ranking schema subjectively. In this paper, we propose a code example search approach that applies a machine learning technique to automatically train a ranking schema, which we then use to rank candidate code examples for new queries at run-time. We evaluate the ranking performance of our approach using a corpus of over 360,000 code snippets crawled from 586 open-source Android projects. The performance evaluation study shows that the learning-to-rank approach can effectively rank code examples and outperforms the existing ranking schemas by about 35.65% and 48.42% in terms of the normalized discounted cumulative gain (NDCG) and expected reciprocal rank (ERR) measures, respectively.
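For reference, NDCG, the first measure reported above, can be computed in a few lines. This is the standard definition with gain 2^rel - 1 and a log2 position discount, not code from the paper; the relevance lists are invented.

```python
import math

def dcg(rels):
    """Discounted cumulative gain with gain 2^rel - 1 and log2 discount."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(ranked_rels):
    """DCG of the given ranking divided by the DCG of the ideal reordering."""
    ideal = dcg(sorted(ranked_rels, reverse=True))
    return dcg(ranked_rels) / ideal if ideal > 0 else 0.0

print(ndcg([3, 2, 1, 0]))        # a perfectly ordered list scores 1.0
print(ndcg([0, 1, 2, 3]) < 1.0)  # any misordering scores strictly less
```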

13.
Lin HT, Li L. Neural Computation, 2012, 24(5): 1329-1367
We present a reduction framework from ordinal ranking to binary classification. The framework consists of three steps: extracting extended examples from the original examples, learning a binary classifier on the extended examples with any binary classification algorithm, and constructing a ranker from the binary classifier. Based on the framework, we show that a weighted 0/1 loss of the binary classifier upper-bounds the mislabeling cost of the ranker, both error-wise and regret-wise. Our framework allows not only the design of good ordinal ranking algorithms based on well-tuned binary classification approaches, but also the derivation of new generalization bounds for ordinal ranking from known bounds for binary classification. In addition, our framework unifies many existing ordinal ranking algorithms, such as perceptron ranking and support vector ordinal regression. When compared empirically on benchmark data sets, some of our newly designed algorithms enjoy advantages in terms of both training speed and generalization performance over existing algorithms. In addition, the newly designed algorithms lead to better cost-sensitive ordinal ranking performance, as well as improved listwise ranking performance.
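The three-step reduction can be sketched concretely: each ordinal example (x, y) with y in {1..K} becomes K-1 binary examples asking "is y above threshold k?", and the ranker adds up the binary classifier's positive answers. The feature encoding and the toy "learned" classifier below are illustrative assumptions, not the paper's construction.

```python
def extend_examples(X, y, K):
    """Step 1: each (x, y) yields K-1 extended examples ((x, k), [y > k])."""
    return [(x + [k], 1 if label > k else -1)
            for x, label in zip(X, y)
            for k in range(1, K)]

def rank_from_binary(classify, x, K):
    """Step 3: rank = 1 + number of thresholds the classifier says x exceeds."""
    return 1 + sum(1 for k in range(1, K) if classify(x + [k]) > 0)

# A toy binary classifier: x exceeds threshold k iff its feature x[0] > k.
classify = lambda xk: 1 if xk[0] > xk[-1] else -1
print(rank_from_binary(classify, [2.5], K=5))  # rank 3 on the 1..5 scale
```

Because the ranker's mistakes decompose over thresholds, a bound on the classifier's weighted 0/1 loss translates into a bound on the ranker's mislabeling cost, which is the framework's central guarantee.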

14.
As information technologies advance and user-friendly interfaces develop, the interaction between humans and computers, information devices, and new consumer electronics is gaining increasing attention. One example most people can relate to is Apple's innovation in human-computer interaction, used in products such as the iPod and iPhone; Siri, the intelligent personal assistant, is a typical application of machine-learning-based human-computer interaction. Machine learning algorithms have been employed in many disciplines, including gesture recognition, speaker recognition, and product recommendation systems. While existing learning algorithms compute over and learn from large quantities of data, this study proposes an improved learning-to-rank algorithm named MultiStageBoost. In addition to ranking data through multiple stages, MultiStageBoost improves on existing algorithms in two ways. First, it classifies and filters the data down to small quantities and applies a Boosting algorithm to achieve faster ranking. Second, it enhances the original binary classification by using the reciprocal of the fuzzily weighted membership as the ranking distance. The importance of data is revealed by their ranked positions: data ranked near the front usually receive more attention than those ranked in the middle. For example, after ranking 10,000 items, the top 10, or at most 100, are the most important and relevant; whether the items below them are ranked precisely hardly matters. For this reason, this study improves on the conventional pairwise ranking approach: data are not only classified and ranked binarily but are also given different weights depending on whether pairs are concordant or discordant, and incorporating this weighting into the ranking distance increases ranking precision.
Results from the experiments demonstrate that the proposed algorithm outperforms conventional methods on three evaluation measures: P@n, MAP, and NDCG. MultiStageBoost was then applied to speech recognition; the aim is not to improve speech recognition technology itself but to provide evidence that MultiStageBoost can be used for classification and ranking in speech recognition. Experiments show that the recognition-optimization procedures established in this study raise the recognition rate to over 95% on personal computing devices and industrial personal computers. It is expected that this voice management system will accurately and effectively identify speakers answering a voice-response questionnaire and carry out the selection of answers, paving the way for a virtual customer-service agent.

15.
He Haijiang, Long Yuejin. Journal of Computer Applications, 2011, 31(11): 3108-3111
To address the shortage of labeled training data, this paper proposes a co-training listwise learning-to-rank algorithm that mines the ranking information hidden in unlabeled data. The algorithm uses two kinds of listwise rankers, constructing two different ranking functions from the currently labeled data. Accordingly, every unlabeled query has two different document permutations; a likelihood loss measures the similarity of the two permutations, and the queries whose permutations have low similarity are labeled, giving both listwise rankers new training data. Experimental results on the public learning-to-rank dataset LETOR confirm that the co-training ranking algorithm is effective. The influence of the labeling ratio on the algorithm is also discussed.

16.
This paper is concerned with supervised rank aggregation, which aims to improve ranking performance by combining the outputs of multiple rankers. However, previous rank aggregation approaches have two main shortcomings. First, the learned weights for base rankers do not distinguish among queries; this is suboptimal, since queries vary significantly in terms of ranking. Second, most current aggregation functions do not directly optimize the ranking evaluation measures. In this paper, the differences among queries are taken into consideration, and a supervised rank aggregation function is proposed that directly optimizes the evaluation measure NDCG, referred to as RankAgg.NDCG. We prove that RankAgg.NDCG can achieve better NDCG performance than a linear combination of the base rankers. Experimental results on benchmark datasets show that our approach outperforms a number of baseline approaches.

17.
To improve the accuracy of web page ranking, this paper proposes a ranking algorithm based on ε-greedy learning and user click behavior. First, given a user query, a list of relevant pages is recommended via roulette-wheel selection; then ε-greedy learning over the user's click behavior yields the reinforcement signal of the ranking system, and a relevance score is computed for every page through a reward-and-penalty mechanism; finally, the pages are re-ranked by relevance score. As user feedback accumulates, the relevant pages rise to the top ranks of the list. Experimental results show that the proposed algorithm recommends relevant pages accurately and performs well on the P@n, NDCG, and MAP measures.
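The ε-greedy choice at the heart of such algorithms is only a few lines. The sketch below is a generic ε-greedy selector, not the paper's exact reward-and-penalty update: with probability ε it explores a random page, otherwise it exploits the page with the highest current relevance score.

```python
import random

def epsilon_greedy_pick(values, epsilon, rng=random):
    """With probability epsilon pick a random index (explore);
    otherwise pick the index of the highest value (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda i: values[i])

# epsilon = 0 degenerates to pure exploitation of the best-scoring page.
print(epsilon_greedy_pick([0.2, 0.9, 0.5], epsilon=0.0))
```

Exploration is what lets lower-ranked pages occasionally surface and collect clicks, so that the relevance scores can correct an initially poor ordering.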

18.
Training on group samples offers a new way to construct learning-to-rank methods. This paper improves an existing group-sample learning-to-rank method by constructing a group-sample loss function for training the ranking model. Based on the likelihood loss, it adopts a partial-order sample-weight loss function and an optimal initial-sequence selection method to build a neural-network-based group ranking method. Experiments show that the proposed method effectively improves ranking accuracy.

19.
Recommending similar apps can effectively help users discover apps they are interested in. Unlike previous similarity learning, the similar-app recommendation scenario is chiefly a ranking problem. This paper studies how to learn a similarity function in the ranking setting. Existing work considers only absolute similarity or triplet-based similarity. This paper models listwise similarity, describes both triplet similarity and listwise similarity within a unified ranking-oriented relative-similarity learning framework, and proposes the listwise multi-kernel similarity learning algorithm SimListMKL. Experiments show that the algorithm outperforms existing triplet-based similarity learning algorithms in real similar-app recommendation scenarios.
