Similar literature
 17 similar documents found.
1.
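Ranking is a very important topic in information retrieval. Although algorithms for learning-to-rank models have long been studied in depth, feature selection for learning to rank has received little attention. In practice, many feature selection methods designed for classification are applied directly to learning to rank. Because ranking differs markedly from classification, feature selection algorithms designed specifically for ranking are needed. After reviewing the feature selection methods commonly used in learning to rank, this paper proposes a new feature selection method for learning to rank that is suited to question answering (QA): the tournament ranking feature selection method. Experimental results show that the new method yields significant improvements in both feature extraction efficiency and the reduction of feature vector dimensionality.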

2.
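Large-scale search systems must respond quickly to user queries, and must obey strict back-end latency constraints when computing feature relevance for candidate documents. Feature selection improves the efficiency of machine learning. Observing that fast feature selection for learning to rank usually starts from the single feature with the best ranking performance, the paper first proposes an algorithm that generates feature selection starting points by hierarchical clustering, and applies it to two existing fast feature selection methods. In addition, it proposes a new method that makes full use of the clustered features to perform feature selection. Experiments on two standard datasets show that the algorithm can obtain a smaller feature subset without loss of accuracy, and can achieve the best ranking accuracy on medium-sized subsets.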

3.
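As a field at the intersection of information retrieval and machine learning, learning to rank is attracting growing attention. However, almost no learning-to-rank algorithm accounts for differences among queries. In this paper, each query is modeled as a multivariate Gaussian distribution, the KL divergence is used to measure the distance between queries, the queries are clustered by spectral clustering, and a ranking function is trained for each cluster. Experimental results show that the ranking functions obtained after clustering need fewer training examples, yet their performance is comparable to, or even better than, that of ranking functions trained without clustering.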
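
The query distance described in this abstract can be illustrated with a minimal sketch. It assumes diagonal covariance matrices so that the KL divergence has a simple closed form in pure Python; the abstract itself speaks of general multivariate Gaussians. The function names, the symmetrization, and the exponential affinity used to prepare input for spectral clustering are illustrative assumptions, not details taken from the paper.

```python
import math


def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance (assumption)."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        # Per-dimension closed form: trace term, mean shift term, log-det term.
        total += vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
    return 0.5 * total


def symmetric_kl(mu_p, var_p, mu_q, var_q):
    """Symmetrized KL, usable as a query-to-query distance (assumption)."""
    return (kl_diag_gaussians(mu_p, var_p, mu_q, var_q)
            + kl_diag_gaussians(mu_q, var_q, mu_p, var_p))


def affinity(distance, sigma=1.0):
    """Hypothetical conversion of a distance into a similarity in (0, 1]."""
    return math.exp(-distance / sigma)
```

A matrix of such affinities between all query pairs is the kind of input a spectral clustering routine with a precomputed affinity would accept; one ranking function would then be trained per resulting cluster.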

4.
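To address the characteristics of the label ranking problem, a feature selection algorithm for label ranking datasets is proposed (Label Ranking Based Feature Selection, LRFS). The algorithm first defines a new neighborhood information measure based on neighborhood rough sets, which can directly measure the relevance, redundancy, and association among continuous, discrete, and ranking-type features. On this basis, a label ranking feature selection algorithm based on a neighborhood association weight factor is then proposed. Experimental results show that LRFS can effectively remove irrelevant or redundant features from label ranking datasets without reducing ranking accuracy.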

5.
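Learning-to-rank (LTR) models have achieved remarkable results in information retrieval, and the conventional way of training them requires collecting large-scale text data. However, as data privacy protection receives increasing attention, collecting data from multiple data owners (such as enterprises) to train a learning-to-rank model is becoming infeasible. Enterprises are forced to store their data independently, forming data silos. Because training a ranking model requires the use of queries...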

6.
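In recent years microblog retrieval has become a research hotspot in information retrieval. Related studies show that microblog retrieval is time-sensitive. Existing work has derived a variety of retrieval models from different assumptions about time sensitivity, for example that newer documents are more relevant, or that documents closer to a burst moment are more relevant, and each improves retrieval effectiveness to some extent. However, these assumptions come mainly from observation; they are intuitive simplifications that capture only one aspect of how temporal factors affect microblog ranking. This paper verifies that microblog retrieval exhibits complex time-sensitive characteristics that intuitive simplified assumptions cannot describe accurately. On this basis, it proposes a learning-to-rank model for time-sensitive microblog retrieval (TLTR), built by machine learning from the temporal and textual features of microblogs. For the temporal features, it examines query-related global temporal features and local temporal features of query-document pairs. Experiments on the TREC Microblog Track 2011 and 2012 datasets show that TLTR outperforms other existing time-sensitive microblog ranking methods.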

7.
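Document ranking has long been one of the key tasks in information retrieval (IR). Benefiting from the strong modeling capacity of Markov decision processes and the strong solving capacity of reinforcement learning, ranking models based on reinforcement learning have been proposed in recent years and have achieved good results. However, because the candidate documents contain a large number of irrelevant documents, reinforcement learning methods based on trial and error suffer from low efficiency. To solve this problem, this paper proposes IR-DAGGER, a learning-to-rank algorithm based on imitation learning, which builds an expert policy from document annotation information and improves learning efficiency while preserving ranking accuracy. To test IR-DAGGER, experiments were conducted on the OHSUMED dataset for relevance ranking and the TREC dataset for diversified ranking; the results show that IR-DAGGER improves both the accuracy and the efficiency of document ranking on both datasets.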

8.
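One of the key problems in image search is how to rank search results effectively. The ranking models of existing image search engines are generally based on associated text and ignore the visual features of images. Because textual features sometimes fail to match image content well, the search results can contain wrongly ranked images. Visual reranking methods have been proposed to address this, using visual information to refine text-based search results. However, the gain from visual reranking is limited, mainly because errors in the text-based results propagate to the visual reranking stage. Within the learning-to-rank framework, this paper proposes an image ranking model that combines textual and visual features, considering both during learning to rank and thereby avoiding the error propagation of visual reranking. Experimental results show that the proposed ranking model is significantly better than existing reranking methods.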

9.
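Feature selection is a key problem in machine learning and data mining. For high-dimensional data, a specific evaluation criterion is typically used to obtain weights for the original features and rank them. How to choose a good subset from the ranked features remains an open question. This paper proposes a simple filter method for subset selection after feature ranking. The basic idea is to combine exponential entropy with a fuzzy feature evaluation index, use a search strategy similar to sequential forward selection, and take the inflection point of the fuzzy feature evaluation index curve as the stopping condition for the search. Theoretical analysis and experiments on synthetic and real-world benchmark datasets show that the method performs well.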

10.
Yang Xiao (杨潇), Cui Chaoran (崔超然), Wang Shuaiqiang (王帅强). Computer Science (《计算机科学》), 2017, 44(12): 255-259
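Introducing feature selection into learning to rank can improve both learning efficiency and accuracy. For the sake of selection speed, current research mainly takes a feature selection perspective, choosing the feature set that is most discriminative for ranking according to each feature's effect on ranking and the similarity between features. Because most features are hand-crafted, overlap and redundancy among them are hard to avoid. To reduce this redundancy, the paper takes a feature generation perspective and applies matrix factorization to the existing features to generate a new feature set. Since matrix factorization methods such as singular value decomposition (SVD) cannot take into account the influence of ranking results on the features, an optimization algorithm is constructed from the ranking performance of the feature matrix and the gap between the feature matrix and the original matrix. This yields a matrix-factorization-based optimization method for learning to rank, from which the feature selection algorithm MFRank is designed. In the experiments, projected stochastic gradient descent is used to approximate the optimum of the optimization problem, and results on the public test collection MQ2008 show that MFRank obtains results comparable to those of state-of-the-art ranking algorithms such as RankBoost and RankSVM-Struct.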

11.
Both the quality and quantity of training data have a significant impact on the accuracy of ranking functions in web search. To meet global search needs, a commercial search engine is required to expand its well-tailored service to small countries as well. Because query intents and search results are intrinsically heterogeneous across domains (i.e., across languages and regions), it is difficult for a generic ranking function to satisfy all types of queries. Instead, each domain should use a specific, well-tailored ranking function. In order to train a ranking function for each domain with a scalable strategy, it is critical to leverage existing training data to enhance the ranking functions of those domains without sufficient training data. In this paper, we present a boosting framework for learning to rank in the multi-task learning context to attack this problem. In particular, we propose to learn non-parametric common structures adaptively from multiple tasks in a stage-wise way. An algorithm is developed to iteratively discover super-features that are effective for all the tasks. The regression function for each task is then estimated as a linear combination of those super-features. We evaluate the accuracy of multi-task learning methods for web search ranking using data from multiple domains from a commercial search engine. Our results demonstrate that multi-task learning methods bring significant relevance improvements over the existing baseline method.

12.
Gait is a useful biometric because it can operate from a distance and without subject cooperation. However, it is affected by changes in covariate conditions (carrying, clothing, view angle, etc.). Existing methods suffer from a lack of training samples, can only cope with changes in a subset of conditions with limited success, and implicitly assume subject cooperation. We propose a novel approach which casts gait recognition as a bipartite ranking problem and leverages training samples from different people and even from different datasets. By exploiting learning to rank, the problem of model over-fitting caused by under-sampled training data is effectively addressed. This makes our approach suitable under a genuine uncooperative setting and robust against changes in any covariate conditions. Extensive experiments demonstrate that our approach drastically outperforms existing methods, achieving an up to 14-fold increase in recognition rate under the most difficult uncooperative settings.

13.
Zhang Leyuan (张乐园), Li Jiaye (李佳烨), Li Pengqing (李鹏清). Journal of Computer Applications (《计算机应用》), 2018, 38(12): 3444-3449
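To address the nonlinearity, low-rank structure, and attribute redundancy often found in high-dimensional data, an unsupervised attribute selection algorithm based on kernel functions and attribute self-representation is proposed: the low-rank constrained nonlinear feature selection algorithm (LRNFS). First, each attribute dimension is mapped into a high-dimensional kernel space, so that linear attribute selection in the kernel space realizes nonlinear attribute selection in the low-dimensional space. Then, a bias term is introduced into the self-representation form, and low-rank and sparse constraints are imposed on the coefficient matrix. Finally, a sparse regularization factor on the coefficient vector of the kernel matrices is introduced to perform attribute selection. In the proposed algorithm, the kernel matrices capture the nonlinear relationships, the low-rank constraint accounts for the global information of the data in subspace learning, and the self-representation form determines the importance of each attribute. Experimental results show that, compared with the semi-supervised feature selection algorithm based on rescaled linear square regression (RLSR), classification accuracy after attribute selection with the proposed algorithm improves by 2.34%. The proposed algorithm solves the problem that data are linearly inseparable in the low-dimensional feature space and improves the accuracy of attribute selection.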

14.
As a crucial task in information retrieval, ranking defines the preferential order among the retrieved documents for a given query. Supervised learning has recently been used to learn ranking models automatically by incorporating various models into one effective model. This paper proposes a novel supervised learning method in which instances are represented as bags of contexts of features, instead of bags of features. The method applies rank-order correlations to measure the correlation between features. The feature vectors of instances, i.e., the 1st-order raw feature vectors, are then mapped into the feature correlation space via projection to derive the context-level feature vectors, i.e., the 2nd-order context feature vectors. For ranking model learning, Ranking SVM is employed with the 2nd-order context feature vectors as the input. The proposed method is evaluated using the LETOR benchmark datasets and is found to perform well with competitive results. The results suggest that the learning method benefits from the rank-order-correlation-based feature vector context transformation.
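
One plausible reading of the 1st-order to 2nd-order mapping in this abstract is sketched below. It assumes Spearman's coefficient as the rank-order correlation and takes "projection" to mean multiplying each raw feature vector by the feature-feature correlation matrix; both choices, and all function names, are assumptions for illustration rather than the paper's stated procedure.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = mean_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def context_features(rows):
    """Map raw feature vectors to context vectors via the correlation matrix."""
    columns = list(zip(*rows))
    d = len(columns)
    corr = [[spearman(columns[i], columns[j]) for j in range(d)] for i in range(d)]
    return [[sum(row[i] * corr[i][j] for i in range(d)) for j in range(d)]
            for row in rows]
```

Under this reading, each component of a context vector aggregates the raw features weighted by how strongly they co-vary in rank with that component's feature; the resulting vectors would then be passed to a ranker such as Ranking SVM.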

15.
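This work applies learning to rank to component retrieval. First, components are described comprehensively using faceted description, and features are extracted from the faceted descriptions using a word2vec model and weight assignment. Next, latent semantic analysis and cosine similarity computation are applied to the component features to obtain a component training dataset. Finally, the component training dataset and the component dataset are used to train the parameters of an improved Plackett-Luce probabilistic ranking model by maximum likelihood estimation, yielding a component ranking model. Applying this model to component retrieval, a component retrieval method was developed and implemented, and experiments confirm its effectiveness: its recall, precision, and efficiency are all better than those of traditional component retrieval methods.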
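
For orientation, the quantity that maximum likelihood estimation would maximize can be sketched for the standard Plackett-Luce model. The abstract refers to an improved variant whose details are not given here, so the code below shows only the textbook form; the function name and argument layout are illustrative assumptions.

```python
import math


def plackett_luce_log_likelihood(scores, ranking):
    """Log-probability of `ranking` under the standard Plackett-Luce model.

    scores  -- real-valued score per item (higher means preferred)
    ranking -- item indices ordered from best to worst
    """
    ordered = [scores[i] for i in ranking]
    log_lik = 0.0
    for k in range(len(ordered)):
        remaining = ordered[k:]
        # Log-sum-exp over the items not yet placed, shifted for stability.
        m = max(remaining)
        log_norm = m + math.log(sum(math.exp(s - m) for s in remaining))
        log_lik += ordered[k] - log_norm
    return log_lik
```

Training would sum this value over all observed rankings and adjust the parameters that produce `scores` (for example by gradient ascent) to increase the total.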

16.
As social media and e-commerce on the Internet continue to grow, opinions have become one of the most important sources of information for users to base their future decisions on. Unfortunately, the large quantities of opinions make it difficult for an individual to comprehend and evaluate them all in a reasonable amount of time. Users have to read a large number of opinions on different entities before making any decision. Recently, a new retrieval task in information retrieval known as Opinion-Based Entity Ranking (OpER) has emerged. OpER directly ranks relevant entities based on how well the opinions on them match a user's preferences, which are given in the form of queries. With such a capability, users do not need to read the large number of opinions available for the entities. Previous research on OpER does not take into account the importance and subjectivity of query keywords in individual opinions of an entity. Entity relevance scores are computed primarily on the basis of occurrences of query keyword matches, treating all opinions of an entity as a single field of text. Intuitively, entities that have positive judgments and strong relevance to the query keywords should be ranked higher than entities that have poor relevance and negative judgments. This paper outlines several ranking features and develops an intuitive framework for OpER in which entities are ranked according to how well individual opinions of entities match the user's query keywords. As a useful ranking model may be constructed from many ranking features, we apply a learning-to-rank approach based on genetic programming (GP) to combine features in order to develop an effective retrieval model for the OpER task. The proposed approach is evaluated on two collections and is found to be significantly more effective than the standard OpER approach.

17.
Li Hui (李慧), Li Cunhua (李存华), Wang Xia (王霞). Computer Engineering (《计算机工程》), 2010, 36(13): 37-39
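To improve satisfaction with web page ranking, a new page ranking algorithm based on feature selection is proposed. The algorithm uses multi-feature selection to filter the feature subsets of pages, seeking a set of features with maximum weight and minimum similarity. Feature-term selection tests on general-purpose information retrieval datasets show that the algorithm outperforms traditional ranking algorithms.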
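
The goal of a feature set with maximum weight and minimum similarity can be approximated greedily, as in the sketch below. The abstract does not say how the search is carried out, so the greedy strategy, the `penalty` trade-off parameter, and the function name are all assumptions made for illustration.

```python
def greedy_select(weights, similarity, k, penalty=1.0):
    """Greedily pick k features that are highly weighted yet mutually dissimilar.

    weights    -- importance score per feature
    similarity -- square matrix of pairwise feature similarities
    penalty    -- hypothetical trade-off between weight and redundancy
    """
    selected = []
    candidates = set(range(len(weights)))
    while candidates and len(selected) < k:
        def gain(j):
            # Weight of j minus its accumulated similarity to chosen features.
            return weights[j] - penalty * sum(similarity[j][s] for s in selected)

        # Iterating in sorted order makes ties resolve to the lowest index.
        best = max(sorted(candidates), key=gain)
        selected.append(best)
        candidates.remove(best)
    return selected
```

For example, with weights `[0.9, 0.85, 0.5]` and features 0 and 1 nearly identical (similarity 0.95), the sketch keeps feature 0 and then prefers the weaker but less redundant feature 2 over feature 1.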
