首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
Many feature transforms have been proposed for the problem of trajectory matching. These methods, which are often based on shape matching, tend to perform poorly for biological trajectories, such as cell motion, because similar biological behavior often results in dissimilar trajectory shape. Additionally, the criteria used for similarity may differ depending on the user's particular interest or the specific query behavior. We present a rank-based distance metric learning method that combines user input and a new set of biologically-motivated features for biological trajectory matching. We show that, with a small amount of user effort, this method outperforms existing trajectory methods. On an information retrieval task using real world data, our method outperforms recent, related methods by ~ 9%.  相似文献   

2.
The problem of “Learning to rank” is a popular research topic in Information Retrieval (IR) and machine learning communities. Some existing list-wise methods, such as AdaRank, directly use the IR measures as performance functions to quantify how well a ranking function can predict rankings. However, the IR measures only count for the document ranks, but do not consider how well the algorithm predicts the relevance scores of documents. These methods do not make best use of the available prior knowledge and may lead to suboptimal performance. Hence, we conduct research by combining both the document ranks and relevance scores. We propose a novel performance function that encodes the relevance scores. We also define performance functions by combining our proposed one with MAP or NDCG, respectively. The experimental results on the benchmark data collections show that our methods can significantly outperform the state-of-the-art AdaRank baselines.  相似文献   

3.
Source code examples are used by developers to implement unfamiliar tasks by learning from existing solutions. To better support developers in finding existing solutions, code search engines are designed to locate and rank code examples relevant to user’s queries. Essentially, a code search engine provides a ranking schema, which combines a set of ranking features to calculate the relevance between a query and candidate code examples. Consequently, the ranking schema places relevant code examples at the top of the result list. However, it is difficult to determine the configurations of the ranking schemas subjectively. In this paper, we propose a code example search approach that applies a machine learning technique to automatically train a ranking schema. We use the trained ranking schema to rank candidate code examples for new queries at run-time. We evaluate the ranking performance of our approach using a corpus of over 360,000 code snippets crawled from 586 open-source Android projects. The performance evaluation study shows that the learning-to-rank approach can effectively rank code examples, and outperform the existing ranking schemas by about 35.65 % and 48.42 % in terms of normalized discounted cumulative gain (NDCG) and expected reciprocal rank (ERR) measures respectively.  相似文献   

4.
Expert finding is an information retrieval task that is concerned with the search for the most knowledgeable people with respect to a specific topic, and the search is based on documents that describe people's activities. The task involves taking a user query as input and returning a list of people who are sorted by their level of expertise with respect to the user query. Despite recent interest in the area, the current state‐of‐the‐art techniques lack in principled approaches for optimally combining different sources of evidence. This article proposes two frameworks for combining multiple estimators of expertise. These estimators are derived from textual contents, from graph‐structure of the citation patterns for the community of experts and from profile information about the experts. More specifically, this article explores the use of supervised learning to rank methods, as well as rank aggregation approaches, for combining all of the estimators of expertise. Several supervised learning algorithms, which are representative of the pointwise, pairwise and listwise approaches, were tested, and various state‐of‐the‐art data fusion techniques were also explored for the rank aggregation framework. Experiments that were performed on a dataset of academic publications from the Computer Science domain attest the adequacy of the proposed approaches.  相似文献   

5.
贝叶斯网络的道德图是一种马尔可夫网络,是进行随机变量之间依赖关系分析、推理及预测的有力工具。基于道德图和贝叶斯网络间的密切联系,提出了一种基于贝叶斯网络理论进行道德图学习的方法。实验表明该方法能够显著提高道德图学习效率和可靠性,适合于多变量稀疏道德图学习。  相似文献   

6.
Users’ click-through data is a valuable source of information about the performance of Web search engines, but it is included in few datasets for learning to rank. In this paper, inspired by the click-through data model, a novel approach is proposed for extracting the implicit user feedback from evidence embedded in benchmarking datasets. This process outputs a set of new features, named click-through features. Generated click-through features are used in a layered multi-population genetic programming framework to find the best possible ranking functions. The layered multi-population genetic programming framework is fast and provides more extensive search capability compared to the traditional genetic programming approaches. The performance of the proposed ranking generation framework is investigated both in the presence and in the absence of explicit click-through data in the utilized benchmark datasets. The experimental results show that click-through features can be efficiently extracted in both cases but that more effective ranking functions result when click-through features are generated from benchmark datasets with explicit click-through data. In either case, the most noticeable ranking improvements are achieved at the tops of the provided ranked lists of results, which are highly targeted by the Web users.  相似文献   

7.
Graph data have been of common practice in many application domains. However, it is very difficult to deal with graphs due to their intrinsic complex structure. In this paper, we propose to apply Formal Concept Analysis (FCA) to learning from graph data. We use subgraphs appearing in each of graph data as its attributes and construct a lattice based on FCA to organize subgraph attributes which are too numerous. For statistical learning purpose, we propose a similarity measure based on the concept lattice, taking into account the lattice structure explicitly. We prove that, the upper part of the lattice can provide a reliable and feasible way to compute the similarity between graphs. We also show that the similarity measure is rich enough to include some other measures as subparts. We apply the measure to a transductive learning algorithm for graph classification to prove its efficiency and effectiveness in practice. The high accuracy and low running time results confirm empirically the merit of the similarity measure based on the lattice.  相似文献   

8.
Learning to rank is a supervised learning problem that aims to construct a ranking model for the given data. The most common application of learning to rank is to rank a set of documents against a query. In this work, we focus on point‐wise learning to rank, where the model learns the ranking values. Multivariate adaptive regression splines (MARS) and conic multivariate adaptive regression splines (CMARS) are supervised learning techniques that have been proven to provide successful results on various prediction problems. In this article, we investigate the effectiveness of MARS and CMARS for point‐wise learning to rank problem. The prediction performance is analyzed in comparison to three well‐known supervised learning methods, artificial neural network (ANN), support vector machine, and random forest for two datasets under a variety of metrics including accuracy, stability, and robustness. The experimental results show that MARS and ANN are effective methods for learning to rank problem and provide promising results.  相似文献   

9.
视频数据的急剧增加,给视频的浏览、存储、检索等应用带来一系列问题和挑战,视频摘要正是解决此类问题的一个有效途径。针对现有视频摘要算法基于约束和经验设置构造目标函数,并对帧集合进行打分带来的不确定和复杂度高等问题,提出一个基于排序学习的视频摘要生成方法。该方法把视频摘要的提取等价为视频帧对视频内容表示的相关度排序问题,利用训练集学习排序函数,使得排序靠前的是与视频相关度高的帧,用学到的排序函数对帧打分,根据分数高低选择关键帧作为视频摘要。另外,与现有方法相比,该方法是对帧而非帧集合打分,计算复杂度显著降低。通过在TVSum50数据集上测试,实验结果证实了该方法的有效性。  相似文献   

10.
This paper presents an efficient preference-based ranking algorithm running in two stages. In the first stage, the algorithm learns a preference function defined over pairs, as in a standard binary classification problem. In the second stage, it makes use of that preference function to produce an accurate ranking, thereby reducing the learning problem of ranking to binary classification. This reduction is based on the familiar QuickSort and guarantees an expected pairwise misranking loss of at most twice that of the binary classifier derived in the first stage. Furthermore, in the important special case of bipartite ranking, the factor of two in loss is reduced to one. This improved bound also applies to the regret achieved by our ranking and that of the binary classifier obtained. Our algorithm is randomized, but we prove a lower bound for any deterministic reduction of ranking to binary classification showing that randomization is necessary to achieve our guarantees. This, and a recent result by Balcan et al., who show a regret bound of two for a deterministic algorithm in the bipartite case, suggest a trade-off between achieving low regret and determinism in this context. Our reduction also admits an improved running time guarantee with respect to that deterministic algorithm. In particular, the number of calls to the preference function in the reduction is improved from Ω(n 2) to O(nlog?n). In addition, when the top k ranked elements only are required (k?n), as in many applications in information extraction or search engine design, the time complexity of our algorithm can be further reduced to O(klog?k+n). Our algorithm is thus practical for realistic applications where the number of points to rank exceeds several thousand.  相似文献   

11.

在推荐系统中,学习有效的高阶特征交互是提升点击率预测的关键. 现有的研究将低阶特征进行组合来学习高阶交叉特征表示,导致模型的时间复杂度随着特征维度的增加呈指数型增长;而基于深度神经网络的高阶特征交叉模型也无法很好地拟合低阶特征交叉,影响预测的准确率. 针对这些问题,提出了基于多粒度特征交叉剪枝的点击率预测模型FeatNet. 该模型首先在显式的特征粒度上,通过特征剪枝生成有效的特征集合,保持了不同特征组合的多样性,也降低了高阶特征交叉的复杂度;基于剪枝后的特征集合,在特征元素粒度上进一步进行隐式高阶特征交叉,通过滤波器自动过滤无效的特征交叉. 在2个真实的数据集上进行了大量的实验,FeatNet都取得了最优的点击率预测效果.

  相似文献   

12.
将排序学习的方法应用于构件检索的研究中,首先,采用刻面描述的方法对构件进行全面的描述,并通过word2vec模型和权重设定的方法对刻面描述的构件进行特征提取;然后,对构件特征进行潜在语义分析和余弦相似度计算,得到构件训练数据集;最后,通过使用构件训练数据集和构件数据集对经过改进的Plackett-Luce概率排序模型用最大似然估计方法训练模型参数,从而得到一种构件排序模型.将构件排序模型应用到构件检索中开发实现了一个构件检索方法,通过实验验证了此方法的有效性,其查全率、查准率和效率都优于传统的构件检索方法.  相似文献   

13.
Using background knowledge to rank itemsets   总被引:1,自引:1,他引:0  
Assessing the quality of discovered results is an important open problem in data mining. Such assessment is particularly vital when mining itemsets, since commonly many of the discovered patterns can be easily explained by background knowledge. The simplest approach to screen uninteresting patterns is to compare the observed frequency against the independence model. Since the parameters for the independence model are the column margins, we can view such screening as a way of using the column margins as background knowledge. In this paper we study techniques for more flexible approaches for infusing background knowledge. Namely, we show that we can efficiently use additional knowledge such as row margins, lazarus counts, and bounds of ones. We demonstrate that these statistics describe forms of data that occur in practice and have been studied in data mining. To infuse the information efficiently we use a maximum entropy approach. In its general setting, solving a maximum entropy model is infeasible, but we demonstrate that for our setting it can be solved in polynomial time. Experiments show that more sophisticated models fit the data better and that using more information improves the frequency prediction of itemsets.  相似文献   

14.
The Eulerian Editing problem asks, given a graph G and an integer k, whether G can be modified into an Eulerian graph using at most k edge additions and edge deletions. We show that this problem is polynomial-time solvable for both undirected and directed graphs. We generalize these results for problems with degree parity constraints and degree balance constraints, respectively. We also consider the variants where vertex deletions are permitted. Combined with known results, this leads to full complexity classifications for both undirected and directed graphs and for every subset of the three graph operations.  相似文献   

15.
在实际的人脸识别中,给定的训练图像往往存在遮挡和噪声,导致稀疏表示分类(SRC)算法的性能下降。针对上述问题,提出一种基于结构化低秩表示(SLR)和低秩投影的人脸识别方法--SLR_LRP。首先通过SLR对原始训练样本进行低秩分解得到干净的训练样本,根据原始训练样本和恢复得到的干净训练样本得到一个低秩投影矩阵;然后将测试样本投影到该低秩投影矩阵;最后使用SRC对恢复后的测试样本进行分类。在AR人脸库和Extended Yale B人脸库上的实验结果表明,SLR_LRP可以有效处理样本中存在的遮挡和像素破坏。  相似文献   

16.
Learning to rank, a task to learn ranking functions to sort a set of entities using machine learning techniques, has recently attracted much interest in information retrieval and machine learning research. However, most of the existing work conducts a supervised learning fashion. In this paper, we propose a transductive method which extracts paired preference information from the unlabeled test data. Then we design a loss function to incorporate this preference data with the labeled training data, and learn ranking functions by optimizing the loss function via a derived Ranking SVM framework. The experimental results on the LETOR 2.0 benchmark data collections show that our transductive method can significantly outperform the state-of-the-art supervised baseline.  相似文献   

17.
排序是信息检索中一个重要的环节,当今已经提出百余种用于构建排序函数的特征,如何利用这些特征构建更有效的排序函数成为当今的一个热点问题,因此排序学习(Learning to Rank),一个信息检索与机器学习的交叉学科,越来越受到人们的重视。从排序特征的构建方式易知,特征之间并不是完全独立的,然而现有的排序学习方法的研究,很少在特征分析的基础上,从特征重组与选择的角度,来构建更有效的排序函数。针对这一问题,提出如下的模型框架:对构建排序函数的特征集合进行分析,然后重组与选择,利用排序学习方法学习排序函数。基于这一框架,提出四种特征处理的算法:基于主成分分析的特征重组方法、基于MAP、前向选择和排序学习算法隐含的特征选择。实验结果显示,经过特征处理后,利用排序学习算法构建的排序函数,一般优于原始的排序函数。  相似文献   

18.
As social media and e-commerce on the Internet continue to grow, opinions have become one of the most important sources of information for users to base their future decisions on. Unfortunately, the large quantities of opinions make it difficult for an individual to comprehend and evaluate them all in a reasonable amount of time. The users have to read a large number of opinions of different entities before making any decision. Recently a new retrieval task in information retrieval known as Opinion-Based Entity Ranking (OpER) has emerged. OpER directly ranks relevant entities based on how well opinions on them are matched with a user's preferences that are given in the form of queries. With such a capability, users do not need to read a large number of opinions available for the entities. Previous research on OpER does not take into account the importance and subjectivity of query keywords in individual opinions of an entity. Entity relevance scores are computed primarily on the basis of occurrences of query keywords match, by assuming all opinions of an entity as a single field of text. Intuitively, entities that have positive judgments and strong relevance with query keywords should be ranked higher than those entities that have poor relevance and negative judgments. This paper outlines several ranking features and develops an intuitive framework for OpER in which entities are ranked according to how well individual opinions of entities are matched with the user's query keywords. As a useful ranking model may be constructed from many ranking features, we apply learning to rank approach based on genetic programming (GP) to combine features in order to develop an effective retrieval model for OpER task. The proposed approach is evaluated on two collections and is found to be significantly more effective than the standard OpER approach.  相似文献   

19.
Both the quality and quantity of training data have significant impact on the accuracy of rank functions in web search. With the global search needs, a commercial search engine is required to expand its well tailored service to small countries as well. Due to heterogeneous intrinsic of query intents and search results on different domains (i.e., for different languages and regions), it is difficult for a generic ranking function to satisfy all type of queries. Instead, each domain should use a specific well tailored ranking function. In order to train each ranking function for each domain with a scalable strategy, it is critical to leverage existing training data to enhance the ranking functions of those domains without sufficient training data. In this paper, we present a boosting framework for learning to rank in the multi-task learning context to attack this problem. In particular, we propose to learn non-parametric common structures adaptively from multiple tasks in a stage-wise way. An algorithm is developed to iteratively discover super-features that are effective for all the tasks. The estimation of the regression function for each task is then learned as linear combination of those super-features. We evaluate the accuracy of multi-task learning methods for web search ranking using data from multiple domains from a commercial search engine. Our results demonstrate that multi-task learning methods bring significant relevance improvements over existing baseline method.  相似文献   

20.
Gait is a useful biometric because it can operate from a distance and without subject cooperation. However, it is affected by changes in covariate conditions (carrying, clothing, view angle, etc.). Existing methods suffer from lack of training samples, can only cope with changes in a subset of conditions with limited success, and implicitly assume subject cooperation. We propose a novel approach which casts gait recognition as a bipartite ranking problem and leverages training samples from different people and even from different datasets. By exploiting learning to rank, the problem of model over-fitting caused by under-sampled training data is effectively addressed. This makes our approach suitable under a genuine uncooperative setting and robust against changes in any covariate conditions. Extensive experiments demonstrate that our approach drastically outperforms existing methods, achieving up to 14-fold increase in recognition rate under the most difficult uncooperative settings.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号