Similar literature
Found 20 similar documents (search time: 15 ms)
1.
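In many practical data mining applications, ranking instances by their likelihood of class membership is becoming increasingly important alongside accurate classification. Many classification algorithms are designed with only classification accuracy in mind and do not measure the likelihood of class membership, so they cannot be used for ranking tasks. This paper proposes a new data mining method based on a genetic algorithm that measures the likelihood of class membership while generating classification rules. Experiments show that the algorithm is feasible.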

2.
Ranking is a core problem for information retrieval since the performance of the search system is directly impacted by the accuracy of ranking results. Ranking model construction has been a focus of both information retrieval and machine learning, and learning to rank in particular has attracted much interest. Many ranking models have been proposed; for example, RankSVM is a state-of-the-art method for learning to rank and has been empirically demonstrated to be effective. However, most of the proposed methods do not consider the significant differences between queries and resort to a single ranking function. In this paper, we present a novel ranking model named QoRank, which performs the learning task in a query-dependent manner. We also propose an LSE (least-squares estimation)-based weighted method to aggregate the ranking lists produced by base decision functions into the final ranking. QoRank is compared with other ranking techniques, and several evaluation criteria are employed to evaluate its performance. Experimental results on the LETOR OHSUMED data set show that QoRank strikes a good balance of accuracy and complexity, and outperforms the baseline methods. © 2010 Wiley Periodicals, Inc.

3.
Due to the economic significance of bankruptcy prediction of companies for financial institutions, investors and governments, many quantitative methods have been used to develop effective prediction models. Support vector machine (SVM), a powerful classification method, has been used for this task; however, the performance of SVM is sensitive to model form, parameter setting and feature selection. In this study, a new approach based on direct search and feature ranking is proposed to optimise feature selection and parameter setting for 1-norm and least-squares SVM models for bankruptcy prediction. This approach is also compared to SVM models whose parameters and features are optimised by the popular genetic algorithm technique. The experimental results on a data set with 2010 instances show that the proposed models are good alternatives for bankruptcy prediction.

4.
Boosted ranking models: a unifying framework for ranking predictions   Cited by: 2 (self-citations: 2, citations by others: 0)
Ranking is an important functionality in a diverse array of applications, including web search, similarity-based multimedia retrieval, nearest neighbor classification, and recommendation systems. In this paper, we propose a new method, called Boosted Ranking Model (BRM), for learning how to rank from training data. An important feature of the proposed method is that it is domain-independent and can thus be applied to a wide range of ranking domains. The main contribution of the new method is that it reduces the problem of learning how to rank to the much simpler and well-studied problem of constructing an optimized binary classifier from simple, weak classifiers. Using that reduction, our method constructs an optimized ranking model using multiple simple, easy-to-define ranking models as building blocks. The new method is a unifying framework that includes, as special cases, specific methods that we have proposed in earlier publications for specific ranking applications, such as nearest neighbor retrieval and classification. In this paper, we reformulate those earlier methods as special cases of the proposed BRM method, and we also illustrate a novel application of BRM to the problem of making movie recommendations to individual users.

5.
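朱辉生  陈琳  倪艺洋  汪卫  施伯乐 《软件学报》 (Journal of Software) 2020,31(7):2169-2183
Frequent episodes hidden in an event sequence characterize the behavioral patterns of users or systems. Existing frequent episode mining algorithms perform well under their respective support definitions, but when the support definition changes they can mine frequent episodes directly only with difficulty, if at all. To meet users' varying support-definition requirements, a frequent episode mining algorithm, FEM-DFS (frequent episode mining-depth first search), is proposed. The algorithm scans the event sequence in a single pass, discovers frequent episodes by depth-first search, stores them in a shared prefix/suffix tree, and prunes the search space using monotonicity, prefix monotonicity, or suffix monotonicity. Experimental evaluation confirms the effectiveness of the proposed algorithm.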

6.
Lin HT  Li L 《Neural computation》2012,24(5):1329-1367
We present a reduction framework from ordinal ranking to binary classification. The framework consists of three steps: extracting extended examples from the original examples, learning a binary classifier on the extended examples with any binary classification algorithm, and constructing a ranker from the binary classifier. Based on the framework, we show that a weighted 0/1 loss of the binary classifier upper-bounds the mislabeling cost of the ranker, both error-wise and regret-wise. Our framework allows not only the design of good ordinal ranking algorithms based on well-tuned binary classification approaches, but also the derivation of new generalization bounds for ordinal ranking from known bounds for binary classification. In addition, our framework unifies many existing ordinal ranking algorithms, such as perceptron ranking and support vector ordinal regression. When compared empirically on benchmark data sets, some of our newly designed algorithms enjoy advantages in terms of both training speed and generalization performance over existing algorithms. In addition, the newly designed algorithms lead to better cost-sensitive ordinal ranking performance, as well as improved listwise ranking performance.
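The three steps named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the example weights behind the weighted 0/1 loss are omitted, the encoding of an extended example as a pair `(x, k)` is an assumption, and `toy_clf` with its thresholds is a hypothetical stand-in for a trained binary classifier.

```python
def extend_examples(X, y, num_ranks):
    """Step 1: turn each (x, rank) into num_ranks - 1 binary examples.

    The extended example (x, k) is labeled 1 if rank > k, else 0.
    """
    extended = []
    for x, rank in zip(X, y):
        for k in range(1, num_ranks):
            extended.append(((x, k), 1 if rank > k else 0))
    return extended


def make_ranker(binary_clf, num_ranks):
    """Step 3: predicted rank = 1 + number of thresholds k answered 'rank > k'."""
    def ranker(x):
        return 1 + sum(binary_clf(x, k) for k in range(1, num_ranks))
    return ranker


def toy_clf(x, k):
    """Step 2 stand-in (assumption): a fixed threshold per k on a scalar input."""
    thresholds = {1: 2.0, 2: 5.0}
    return 1 if x > thresholds[k] else 0


ranker = make_ranker(toy_clf, num_ranks=3)
predictions = [ranker(x) for x in (1.0, 3.0, 7.0)]
```

Any binary learner trained on the output of `extend_examples` could replace `toy_clf`, which is the point of such a reduction.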

7.
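Ranking is a very important topic in information retrieval. Although learning-to-rank models have long been studied in depth, feature selection for learning-to-rank algorithms has received little attention. In practice, many feature selection methods designed for classification are applied directly to learning to rank. Because ranking and classification differ significantly, feature selection algorithms tailored to ranking should be developed. After introducing commonly used feature selection methods for learning to rank, this paper proposes a new feature selection method for learning to rank that is suited to QA problems: tournament ranking feature selection. Experimental results show that the new method brings significant improvements in both feature extraction efficiency and feature vector dimensionality reduction.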

8.
In dimensional affect recognition, the machine learning methods, which are used to model and predict affect, are mostly classification and regression. However, the annotation in the dimensional affect space usually takes the form of a continuous real value which has an ordinal property. The aforementioned methods do not focus on taking advantage of this important information. Therefore, we propose an affective rating ranking framework for affect recognition based on face images in the valence and arousal dimensional space. Our approach can appropriately use the ordinal information among affective ratings which are generated by discretizing continuous annotations. Specifically, we first train a series of basic cost-sensitive binary classifiers, each of which uses all samples relabeled according to the comparison results between corresponding ratings and a given rank of a binary classifier. We obtain the final affective ratings by aggregating the outputs of binary classifiers. By comparing the experimental results with the baseline and deep learning based classification and regression methods on the benchmarking database of the AVEC 2015 Challenge and the selected subset of SEMAINE database, we find that our ordinal ranking method is effective in both arousal and valence dimensions.

9.
Regression problems aim to estimate a continuous variable from a number of characteristics or predictors. Several proposals have been made for regression models based on the use of fuzzy rules; however, all these proposals make use of rule models in which the irrelevance of the input variables in relation to the variable to be approximated is not taken into account. Regression problems share with ordinal classification the existence of an explicit order relationship between the values of the variable to be predicted. In a recent paper, the authors have proposed an ordinal classification algorithm that takes into account the detection of the irrelevance of input variables. This algorithm extracts a set of fuzzy rules from an example set, using as the basic model a sequential covering strategy along with a genetic algorithm. In this paper, a proposal for a regression algorithm based on this ordinal classification algorithm is presented. The proposed model can be interpreted as a multiclassifier and multilevel system that learns at each stage using the knowledge gained in previous stages. Due to similarities between regression and ordinal problems as well as the use of a set of ordinal algorithms, an error interval can be returned with the regression output value. Experimental results show the good behavior of the proposal as well as the results of the error interval.

10.
Ranking alternatives involving inconsistent preferences is one of the most important topics in decision-making. Determining how to assist decision makers in understanding the decision context and adjusting inconsistencies in judgment are two important issues in ranking alternatives. This study proposes a visualization approach which will assist decision makers in ranking alternatives involving inconsistent preferences. Gower Plots are adopted to detect alternatives involving inconsistencies. An adjusting model is developed to provide suggestions for simultaneously improving ordinal and cardinal inconsistencies. A Decision Ball model is applied to visualize the decision context. By a graphical and interactive interface, decision makers can iteratively detect inconsistencies, choose the preferred way to adjust inconsistencies, observe relationships among alternatives, and then rank alternatives.

11.
Algorithms for feature selection in predictive data mining for classification problems attempt to select those features that are relevant, and are not redundant for the classification task. A relevant feature is defined as one which is highly correlated with the target function. One problem with the definition of feature relevance is that there is no universally accepted definition of what it means for a feature to be ‘highly correlated with the target function or highly correlated with the other features’. A new feature selection algorithm which incorporates domain specific definitions of high, medium and low correlations is proposed in this paper. The proposed algorithm conducts a heuristic search for the most relevant features for the prediction task.

12.
Ranking hypothesis sets is a powerful concept for efficient object detection. In this work, we propose a branch&rank scheme that detects objects with often less than 100 ranking operations. This efficiency enables the use of strong and also costly classifiers like non-linear SVMs with RBF-$\chi^2$ kernels. We thereby relieve an inherent limitation of branch&bound methods as bounds are often not tight enough to be effective in practice. Our approach features three key components: a ranking function that operates on sets of hypotheses and a grouping of these into different tasks. Detection efficiency results from adaptively sub-dividing the object search space into decreasingly smaller sets. This is inherited from branch&bound, while the ranking function supersedes a tight bound which is often unavailable (except for rather limited function classes). The grouping makes the system effective: it separates image classification from object recognition, yet combines them in a single formulation, phrased as a structured SVM problem. A novel aspect of branch&rank is that a better ranking function is expected to decrease the number of classifier calls during detection. We use the VOC’07 dataset to demonstrate the algorithmic properties of branch&rank.

13.
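To address the ambiguous ranking problem of traditional hashing algorithms in image nearest neighbor retrieval, ambiguous-sequence-aware hashing is proposed. It aims to learn hash functions that satisfy a leading-position discrimination rule and can distinguish ambiguous orderings directly from the information in the binary codes themselves. Nearest neighbor retrieval therefore needs no extra computation of bit weights or weighted Hamming distances, and the ordering among data points that share the same Hamming distance to the query can be resolved at low cost. An objective function resembling mean average precision, the evaluation metric for nearest neighbor retrieval, is established. It acts as an order-preserving constraint that guarantees pairs of data points have the same relative similarity in Hamming space and in Euclidean space, which ensures the proposed algorithm suits nearest neighbor retrieval. During training, the binary codes, the Hamming distance, and the decision function are relaxed to continuous forms, so the objective function can be optimized directly with batch gradient descent, reducing training complexity. Comparative experiments on three image datasets show that ambiguous-sequence-aware hashing achieves superior nearest neighbor retrieval performance.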
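The ambiguity this abstract targets can be seen in a few lines of Python. The sketch below only illustrates the problem (several database codes sharing one Hamming distance to the query), not the proposed hashing method; the 4-bit codes and item names are made up for illustration.

```python
from collections import defaultdict


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as integers."""
    return bin(a ^ b).count("1")


# Hypothetical 4-bit codes for a query and four database items.
query = 0b1010
database = {"a": 0b1011, "b": 0b0010, "c": 0b1111, "d": 0b1010}

distances = {name: hamming(query, code) for name, code in database.items()}

# Group items by distance; any group with more than one member is ambiguous,
# because Hamming distance alone cannot order its members.
groups = defaultdict(list)
for name, dist in distances.items():
    groups[dist].append(name)
ambiguous = {dist: names for dist, names in groups.items() if len(names) > 1}
```

Here items `a` and `b` both lie at distance 1 from the query, so a plain Hamming ranking cannot say which should come first.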

14.
Learning to rank is a supervised learning problem that aims to construct a ranking model for the given data. The most common application of learning to rank is to rank a set of documents against a query. In this work, we focus on point-wise learning to rank, where the model learns the ranking values. Multivariate adaptive regression splines (MARS) and conic multivariate adaptive regression splines (CMARS) are supervised learning techniques that have been proven to provide successful results on various prediction problems. In this article, we investigate the effectiveness of MARS and CMARS for the point-wise learning-to-rank problem. The prediction performance is analyzed in comparison to three well-known supervised learning methods, artificial neural network (ANN), support vector machine, and random forest, on two datasets under a variety of metrics including accuracy, stability, and robustness. The experimental results show that MARS and ANN are effective methods for the learning-to-rank problem and provide promising results.

15.

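Kernelized one-class hard-margin SVDD, one-/two-class L2-SVM, L2 support vector regression, and Ranking SVM have all been proven to be center-constrained minimum enclosing balls. Here, multi-view learning is introduced into kernelized L2-SVM, yielding a kernelized two-class multi-view L2-SVM (Multi-view L2-SVM). It is proven that this kernelized two-class Multi-view L2-SVM is also a center-constrained minimum enclosing ball, and a multi-view core vector machine, MvCVM, is then proposed. The proposed Multi-view L2-SVM and MvCVM consider both the differences and the correlations between views, so that the classifier's learning results tend to agree across views. Experiments on synthetic and real multi-view datasets demonstrate the effectiveness of Multi-view L2-SVM and MvCVM.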


16.
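Traditional ranking algorithms solve the ranking problem by converting it into a classification or regression problem, and the resulting models are not accurate enough. A new ranking algorithm is proposed that treats ranking as a structured learning process, that is, it learns a ranking structure from the training set. The algorithm first defines a query-level objective function; because this has too many constraints to optimize directly, a cutting-plane algorithm is used to solve it. The subproblem of "finding the most violated permutation" is transformed into a simple descending sort. Experiments on benchmark datasets show that the proposed algorithm is more effective than traditional ranking algorithms.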

17.
GARP, an acronym for Green-space Acquisition and Ranking Program, is a computer-assisted decision strategy (CADS) that can be the basis of an orderly and rational local government program of acquiring land for open space and recreation. Fifteen criteria are used to rank parcels. Numerical values are assigned according to how well the parcel conforms to each criterion and the importance of the criterion for “active” and for “passive” recreational use. Active use generally requires capital intensive development such as ball fields. Passive use is typically low intensity such as bridle and walking paths. The numerical values for each parcel are summed separately for active and for passive use; these scores reveal an ordinal ranking of the properties separately for active and for passive use.
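The scoring scheme described here can be sketched in Python as follows. This is an illustration under stated assumptions, not GARP itself: the abstract does not say how conformity and importance are combined, so the product used below is an assumption, and the three criteria, the weights, and the parcel data are hypothetical (GARP uses fifteen criteria).

```python
# Hypothetical importance weights per criterion for each type of use.
ACTIVE_WEIGHTS = {"flat_terrain": 3, "road_access": 2, "scenic_value": 1}
PASSIVE_WEIGHTS = {"flat_terrain": 1, "road_access": 1, "scenic_value": 3}

# Hypothetical conformity scores (how well each parcel meets each criterion).
parcels = {
    "P1": {"flat_terrain": 5, "road_access": 4, "scenic_value": 1},
    "P2": {"flat_terrain": 1, "road_access": 2, "scenic_value": 5},
}


def score(conformity, weights):
    """Sum of conformity * importance over all criteria (assumed combination)."""
    return sum(conformity[c] * w for c, w in weights.items())


def rank(parcels, weights):
    """Parcel names ordered from highest to lowest total score."""
    return sorted(parcels, key=lambda p: score(parcels[p], weights), reverse=True)


active_ranking = rank(parcels, ACTIVE_WEIGHTS)
passive_ranking = rank(parcels, PASSIVE_WEIGHTS)
```

With these made-up numbers the two rankings come out reversed, which shows why the active and passive scores are summed and ranked separately.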

18.
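张扬  何丕廉  向伟  李沐 《软件学报》 (Journal of Software) 2008,19(3):557-564
A spelling correction method based on a discriminative model is proposed. It re-ranks the output of an existing spelling correction system, Aspell, using the discriminative model Ranking SVM to improve its performance. Well-established spelling correction techniques (edit distance, letter-based n-grams, phonetic similarity, and the noisy channel model) are integrated into the model as features, which significantly improves the initial ranking quality of the baseline system Aspell; performance also exceeds that of the spelling correction modules of some commercial systems (such as Microsoft Word 2003). In addition, a method is proposed for automatically extracting spelling correction training pairs from search engine query log chains. Models trained with this method achieve performance close to that obtained with manually labeled data; the two reduce the baseline system's error rate by 32.2% and 32.6%, respectively.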
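Of the features listed in this abstract, edit distance is the simplest to show. The Python sketch below computes Levenshtein distance and orders a candidate list by that single feature. It is not the paper's Ranking SVM model, which combines several features; the misspelling and the candidate list are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical misspelling and candidate corrections.
misspelled = "speling"
candidates = ["spoiling", "spelling", "peeling"]

# Re-rank candidates by the edit-distance feature alone (smaller is better).
reranked = sorted(candidates, key=lambda w: edit_distance(misspelled, w))
```

Two of the candidates tie at distance 2, which hints at why a re-ranker would want additional features such as phonetic similarity rather than edit distance alone.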

19.
Graph representations of data are increasingly common. Such representations arise in a variety of applications, including computational biology, social network analysis, web applications, and many others. There has been much work in recent years on developing learning algorithms for such graph data; in particular, graph learning algorithms have been developed for both classification and regression on graphs. Here we consider graph learning problems in which the goal is not to predict labels of objects in a graph, but rather to rank the objects relative to one another; for example, one may want to rank genes in a biological network by relevance to a disease, or customers in a social network by their likelihood of being interested in a certain product. We develop algorithms for such problems of learning to rank on graphs. Our algorithms build on the graph regularization ideas developed in the context of other graph learning problems, and learn a ranking function in a reproducing kernel Hilbert space (RKHS) derived from the graph. This allows us to show attractive stability and generalization properties. Experiments on several graph ranking tasks in computational biology and in cheminformatics demonstrate the benefits of our framework.

20.
Many multi-class classification algorithms in statistics and machine learning typically combine several binary classifiers in order to construct an overall classifier. In the popular pairwise ensemble, one classifier is built for each pair of classes, resulting in pairwise bipartite rankings. In contrast, ordinal regression algorithms consider a single ranking function for several ordered classes. It is known in the literature that pairwise ensembles can be useful for ordinal regression. However, can single ranking models make a contribution to multi-class classification? The answer to this question should be affirmative, as supported by theoretical results presented in this article. We conduct a formal analysis of the consistency of pairwise bipartite rankings by uncovering the conditions under which they can be equivalently expressed in terms of a single ranking. Similar to the utility representability of pairwise preference relations, it turns out that transitivity plays a crucial role in the characterization of the ranking representability of pairwise bipartite rankings. To this end, we introduce the new concepts of strict ranking representability, a restrictive condition that can be verified easily, and AUC ranking representability, a practically more useful condition that is more difficult to verify. However, the link between pairwise bipartite rankings and dice games allows us to formulate necessary transitivity conditions for AUC ranking representability. A sufficient condition on the other hand is obtained by introducing a new type of transitivity that can be verified by solving an integer quadratic program.
