Similar literature
Found 20 similar documents; search took 15 ms.
1.
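Existing work on metric learning for k-nearest-neighbor (kNN) classification has focused mainly on purely discriminative models. Assuming that the underlying generative model is known, this paper proposes a method that learns a metric by analyzing the probability densities at a sample's k nearest neighbors. Experiments show that the local metric learned under this class-wise generative model assumption can effectively improve the performance of the discriminative kNN model.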

2.
This paper presents an algorithm that learns a distance metric from a data set by knowledge embedding and uses the new distance metric to solve nonlinear pattern recognition problems such as clustering.

3.
In many real-world applications, classification models must be in line with domain knowledge and respect monotone relations between predictor variables and the target class in order to be acceptable for implementation. This paper presents a novel heuristic approach, called RULEM, to induce monotone ordinal rule-based classification models. The proposed approach can be applied in combination with any rule- or tree-based classification technique, since monotonicity is guaranteed in a post-processing step. RULEM checks whether a rule set or decision tree violates the imposed monotonicity constraints, and existing violations are resolved by inducing a set of additional rules which enforce monotone classification. The approach is able to handle non-monotonic noise, and can be applied to both partially and totally monotone problems with an ordinal target variable. Two novel justifiability measures are introduced which are based on RULEM and make it possible to calculate the extent to which a classification model is in line with domain knowledge expressed in the form of monotonicity constraints. An extensive benchmarking experiment and subsequent statistical analysis of the results on 14 public data sets indicate that RULEM preserves the predictive power of a rule induction technique while guaranteeing monotone classification. On the other hand, the post-processed rule sets are found to be significantly larger, which is due to the induction of additional rules. For example, when combined with Ripper, the median difference in PCC was zero and the average difference was −0.66%, with on average 5 rules added to the rule sets. The average and minimum justifiability of the original rule sets are 92.66% and 34.44%, respectively, in terms of the RULEMF justifiability index, and 91.28% and 40.1% in terms of RULEMS, indicating a real need for monotonizing the rule sets.

4.
Multi-label classification aims to assign a set of proper labels to each instance, and distance metric learning can help improve the generalization ability of instance-based multi-label classification models. Existing multi-label metric learning techniques use pairwise constraints to enforce that examples with similar label assignments are close in the embedded feature space. In this paper, a novel distance metric learning approach for multi-label classification is proposed by modeling structural interactions between instance space and label space. On one hand, a compositional distance metric is employed, which is represented as a weighted sum of rank-1 PSD matrices based on component bases. On the other hand, the compositional weights are optimized by exploiting triplet similarity constraints derived from both instance and label spaces. Due to the compositional nature of the employed distance metric, the resulting problem admits a quadratic programming formulation with linear optimization complexity w.r.t. the number of training examples. We also derive the generalization bound for the proposed approach based on algorithmic robustness analysis of the compositional metric. Extensive experiments on sixteen benchmark data sets clearly validate the usefulness of the compositional metric in yielding an effective distance metric for multi-label classification.

5.
The RELIEF algorithm is a popular approach for feature weighting. Many extensions of the RELIEF algorithm have been developed, and I-RELIEF is one of the best known. In this paper, I-RELIEF is generalized for supervised distance metric learning to yield a Mahalanobis distance function. The proposed approach is justified by showing that the objective function of the generalized I-RELIEF is closely related to the expected leave-one-out nearest-neighbor classification rate. In addition, the relationships among the generalized I-RELIEF, neighbourhood components analysis, and graph embedding are also pointed out. Experimental results on various data sets demonstrate the superiority of the proposed approach.
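
For orientation, the classic RELIEF weighting that I-RELIEF builds on can be sketched as follows. This is a minimal illustration of the original nearest-hit / nearest-miss update, not the generalized I-RELIEF proposed in the paper; the function name `relief_weights` and the range-normalized per-feature difference are assumptions made for this sketch.

```python
def relief_weights(X, y):
    """Classic RELIEF feature weights (illustrative sketch, not I-RELIEF).

    X: list of numeric feature vectors, y: list of class labels.
    A feature gains weight when it differs on the nearest miss (other class)
    and loses weight when it differs on the nearest hit (same class).
    """
    n, d = len(X), len(X[0])
    # Per-feature value range, used to scale differences into [0, 1].
    spans = [(max(col) - min(col)) or 1.0 for col in zip(*X)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # this sample has no same-class or other-class neighbor
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            w[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return w


# Feature 0 separates the two classes; feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [1.0, 0.2], [0.9, 0.8]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y)
```

On this toy data the discriminative feature receives a positive weight and the noisy feature a negative one, which is the behavior the weighting is meant to capture.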

6.
Photo clustering is an effective way to organize albums, and it is useful in many applications, such as photo browsing and tagging. But automatic photo clustering is not an easy task due to the large variation of photo content. In this paper, we propose an interactive photo clustering paradigm that combines human and computer effort. In this paradigm, the photo clustering task is accomplished semi-automatically: users are allowed to manually adjust clustering results with different operations, such as splitting clusters, merging clusters and moving photos from one cluster to another. Behind users’ operations, a learning engine keeps updating the distance measurements between photos online, so that better clustering can be performed based on the distance measure. Experimental results on multiple photo albums demonstrate that our approach is able to improve automatic photo clustering results and that, by exploiting distance metric learning, our method is much more effective than purely manual adjustment of photo clustering.

7.
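A pseudo-K-nearest-neighbor text classification algorithm based on cosine distance metric learning   Total citations: 2, self-citations: 0, citations by others: 2
Distance metric learning is widely used in classification. When it is applied to text classification, however, the TF*IDF scheme of the commonly used vector space model (VSM) represents texts as normalized vectors of identical dimensionality, so the Euclidean distance that traditional distance metric learning uses as its similarity criterion often fails to achieve the expected results. Inspired by the LMNN algorithm in distance metric learning, a cosine distance metric learning algorithm suited to text classification, called CS-LMNN, is proposed. Because class skew among samples is common in text classification, a pseudo-K-nearest-neighbor classification algorithm is combined with CS-LMNN to perform text classification. The algorithm first uses CS-LMNN to learn a distance metric on the training data, and then classifies the test data with the pseudo-K-nearest-neighbor classifier according to the training result. Experimental results show that the algorithm can effectively improve classification accuracy.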

8.
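To address the choice of distance measure for the KNN algorithm with limited samples, and the fact that earlier distance learning studies did not fully consider the sample distribution, a new probability-based two-layer nearest-neighbor adaptive metric algorithm (PTLNN) is proposed. The algorithm has two layers: the lower layer uses Euclidean distance to determine a local subspace for an unlabeled sample, and the upper layer uses AdaBoost to extract information within that subspace. Following the principle of minimizing the mean absolute error, a probability-based adaptive distance metric is defined for nearest-neighbor classification. The algorithm combines the strengths of KNN and AdaBoost: by fully considering the sample distribution under limited samples it lowers the classification error rate, it is stable on noisy data, and it reduces AdaBoost's tendency to overfit. Comparative experiments with other algorithms show that PTLNN achieves better results.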

9.
The distance metric is a key issue in many machine learning algorithms. This paper considers a general problem of learning from pairwise constraints in the form of must-links and cannot-links. As one kind of side information, a must-link indicates that the two data points in the pair must be in the same class, while a cannot-link indicates that the two data points must be in two different classes. Given must-link and cannot-link information, our goal is to learn a Mahalanobis distance metric. Under this metric, we hope the distances of point pairs in must-links are as small as possible and those of point pairs in cannot-links are as large as possible. This task is formulated as a constrained optimization problem, in which the global optimum can be obtained effectively and efficiently. Finally, some applications in data clustering, interactive natural image segmentation and face pose estimation are given in this paper. Experimental results illustrate the effectiveness of our algorithm.
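
The goal stated above, small distances on must-links and large distances on cannot-links under a Mahalanobis metric, can be made concrete with a small sketch. The helper names `mahalanobis_sq` and `constraint_score` are hypothetical, and the score used here (a simple difference of sums) is only an illustrative stand-in; the abstract does not give the paper's actual constrained optimization objective.

```python
def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for a PSD matrix M."""
    diff = [a - b for a, b in zip(x, y)]
    Md = [sum(M[i][j] * diff[j] for j in range(len(diff)))
          for i in range(len(diff))]
    return sum(d * m for d, m in zip(diff, Md))


def constraint_score(points, must_links, cannot_links, M):
    """Illustrative score: total cannot-link distance minus total must-link
    distance (both squared). Higher means M fits the constraints better."""
    far = sum(mahalanobis_sq(points[i], points[j], M) for i, j in cannot_links)
    near = sum(mahalanobis_sq(points[i], points[j], M) for i, j in must_links)
    return far - near


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
must_links = [(0, 1)]      # points 0 and 1 share a class
cannot_links = [(0, 2)]    # points 0 and 2 are in different classes

identity = [[1.0, 0.0], [0.0, 1.0]]   # plain Euclidean metric
stretched = [[4.0, 0.0], [0.0, 0.25]]  # stretch axis 0, shrink axis 1

score_euclidean = constraint_score(points, must_links, cannot_links, identity)
score_stretched = constraint_score(points, must_links, cannot_links, stretched)
```

Here the stretched metric scores higher than the Euclidean one because it enlarges the axis along which the cannot-link pair differs and shrinks the axis along which the must-link pair differs.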

10.
11.
We address the problem of metric learning for multi-view data. Many metric learning algorithms have been proposed, but most of them focus on single-view circumstances, and only a few deal with multi-view data. In this paper, motivated by the co-training framework, we propose an algorithm-independent framework, named co-metric, to learn Mahalanobis metrics in multi-view settings. In its implementation, an off-the-shelf single-view metric learning algorithm is used to learn metrics in individual views of a few labeled examples. Then the most confidently labeled examples chosen from the unlabeled set are used to guide the metric learning in the next loop. This procedure is repeated until some stopping criteria are met. The framework can accommodate most existing metric learning algorithms, whether types-of-side-information or example-labels are used. In addition, it can naturally deal with semi-supervised circumstances under more than two views. Our comparative experiments demonstrate its competitiveness and effectiveness.

12.
The nearest neighbor (NN) classifier with dynamic time warping (DTW) is considered to be an effective method for time series classification. The performance of NN-DTW depends on the DTW constraints because the NN classifier is sensitive to the distance function used. For time series classification, the global path constraint of DTW is learned to optimize the alignment of time series by maximizing the nearest neighbor hypothesis margin. In addition, a reduction technique is combined with a search process to condense the prototypes. The approach is implemented and tested on UCR datasets. Experimental results show the effectiveness of the proposed method.
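
To show why the global path constraint matters, the following sketch computes DTW with a band constraint on the warping path. It assumes the constraint is a Sakoe-Chiba style band of half-width `window`, which the abstract does not specify, and it does not implement the paper's margin-based learning of that constraint; the function name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(a, b, window=None):
    """DTW distance between two numeric sequences.

    window: half-width of the band |i - j| <= window that the warping path
    must stay inside (None means unconstrained). The band is widened to at
    least |len(a) - len(b)| so that a valid path always exists.
    """
    n, m = len(a), len(b)
    w = max(n, m) if window is None else max(window, abs(n - m))
    # D[i][j] is the best cost of aligning a[:i] with b[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


# The same bump, shifted by one time step.
a = [0, 0, 1, 2, 1, 0]
b = [0, 1, 2, 1, 0, 0]
unconstrained = dtw_distance(a, b)
no_warping = dtw_distance(a, b, window=0)   # point-by-point comparison
narrow_band = dtw_distance(a, b, window=1)
```

With `window=0` the two series are compared point by point and the shift is penalized, while a band of width 1 is already enough to align them; a 1-NN classifier built on this distance therefore behaves differently depending on the chosen constraint.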

13.
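Abstract: As data volumes grow, feature selection has become a hot topic in machine learning and data mining. A feature selection algorithm based on nearest and farthest neighbors is proposed. A data point belongs to the same cluster as its nearest neighbor and to a different cluster from its farthest neighbor, so computing feature distances to the nearest and farthest neighbors yields an indicator of feature importance. On this basis, mutual information is used to remove redundancy among features. Gradient Boosting is also introduced to tune the model parameters, which improves classification accuracy. Classification experiments on UCI data sets show that the algorithm can find a good feature subset and improves classification accuracy to some extent.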

14.
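Zhang Liang, Luo Yimin, Ma Hongchao, Zhang Fan, Hu Chuan. Journal of Computer Applications (《计算机应用》), 2017, 37(6): 1768-1771
In hyperspectral remote sensing image classification, traditional active learning algorithms train only on labeled data and ignore the large amount of unlabeled data. To address this problem, an active learning algorithm that incorporates unlabeled information is proposed. First, a three-stage screening based on K-nearest-neighbor consistency, consistency between successive predictions, and the informativeness evaluation of the active learning algorithm selects unlabeled samples whose predicted labels are highly credible and which carry a certain amount of information. Then, their predicted labels are treated as true labels and added to the labeled sample set. Finally, a better classification model is trained. Experimental results show that, compared with passive learning and traditional active learning algorithms, the proposed algorithm achieves higher classification accuracy at the same labeling cost and has better parameter sensitivity.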

15.
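Xu Mingying, Wei Yongqing, Zhao Jing. Journal of Computer Applications (《计算机应用》), 2011, 31(9): 2530-2533
In the early stage of building a Bayesian classifier the training set is incomplete, so the resulting classifier performs poorly and cannot track user needs dynamically. To address this shortcoming, an incremental learning method for Bayesian classification that incorporates feedback information is proposed. To reduce redundancy among features effectively and improve the representativeness of the feedback feature subset, an improved feature selection method based on a genetic algorithm selects the optimal feature subset from the feedback set to revise the classifier. The algorithm's performance is analyzed experimentally, and the results show that it clearly improves classification and has good overall stability.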

16.
Separating text lines in unconstrained handwritten documents remains a challenge because the handwritten text lines are often non-uniformly skewed and curved, and the space between lines is not obvious. In this paper, we propose a novel text line segmentation algorithm based on minimal spanning tree (MST) clustering with distance metric learning. Given a distance metric, the connected components (CCs) of the document image are grouped into a tree structure, from which text lines are extracted by dynamically cutting the edges using a new hypervolume reduction criterion and a straightness measure. By learning the distance metric in a supervised manner on a dataset of pairs of CCs, the proposed algorithm is made robust enough to handle various documents with multi-skewed and curved text lines. In experiments on a database of 803 unconstrained handwritten Chinese document images containing a total of 8,169 lines, the proposed algorithm achieved a line detection correct rate of 98.02% and compared favorably with other competitive algorithms.

17.
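To obtain good results across different data types and tasks, a convolutional neural network (CNN) architecture based on an adaptive false nearest neighbors method is designed. The idea of central moments is applied to the CNN pooling operation, a sparse filtering algorithm makes the training process unsupervised, and the size of the convolution masks (kernels) and the number of convolution units (CNN neurons) per layer are set; in addition, the architecture uses the adaptive false nearest neighbors method to simplify tasks such as modeling and prediction. Experimental results confirm that the proposed improved CNN architecture has low complexity, trains faster, and is less prone to overfitting.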

18.
This paper studies the influence of superstars on spectators in cinema marketing. Casting superstars is a common risk-mitigation strategy in the cinema industry. Anecdotal evidence suggests that the presence of superstars is not always a guarantee of success, and hence a deeper study is required to analyze the potential audience of a movie. To this end, the knowledge, attitudes and emotions of spectators towards stars are analyzed as potential factors influencing the intention to see a movie with stars in its cast. This analysis is performed through machine learning techniques. In particular, the problem is stated as an ordinal classification/regression task rather than a traditional classification or regression task, since the intention to watch a movie is measured on a graded scale and its values therefore exhibit an order. Several methods are discussed for this purpose, but Support Vector Ordinal Regression shows its superiority over other ordinal classification/regression techniques. Moreover, exhaustive experiments confirm that formulating the problem as an ordinal classification/regression task is a success, since powerful traditional classifiers and regressors show worse performance. The study also confirms that talent and popularity, expressed by means of knowledge, attitude and emotions, satisfactorily explain superstar persuasion. Finally, the impact of these three components is also checked.

19.
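A new incremental learning algorithm for SVMs is proposed for extracting useful information from massive data. Based on the KKT conditions, the algorithm studies the distribution of support vectors, analyzes how the support vector set changes after new samples are added to the training set, and introduces the notion of an equipotent training set (等势训练集). It can effectively forget and discard training data, so that knowledge about the learning target accumulates. Theoretical analysis and an application to tourism information classification show that the algorithm effectively increases training speed while maintaining classification accuracy.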

20.
Relevant component analysis (RCA) is a recently proposed metric learning method for semi-supervised learning applications. It is a simple and efficient method that has been applied successfully to give impressive results. However, RCA can make use of supervisory information in the form of positive equivalence constraints only. In this paper, we propose an extension to RCA that allows both positive and negative equivalence constraints to be incorporated. Experimental results show that the extended RCA algorithm is effective.
