Similar Articles
 19 similar articles found.
1.
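Diversity among base classifiers is crucial for ensemble learning, and intuitively, resampling constraints has the potential to yield better diversity than resampling samples. Building on canonical correlation analysis (CCA), this paper introduces pairwise constraints as supervisory information for feature extraction, and the extracted features form new training data. The ensemble-learning idea lies mainly in how the pairwise constraints are selected: the constraints are randomly resampled to obtain diverse base classifiers. Experiments on the Multiple Features handwritten digit dataset and on face datasets (Yale and AR) examine how the algorithm behaves as the proportion of selected constraints varies. The results show that the method outperforms traditional ensemble learning methods.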
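A minimal sketch of the constraint-resampling step, assuming constraints are stored as `(i, j, is_must_link)` tuples and that each base learner receives a random subset drawn without replacement. `resample_constraints` is a hypothetical helper; the CCA-based feature extraction and the base classifiers themselves are omitted.

```python
import random
from typing import List, Tuple

# A pairwise constraint: (index_i, index_j, is_must_link)
Constraint = Tuple[int, int, bool]


def resample_constraints(constraints: List[Constraint], ratio: float,
                         n_learners: int, seed: int = 0) -> List[List[Constraint]]:
    """Draw one random constraint subset per base learner (hypothetical helper).

    Each subset holds round(ratio * len(constraints)) constraints sampled
    without replacement, so different base learners see different supervision.
    """
    if not constraints:
        raise ValueError("need at least one constraint")
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    rng = random.Random(seed)
    k = max(1, round(ratio * len(constraints)))
    return [rng.sample(constraints, k) for _ in range(n_learners)]


# Usage: each subset would drive one constraint-guided feature extractor + classifier.
cons = [(0, 1, True), (2, 3, False), (4, 5, True), (6, 7, False)]
for subset in resample_constraints(cons, ratio=0.5, n_learners=3, seed=42):
    print(subset)
```

Varying `ratio` corresponds to the "proportion of selected constraints" studied in the experiments.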

2.
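Building on canonical correlation analysis (CCA), this paper introduces class information through sparsity preservation and uses cross-correlation terms to overcome a limitation of CCA and its extensions, namely that samples from different views must appear in pairs. It proposes a supervised learning method, sparsity preserving canonical correlation analysis with missing samples (SPCCAM), which can fuse multi-view features even when the training samples are unpaired. Experimental results on synthetic datasets, handwritten digit datasets and the PIE face dataset show that SPCCAM effectively uses class information to improve classification performance.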
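For orientation, a sketch of plain paired CCA, the baseline that SPCCAM generalizes: whiten each view's covariance and take the SVD of the whitened cross-covariance. This is not SPCCAM itself (no sparsity term, no class information, no support for unpaired samples); `cca` and the ridge term `reg` are assumptions for illustration.

```python
import numpy as np


def _inv_sqrt(c: np.ndarray) -> np.ndarray:
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(c)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca(x: np.ndarray, y: np.ndarray, k: int, reg: float = 1e-6):
    """Plain paired CCA. Row r of x and row r of y must describe the same sample.

    Returns projection matrices wx (dx-by-k), wy (dy-by-k) and the top-k
    canonical correlations. `reg` is a small ridge term for numerical stability.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / n + reg * np.eye(x.shape[1])
    cyy = y.T @ y / n + reg * np.eye(y.shape[1])
    cxy = x.T @ y / n
    kx, ky = _inv_sqrt(cxx), _inv_sqrt(cyy)
    # Singular vectors of the whitened cross-covariance give the canonical directions;
    # the singular values are the canonical correlations.
    u, s, vt = np.linalg.svd(kx @ cxy @ ky)
    return kx @ u[:, :k], ky @ vt.T[:, :k], s[:k]
```

The paired-rows requirement in the docstring is exactly the restriction the abstract says SPCCAM removes.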

3.
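Discriminative Semi-Supervised Clustering Analysis Based on Pairwise Constraints   Cited 10 times (self-citations: 1; by others: 9)
Yin Xuesong, Hu Enliang, Chen Songcan. Journal of Software, 2008, 19(11): 2791-2802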
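Some typical existing semi-supervised clustering methods struggle to resolve violations of pairwise constraints and, at the same time, cannot handle high-dimensional data. This paper addresses both problems with a discriminative semi-supervised clustering analysis method based on pairwise constraints. The method uses the supervisory information to integrate dimensionality reduction and clustering: it clusters the data in the projected space with a pairwise-constraint-based K-means algorithm, then uses the clustering result to select the projection space. The algorithm also lowers the computational complexity of constraint-based semi-supervised clustering and resolves pairwise constraint violations during clustering. Experiments on a set of real-world datasets show that, compared with existing semi-supervised clustering algorithms, the new method not only handles high-dimensional data but also improves clustering performance.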
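The abstract does not spell out the violation handling. For contrast, a minimal sketch of the assignment step of COP-KMeans, a standard hard-constraint K-means variant (swapped in, not the authors' algorithm), shows where violations bite: a point with no feasible cluster makes the pass fail. `build_links`, `violates` and `cop_kmeans_assign` are hypothetical names.

```python
import math
from typing import Dict, Iterable, List, Optional, Sequence, Set, Tuple

Links = Dict[int, Set[int]]


def build_links(pairs: Iterable[Tuple[int, int]]) -> Links:
    """Symmetric adjacency sets for a list of constraint pairs."""
    links: Links = {}
    for a, b in pairs:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links


def violates(i: int, c: int, labels: List[Optional[int]],
             must: Links, cannot: Links) -> bool:
    """True if putting point i in cluster c breaks a constraint with an assigned point."""
    if any(labels[j] is not None and labels[j] != c for j in must.get(i, ())):
        return True
    return any(labels[j] == c for j in cannot.get(i, ()))


def cop_kmeans_assign(points: List[Sequence[float]], centers: List[Sequence[float]],
                      must: Links, cannot: Links) -> List[Optional[int]]:
    """One COP-KMeans assignment pass: nearest feasible center, in point order."""
    labels: List[Optional[int]] = [None] * len(points)
    for i, p in enumerate(points):
        by_distance = sorted(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        for c in by_distance:
            if not violates(i, c, labels, must, cannot):
                labels[i] = c
                break
        else:
            # Hard constraints leave no valid cluster: the failure mode soft methods avoid.
            raise ValueError(f"no feasible cluster for point {i}")
    return labels
```

A full COP-KMeans loop would alternate this pass with recomputing centers until labels stop changing.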

4.
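A Semi-Supervised Document Clustering Algorithm Combined with Active Learning   Cited 1 time (self-citations: 0; by others: 1)
Semi-supervised document clustering, which uses a small amount of data carrying supervisory information to assist unsupervised document clustering, has become a hot topic in machine learning and data mining in recent years. Because collecting large amounts of supervisory information is time-consuming and labor-intensive, researchers have studied how to obtain a small amount of supervisory information that significantly improves clustering performance. This paper proposes a semi-supervised document clustering algorithm combined with active learning: pairwise constraints are introduced to guide the clustering process of DBSCAN, yielding a semi-supervised document clustering algorithm called Cons-DBSCAN. Based on a measure of the information contained in a constraint set and an analysis of DBSCAN itself, a heuristic active learning algorithm is proposed that selects highly informative pairwise constraint sets and thus assists semi-supervised document clustering more efficiently. Experimental results show that the proposed algorithm clusters documents efficiently, that the constraint sets obtained through active learning significantly improve clustering performance, and that the algorithm outperforms two representative semi-supervised clustering algorithms combined with active learning.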
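The abstract does not define its information measure. As a hedged illustration of active constraint selection in general, the sketch uses farthest-first traversal, a common heuristic that picks well-separated points whose pairs are then shown to a user as must-link/cannot-link questions. It is swapped in, not Cons-DBSCAN's heuristic; `farthest_first` and `query_pairs` are hypothetical names.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def farthest_first(points: List[Sequence[float]], k: int, start: int = 0) -> List[int]:
    """Pick k point indices, each as far as possible from those already picked.

    Assumes distinct points; duplicates could make an index be picked twice.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    # dmin[i] = distance from point i to its nearest already-chosen point
    dmin = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: dmin[i])
        chosen.append(nxt)
        dmin = [min(d, math.dist(p, points[nxt])) for d, p in zip(dmin, points)]
    return chosen


def query_pairs(chosen: List[int]) -> List[Tuple[int, int]]:
    """Candidate pairs to present to an oracle as must-link / cannot-link questions."""
    return list(combinations(chosen, 2))
```

Well-separated points tend to fall in different dense regions, so their pairwise answers are more likely to be informative than queries between near neighbors.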

5.
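Combining supervisory information given as pairwise constraints with unsupervised information, this paper proposes a dimensionality reduction algorithm based on pairwise constraints and sparsity preservation. Discriminant analysis is performed with the pairwise constraints, and sparse representation is used to preserve the global sparse structure of the dataset in the transformed space. Experimental results show that, compared with traditional feature extraction algorithms, the algorithm achieves better recognition performance, has fewer parameters to tune, and is more robust.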

6.
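To address the lack of supervisory information in network traffic feature selection, this paper proposes a semi-supervised network traffic feature selection algorithm based on pairwise constraint expansion. The algorithm considers both a small number of pairwise constraints and a large number of unlabeled samples. Using the correlation and autocorrelation between sample sets, it extends the pairwise constraint set to unlabeled samples, producing more reliable pairwise constraints that reveal how samples are distributed in the sample space. Finally, feature selection is performed with the expanded constraint set. Experiments show that, with only a few initial pairwise constraints, the algorithm achieves better classification performance than algorithms without constraint expansion.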

7.
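Semi-Supervised Local Dimensionality Reduction   Cited 1 time (self-citations: 1; by others: 0)
When mining and analyzing high-dimensional data, sometimes only limited pairwise constraint information (must-link and cannot-link constraints) is available. Because class labels are lacking, supervised dimensionality reduction methods often give unsatisfactory results, and using a large number of unlabeled samples can improve performance. Using pairwise constraints together with abundant unlabeled samples, this paper proposes semi-supervised local dimensionality reduction (SLDR). SLDR integrates the local information of the data with the pairwise constraints to find an optimal projection: when the data are projected into a low-dimensional space, pairs of points in cannot-link constraints move farther apart and pairs in must-link constraints move closer together, while the intrinsic geometric information of the data is preserved. SLDR can also be extended to a nonlinear method for dimensionality reduction of nonlinear data. Experimental results on various datasets verify the effectiveness of the proposed algorithm.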
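A simplified sketch of the constraint part of such a projection, assuming the common formulation that maximizes the average squared cannot-link distance minus `beta` times the average squared must-link distance over orthonormal W. Under that assumption the optimum is the top-k eigenvectors of one symmetric matrix. SLDR's local-structure term and its nonlinear extension are omitted; `constraint_projection` and `beta` are hypothetical.

```python
from typing import List, Tuple

import numpy as np

Pairs = List[Tuple[int, int]]


def pair_scatter(x: np.ndarray, pairs: Pairs) -> np.ndarray:
    """Sum of (x_i - x_j)(x_i - x_j)^T over the given index pairs."""
    s = np.zeros((x.shape[1], x.shape[1]))
    for i, j in pairs:
        diff = x[i] - x[j]
        s += np.outer(diff, diff)
    return s


def constraint_projection(x: np.ndarray, must: Pairs, cannot: Pairs,
                          k: int, beta: float = 1.0) -> np.ndarray:
    """Return a d-by-k orthonormal W that spreads cannot-links and shrinks must-links."""
    s = (pair_scatter(x, cannot) / max(len(cannot), 1)
         - beta * pair_scatter(x, must) / max(len(must), 1))
    vals, vecs = np.linalg.eigh(s)           # s is symmetric
    top = np.argsort(vals)[::-1][:k]         # largest eigenvalues first
    return vecs[:, top]
```

Projected data is then `x @ W`; a local-geometry term would enter as an extra matrix added to `s` before the eigen-decomposition.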

8.
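Qi Mingming, Xiang Yang. Computer Science, 2012, 39(11): 212-215
This paper proposes Pairwise Constraint Projections inosculating Sparsity Preserving (SPPCP). During dimensionality reduction guided by pairwise constraints, the algorithm incorporates Sparsity Preserving Projections (SPP) through a balance parameter, so that it preserves the pairwise constraint structure while inheriting the geometric-structure preservation and neighborhood preservation properties of sparsity preservation. Experiments on UCI datasets and the AR face database show that the algorithm effectively incorporates the advantages of SPP and, compared with typical pairwise-constraint-based semi-supervised dimensionality reduction algorithms, improves the accuracy and stability of minimum-Euclidean-distance classification.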

9.
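Semi-supervised clustering uses supervisory information about samples to improve the performance of unsupervised learning. In semi-supervised clustering, pairwise constraints (must-link and cannot-link constraints) are widely used as prior knowledge about the samples. Agglomerative hierarchical clustering (AHC), also called bottom-up clustering, is one kind of hierarchical clustering. This paper proposes a pairwise-constraint-based semi-supervised agglomerative hierarchical clustering algorithm (PS-AHC), which uses pairwise constraints to adjust the distances between clusters so that they better reflect the true relationships. Experiments on UCI datasets show that PS-AHC effectively improves clustering accuracy and is a promising semi-supervised clustering algorithm.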
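The abstract does not give PS-AHC's exact distance rule. A minimal sketch of one common constraint-modified scheme, assumed here for illustration: must-linked points get distance 0, merges that would join a cannot-linked pair are skipped, and average linkage is used otherwise. `constrained_ahc` is a hypothetical name.

```python
import math
from itertools import combinations
from typing import List, Sequence, Set, Tuple

Pairs = Sequence[Tuple[int, int]]


def constrained_ahc(points: List[Sequence[float]], k: int,
                    must: Pairs = (), cannot: Pairs = ()) -> List[List[int]]:
    """Average-linkage agglomeration with constraint-modified distances.

    Must-linked points get distance 0 (so they tend to merge early); merges that
    would put a cannot-linked pair in one cluster are skipped.
    """
    d = [[math.dist(p, q) for q in points] for p in points]
    for i, j in must:
        d[i][j] = d[j][i] = 0.0
    forbidden = {frozenset(pair) for pair in cannot}
    clusters: List[Set[int]] = [{i} for i in range(len(points))]

    def linkage(a: Set[int], b: Set[int]) -> float:
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    def allowed(a: Set[int], b: Set[int]) -> bool:
        return not any(frozenset((i, j)) in forbidden for i in a for j in b)

    while len(clusters) > k:
        candidates = [(linkage(clusters[x], clusters[y]), x, y)
                      for x, y in combinations(range(len(clusters)), 2)
                      if allowed(clusters[x], clusters[y])]
        if not candidates:
            break  # every remaining merge would violate a cannot-link
        _, x, y = min(candidates)
        clusters[x] |= clusters[y]
        del clusters[y]  # safe: y > x, so index x is unaffected
    return sorted(sorted(c) for c in clusters)
```

This quadratic-per-merge loop is for readability only; it is far slower than library implementations of hierarchical clustering.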

10.
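Existing semi-supervised dimensionality reduction methods treat all side information equally and therefore cannot fully exploit the information it carries. This paper proposes weighted pairwise constraint semi-supervised local dimensionality reduction (WSLDR). The side information is expanded by constructing a neighborhood graph, which increases the number of constraints. In addition, a weight matrix is built according to how much information each constraint carries. The side information is incorporated into the neighborhood graph to correct it, and an optimal projection is sought from the corrected neighborhood graph and the weighted pairwise constraints. The algorithm not only preserves the intrinsic local geometric structure of the data but also makes within-class data more compact and between-class data more dispersed. Experimental results on UCI datasets verify the effectiveness of the algorithm.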

11.

Clustering algorithms help identify homogeneous subgroups from data. In some cases, additional information about the relationship among some subsets of the data exists. When using a semi-supervised clustering algorithm, an expert may provide additional information to constrain the solution based on that knowledge and, in doing so, guide the algorithm to a more useful and meaningful solution. Such additional information often takes the form of a cannot-link constraint (i.e., two data points cannot be part of the same cluster) or a must-link constraint (i.e., two data points must be part of the same cluster). A key challenge for users of such constraints in semi-supervised learning algorithms, however, is that adding inaccurate or conflicting constraints can decrease accuracy, and little is known about how to detect whether expert-imposed constraints are likely incorrect. In the present work, we introduce a method that scores how likely each must-link and cannot-link pairwise constraint is to be incorrect. Using synthetic experimental examples and real data, we show that the resulting impact score can successfully identify individual constraints that should be removed or revised.
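The impact score itself is not reproduced here. A much simpler diagnostic, offered only as a hedged starting point, lists which constraints a given clustering (for example, an unconstrained run) violates; constraints that stable clusterings keep violating may deserve a second look. `violated_constraints` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

Pairs = Sequence[Tuple[int, int]]


def violated_constraints(labels: Sequence[int], must: Pairs,
                         cannot: Pairs) -> List[Tuple[str, int, int]]:
    """Return (kind, i, j) for every constraint the labeling breaks."""
    bad = [("must", i, j) for i, j in must if labels[i] != labels[j]]
    bad += [("cannot", i, j) for i, j in cannot if labels[i] == labels[j]]
    return bad


labels = [0, 0, 1, 1]
report = violated_constraints(labels, must=[(0, 1), (1, 2)], cannot=[(2, 3), (0, 3)])
print(report)  # [('must', 1, 2), ('cannot', 2, 3)]
```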


12.
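Zhou Chenxi, Liang Xun, Qi Jinshan. Acta Automatica Sinica, 2015, 41(7): 1253-1263
This paper proposes a semi-supervised hierarchical clustering algorithm based on dynamic constraint updating. Like existing semi-supervised hierarchical clustering algorithms, it uses must-link and cannot-link constraints. Unlike them, however, it does not first pre-partition the data points that satisfy must-link constraints and then merge according to cannot-link constraints. Instead, it first expands the constraints into a closure, then merges directly according to the cannot-link constraints, and during merging dynamically updates the must-link and cannot-link constraints based on the clustering result, which guarantees that the final clustering satisfies both kinds of constraints. Skipping the pre-partitioning of must-linked points gives the data points a more reasonable merging order and therefore a more accurate clustering. The paper gives a concrete implementation based on Ward's hierarchical clustering algorithm, called C-Ward. Experiments show that, compared with similar algorithms, the proposed algorithm is more accurate and more stable on both synthetic and real-world datasets.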
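A minimal sketch of one standard way to expand constraints into a closure: must-links are closed transitively with union-find, and each cannot-link is propagated to every pair across the two must-link components it connects. C-Ward's dynamic updating and Ward merging are not shown; `constraint_closure` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]


def constraint_closure(n: int, must: List[Pair],
                       cannot: List[Pair]) -> Tuple[Set[Pair], Set[Pair]]:
    """Transitive closure of must-links and the cannot-links they imply."""
    parent = list(range(n))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, j in must:
        parent[find(i)] = find(j)

    # Group points by their must-link component root.
    groups: Dict[int, Set[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)

    must_closed = {(i, j) for g in groups.values() for i in g for j in g if i < j}
    cannot_closed: Set[Pair] = set()
    for i, j in cannot:
        gi, gj = groups[find(i)], groups[find(j)]
        if gi is gj:
            raise ValueError(f"inconsistent constraints: {i} and {j} are must-linked")
        # A cannot-link between two components forbids every cross pair.
        cannot_closed |= {(min(a, b), max(a, b)) for a in gi for b in gj}
    return must_closed, cannot_closed
```

For example, must-links (0, 1) and (1, 2) plus a cannot-link (2, 3) imply that 0, 1 and 2 must all stay apart from 3.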

13.
Recent feature selection scores using pairwise constraints (must-link and cannot-link) have outperformed unsupervised methods and performed comparably to supervised ones. However, these scores use only the pairwise constraints and ignore the information available in the unlabeled data. Moreover, these constraint scores depend strongly on the must-link and cannot-link subsets built by the user. In this paper, we address these problems and propose a new semi-supervised constraint score that uses both pairwise constraints and local properties of the unlabeled data. Experiments using Kendall’s coefficient and accuracy rates show that this new score is less sensitive to the given constraints than the previous scores while providing similar performance.
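For reference, a sketch of a purely constraint-based score of the kind the abstract criticizes, in an assumed form: for each feature, the sum of squared must-link differences divided by the sum of squared cannot-link differences, smaller being better. It deliberately ignores unlabeled data, which is what the proposed semi-supervised score adds; `constraint_score` is hypothetical.

```python
from typing import List, Sequence, Tuple

Pairs = Sequence[Tuple[int, int]]


def constraint_score(x: List[Sequence[float]], must: Pairs, cannot: Pairs) -> List[float]:
    """Per-feature ratio of must-link spread to cannot-link spread (lower is better)."""
    scores = []
    for r in range(len(x[0])):
        num = sum((x[i][r] - x[j][r]) ** 2 for i, j in must)
        den = sum((x[i][r] - x[j][r]) ** 2 for i, j in cannot)
        # A feature that never separates cannot-linked pairs is useless: rank it last.
        scores.append(num / den if den > 0 else float("inf"))
    return scores


x = [[0, 0], [0, 5], [10, 0], [10, 5]]
scores = constraint_score(x, must=[(0, 1), (2, 3)], cannot=[(0, 2), (1, 3)])
ranking = sorted(range(len(scores)), key=scores.__getitem__)  # best feature first
```

Because only the listed pairs enter the sums, changing which pairs the user supplies can reorder the ranking, the sensitivity the abstract sets out to reduce.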

14.
Distance metric is a key issue in many machine learning algorithms. This paper considers a general problem of learning from pairwise constraints in the form of must-links and cannot-links. As one kind of side information, a must-link indicates that the two data points must be in the same class, while a cannot-link indicates that the two data points must be in two different classes. Given must-link and cannot-link information, our goal is to learn a Mahalanobis distance metric. Under this metric, we want the distances between point pairs in must-links to be as small as possible and those between point pairs in cannot-links to be as large as possible. This task is formulated as a constrained optimization problem whose global optimum can be obtained effectively and efficiently. Finally, applications in data clustering, interactive natural image segmentation and face pose estimation are presented. Experimental results illustrate the effectiveness of our algorithm.
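The paper's constrained optimization is not reproduced here. As a swapped-in, closed-form illustration of learning a Mahalanobis metric from must-links, the sketch follows Relevant Component Analysis (RCA): group must-linked points into chunklets, estimate the within-chunklet covariance C, and use M = C^{-1}. Unlike the paper, RCA ignores cannot-links; `rca_metric`, `mahalanobis` and the ridge term `reg` are assumptions.

```python
from typing import List

import numpy as np


def rca_metric(x: np.ndarray, chunklets: List[List[int]], reg: float = 1e-6) -> np.ndarray:
    """Mahalanobis matrix M = C^{-1}, with C the covariance around chunklet means."""
    # Center each chunklet (a set of must-linked points) on its own mean.
    centered = [x[idx] - x[idx].mean(axis=0) for idx in chunklets]
    z = np.vstack(centered)
    c = z.T @ z / z.shape[0] + reg * np.eye(x.shape[1])
    return np.linalg.inv(c)


def mahalanobis(a: np.ndarray, b: np.ndarray, m: np.ndarray) -> float:
    """sqrt((a - b)^T M (a - b))."""
    diff = a - b
    return float(np.sqrt(diff @ m @ diff))
```

Directions along which must-linked points vary a lot get small weight in M, so distances along them shrink, which matches the "small must-link distances" goal in the abstract.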

15.
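Qi Mingming. Journal of Computer Applications, 2012, 32(12): 3315-3318
To address the lack of supervisory information in the sparse reconstruction step of sparsity preserving projections (SPP), this paper proposes a pairwise-constraint-guided SPP algorithm. During sparse reconstruction of the training samples, positive and negative constraints are introduced as supervisory information to guide the reconstruction, so that SPP effectively incorporates the constraint information. Experimental results on the UMIST, Yale and AR face datasets show that, compared with unsupervised SPP, the method improves the recognition accuracy of nearest-neighbor classification by 5% to 15%, effectively improving classification performance after dimensionality reduction.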

16.
Dimensionality reduction plays an important role in many machine learning tasks. This paper studies semi-supervised dimensionality reduction using pairwise constraints. In this setting, domain knowledge is given in the form of pairwise constraints, each of which specifies whether a pair of instances belongs to the same class (must-link constraint) or to different classes (cannot-link constraint). In this paper, a novel semi-supervised dimensionality reduction method called LGS3DR is proposed, which integrates both the local and global topological structures of the data as well as pairwise constraints. LGS3DR is effective and has a closed-form solution. Experiments on data visualization and face recognition show that LGS3DR is superior to many existing dimensionality reduction methods.

17.
Constrained clustering methods (which usually use must-link and/or cannot-link constraints) have received much attention in the last decade. Recently, kernel adaptation or kernel learning has been considered a powerful approach to constrained clustering. However, these methods usually either allow only special forms of kernels or learn non-parametric kernel matrices and scale very poorly. Therefore, they either learn a metric with low flexibility or are applicable only to small data sets because of their high computational complexity. In this paper, we propose a more efficient non-linear metric learning method that learns a low-rank kernel matrix from must-link and cannot-link constraints and the topological structure of the data. We formulate the proposed method as a trace ratio optimization problem and learn appropriate distance metrics by finding optimal low-rank kernel matrices. We solve the proposed optimization problem much more efficiently than SDP solvers. Additionally, we show that spectral clustering methods can be considered a special form of low-rank kernel learning methods. Extensive experiments demonstrate the superiority of the proposed method over recently introduced kernel learning methods.
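The abstract casts metric learning as a trace ratio problem. A generic sketch of the standard fixed-point solver for max_W tr(WᵀAW)/tr(WᵀBW) over W with orthonormal columns follows, assuming A symmetric and B symmetric positive definite. Building A and B from kernels, constraints and data topology (the paper's actual contribution) is omitted, and `trace_ratio` is hypothetical.

```python
from typing import Tuple

import numpy as np


def trace_ratio(a: np.ndarray, b: np.ndarray, k: int,
                iters: int = 100, tol: float = 1e-10) -> Tuple[np.ndarray, float]:
    """Maximize tr(W^T A W) / tr(W^T B W) over d-by-k W with orthonormal columns.

    Fixed-point iteration: take W as the top-k eigenvectors of A - lam * B,
    then update lam to the ratio that W achieves.
    """
    def ratio(w: np.ndarray) -> float:
        return float(np.trace(w.T @ a @ w) / np.trace(w.T @ b @ w))

    w = np.eye(a.shape[0])[:, :k]       # arbitrary orthonormal start
    lam = ratio(w)
    for _ in range(iters):
        vals, vecs = np.linalg.eigh(a - lam * b)
        w = vecs[:, np.argsort(vals)[::-1][:k]]
        new_lam = ratio(w)
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return w, lam
```

Each iteration costs one symmetric eigen-decomposition, which is one reason trace ratio formulations can be much cheaper than handing the problem to a general SDP solver.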

18.
This paper presents a new semi-supervised fuzzy c-means clustering method for data with clusterwise tolerance by opposite criteria. In semi-supervised clustering, pairwise constraints, that is, must-link and cannot-link, are frequently used to improve clustering performance. From the viewpoint of handling pairwise constraints, a new semi-supervised fuzzy c-means clustering method is proposed by introducing clusterwise tolerance-based pairwise constraints. First, the concept of clusterwise tolerance-based pairwise constraints is introduced. Second, the optimization problems of the proposed method are formulated; in particular, must-link and cannot-link are handled by opposite criteria. Third, a new clustering algorithm is constructed based on the above discussion. Finally, the effectiveness of the proposed algorithm is verified through numerical examples.
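For orientation, a sketch of the standard (unsupervised) fuzzy c-means membership update that semi-supervised variants such as this one modify: u_i = 1 / Σ_j (d_i / d_j)^{2/(m-1)}, where d_i is the distance from the point to center i and m > 1 is the fuzzifier. The clusterwise tolerance and the opposite-criteria handling of must-link and cannot-link are not shown; `fcm_memberships` is hypothetical.

```python
import math
from typing import List, Sequence


def fcm_memberships(point: Sequence[float], centers: List[Sequence[float]],
                    m: float = 2.0) -> List[float]:
    """Standard FCM memberships of one point; they sum to 1 across clusters."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be > 1")
    d = [math.dist(point, c) for c in centers]
    if any(di == 0.0 for di in d):
        # Point sits on one or more centers: split membership among those centers.
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        return [h / sum(hits) for h in hits]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** p for dj in d) for di in d]
```

With m = 2, a point at distances 1 and 2 from two centers gets memberships 0.8 and 0.2; larger m pushes memberships toward uniform.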

19.
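With the rapid development of e-commerce, the number of product reviews on e-commerce websites is growing quickly. Mining information that is valuable to both consumers and manufacturers from the large volume of product reviews on the Web has become an important research area. In product reviews, users often describe the same product feature with different words, and these product feature synonyms must be identified before opinions can be summarized well. After analyzing product reviews, this paper extracts two types of constraints, must-link and cannot-link, and uses a constrained hierarchical clustering algorithm to identify product feature synonyms. It also compares the results of several methods for computing product feature similarity. Experimental results show that the method performs well on a real-world product review dataset.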

