Similar literature
1.
Subspace semi-supervised Fisher discriminant analysis   Total citations: 1, self-citations: 2, citations by others: 1
杨武夷  梁伟  辛乐  张树武 《自动化学报》2009,35(12):1513-1519
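Fisher discriminant analysis, a widely used supervised dimensionality reduction method, seeks a subspace that maximizes the ratio of the between-class scatter to the within-class scatter of the sample data. Labeling the class of sample data usually requires extensive manual effort, takes a long time, and is costly. To find a dimensionality-reducing subspace using both labeled and unlabeled samples, we propose a subspace semi-supervised Fisher discriminant analysis method. The method seeks a subspace that preserves both the discriminant structure learned from the labeled samples and the sample structure learned from the labeled and unlabeled samples together. We also derive a kernel-based version of the method. Face recognition experiments verify the effectiveness of the proposed algorithm.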
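
The scatter-ratio criterion described above can be illustrated with a minimal sketch of the classical (fully supervised) two-class Fisher direction in two dimensions. This is not the semi-supervised method of the paper; the function name `fisher_direction` and the toy data layout are assumptions made for illustration only.

```python
def fisher_direction(class_a, class_b):
    """Return the 2-D Fisher direction w = Sw^{-1} (mean_a - mean_b)."""
    def mean(points):
        n = len(points)
        return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

    def scatter(points, m):
        # 2x2 scatter matrix: sum of outer products of the centered points
        s = [[0.0, 0.0], [0.0, 0.0]]
        for p in points:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        return s

    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    # Within-class scatter is the sum of the per-class scatter matrices
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter is singular")
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    # Closed-form inverse of a 2x2 matrix applied to the mean difference
    w0 = (sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det
    w1 = (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det
    return [w0, w1]
```

For two classes that differ only along the first axis, the returned direction has a zero second component, which matches the intuition that the projection should ignore the axis carrying no class information.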

2.
Dealing with high-dimensional data has always been a major problem in many pattern recognition and machine learning applications. The trace ratio criterion applies to many dimensionality reduction methods because it directly reflects the Euclidean distance between data points within or between classes. In this paper, we analyze the trace ratio problem and propose a new efficient algorithm to find the optimal solution. Based on the proposed algorithm, we derive an orthogonally constrained semi-supervised learning framework. The new algorithm incorporates unlabeled data into the training procedure so that it preserves both the discriminative structure and the geometrical structure embedded in the original dataset. Under this framework, many existing semi-supervised dimensionality reduction methods, such as SDA, Lap-LDA, SSDR, and SSMMC, can be improved; the framework can also be used to formulate a corresponding kernel framework for handling nonlinear problems. Theoretical analysis indicates that there are certain relationships between the linear and nonlinear methods. Finally, extensive simulations on synthetic and real-world datasets are presented to show the effectiveness of our algorithms. The results demonstrate that the proposed algorithm substantially outperforms other state-of-the-art algorithms.
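
For reference, the trace ratio problem is commonly written as below. The symbols $S_b$ (between-class scatter), $S_w$ (within-class scatter), and $W$ (projection matrix) are assumptions, since the abstract does not fix a notation. The second and third lines show one well-known iterative scheme for this problem; it is not necessarily the algorithm proposed in the paper.

```latex
W^{*} = \arg\max_{W^{\top} W = I}
        \frac{\operatorname{tr}\!\left(W^{\top} S_b W\right)}
             {\operatorname{tr}\!\left(W^{\top} S_w W\right)}
\\[6pt]
\lambda_t = \frac{\operatorname{tr}\!\left(W_t^{\top} S_b W_t\right)}
                 {\operatorname{tr}\!\left(W_t^{\top} S_w W_t\right)}
\\[6pt]
W_{t+1} = \arg\max_{W^{\top} W = I}
          \operatorname{tr}\!\left(W^{\top} \left(S_b - \lambda_t S_w\right) W\right)
```

In this scheme, $W_{t+1}$ is formed from the eigenvectors of $S_b - \lambda_t S_w$ with the largest eigenvalues, and the iteration stops when $\lambda_t$ no longer changes.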

3.
This paper proposes a semi-supervised learning method for semantic relation extraction between named entities. Given a small amount of labeled data, it benefits greatly from a large amount of unlabeled data in two steps. First, it bootstraps a moderate number of weighted support vectors from all the available data through a co-training procedure on top of support vector machines (SVM) with feature projection. Second, it applies a label propagation (LP) algorithm over the bootstrapped support vectors and the hard unlabeled instances that remain after SVM bootstrapping to classify unseen instances. Evaluation on the ACE RDC corpora shows that our method integrates the advantages of both SVM bootstrapping and label propagation. In particular, our LP algorithm over the bootstrapped support vectors and hard unlabeled instances significantly outperforms the normal LP algorithm over all the available data without SVM bootstrapping. Moreover, our LP algorithm significantly reduces the computational burden, especially when a large amount of labeled and unlabeled data is taken into consideration.

4.
Most manifold learning algorithms use the k-nearest-neighbors function to construct the adjacency graph. However, this may introduce severe bias if the samples are not uniformly distributed in the ambient space. In this paper, a semi-supervised dimensionality reduction method is proposed to alleviate this problem. Based on the notion of local margin, we simultaneously maximize the separability between different classes and estimate the intrinsic geometric structure of the data using both the labeled and unlabeled samples. For high-dimensional data, a discriminant subspace is derived by maximizing the cumulative local margins. Experimental results on high-dimensional classification tasks demonstrate the efficacy of our algorithm.

5.
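Existing data stream classification models require a large number of labeled samples for training, but in practice labeling many samples is relatively expensive. To address this, a hybrid ensemble classification algorithm for data streams based on semi-supervised learning, SMEClass, is proposed. It organizes the base classifiers in a hybrid scheme: K decision tree classifiers vote to assign labels to unlabeled data, which raises the confidence of the class labels and improves the accuracy of the ensemble, and a Bayesian classifier is added to reduce the noisy data produced during labeling. Experimental results show that, compared with the latest semi-supervised ensemble classification algorithms, SMEClass achieves higher accuracy and has clear advantages in running time and noise resistance.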
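
The voting step can be sketched as follows. This is an illustrative majority vote only, not the SMEClass implementation: the function `vote_label`, the `min_agreement` threshold, and the three threshold rules standing in for trained decision trees are all assumptions.

```python
from collections import Counter

def vote_label(classifiers, x, min_agreement=0.5):
    """Majority vote over classifiers.

    Returns (label, confidence); label is None when the winning share
    of votes does not exceed min_agreement.
    """
    votes = Counter(clf(x) for clf in classifiers)
    label, count = votes.most_common(1)[0]
    confidence = count / len(classifiers)
    if confidence > min_agreement:
        return label, confidence
    return None, confidence

# Hypothetical stand-ins for K = 3 trained decision trees:
# each is a simple threshold rule on a single number.
trees = [
    lambda x: int(x > 0.4),
    lambda x: int(x > 0.5),
    lambda x: int(x > 0.6),
]
```

Raising `min_agreement` leaves more samples unlabeled but makes each assigned label more trustworthy, which is the trade-off behind using the vote share as a confidence measure.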

6.
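Social network platforms generate massive short-text data streams characterized by high speed, large volume, concept drift, short text length, and largely missing class labels. This paper therefore proposes a semi-supervised short-text data stream classification algorithm based on vector representations and label propagation, which can effectively classify datasets containing only a small amount of labeled data. To adapt to concept drift, a cluster-based concept drift detection algorithm is also proposed. Experiments on real short-text data streams show that, compared with semi-supervised classification algorithms and semi-supervised data stream classification algorithms, the proposed algorithm not only improves classification accuracy and the macro-average but also adapts quickly to concept drift in the data stream.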

7.
Previous partially supervised classification methods can partition unlabeled data into positive and negative examples for a given class by learning from positive labeled examples and unlabeled examples, but they cannot further group the negative examples into meaningful clusters even if the negative examples contain many different classes. Here we propose an automatic method to obtain a natural partitioning of mixed data (labeled data + unlabeled data) by maximizing a stability criterion, defined on classification results from an extended label propagation algorithm, over all possible values of the model order (the number of classes) in the mixed data. Our experimental results on benchmark corpora for the word sense disambiguation task indicate that, when the labeled data is incomplete, this model order identification algorithm with the extended label propagation algorithm as the base classifier outperforms SVM, a one-class partially supervised classification algorithm, and the model order identification algorithm with semi-supervised k-means clustering as the base classifier.

8.
Graph-based label propagation algorithms are popular in state-of-the-art semi-supervised learning research. The key idea underlying this algorithmic family is to enforce labeling consistency between any two examples with a positive similarity. However, negative similarities, or dissimilarities, are equally valuable in practice. To this end, we simultaneously leverage similarities and dissimilarities in our proposed semi-supervised learning algorithm, which we term Bidirectional Label Propagation (BLP). Unlike previous label propagation mechanisms that proceed along a single direction of graph edges, the BLP algorithm can propagate labels along both positive and negative edge directions. Using an initial neighborhood graph and the class assignment constraints inherent among the labeled examples, a set of class-specific graphs is learned; these graphs include both positive and negative edges and thus reveal discriminative cues. Over the learned graphs, a convex propagation criterion is applied to ensure consistent labelings along the positive edges and inconsistent labelings along the negative edges. Experimental evidence from synthetic and real-world datasets validates the excellent performance of the proposed BLP algorithm.

9.
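A semi-supervised learning algorithm based on label propagation within a restricted constraint range is proposed. First, the similarity matrix is used to compute a transition probability matrix, from which the restricted constraint range is derived. Then, within this range, a label propagation algorithm in the semi-supervised learning framework computes path-based similarities, which determine the important paths for label propagation. Because only a few important propagation paths are used, the algorithm avoids computing the similarity of every path, which greatly reduces computational complexity. Labels are ultimately propagated between labeled and unlabeled data along these few important paths. Experiments demonstrate the effectiveness of the algorithm.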
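
As background for the transition-matrix step, here is a minimal sketch of plain iterative label propagation: the similarity matrix is row-normalized into a transition probability matrix, scores are propagated, and labeled nodes are clamped after every step. It does not implement the restricted constraint range or the path-based similarity of the paper; the function name `propagate_labels` and its parameters are assumptions.

```python
def propagate_labels(similarity, labels, n_classes, n_iter=100):
    """Plain label propagation on a graph.

    similarity: square list-of-lists of non-negative weights
                (assumes every row has a positive sum).
    labels:     labels[i] is a class index, or None if node i is unlabeled.
    Returns the predicted class index of every node.
    """
    n = len(similarity)
    # Row-normalize the similarity matrix into a transition probability matrix
    transition = []
    for row in similarity:
        total = sum(row)
        transition.append([v / total for v in row])

    def clamp(scores):
        # Reset labeled nodes to their known one-hot label
        for i, y in enumerate(labels):
            if y is not None:
                scores[i] = [1.0 if c == y else 0.0 for c in range(n_classes)]
        return scores

    scores = clamp([[1.0 / n_classes] * n_classes for _ in range(n)])
    for _ in range(n_iter):
        propagated = [
            [sum(transition[i][j] * scores[j][c] for j in range(n))
             for c in range(n_classes)]
            for i in range(n)
        ]
        scores = clamp(propagated)
    return [max(range(n_classes), key=lambda c: row[c]) for row in scores]
```

On a four-node chain with the two end nodes labeled differently, each unlabeled node takes the label of the nearer end. Every iteration here touches all pairs of nodes, which is the cost that restricting propagation to a few important paths is meant to reduce.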

10.
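Semi-supervised sentiment classification based on ensemble learning   Total citations: 1, self-citations: 0, citations by others: 1
Sentiment classification is the task of classifying the sentiment category expressed in a text. This paper studies semi-supervised sentiment classification, that is, using unlabeled samples to improve classification performance when only a small number of labeled samples are available. To strengthen semi-supervised learning, the paper proposes an ensemble method based on consistent labels that fuses two mainstream semi-supervised sentiment classification methods: co-training based on random feature subspaces, and label propagation. First, classifiers trained with these two methods label the unlabeled samples; second, the unlabeled samples on which the two labels agree are selected; finally, the selected samples are used to update the training model. Experimental results show that the method effectively reduces the mislabeling rate on unlabeled samples and thus achieves better classification than either semi-supervised method alone.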
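
The selection step (keep only the unlabeled samples on which both classifiers agree) can be sketched as below. The two keyword rules are hypothetical stand-ins for the co-training and label propagation classifiers, and `select_consistent` is an illustrative name, not taken from the paper.

```python
def select_consistent(unlabeled, clf_a, clf_b):
    """Keep (sample, label) pairs on which both classifiers predict the same label."""
    selected = []
    for x in unlabeled:
        label_a, label_b = clf_a(x), clf_b(x)
        if label_a == label_b:
            selected.append((x, label_a))
    return selected

# Hypothetical stand-in for the co-training classifier
def cotrain_clf(text):
    return "pos" if "good" in text else "neg"

# Hypothetical stand-in for the label propagation classifier
def lp_clf(text):
    return "neg" if "bad" in text else "pos"
```

Samples on which the two classifiers disagree are simply left unlabeled, so the training set grows more slowly but with fewer mislabeled examples.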

11.
A general graph-based semi-supervised learning with novel class discovery   Total citations: 1, self-citations: 0, citations by others: 1
In this paper, we propose a general graph-based semi-supervised learning algorithm. The core idea of our algorithm is not only to achieve the goal of semi-supervised learning but also to discover a latent novel class in the data, one that the user may have left unlabeled. Based on the normalized weights evaluated on the data graph, our algorithm outputs the probabilities that data points belong to the labeled classes or to the novel class. We also give theoretical interpretations of the algorithm from three viewpoints on the graph: the regularization framework, label propagation, and Markov random walks. Experiments on toy examples and several benchmark datasets illustrate the effectiveness of our algorithm.

12.
李志恒 《计算机应用研究》2021,38(2):591-594,599
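To address the mismatch between the probability distributions of training and test samples in machine learning, a semi-supervised domain adaptation method based on dropout regularization is proposed to transfer a neural network's feature representation from a label-rich source domain to an unlabeled target domain. Taking a semi-supervised view, the method adds a small amount of labeled target-domain data to the source-domain data so that the network learns the feature distribution of the target domain as well as that of the source domain. Guided by this prior knowledge, the network can still fit the target-domain data well even without rich label information. Experimental results show that the algorithm outperforms existing algorithms on domain adaptation tasks over the typical digit datasets SVHN, MNIST, and USPS, and is fairly robust on domain adaptation tasks over the real-world datasets CIFAR-10 and STL-10, which cover a wide range of natural categories.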

13.
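A semi-supervised support vector machine (semi-supervised SVM) classification algorithm based on two-stage learning is proposed. First, a graph-based label propagation algorithm assigns initial pseudo-labels to the unlabeled samples, and a k-nearest-neighbor graph is used to identify and remove likely noisy samples. The denoised sample set is then treated as a labeled set and fed to a support vector machine (SVM), so that SVM training takes the information of the whole sample set into account and classification accuracy improves. Experimental results show that, compared with other semi-supervised learning algorithms, the proposed algorithm improves classification performance and is highly reliable when few labeled training samples are available.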

15.
吕亚丽  苗钧重  胡玮昕 《计算机应用》2020,40(12):3430-3436
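Most graph-based semi-supervised learning methods do not use the existing label information, or the label information obtained during label propagation, when measuring similarity between samples; moreover, their similarity measures are relatively fixed and cannot effectively capture similarity between samples with complex and diverse distribution structures. To address these problems, a graph-based semi-supervised learning algorithm with label-based metric learning is proposed. First, a similarity measure between samples is specified and used to build the similarity matrix. Then, labels are propagated based on the similarity matrix, and the k lowest-entropy samples are selected as newly determined label information. Finally, all label information is used to update the similarity measure, and the process is iterated until all labels have been learned. The proposed algorithm not only uses label information to improve the similarity measure but also exploits intermediate results to reduce the amount of labeled data that semi-supervised learning requires. Experiments on six real datasets show that the algorithm achieves higher classification accuracy than three traditional graph-based semi-supervised learning algorithms in more than 95% of cases.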
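
The "k lowest-entropy samples" step can be sketched as follows, assuming each unlabeled sample already has a predicted class distribution from label propagation. The function names and the use of the natural logarithm are assumptions; the abstract does not specify them.

```python
import math

def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def lowest_entropy_indices(prob_rows, k):
    """Indices of the k rows with the smallest entropy, most confident first."""
    order = sorted(range(len(prob_rows)), key=lambda i: entropy(prob_rows[i]))
    return order[:k]
```

A row such as `[1.0, 0.0]` has zero entropy and is selected first, while a row such as `[0.5, 0.5]` has the maximum entropy for two classes and is selected last.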

16.
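To address the high computational complexity of the greedy max-cut graph-based semi-supervised learning algorithm (GGMC), an improved algorithm (GGMC-Estop) is proposed. Based on an experimental analysis of how the objective function changes during GGMC optimization, two strategies are adopted for stopping GGMC early in its iterations; a single standard label propagation step then predicts the labels of all samples on the graph. Simulation results on typical datasets show that the improved algorithm is much faster while achieving similar classification performance.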

17.
张晨光  张燕  张夏欢 《自动化学报》2015,41(9):1577-1588
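Most existing multi-label learning methods are supervised and cannot effectively exploit the large numbers of unlabeled samples that are relatively cheap and easy to obtain. To address this, this paper proposes a new multi-label semi-supervised learning method, the normalized dependence maximization multi-label semi-supervised learning method. The method takes the existing labels as constraints, uses all samples, labeled and unlabeled, to estimate the normalized dependence between the feature set and the label set, and takes maximizing this estimate as its objective; it then labels the unlabeled samples by solving a bounded trace ratio problem. Comparative experiments against other classical multi-label learning methods on several real multi-label datasets show that the method learns effectively from labeled and unlabeled samples, with marked improvement especially when labeled samples are scarce.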

18.
In practice, many applications require a dimensionality reduction method that can handle partially labeled data. In this paper, we propose a semi-supervised dimensionality reduction framework that efficiently handles unlabeled data. Under the framework, several classical methods, such as principal component analysis (PCA), linear discriminant analysis (LDA), maximum margin criterion (MMC), locality preserving projections (LPP), and their corresponding kernel versions, can be seen as special cases. For high-dimensional data, the framework yields a low-dimensional embedding that both discriminates multi-class sub-manifolds and preserves local manifold structure. Experiments show that our algorithms significantly improve the accuracy rates of the corresponding supervised and unsupervised approaches.

19.
Various problems are encountered when adopting ordinary vector space algorithms for high-order tensor data input. Namely, one must overcome the Small Sample Size (SSS) and overfitting problems. In addition, the structural information of the original tensor signal is lost during the vectorization process. Therefore, comparable methods using a direct tensor input are more appropriate. In the case of electrocardiograms (ECGs), another problem must be overcome: the manual diagnosis of ECG data is expensive and time consuming, making it difficult to acquire data with diagnosis labels. For cases in which the effective features for classification are very sparse in the original data, we propose a semi-supervised sparse multilinear discriminant analysis (SSSMDA) method. This method uses the distribution of both the labeled and the unlabeled data together with labels discovered through a label propagation algorithm. In practice, we use 12-lead ECGs collected from a remote diagnosis system and apply a short-time Fourier transform (STFT) to obtain third-order tensors. The experimental results highlight the sparsity of the ECG data and the ability of our method to extract sparse and effective features that can be used for classification.

20.
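This paper analyzes the effect of noise on the Gaussian-Laplacian regularized (GLR) framework for semi-supervised learning. Because the least-squares criterion is sensitive to noise, the paper draws on the maximum correntropy criterion (MCC) from information theory to propose a robust semi-supervised learning algorithm based on MCC (GLR-MCC), and proves its convergence. Half-quadratic optimization is used to solve the correntropy objective. In each iteration, the complex information-theoretic optimization problem reduces to a standard semi-supervised learning problem. Simulation results on typical machine learning datasets show that the algorithm effectively improves semi-supervised learning performance under label noise and occlusion noise.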
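
To show why a correntropy objective resists outliers and how half-quadratic optimization turns it into a sequence of weighted least-squares problems, here is a minimal sketch on a much simpler task: estimating a single location value from noisy numbers. It is not GLR-MCC; the function `mcc_location`, the Gaussian kernel width `sigma`, and the fixed iteration count are assumptions.

```python
import math

def mcc_location(values, sigma=1.0, n_iter=50):
    """Estimate a location by maximizing correntropy with half-quadratic iterations."""
    estimate = sum(values) / len(values)  # start from the least-squares mean
    for _ in range(n_iter):
        # Auxiliary weights: Gaussian kernel of each residual,
        # so points far from the current estimate get weights near zero
        weights = [
            math.exp(-((v - estimate) ** 2) / (2 * sigma ** 2)) for v in values
        ]
        # With the weights fixed, the problem is an ordinary weighted least squares
        estimate = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return estimate
```

With the values `[1.0, 1.1, 0.9, 1.0, 10.0]`, the plain mean is pulled to 2.8 by the single outlier, whereas the correntropy estimate stays close to 1.0 because the outlier's weight becomes negligible after the first iteration.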
