20 similar documents retrieved.
1.
The distribution-independent model of (supervised) concept learning due to Valiant (1984) is extended to that of semi-supervised learning (ss-learning), in which a collection of disjoint concepts is to be simultaneously learned with only partial information concerning concept membership available to the learning algorithm. It is shown that many learnable concept classes are also ss-learnable. A new technique of learning, using an intermediate oracle, is introduced. Sufficient conditions for a collection of concept classes to be ss-learnable are given.
3.
Some recent successful semi-supervised learning methods construct more than one learner from both labeled and unlabeled data for inductive learning. This paper proposes a novel multiple-view multiple-learner (MVML) framework for semi-supervised learning, which differs from previous methods in possessing both multiple views and multiple learners. The method adopts a co-training-style learning paradigm to enlarge the labeled data from a much larger set of unlabeled data. To the best of our knowledge, it is the first attempt to combine the advantages of multiple-view learning and ensemble learning for semi-supervised learning. The use of multiple views promises better performance than single-view learning because the available information is exploited more effectively. At the same time, because an ensemble of classifiers is learned from each view, higher prediction accuracy can be obtained than with a single classifier on the same view. Experiments on applications involving both multiple-view and single-view data sets show encouraging results for the proposed MVML method.
4.
Multi-label learning addresses problems in which a single example belongs to several classes at once. Traditional multi-label learning usually assumes that the training set contains a large number of labeled examples, yet in many practical problems only a small fraction of the abundant training examples carry labels. To better exploit the plentiful unlabeled examples and improve classification performance, a regularization-based inductive semi-supervised multi-label learning method, MASS, is proposed. Concretely, MASS minimizes the empirical risk while introducing two regularization terms, one constraining the complexity of the classifier and one requiring similar examples to have similar structured multi-label outputs, and then derives a fast solution via alternating optimization. Experimental results on web-page classification and gene-function analysis verify the effectiveness of MASS.
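The abstract describes the shape of the MASS objective (empirical risk plus a complexity penalty plus a smoothness penalty on structured multi-label outputs) without giving the formula; a generic form consistent with that description, in which the loss ℓ, trade-off weights λ₁ and λ₂, and similarity weights W_ij are all notation introduced here rather than taken from the paper, would be:

$$\min_{f}\; \sum_{i=1}^{l} \ell\big(f(x_i),\, Y_i\big) \;+\; \lambda_1\, \Omega(f) \;+\; \lambda_2 \sum_{i,j} W_{ij}\, \big\| f(x_i) - f(x_j) \big\|^2$$

Here the first term is the empirical risk over the l labeled examples, Ω(f) penalizes classifier complexity, and the last term forces similar examples (large W_ij) to receive similar multi-label outputs; alternating optimization would then update one block of variables at a time while holding the others fixed.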
5.
Network representation learning is an important research topic whose goal is to map a high-dimensional attributed network into low-dimensional dense vectors that serve as effective features for downstream tasks. The recently proposed attributed-network representation model SNE (Social Network Embedding) learns node representations from both network structure and attribute information, but it is unsupervised and cannot exploit easily obtained prior information to improve the quality of the learned representations. Motivated by this, a semi-supervised attributed-network representation method, SSNE (Semi-supervised Social Network Embedding), is proposed. It feeds the attributed network and priors on a small number of nodes into a feed-forward neural network and, after several nonlinear hidden layers, learns optimized node representations at the output layer by preserving both the link structure and the node priors. Comparisons with mainstream methods on four real and two synthetic attributed networks show that the learned representations perform well on clustering and classification tasks.
6.
Many data mining applications have a large amount of data, but labeling it is usually difficult, expensive, or time-consuming, as it requires human experts for annotation. Semi-supervised learning addresses this problem by using unlabeled data together with labeled data in the training process. Co-Training is a popular semi-supervised learning algorithm which assumes that each example is represented by multiple sets of features (views) and that these views are sufficient for learning and independent given the class. However, these assumptions are strong and are not satisfied in many real-world domains. In this paper, a single-view variant of Co-Training, called Co-Training by Committee (CoBC), is proposed, in which an ensemble of diverse classifiers is used instead of redundant and independent views. We introduce a new labeling confidence measure for unlabeled examples based on estimating the local accuracy of the committee members in each example's neighborhood. We then introduce two new learning algorithms, QBC-then-CoBC and QBC-with-CoBC, which combine the merits of committee-based semi-supervised learning and active learning. The random subspace method is applied to both C4.5 decision trees and 1-nearest-neighbor classifiers to construct the diverse ensembles used for semi-supervised and active learning. Experiments show that these two combinations can outperform other non-committee-based ones.
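The committee recipe of entry 6 can be sketched compactly. The following is a minimal, hypothetical simplification, not the authors' implementation: 1-NN members built on random feature subspaces stand in for the diverse ensemble, and plain vote agreement stands in for the paper's local-accuracy confidence measure.

```python
# Sketch of one committee-based self-labeling round in the spirit of CoBC.
# Hypothetical simplification: 1-NN members on random feature subspaces,
# majority agreement used as the labeling confidence.
import random

def nn_predict(train, x, feats):
    # 1-nearest-neighbour label, with distance restricted to one subspace
    def dist(a, b):
        return sum((a[f] - b[f]) ** 2 for f in feats)
    return min(train, key=lambda t: dist(t[0], x))[1]

def cobc_round(labeled, unlabeled, n_members=3, n_feats=2, agree=3, seed=0):
    rng = random.Random(seed)
    dims = list(range(len(labeled[0][0])))
    members = [rng.sample(dims, n_feats) for _ in range(n_members)]
    newly, still = [], []
    for x in unlabeled:
        votes = [nn_predict(labeled, x, f) for f in members]
        top = max(set(votes), key=votes.count)
        if votes.count(top) >= agree:      # confident: committee agrees
            newly.append((x, top))
        else:                              # uncertain: keep for a later round
            still.append(x)
    return labeled + newly, still
```

Examples that the committee agrees on migrate into the labeled set; the rest wait for later rounds once the members have been retrained, which is the enlargement loop the abstract describes.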
7.
Semi-Supervised Learning on Riemannian Manifolds
We consider the general problem of utilizing both labeled and unlabeled data to improve classification accuracy. Under the assumption that the data lie on a submanifold in a high dimensional space, we develop an algorithmic framework to classify a partially labeled data set in a principled manner. The central idea of our approach is that classification functions are naturally defined only on the submanifold in question rather than the total ambient space. Using the Laplace-Beltrami operator one produces a basis (the Laplacian Eigenmaps) for a Hilbert space of square integrable functions on the submanifold. To recover such a basis, only unlabeled examples are required. Once such a basis is obtained, training can be performed using the labeled data set. Our algorithm models the manifold using the adjacency graph for the data and approximates the Laplace-Beltrami operator by the graph Laplacian. We provide details of the algorithm, its theoretical justification, and several practical applications for image, speech, and text classification.
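The pipeline of entry 7 (adjacency graph, graph Laplacian, smoothest eigenvectors as a basis, least-squares fit on the labeled points) can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: it uses dense Gaussian weights instead of the paper's kNN adjacency graph and assumes two-class ±1 labels.

```python
# Sketch: classify a partially labeled point cloud with a graph-Laplacian
# basis. Assumptions: dense Gaussian affinities, labels in {-1, +1}.
import numpy as np

def laplacian_eigen_classify(X, y, labeled_idx, k_basis=2, sigma=1.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))        # Gaussian-weighted adjacency
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(1)) - W                 # graph Laplacian ~ Laplace-Beltrami
    _, vecs = np.linalg.eigh(L)               # eigenvectors, ascending eigenvalue
    E = vecs[:, :k_basis]                     # smoothest basis functions
    # Training uses only the labeled rows of the basis.
    a, *_ = np.linalg.lstsq(E[labeled_idx], y[labeled_idx], rcond=None)
    return np.sign(E @ a)                     # predicted +/-1 label per node
```

Note that the basis E is computed from all points, labeled or not, which is exactly how the method extracts value from unlabeled data.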
8.
9.
Supervised learning needs a large number of labeled examples to train a model, but in practice collecting labels is time-consuming and laborious; unsupervised learning uses no prior information, so model accuracy is hard to guarantee. Semi-supervised learning breaks the traditional limitation of considering only one kind of example: it mines the information hidden in large amounts of unlabeled data to assist training on a small labeled set, and has become a research hotspot in machine learning. This survey organizes the overall trends and concrete research in the field along six threads: semi-supervised clustering, classification, regression, dimensionality reduction, imbalanced-data classification, and noise reduction. It finds that, despite the abundance of semi-supervised methods, several shortcomings remain: (1) some newly proposed methods, though effective, are validated only on specific data sets and lack theoretical justification; (2) semi-supervised models built for complex data carry many parameters, yield unstable results, and offer little practical guidance on parameter selection; (3) supervision is mostly given as instance labels or pairwise constraints, and semi-supervised learning under mixed constraints needs further study; (4) research on semi-supervised regression is scarce, in particular on how to exploit supervision for continuous variables.
10.
Semi-supervised learning based on one-class classifiers
An Ensemble one-class semi-supervised learning algorithm is proposed that combines the strengths of one-class learners and ensemble learning. The algorithm first builds a one-class classifier for each of the two classes present in the small labeled set. The two classifiers then jointly label the unlabeled examples, and the newly recognized examples are used to adjust and refine the two decision boundaries. Finally, the recognized unlabeled data are pooled with the labeled data to train a base classifier, and multiple base classifiers vote on the test examples. Experiments on five UCI data sets show that the algorithm improves average recognition accuracy by 4.5% over the tri-training algorithm and by 8.9% over a one-class classifier trained on the labeled data alone, indicating that the algorithm is effective for semi-supervised problems.
11.
An online semi-supervised learning method based on multi-kernel ensembles
In many real-time prediction tasks, the learner must learn online from data collected on the fly, and because collection happens in real time it is often impossible to label all of the incoming data. Existing online learning methods, however, cannot exploit unlabeled data, so the learned model fails to track the dynamics of the data and its real-time responsiveness degrades. This entry proposes an online semi-supervised learning method based on a multi-kernel ensemble, which lets the online learner keep learning even when the data it receives are unlabeled. Taking the degree of agreement of several functions defined in different RKHSs on the unlabeled data as a regularization term, the method derives an instantaneous risk for multi-kernel ensemble online semi-supervised learning and solves it with online convex programming. Experimental results on UCI data sets and an application to network intrusion detection show that the method can effectively exploit the unlabeled data in a stream to improve online learning performance.
12.
Multi-Relational Learning, Text Mining, and Semi-Supervised Learning for Functional Genomics
We focus on the problem of predicting functional properties of the proteins corresponding to genes in the yeast genome. Our goal is to study the effectiveness of approaches that utilize all data sources available in this problem setting, including relational data, abstracts of research papers, and unlabeled data. We investigate a propositionalization approach which uses relational gene interaction data. We study the benefit of text classification and information extraction for utilizing a collection of scientific abstracts. We study transduction and co-training for using unlabeled data. We report both positive and negative results on the investigated approaches. The studied tasks are the KDD Cup tasks of 2001 and 2002. The solutions we describe achieved the highest score on task 2 in 2001, fourth rank on task 3 in 2001, and, in 2002, the highest score on one of the two subtasks and third place on the overall task 2.
13.
Boosting Algorithms for Parallel and Distributed Learning
The growing amount of available information and its distributed and heterogeneous nature have a major impact on the field of data mining. In this paper, we propose a framework for parallel and distributed boosting algorithms intended for the efficient integration of specialized classifiers learned over very large, distributed, and possibly heterogeneous databases that cannot fit into main memory. Boosting is a popular technique for constructing highly accurate classifier ensembles, where the classifiers are trained serially, with the weights on the training instances adaptively set according to the performance of previous classifiers. Our parallel boosting algorithm is designed for tightly coupled shared-memory systems with a small number of processors, with the objective of achieving maximal prediction accuracy in fewer iterations than boosting on a single processor. After all processors learn classifiers in parallel at each boosting round, the classifiers are combined according to the confidence of their predictions. Our distributed boosting algorithm is proposed primarily for learning from several disjoint data sites when the data cannot be merged, although it can also be used for parallel learning where a massive data set is partitioned into several disjoint subsets for more efficient analysis. At each boosting round, the proposed method combines classifiers from all sites and creates a classifier ensemble on each site; the final classifier is an ensemble of all classifier ensembles built on the disjoint data sets. Applied to several data sets, the new methods show that parallel boosting can achieve the same or even better prediction accuracy considerably faster than standard sequential boosting. Results from the experiments also indicate that distributed boosting achieves comparable or slightly better classification accuracy than standard boosting while requiring much less memory and computation time, since it works on smaller data sets.
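The adaptive instance-reweighting loop that both variants build on is plain boosting. The sketch below is standard AdaBoost with exhaustive threshold stumps, offered as a baseline illustration only; the parallel and distributed algorithms of entry 13 run this kind of loop per processor or per site and then merge the resulting ensembles by prediction confidence.

```python
# Baseline AdaBoost with threshold stumps, y in {-1, +1}; illustrates the
# "weights adaptively set according to the performance of previous
# classifiers" step that the parallel/distributed variants parallelize.
import numpy as np

def adaboost_stumps(X, y, rounds=5):
    n = len(y)
    w = np.full(n, 1.0 / n)                   # uniform instance weights
    ensemble = []                             # (alpha, feature, thresh, sign)
    for _ in range(rounds):
        best = None
        for f in range(X.shape[1]):           # exhaustive stump search
            for t in np.unique(X[:, f]):
                for s in (1, -1):
                    pred = np.where(X[:, f] <= t, s, -s)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, f, t, s)
        err, f, t, s = best
        err = min(max(err, 1e-10), 1 - 1e-10) # guard the logarithm
        alpha = 0.5 * np.log((1 - err) / err) # confidence of this classifier
        pred = np.where(X[:, f] <= t, s, -s)
        w = w * np.exp(-alpha * y * pred)     # adaptively up-weight mistakes
        w = w / w.sum()
        ensemble.append((alpha, f, t, s))
    return ensemble

def boost_predict(ensemble, X):
    score = sum(a * np.where(X[:, f] <= t, s, -s) for a, f, t, s in ensemble)
    return np.sign(score)
```

Each round the misclassified instances gain weight, so the next stump concentrates on them; the final prediction is the confidence-weighted vote of all stumps.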
14.
15.
武永成 《电脑与微电子技术》2012,(20):8-11,16
Semi-supervised learning, unlike traditional supervised learning, can learn simultaneously from a small amount of labeled data and a large amount of unlabeled data, thereby improving performance. Co-training is a popular semi-supervised learning algorithm and has become a research hotspot in machine learning and pattern recognition. This entry surveys the basic ideas, current research status, and common algorithms of co-training for semi-supervised learning, analyzes the main outstanding difficulties, and points out several problems that need further study.
16.
Semi-supervised clustering, an important research topic in machine learning, guides clustering using either a small amount of labeled data at the instance level or feature preferences at the feature level. Existing semi-supervised clustering algorithms, however, consider prior information at only one of these levels; rarely do they use both levels at once. To fill this gap, an extended semi-supervised clustering algorithm is developed on top of a traditional one by jointly exploiting feature-level preferences, i.e., the relative importance among features, together with instance-level supervision in the form of a small amount of labeled data. Preliminary experiments verify the effectiveness of the algorithm.
17.
Low-rank structures play important roles in recent advances on many problems in image science and data science. As a natural extension of low-rank structures to data with nonlinear structure, the concept of a low-dimensional manifold structure has been considered in many data-processing problems. Inspired by this concept, we consider a manifold-based low-rank regularization as a linear approximation of manifold dimension. This regularization is less restrictive than global low-rank regularization and thus enjoys more flexibility in handling data with nonlinear structure. As applications, we apply the proposed regularization to classical inverse problems in the image and data sciences, including image inpainting, image super-resolution, X-ray computed tomography reconstruction, and semi-supervised learning. We conduct intensive numerical experiments on several image restoration problems and on a semi-supervised learning problem of classifying handwritten digits using the MNIST data. Our numerical tests demonstrate the effectiveness of the proposed methods and show that the new regularization methods compare favorably with many existing methods.
18.
A co-training semi-supervised active learning algorithm with noise filtering
To address the situation where a semi-supervised classifier's performance degrades because training on unlabeled examples introduces noise, this entry proposes a co-training semi-supervised active learning algorithm with noise filtering. The algorithm performs co-training semi-supervised learning with three fuzzy deep hidden Markov models and, at appropriate moments, actively brings in human interaction to supply class labels, thereby avoiding both rejection when the models' decisions disagree and the mistaken assumption that initial agreement implies correctness. A noise-filtering mechanism is also added to filter out machine-labeled examples that may be noise. Applied to facial expression recognition, the algorithm effectively raises the utilization rate of unlabeled examples, reduces the noise introduced by semi-supervised learning, and improves expression recognition accuracy.
19.
潘强 《自动化与信息工程》2013,(5):1-6
Previous semi-supervised multi-instance learning algorithms typically decompose unlabeled bags into sets of instances and apply a traditional semi-supervised single-instance learner to infer latent labels for those instances. Such methods assume that the classification of a multi-instance example is tightly tied to its probability density distribution, and they ignore the influence of bag structure on the bag's class label. This entry proposes a bag-level semi-supervised multi-instance kernel method that trains the semi-supervised learner directly on the unlabeled bags. Bags are first converted into concept-vector form by clustering the instance space; the Hamming distances between concept vectors are then computed, from which a graph Laplacian matrix describing bag smoothness is built, and finally a bag-level semi-supervised kernel is computed. Tests on standard multi-instance benchmarks and image data sets show that the algorithm yields a clear improvement.
20.
Existing graph-based semi-supervised learning methods are, in essence, label propagation methods that simulate various propagation mechanisms. Departing from these mechanisms, this entry attempts a new elasticity-based propagation method for semi-supervised learning. The basic idea is to assume that every node in the graph receives elastic force from its neighboring nodes with one elasticity coefficient and transmits elastic force to its neighbors with another, so that the difference between the two kinds of elastic force measures the amount of propagation at each node. …
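Entry 20 contrasts its elastic mechanism with the label propagation family it departs from, whose iterate-and-clamp loop is worth seeing concretely. The sketch below is the classic baseline, not the elastic method, which the truncated abstract does not fully specify; the toy affinity matrix and ±1 labels are assumptions.

```python
# Baseline graph label propagation: diffuse label scores along the graph,
# clamping the known labels each step. W is a symmetric affinity matrix;
# y holds +/-1 on labeled nodes and 0 on unlabeled ones.
import numpy as np

def label_propagation(W, y, labeled, iters=100):
    P = W / W.sum(axis=1, keepdims=True)      # row-stochastic transitions
    f = y.astype(float)
    for _ in range(iters):
        f = P @ f                             # diffuse scores to neighbours
        f[labeled] = y[labeled]               # clamp the known labels
    return np.sign(f)
```

On a simple chain graph with opposite labels clamped at the two ends, the scores settle so that each unlabeled node takes the sign of its nearer labeled endpoint.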