共查询到20条相似文献,搜索用时 0 毫秒
1.
The distribution-independent model of (supervised) concept learning due to Valiant (1984) is extended to that of semi-supervised learning (ss-learning), in which a collection of disjoint concepts is to be simultaneously learned with only partial information concerning concept membership available to the learning algorithm. It is shown that many learnable concept classes are also ss-learnable. A new technique of learning, using an intermediate oracle, is introduced. Sufficient conditions for a collection of concept classes to be ss-learnable are given. 相似文献
3.
Some recent successful semi-supervised learning methods construct more than one learner from both labeled and unlabeled data
for inductive learning. This paper proposes a novel multiple-view multiple-learner (MVML) framework for semi-supervised learning,
which differs from previous methods in possession of both multiple views and multiple learners. This method adopts a co-training
styled learning paradigm in enlarging labeled data from a much larger set of unlabeled data. To the best of our knowledge
it is the first attempt to combine the advantages of multiple-view learning and ensemble learning for semi-supervised learning.
The use of multiple views is promising to promote performance compared with single-view learning because information is more
effectively exploited. At the same time, as an ensemble of classifiers is learned from each view, predictions with higher
accuracies can be obtained than solely adopting one classifier from the same view. Experiments on different applications involving
both multiple-view and single-view data sets show encouraging results of the proposed MVML method. 相似文献
4.
多标记学习主要用于解决单个样本同时属于多个类别的问题.传统的多标记学习通常假设训练数据集含有大量有标记的训练样本.然而在许多实际问题中,大量训练样本中通常只有少量有标记的训练样本.为了更好地利用丰富的未标记训练样本以提高分类性能,提出了一种基于正则化的归纳式半监督多标记学习方法——MASS.具体而言,MASS首先在最小化经验风险的基础上,引入两种正则项分别用于约束分类器的复杂度及要求相似样本拥有相似结构化多标记输出,然后通过交替优化技术给出快速解法.在网页分类和基因功能分析问题上的实验结果验证了MASS方法的有效性. 相似文献
5.
一种结合主动学习的半监督文档聚类算法 总被引:1,自引:0,他引:1
半监督文档聚类,即利用少量具有监督信息的数据来辅助无监督文档聚类,近几年来逐渐成为机器学习和数据挖掘领域研究的热点问题.由于获取大量监督信息费时费力,因此,国内外学者考虑如何获得少量但对聚类性能提高显著的监督信息.提出一种结合主动学习的半监督文档聚类算法,通过引入成对约束信息指导DBSCAN的聚类过程来提高聚类性能,得到一种半监督文档聚类算法Cons-DBSCAN.通过对约束集中所含信息量的衡量和对DBSCAN算法本身的分析,提出了一种启发式的主动学习算法,能够选取含信息量大的成对约束集,从而能够更高效地辅助半监督文档聚类.实验结果表明,所提出的算法能够高效地进行文档聚类.通过主动学习算法获得的成对约束集,能够显著地提高聚类性能.并且,算法的性能优于两个代表性的结合主动学习的半监督聚类算法. 相似文献
6.
网络表示学习是一个重要的研究课题,其目的是将高维的属性网络表示为低维稠密的向量,为下一步任务提供有效特征表示。最近提出的属性网络表示学习模型SNE(Social Network Embedding)同时使用网络结构与属性信息学习网络节点表示,但该模型属于无监督模型,不能充分利用一些容易获取的先验信息来提高所学特征表示的质量。基于上述考虑提出了一种半监督属性网络表示学习方法SSNE(Semi-supervised Social Network Embedding),该方法以属性网络和少量节点先验作为前馈神经网络输入,经过多个隐层非线性变换,在输出层通过保持网络链接结构和少量节点先验,学习最优化的节点表示。在四个真实属性网络和两个人工属性网络上,同现有主流方法进行对比,结果表明本方法学到的表示,在聚类和分类任务上具有较好的性能。 相似文献
7.
8.
Many data mining applications have a large amount of data but labeling data is usually difficult, expensive, or time consuming,
as it requires human experts for annotation. Semi-supervised learning addresses this problem by using unlabeled data together with labeled data in the training process. Co-Training is a popular semi-supervised learning algorithm that has the assumptions that each example is represented by multiple sets of features (views) and these views
are sufficient for learning and independent given the class. However, these assumptions are strong and are not satisfied in
many real-world domains. In this paper, a single-view variant of Co-Training, called Co-Training by Committee (CoBC) is proposed, in which an ensemble of diverse classifiers is used instead of redundant and independent views. We introduce
a new labeling confidence measure for unlabeled examples based on estimating the local accuracy of the committee members on
its neighborhood. Then we introduce two new learning algorithms, QBC-then-CoBC and QBC-with-CoBC, which combine the merits of committee-based semi-supervised learning and active learning. The random subspace method is
applied on both C4.5 decision trees and 1-nearest neighbor classifiers to construct the diverse ensembles used for semi-supervised
learning and active learning. Experiments show that these two combinations can outperform other non committee-based ones. 相似文献
9.
Semi-Supervised Learning on Riemannian Manifolds 总被引:1,自引:0,他引:1
We consider the general problem of utilizing both labeled and unlabeled data to improve classification accuracy. Under the assumption that the data lie on a submanifold in a high dimensional space, we develop an algorithmic framework to classify a partially labeled data set in a principled manner. The central idea of our approach is that classification functions are naturally defined only on the submanifold in question rather than the total ambient space. Using the Laplace-Beltrami operator one produces a basis (the Laplacian Eigenmaps) for a Hilbert space of square integrable functions on the submanifold. To recover such a basis, only unlabeled examples are required. Once such a basis is obtained, training can be performed using the labeled data set. Our algorithm models the manifold using the adjacency graph for the data and approximates the Laplace-Beltrami operator by the graph Laplacian. We provide details of the algorithm, its theoretical justification, and several practical applications for image, speech, and text classification. 相似文献
10.
基于图的半监督学习方法中,图结构经常要预先设定,这就导致了在标签传递过程中,算法不能自适应地学习一个最优的图.为此,提出了一种基于自适应图的半监督学习方法.该方法通过迭代的优化方法同时学习到最优的图和标签.而且,在少量标记样本的情况下该方法也可以得到较高的分类准确率,并通过实验证明了该方法的有效性. 相似文献
11.
监督学习需要利用大量的标记样本训练模型,但实际应用中,标记样本的采集费时费力。无监督学习不使用先验信息,但模型准确性难以保证。半监督学习突破了传统方法只考虑一种样本类型的局限,能够挖掘大量无标签数据隐藏的信息,辅助少量的标记样本进行训练,成为机器学习的研究热点。通过对半监督学习研究的总趋势以及具体研究内容进行详细的梳理与总结,分别从半监督聚类、分类、回归与降维以及非平衡数据分类和减少噪声数据共六个方面进行综述,发现半监督方法众多,但存在以下不足:(1)部分新提出的方法虽然有效,但仅通过特定数据集进行了实证,缺少一定的理论证明;(2)复杂数据下构建的半监督模型参数较多,结果不稳定且缺乏参数选取的指导经验;(3)监督信息多采用样本标签或成对约束形式,对混合约束的半监督学习需要进一步研究;(4)对半监督回归的研究匮乏,对如何利用连续变量的监督信息研究甚少。 相似文献
12.
已有的数据流分类算法多采用有监督学习,需要使用大量已标记数据训练分类器,而获取已标记数据的成本很高,算法缺乏实用性。针对此问题,文中提出基于半监督学习的集成分类算法SEClass,能利用少量已标记数据和大量未标记数据,训练和更新集成分类器,并使用多数投票方式对测试数据进行分类。实验结果表明,使用同样数量的已标记训练数据,SEClass算法与最新的有监督集成分类算法相比,其准确率平均高5。33%。且运算时间随属性维度和类标签数量的增加呈线性增长,能够适用于高维、高速数据流分类问题。 相似文献
13.
软件缺陷预测方法可以在项目的开发初期,通过预先识别出所有可能含有缺陷的软件模块来优化测试资源的分配。早期的缺陷预测研究大多集中于同项目缺陷预测,但同项目缺陷预测需要充足的历史数据,而在实际应用中可能需要预测的项目的历史数据较为稀缺,或这个项目是一个全新项目。因此跨项目缺陷预测问题成为当前软件缺陷预测领域内的一个研究热点,其研究挑战在于源项目与目标项目数据集间存在的分布差异性以及数据集内存在的类不平衡问题。受到基于搜索的软件工程思想的启发,论文提出了一种基于搜索的半监督集成跨项目软件缺陷预测方法S3EL。该方法首先通过调整训练集中各类数据的分布比例,构建出多个朴素贝叶斯基分类器,随后利用具有全局搜索能力的遗传算法,基于少量已标记目标实例对上述基分类器进行集成,并构建出最终的缺陷预测模型。在Promise数据集及AEEEM数据集上和多个经典的跨项目缺陷预测方法(Burak过滤法、Peters过滤法、TCA+、CODEP及HYDRA)进行了对比。以F1值作为评测指标,结果表明在大部分情况下,S3EL方法可以取得最好的预测性能。 相似文献
14.
Wei Chen Member CCF Jia-Hong Zhou Jia-Xin Zhu Member CCF Guo-Quan Wu Member CCF Jun Wei Member CCF 《计算机科学技术学报》2019,34(5):957-971
Docker has been the mainstream technology of providing reusable software artifacts recently. Developers can easily build and deploy their applications using Docker. Currently, a large number of reusable Docker images are publicly shared in online communities, and semantic tags can be created to help developers effectively reuse the images. However, the communities do not provide tagging services, and manually tagging is exhausting and time-consuming. This paper addresses the problem through a semi-supervised learning-based approach, named SemiTagRec. SemiTagRec contains four components:(1) the predictor, which calculates the probability of assigning a specific tag to a given Docker repository; (2) the extender, which introduces new tags as the candidates based on tag correlation analysis; (3) the evaluator, which measures the candidate tags based on a logistic regression model; (4) the integrator, which calculates a final score by combining the results of the predictor and the evaluator, and then assigns the tags with high scores to the given Docker repositories. SemiTagRec includes the newly tagged repositories into the training data for the next round of training. In this way, SemiTagRec iteratively trains the predictor with the cumulative tagged repositories and the extended tag vocabulary, to achieve a high accuracy of tag recommendation. Finally, the experimental results show that SemiTagRec outperforms the other approaches and SemiTagRec's accuracy, in terms of Recall@5 and Recall@10, is 0.688 and 0.781 respectively. 相似文献
15.
16.
Video compression algorithms manipulate video signals to dramatically reduce the storage and bandwidth required while
maximizing perceived video quality. Typical video compression methods include discrete cosine transform, vector quantization, fractal compression, and discrete wavelet transform. Recently, a machine learning based approach has been proposed which converts the color images (frames) to gray scale images (frames) and the color information for only a few representative pixels is kept. A learning model is then trained to predict the color values for the gray scale pixels across frames. Selecting the most representative pixels is essentially an active learning problem, while colorization is a semi-supervised learning problem. In this paper, we propose to combine active and semi-supervised learning for video compression. The basic idea is to minimize the size of the covariance matrix of the regularized least squares estimates, in which the regression model assumes that each pixel can be reconstructed by the other
pixels with similar spatial location and intensity value. The experimental results demonstrate the effectiveness of the proposed approach for video compression. 相似文献
17.
联邦学习允许边缘设备或客户端将数据存储在本地来合作训练共享的全局模型。主流联邦学习系统通常基于客户端本地数据有标签这一假设,然而客户端数据一般没有真实标签,且数据可用性和数据异构性是联邦学习系统面临的主要挑战。针对客户端本地数据无标签的场景,设计一种鲁棒的半监督联邦学习系统。利用FedMix方法分析全局模型迭代之间的隐式关系,将在标签数据和无标签数据上学习到的监督模型和无监督模型进行分离学习。采用FedLoss聚合方法缓解客户端之间数据的非独立同分布(non-IID)对全局模型收敛速度和稳定性的影响,根据客户端模型损失函数值动态调整局部模型在全局模型中所占的权重。在CIFAR-10数据集上的实验结果表明,该系统的分类准确率相比于主流联邦学习系统约提升了3个百分点,并且对不同non-IID水平的客户端数据更具鲁棒性。 相似文献
18.
基于多核集成的在线半监督学习方法 总被引:1,自引:1,他引:1
在很多实时预测任务中,学习器需对实时采集到的数据在线地进行学习.由于数据采集的实时性,往往难以为采集到的所有数据提供标记.然而,目前的在线学习方法并不能利用未标记数据进行学习,致使学得的模型并不能即时反映数据的动态变化,降低其实时响应能力.提出一种基于多核集成的在线半监督学习方法,使得在线学习器即使在接收到没有标记的数据时也能进行在线学习.该方法采用多个定义在不同RKHS中的函数对未标记数据预测的一致程度作为正则化项,在此基础上导出了多核集成在线半监督学习的即时风险函数,然后借助在线凸规划技术进行求解.在UCl数据集上的实验结果以及在网络入侵检测上的应用表明,该方法能够有效利用数据流中未标记数据来提升在线学习的性能. 相似文献
19.
基于单类分类器的半监督学习 总被引:1,自引:0,他引:1
提出一种结合单类学习器和集成学习优点的Ensemble one-class半监督学习算法.该算法首先为少量有标识数据中的两类数据分别建立两个单类分类器.然后用建立好的两个单类分类器共同对无标识样本进行识别,利用已识别的无标识样本对已建立的两个分类面进行调整、优化.最终被识别出来的无标识数据和有标识数据集合在一起训练一个基分类器,多个基分类器集成在一起对测试样本的测试结果进行投票.在5个UCI数据集上进行实验表明,该算法与tri-training算法相比平均识别精度提高4.5%,与仅采用纯有标识数据的单类分类器相比,平均识别精度提高8.9%.从实验结果可以看出,该算法在解决半监督问题上是有效的. 相似文献
20.
Multi-Relational Learning, Text Mining, and Semi-Supervised Learning for Functional Genomics 总被引:1,自引:0,他引:1
We focus on the problem of predicting functional properties of the proteins corresponding to genes in the yeast genome. Our goal is to study the effectiveness of approaches that utilize all data sources that are available in this problem setting, including relational data, abstracts of research papers, and unlabeled data. We investigate a propositionalization approach which uses relational gene interaction data. We study the benefit of text classification and information extraction for utilizing a collection of scientific abstracts. We study transduction and co-training for using unlabeled data. We report on both, positive and negative results on the investigated approaches. The studied tasks are KDD Cup tasks of 2001 and 2002. The solutions which we describe achieved the highest score for task 2 in 2001, the fourth rank for task 3 in 2001, the highest score for one of the two subtasks and the third place for the overall task 2 in 2002. 相似文献