1.

Semisupervised Regression with Cotraining-Style Algorithms

Zhi-Hua Zhou Ming Li 《Knowledge and Data Engineering, IEEE Transactions on》2007,19(11):1479-1493

The traditional setting of supervised learning requires a large amount of labeled training examples in order to achieve good generalization. However, in many practical applications, unlabeled training examples are readily available, but labeled ones are fairly expensive to obtain. Therefore, semisupervised learning has attracted much attention. Previous research on semisupervised learning mainly focuses on semisupervised classification. Although regression is almost as important as classification, semisupervised regression is largely understudied. In particular, although cotraining is a main paradigm in semisupervised learning, few works have been devoted to cotraining-style semisupervised regression algorithms. In this paper, a cotraining-style semisupervised regression algorithm, COREG, is proposed. This algorithm uses two regressors, each of which labels the unlabeled data for the other regressor. The confidence in labeling an unlabeled example is estimated by the amount of reduction in mean squared error over the labeled neighborhood of that example. Analysis and experiments show that COREG can effectively exploit unlabeled data to improve regression estimates.
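The confidence estimate described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes a k-nearest-neighbor regressor with Euclidean distance, and the function names (`nearest`, `knn_predict`, `coreg_confidence`) are hypothetical.

```python
import math

def nearest(train, x, k):
    """Return the k training pairs (x_i, y_i) closest to x (ties keep input order)."""
    return sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]

def knn_predict(train, x, k):
    """kNN regression: mean target of the k nearest training pairs."""
    hood = nearest(train, x, k)
    return sum(y for _, y in hood) / len(hood)

def coreg_confidence(labeled, x_u, k):
    """Return (y_hat, delta) for an unlabeled point x_u.

    delta is the reduction in mean squared error on x_u's labeled neighborhood
    when (x_u, y_hat) is added to the training set; larger means more confident.
    """
    y_hat = knn_predict(labeled, x_u, k)
    augmented = labeled + [(x_u, y_hat)]
    hood = nearest(labeled, x_u, k)
    delta = 0.0
    for x_i, y_i in hood:
        before = (y_i - knn_predict(labeled, x_i, k)) ** 2
        after = (y_i - knn_predict(augmented, x_i, k)) ** 2
        delta += before - after
    return y_hat, delta / len(hood)
```

In a full co-training loop, each of the two regressors would score its candidate unlabeled examples this way. It would then pick the one with the largest positive delta and pass it, with its predicted label, to the other regressor. Examples with a non-positive delta would be skipped.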

2.

A unified framework for semi-supervised PU learning

Haoji Hu Chaofeng Sha Xiaoling Wang Aoying Zhou 《World Wide Web》2014,17(4):493-510

Traditional supervised classifiers use only labeled data (feature/label pairs) as the training set, while the unlabeled data are used as the testing set. In practice, labeled data are often hard to obtain, and the unlabeled data contain instances that belong to the predefined class but are not labeled. This problem has been widely studied in recent years, and semi-supervised PU learning is an efficient solution for learning from positive and unlabeled examples. Among the many semi-supervised PU learning methods, it is hard to choose a single approach that fits every unlabeled data distribution. In this paper, a new framework is designed to integrate different semi-supervised PU learning algorithms in order to take advantage of existing methods. In essence, we propose an automatic KL-divergence learning method that utilizes knowledge of the unlabeled data distribution. The experimental results show that (1) data distribution information is very helpful for semi-supervised PU learning; and (2) the proposed framework achieves higher precision than the state-of-the-art method.

3.

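Semi-Supervised Multi-Instance Kernel

张钢 印鉴 程良伦 钟钦灵 《Computer Science》2011,38(9):220-223

Introducing a mechanism for exploiting unlabeled instances into multi-instance learning can reduce training cost and improve the generalization ability of the learner. Most current semi-supervised multi-instance learning algorithms assign a label to every instance in a bag, which turns multi-instance learning into a single-instance semi-supervised learning problem. The class label of a bag is determined both by its instances and by the bag's structure. Taking this into account, we propose a multi-instance learning algorithm that performs semi-supervised learning directly at the bag level. A multi-instance kernel is defined, and all bags (labeled and unlabeled) are used to compute a bag-level graph Laplacian matrix, which serves as the smoothness penalty term in the optimization objective. Finding the optimal solution in the RKHS spanned by the multi-instance kernel reduces to determining a multi-instance kernel function modified by the unlabeled data, which can be used directly in classical kernel learning methods. The algorithm was tested on experimental datasets and compared with existing algorithms. The results show that algorithms based on the semi-supervised multi-instance kernel reach the same accuracy as supervised learning algorithms with less training data. Given the same labeled dataset, exploiting unlabeled data also effectively improves the learner's generalization ability.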
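The bag-level smoothness penalty described above can be illustrated with a small sketch. It makes two assumptions: a simple averaged RBF set kernel serves as the multi-instance kernel, and the Laplacian is the unnormalized L = D - W. The paper's actual kernel and its data-dependent kernel modification are not reproduced here, and all names are hypothetical.

```python
import math

def rbf(a, b, gamma=1.0):
    """Gaussian similarity between two instances."""
    return math.exp(-gamma * math.dist(a, b) ** 2)

def set_kernel(bag_a, bag_b, gamma=1.0):
    """Simple multi-instance kernel: mean RBF similarity over all instance pairs."""
    total = sum(rbf(a, b, gamma) for a in bag_a for b in bag_b)
    return total / (len(bag_a) * len(bag_b))

def bag_laplacian(bags, gamma=1.0):
    """Unnormalized graph Laplacian L = D - W over all bags (labeled and unlabeled)."""
    n = len(bags)
    W = [[0.0 if i == j else set_kernel(bags[i], bags[j], gamma) for j in range(n)]
         for i in range(n)]
    degrees = [sum(row) for row in W]
    return [[(degrees[i] if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def smoothness(f, L):
    """Penalty f^T L f = 1/2 * sum_ij W_ij (f_i - f_j)^2; small when similar bags get similar scores."""
    n = len(f)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))
```

Because every bag enters the Laplacian regardless of whether it is labeled, unlabeled bags shape the penalty even though they contribute no loss term of their own.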

4.

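A Wireless Localization Algorithm Based on Graph Spectral Decomposition

林权 赵方 罗海勇 康一梅 《Acta Automatica Sinica》2011,37(3):316-321

Radio-frequency fingerprint localization based on supervised learning is a research focus in high-precision indoor wireless localization. Because collecting training data for supervised methods is costly, this paper proposes an indoor wireless localization algorithm based on semi-supervised learning. The algorithm uses spectral decomposition of the Laplacian matrix to obtain a representation of the training data in eigenvector space. It then labels the unlabeled data by aligning them with the labels of the labeled data in that space. Experimental results show that with only a small share of labeled data (about 20%), the unlabeled data can be labeled with fairly high accuracy (about 80%), which effectively reduces training overhead.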

5.

Time Series Classification for PU Problems Combining Semi-Supervised and Active Learning

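陈娟 朱福喜 《Computer Engineering and Applications》2018,54(11):116-121

Time series classification for PU problems currently relies on semi-supervised learning to label the data in the unlabeled dataset U automatically and then build a classifier. In this approach, however, the correctness of the labels automatically assigned to boundary samples is hard to guarantee, which leads to poor classifiers. To address this problem, we propose OAL (Only Active Learning), a method that uses active learning to have the data in U labeled manually before the classifier is built. Following query by committee (QBC), multiple classifiers built on the labeled dataset vote to measure the class inconsistency of each unlabeled sample. This inconsistency is combined with the sample's distribution density to give its informativeness, which serves as the data selection strategy for active learning. Because the amount of manually labeled data is limited, OAL is further extended by combining active learning with semi-supervised learning. During the active learning iterations, some samples with high class consistency are labeled automatically, which increases the amount of labeled training data. Experiments show that, with only partial manual labeling, this method builds classifiers for PU datasets with higher accuracy than semi-supervised learning.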
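The selection strategy described above can be sketched as follows. This is a minimal illustration, not the authors' OAL code. The following choices are all assumptions:

- vote entropy as the measure of class inconsistency;
- the average of 1 / (1 + distance) as the density measure;
- their product as informativeness;
- unanimous committee agreement as the rule for automatic labeling.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Class inconsistency of one sample: entropy of the committee's vote distribution."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())

def density(i, pool):
    """Average similarity 1 / (1 + distance) between pool[i] and the other pool points."""
    others = [p for j, p in enumerate(pool) if j != i]
    return sum(1.0 / (1.0 + math.dist(pool[i], p)) for p in others) / len(others)

def select_query(pool, committee_votes):
    """Index of the unlabeled sample with the largest informativeness = entropy * density."""
    scores = [vote_entropy(v) * density(i, pool) for i, v in enumerate(committee_votes)]
    return max(range(len(pool)), key=scores.__getitem__)

def auto_label(committee_votes):
    """Samples on which the whole committee agrees are labeled automatically."""
    return {i: v[0] for i, v in enumerate(committee_votes) if len(set(v)) == 1}
```

Weighting disagreement by density steers manual labeling away from isolated outliers. In the test data, two samples have identical vote splits, but only the one lying near other points is queried.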

6.

Equilibrium-Based Support Vector Machine for Semisupervised Classification (total citations: 2; self-citations: 0; citations by others: 2)

Daewon Lee Jaewook Lee 《Neural Networks, IEEE Transactions on》2007,18(2):578-583

A novel learning algorithm for semisupervised classification is proposed. The method first constructs a support function that estimates the support of a data distribution using both labeled and unlabeled data. Then, it partitions the whole data space into a small number of disjoint regions with the aid of a dynamical system. Finally, it labels the decomposed regions using the labeled data and the cluster structure described by the constructed support function. Simulation results show that the proposed method is effective at labeling out-of-sample unlabeled test data as well as in-sample unlabeled data.

7.

Genetic Algorithm Classifier System for Semi‐Supervised Learning

L. Dee Miller Leen‐Kiat Soh Stephen Scott 《Computational Intelligence》2015,31(2):201-232

Real-world datasets often contain large numbers of unlabeled data points because obtaining labels incurs additional cost. Semi-supervised learning (SSL) algorithms use both labeled and unlabeled data points for training, which can result in higher classification accuracy on these datasets. Generally, traditional SSLs tentatively label the unlabeled data points based on the smoothness assumption that neighboring points should have the same label. When this assumption is violated, unlabeled points are mislabeled, injecting noise into the final classifier. An alternative SSL approach is cluster-then-label (CTL), which partitions all the data points (labeled and unlabeled) into clusters and creates a classifier using those clusters. CTL is based on the less restrictive cluster assumption that data points in the same cluster should have the same label. As we show, this allows CTLs to achieve higher classification accuracy on many datasets where the cluster assumption holds for the CTLs but the smoothness assumption does not hold for the traditional SSLs. However, cluster configuration problems (e.g., irrelevant features, insufficient clusters, and incorrectly shaped clusters) can violate the cluster assumption. We propose a new framework for CTLs that uses a genetic algorithm (GA) to evolve classifiers free of cluster configuration problems. For example, the GA removes irrelevant attributes, updates the number of clusters, and changes the shape of the clusters. We demonstrate that a CTL based on this framework achieves accuracy comparable to or higher than both traditional SSLs and CTLs on 12 University of California, Irvine machine learning datasets.
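A bare-bones cluster-then-label baseline makes the cluster assumption concrete. This sketch assumes k-means (Lloyd's algorithm from given initial centers) as the clusterer and a majority vote of each cluster's labeled members as the cluster's label. It does not include the paper's genetic-algorithm search over features, cluster counts, or cluster shapes.

```python
import math
from collections import Counter

def kmeans(points, init_centers, iters=10):
    """Plain Lloyd's k-means from given initial centers; returns cluster assignments."""
    centers = list(init_centers)
    for _ in range(iters):
        assign = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign

def cluster_then_label(points, labels, init_centers):
    """Label every point with the majority label of the labeled points in its cluster.

    labels maps point index -> class; clusters with no labeled point get None.
    """
    assign = kmeans(points, init_centers)
    cluster_label = {}
    for c in set(assign):
        votes = Counter(labels[i] for i, a in enumerate(assign) if a == c and i in labels)
        cluster_label[c] = votes.most_common(1)[0][0] if votes else None
    return [cluster_label[a] for a in assign]
```

The `None` result for clusters without labeled points illustrates the "insufficient clusters" problem the abstract mentions: a poor cluster configuration leaves some points with no usable label.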

8.

Semi-Supervised Canonical Correlation Analysis with Joint Label Prediction and Discriminative Projection Learning

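周凯伟 万建武 王洪元 马宏亮 《Journal of Image and Graphics》2019,24(7):1126-1135

Objective: Canonical correlation analysis (CCA) is a classic multi-view learning method. To improve the discriminative power of the projection directions, existing CCA methods usually incorporate sample label information. However, obtaining sample labels requires considerable human and material resources. We therefore propose a semi-supervised CCA algorithm that jointly performs label prediction and discriminative projection learning.
Method: Label prediction is integrated with model construction; specifically, label prediction is embedded in the CCA framework. The label matrix learned by the joint framework is used to update the projection directions, and the learned projection directions in turn update the label matrix. Label prediction and projection learning depend on each other and are updated alternately. As a result, the predicted labels steadily approach the true labels, which helps learn the optimal projection directions.
Result: Experiments were conducted on four face datasets: AR, Extended Yale B, Multi-PIE, and ORL. With a feature dimension of 20, the method achieved recognition rates of 87%, 55%, 83%, and 85% on AR, Extended Yale B, Multi-PIE, and ORL, respectively. When 2 (3, 4, 5) face images per person in the training set were used as labeled samples, the proposed method achieved higher recognition rates than the other methods on all four datasets. With 5 labeled face images per person, it achieved recognition rates of 94.67%, 68%, 83%, and 85% on AR, Extended Yale B, Multi-PIE, and ORL, respectively. These results show that when training labels are scarce and the reduced dimensionality is low, the joint learning model preserves as much useful information as possible in the reduced data and yields good recognition results.
Conclusion: The proposed joint learning method improves the discriminative power of the learned projection directions. It effectively handles settings with few labeled and many unlabeled samples, and it overcomes the shortcomings of two-step learning strategies.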

9.

A semi-supervised convolutional neural network-based method for steel surface defect recognition

《Robotics and Computer》2020

Automatic defect recognition is a research hotspot in steel production, but most current methods rely on supervised learning, which requires large-scale labeled samples. In some real-world cases, it is difficult to collect and label enough samples for model training, which may impede the application of most current methods. Semi-supervised learning, which uses both labeled and unlabeled samples for model training, can help overcome this problem. In this paper, a semi-supervised learning method using a convolutional neural network (CNN) is proposed for steel surface defect recognition. The proposed method requires fewer labeled samples and uses unlabeled data to help training; in addition, the CNN is improved with Pseudo-Label. Experimental results on a benchmark dataset for steel surface defect recognition indicate that the proposed method performs well with limited labeled data, achieving an accuracy of 90.7%, an improvement of 17.53%. Furthermore, the method has been applied to a real-world case at a Chinese steel company, where it obtained an accuracy of 86.72%, significantly better than the method originally used in that workshop.
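The Pseudo-Label idea mentioned above can be sketched independently of any network. The most probable class of each unlabeled sample is used as its target, and the unlabeled loss is weighted by a coefficient that ramps up during training. This is not the authors' implementation. The ramp schedule and its constants (`t1`, `t2`, `alpha_f`) are assumptions for illustration, and the probability vectors stand in for hypothetical softmax outputs of a CNN.

```python
import math

def pseudo_labels(probs):
    """Pseudo-Label: take the most probable class of each unlabeled sample as its target."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]

def alpha(t, t1=100, t2=600, alpha_f=3.0):
    """Weight on the unlabeled loss: 0 before t1, linear ramp up to alpha_f at t2."""
    if t < t1:
        return 0.0
    if t < t2:
        return (t - t1) / (t2 - t1) * alpha_f
    return alpha_f

def cross_entropy(p, target):
    return -math.log(max(p[target], 1e-12))  # clip to avoid log(0)

def combined_loss(lab_probs, lab_targets, unl_probs, t):
    """Mean labeled cross-entropy plus alpha(t) times mean cross-entropy on pseudo-labels."""
    sup = sum(cross_entropy(p, y) for p, y in zip(lab_probs, lab_targets)) / len(lab_probs)
    targets = pseudo_labels(unl_probs)
    unsup = sum(cross_entropy(p, y) for p, y in zip(unl_probs, targets)) / len(unl_probs)
    return sup + alpha(t) * unsup
```

Keeping the unlabeled weight at zero early on lets the network first learn from the labeled samples, so its pseudo-labels are less noisy by the time they start to count.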

10.

Transfer Learning from Unlabeled Data via Neural Networks

Huaxiang Zhang Hua Ji Xiaoqin Wang 《Neural Processing Letters》2012,36(2):173-187

A machine learning framework that uses unlabeled data from a related task domain in supervised classification tasks is described. The unlabeled data come from related domains that share the same class labels or generative distribution as the labeled data. Patterns in the unlabeled data are learned via a neural network and transferred to the target domain, from which the labeled data are generated, to improve the performance of the supervised learning task. We call this approach self-taught transfer learning from unlabeled data. We introduce a general-purpose feature learning algorithm that produces features retaining information from the unlabeled data. This information preservation ensures that the resulting features are useful for improving the classification performance of the supervised tasks.

11.

A Review of Semi-Supervised Learning Research

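韩嵩 韩秋弘 《Computer Engineering and Applications》2020,56(6):19-27

Supervised learning requires many labeled samples to train a model, but in practice collecting labeled samples is time-consuming and laborious. Unsupervised learning uses no prior information, but model accuracy is hard to guarantee. Semi-supervised learning overcomes the limitation of traditional methods that consider only one type of sample. It can mine the information hidden in large amounts of unlabeled data to assist training with a small number of labeled samples, and it has become a research focus in machine learning. This paper reviews the overall research trends and specific research topics of semi-supervised learning in detail across six areas: semi-supervised clustering, classification, regression, dimensionality reduction, imbalanced data classification, and noise reduction. It finds that semi-supervised methods are numerous but have the following shortcomings: (1) some newly proposed methods, although effective, have only been validated empirically on specific datasets and lack theoretical justification; (2) semi-supervised models built for complex data have many parameters, produce unstable results, and lack guidance on parameter selection; (3) supervision information mostly takes the form of sample labels or pairwise constraints, and semi-supervised learning with mixed constraints needs further study; (4) research on semi-supervised regression is scarce, and little work addresses how to exploit supervision information given as continuous variables.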

12.

Ensemble learning from multiple information sources via label propagation and consensus

Yaojin Lin Xuegang Hu Xindong Wu 《Applied Intelligence》2014,41(1):30-41

Many applications face the problem of learning from multiple information sources. These sources may be labeled or unlabeled, and the information they carry may be beneficial but cannot be integrated into a single source for learning. In this paper, we propose an ensemble learning method for different labeled and unlabeled sources. We first present two label propagation methods that infer the labels of training objects from unlabeled sources. They make full use of class label information from labeled sources and internal structure information from unlabeled sources; these processes are referred to as global consensus and local consensus, respectively. We then predict the labels of testing objects using the ensemble learning model of multiple information sources. Experimental results show that our method outperforms two baseline methods. Moreover, it is more scalable for large information sources and more robust for labeled sources with noisy data.
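As background for the label propagation step, here is the classic iterative label propagation with clamping on a Gaussian-weighted graph. It is a generic sketch under assumed choices (Gaussian weights with bandwidth `sigma` and a fixed iteration count). It is not this paper's global/local consensus procedure.

```python
import math

def label_propagation(points, labels, n_classes, sigma=1.0, iters=100):
    """Iterative label propagation with clamping (generic form, for illustration).

    labels maps point index -> class id; returns a predicted class for every point.
    """
    n = len(points)
    # Gaussian edge weights between distinct points.
    W = [[0.0 if i == j else math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    # Row-normalize into a transition matrix P = D^-1 W.
    P = []
    for row in W:
        s = sum(row)  # assumes every point has at least one non-negligible neighbor
        P.append([w / s for w in row])
    # One-hot rows for labeled points, zero rows for unlabeled ones.
    F = [[1.0 if labels.get(i) == c else 0.0 for c in range(n_classes)] for i in range(n)]
    for _ in range(iters):
        F = [[sum(P[i][k] * F[k][c] for k in range(n)) for c in range(n_classes)]
             for i in range(n)]
        for i, y in labels.items():  # clamp labeled points back to their known class
            F[i] = [1.0 if c == y else 0.0 for c in range(n_classes)]
    return [max(range(n_classes), key=row.__getitem__) for row in F]
```

Clamping keeps the known labels fixed while class mass diffuses along strong edges. Unlabeled points therefore inherit the label of the labeled points they are best connected to.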

13.

Logistic label propagation

Takumi Kobayashi Kenji Watanabe Nobuyuki Otsu 《Pattern recognition letters》2012,33(5):580-588

In this paper, we propose a novel method for semi-supervised learning called logistic label propagation (LLP). The proposed method employs the logistic function to classify input pattern vectors, similarly to logistic regression. To handle unlabeled as well as labeled samples in the semi-supervised learning framework, the logistic functions are learned using similarities between samples, in a manner similar to label propagation. In the proposed method, logistic regression and label propagation are effectively combined in terms of posterior probabilities. LLP estimates the labels of input samples using the learned logistic function, whereas label propagation has to re-optimize all the labels whenever a new input sample arrives. In addition, we suggest a way to provide proper parameter settings and initialization, which frees users from determining parameter values by trial and error. In classification experiments (label estimation) in the semi-supervised learning framework, the proposed method performs favorably compared with other methods.

14.

Semi-supervised learning with density-ratio estimation

Masanori Kawakita Takafumi Kanamori 《Machine Learning》2013,91(2):189-209

In this paper we study the statistical properties of semi-supervised learning, which is considered an important problem in machine learning. In standard supervised learning, only labeled data are observed, and classification and regression problems are formalized as supervised learning. In semi-supervised learning, by contrast, unlabeled data are also obtained in addition to labeled data. Hence, the ability to exploit unlabeled data is important for improving prediction accuracy in semi-supervised learning. This problem is regarded as a semiparametric estimation problem with missing data. Under discriminative probabilistic models, unlabeled data were considered useless for improving estimation accuracy. Recently, however, a weighted estimator using unlabeled data has been found to achieve better prediction accuracy than learning methods that use only labeled data, especially when the discriminative probabilistic model is misspecified. That is, improvement under the semiparametric model with missing data is possible when the semiparametric model is misspecified. In this paper, we apply a density-ratio estimator to obtain the weight function in semi-supervised learning. Our approach is advantageous because the proposed estimator does not require well-specified probabilistic models for the probability of the unlabeled data. Based on statistical asymptotic theory, we prove that our method achieves better estimation accuracy than supervised learning using only labeled data. Numerical experiments demonstrate the usefulness of our method.

15.

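A Semi-Supervised Support Vector Machine Method for Class Proportion Shift

李远肇 王少博 李宇峰 《Pattern Recognition and Artificial Intelligence》2016,29(7):625-632

Semi-supervised support vector machines perform poorly when the class proportions of the unlabeled data shift substantially from those of the labeled data. To address this, the paper proposes a semi-supervised support vector machine method for class proportion shift. The method first estimates the class centers of the unlabeled data. It then forms a worst-case ensemble over the class centers obtained under multiple class proportions, which strengthens the performance guarantee of the semi-supervised support vector machine. Experiments show that the method effectively improves the performance guarantee of semi-supervised support vector machines under class proportion shift.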

16.

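An Image Hashing Retrieval Method Based on Deep Self-Taught Learning

欧新宇 伍嘉 朱恒 李佶 《Computer Engineering & Science》2015,37(12):2386-2392

Convolutional neural networks trained with supervised learning have proven to have strong feature learning ability in image recognition tasks. However, image retrieval with supervised deep learning requires large amounts of annotated data; otherwise, overfitting easily occurs. To solve this problem, a novel image hashing retrieval method based on deep self-taught learning is proposed. First, an unsupervised autoencoder network learns a discriminative feature representation function. This reduces the complexity of learning and removes the need for semantically annotated training images, forcing the algorithm to learn more robust features from large amounts of unannotated data. Second, to speed up retrieval, the traditional Euclidean-distance similarity computation is abandoned in favor of a perceptual hashing algorithm for measuring similarity. Combining the two techniques yields better feature representations together with faster retrieval. Experimental results show that the proposed method outperforms several state-of-the-art image retrieval methods.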
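The retrieval step can be illustrated with an average-hash style binarization followed by Hamming-distance ranking, which replaces Euclidean distance with compact binary codes. This is an assumed simplification. The feature vectors here are hypothetical stand-ins for autoencoder outputs, and the paper's exact perceptual hashing procedure is not reproduced.

```python
def binary_code(features):
    """Average-hash style code: bit i is 1 where feature i exceeds the vector's mean."""
    mean = sum(features) / len(features)
    return sum(1 << i for i, f in enumerate(features) if f > mean)

def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")

def retrieve(query, database, top_k=1):
    """Rank database items by Hamming distance between their codes and the query's code."""
    q = binary_code(query)
    codes = [binary_code(x) for x in database]
    order = sorted(range(len(database)), key=lambda i: hamming(q, codes[i]))
    return order[:top_k]
```

Comparing codes takes one XOR and a bit count per item, which is why hashing speeds up retrieval compared with floating-point distance computations over full feature vectors.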

17.

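Label Distribution Learning Considering Collaboration Between Labels

李睿钰 祝继华 刘新媛 《Journal of Software》2022,33(2):539-554

In recent years, label distribution learning (LDL), a new supervised learning paradigm, has been applied in many fields, and it has achieved some performance gains on related tasks in these fields. Examples include facial age estimation, head pose estimation, movie rating prediction, and crowd counting in public video surveillance. Recently, many LDL algorithms have taken the correlation between labels into account, but most existing methods treat label correlation as...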

18.

Exploitation of unlabeled sequences in hidden Markov models

Inoue M. Ueda N. 《IEEE transactions on pattern analysis and machine intelligence》2003,25(12):1570-1581

This paper presents a method for effectively using unlabeled sequential data in the learning of hidden Markov models (HMMs). In the conventional approach, class labels for unlabeled data are assigned deterministically by HMMs learned from labeled data. Such labeling often becomes unreliable when few labeled data are available. We propose an extended Baum-Welch (EBW) algorithm in which labeling is performed probabilistically and iteratively so that the likelihoods of both the labeled and the unlabeled data improve. Unlike the conventional approach, the EBW algorithm guarantees convergence to a local maximum of the likelihood. Experimental results on gesture and speech data show that, when labeled training data are scarce, the EBW algorithm uses unlabeled data to improve the classification performance of HMMs more robustly than the conventional naive labeling (NL) approach.
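The difference between deterministic (naive) labeling and probabilistic labeling can be shown with per-class HMM likelihoods computed by the forward algorithm. This sketch covers only the labeling step, under assumed discrete-emission HMMs. It is not the full extended Baum-Welch re-estimation, and the function names are hypothetical.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | HMM) via the forward algorithm for a discrete-emission HMM.

    start[s], trans[r][s] and emit[s][o] are probabilities. No scaling is applied,
    so long sequences would need scaled or log-space recursions in practice.
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)

def soft_class_weights(obs, class_hmms, priors):
    """Probabilistic labeling: posterior P(class | obs) over the per-class HMMs."""
    joint = [p * forward_likelihood(obs, *hmm) for p, hmm in zip(priors, class_hmms)]
    total = sum(joint)
    return [j / total for j in joint]

def naive_label(obs, class_hmms, priors):
    """Deterministic labeling: commit to the single most probable class."""
    weights = soft_class_weights(obs, class_hmms, priors)
    return max(range(len(weights)), key=weights.__getitem__)
```

Naive labeling commits each unlabeled sequence to the single class returned by `naive_label`. Probabilistic labeling instead lets the sequence contribute to every class's re-estimation in proportion to `soft_class_weights`, which limits the damage when the initial HMMs are unreliable.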

19.

Learning model order from labeled and unlabeled data for partially supervised classification, with application to word sense disambiguation

Zheng-Yu Niu Dong-Hong Ji Chew Lim Tan 《Computer Speech and Language》2007,21(4):609-619

Previous partially supervised classification methods can partition unlabeled data into positive and negative examples for a given class by learning from positive labeled examples and unlabeled examples. However, they cannot further group the negative examples into meaningful clusters, even when the negative examples contain many different classes. Here we propose an automatic method to obtain a natural partitioning of mixed data (labeled data + unlabeled data). The method maximizes a stability criterion, defined on classification results from an extended label propagation algorithm, over all possible values of the model order (the number of classes) in the mixed data. Our experimental results on benchmark corpora for the word sense disambiguation task cover the case in which labeled data are incomplete. In that setting, this model order identification algorithm, with the extended label propagation algorithm as the base classifier, outperforms three alternatives: SVM, a one-class partially supervised classification algorithm, and the model order identification algorithm with semi-supervised k-means clustering as the base classifier.

20.

Semi-supervised change detection using modified self-organizing feature map neural network

《Applied Soft Computing》2014

In the present article, semi-supervised learning is integrated with an unsupervised, context-sensitive change detection technique based on a modified self-organizing feature map (MSOFM) network. In the proposed methodology, the MSOFM network is initially trained using only a few labeled patterns. Thereafter, the membership values of each unlabeled pattern in both classes are determined using fuzzy set theory. The soft class label of each unlabeled pattern is then estimated from the membership values of its K nearest neighbors. The network is then trained iteratively using the unlabeled patterns along with the few labeled patterns. A heuristic method is suggested for selecting some of the unlabeled patterns for training. To check the effectiveness of the proposed methodology, experiments are conducted on three multi-temporal, multi-spectral data sets. The performance of the proposed method is compared with that of two unsupervised techniques, a supervised technique, and two semi-supervised techniques. Results are also statistically validated using a paired t-test. The proposed method produced promising results.
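The soft-labeling step can be sketched as follows, assuming fuzzy-c-means-style memberships (fuzzifier `m = 2`) computed from two class centers. The abstract does not state the exact membership function, the center coordinates in any usage are hypothetical, and the MSOFM network itself is not modeled here.

```python
import math

def memberships(x, centers, m=2.0):
    """Fuzzy-c-means-style membership of x in each class given class centers (sums to 1)."""
    d = [max(math.dist(x, c), 1e-12) for c in centers]  # guard against zero distance
    return [1.0 / sum((d[i] / d[j]) ** (2 / (m - 1)) for j in range(len(d)))
            for i in range(len(d))]

def soft_label(idx, points, centers, k):
    """Soft class label of points[idx]: mean membership of its k nearest other points."""
    x = points[idx]
    nbrs = sorted((j for j in range(len(points)) if j != idx),
                  key=lambda j: math.dist(x, points[j]))[:k]
    ms = [memberships(points[j], centers) for j in nbrs]
    return [sum(col) / len(ms) for col in zip(*ms)]
```

Averaging over the neighbors rather than using the pattern's own membership adds spatial context. A pattern surrounded by confidently "changed" neighbors leans toward that class even if its own value is ambiguous.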