Similar literature
 Found 20 similar documents; search took 15 ms
1.
Video Annotation Based on Kernel Linear Neighborhood Propagation   Total citations: 1, self-citations: 0, citations by others: 1
The insufficiency of labeled training data for representing the distribution of the entire dataset is a major obstacle in automatic semantic annotation of large-scale video databases. Semi-supervised learning algorithms, which attempt to learn from both labeled and unlabeled data, are promising for solving this problem. In this paper, a novel graph-based semi-supervised learning method named kernel linear neighborhood propagation (KLNP) is proposed and applied to video annotation. This approach combines the consistency assumption, which is the basic assumption in semi-supervised learning, and the local linear embedding (LLE) method in a nonlinear kernel-mapped space. KLNP improves on the recently proposed linear neighborhood propagation (LNP) method by tackling the limitation of its local linear assumption on the distribution of semantics. Experiments conducted on the TRECVID data set demonstrate that this approach outperforms other popular graph-based semi-supervised learning methods for video semantic annotation.

2.
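Gaussian Process Classification with Semi-Supervised Kernels   Total citations: 1, self-citations: 0, citations by others: 1
A semi-supervised algorithm is proposed for learning Gaussian process classifiers; it supplies information from unlabeled data to the classifier by incorporating a non-parametric semi-supervised kernel. The algorithm comprises the following steps: 1) a kernel matrix that combines information from labeled and unlabeled data is obtained through the spectral decomposition of the graph Laplacian; 2) convex optimization is used to learn the optimal weights of the kernel matrix's eigenvectors, yielding a non-parametric semi-supervised kernel; 3) the semi-supervised kernel is integrated into the Gaussian process model to build the proposed semi-supervised learning algorithm. The main feature of the algorithm is that it applies a non-parametric semi-supervised kernel built on the entire data set to the Gaussian process model, which has an explicit probabilistic description, can conveniently model the uncertainty among data, and can handle complex inference problems. Experimental results show that the algorithm is more reliable than other methods.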

3.
Recently, integrating new knowledge sources such as pairwise constraints into various classification tasks with insufficient training data has been actively studied in machine learning. In this paper, we propose a novel semi-supervised classification approach, called semi-supervised classification with enhanced spectral kernel, which can simultaneously handle both sparse labeled data and additional pairwise constraints together with unlabeled data. Specifically, we first design a non-parametric spectral kernel learning model based on the squared loss function. Then we develop an efficient semi-supervised classification algorithm which takes advantage of Laplacian spectral regularization: semi-supervised classification with enhanced spectral kernel under the squared loss (ESKS). Finally, we conduct many experiments on a variety of synthetic and real-world data sets to demonstrate the effectiveness of the proposed ESKS algorithm.

4.
Composite kernels for semi-supervised clustering   Total citations: 3, self-citations: 2, citations by others: 1
A critical problem related to kernel-based methods is how to select optimal kernels. A kernel function must conform to the learning target in order to obtain meaningful results. While solutions to the problem of estimating optimal kernel functions and corresponding parameters have been proposed in a supervised setting, it remains a challenge when no labeled data are available, and all we have is a set of pairwise must-link and cannot-link constraints. In this paper, we address the problem of optimizing the kernel function using pairwise constraints for semi-supervised clustering. We propose a new optimization criterion for automatically estimating the optimal parameters of composite Gaussian kernels, directly from the data and given constraints. We combine our proposal with a semi-supervised kernel-based algorithm to demonstrate experimentally the effectiveness of our approach. The results show that our method is very effective for kernel-based semi-supervised clustering.

5.
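Previous semi-supervised multiple-instance learning algorithms usually decompose unlabeled bags into sets of instances and use traditional semi-supervised single-instance learning algorithms to determine the latent labels of those instances so that they can be exploited. However, such methods assume that the classification of multiple-instance samples is closely tied to their probability density distribution, and they do not consider the influence of bag structure on bag labels. A bag-level semi-supervised multiple-instance kernel learning method is proposed that uses unlabeled bags directly to train the semi-supervised learner. First, bags are converted into concept-vector representations by clustering the instance space; next, the Hamming distances between concept vectors are computed; on this basis, a graph Laplacian matrix describing the smoothness of bags is computed, from which the bag-level semi-supervised kernel is derived. Finally, the algorithm is tested on standard multiple-instance learning data sets and image data sets. The tests show a clear improvement.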
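
The Hamming-distance and graph-Laplacian steps in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the binary concept vectors, the `exp(-d / sigma)` affinity, the `sigma` parameter, and the unnormalized Laplacian L = D - W are all assumptions, because the abstract does not say how distances are turned into edge weights.

```python
import math


def hamming(u, v):
    """Number of positions at which two equal-length concept vectors differ."""
    if len(u) != len(v):
        raise ValueError("concept vectors must have the same length")
    return sum(a != b for a, b in zip(u, v))


def graph_laplacian(vectors, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W over a list of concept vectors.

    The affinity W[i][j] = exp(-hamming / sigma) is an assumed choice,
    used here only to make the sketch concrete.
    """
    n = len(vectors)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = math.exp(-hamming(vectors[i], vectors[j]) / sigma)
            W[i][j] = W[j][i] = w
    # Diagonal holds the node degree; off-diagonal entries are -W[i][j].
    return [[sum(W[i]) if i == j else -W[i][j] for j in range(n)]
            for i in range(n)]


# Three hypothetical bags, each already converted to a binary concept vector.
bags = [[1, 0, 1], [1, 1, 0], [1, 0, 1]]
L = graph_laplacian(bags)
```

Each row of `L` sums to zero, which is the property that lets a Laplacian measure how smoothly a labeling varies across similar bags.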

6.
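Sentiment classification is currently an active research problem in natural language processing. This paper focuses on semi-supervised learning for sentiment classification (that is, learning from a small number of labeled samples and a large number of unlabeled samples) and proposes a new semi-supervised learning method based on dynamic random feature subspaces. First, multiple random feature subspaces are generated dynamically; then, based on co-training, unlabeled samples with high confidence are selected in each feature subspace; finally, the selected samples are used to update the training model. Experimental results show that the method clearly outperforms the traditional static generation approach and other existing semi-supervised methods. The paper also explores the question of how many feature subspaces to use.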
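
The subspace-generation step can be illustrated with a small sketch. It assumes that "dynamic" means the random partition of features is redrawn in every co-training round rather than fixed once; the function name, its parameters, and the equal-size disjoint partition are hypothetical choices, not details given in the abstract.

```python
import random


def random_feature_subspaces(num_features, num_subspaces, seed=None):
    """Randomly partition feature indices into disjoint subspaces.

    Calling this again with a different seed in each round gives a fresh
    partition, which is the assumed meaning of "dynamic" here.
    """
    rng = random.Random(seed)
    indices = list(range(num_features))
    rng.shuffle(indices)
    # Deal the shuffled indices out round-robin so sizes differ by at most one.
    return [sorted(indices[i::num_subspaces]) for i in range(num_subspaces)]


# One partition per hypothetical co-training round.
rounds = [random_feature_subspaces(10, 3, seed=r) for r in range(2)]
```

A classifier would then be trained on each subspace, and the high-confidence unlabeled samples chosen in each one would feed the model update described in the abstract.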

7.
Traditional supervised classifiers use only labeled data (feature/label pairs) as the training set, while the unlabeled data is used as the testing set. In practice, it is often the case that the labeled data is hard to obtain and the unlabeled data contains instances that belong to the predefined class but not to the labeled data categories. This problem has been widely studied in recent years, and semi-supervised PU learning is an efficient solution for learning from positive and unlabeled examples. Among all the semi-supervised PU learning methods, it is hard to choose just one approach that fits all unlabeled data distributions. In this paper, a new framework is designed to integrate different semi-supervised PU learning algorithms in order to take advantage of existing methods. In essence, we propose an automatic KL-divergence learning method that utilizes knowledge of the unlabeled data distribution. The experimental results show that (1) data distribution information is very helpful for the semi-supervised PU learning method; (2) the proposed framework can achieve higher precision when compared with the state-of-the-art method.

8.
Graph-based semi-supervised learning approaches have been proven effective and efficient in solving the problem of the insufficiency of labeled training data in many real-world application areas, such as video annotation. As a significant factor in these algorithms, however, the pair-wise similarity metric of samples has not been fully investigated. Specifically, for existing approaches, the estimation of pair-wise similarity between two samples relies on the spatial property of video data. On the other hand, the temporal property, an essential characteristic of video data, is not embedded into the pair-wise similarity measure. Accordingly, in this paper, a novel framework for video annotation, called Joint Spatio-Temporal Correlation Learning (JSTCL), is proposed. This framework is characterized by simultaneously taking into account both the spatial and temporal properties of video data to improve the estimation of pair-wise similarity. We apply the proposed framework to video annotation and report superior performance compared to key existing approaches on the benchmark TRECVID data set.

9.
Semi-supervised learning has attracted a significant amount of attention in pattern recognition and machine learning. Most previous studies have focused on designing special algorithms to effectively exploit the unlabeled data in conjunction with labeled data. Our goal is to improve the classification accuracy of any given supervised learning algorithm by using the available unlabeled examples. We call this the semi-supervised improvement problem, to distinguish the proposed approach from the existing approaches. We design a meta-semi-supervised learning algorithm that wraps around the underlying supervised algorithm and improves its performance using unlabeled data. This problem is particularly important when we need to train a supervised learning algorithm with a limited number of labeled examples and a multitude of unlabeled examples. We present a boosting framework for semi-supervised learning, termed SemiBoost. The key advantages of the proposed semi-supervised learning approach are: 1) performance improvement of any supervised learning algorithm with a multitude of unlabeled data, 2) efficient computation by the iterative boosting algorithm, and 3) exploiting both the manifold and cluster assumptions in training classification models. An empirical study on 16 different data sets and text categorization demonstrates that the proposed framework improves the performance of several commonly used supervised learning algorithms, given a large number of unlabeled examples. We also show that the performance of the proposed algorithm, SemiBoost, is comparable to the state-of-the-art semi-supervised learning algorithms.

10.
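Research on Semi-Supervised Sentiment Classification Based on Ensemble Learning   Total citations: 1, self-citations: 0, citations by others: 1
Sentiment classification is the task of classifying the sentiment category expressed by a text. This paper studies sentiment classification based on semi-supervised learning, that is, using unlabeled samples to improve sentiment classification performance when only a small set of labeled samples is available. To improve semi-supervised learning ability, the paper proposes an ensemble method based on consistent labels that fuses two mainstream semi-supervised sentiment classification methods: co-training based on random feature subspaces, and label propagation. First, the classifiers trained by the two semi-supervised methods label the unlabeled samples; second, the unlabeled samples on which the two labelings agree are selected; finally, the selected samples are used to update the training model. Experimental results show that the method effectively reduces the mislabeling rate on unlabeled samples and thus achieves better classification than either semi-supervised method alone.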

11.
In this paper we study statistical properties of semi-supervised learning, which is considered to be an important problem in the field of machine learning. In standard supervised learning only labeled data is observed, and classification and regression problems are formalized as supervised learning. On the other hand, in semi-supervised learning, unlabeled data is also obtained in addition to labeled data. Hence, the ability to exploit unlabeled data is important to improve prediction accuracy in semi-supervised learning. This problem is regarded as a semiparametric estimation problem with missing data. Under discriminative probabilistic models, it was considered that unlabeled data is useless for improving the estimation accuracy. Recently, it has been shown that the weighted estimator using unlabeled data achieves better prediction accuracy than the learning method using only labeled data, especially when the discriminative probabilistic model is misspecified. That is, improvement under the semiparametric model with missing data is possible when the semiparametric model is misspecified. In this paper, we apply the density-ratio estimator to obtain the weight function in semi-supervised learning. Our approach is advantageous because the proposed estimator does not require well-specified probabilistic models for the probability of the unlabeled data. Based on statistical asymptotic theory, we prove that the estimation accuracy of our method outperforms supervised learning using only labeled data. Numerical experiments demonstrate the usefulness of our methods.

12.
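In semantics-based video retrieval systems, a temporal probabilistic hypergraph model is proposed to bridge the gap between low-level video features and high-level user needs. The model incorporates time-series factors into its construction, and on this basis a video multi-label annotation framework based on the temporal probabilistic hypergraph model (TPH-VMLAF) is proposed. Exploiting the temporal correlation of video, the framework annotates video shots with multiple semantic labels using a multi-label semi-supervised classification algorithm for shots based on the temporal probabilistic hypergraph. The annotation process simultaneously addresses the shortage of labeled video data and the multi-label annotation problem. Experimental results show that the framework improves annotation precision and performs well.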

13.
Graph-based learning provides a useful approach for modeling data in classification problems. In this modeling scenario, the relationship between labeled and unlabeled data impacts the construction and performance of classifiers, and therefore a semi-supervised learning framework is adopted. We propose a graph classifier based on kernel smoothing. A regularization framework is also introduced, and it is shown that the proposed classifier optimizes certain loss functions. Its performance is assessed on several synthetic and real benchmark data sets with good results, especially in settings where only a small fraction of the data are labeled.

14.
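To address the problem that graph-based semi-supervised learning methods ignore video correlation in multimedia research applications, a video annotation algorithm based on correlative kernel-mapped linear neighborhood propagation is proposed. The algorithm first uses a kernel function to compute the iterative label propagation coefficients from distances adjusted by semi-supervised learning; next, it uses the propagation coefficients to obtain samples representing the low-level feature space and builds an association table between semantic concepts by modeling video correlation; finally, it constructs the neighborhood graph and iteratively propagates the information from labeled videos to unlabeled videos to complete the annotation. Experimental results show that the algorithm not only improves video annotation accuracy but also compensates for the shortage of labeled video data.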
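
The final step, iteratively propagating labels from annotated to unannotated samples over a neighborhood graph, can be illustrated with a generic propagation loop. This is only a sketch of the common update F <- alpha * W * F + (1 - alpha) * Y used by graph-based methods of this family; it is not the algorithm from the abstract, and the weight matrix `W`, the mixing factor `alpha`, and the iteration count are assumptions chosen for the example.

```python
def propagate_labels(W, Y, alpha=0.9, iterations=50):
    """Iterate F <- alpha * W * F + (1 - alpha) * Y on plain nested lists.

    W: n x n row-normalized neighbor weights (assumed given).
    Y: n x c initial label matrix; rows of unlabeled samples are all zeros.
    """
    n, c = len(Y), len(Y[0])
    F = [row[:] for row in Y]
    for _ in range(iterations):
        F = [[alpha * sum(W[i][k] * F[k][j] for k in range(n))
              + (1 - alpha) * Y[i][j]
              for j in range(c)]
             for i in range(n)]
    return F


# Hypothetical 3-node chain: node 0 has class 0, node 2 has class 1,
# and node 1 is unlabeled and sits between them.
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
Y = [[1.0, 0.0],
     [0.0, 0.0],
     [0.0, 1.0]]
F = propagate_labels(W, Y)
```

In this toy graph the two labeled nodes keep their own class as the larger score, while the middle node receives equal scores for both classes because it is equally close to each labeled neighbor.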

15.
Confronted with the explosive growth of web images, web image annotation has become a critical research issue for image search and indexing. Sparse feature selection plays an important role in improving the efficiency and performance of web image annotation. Meanwhile, it is beneficial to develop an effective mechanism to leverage the unlabeled training data for large-scale web image annotation. In this paper we propose a novel sparse feature selection framework for web image annotation, namely sparse Feature Selection based on Graph Laplacian (FSLG). FSLG applies the l2,1/2-matrix norm in the sparse feature selection algorithm to select the most sparse and discriminative features. Additionally, graph Laplacian based semi-supervised learning is used to exploit both labeled and unlabeled data to enhance the annotation performance. An efficient iterative algorithm is designed to optimize the objective function. Extensive experiments on two web image datasets are performed, and the results illustrate that our method is promising for large-scale web image annotation.

16.
Face Annotation Using Transductive Kernel Fisher Discriminant   Total citations: 1, self-citations: 0, citations by others: 1
Face annotation in images and videos has many potential applications in multimedia information retrieval. Face annotation usually requires a large amount of hand-labeled training data in order to build effective classifiers. This is particularly challenging when annotating faces in large-scale collections of media data, for which the labeling effort would be very expensive. As a result, traditional supervised face annotation methods often suffer from insufficient training data. To attack this challenge, in this paper, we propose a novel Transductive Kernel Fisher Discriminant (TKFD) scheme for face annotation, which outperforms traditional supervised annotation methods when little training data is available. The main idea of our approach is to solve the Fisher's discriminant using deformed kernels incorporating the information of both labeled and unlabeled data. To evaluate the effectiveness of our method, we have conducted extensive experiments on three types of multimedia testbeds: the FRGC benchmark face dataset, the Yahoo! web image collection, and the TRECVID video data collection. The experimental results show that our TKFD algorithm is more effective than traditional supervised approaches, especially when there is very little training data.

17.
The multiple kernel learning (MKL) approach has been proposed for kernel methods and has shown high performance in some real-world applications. It consists of learning the optimal kernel from one layer of multiple predefined kernels. Unfortunately, this approach is not rich enough to solve relatively complex problems. With the emergence and success of the deep learning concept, multilayer multiple kernel learning (MLMKL) methods were inspired by the idea of deep architecture. They were introduced in order to improve on conventional MKL methods. Such architectures tend to learn deep kernel machines by exploring the combinations of multiple kernels in a multilayer structure. However, existing MLMKL methods often have trouble optimizing the network for two or more layers. Additionally, they do not always outperform the simplest method of combining multiple kernels (i.e., MKL). In order to improve the effectiveness of MKL approaches, we introduce, in this paper, a novel backpropagation MLMKL framework. Specifically, we propose to optimize the network with an adaptive backpropagation algorithm. We use the gradient ascent method instead of the dual objective function or the estimation of the leave-one-out error. We test our proposed method through a large set of experiments on a variety of benchmark data sets. We have successfully optimized the system over many layers. Empirical results over an extensive set of experiments show that our algorithm achieves high performance compared to the traditional MKL approach and existing MLMKL methods.

18.
Manifold regularization (MR) based semi-supervised learning can explore structural relationships from both labeled and unlabeled data. However, the model selection of MR seriously affects its predictive performance due to the inherent additional geometry regularizer of labeled and unlabeled data. In this paper, two continuous and two inherently discrete hyperparameters are selected as optimization variables, and a leave-one-out cross-validation (LOOCV) based Predicted REsidual Sum of Squares (PRESS) criterion is first presented for model selection of MR to choose appropriate regularization coefficients and kernel parameters. Considering the inherent discontinuity of the two discrete hyperparameters, the minimization process is implemented by using an improved Nelder-Mead simplex algorithm to solve over the hybrid set of discrete and continuous variables. The manifold regularization and model selection algorithm are applied to six synthetic and real-life benchmark datasets. The proposed approach, leveraged by effectively exploiting the embedded intrinsic geometric manifolds and unbiased LOOCV estimation, outperforms the original MR and supervised learning approaches in the empirical study.

19.
Online social video websites such as YouTube allow users to manually annotate their video documents with textual labels. These labels can be used as indexing keywords to facilitate search and organization of video data. However, manual video annotation is usually a labor-intensive and time-consuming process. In this work, we propose a novel social video annotation approach that combines multiple feature sets based on a tri-adaptation approach. The shots in each video are annotated by aggregating models that are learned from three complementary feature sets. Meanwhile, the models are collaboratively adapted by exploring unlabeled shots. In this sense, the method can be viewed as a novel semi-supervised algorithm that explores three complementary views. Our approach also exploits the temporal smoothness of video labels by applying a label correction strategy. Experiments on a web video dataset demonstrate the effectiveness of the proposed approach.

20.
Tri-training: exploiting unlabeled data using three classifiers   Total citations: 24, self-citations: 0, citations by others: 24
In many practical data mining applications, such as Web page classification, unlabeled training examples are readily available, but labeled ones are fairly expensive to obtain. Therefore, semi-supervised learning algorithms such as co-training have attracted much attention. In this paper, a new co-training style semi-supervised learning algorithm, named tri-training, is proposed. This algorithm generates three classifiers from the original labeled example set. These classifiers are then refined using unlabeled examples in the tri-training process. In detail, in each round of tri-training, an unlabeled example is labeled for a classifier if the other two classifiers agree on the labeling, under certain conditions. Since tri-training neither requires the instance space to be described with sufficient and redundant views nor does it put any constraints on the supervised learning algorithm, its applicability is broader than that of previous co-training style algorithms. Experiments on UCI data sets and application to the Web page classification task indicate that tri-training can effectively exploit unlabeled data to enhance the learning performance.
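
The agreement rule stated in this abstract can be sketched in a few lines. The sketch covers only the "other two classifiers agree" test; the additional "certain conditions" mentioned in the abstract are not specified there and are deliberately left out. The function name and the list-of-predictions input format are hypothetical.

```python
def pseudo_labels_for(target, predictions):
    """Pick unlabeled examples to add to one classifier's training set.

    predictions: three equal-length lists, one per classifier, holding each
    classifier's predicted label for every unlabeled example.
    Returns {example_index: label} for examples on which the two classifiers
    other than `target` agree.
    """
    j, k = [m for m in range(3) if m != target]
    return {i: a
            for i, (a, b) in enumerate(zip(predictions[j], predictions[k]))
            if a == b}


# Hypothetical predictions of three classifiers on three unlabeled examples.
predictions = [
    ["a", "b", "a"],  # classifier 0
    ["a", "a", "b"],  # classifier 1
    ["a", "b", "b"],  # classifier 2
]
for_first = pseudo_labels_for(0, predictions)   # uses classifiers 1 and 2
for_second = pseudo_labels_for(1, predictions)  # uses classifiers 0 and 2
```

Each classifier would then be retrained on the original labeled set plus its own pseudo-labeled examples, and the round repeated until no classifier changes.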
