Found 20 similar documents (search time: 0 ms)
1.
Traditional supervised classifiers use only labeled data (feature/label pairs) as the training set, while the unlabeled data is used as the testing set. In practice, labeled data is often hard to obtain, and the unlabeled data contains instances that belong to the predefined class but not to the labeled data categories. This problem has been widely studied in recent years, and semi-supervised PU learning is an efficient way to learn from positive and unlabeled examples. Among the semi-supervised PU learning methods, it is hard to choose a single approach that fits every unlabeled data distribution. In this paper, a new framework is designed to integrate different semi-supervised PU learning algorithms in order to take advantage of existing methods. In essence, we propose an automatic KL-divergence learning method that utilizes knowledge of the unlabeled data distribution. The experimental results show that (1) data distribution information is very helpful for semi-supervised PU learning; (2) the proposed framework achieves higher precision than the state-of-the-art method.
2.
In this paper, we propose a general graph-based semi-supervised learning algorithm. The core idea of our algorithm is to not only achieve the goal of semi-supervised learning, but also to discover the latent novel class in the data, which may be unlabeled by the user. Based on the normalized weights evaluated on data graph, our algorithm is able to output the probabilities of data points belonging to the labeled classes or the novel class. We also give the theoretical interpretations for the algorithm from three viewpoints on graph, i.e., regularization framework, label propagation, and Markov random walks. Experiments on toy examples and several benchmark datasets illustrate the effectiveness of our algorithm.
3.
Traditional pattern recognition generally involves two tasks: unsupervised clustering and supervised classification. When class information is available, fusing the advantages of both clustering learning and classification learning into a single framework is an important problem worthy of study. To date, most algorithms generally treat clustering learning and classification learning in a sequential or two-step manner, i.e., first execute clustering learning to explore structures in data, and then perform classification learning on top of the obtained structural information. However, such sequential algorithms cannot always guarantee the simultaneous optimality for both clustering and classification learning. In fact, the clustering learning in these algorithms just aids the subsequent classification learning and does not benefit from the latter. To overcome this problem, a simultaneous learning framework for clustering and classification (SCC) is presented in this paper. SCC aims to achieve three goals: (1) acquiring the robust classification and clustering simultaneously; (2) designing an effective and transparent classification mechanism; (3) revealing the underlying relationship between clusters and classes. To this end, with the Bayesian theory and the cluster posterior probabilities of classes, we define a single objective function to which the clustering process is directly embedded. By optimizing this objective function, the effective and robust clustering and classification results are achieved simultaneously. Experimental results on both synthetic and real-life datasets show that SCC achieves promising classification and clustering results at one time.
4.
Multimedia Tools and Applications - With the enormous growth in the number of images on the web, image clustering has become an essential part of any image retrieval system. Since web images are...
5.
Currently, high dimensional data processing confronts two main difficulties: inefficient similarity measure and high computational complexity in both time and memory space. Common methods to deal with these two difficulties are based on dimensionality reduction and feature selection. In this paper, we present a different way to solve high dimensional data problems by combining the ideas of Random Forests and Anchor Graph semi-supervised learning. We randomly select a subset of features and use the Anchor Graph method to construct a graph. This process is repeated many times to obtain multiple graphs, a process which can be implemented in parallel to ensure runtime efficiency. Then the multiple graphs vote to determine the labels for the unlabeled data. We argue that the randomness can be viewed as a kind of regularization. We evaluate the proposed method on eight real-world data sets by comparing it with two traditional graph-based methods and one state-of-the-art semi-supervised learning method based on Anchor Graph to show its effectiveness. We also apply the proposed method to the subject of face recognition.
6.
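With the exponential growth of online text data, manual classification and management of information is gradually being replaced by automatic computer classification, and years of research and development in related fields have produced some relatively mature algorithms. Analysis shows that the segmentation of ambiguous passages during text preprocessing has always been an important factor affecting classification accuracy and remains not fully solved. Drawing on mutual information theory, this paper proposes an iterative framework based on background learning and, on that basis, improves the traditional naive Bayes text classification algorithm by preprocessing the word-segmentation data. The proposed iterative framework is evaluated experimentally on data from different categories of Sina.com (新浪网), and the results show that the background-learning-based iterative text classification framework is feasible and effective.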
7.
In this paper, a classification framework for incomplete data, based on an electrostatic field model, is proposed. An original approach to exploiting incomplete training data with missing features, involving extensive use of the electrostatic charge analogy, has been used. The framework supports a hybrid supervised and unsupervised training scenario, enabling learning simultaneously from both labelled and unlabelled data using the same set of rules and adaptation mechanisms. Classification of incomplete patterns has been facilitated by introducing a local dimensionality reduction technique, which aims at exploiting all available information using the data ‘as is’, rather than trying to estimate the missing values. The performance of all proposed methods has been extensively tested in a wide range of missing data scenarios, using a number of standard benchmark datasets in order to make the results comparable with those available in current and future literature. Several modifications to the original Electrostatic Field Classifier aiming at improving speed and robustness in higher dimensional spaces have also been introduced and discussed.
8.
In practice, many applications require a dimensionality reduction method to deal with the partially labeled problem. In this paper, we propose a semi-supervised dimensionality reduction framework, which can efficiently handle the unlabeled data. Under the framework, several classical methods, such as principal component analysis (PCA), linear discriminant analysis (LDA), maximum margin criterion (MMC), locality preserving projections (LPP) and their corresponding kernel versions can be seen as special cases. For high-dimensional data, we can give a low-dimensional embedding result for both discriminating multi-class sub-manifolds and preserving local manifold structure. Experiments show that our algorithms can significantly improve the accuracy rates of the corresponding supervised and unsupervised approaches.
9.
Handwriting analysis is a systematic study of preserved graphic structures, which are generated in the human brain and produced on paper in cursive or printed style. The style in which a text is written reflects an array of meta-information. Personality is a combination of an individual’s behavior, emotion, motivation, and thought-pattern characteristics. It has an impact on one’s life choices, well-being, health, and numerous other preferences. This study investigates the correlation between handwriting features and personality characteristics. Predicting personality through handwriting analysis requires investigating the style and structure of the writing. This study extracts eleven features from handwriting samples using a graph-based writing representation approach. The Big Five model of personality traits is used to determine the personality of the writer. To improve classification accuracy, the study uses a Semi-supervised Generative Adversarial Network (SGAN). This network uses a small amount of labeled data and a larger amount of unlabeled data to train the classifier. The discriminator works as a multi-class classifier and is trained on labeled, unlabeled, and generator-created data. The proposed system achieves 91.3% correct personality predictions using the writing features of 173 participants.
10.
Recent years have witnessed a surge of interest in graph-based semi-supervised learning. However, two of the major problems in graph-based semi-supervised learning are: (1) how to set the hyperparameter in the Gaussian similarity; and (2) how to make the algorithm scalable. In this article, we introduce a general framework for graph-based learning. First, we propose a method called linear neighborhood propagation, which can automatically construct the optimal graph. Then we introduce a novel multilevel scheme to make our algorithm scalable for large data sets. The applications of our algorithm to various real-world problems are also demonstrated.
13.
We investigate the issue of graph-based semi-supervised learning (SSL). The labeled and unlabeled data points are represented as vertices in an undirected weighted neighborhood graph, with the edge weights encoding the pairwise similarities between data objects in the same neighborhood. The SSL problem can then be formulated as a regularization problem on this graph. In this paper we propose a robust self-tuning graph-based SSL method, which (1) can determine the similarities between pairwise data points automatically; (2) is not sensitive to outliers. Promising experimental results are given for both synthetic and real data sets.
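The graph formulation described in this abstract can be illustrated with a minimal sketch. It is not the robust self-tuning method of the paper: it assumes a fixed Gaussian similarity with a hand-picked `sigma`, a fully connected graph instead of a neighborhood graph, and plain iterative label propagation with the labeled vertices clamped. The function names and the toy data are hypothetical.

```python
import math

def gaussian_weight(x, y, sigma=1.0):
    # Edge weight encoding the pairwise similarity of two data points.
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def propagate_labels(points, labels, sigma=1.0, iterations=200):
    """Propagate binary labels over a fully connected weighted graph.

    points: list of feature tuples (the graph vertices).
    labels: dict mapping vertex index -> 0 or 1 for the labeled vertices.
    Returns one score in [0, 1] per vertex.
    """
    n = len(points)
    w = [[0.0 if i == j else gaussian_weight(points[i], points[j], sigma)
          for j in range(n)] for i in range(n)]
    # Labeled vertices start at their label, unlabeled ones at 0.5.
    f = [float(labels.get(i, 0.5)) for i in range(n)]
    for _ in range(iterations):
        new_f = f[:]
        for i in range(n):
            if i in labels:
                continue  # labeled vertices stay clamped
            # Each unlabeled vertex takes the weighted mean of its neighbors.
            new_f[i] = sum(w[i][j] * f[j] for j in range(n)) / sum(w[i])
        f = new_f
    return f

# Toy data: two well-separated groups, one labeled point in each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (3.0, 3.0), (3.2, 2.9), (2.9, 3.1)]
labels = {0: 0, 3: 1}
scores = propagate_labels(points, labels)
predicted = [1 if s > 0.5 else 0 for s in scores]
print(predicted)
```

In this sketch the result depends heavily on `sigma`, which is the kind of manual similarity setting the abstract says its method determines automatically.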
14.
A particle swarm optimization based simultaneous learning framework for clustering and classification (PSOSLCC) is proposed in this paper. First, an improved particle swarm optimization (PSO) is used to partition the training samples. Because the number of clusters must be given in advance, an automatic clustering algorithm, rather than trial and error, is adopted to find the proper number of clusters, and a set of cluster centers is obtained to form the classification mechanism. Second, in order to exploit more useful local information and obtain a better optimization result, a global factor is introduced into the update strategy of the particles in PSO. PSOSLCC has been extensively compared with the fuzzy relational classifier (FRC), vector quantization and learning vector quantization (VQ+LVQ3), the radial basis function neural network (RBFNN), and a simultaneous learning framework for clustering and classification (SCC) over several real-life datasets. The experimental results indicate that the proposed algorithm not only greatly reduces the time complexity, but also obtains better classification accuracy for most datasets used in this paper. Moreover, PSOSLCC is applied to a real-world application, namely texture image segmentation, with good performance, which shows that the proposed algorithm has the potential to handle large-scale classification problems.
15.
In this paper, we propose a new semi-supervised co-clustering algorithm, Orthogonal Semi-Supervised Nonnegative Matrix Factorization (OSS-NMF), for document clustering. In this new approach, the clustering process is carried out by incorporating both prior domain knowledge of data points (documents) in the form of pair-wise constraints and category knowledge of features (words) into the NMF co-clustering framework. Under this framework, the clustering problem is formulated as the problem of finding a local minimizer of the objective function, taking into account the dual prior knowledge. The update rules are derived, and an iterative algorithm is designed for the co-clustering process. Theoretically, we prove the correctness and convergence of our algorithm and demonstrate its mathematical rigor. Our experimental evaluations show that the proposed document clustering model presents remarkable performance improvements with those constraints.
16.
Pattern Analysis and Applications - In manifold learning, the intrinsic geometry of the manifold is explored and preserved by identifying the optimal local neighborhood around each observation. It...
17.
Multimedia Tools and Applications - In this era of technology, digital images turn out to be ubiquitous in a contemporary society and they can be generated and manipulated by a wide variety of...
18.
Multimedia Tools and Applications - Graph-based semi-supervised learning has received considerable attention in machine learning community. The performance of existing methods highly depends on the...
19.
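Link prediction is a key problem in social network analysis; it studies how to predict new links that may exist from a known network. Real networks contain a large number of unconnected node pairs, and mining latent information from them can help with the link prediction task. This paper treats link prediction as a binary classification problem and uses semi-supervised learning techniques to exploit the unlabeled data in the network. Two semi-supervised paradigms are used: self-training and co-training. Experimental results on the real-world datasets Enron and DBLP show that using unlabeled data in link prediction can effectively improve prediction accuracy.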
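The self-training paradigm named in this abstract can be sketched as follows. This is an illustrative sketch only, not the paper's implementation: the nearest-centroid base classifier, the confidence `margin`, and the two-dimensional node-pair features are all assumptions made for the example.

```python
def centroid(rows):
    # Mean feature vector of a non-empty list of feature tuples.
    dim = len(rows[0])
    return [sum(r[k] for r in rows) / len(rows) for k in range(dim)]

def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def class_centroids(labeled):
    c0 = centroid([x for x, y in labeled if y == 0])
    c1 = centroid([x for x, y in labeled if y == 1])
    return c0, c1

def self_train(labeled, unlabeled, rounds=5, margin=1.0):
    """Self-training with a nearest-centroid base classifier (a stand-in).

    labeled: list of (features, label) with label 1 = link, 0 = no link.
    unlabeled: list of feature tuples for unlabeled node pairs.
    Returns a predict(features) -> 0 or 1 function.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        c0, c1 = class_centroids(labeled)
        confident, remaining = [], []
        for x in pool:
            gap = sq_dist(x, c0) - sq_dist(x, c1)  # > 0: closer to class 1
            if abs(gap) >= margin:
                # Confident prediction: add it as a pseudo-labeled example.
                confident.append((x, 1 if gap > 0 else 0))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = remaining
    c0, c1 = class_centroids(labeled)
    return lambda x: 1 if sq_dist(x, c1) < sq_dist(x, c0) else 0

# Hypothetical node-pair features, e.g. two similarity scores per pair.
labeled = [((0.0, 1.0), 0), ((5.0, 4.0), 1)]
unlabeled = [(0.5, 0.5), (1.0, 1.5), (4.5, 4.5), (5.5, 3.5), (2.6, 2.4)]
predict = self_train(labeled, unlabeled)
print(predict((0.2, 0.8)), predict((4.8, 4.2)))
```

Co-training, the second paradigm mentioned, would instead train two classifiers on different feature views and let each pseudo-label examples for the other.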
20.
Semi-supervised learning (SSL) involves the training of a decision rule from both labeled and unlabeled data. In this paper, we propose a novel SSL algorithm based on the multiple clusters per class assumption. The proposed algorithm consists of two stages. In the first stage, we aim to capture the local cluster structure of the training data by using the k-nearest-neighbor (kNN) algorithm to split the data into a number of disjoint subsets. In the second stage, a maximal margin classifier based on the second order cone programming (SOCP) is introduced to learn an inductive decision function from the obtained subsets globally. For linear classification problems, once the kNN algorithm has been performed, the proposed algorithm trains a classifier using only the first and second order moments of the subsets without considering individual data points. Since the number of subsets is usually much smaller than the number of training points, the proposed algorithm is efficient for handling big data sets with a large amount of unlabeled data. Despite its simplicity, the classification performance of the proposed algorithm is guaranteed by the maximal margin classifier. We demonstrate the efficiency and effectiveness of the proposed algorithm on both synthetic and real-world data sets.