首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Recent advances in ? 1 optimization for imaging problems provide promising tools to solve the fundamental high-dimensional data classification in machine learning. In this paper, we extend the main result of Szlam and Bresson (Proceedings of the 27th International Conference on Machine Learning, pp. 1039–1046, 2010), which introduced an exact ? 1 relaxation of the Cheeger ratio cut problem for unsupervised data classification. The proposed extension deals with the multi-class transductive learning problem, which consists in learning several classes with a set of labels for each class. Learning several classes (i.e. more than two classes) simultaneously is generally a challenging problem, but the proposed method builds on strong results introduced in imaging to overcome the multi-class issue. Besides, the proposed multi-class transductive learning algorithms also benefit from recent fast ? 1 solvers, specifically designed for the total variation norm, which plays a central role in our approach. Finally, experiments demonstrate that the proposed ? 1 relaxation algorithms are more accurate and robust than standard ? 2 relaxation methods s.a. spectral clustering, particularly when considering a very small number of labels for each class to be classified. For instance, the mean error of classification for the benchmark MNIST dataset of 60,000 data in $\mathbb{R}^{784}$ using the proposed ? 1 relaxation of the multi-class Cheeger cut is 2.4 % when only one label is considered for each class, while the error of classification for the ? 2 relaxation method of spectral clustering is 24.7 %.  相似文献   

2.
3.
This paper presents a novel noise-robust graph-based semi-supervised learning algorithm to deal with the challenging problem of semi-supervised learning with noisy initial labels. Inspired by the successful use of sparse coding for noise reduction, we choose to give new L1-norm formulation of Laplacian regularization for graph-based semi-supervised learning. Since our L1-norm Laplacian regularization is explicitly defined over the eigenvectors of the normalized Laplacian matrix, we formulate graph-based semi-supervised learning as an L1-norm linear reconstruction problem which can be efficiently solved by sparse coding. Furthermore, by working with only a small subset of eigenvectors, we develop a fast sparse coding algorithm for our L1-norm semi-supervised learning. Finally, we evaluate the proposed algorithm in noise-robust image classification. The experimental results on several benchmark datasets demonstrate the promising performance of the proposed algorithm.  相似文献   

4.
Classification of agricultural data such as soil data and crop data is significant as it allows the stakeholders to make meaningful decisions for farming. Soil classification aids farmers in deciding the type of crop to be sown for a particular type of soil. Similarly, wheat variety classification assists in selecting the right type of wheat for a particular product. Current methods used for classifying agricultural data are mostly manual. These methods involve agriculture field visits and surveys and are labor-intensive, expensive, and prone to human error. Recently, data mining techniques such as decision trees, k-nearest neighbors (k-NN), support vector machine (SVM), and Naive Bayes (NB) have been used in classification of agricultural data such as soil, crops, and land cover. The resulting classification aid the decision making process of government organizations and agro-industries in the field of agriculture. SVM is a popular approach for data classification. A recent study on SVM highlighted the fact that using multiple kernels instead of a single kernel would lead to better performance because of the greater learning and generalization power. In this work, a hybrid kernel based support vector machine (H-SVM) is proposed for classifying multi-class agricultural datasets having continuous attributes. Genetic algorithm (GA) or gradient descent (GD) methods are utilized to select the SVM parameters C and γ. The proposed kernel is called the quadratic-radial-basis-function kernel (QRK) and it combines both quadratic and radial basis function (RBF) kernels. The proposed classifier has the ability to classify all kinds of multi-class agricultural datasets with continuous features. Rigorous experiments using the proposed method are performed on standard benchmark and real world agriculture datasets. The results reveal a significant performance improvement over state of the art methods such as NB, k-NN, and SVM in terms of performance metrics such as accuracy, sensitivity, specificity, precision, and F-score.  相似文献   

5.
Graph is a powerful representation formalism that has been widely employed in machine learning and data mining. In this paper, we present a graph-based classification method, consisting of the construction of a special graph referred to as K-associated graph, which is capable of representing similarity relationships among data cases and proportion of classes overlapping. The main properties of the K-associated graphs as well as the classification algorithm are described. Experimental evaluation indicates that the proposed technique captures topological structure of the training data and leads to good results on classification task particularly for noisy data. In comparison to other well-known classification techniques, the proposed approach shows the following interesting features: (1) A new measure, called purity, is introduced not only to characterize the degree of overlap among classes in the input data set, but also to construct the K-associated optimal graph for classification; (2) nonlinear classification with automatic local adaptation according to the input data. Contrasting to K-nearest neighbor classifier, which uses a fixed K, the proposed algorithm is able to automatically consider different values of K, in order to best fit the corresponding overlap of classes in different data subspaces, revealing both the local and global structure of input data. (3) The proposed classification algorithm is nonparametric, implicating high efficiency and no need for model selection in practical applications.  相似文献   

6.
霍纬纲  高小霞 《控制与决策》2012,27(12):1833-1838
提出一种适用于多类不平衡分布情形下的模糊关联分类方法,该方法以最小化AdaBoost.M1W集成学习迭代过程中训练样本的加权分类错误率和子分类器中模糊关联分类规则数目及规则中所含模糊项的数目为遗传优化目标,实现了AdaBoost.M1W和模糊关联分类建模过程的较好融合.通过5个多类不平衡UCI标准数据集和现有的针对不平衡分类问题的数据预处理方法实验对比结果,表明了所提出的方法能显著提高多类不平衡情形下的模糊关联分类模型的分类性能.  相似文献   

7.
王莉莉  付忠良  陶攀  朱锴 《计算机应用》2017,37(8):2253-2257
针对超声图像样本冗余、不同标准切面因疾病导致的高度相似性、感兴趣区域定位不准确问题,提出一种结合特征袋(BOF)特征、主动学习方法和多分类AdaBoost改进算法的经食管超声心动图(TEE)标准切面分类方法。首先采用BOF方法对超声图像进行描述;然后采用主动学习方法选择对分类器最有价值的样本作为训练集;最后,在AdaBoost算法对弱分类器的迭代训练中,根据临时强分类器的分类情况调整样本更新规则,实现对多分类AdaBoost算法的改进和TEE标准切面的分类。在TEE数据集和三个UCI数据集上的实验表明,相比AdaBoost.SAMME算法、多分类支持向量机(SVM)算法、BP神经网络和AdaBoost.M2算法,所提算法在各个数据集上的G-mean指标、整体分类准确率和大多数类别分类准确率都有不同程度的提升,且比较难分的类别分类准确率提升最为显著。实验结果表明,在包含类间相似样本的数据集上,分类器的性能有显著提升。  相似文献   

8.
A new manifold learning method, called improved semi-supervised local fisher discriminant analysis (iSELF), for gene expression data classification is proposed. Motivated by the fact that semi-supervised and parameter-free are two desirable and promising characteristics for dimension reduction, a new difference-based optimization objective function with unlabeled samples has been designed. The proposed method preserves the global structure of unlabeled samples in addition to separating labeled samples in different classes from each other. The semi-supervised method has an analytic form of the globally optimal solution and it can be computed based on Eigen decompositions. Experiments on synthetic data and SRBCT, DLBCL and brain tumor gene expression datasets are performed to test and evaluate the proposed method. The experimental results and comparisons demonstrate the effectiveness of the proposed method.  相似文献   

9.
为有效使用大量未标注的图像进行分类,提出一种基于半监督学习的图像分类方法。通过共同的隐含话题桥接少量已标注的图像和大量未标注的图像,利用已标注图像的Must-link约束和Cannot-link约束提高未标注图像分类的精度。实验结果表明,该方法有效提高Caltech-101数据集和7类图像集约10%的分类精度。此外,针对目前绝大部分半监督图像分类方法不具备增量学习能力这一缺点,提出该方法的增量学习模型。实验结果表明,增量学习模型相比无增量学习模型提高近90%的计算效率。关键词半监督学习,图像分类,增量学习中图法分类号TP391。41IncrementalImageClassificationMethodBasedonSemi-SupervisedLearningLIANGPeng1,2,LIShao-Fa2,QINJiang-Wei2,LUOJian-Gao31(SchoolofComputerScienceandEngineering,GuangdongPolytechnicNormalUniversity,Guangzhou510665)2(SchoolofComputerScienceandEngineering,SouthChinaUniversityofTechnology,Guangzhou510006)3(DepartmentofComputer,GuangdongAIBPolytechnicCollege,Guangzhou510507)ABSTRACTInordertouselargenumbersofunlabeledimageseffectively,animageclassificationmethodisproposedbasedonsemi-supervisedlearning。Theproposedmethodbridgesalargeamountofunlabeledimagesandlimitednumbersoflabeledimagesbyexploitingthecommontopics。Theclassificationaccuracyisimprovedbyusingthemust-linkconstraintandcannot-linkconstraintoflabeledimages。TheexperimentalresultsonCaltech-101and7-classesimagedatasetdemonstratethattheclassificationaccuracyimprovesabout10%bytheproposedmethod。Furthermore,duetothepresentsemi-supervisedimageclassificationmethodslackingofincrementallearningability,anincrementalimplementationofourmethodisproposed。Comparingwithnon-incrementallearningmodelinliterature,theincrementallearningmethodimprovesthecomputationefficiencyofnearly90%。  相似文献   

10.
In classification problems with hierarchical structures of labels, the target function must assign labels that are hierarchically organized and it can be used either for single-label (one label per instance) or multi-label classification problems (more than one label per instance). In parallel to these developments, the idea of semi-supervised learning has emerged as a solution to the problems found in a standard supervised learning procedure (used in most classification algorithms). It combines labelled and unlabelled data during the training phase. Some semi-supervised methods have been proposed for single-label classification methods. However, very little effort has been done in the context of multi-label hierarchical classification. Therefore, this paper proposes a new method for supervised hierarchical multi-label classification, called HMC-RAkEL. Additionally, we propose the use of semi-supervised learning, self-training, in hierarchical multi-label classification, leading to three new methods, called HMC-SSBR, HMC-SSLP and HMC-SSRAkEL. In order to validate the feasibility of these methods, an empirical analysis will be conducted, comparing the proposed methods with their corresponding supervised versions. The main aim of this analysis is to observe whether the semi-supervised methods proposed in this paper have similar performance of the corresponding supervised versions.  相似文献   

11.
Feature extraction is an important step before actual learning. Although many feature extraction methods have been proposed for clustering, classification and regression, very limited work has been done on multi-class classification problems. This paper proposes a novel feature extraction method, called orientation distance–based discriminative (ODD) feature extraction, particularly designed for multi-class classification problems. Our proposed method works in two steps. In the first step, we extend the Fisher Discriminant idea to determine an appropriate kernel function and map the input data with all classes into a feature space where the classes of the data are well separated. In the second step, we put forward two variants of ODD features, i.e., one-vs-all-based ODD and one-vs-one-based ODD features. We first construct hyper-plane (SVM) based on one-vs-all scheme or one-vs-one scheme in the feature space; we then extract one-vs-all-based or one-vs-one-based ODD features between a sample and each hyper-plane. These newly extracted ODD features are treated as the representative features and are thereafter used in the subsequent classification phase. Extensive experiments have been conducted to investigate the performance of one-vs-all-based and one-vs-one-based ODD features for multi-class classification. The statistical results show that the classification accuracy based on ODD features outperforms that of the state-of-the-art feature extraction methods.  相似文献   

12.
吕亚丽  苗钧重  胡玮昕 《计算机应用》2005,40(12):3430-3436
大多基于图的半监督学习方法,在样本间相似性度量时没有用到已有的和标签传播过程中得到的标签信息,同时,其度量方式相对固定,不能有效度量出分布结构复杂多样的数据样本间的相似性。针对上述问题,提出了基于标签进行度量学习的图半监督学习算法。首先,给定样本间相似性的度量方式,从而构建相似度矩阵。然后,基于相似度矩阵进行标签传播,筛选出k个低熵样本作为新确定的标签信息。最后,充分利用所有标签信息更新相似性度量方式,重复迭代优化直至学出所有标签信息。所提算法不仅利用标签信息改进了样本间相似性的度量方式,而且充分利用中间结果降低了半监督学习对标签数据的需求量。在6个真实数据集上的实验结果表明,该算法在超过95%的情况下相较三种传统的基于图的半监督学习算法取得了更高的分类准确率。  相似文献   

13.
吕亚丽  苗钧重  胡玮昕 《计算机应用》2020,40(12):3430-3436
大多基于图的半监督学习方法,在样本间相似性度量时没有用到已有的和标签传播过程中得到的标签信息,同时,其度量方式相对固定,不能有效度量出分布结构复杂多样的数据样本间的相似性。针对上述问题,提出了基于标签进行度量学习的图半监督学习算法。首先,给定样本间相似性的度量方式,从而构建相似度矩阵。然后,基于相似度矩阵进行标签传播,筛选出k个低熵样本作为新确定的标签信息。最后,充分利用所有标签信息更新相似性度量方式,重复迭代优化直至学出所有标签信息。所提算法不仅利用标签信息改进了样本间相似性的度量方式,而且充分利用中间结果降低了半监督学习对标签数据的需求量。在6个真实数据集上的实验结果表明,该算法在超过95%的情况下相较三种传统的基于图的半监督学习算法取得了更高的分类准确率。  相似文献   

14.
钟明  薛惠锋 《测控技术》2010,29(12):18-21
通过Garbor小波提取人脸表情特征,为降低Garbor变换后向量维数和提取有效的鉴别特征,将手动选取特征点和监督局部线性嵌入(SLLE)结合起来,利用人脸表情图像数据本身的非线性流形结构信息和样本标签信息来调整点到点之间的距离,并形成距离矩阵,而后基于被调整的距离矩阵进行线性近邻重建来实现维数约简,提取低维鉴别特征用于人脸表情识别。结果表明该方法能更为有效地提取反映表情状态的特征,识别率优于传统的PCA算法,取得了较好的识别效果。最后实验分析了SLLE算法近邻数K和嵌入维数对识别率的影响,得到了SLLE算法的最优近邻数K和低维嵌入维数。  相似文献   

15.
Tri-training method proposed by Zhou et al., is an excellent method for semi-supervised classification; nevertheless, the heavy computational burden caused by the retraining strategy prevents the further application of tri-training method. To address this problem, this paper proposes the ternary reversible extreme learning machines (TRELM) which is an incremental tri-training method without relying on the retraining strategy. TRELM employs three reversible extreme learning machines (RELM) as its base learners and trains the RELM with extended (or detected) samples in each learning round. RELM is an incremental learning method with reversible derivation capability. RELM can overcome the difficulty for most incremental learning methods in removing the influence of previously learned mistaken samples. Experimental results indicate that TRELM significantly improves the learning speed of tri-training method. In addition, TRELM achieves comparable (or even better) classification performance to other effective semi-supervised learning methods. TRELM is an appropriate choice for semi-supervised classification tasks with large amounts of data sets or with strict demands for learning speed and classification accuracy.  相似文献   

16.
In the present article, semi-supervised learning is integrated with an unsupervised context-sensitive change detection technique based on modified self-organizing feature map (MSOFM) network. In the proposed methodology, training of the MSOFM network is initially performed using only a few labeled patterns. Thereafter, the membership values, in both the classes, for each unlabeled pattern are determined using the concept of fuzzy set theory. The soft class label for each of the unlabeled patterns is then estimated using the membership values of its K nearest neighbors. Here, training of the network using the unlabeled patterns along with a few labeled patterns is carried out iteratively. A heuristic method has been suggested to select some patterns from the unlabeled ones for training. To check the effectiveness of the proposed methodology, experiments are conducted on three multi-temporal and multi-spectral data sets. Performance of the proposed work is compared with that of two unsupervised techniques, a supervised technique and two semi-supervised techniques. Results are also statistically validated using paired t-test. The proposed method produced promising results.  相似文献   

17.
针对现有半监督分类方法无法对移动界面模式进行有效分类的问题,提出一种采用改进极限学习机的移动界面模式半监督分类方法。为了提高极限学习机的分类效果,利用改进的粒子群优化算法优化极限学习机的初始参数。根据移动界面模式数据的特点,利用主动学习和模糊[C]均值聚类提取信息丰富的未标记数据进行训练和标记。利用分类器实现对所有数据的分类。实验结果表明,该分类方法能够对移动界面模式数据进行有效和合理的分类。  相似文献   

18.
In many real-life problems, obtaining labelled data can be a very expensive and laborious task, while unlabeled data can be abundant. The availability of labeled data can seriously limit the performance of supervised learning methods. Here, we propose a semi-supervised classification tree induction algorithm that can exploit both the labelled and unlabeled data, while preserving all of the appealing characteristics of standard supervised decision trees: being non-parametric, efficient, having good predictive performance and producing readily interpretable models. Moreover, we further improve their predictive performance by using them as base predictive models in random forests. We performed an extensive empirical evaluation on 12 binary and 12 multi-class classification datasets. The results showed that the proposed methods improve the predictive performance of their supervised counterparts. Moreover, we show that, in cases with limited availability of labeled data, the semi-supervised decision trees often yield models that are smaller and easier to interpret than supervised decision trees.  相似文献   

19.
本文提出一种基于半监督主动学习的算法,用于解决在建立动态贝叶斯网络(DBN)分类模型时遇到的难以获得大量带有类标注的样本数据集的问题.半监督学习可以有效利用未标注样本数据来学习DBN分类模型,但是在迭代过程中易于加入错误的样本分类信息,并因而影响模型的准确性.在半监督学习中借鉴主动学习,可以自主选择有用的未标注样本来请求用户标注.把这些样本加入训练集之后,能够最大程度提高半监督学习对未标注样本分类的准确性.实验结果表明,该算法能够显著提高DBN学习器的效率和性能,并快速收敛于预定的分类精度.  相似文献   

20.
Text representation has received extensive attention in text mining tasks. There are various text representation models. Among them, vector space model is the most commonly used one. For vector space model, the core technique is term weighting. To date, a great deal of different term-weighting methods have been proposed, which can be divided into supervised group and unsupervised group. However, it is not advisable to use these two groups of methods directly in semi-supervised applications. In semi-supervised applications, the majority of the supervised term-weighting methods are not applicable as the label information is insufficient; meanwhile, the unsupervised term-weighting methods cannot make use of the provided category labels. Thus, a semi-supervised learning framework for iteratively revising the text representation by an EM-like strategy is proposed in this paper. Furthermore, a new supervised term-weighting method t f.sd f is proposed. T f.sd f has the ability to emphasize the importance of terms that are unevenly distributed among all the classes and weaken the importance of terms that are uniformly distributed. Experimental results on real text data show that the proposed semi-supervised learning framework with the aid of t f.sd f performs well. Also, t f.sd f is shown to be efficient for supervised learning.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号