Similar literature
20 similar documents found (search time: 15 ms)
1.
2.
Many important machine learning models, supervised and unsupervised, are based on simple Euclidean distance or orthogonal projection in a high-dimensional feature space. When estimating such models from small training sets, we face the problem that the span of the training input vectors is not the full input space. Hence, when the model is applied to future data, it is effectively blind to the missed orthogonal subspace. This can lead to an inflated variance of hidden variables estimated in the training set, and when the model is applied to test data we may find that the hidden variables follow a different probability law with less variance. While the problem and the basic means to reconstruct and deflate are well understood in unsupervised learning, the case of supervised learning is less well understood. Here we investigate the effect of variance inflation in supervised learning, including the case of Support Vector Machines (SVMs), and we propose a non-parametric scheme to restore proper generalizability. We illustrate the algorithm and its ability to restore performance on a wide range of benchmark data sets.

3.
Learning-based hashing methods are becoming the mainstream for approximate scalable multimedia retrieval. They consist of two main components: hash code learning for training data and hash function learning for new data points. Tremendous efforts have been devoted to designing novel methods for these two components, i.e., supervised and unsupervised methods for learning hash codes, and different models for inferring hash functions. However, there is little work integrating supervised and unsupervised hash code learning into a single framework. Moreover, the hash function learning component is usually based on hand-crafted visual features extracted from the training images. The performance of a content-based image retrieval system depends crucially on the feature representation, and such hand-crafted visual features may degrade the accuracy of the hash functions. In this paper, we propose a semi-supervised deep learning hashing (DLH) method for fast multimedia retrieval. More specifically, in the first component, we utilize both visual and label information to learn an optimal similarity graph that more precisely encodes the relationships among training data, and then generate the hash codes based on the graph. In the second component, we apply a deep convolutional network to simultaneously learn a good multimedia representation and a set of hash functions. Extensive experiments on five popular datasets demonstrate the superiority of our DLH over both supervised and unsupervised hashing methods.

4.
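Feature learning is a key problem in pattern recognition. Deep neural networks based on autoencoders can effectively extract key information from data and form features through unsupervised pre-training and supervised fine-tuning. This paper proposes a marginal Fisher analysis algorithm based on stacked denoising autoencoders, which applies marginal Fisher analysis in the supervised fine-tuning stage to further improve the algorithm's feature learning ability. Experimental results show that, compared with standard stacked denoising autoencoders and deep belief networks based on restricted Boltzmann machines, the algorithm achieves better recognition performance.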

5.
Sentiment polarity detection is one of the most popular tasks related to Opinion Mining. Many papers have been presented describing one of the two main approaches used to solve this problem. On the one hand, a supervised methodology uses machine learning algorithms when training data exist. On the other hand, an unsupervised method based on semantic orientation is applied when linguistic resources are available. However, few studies combine the two approaches. In this paper we propose the use of meta-classifiers that combine supervised and unsupervised learning in order to develop a polarity classification system. We have used a Spanish corpus of film reviews along with its parallel corpus translated into English. Firstly, we generate two individual models using these two corpora and applying machine learning algorithms. Secondly, we integrate SentiWordNet into the English corpus, generating a new unsupervised model. Finally, the three systems are combined using a meta-classifier that allows us to apply several combination algorithms such as voting or stacking. The results obtained outperform those obtained using the systems individually and show that this approach could be considered a good strategy for polarity classification when we work with parallel corpora.
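The voting combination mentioned in the abstract above can be illustrated with a minimal sketch. The function name `majority_vote`, the labels, and the tie-breaking rule are assumptions made for illustration; the abstract does not say how the authors implement voting or stacking.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most base classifiers.

    `predictions` holds one label per base system (hypothetical input).
    Ties go to the tied label that appears first in the list.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

# Hypothetical outputs of the three systems for one review.
votes = ["positive", "negative", "positive"]
print(majority_vote(votes))
```

Stacking would instead train a second-level classifier on the three systems' outputs; that variant is not shown here.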

6.
Writer identification, which identifies individuals based on their handwriting, is a frequent topic in biometric authentication and verification systems. Due to its importance, numerous studies have been conducted in various languages. Researchers have established several learning methods for writer identification, including supervised and unsupervised learning. However, supervised methods require a large amount of annotated data, which is unavailable in most scenarios. On the other hand, unsupervised writer identification methods may be limited by their dependence on feature extraction, which cannot provide proper objectives to the architecture and may be misinterpreted. This paper introduces an unsupervised writer identification system that analyzes the data and recognizes the writer based on the inter-feature relations of the data to resolve the uncertainty of the features. A pairwise architecture-based Autoembedder was applied to generate clusterable embeddings for handwritten text images. The trained baseline architecture then generates the embedding of each data image, and the K-means algorithm is used to distinguish the embeddings of individual writers. The proposed model was evaluated on the IAM dataset, which is inconsistent in the contributions from its writers but is easily accessible for writer identification tasks. Traditional evaluation metrics are used to assess the proposed model. Finally, the proposed model is compared with a few unsupervised models, and it outperforms state-of-the-art deep convolutional architectures in recognizing writers based on unlabeled data.

7.
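Event clustering for social media aims to cluster short texts according to event features. Current event clustering models are mainly divided into unsupervised and supervised models. Unsupervised models cluster poorly, while supervised clustering models depend on large amounts of labeled data. To address this, this paper proposes a semi-supervised event clustering model (SemiEC) which, starting from a small amount of labeled data, uses an LSTM to represent events and computes text similarity with a linear model to perform incremental clustering. Then...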

8.
A new semi-supervised intrusion detection algorithm   Total citations: 3, self-citations: 0, citations by others: 3
宋凌  李枚毅  李孝源 《计算机应用》2008,28(7):1781-1783
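To address the low accuracy of intrusion detection algorithms based on unsupervised learning and the difficulty of obtaining training samples for those based on supervised learning, a semi-supervised intrusion detection algorithm based on particle-swarm-improved K-means is proposed. A small amount of labeled data is used to build a model of correct samples that guides the clustering of a large amount of unlabeled data; data that remain unlabeled after clustering are clustered with particle-swarm-optimized K-means. This effectively improves the classifier's accuracy and enables the detection of new types of attack. Experimental results show that the overall detection performance of the algorithm is clearly better than that of detection algorithms based on unsupervised or supervised learning.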
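The idea of letting a few labeled samples guide the labeling of unlabeled data can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names, the `radius` cutoff, and the nearest-centroid rule are hypothetical, and the particle swarm optimization step applied to the remaining data is not shown.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def distance(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def label_with_seeds(labeled, unlabeled, radius):
    """Label each unlabeled point with its nearest seed class.

    `labeled` is a list of (point, label) pairs. Points farther than
    `radius` (an assumed cutoff) from every class centroid stay None,
    so that a later clustering step can handle them.
    """
    by_class = {}
    for point, label in labeled:
        by_class.setdefault(label, []).append(point)
    centers = {label: centroid(pts) for label, pts in by_class.items()}
    result = []
    for point in unlabeled:
        label, dist = min(
            ((lab, distance(point, c)) for lab, c in centers.items()),
            key=lambda pair: pair[1],
        )
        result.append(label if dist <= radius else None)
    return result

# Hypothetical two-dimensional connection records.
seeds = [((0, 0), "normal"), ((0, 2), "normal"), ((10, 10), "attack")]
print(label_with_seeds(seeds, [(0, 1), (9, 10), (50, 50)], radius=2))
```

Points returned as `None` correspond to the data that remain unlabeled after the guided step and could indicate a new type of attack.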

9.
Learning texture discrimination rules in a multiresolution system   Total citations: 1, self-citations: 0, citations by others: 1
We describe a texture analysis system in which informative discrimination rules are learned from a multiresolution representation of the textured input. The system incorporates unsupervised and supervised learning via statistical machine learning and rule-based neural networks, respectively. The textured input is represented in the frequency-orientation space via a log-Gabor pyramidal decomposition. In the unsupervised learning stage, a statistical clustering scheme is used for the quantization of the feature-vector attributes. A supervised stage follows in which labeling of the textured map is achieved using a rule-based network. Simulation results for the texture classification task are given. An application of the system to real-world problems is demonstrated.

10.
Feature extraction (FE) methods play a central role in the classification of hyperspectral images (HSIs). However, all traditional FE methods work in the original feature space (OFS), which may suffer from noise, outliers and poorly discriminative features. This paper presents a feature space enriching technique to address these problems. The proposed method is based on low-rank representation (LRR) with the capability of pairwise constraint preserving (PCP), termed LRR-PCP. LRR-PCP does not change the dimension of the OFS and can only be used as a preprocessing procedure for any classification algorithm or DR method. The proposed LRR-PCP aims to enrich the OFS and obtain an extracted feature space (EFS) whose features are richer than those of the OFS. The problems of noise and outliers can be reduced using LRR. However, LRR cannot preserve the intrinsic local structure of the original data and captures only the global structure of the data. Therefore, two additional penalty terms are added to the objective function of LRR to keep the local discriminative ability and also preserve the data diversity. The LRR-PCP method can be used not only in supervised learning but also in unsupervised and semi-supervised learning frameworks. The effectiveness of LRR-PCP is investigated on three HSI data sets using some existing DR methods and as a denoising procedure before the classification task. All experimental results and quantitative analysis demonstrate that applying LRR-PCP to the OFS improves the performance of the classification and DR methods in supervised, unsupervised, and semi-supervised conditions.

11.
Complex application domains involve difficult pattern classification problems. The state space of these problems consists of regions that lie near class separation boundaries and require the construction of complex discriminants, while for the remaining regions the classification task is significantly simpler. The motivation for developing the Supervised Network Self-Organizing Map (SNet-SOM) model is to exploit this fact for designing computationally effective solutions. Specifically, the SNet-SOM utilizes unsupervised learning for classifying in the simple regions and supervised learning for the difficult ones in a two-stage learning process. The unsupervised learning approach is based on the Self-Organizing Map (SOM) of Kohonen. The basic SOM is modified with a dynamic node insertion/deletion process controlled by an entropy-based criterion that allows an adaptive extension of the SOM. This extension proceeds until the total number of training patterns that are mapped to neurons with high entropy (and therefore with ambiguous classification) is reduced to a size that is numerically manageable with a capable supervised model. The second learning phase (the supervised training) has the objective of constructing better decision boundaries in the ambiguous regions. In this phase, a special supervised network is trained for the computationally reduced task of performing the classification in the ambiguous regions only. The performance of the SNet-SOM has been evaluated on both synthetic data and an ischemia detection application with data extracted from the European ST-T database. In all cases, the utilization of SNet-SOM with supervised learning based on both Radial Basis Functions and Support Vector Machines has improved the results significantly relative to those obtained with the unsupervised SOM and has enhanced the scalability of the supervised learning schemes.
The highly disciplined design of the generalization performance of the Support Vector Machine allows the proper model to be designed for the particular training set.
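One way to picture an entropy-based ambiguity test on SOM nodes is sketched below. It assumes Shannon entropy over the class labels of the training patterns mapped to each node and a hypothetical `threshold`; the abstract does not give the exact criterion used by SNet-SOM.

```python
import math
from collections import Counter

def node_entropy(labels):
    """Shannon entropy (in bits) of the class labels mapped to one node."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def ambiguous_nodes(node_labels, threshold):
    """Return ids of nodes whose label entropy exceeds `threshold`.

    `node_labels` maps a node id to the labels of its mapped patterns;
    both the mapping and the threshold are illustrative assumptions.
    """
    return [node for node, labels in node_labels.items()
            if node_entropy(labels) > threshold]

# Hypothetical mapping: node 0 is pure, node 1 mixes two classes.
mapping = {0: ["normal", "normal"], 1: ["normal", "ischemic"]}
print(ambiguous_nodes(mapping, threshold=0.5))
```

Patterns mapped to the nodes returned here would be the ones handed to the supervised model in the second phase.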

12.
Neurocontroller design via supervised and unsupervised learning   Total citations: 1, self-citations: 0, citations by others: 1
In this paper we study the role of supervised and unsupervised neural learning schemes in the adaptive control of nonlinear dynamic systems. We suggest and demonstrate that the teacher's knowledge in the supervised learning mode includes a priori plant structural knowledge, which may be employed in the design of exploratory schedules during learning, resulting in an unsupervised learning scheme. We further demonstrate that neurocontrollers may realize both linear and nonlinear control laws that are given explicitly in an automated teacher or implicitly through a human operator, and that their robustness may be superior to that of a model-based controller. Examples of both learning schemes are provided in the adaptive control of robot manipulators and a cart-pole system.

13.
Pang  Zhiqi  Guo  Jifeng  Sun  Wenbo  Xiao  Yanbang  Yu  Ming 《Applied Intelligence》2022,52(3):2987-3001

Although the single-domain person re-identification (Re-ID) method has achieved great accuracy, the dependence on the label in the same image domain severely limits the scalability of this method. Therefore, cross-domain Re-ID has received more and more attention. In this paper, a novel cross-domain Re-ID method combining supervised and unsupervised learning is proposed, which includes two models: a triple-condition generative adversarial network (TC-GAN) and a dual-task feature extraction network (DFE-Net). We first use TC-GAN to generate labeled images with the target style, and then we combine supervised and unsupervised learning to optimize DFE-Net. Specifically, we use labeled generated data for supervised learning. In addition, we mine effective information in the target data from two perspectives for unsupervised learning. To effectively combine the two types of learning, we design a dynamic weighting function to dynamically adjust the weights of these two approaches. To verify the validity of TC-GAN, DFE-Net, and the dynamic weight function, we conduct multiple experiments on Market-1501 and DukeMTMC-reID. The experimental results show that the dynamic weight function can improve the performance of the models, and our method is better than many state-of-the-art methods.


14.
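The stock market is an important and difficult problem in financial analysis. The analysis and prediction of stock data have great theoretical significance and attractive application value. BP neural networks are widely used in current stock prediction systems, but as a supervised learning system, a BP neural network requires relevant empirical data in order to operate properly. This paper therefore proposes a BP algorithm based on reinforcement learning for stock prediction systems: the system learns by itself through a reinforcement learning scheme, and network ensembles are used to preprocess the initial data and improve the system's generalization ability. Good results have been obtained in practical application.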

15.
A K-means clustering learning algorithm based on multiple instances   Total citations: 1, self-citations: 1, citations by others: 0
谢红薇  李晓亮 《计算机工程》2009,35(22):179-181
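Multi-instance learning is another machine learning framework following supervised learning, unsupervised learning, and reinforcement learning. Combining multi-instance learning with unsupervised learning, the MIK-means algorithm is proposed on the basis of the traditional unsupervised clustering algorithm K-means; it uses a mixed Hausdorff distance as the similarity measure to cluster the data. Experiments show that the method can effectively reveal the intrinsic structure of multi-instance data sets and clusters better than the K-means algorithm.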
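Hausdorff-type distances compare two bags of instances rather than two single points. The sketch below shows two standard variants, the maximal and the minimal Hausdorff distance. How the MIK-means paper mixes such variants is not stated in the abstract, so the function names and the example bags are assumptions for illustration only.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two instances."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def max_hausdorff(bag_a, bag_b):
    """Classical (maximal) Hausdorff distance between two bags."""
    def directed(p, q):
        # Largest distance from an instance in p to its nearest one in q.
        return max(min(euclidean(x, y) for y in q) for x in p)
    return max(directed(bag_a, bag_b), directed(bag_b, bag_a))

def min_hausdorff(bag_a, bag_b):
    """Minimal Hausdorff distance: distance of the closest instance pair."""
    return min(euclidean(x, y) for x in bag_a for y in bag_b)

# Two hypothetical bags of two-dimensional instances.
bag_a = [(0, 0), (1, 0)]
bag_b = [(0, 0), (4, 0)]
print(max_hausdorff(bag_a, bag_b), min_hausdorff(bag_a, bag_b))
```

Either function can replace the point-to-point distance in a K-means style assignment step when the objects being clustered are bags.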

16.
A semi-clustering anomaly intrusion detection algorithm   Total citations: 2, self-citations: 0, citations by others: 2
俞研  黄皓 《计算机应用》2006,26(7):1640-1642
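To address the shortage of training samples faced by intrusion detection algorithms based on supervised learning, an anomaly intrusion detection algorithm based on semi-supervised clustering combined with an improved k-nearest-neighbor method is proposed. It uses a small amount of labeled data to improve the algorithm's learning ability and enables the detection of new attack types. Experimental results show that, with very little labeled data, the algorithm's detection results are clearly better than those of unsupervised learning algorithms and close to those of supervised detection algorithms.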

17.
Dimension reduction (DR) is important in the processing of data in domains such as multimedia or bioinformatics because such data can be of very high dimension. Dimension reduction in a supervised learning context is a well-posed problem in that there is a clear objective of discovering a reduced representation of the data where the classes are well separated. By contrast, DR in an unsupervised context is ill-posed in that the overall objective is less clear. Nevertheless, successful unsupervised DR techniques such as principal component analysis (PCA) exist; PCA has the pragmatic objective of transforming the data into a reduced number of dimensions that still captures most of the variation in the data. While one-class classification falls somewhere between the supervised and unsupervised learning categories, supervised DR techniques appear not to be applicable at all for one-class classification because of the absence of a second class label in the training data. In this paper we evaluate the use of a number of up-to-date unsupervised DR techniques for one-class classification and we show that techniques based on cluster coherence and locality preservation are effective.

18.
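This paper studies the application of supervised learning methods to multi-document sentiment summarization. Using product review corpora collected from Amazon's Chinese and English websites, we extract in-text features, PageRank features, sentiment features, and review-quality features, and perform multi-document sentiment summary extraction with a supervised method. Experimental results show that the supervised learning method significantly improves ROUGE scores over the unsupervised learning method, and that both sentiment features and review-quality features help text sentiment summarization.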

19.
Neural-Fuzzy Models for Multispectral Image Analysis   Total citations: 4, self-citations: 0, citations by others: 4
In this paper, we consider neural-fuzzy models for multispectral image analysis. We consider both supervised and unsupervised classification. The model for supervised classification consists of six layers. The first three layers map the input variables to fuzzy set membership functions. The last three layers implement the decision rules. The model learns decision rules using a supervised gradient descent procedure. The model for unsupervised classification consists of two layers. The algorithm is similar to competitive learning. However, here, for each input sample, membership functions of output categories are used to update weights. Input vectors are normalized, and Euclidean distance is used as the similarity measure. In this model, if the input vector does not satisfy the similarity criterion, a new cluster is created; otherwise, the weights corresponding to the winner unit are updated using the fuzzy membership values of the output categories. We have developed software for these models. As an illustration, the models are used to analyze multispectral images.
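The create-or-update rule described for the unsupervised model can be sketched as follows. This is a simplified stand-in, not the authors' model: the fuzzy membership weighting is replaced by a fixed learning rate `rate`, and the `threshold` used as the similarity criterion is an assumed parameter.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are kept)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster(samples, threshold, rate=0.5):
    """Assign samples to clusters, creating new clusters on demand.

    A sample farther than `threshold` from every center starts a new
    cluster; otherwise the winning center moves toward the sample.
    """
    centers = []
    assignments = []
    for sample in samples:
        x = normalize(sample)
        if centers:
            winner = min(range(len(centers)),
                         key=lambda i: distance(x, centers[i]))
        if not centers or distance(x, centers[winner]) > threshold:
            centers.append(x)
            assignments.append(len(centers) - 1)
        else:
            centers[winner] = [c + rate * (xi - c)
                               for c, xi in zip(centers[winner], x)]
            assignments.append(winner)
    return centers, assignments

# Hypothetical two-band pixel vectors.
centers, labels = cluster([(1, 0), (2, 0), (0, 3)], threshold=0.5)
print(labels)
```

In the model described above, the update step would weight the change by the fuzzy membership values of the output categories rather than by a constant rate.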

20.
Image collections are currently widely available and are being generated at a fast pace due to mobile and accessible equipment. In principle, that is a good scenario for the design of successful visual pattern recognition systems. However, for classification tasks in particular, one may need to choose which examples are more relevant in order to build a training set that represents the data well, since such tasks often require representative and sufficient observations to be accurate. In this paper we investigate three methods for selecting relevant examples from image collections based on learning models from small portions of the available data. We consider supervised methods that need labels to allow selection, and an unsupervised method that is agnostic to labels. The image datasets studied are described using both handcrafted and deep learning features. A general-purpose algorithm is proposed which uses learning methods as subroutines. We show that our relevance selection algorithm outperforms random selection, in particular when using unlabelled data in an unsupervised approach, significantly reducing the size of the training set with little decrease in test accuracy.
