首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 593 毫秒
1.
This work presents a strategy for the classification of astronomical objects based on spectrophotometric data and the use of unsupervised neural networks and statistical classification algorithms. Our strategy constitutes an essential part of the preparation phase of the automatic classification and parameterization algorithms for the data that are to be collected by the Gaia satellite of the European Space Agency (ESA), whose launch is foreseen for the spring of 2012. The proposed algorithm is based on a hierarchical structure of neural networks composed of various tree-structured SOM networks. The classification of possible astronomical objects (stars, galaxies, quasars, multiple objects, etc.) basically consists in the iterative segmentation of the inputs space and the ensuing generation of initial classifications and increase in classification precision by means of a refining process. Apart from providing a classification, our technique also measures the quality and precision of the classifications and segments the objects for which it cannot determine whether or not they belong to a pre-established class of astronomical objects (outliers).  相似文献   

2.
3.
针对部队快速机动作战的军事要求,提出基于高分辨率遥感影像的军用阵地动态监测方法。借助面向对象的多尺度分割技术将阵地影像分割为同质对象,以提取各个对象的特征;针对监督分类和非监督分类的弊端,提出通过一定的先验知识制定分类规则的方法对遥感影像进行地物识别,在此基础上定性和定量地输出变化检测结果。实验结果表明:利用基于对象影像分析方法具有较高的识别精度,能够有效监测军事阵地变化。  相似文献   

4.

Enabling information systems to face anomalies in the presence of uncertainty is a compelling and challenging task. In this work the problem of unsupervised outlier detection in large collections of data objects modeled by means of arbitrary multidimensional probability density functions is considered. We present a novel definition of uncertain distance-based outlier under the attribute level uncertainty model, according to which an uncertain object is an object that always exists but its actual value is modeled by a multivariate pdf. According to this definition an uncertain object is declared to be an outlier on the basis of the expected number of its neighbors in the dataset. To the best of our knowledge this is the first work that considers the unsupervised outlier detection problem on data objects modeled by means of arbitrarily shaped multidimensional distribution functions. We present the UDBOD algorithm which efficiently detects the outliers in an input uncertain dataset by taking advantages of three optimized phases, that are parameter estimation, candidate selection, and the candidate filtering. An experimental campaign is presented, including a sensitivity analysis, a study of the effectiveness of the technique, a comparison with related algorithms, also in presence of high dimensional data, and a discussion about the behavior of our technique in real case scenarios.

  相似文献   

5.
While several automatic keyphrase extraction (AKE) techniques have been developed and analyzed, there is little consensus on the definition of the task and a lack of overview of the effectiveness of different techniques. Proper evaluation of keyphrase extraction requires large test collections with multiple opinions, currently not available for research. In this paper, we (i) present a set of test collections derived from various sources with multiple annotations (which we also refer to as opinions in the remained of the paper) for each document, (ii) systematically evaluate keyphrase extraction using several supervised and unsupervised AKE techniques, (iii) and experimentally analyze the effects of disagreement on AKE evaluation. Our newly created set of test collections spans different types of topical content from general news and magazines, and is annotated with multiple annotations per article by a large annotator panel. Our annotator study shows that for a given document there seems to be a large disagreement on the preferred keyphrases, suggesting the need for multiple opinions per document. A first systematic evaluation of ranking and classification of keyphrases using both unsupervised and supervised AKE techniques on the test collections shows a superior effectiveness of supervised models, even for a low annotation effort and with basic positional and frequency features, and highlights the importance of a suitable keyphrase candidate generation approach. We also study the influence of multiple opinions, training data and document length on evaluation of keyphrase extraction. Our new test collection for keyphrase extraction is one of the largest of its kind and will be made available to stimulate future work to improve reliable evaluation of new keyphrase extractors.  相似文献   

6.
Japkowicz  Nathalie 《Machine Learning》2001,42(1-2):97-122
Binary classification is typically achieved by supervised learning methods. Nevertheless, it is also possible using unsupervised schemes. This paper describes a connectionist unsupervised approach to binary classification and compares its performance to that of its supervised counterpart. The approach consists of training an autoassociator to reconstruct the positive class of a domain at the output layer. After training, the autoassociator is used for classification, relying on the idea that if the network generalizes to a novel instance, then this instance must be positive, but that if generalization fails, then the instance must be negative. When tested on three real-world domains, the autoassociator proved more accurate at classification than its supervised counterpart, MLP, on two of these domains and as accurate on the third (Japkowicz, Myers, & Gluck, 1995). The paper seeks to generalize these results and concludes that, in addition to learning aconcept in the absence of negative examples, 1) autoassociation is more efficient than MLP in multi-modal domains, and 2) it is more accurate than MLP in multi-modal domains for which the negative class creates a particularly strong need for specialization or the positive class creates a particularly weak need for specialization. In multi-modal domains for which the positive class creates a particularly strong need for specialization, on the other hand, MLP is more accurate than autoassociation.  相似文献   

7.
The traditional CCA and 2D-CCA algorithms are unsupervised multiple feature extraction methods. Hence, introducing the supervised information of samples into these methods should be able to promote the classification performance. In this paper, a novel method is proposed to carry out the multiple feature extraction for classification, called two-dimensional supervised canonical correlation analysis (2D-SCCA), in which the supervised information is added to the criterion function. Then, by analyzing the relationship between GCCA and 2D-SCCA, another feature extraction method called multiple-rank supervised canonical correlation analysis (MSCCA) is also developed. Different from 2D-SCCA, in MSCCA k pairs left transforms and k pairs right transforms are sought to maximize the correlation. The convergence behavior and computational complexity of the algorithms are analyzed. Experimental results on real-world databases demonstrate the viability of the formulation, they also show that the classification results of our methods are higher than the other’s and the computing time is competitive. In this manner, the proposed methods proved to be the competitive multiple feature extraction and classification methods. As such, the two methods may well help to improve image recognition tasks, which are essential in many advanced expert and intelligent systems.  相似文献   

8.
This paper describes a clustering method for unsupervised classification of objects in large data sets. The new methodology combines the mixture likelihood approach with a sampling and subsampling strategy in order to cluster large data sets efficiently. This sampling strategy can be applied to a large variety of data mining methods to allow them to be used on very large data sets. The method is applied to the problem of automated star/galaxy classification for digital sky data and is tested using a sample from the Digitized Palomar Sky Survey (DPOSS) data. The method is quick and reliable and produces classifications comparable to previous work on these data using supervised clustering.  相似文献   

9.
Manifold learning methods for unsupervised nonlinear dimensionality reduction have proven effective in the visualization of high dimensional data sets. When dealing with classification tasks, supervised extensions of manifold learning techniques, in which class labels are used to improve the embedding of the training points, require an appropriate method for out-of-sample mapping.In this paper we propose multi-output kernel ridge regression (KRR) for out-of-sample mapping in supervised manifold learning, in place of general regression neural networks (GRNN) that have been adopted by previous studies on the subject. Specifically, we consider a supervised agglomerative variant of Isomap and compare the performance of classification methods when the out-of-sample embedding is based on KRR and GRNN, respectively. Extensive computational experiments, using support vector machines and k-nearest neighbors as base classifiers, provide statistical evidence that out-of-sample mapping based on KRR consistently dominates its GRNN counterpart, and that supervised agglomerative Isomap with KRR achieves a higher accuracy than direct classification methods on most data sets.  相似文献   

10.
11.
We have examined the contribution of multipolarized airborne radar data for the discrimination of crops. An unsupervised classification algorithm and a maximum likelihood supervised classification were used and compared. The results show that multipolarized radar data offer an accurate means of identifying crops. The average classification accuracies were 83 and 79 per cent for the supervised and unsupervised methods respectively. Comparison of the two methods using the same data suggests that the unsupervised method gives essentially similar results to those using the supervised classification method; however, the unsupervised method requires far less field effort and computer time.  相似文献   

12.
Roth V 《Neural computation》2006,18(4):942-960
The problem of detecting atypical objects or outliers is one of the classical topics in (robust) statistics. Recently, it has been proposed to address this problem by means of one-class SVM classifiers. The method presented in this letter bridges the gap between kernelized one-class classification and gaussian density estimation in the induced feature space. Having established the exact relation between the two concepts, it is now possible to identify atypical objects by quantifying their deviations from the gaussian model. This model-based formalization of outliers overcomes the main conceptual shortcoming of most one-class approaches, which, in a strict sense, are unable to detect outliers, since the expected fraction of outliers has to be specified in advance. In order to overcome the inherent model selection problem of unsupervised kernel methods, a cross-validated likelihood criterion for selecting all free model parameters is applied. Experiments for detecting atypical objects in image databases effectively demonstrate the applicability of the proposed method in real-world scenarios.  相似文献   

13.
Remote-sensing images taken from the Landsat Enhanced Thematic Mapper Plus (ETM+) sensor with a spatial resolution of 30 m were applied for mapping and inventory of mangrove forest areas in Sundarbans, on both sides of the border between Bangladesh and India. Three different classification methods – unsupervised classification with k-means clustering, supervised classification using the maximum likelihood decision rule, and band-ratio supervised classification – were tested and compared in terms of the top of the atmosphere reflectance images. Spectral signature and principal component analyses were applied to select the appropriate band combinations prior to the band ratio–supervised classification. Our results show that the band ratio method is superior to the unsupervised or supervised classification methods considering the visual inspection, producer's and user's accuracy, as well as the overall accuracy of the all the classes in the image. The best discrimination of mangrove/nonmangrove boundary can be achieved when the combinations of B4/B2 (band 4/band 2), B5/B7, and B7/B4 are employed from the ETM+?bands.  相似文献   

14.
This work combines a set of available techniques – whichcould be further extended – to perform noun sense disambiguation. We use several unsupervised techniques (Rigau et al., 1997) that draw knowledge from a variety of sources. In addition, we also apply a supervised technique in order to show that supervised and unsupervised methods can be combined to obtain better results. This paper tries to prove that using an appropriate method to combine those heuristics we can disambiguate words in free running text with reasonable precision.  相似文献   

15.
In classification problems with hierarchical structures of labels, the target function must assign labels that are hierarchically organized and it can be used either for single-label (one label per instance) or multi-label classification problems (more than one label per instance). In parallel to these developments, the idea of semi-supervised learning has emerged as a solution to the problems found in a standard supervised learning procedure (used in most classification algorithms). It combines labelled and unlabelled data during the training phase. Some semi-supervised methods have been proposed for single-label classification methods. However, very little effort has been done in the context of multi-label hierarchical classification. Therefore, this paper proposes a new method for supervised hierarchical multi-label classification, called HMC-RAkEL. Additionally, we propose the use of semi-supervised learning, self-training, in hierarchical multi-label classification, leading to three new methods, called HMC-SSBR, HMC-SSLP and HMC-SSRAkEL. In order to validate the feasibility of these methods, an empirical analysis will be conducted, comparing the proposed methods with their corresponding supervised versions. The main aim of this analysis is to observe whether the semi-supervised methods proposed in this paper have similar performance of the corresponding supervised versions.  相似文献   

16.
This study analyzes the characteristics of unsupervised feature learning using a convolutional neural network (CNN) to investigate its efficiency for multi-task classification and compare it to supervised learning features. We keep the conventional CNN structure and introduce modifications into the convolutional auto-encoder design to accommodate a subsampling layer and make a fair comparison. Moreover, we introduce non-maximum suppression and dropout for a better feature extraction and to impose sparsity constraints. The experimental results indicate the effectiveness of our sparsity constraints. We also analyze the efficiency of unsupervised learning features using the t-SNE and variance ratio. The experimental results show that the feature representation obtained in unsupervised learning is more advantageous for multi-task learning than that obtained in supervised learning.  相似文献   

17.
In many large e-commerce organizations, multiple data sources are often used to describe the same customers, thus it is important to consolidate data of multiple sources for intelligent business decision making. In this paper, we propose a novel method that predicts the classification of data from multiple sources without class labels in each source. We test our method on artificial and real-world datasets, and show that it can classify the data accurately. From the machine learning perspective, our method removes the fundamental assumption of providing class labels in supervised learning, and bridges the gap between supervised and unsupervised learning.  相似文献   

18.
Dimension reduction (DR) is important in the processing of data in domains such as multimedia or bioinformatics because such data can be of very high dimension. Dimension reduction in a supervised learning context is a well posed problem in that there is a clear objective of discovering a reduced representation of the data where the classes are well separated. By contrast DR in an unsupervised context is ill posed in that the overall objective is less clear. Nevertheless successful unsupervised DR techniques such as principal component analysis (PCA) exist—PCA has the pragmatic objective of transforming the data into a reduced number of dimensions that still captures most of the variation in the data. While one-class classification falls somewhere between the supervised and unsupervised learning categories, supervised DR techniques appear not to be applicable at all for one-class classification because of the absence of a second class label in the training data. In this paper we evaluate the use of a number of up-to-date unsupervised DR techniques for one-class classification and we show that techniques based on cluster coherence and locality preservation are effective.  相似文献   

19.
Traditional pattern recognition generally involves two tasks: unsupervised clustering and supervised classification. When class information is available, fusing the advantages of both clustering learning and classification learning into a single framework is an important problem worthy of study. To date, most algorithms generally treat clustering learning and classification learning in a sequential or two-step manner, i.e., first execute clustering learning to explore structures in data, and then perform classification learning on top of the obtained structural information. However, such sequential algorithms cannot always guarantee the simultaneous optimality for both clustering and classification learning. In fact, the clustering learning in these algorithms just aids the subsequent classification learning and does not benefit from the latter. To overcome this problem, a simultaneous learning framework for clustering and classification (SCC) is presented in this paper. SCC aims to achieve three goals: (1) acquiring the robust classification and clustering simultaneously; (2) designing an effective and transparent classification mechanism; (3) revealing the underlying relationship between clusters and classes. To this end, with the Bayesian theory and the cluster posterior probabilities of classes, we define a single objective function to which the clustering process is directly embedded. By optimizing this objective function, the effective and robust clustering and classification results are achieved simultaneously. Experimental results on both synthetic and real-life datasets show that SCC achieves promising classification and clustering results at one time.  相似文献   

20.
陈翔  赵英全  顾庆  倪超  王赞 《软件学报》2019,30(12):3694-3713
软件缺陷预测技术通过挖掘和分析软件库训练出软件缺陷预测模型,随后利用该模型来预测出被测软件项目内的缺陷程序模块,因此可以有效地优化测试资源的分配.在基于代价感知的评测指标下,有监督学习方法与无监督学习方法之间的预测性能比较是最近的一个热门研究话题.其中在基于文件粒度的缺陷预测问题中,Yan等人最近对Yang等人考虑的无监督学习方法和有监督学习方法展开了大规模实证研究,结果表明存在一些无监督学习方法,其性能要优于有监督方法.基于来自开源社区的10个项目展开了实证研究.结果表明:在同项目缺陷预测场景中,若基于ACC评测指标,MULTI方法与最好的无监督方法和有监督方法相比,其预测性能平均有105.81%和123.84%的提高;若基于POPT评测指标,MULTI方法与最好的无监督方法和有监督方法相比,其预测性能平均有35.61%和38.70%的提高.在跨项目缺陷预测场景中,若基于ACC评测指标,MULTI方法与最好的无监督方法和有监督方法相比,其预测性能平均有22.42%和34.95%的提高.若基于POPT评测指标,MULTI方法与最好的无监督方法和有监督方法相比,其预测性能平均有11.45%和17.92%的提高.同时,基于Huang等人提出的PMI和IFA评测指标,MULTI方法的表现与代价感知的指标相比存在一定的折衷问题,但仍好于在ACC和POPT评测指标下表现最好的两种无监督学习方法.除此之外,将MULTI方法与最新提出的OneWay和CBS方法进行了比较,结果表明,MULTI方法在性能上仍然可以显著优于这两种方法.同时,基于F1评测指标的结果也验证了MULTI方法在预测性能上的显著优越性.最后,通过分析模型构建的时间开销,表明MULTI方法的模型构建开销对开发人员来说处于可接受的范围之内.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号