1.
Clustering with constraints is a powerful method that allows users to specify background knowledge and the expected cluster
properties. Significant work has explored the incorporation of instance-level constraints into non-hierarchical clustering
but not into hierarchical clustering algorithms. In this paper we present a formal complexity analysis of the problem and
show that constraints can be used to improve not only the quality of the resultant dendrogram but also the efficiency of the
algorithms. This is particularly important since many agglomerative-style algorithms have running times that are quadratic
(or faster growing) functions of the number of instances to be clustered. We present several bounds on the improvement in
the running times of algorithms obtainable using constraints.
A preliminary version of this paper appeared as Davidson and Ravi (2005b).
2.
Michel Jambu 《Computers & Geosciences》1981,7(3):297-310
A rapid hierarchical classification program enables the clustering of 5000 elements in only a few minutes of central processor time using an IBM 370/168 computer. The program algorithm, based on the reducibility axiom in graph theory, is related to the criterion of correspondence analysis. Its application to a set of hydrogeological data is described briefly.
3.
《Expert systems with applications》2014,41(17):7671-7677
Hierarchical classification can be seen as a multidimensional classification problem where the objective is to predict a class, or set of classes, according to a taxonomy. There have been different proposals for hierarchical classification, including local and global approaches. Local approaches can suffer from the inconsistency problem, that is, if a local classifier makes a wrong prediction, the error propagates down the hierarchy. Global approaches tend to produce more complex models. In this paper, we propose an alternative approach inspired by multidimensional classification. It starts by building a multi-class classifier for each parent node in the hierarchy. In the classification phase, all the local classifiers are applied simultaneously to each instance, providing a probability for each class in the taxonomy. Then the probability of the subset of classes, for each path in the hierarchy, is obtained by combining the local classifiers' results. The path with the highest probability is returned as the result for all the levels in the hierarchy. As an extension of the proposed method, we also developed a new technique, based on information gain, to classify at different levels in the hierarchy. The proposed method was tested on different hierarchical classification data sets and was compared against state-of-the-art methods, resulting in superior predictive performance and/or efficiency to the other approaches in all the datasets.
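The path-selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the local classifiers' outputs are combined by multiplying probabilities along each root-to-leaf path (the abstract does not state the combination rule), and the names `taxonomy`, `local_probs`, and `best_path` are hypothetical.

```python
def best_path(taxonomy, local_probs, root="root"):
    """Return (path, probability) for the most probable root-to-leaf path.

    taxonomy:    dict mapping a parent node to a list of its children.
    local_probs: dict mapping a parent node to {child: probability}, i.e. the
                 output of the local multi-class classifier at that parent.
    Assumption: path probability is the product of the local probabilities.
    """
    best = ([], 0.0)
    stack = [(root, [], 1.0)]  # (current node, path so far, probability so far)
    while stack:
        node, path, prob = stack.pop()
        children = taxonomy.get(node, [])
        if not children:  # leaf reached: compare this complete path with the best one
            if prob > best[1]:
                best = (path, prob)
            continue
        for child in children:
            stack.append((child, path + [child], prob * local_probs[node][child]))
    return best


# Hypothetical two-level taxonomy and classifier outputs for one instance.
taxonomy = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1", "B2"]}
local_probs = {
    "root": {"A": 0.6, "B": 0.4},
    "A": {"A1": 0.5, "A2": 0.5},
    "B": {"B1": 0.9, "B2": 0.1},
}
path, prob = best_path(taxonomy, local_probs)
```

In this made-up example a greedy top-down classifier would commit to A at the first level (0.6 versus 0.4), whereas scoring whole paths selects B then B1 (about 0.36 versus 0.30 for either path under A), which is the kind of inconsistency the abstract attributes to local approaches.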
4.
Decision trees for hierarchical multi-label classification (Cited by: 3 total; self-citations: 0; citations by others: 3)
Celine Vens Jan Struyf Leander Schietgat Sašo Džeroski Hendrik Blockeel 《Machine Learning》2008,73(2):185-214
Hierarchical multi-label classification (HMC) is a variant of classification where instances may belong to multiple classes
at the same time and these classes are organized in a hierarchy. This article presents several approaches to the induction
of decision trees for HMC, as well as an empirical study of their use in functional genomics. We compare learning a single
HMC tree (which makes predictions for all classes together) to two approaches that learn a set of regular classification trees
(one for each class). The first approach defines an independent single-label classification task for each class (SC). Obviously,
the hierarchy introduces dependencies between the classes. While they are ignored by the first approach, they are exploited
by the second approach, named hierarchical single-label classification (HSC). Depending on the application at hand, the hierarchy
of classes can be such that each class has at most one parent (tree structure) or such that classes may have multiple parents
(DAG structure). The latter case has not been considered before and we show how the HMC and HSC approaches can be modified
to support this setting. We compare the three approaches on 24 yeast data sets using as classification schemes MIPS’s FunCat
(tree structure) and the Gene Ontology (DAG structure). We show that HMC trees outperform HSC and SC trees along three dimensions:
predictive accuracy, model size, and induction time. We conclude that HMC trees should definitely be considered in HMC tasks
where interpretable models are desired.
5.
Alena Lukasov 《Pattern recognition》1979,11(5-6):365-381
In this paper the hierarchical agglomerative clustering procedure with the dissimilarity coefficient D [HACP, D] and the definite hierarchical clustering procedure [DHACP, D] including some of the published hierarchical clustering methods are introduced. The formal definitions of both of them are given, by means of which the properties of these procedures can be investigated. An example of applying the procedure concerns the classification in paleobiology.
6.
In this paper, a hierarchical multi-classification approach using support vector machines (SVM) is proposed for road intersection detection and classification. Our method has two main steps. The first involves road detection. For this purpose, an edge-based approach has been developed using the bird’s eye view image, which is mapped from the perspective view of the road scene. Then, the concept of the vertical spoke is introduced for road boundary form extraction. The second step deals with the problem of road intersection detection and classification. It consists of building a hierarchical SVM classifier of the extracted road forms using the unbalanced decision tree architecture. Several measures are used to evaluate the proposed solution. The obtained results are compared to those of Choi et al. (2007).
7.
《Expert systems with applications》2014,41(14):6075-6085
In classification problems with hierarchical structures of labels, the target function must assign labels that are hierarchically organized, and it can be used either for single-label (one label per instance) or multi-label classification problems (more than one label per instance). In parallel to these developments, the idea of semi-supervised learning has emerged as a solution to the problems found in a standard supervised learning procedure (used in most classification algorithms). It combines labelled and unlabelled data during the training phase. Some semi-supervised methods have been proposed for single-label classification. However, very little work has been done in the context of multi-label hierarchical classification. Therefore, this paper proposes a new method for supervised hierarchical multi-label classification, called HMC-RAkEL. Additionally, we propose the use of semi-supervised learning, specifically self-training, in hierarchical multi-label classification, leading to three new methods, called HMC-SSBR, HMC-SSLP and HMC-SSRAkEL. In order to validate the feasibility of these methods, an empirical analysis is conducted, comparing the proposed methods with their corresponding supervised versions. The main aim of this analysis is to observe whether the semi-supervised methods proposed in this paper perform similarly to the corresponding supervised versions.
8.
Geometrical representation of objects by means of tree models is based on order relations of the interdistances between the individuals that constitute the classifying set as a whole. Therefore, in this paper we show the pseudocontinuity of some classification methods that satisfy certain regularity conditions based on the conservation of the order in hierarchy construction. Finally, we prove that the required regularity conditions are not overly restrictive and are satisfied by, among other methods, single linkage, complete linkage and UPGMA.
9.
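To address the low accuracy of text sentiment classification, a multi-level text sentiment classification method based on a CCA-VSM classifier and KFD is proposed. Canonical correlation analysis is used to reduce the dimensionality of each document's weight feature vector and part-of-speech feature vector, and a vector space model is built on the reduced vector set. A VSM classifier is designed according to the degree of difference between models, and the R models that differ least from the test document are selected as input to the kernel Fisher discriminant, which finally determines the sentiment of the document. Experimental results show that the method achieves higher classification accuracy and faster classification than a traditional support vector machine, and that the weight features and part-of-speech features have a considerable influence on classification accuracy.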
10.
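This paper studies the use of fuzzy cluster analysis to process the large volume of data obtained by the adsorption-wire oil and gas geochemical prospecting method and to identify composite oil and gas anomaly halos. The whole processing method was validated on a known area, and the results agree very closely with the known geology. The method was then applied to data from an unknown area. The entire data processing was carried out on an IBM/PC computer with a BASIC program written by the authors.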
11.
Fingerprint classification is still a challenging problem due to large intra-class variability, small inter-class variability and the presence of noise. To deal with these difficulties, we propose a regularized orientation diffusion model for fingerprint orientation extraction and a hierarchical classifier for fingerprint classification in this paper. The proposed classification algorithm is composed of five cascading stages. The first stage rapidly distinguishes a majority of Arch by using complex filter responses. The second stage distinguishes a majority of Whorl by using core points and ridge line flow classifier. In the third stage, K-NN classifier finds the top two categories by using orientation field and complex filter responses. In the fourth stage, ridge line flow classifier is used to distinguish Loop from other classes except Whorl. SVM is adopted to make the final classification in the last stage. The regularized orientation diffusion model has been evaluated on a web-based automated evaluation system FVC-onGoing, and a promising result is obtained. The classification method has been evaluated on the NIST SD 4. It achieved a classification accuracy of 95.9% for five-class classification and 97.2% for four-class classification without rejection.
12.
K. Ichida 《Computers & Industrial Engineering》1996,31(3-4):933-937
An interval analysis method is described for finding the global maximum of a multimodal multivariable function subject to equality and/or inequality constraints. By discarding subregions where the global solution cannot exist and applying the interval Newton method to solve the Lagrange equation, one can always find the solution with a rigorous error bound. Some numerical examples are given.
13.
Peter de Souza 《Pattern recognition》1982,15(3):193-200
This article is concerned with the problem of labelling an unidentified parameter vector as belonging to one of a number of given classes. A cluster-analytic approach to the design of binary decision trees is discussed, but the major part of the paper is devoted to the construction of binary features and the creation of a binary feature vector as a means of pattern classification. Complete algorithms are described and some worked examples are also presented.
14.
《International Journal of Computer Mathematics》2012,89(13):2887-2902
Taking a satellite module layout design as engineering background, this paper gives constrained test problems for an unequal circle packing whose optimal solutions are all given. Given a circular container D with radius R, the test problem can be constructed in the following steps. First, M=217 circles are packed into D without overlaps by ‘packing with a tangent circle’ to get the values of radii and centroid coordinates of the circles, which are expressed by R. Then the 217 circles are arranged in descending sequence of radius and are divided into 23 groups according to the radius. Finally, seven test problems are constructed according to the circles of q=1, 2, …, 7 groups. The optimal solution to the test problems as well as its optimality and uniqueness proof are also presented. The experimental results show that the test problems can effectively evaluate performances of different evolutionary algorithms.
15.
Image classification is an essential task in content-based image retrieval. However, due to the semantic gap between low-level visual features and high-level semantic concepts, and the diversification of Web images, the performance of traditional classification approaches is far from users’ expectations. In an attempt to reduce the semantic gap and satisfy the urgent requirements for dimensionality reduction, high-quality retrieval results, and batch-based processing, we propose a hierarchical image manifold with novel distance measures for calculation. Assuming that the images in an image set describe the same or similar object but have various scenes, we formulate two kinds of manifolds, object manifold and scene manifold, at different levels of semantic granularity. Object manifold is developed for object-level classification using an algorithm named extended locally linear embedding (ELLE) based on intra- and inter-object difference measures. Scene manifold is built for scene-level classification using an algorithm named locally linear submanifold extraction (LLSE) by combining linear perturbation and region growing. Experimental results show that our method is effective in improving the performance of classifying Web images.
16.
Thea Ghiselli-Crippa Amro El-Jaroudi 《Engineering Applications of Artificial Intelligence》1993,6(6):549-557
This paper describes a fast training algorithm for feedforward neural nets, as applied to a two-layer neural network to classify segments of speech as voiced, unvoiced, or silence. The speech classification method is based on five features computed for each speech segment and used as input to the network. The network weights are trained using a new fast training algorithm which minimizes the total least squares error between the actual output of the network and the corresponding desired output. The iterative training algorithm uses a quasi-Newtonian error-minimization method and employs a positive-definite approximation of the Hessian matrix to quickly converge to a locally optimal set of weights. Convergence is fast, with a local minimum typically reached within ten iterations; in terms of convergence speed, the algorithm compares favorably with other training techniques. When used for voiced-unvoiced-silence classification of speech frames, the network performance compares favorably with current approaches. Moreover, the approach used has the advantage of requiring no assumption of a particular probability distribution for the input features.
17.
This paper presents an analysis of the design of classifiers for use in a hierarchical object recognition approach. In this approach, a cascade of classifiers is arranged in a tree in order to recognize multiple object classes. We are interested in the problem of recognizing multiple patterns as it is closely related to the problem of locating an articulated object. Each different pattern class corresponds to the hand in a different pose, or set of poses. For this problem obtaining labelled training data of the hand in a given pose can be problematic. Given a parametric 3D model, generating training data in the form of example images is cheap, and we demonstrate that it can be used to design classifiers almost as good as those trained using non-synthetic data. We compare a variety of different template-based classifiers and discuss their merits.
18.
Share price trends can be recognized by using data clustering methods. However, the accuracy of these methods may be rather low. This paper presents a novel supervised classification scheme for the recognition and prediction of share price trends. We first produce a smooth time series using zero-phase filtering and singular spectrum analysis from the original share price data. We train pattern classifiers using the classification results of both original and filtered time series and then use these classifiers to predict the future share price trends. Experiment results obtained from both synthetic data and real share prices show that the proposed method is effective and outperforms the well-known K-means clustering algorithm.
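The zero-phase filtering mentioned in this abstract is commonly realized by running a filter forward over the series and then backward over the result, so that the phase delays of the two passes cancel. The sketch below illustrates only that idea. The choice of a first-order exponential smoother, the parameter `alpha`, and the function names are assumptions made for illustration; the abstract does not specify the filter, and the singular spectrum analysis step is not shown.

```python
def exponential_smooth(xs, alpha):
    """One causal pass of a first-order exponential smoother (assumed filter)."""
    out = []
    y = xs[0]  # start from the first sample to limit the start-up transient
    for x in xs:
        y = alpha * x + (1 - alpha) * y
        out.append(y)
    return out


def zero_phase_smooth(xs, alpha=0.3):
    """Forward-backward filtering: smooth forward, then smooth the reversed result."""
    forward = exponential_smooth(xs, alpha)
    backward = exponential_smooth(forward[::-1], alpha)
    return backward[::-1]  # restore the original time order


# Hypothetical price series, for illustration only.
prices = [10.0, 10.4, 10.1, 10.8, 10.6, 11.2, 11.0, 11.5]
smoothed = zero_phase_smooth(prices)
```

A single causal pass would lag behind turning points in the price series; the second, reversed pass shifts the output back by a comparable amount, which matters when the smoothed series is later used to label trends.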
19.
Tang Kai 《Computer Engineering and Applications》2007,43(3):168-172,193
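A document classification method based on the inherent hierarchical structure of XML documents is proposed and compared experimentally with an improved VSM method. Unlike earlier XML classification methods, this method places more emphasis on the structural information specific to XML documents. First, TF-IDF is applied to the non-structural information in the XML documents to produce a general feature set. Weights are then assigned according to the importance of each level of the XML document, producing a hierarchical feature set. Next, a knowledge feature set is produced from domain knowledge. The three feature sets are combined to classify the XML documents. Experimental results show that this method is substantially more accurate than the improved VSM method.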