Similar literature
20 similar documents found (search time: 15 ms)
1.
A local distance measure for the nearest neighbor classification rule is shown to achieve high compression rates and high accuracy on real data sets. In the approach proposed here, first, a set of prototypes is extracted during training and, then, a feedback learning algorithm is used to optimize the metric. Even if the prototypes are randomly selected, the proposed metric outperforms, both in compression rate and accuracy, common editing procedures like ICA, RNN, and PNN. Finally, when accuracy is the major concern, we show how compression can be traded for accuracy by exploiting voting techniques. That indicates how voting can be successfully integrated with instance-based approaches, overcoming previous negative results  相似文献   
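A minimal sketch of the general recipe in this abstract, assuming a diagonal (per-feature) distance metric and a deliberately naive feedback rule; the helper names (`select_prototypes`, `tune_metric`) and the exact update are illustrative, not the paper's prototype extraction, metric parameterization, or voting scheme.

```python
import numpy as np

def select_prototypes(X, y, per_class=3, seed=None):
    """Randomly pick a few prototypes per class (the simplest selection scheme)."""
    rng = np.random.default_rng(seed)
    protos, labels = [], []
    for c in np.unique(y):
        idx_c = np.flatnonzero(y == c)
        idx = rng.choice(idx_c, size=min(per_class, len(idx_c)), replace=False)
        protos.append(X[idx]); labels.append(y[idx])
    return np.vstack(protos), np.concatenate(labels)

def weighted_1nn(x, protos, proto_y, w):
    """1-NN decision under a diagonal (feature-weighted) squared distance."""
    d = ((protos - x) ** 2 * w).sum(axis=1)
    j = int(np.argmin(d))
    return proto_y[j], j

def tune_metric(X, y, protos, proto_y, lr=0.05, epochs=10):
    """Illustrative feedback rule: grow feature weights that support correct
    decisions and shrink those responsible for misclassifications."""
    w = np.ones(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, y):
            pred, j = weighted_1nn(x, protos, proto_y, w)
            diff = (protos[j] - x) ** 2
            w += lr * diff if pred == t else -lr * diff
            w = np.clip(w, 1e-3, None)
    return w
```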

2.
In this paper, we address the problem of image set classification, where each set contains a different number of images acquired from the same subject. In most of the existing literature, each image set is modeled using all of its available samples; as a result, the corresponding time and storage costs are high. To address this problem, we propose a joint prototype and metric learning approach. The prototypes are learned to represent each gallery image set with fewer samples without affecting recognition performance. A Mahalanobis metric is learned simultaneously to measure the similarity between sets more accurately. In particular, each gallery set is represented as a regularized affine hull spanned by the learned prototypes. The set-to-set distance is optimized by updating the prototypes and the Mahalanobis metric in an alternating manner. To highlight the importance of representing image sets with fewer samples, we analyze the corresponding test-time complexity with respect to the number of images used per set. Experimental results on the YouTube Celebrity, YouTube Faces, and ETH-80 datasets illustrate its efficiency on the tasks of video face recognition and object categorization.
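A much-simplified sketch of the two ingredients named above: a reduced prototype set per gallery image set and a Mahalanobis set-to-set distance. It substitutes k-means centers for the learned prototypes and takes a fixed metric matrix `M` as given; the regularized affine hull model and the alternating optimization are not reproduced.

```python
import numpy as np
from sklearn.cluster import KMeans

def set_prototypes(X_set, k=5, seed=0):
    """Compress one image set (n_samples x d features) to at most k prototypes."""
    k = min(k, len(X_set))
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_set).cluster_centers_

def mahalanobis_set_distance(P, Q, M):
    """Smallest pairwise Mahalanobis distance between two prototype sets,
    used here as a stand-in for the hull-to-hull distance."""
    diffs = P[:, None, :] - Q[None, :, :]                # shape (|P|, |Q|, d)
    d2 = np.einsum('ijk,kl,ijl->ij', diffs, M, diffs)
    return float(np.sqrt(d2.min()))
```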

3.
A variant of nearest-neighbor (NN) pattern classification and supervised learning by learning vector quantization (LVQ) is described. The decision surface mapping method (DSM) is a fast supervised learning algorithm and is a member of the LVQ family of algorithms. A relatively small number of prototypes are selected from a training set of correctly classified samples. The training set is then used to adapt these prototypes to map the decision surface separating the classes. This algorithm is compared with NN pattern classification, learning vector quantization, and a two-layer perceptron trained by error backpropagation. When the class boundaries are sharply defined (i.e., no classification error in the training set), the DSM algorithm outperforms these methods with respect to error rates, learning rates, and the number of prototypes required to describe class boundaries.  相似文献   
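A hedged sketch of the decision-surface-mapping idea described above: prototypes drawn from the training set are adapted only when a training sample is misclassified, pushing the offending prototype away and pulling the nearest correct-class prototype toward the sample. The learning rate, decay, and epoch count are illustrative choices, not the paper's.

```python
import numpy as np

def dsm_train(X, y, protos, proto_y, lr=0.3, epochs=20, decay=0.95):
    """DSM-style updates (sketch): adapt prototypes only on misclassification."""
    P = protos.astype(float).copy()
    for _ in range(epochs):
        for x, t in zip(X, y):
            d = ((P - x) ** 2).sum(axis=1)
            win = int(np.argmin(d))
            if proto_y[win] != t:
                # push the winning (wrong-class) prototype away from the sample ...
                P[win] -= lr * (x - P[win])
                # ... and pull the nearest correct-class prototype toward it
                same = np.flatnonzero(proto_y == t)
                j = same[np.argmin(d[same])]
                P[j] += lr * (x - P[j])
        lr *= decay
    return P
```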

4.
Clustering is an important unsupervised learning technique widely used to discover the inherent structure of a given data set. Some existing clustering algorithms use a single prototype to represent each cluster, which may not adequately model clusters of arbitrary shape and size and hence limits clustering performance on complex data structures. This paper proposes a clustering algorithm that represents each cluster by multiple prototypes. Squared-error clustering is used to produce a number of prototypes that locate regions of high density, because of its low computational cost and yet good performance. A separation measure is proposed to evaluate how well two prototypes are separated. Multiple prototypes with small separations are grouped into a given number of clusters by an agglomerative method. New prototypes are iteratively added to improve poor cluster separations. As a result, the proposed algorithm can discover clusters of complex structure with robustness to initial settings. Experimental results on both synthetic and real data sets demonstrate the effectiveness of the proposed clustering algorithm.
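A rough sketch of the two-stage structure just described, assuming scikit-learn is available: squared-error (k-means) clustering produces many prototypes, which are then merged into the requested number of clusters. Ward linkage stands in for the paper's own separation measure, and the iterative prototype-addition step is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

def multi_prototype_clustering(X, n_clusters, n_prototypes=30, seed=0):
    """Stage 1: many squared-error prototypes; stage 2: agglomerative merge."""
    km = KMeans(n_clusters=n_prototypes, n_init=10, random_state=seed).fit(X)
    protos = km.cluster_centers_
    merge = AgglomerativeClustering(n_clusters=n_clusters, linkage='ward').fit(protos)
    # each data point inherits the final cluster of its nearest prototype
    point_labels = merge.labels_[km.labels_]
    return point_labels, protos, merge.labels_
```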

5.
Clustering Incomplete Data Using Kernel-Based Fuzzy C-means Algorithm

6.
A prototype selection algorithm based on natural neighbors and minimum spanning trees
朱庆生  段浪军  杨力军 《计算机科学》2017,44(4):241-245, 268
K-nearest neighbors is one of the most popular supervised classification algorithms. However, traditional KNN has two major problems: the choice of the parameter K, and its excessive time and space complexity on large-scale data sets. To address these problems, a new prototype selection algorithm is proposed that retains the key prototypes contributing most to classification while removing noise points and most of the points that contribute little. Unlike other prototype selection algorithms, it uses the new neighborhood concept of natural neighbors for data preprocessing, and then builds several minimum spanning trees according to a preset termination condition. Based on the minimum spanning trees, boundary prototypes are retained and a number of representative interior prototypes are generated. Experiments on UCI benchmark data sets show that the proposed algorithm effectively reduces the number of prototypes while maintaining the same level of classification accuracy as traditional KNN; moreover, it outperforms other prototype selection algorithms in both classification accuracy and prototype retention rate.
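A simplified sketch of the MST-based selection step, assuming SciPy is available: one MST is built over all the data, endpoints of edges that join different classes are kept as boundary prototypes, and one class-central sample per class stands in for the interior prototypes. The natural-neighbor preprocessing and the per-tree termination rule of the paper are omitted.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_prototypes(X, y):
    """Keep boundary samples (MST edges crossing classes) plus one interior
    representative per class."""
    D = squareform(pdist(X))
    T = minimum_spanning_tree(D).tocoo()
    keep = set()
    for i, j in zip(T.row, T.col):
        if y[i] != y[j]:                      # edge crossing the class boundary
            keep.update((int(i), int(j)))
    for c in np.unique(y):                    # one representative interior point per class
        idx = np.flatnonzero(y == c)
        centre = X[idx].mean(axis=0)
        keep.add(int(idx[np.argmin(((X[idx] - centre) ** 2).sum(axis=1))]))
    keep = sorted(keep)
    return X[keep], y[keep]
```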

7.
Prototype classifiers have been studied for many years. However, few methods can realize incremental learning. On the other hand, most prototype classifiers require users to predetermine the number of prototypes, and an improper prototype number might undermine classification performance. To deal with these issues, in this paper we propose an online supervised algorithm named Incremental Learning Vector Quantization (ILVQ) for classification tasks. The proposed method has three contributions. (1) By designing an insertion policy, ILVQ incrementally learns new prototypes, including both between-class and within-class incremental learning. (2) By employing an adaptive threshold scheme, ILVQ automatically and dynamically learns the number of prototypes needed for each class according to the distribution of the training data. Therefore, unlike most current prototype classifiers, ILVQ needs no prior knowledge of the number of prototypes or their initial values. (3) A technique for removing useless prototypes is used to eliminate noise introduced into the input data. Experimental results show that the proposed ILVQ can accommodate an incremental data environment and provides good recognition performance and storage efficiency.
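A sketch of the ILVQ-style insertion idea under strong simplifying assumptions: a new prototype is created when the nearest prototype has the wrong label or lies beyond an adaptive per-prototype threshold, otherwise the winner is nudged toward the sample. The exact threshold rule and the noise-pruning step of the paper are not reproduced.

```python
import numpy as np

class SimpleILVQ:
    """Incremental prototype learner (illustrative, not the paper's ILVQ)."""
    def __init__(self, lr=0.1):
        self.P, self.y, self.thr = [], [], []
        self.lr = lr

    def partial_fit(self, x, t):
        x = np.asarray(x, dtype=float)
        if not self.P:
            self.P.append(x.copy()); self.y.append(t); self.thr.append(np.inf)
            return
        d = np.array([np.linalg.norm(x - p) for p in self.P])
        w = int(np.argmin(d))
        if self.y[w] != t or d[w] > self.thr[w]:
            # insert a new prototype; its threshold reflects the local scale
            self.P.append(x.copy()); self.y.append(t); self.thr.append(d[w])
        else:
            # move the winning prototype toward the sample and adapt its threshold
            self.P[w] += self.lr * (x - self.P[w])
            base = self.thr[w] if np.isfinite(self.thr[w]) else d[w]
            self.thr[w] = 0.5 * base + 0.5 * d[w]

    def predict(self, x):
        d = [np.linalg.norm(np.asarray(x, dtype=float) - p) for p in self.P]
        return self.y[int(np.argmin(d))]
```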

8.
The traditional K-nearest-neighbor classifier suffers from excessive time and space complexity on large-scale data sets. Prototype selection addresses this by picking representative prototypes (samples) from the original data set for KNN classification without reducing classification accuracy. Building on the CURE clustering algorithm, and addressing CURE's difficulty in identifying noise points and the poor dispersion of its representative points, this paper introduces a denoising method based on a shared-neighbor density measure and improves representative-point selection with the maximum-minimum distance criterion, yielding a new prototype selection algorithm, PSCURE (improved prototype selection algorithm based on the CURE algorithm). Experiments on UCI data sets show that, compared with related prototype algorithms, PSCURE not only selects fewer prototypes but also achieves higher classification accuracy.
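A sketch of the two PSCURE ingredients named above, with substitutions clearly made: the mean k-nearest-neighbor distance stands in for the shared-neighbor density when dropping noise points, and the classic farthest-first (max-min distance) rule picks scattered representatives per class. Function names and parameters are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def maxmin_representatives(X, n_rep):
    """Farthest-first (max-min distance) selection of well-scattered points."""
    reps = [0]
    d = np.linalg.norm(X - X[0], axis=1)
    for _ in range(1, min(n_rep, len(X))):
        nxt = int(np.argmax(d))
        reps.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return reps

def pscure_like_prototypes(X, y, k=10, n_rep=5, density_quantile=0.1):
    """Denoise by a kNN-distance density proxy, then pick scattered prototypes."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, _ = nn.kneighbors(X)
    density = 1.0 / (dist[:, 1:].mean(axis=1) + 1e-12)
    keep = density >= np.quantile(density, density_quantile)   # drop sparsest points
    Xc, yc = X[keep], y[keep]
    protos, labels = [], []
    for c in np.unique(yc):
        idx = np.flatnonzero(yc == c)
        reps = maxmin_representatives(Xc[idx], n_rep)
        protos.append(Xc[idx][reps]); labels.append(yc[idx][reps])
    return np.vstack(protos), np.concatenate(labels)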

9.
We propose two new comprehensive schemes for designing prototype-based classifiers. The scheme addresses all major issues (number of prototypes, generation of prototypes, and utilization of the prototypes) involved in the design of a prototype-based classifier. First we use Kohonen's self-organizing feature map (SOFM) algorithm to produce a minimum number (equal to the number of classes) of initial prototypes. Then we use a dynamic prototype generation and tuning algorithm (DYNAGEN) involving merging, splitting, deleting, and retraining of the prototypes to generate an adequate number of useful prototypes. These prototypes are used to design a "1 nearest multiple prototype (1-NMP)" classifier. Though the classifier performs quite well, it cannot reasonably deal with large variation of variance among the data from different classes. To overcome this deficiency we design a "1 most similar prototype (1-MSP)" classifier. We use the prototypes generated by the SOFM-based DYNAGEN algorithm and associate with each of them a zone of influence. A norm (Euclidean)-induced similarity measure is used for this. The prototypes and their zones of influence are fine-tuned by minimizing an error function. Both classifiers are trained and tested using several data sets, and a consistent improvement in performance of the latter over the former has been observed. We also compared our classifiers with some benchmark results available in the literature.  相似文献   

10.
In this paper, we present a new geodesic distance transform that uses a non-Euclidean metric suitable for non-convex discrete 2D domains. The geodesic metric used is defined as the shortest path length through a set of pixels called Locally Nearest Hidden Pixels, and manages visibility zones using bounding angles. The algorithm is designed using ordered propagation, which makes it extremely efficient and linear in the number of pixels in the domain. We have compared our algorithm with the four most similar geodesic distance transform techniques, and we show that our approach has higher accuracy and lower computational complexity.  相似文献   
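For context, a plain ordered-propagation (Dijkstra-style) geodesic distance transform on a binary 2D domain; this is the textbook baseline the abstract improves upon, not the paper's Locally Nearest Hidden Pixels metric or its bounding-angle visibility handling.

```python
import heapq
import numpy as np

def geodesic_distance(mask, seeds):
    """Geodesic distance inside a non-convex domain (mask==True) from seed pixels,
    propagated in increasing-distance order with 8-connected steps."""
    H, W = mask.shape
    dist = np.full((H, W), np.inf)
    heap = []
    for r, c in seeds:
        dist[r, c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    steps = [(-1, 0, 1.0), (1, 0, 1.0), (0, -1, 1.0), (0, 1, 1.0),
             (-1, -1, 2 ** 0.5), (-1, 1, 2 ** 0.5), (1, -1, 2 ** 0.5), (1, 1, 2 ** 0.5)]
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r, c]:
            continue                        # stale queue entry
        for dr, dc, w in steps:
            nr, nc = r + dr, c + dc
            if 0 <= nr < H and 0 <= nc < W and mask[nr, nc] and d + w < dist[nr, nc]:
                dist[nr, nc] = d + w
                heapq.heappush(heap, (d + w, nr, nc))
    return dist
```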

11.
A polyclonal-based evolutionary immune network clustering algorithm
Traditional clustering algorithms are sensitive to initial values, easily trapped in local minima, and rely heavily on prior knowledge of the number of clusters and the cluster prototypes. To address these problems, a polyclonal-based evolutionary immune network clustering algorithm is proposed. The algorithm uses polyclonal operators to increase population diversity and widen the search over the solution space, and applies a taboo clone operation to suppress antibodies lying on fuzzy boundaries, which improves clustering precision. Simulation experiments show that, when clustering data with mixed numerical and categorical attributes or data with fuzzy boundaries, the algorithm converges quickly and does not depend on the choice of initial prototypes.

12.
Multiresolution algorithms based on a hybrid strategy are widely used for rigid registration of 3D medical images, but the hybridization is usually limited to the optimization algorithms. By studying how different resolutions affect registration based on first-order mutual information (usually simply called mutual information) and second-order mutual information, an improved algorithm is proposed that hybridizes both the optimization algorithms and the similarity measures, with each level of a two-level multiresolution strategy using the measure better suited to it. Experiments show that the improved algorithm reaches sub-voxel registration accuracy and clearly outperforms algorithms based on a single measure; it is far faster than the algorithm based solely on second-order mutual information and only slightly slower than the one based solely on first-order mutual information.
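A small illustration of the first-order mutual information measure referred to above, estimated from a joint intensity histogram; this is the quantity a registration optimizer would maximize over rigid transforms. The transform model, interpolation, optimizer, and multiresolution pyramid are not shown.

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """First-order mutual information of two equally shaped image volumes."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)       # marginal of image a
    py = pxy.sum(axis=0, keepdims=True)       # marginal of image b
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
```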

13.
14.
This paper deals with the task of finding a set of prototypes from the training set. A reduced set is obtained which is used instead of the training set when nearest neighbour classification is used. Prototypes are added in an incremental fashion, where at each step of the algorithm, the number of prototypes selected keeps on increasing. The number of patterns in the training data classified correctly also keeps on increasing till all patterns are classified properly. After this, a deletion operator is used where some prototypes which are not so useful are removed. This method has been used to obtain the prototypes for a variety of benchmark data sets and results have been presented.  相似文献   
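A condensed-nearest-neighbour-style sketch of the add-then-delete scheme described above: misclassified training points are added as prototypes until the whole training set is classified correctly, after which prototypes whose removal does not hurt training accuracy are dropped. The paper's actual addition and deletion operators may differ.

```python
import numpy as np

def nn_predict(x, P, Py):
    """Label of the nearest prototype."""
    return Py[int(np.argmin(((P - x) ** 2).sum(axis=1)))]

def incremental_prototypes(X, y):
    keep = [0]
    changed = True
    while changed:                      # incremental addition phase
        changed = False
        for i, (x, t) in enumerate(zip(X, y)):
            if nn_predict(x, X[keep], y[keep]) != t:
                keep.append(i)
                changed = True
    for j in list(keep):                # deletion phase
        trial = [i for i in keep if i != j]
        if len(np.unique(y[trial])) == len(np.unique(y)) and \
           all(nn_predict(x, X[trial], y[trial]) == t for x, t in zip(X, y)):
            keep = trial
    return X[keep], y[keep]
```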

15.
Self-splitting competitive learning: a new on-line clustering paradigm
Clustering in the neural-network literature is generally based on the competitive learning paradigm. The paper addresses two major issues associated with conventional competitive learning, namely, sensitivity to initialization and difficulty in determining the number of prototypes. In general, selecting the appropriate number of prototypes is a difficult task, as we do not usually know the number of clusters in the input data a priori. It is therefore desirable to develop an algorithm that has no dependency on the initial prototype locations and is able to adaptively generate prototypes to fit the input data patterns. We present a new, more powerful competitive learning algorithm, self-splitting competitive learning (SSCL), that is able to find the natural number of clusters based on the one-prototype-take-one-cluster (OPTOC) paradigm and a self-splitting validity measure. It starts with a single prototype randomly initialized in the feature space and splits adaptively during the learning process until all clusters are found; each cluster is associated with a prototype at its center. We have conducted extensive experiments to demonstrate the effectiveness of the SSCL algorithm. The results show that SSCL has the desired ability for a variety of applications, including unsupervised classification, curve detection, and image segmentation.  相似文献   
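A bare skeleton of the self-splitting idea under simplifying assumptions: start from one randomly placed prototype, run competitive learning, then split the prototype with the largest distortion until the relative improvement stalls. The OPTOC asymptotic property vector and the paper's split-validity measure are not reproduced.

```python
import numpy as np

def split_until_fit(X, max_protos=10, epochs=30, lr=0.05, tol=0.05, seed=0):
    """Grow prototypes by repeated competitive learning + splitting."""
    rng = np.random.default_rng(seed)
    P = X[rng.integers(len(X))][None, :].astype(float)
    prev = np.inf
    while len(P) <= max_protos:
        for _ in range(epochs):                       # competitive learning pass
            for x in X[rng.permutation(len(X))]:
                w = int(np.argmin(((P - x) ** 2).sum(axis=1)))
                P[w] += lr * (x - P[w])
        assign = np.argmin(((X[:, None, :] - P[None]) ** 2).sum(-1), axis=1)
        distortion = np.array([((X[assign == k] - P[k]) ** 2).sum() for k in range(len(P))])
        total = distortion.sum()
        if prev - total < tol * prev or len(P) == max_protos:
            break                                     # no worthwhile improvement left
        prev = total
        k = int(np.argmax(distortion))                # split the worst prototype
        P = np.vstack([P, P[k] + 1e-2 * rng.standard_normal(P.shape[1])])
    return P
```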

16.
The self-organizing map (SOM) and neural gas (NG) and generalizations thereof such as the generative topographic map constitute popular algorithms to represent data by means of prototypes arranged on a (hopefully) topology representing map. Most standard methods rely on the Euclidean metric, hence the resulting clusters tend to have isotropic form and they cannot account for local distortions or correlations of data. For this reason, several proposals exist in the literature which extend prototype-based clustering towards more general models which, for example, incorporate local principal directions into the winner computation. This allows to represent data faithfully using less prototypes. In this contribution, we establish a link of models which rely on local principal components (PCA), matrix learning, and a formal cost function of NG and SOM which allows to show convergence of the algorithm. For this purpose, we consider an extension of prototype-based clustering algorithms such as NG and SOM towards a more general metric which is given by a full adaptive matrix such that ellipsoidal clusters are accounted for. The approach is derived from a natural extension of the standard cost functions of NG and SOM (in the form of Heskes). We obtain batch optimization learning rules for prototype and matrix adaptation based on these generalized cost functions and we show convergence of the algorithm. The batch optimization schemes can be interpreted as local principal component analysis (PCA) and the local eigenvectors correspond to the main axes of the ellipsoidal clusters. Thus, this approach provides a cost function associated to proposals in the literature which combine SOM or NG with local PCA models. We demonstrate the behavior of matrix NG and SOM in several benchmark examples and in an application to image compression.  相似文献   

17.
In this paper, we focus on the study of evolutionary algorithms for solving multiobjective optimization problems with a large number of objectives. First, a comparative study of a newly developed dynamical multiobjective evolutionary algorithm (DMOEA) and some modern algorithms, such as the indicator-based evolutionary algorithm, multiple single objective Pareto sampling, and nondominated sorting genetic algorithm II, is presented by employing the convergence metric and relative hypervolume metric. For three scalable test problems (namely, DTLZ1, DTLZ2, and DTLZ6), which represent some of the most difficult problems studied in the literature, the DMOEA shows good performance in both converging to the true Pareto-optimal front and maintaining a widely distributed set of solutions. Second, a new definition of optimality (namely, L-optimality) is proposed in this paper, which not only takes into account the number of improved objective values but also considers the values of improved objective functions if all objectives have the same importance. We prove that L-optimal solutions are subsets of Pareto-optimal solutions. Finally, the new algorithm based on L-optimality (namely, MDMOEA) is developed, and simulation and comparative results indicate that well-distributed L-optimal solutions can be obtained by utilizing the MDMOEA but cannot be achieved by applying L-optimality to make a posteriori selection within the huge Pareto nondominated solutions. We can conclude that our new algorithm is suitable to tackle many-objective problems.   相似文献   

18.
In clustering algorithms, it is usually assumed that the number of clusters is known or given. In the absence of such a priori information, a procedure is needed to find an appropriate number of clusters. This paper presents a clustering algorithm that incorporates a mechanism for finding the appropriate number of clusters as well as the locations of cluster prototypes. This algorithm, called multi-scale clustering, is based on scale-space theory by considering that any prominent data structure ought to survive over many scales. The number of clusters as well as the locations of cluster prototypes are found in an objective manner by defining and using lifetime and drift speed clustering criteria. The outcome of this algorithm does not depend on the initial prototype locations that affect the outcome of many clustering algorithms. As an application of this algorithm, it is used to enhance the Hough transform technique.  相似文献   
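A toy scale-space illustration of the lifetime criterion described above, restricted to 1-D data for brevity: a histogram density is smoothed at many scales, the number of modes is counted at each scale, and the mode count that survives over the most scales suggests the number of clusters. The paper's drift-speed criterion and prototype locations are not reproduced.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def mode_lifetimes(x, scales=np.linspace(0.5, 20, 60), bins=256):
    """Return how many scales each mode count survives, plus the per-scale counts."""
    hist, _ = np.histogram(x, bins=bins, density=True)
    counts = []
    for s in scales:
        sm = gaussian_filter1d(hist, sigma=s)
        modes = int(((sm[1:-1] > sm[:-2]) & (sm[1:-1] > sm[2:])).sum())
        counts.append(modes)
    lifetime = {k: counts.count(k) for k in set(counts)}
    return lifetime, counts
```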

19.
张伟  曾瑞弼  胡明晓 《计算机应用》2012,32(4):1116-1118
For weighted undirected graphs whose drawings must use edge length to reflect edge weight, a genetic-algorithm-based drawing algorithm is proposed. Ideal node coordinates are obtained by applying crossover and mutation to an encoding of the vertex coordinates; the mutation operator combines non-uniform mutation with single-point neighborhood mutation, and the fitness function uses four aesthetic criteria: average vertex distance, number of edge crossings, uniformity of the angles between edges incident to high-degree vertices, and consistency of the edge weight-to-length ratio. Experimental results show that the drawings produced have no edge crossings, clear branches, and edge lengths consistent with the weights, yielding clear, visually pleasing output that directly reflects the weights; the algorithm can be applied in the design of visualization systems for weighted undirected graphs.
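A hedged sketch of a layout fitness function using only two of the four aesthetic criteria named above (edge crossings and weight-to-length consistency); the GA loop itself (selection, crossover, non-uniform and neighborhood mutation) and the other two criteria are omitted, and the weighting parameters are illustrative.

```python
import numpy as np
from itertools import combinations

def _crosses(p1, p2, p3, p4):
    """True if segments p1-p2 and p3-p4 properly intersect."""
    def ccw(a, b, c):
        return (c[1] - a[1]) * (b[0] - a[0]) - (b[1] - a[1]) * (c[0] - a[0])
    d1, d2 = ccw(p3, p4, p1), ccw(p3, p4, p2)
    d3, d4 = ccw(p1, p2, p3), ccw(p1, p2, p4)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

def layout_fitness(coords, edges, weights, alpha=1.0, beta=1.0):
    """Penalize edge crossings and deviation of the edge length/weight ratio."""
    crossings = sum(
        _crosses(coords[a], coords[b], coords[c], coords[d])
        for (a, b), (c, d) in combinations(edges, 2)
        if len({a, b, c, d}) == 4)                     # skip edges sharing a vertex
    lengths = np.array([np.linalg.norm(coords[a] - coords[b]) for a, b in edges])
    ratio = lengths / np.asarray(weights, dtype=float)
    consistency = ratio.std() / (ratio.mean() + 1e-12)
    return -(alpha * crossings + beta * consistency)   # higher is better for a GA
```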

20.
P.A.  M.  D.K.   《Pattern recognition》2006,39(12):2344-2355
Hybrid hierarchical clustering techniques which combine the characteristics of different partitional clustering techniques or partitional and hierarchical clustering techniques are interesting. In this paper, efficient bottom-up hybrid hierarchical clustering (BHHC) techniques have been proposed for the purpose of prototype selection for protein sequence classification. In the first stage, an incremental partitional clustering technique such as leader algorithm (ordered leader no update (OLNU) method) which requires only one database (db) scan is used to find a set of subcluster representatives. In the second stage, either a hierarchical agglomerative clustering (HAC) scheme or a partitional clustering algorithm—‘K-medians’ is used on these subcluster representatives to obtain a required number of clusters. Thus, this hybrid scheme is scalable and hence would be suitable for clustering large data sets and we also get a hierarchical structure consisting of clusters and subclusters and the representatives of which are used for pattern classification. Even if more number of prototypes are generated, classification time does not increase much as only a part of the hierarchical structure is searched. The experimental results (classification accuracy (CA) using the prototypes obtained and the computation time) of the proposed algorithms are compared with that of the hierarchical agglomerative schemes, K-medians and nearest neighbour classifier (NNC) methods. The proposed methods are found to be computationally efficient with reasonably good CA.  相似文献   
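A minimal sketch of the first-stage leader scan described in the abstract (ordered leader, no update): a point joins the first existing leader within a distance threshold, otherwise it becomes a new leader, so a single pass over the database suffices. The threshold value is an assumption, and the second stage (HAC or K-medians on the leaders) is left out.

```python
import numpy as np

def leader_clustering(X, threshold):
    """One-pass leader algorithm; leaders are never updated after creation."""
    leaders, members = [], []
    for i, x in enumerate(X):
        for j, l in enumerate(leaders):
            if np.linalg.norm(x - l) <= threshold:
                members[j].append(i)        # join the first sufficiently close leader
                break
        else:
            leaders.append(x.copy())        # otherwise become a new leader
            members.append([i])
    return np.array(leaders), members
```

The resulting leaders (subcluster representatives) would then be fed to an agglomerative scheme or K-medians to obtain the required number of clusters, as the abstract describes.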
