Similar Articles
A total of 20 similar articles were retrieved (search time: 31 ms)
1.
A Prototype Selection Algorithm Based on Natural Neighbors and Minimum Spanning Trees
朱庆生  段浪军  杨力军 《计算机科学》2017,44(4):241-245, 268
K-nearest neighbors (KNN) is one of the most popular supervised classification algorithms. However, traditional KNN has two main problems: the choice of the parameter K, and excessive time and space complexity on large-scale data sets. To address these problems, a new prototype selection algorithm is proposed that retains key prototype points contributing strongly to classification while removing noise points and most of the points that contribute little. Unlike other prototype selection algorithms, the proposed algorithm uses the new neighbor concept of natural neighbors for data preprocessing, and then builds several minimum spanning trees subject to a preset termination condition. Based on the minimum spanning trees, boundary prototypes are retained and a number of representative interior prototypes are generated. Experiments on UCI benchmark data sets show that the proposed algorithm effectively reduces the number of prototypes while maintaining classification accuracy at the same level as traditional KNN; moreover, it outperforms other prototype selection algorithms in both classification accuracy and prototype retention rate.
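The traditional KNN rule that this work builds on can be sketched in a few lines (an illustrative baseline only, not the paper's algorithm; all names and the toy data are ours):

```python
import math
from collections import Counter

def knn_classify(prototypes, labels, query, k=3):
    """Classify `query` by majority vote among the k nearest prototypes."""
    order = sorted(
        range(len(prototypes)),
        key=lambda i: math.dist(prototypes[i], query),
    )
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters on a line.
protos = [(0.0,), (0.5,), (5.0,), (5.5,)]
labs = ["a", "a", "b", "b"]
print(knn_classify(protos, labs, (0.2,), k=3))  # "a"
```

Prototype selection keeps `protos` small so that the sort above stays cheap on large data sets.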

2.
To address the excessive time and space complexity of the traditional K-nearest neighbor classifier on large-scale data sets, prototype selection can be applied: representative prototypes (samples) are chosen from the original data set for KNN classification without reducing classification accuracy. Building on the CURE clustering algorithm, and targeting CURE's difficulty in identifying noise points and the poor dispersion of its representative points, this paper introduces a denoising method based on a shared-neighbor density measure and improves representative-point selection using the maximum-minimum distance criterion, yielding a new prototype selection algorithm, PSCURE (improved prototype selection algorithm based on the CURE algorithm). Experiments on UCI data sets show that, compared with related prototype algorithms, PSCURE not only selects fewer prototypes but also achieves higher classification accuracy.

3.
Prototype classifiers are a type of pattern classifier in which a number of prototypes are designed for each class so that they act as representatives of the patterns of that class. Prototype classifiers are considered among the simplest and best performers in classification problems. However, they need careful positioning of prototypes to capture the distribution of each class region and/or to define the class boundaries. Standard methods, such as learning vector quantization (LVQ), are sensitive to the initial choice of the number and locations of the prototypes and to the learning rate. In this article, a new prototype classification method is proposed, namely self-generating prototypes (SGP). The main advantage of this method is that both the number of prototypes and their locations are learned from the training set without much human intervention. The proposed method is compared with other prototype classifiers such as LVQ, the self-generating neural tree (SGNT) and K-nearest neighbor (K-NN), as well as Gaussian mixture model (GMM) classifiers. In our experiments, SGP achieved the best performance on many measures, such as training speed and test or classification speed. Concerning the number of prototypes and test classification accuracy, it was considerably better than the other methods and about equal on average to the GMM classifiers. We also applied the SGP method to the well-known STATLOG benchmark, where it beat all 21 other methods (prototype and non-prototype methods) in classification accuracy.

4.
Representative point selection is an important part of data preprocessing for data mining and pattern recognition, and an important way to improve the classification accuracy and execution efficiency of classifiers. This paper proposes a voting-based representative point selection algorithm. The algorithm places the selected representative points as close as possible to class boundaries, and the voting mechanism readily excludes outliers and reduces the data volume, which helps improve the accuracy and efficiency of the nearest neighbor classifier. Experimental comparisons with several classical representative point selection algorithms show that the proposed voting-based algorithm has advantages in both nearest neighbor classification accuracy and data reduction rate.

5.
Prototype generation deals with the problem of generating a small set of instances, from a large data set, to be used by KNN for classification. The two key aspects to consider when developing a prototype generation method are: (1) the generalization performance of a KNN classifier when using the prototypes; and (2) the amount of data set reduction, as given by the number of prototypes. Both factors are in conflict because, in general, maximizing data set reduction implies decreasing accuracy and vice versa. Therefore, this problem can be naturally approached with multi-objective optimization techniques. This paper introduces a novel multi-objective evolutionary algorithm for prototype generation where the objectives are precisely the amount of reduction and an estimate of the generalization performance achieved by the selected prototypes. Through a comprehensive experimental study we show that the proposed approach outperforms most of the prototype generation methods proposed so far. Specifically, the proposed approach obtains prototypes that offer a better tradeoff between accuracy and reduction than alternative methodologies.
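The two competing objectives can be computed for any candidate prototype subset; a minimal sketch (1-D toy data, with 1-NN resubstitution accuracy standing in for the paper's generalization estimate, which is a simplification):

```python
def objectives(train, labels, subset_idx):
    """Return (reduction, accuracy) for a candidate prototype subset.

    Accuracy is 1-NN accuracy of the subset measured back on the full
    training set; reduction is the fraction of instances discarded.
    """
    protos = [train[i] for i in subset_idx]
    plabs = [labels[i] for i in subset_idx]
    correct = 0
    for x, y in zip(train, labels):
        nearest = min(range(len(protos)), key=lambda j: abs(protos[j] - x))
        correct += plabs[nearest] == y
    reduction = 1 - len(subset_idx) / len(train)
    return reduction, correct / len(train)

train = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4]
labels = ["a"] * 3 + ["b"] * 3
print(objectives(train, labels, [1, 4]))  # high reduction, accuracy 1.0
```

A multi-objective evolutionary algorithm would search over `subset_idx` choices, keeping the Pareto front of (reduction, accuracy) pairs rather than a single winner.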

6.
The k-nearest neighbour rule is commonly considered for classification tasks given its straightforward implementation and good performance in many applications. However, its efficiency is an obstacle in real-world scenarios because classification requires computing a distance to every single prototype of the training set. Prototype Selection (PS) is a typical approach to alleviating this problem, which focuses on reducing the size of the training set by selecting the most interesting prototypes. In this context, rank methods have been postulated as a good solution: following some heuristics, these methods order the prototypes according to their relevance to the classification task, and this ordering is then used to select the most relevant ones. This work presents a significant improvement over existing rank methods by proposing two extensions: (i) greater robustness against label-level noise by considering the parameter 'k' of the classification in the selection process; and (ii) a new parameter-free rule to select the prototypes once they have been ordered. Experiments performed on different scenarios and datasets demonstrate the effectiveness of these extensions. It is also shown empirically that the new full approach is competitive with existing PS algorithms.
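The general rank-then-select scheme that such methods follow can be sketched as below (the relevance heuristic here is a deliberately crude stand-in, not the paper's rules; names and data are ours):

```python
import math

def rank_prototypes(points, labels, k=2):
    """Order prototype indices most-relevant first.

    Crude relevance heuristic: the fraction of a prototype's k nearest
    neighbours that share its label, so isolated/noisy points rank last.
    """
    def score(i):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[j], points[i]),
        )
        return sum(labels[j] == labels[i] for j in others[:k]) / k
    return sorted(range(len(points)), key=score, reverse=True)

# Index 5 is a noisy "b" planted inside the "a" cluster.
pts = [(0.0,), (0.4,), (0.8,), (5.0,), (5.4,), (0.2,)]
labs = ["a", "a", "a", "b", "b", "b"]
print(rank_prototypes(pts, labs, k=2)[-1])  # 5: the noisy point ranks last
```

A selection rule then keeps a prefix of this ordering; the paper's contribution is making that cut-off parameter-free.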

7.
A conventional way to discriminate between objects represented by dissimilarities is the nearest neighbor method. A more efficient, and sometimes more accurate, solution is offered by other dissimilarity-based classifiers. They construct a decision rule based on the entire training set, but need just a small set of prototypes, the so-called representation set, as a reference for classifying new objects. Such alternative approaches may be especially advantageous for non-Euclidean or even non-metric dissimilarities. The choice of a proper representation set for dissimilarity-based classifiers has not yet been fully investigated. It appears that a random selection may work well. In this paper, a number of experiments have been conducted on various metric and non-metric dissimilarity representations and prototype selection methods. Several procedures, such as traditional feature selection methods (here effectively searching for prototypes), mode seeking and linear programming, are compared to random selection. In general, we find that systematic approaches lead to better results than random selection, especially for a small number of prototypes. Although there is no single winner, as performance depends on data characteristics, k-centres generally works well. For two-class problems, an important observation is that our dissimilarity-based discrimination functions relying on significantly reduced prototype sets (3-10% of the training objects) offer similar or much better classification accuracy than the best k-NN rule on the entire training set. This may be reached for multi-class data as well; however, such problems are more difficult.

8.
In recent years, heuristic algorithms have been successfully applied to solve clustering and classification problems. In this paper, the gravitational search algorithm (GSA), one of the newest swarm-based heuristic algorithms, is used to build a prototype classifier for the classification of instances in multi-class data sets. The proposed method employs GSA as a global searcher to find the best positions of the representatives (prototypes). The proposed GSA-based classifier is applied to several well-known benchmark sets. Its performance is compared with the artificial bee colony (ABC), particle swarm optimization (PSO), and nine other classifiers from the literature. The experimental results on twelve data sets from the UCI machine learning repository confirm that the GSA can successfully be applied as a classifier to classification problems.

9.
In this paper, we propose a prototype classification method that employs a learning process to determine both the number and the location of prototypes. This learning process decides whether to stop adding prototypes according to a certain termination condition, and also adjusts the location of prototypes using either the K-means (KM) or the fuzzy c-means (FCM) clustering algorithms. When the prototype classification method is applied, the support vector machine (SVM) method can be used to post-process the top-rank candidates obtained during the prototype learning or matching process. We apply this hybrid solution to handwriting recognition and address the convergence behavior and runtime consumption of the prototype construction process, and discuss how to combine our prototype classifier with SVM classifiers to form an effective hybrid classifier.

10.
Evolutionary approaches are among the most accurate prototype selection algorithms, the preprocessing techniques that select a subset of instances from the data before applying nearest neighbor classification. These algorithms achieve very high accuracy and reduction rates, but unfortunately at a substantial computational cost. In this paper, we introduce a framework that efficiently uses the intermediary results of prototype selection algorithms to further increase their accuracy. Instead of using only the fittest prototype subset generated by the evolutionary algorithm, we use multiple prototype subsets in an ensemble setting. Secondly, to classify a test instance, we use only those prototype subsets that accurately classify training instances in the neighborhood of that test instance. In an experimental evaluation, we apply our new framework to four state-of-the-art prototype selection algorithms and show that, by using our framework, more accurate results are obtained after fewer evaluations of the prototype selection method. We also present a case study with a prototype generation algorithm, showing that our framework easily extends to other preprocessing paradigms as well.
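The core ensemble idea, voting over several prototype subsets instead of only the fittest one, can be sketched as follows (illustrative only; the neighborhood-based restriction of subsets described in the abstract is omitted, and all names and data are ours):

```python
from collections import Counter

def nn_rule(subset, x):
    """1-NN over a subset of (value, label) prototypes."""
    return min(subset, key=lambda p: abs(p[0] - x))[1]

def ensemble_classify(subsets, classify_fn, x):
    """Majority vote over several prototype subsets produced during evolution."""
    votes = Counter(classify_fn(subset, x) for subset in subsets)
    return votes.most_common(1)[0][0]

subsets = [
    [(0.0, "a"), (5.0, "b")],
    [(0.3, "a"), (4.8, "b")],
    [(2.9, "a"), (3.0, "b")],  # a poorer intermediate subset
]
print(ensemble_classify(subsets, nn_rule, 1.0))  # "a"
```

The vote lets good intermediate subsets compensate for a weak individual one without extra fitness evaluations.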

11.
黄云  洪佳明  覃遵跃 《计算机工程》2012,38(19):167-169,174
Representative point selection is an effective way to reduce the size of a data set and can improve classification accuracy and execution efficiency. To this end, by introducing the concept of classification confidence entropy, a fitness evaluation function is proposed to assess the quality of selected representative points and thereby find the optimal representative set. The method can be combined with other representative point selection methods to obtain better-performing ones. Experimental comparisons with several classical representative point selection methods show that the confidence-based method has advantages in classification accuracy and data reduction rate.

12.
Local feature weighting in nearest prototype classification
The distance metric is the cornerstone of nearest neighbor (NN)-based methods, and therefore of nearest prototype (NP) algorithms, because they classify according to the similarity of the data. When the data are characterized by a set of features that may contribute to the classification task to different degrees, feature weighting or selection is required, sometimes in a local sense. However, local weighting is typically restricted to NN approaches. In this paper, we introduce local feature weighting (LFW) in NP classification. LFW provides each prototype with its own weight vector, in contrast to the typical global weighting methods found in the NP literature, where all the prototypes share the same one. Providing each prototype with its own weight vector has a novel effect on the borders of the generated Voronoi regions: they become nonlinear. We have integrated LFW with a previously developed evolutionary nearest prototype classifier (ENPC). Experiments on both artificial and real data sets demonstrate that the resulting algorithm, which we call LFW in nearest prototype classification (LFW-NPC), avoids overfitting the training data in domains where the features may contribute differently to the classification task in different areas of the feature space. This generalization capability is also reflected in automatically obtaining an accurate and reduced set of prototypes.
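The per-prototype weighting idea can be illustrated with a tiny sketch (not the ENPC/LFW-NPC algorithm itself; weights here are hand-picked and all names are ours):

```python
import math

def weighted_dist(x, prototype, weights):
    """Distance in which each prototype carries its own feature weights."""
    return math.sqrt(
        sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, prototype))
    )

def classify(x, prototypes):
    """prototypes: list of (vector, weights, label); nearest weighted prototype wins.

    Because each prototype uses a different metric, the decision boundary
    between two prototypes is generally curved rather than a hyperplane.
    """
    return min(prototypes, key=lambda p: weighted_dist(x, p[0], p[1]))[2]

protos = [
    ((0.0, 0.0), (1.0, 0.1), "a"),  # this prototype nearly ignores feature 2
    ((4.0, 0.0), (1.0, 1.0), "b"),
]
print(classify((1.0, 3.0), protos))  # "a"
```

With a single shared weight vector the same point could fall on the other side of the boundary, which is the effect global weighting cannot express.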

13.
Evolutionary algorithms are adaptive methods based on natural evolution that may be used for search and optimization. As data reduction in knowledge discovery in databases (KDD) can be viewed as a search problem, it can be solved using evolutionary algorithms (EAs). In this paper, we have carried out an empirical study of the performance of four representative EA models, taking into account two different instance selection perspectives: prototype selection and training set selection for data reduction in KDD. The paper includes a comparison between these algorithms and other non-evolutionary instance selection algorithms. The results show that the evolutionary instance selection algorithms consistently outperform the non-evolutionary ones, the main advantages being better instance reduction rates, higher classification accuracy, and models that are easier to interpret.

14.
Prototype-based classification relies on the distances between the examples to be classified and carefully chosen prototypes. A small set of prototypes is of interest to keep the computational complexity low while maintaining high classification accuracy. An experimental study of some old and new prototype optimisation techniques is presented, in which the prototypes are either selected or generated from the given data. These condensing techniques are evaluated on real data, represented in vector spaces, by comparing their resulting reduction rates and classification performance. Usually the determination of prototypes is studied in relation to the nearest neighbour rule. We show that the use of more general dissimilarity-based classifiers can be more beneficial. An important point in our study is that the adaptive condensing schemes discussed here allow the user to choose the number of prototypes freely according to need. When such techniques are combined with linear dissimilarity-based classifiers, they provide the best trade-off between small condensed sets and high classification accuracy.

15.
We propose a framework for learning good prototypes, called prototype generation and filtering (PGF), which integrates the strengths of instance-filtering and instance-abstraction techniques using two different integration methods. The two integration methods differ in filtering granularity as well as in the degree of coupling of the techniques. To characterize the effect of integration, we categorize instance-filtering techniques into three kinds: (1) removing border instances, (2) retaining border instances, and (3) retaining center instances. The effects of using different kinds of filtering in different variants of our PGF framework are investigated. We have conducted experiments on 35 real-world benchmark data sets. We found that our PGF framework maintains or improves classification accuracy and gains a significant improvement in data reduction compared with pure filtering and pure abstraction techniques, as well as with KNN and C4.5.

16.
The prototype selection problem consists of reducing the size of databases by removing samples that are considered noisy or not influential in nearest neighbour classification tasks. Evolutionary algorithms have recently been used for prototype selection, showing good results. However, due to the complexity of this problem as database size increases, the behaviour of evolutionary algorithms may deteriorate considerably because of a lack of convergence. This additional problem is known as the scaling up problem.

Memetic algorithms are approaches for heuristic search in optimization problems that combine a population-based algorithm with a local search. In this paper, we propose a memetic algorithm model that incorporates an ad hoc local search specifically designed for optimizing the properties of the prototype selection problem, with the aim of tackling the scaling up problem. In order to check its performance, we have carried out an empirical study including a comparison between our proposal and previous evolutionary and non-evolutionary approaches studied in the literature.

The results have been contrasted with the use of non-parametric statistical procedures and show that our approach outperforms previously studied methods, especially when the database scales up.


17.
To address the problems that traditional underground fingerprint positioning algorithms require the collection of large amounts of fingerprint data and offer limited positioning accuracy, a wireless positioning algorithm for underground personnel based on a least-squares support vector machine optimized by a hybrid differential-evolution/artificial-fish-swarm algorithm (DEAFSA-LSSVM) is proposed. First, the underground experimental area is divided into several small regions, and a fingerprint database is built using Kriging interpolation. Then, a hybrid of differential evolution and the artificial fish swarm algorithm is used to optimize the regularization parameter and kernel width of a least-squares support vector machine (LSSVM) model; the wireless signal data collected at a point to be located is fed to the LSSVM model, which determines the small region it belongs to. Finally, within that region, real-time positioning is performed using a weighted K-nearest neighbor algorithm on the wireless signal data. Experimental results show that the algorithm converges quickly and classifies accurately, with a classification accuracy of 98.87%; positioning accuracy is high, with an average positioning error of 1.51 m, an 18.82% improvement over the unoptimized LSSVM algorithm.
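The weighted K-nearest-neighbor step used in the final positioning stage can be sketched as a generic inverse-distance WKNN on made-up fingerprint data (this is not the paper's full DEAFSA-LSSVM pipeline; names and numbers are ours):

```python
import math

def wknn_locate(fingerprints, query_rss, k=3):
    """Estimate a position from received-signal-strength (RSS) fingerprints.

    Averages the coordinates of the k fingerprints closest in signal space,
    weighted by inverse signal distance.
    """
    ranked = sorted(fingerprints, key=lambda fp: math.dist(fp[0], query_rss))[:k]
    weights = [1.0 / (math.dist(rss, query_rss) + 1e-6) for rss, _ in ranked]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, ranked)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, ranked)) / total
    return x, y

# Fingerprint database: (RSS vector in dBm, known position in metres).
db = [
    ((-40, -70), (0.0, 0.0)),
    ((-50, -60), (1.0, 0.0)),
    ((-60, -50), (2.0, 0.0)),
    ((-70, -40), (3.0, 0.0)),
]
x, y = wknn_locate(db, (-41, -69), k=2)
print(round(x, 2), y)
```

Restricting the fingerprint search to the small region chosen by the classifier keeps `fingerprints` short, which is the point of the paper's two-stage design.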

18.
Instance-based learning methods like the nearest neighbour classifier generally suffer from the indiscriminate storage of all training instances, resulting in large memory requirements and slow execution speed. In this paper, new training set size reduction methods based on prototype generation and space partitioning are proposed. Experimental results show that the new algorithms achieve a very high reduction rate while still maintaining good classification accuracy.

19.
P.A.  M.  D.K.   《Pattern recognition》2006,39(12):2344-2355
Hybrid hierarchical clustering techniques, which combine the characteristics of different partitional clustering techniques, or of partitional and hierarchical clustering techniques, are of interest. In this paper, efficient bottom-up hybrid hierarchical clustering (BHHC) techniques are proposed for prototype selection for protein sequence classification. In the first stage, an incremental partitional clustering technique such as the leader algorithm (the ordered leader no update (OLNU) method), which requires only one database scan, is used to find a set of subcluster representatives. In the second stage, either a hierarchical agglomerative clustering (HAC) scheme or the partitional clustering algorithm 'K-medians' is applied to these subcluster representatives to obtain the required number of clusters. This hybrid scheme is thus scalable and suitable for clustering large data sets, and it also yields a hierarchical structure of clusters and subclusters whose representatives are used for pattern classification. Even if a larger number of prototypes is generated, classification time does not increase much, as only part of the hierarchical structure is searched. The experimental results (classification accuracy (CA) using the obtained prototypes, and computation time) of the proposed algorithms are compared with those of hierarchical agglomerative schemes, K-medians and nearest neighbour classifier (NNC) methods. The proposed methods are found to be computationally efficient with reasonably good CA.

20.
李净  郭洪禹 《计算机应用》2012,32(10):2899-2903
To address the low retrieval precision of region-based image retrieval systems, a multi-instance prototype selection algorithm incorporating textual information, together with a feedback annotation mechanism, is proposed. During instance prototype selection, positive examples are first expanded using textual information; initial instances are then selected by estimating the distribution of negative instances; finally, true instance prototypes are obtained by alternately optimizing instance updating and classifier learning. Relevance feedback uses an active learning mechanism combining multiple strategies, with an information value controlling automatic switching between them so that the system always selects the currently most suitable active learning strategy. Experimental results show that the method is effective and outperforms other methods.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号