Similar literature
 20 similar documents found; search time: 31 ms
1.
A method is proposed for selecting prototype samples from a training set for a K-nearest-neighbor (KNN) classifier, based on minimizing the complete cross-validation functional. The optimization simultaneously reduces the training set to the minimum sufficient number of prototypes, removes (censors) noisy samples, and improves generalization ability.

2.
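A prototype selection algorithm based on natural neighbors and minimum spanning trees   Total citations: 1, self-citations: 0, citations by others: 1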
朱庆生  段浪军  杨力军 《计算机科学》2017,44(4):241-245, 268
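K-nearest neighbors (KNN) is one of the most popular supervised classification algorithms. However, traditional KNN has two main problems: choosing the parameter K, and excessive time and space complexity on large-scale data sets. To address these problems, a new prototype selection algorithm is proposed that retains key prototype points that contribute strongly to classification while removing noise points and most points that contribute little. Unlike other prototype selection algorithms, it uses a new neighbor concept, the natural neighbor, for data preprocessing, and then builds several minimum spanning trees according to a specified termination condition. Based on the minimum spanning trees, boundary prototypes are retained and some representative interior prototypes are generated. Experiments on UCI benchmark data sets show that the proposed algorithm effectively reduces the number of prototypes while keeping classification accuracy at the same level as traditional KNN; moreover, it outperforms other prototype selection algorithms in classification accuracy and prototype retention rate.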

3.
Nearest neighbor classification is one of the most widely used and best-known methods in data mining. Its simplest version has several drawbacks, such as low efficiency, high storage requirements, and sensitivity to noise. Data reduction techniques have been used to alleviate these shortcomings. Among them, prototype selection and generation techniques have been shown to be very effective. Positioning adjustment of prototypes is a successful trend within the prototype generation methodology. Evolutionary algorithms are adaptive methods based on natural evolution that may be used for searching and optimization. Positioning adjustment of prototypes can be viewed as an optimization problem, so it can be solved using evolutionary algorithms. This paper proposes a differential-evolution-based approach for optimizing the positioning of prototypes. Specifically, we provide a complete study of the performance of four recent advances in differential evolution. Furthermore, we show the good synergy obtained by combining a prototype selection stage with an optimization of the positioning of prototypes prior to nearest neighbor classification. The results are contrasted with nonparametric statistical tests and show that our proposals outperform previously proposed methods.

4.
The problem addressed in this paper concerns prototype generation for a cluster-based nearest-neighbour classifier. To classify a test pattern, it considers the lines that link the patterns of the training set and a set of prototypes. An efficient clustering-based method is used to find subgroups of similar patterns, with each centroid used as a prototype. A learning method iteratively adjusts both the position and the local metric of the prototypes. Finally, we show that a simple adaptive distance measure improves the performance of our nearest-neighbour-based classifier. The performance improvement with respect to other nearest-neighbour-based classifiers is validated by testing our method on a lightning classification task using data acquired from the Fast On-orbit Recording of Transient Events (FORTE) satellite, and through experiments with several benchmark datasets. The performance of the proposed methods is also validated using the Wilcoxon signed-rank test.

5.
This paper presents some new approaches for computing graph prototypes in the context of the design of a structural nearest prototype classifier. Four kinds of prototypes are investigated and compared: set median graphs, generalized median graphs, set discriminative graphs, and generalized discriminative graphs. They differ according to (i) the graph space in which they are searched for and (ii) the objective function used for their computation. The first criterion distinguishes set prototypes, which are selected from the initial graph training set, from generalized prototypes, which are generated in an infinite set of graphs. The second criterion distinguishes median graphs, which minimize the sum of distances to all input graphs of a given class, from discriminative graphs, which are computed using classification performance as the criterion, taking into account the inter-class distribution. For each kind of prototype, the proposed approach can identify one or many prototypes per class in order to manage the trade-off between classification accuracy and classification time. Each graph prototype generation/selection is performed through a genetic algorithm that can be specialized to each case by setting the appropriate encoding scheme, fitness, and genetic operators. An experimental study performed on several graph databases shows the superiority of the generation approach over the selection one. On the other hand, discriminative prototypes outperform the generative ones. Moreover, we show that the classification rates improve as the number of prototypes increases. Finally, we show that discriminative prototypes give better results than the median-graph-based classifier.

6.
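To address the excessive time and space complexity of the traditional K-nearest-neighbor (KNN) classifier on large-scale data sets, prototype selection can be used: representative prototypes (examples) are selected from the original data set for KNN classification without reducing classification accuracy. Building on the CURE clustering algorithm, and targeting CURE's difficulty in identifying noise points and the poor dispersion of its representative points, this paper gives a denoising method based on a shared-neighbor density measure and improves representative-point selection by using the max-min distance, yielding a new prototype selection algorithm, PSCURE (improved prototype selection algorithm based on CURE algorithm). Experiments on UCI data sets show that, compared with related prototype algorithms, PSCURE not only selects fewer prototypes but also achieves higher classification accuracy.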
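The abstract above mentions choosing representative points by max-min distance. A minimal sketch of that general idea (farthest-first traversal) might look as follows. The function name `max_min_select`, the `start` parameter, and the toy `cluster` data are assumptions made for illustration; this is not PSCURE itself, whose denoising step and exact selection rule are not given in the abstract.

```python
import math

def max_min_select(points, k, start=0):
    """Pick k representative indices by max-min distance (farthest-first traversal).

    Illustrative assumption: Euclidean distance and an arbitrary starting point.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    # min_dist[i] = distance from point i to its closest chosen representative
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        # Next representative: the point whose nearest representative is farthest away
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen

# Hypothetical toy cluster: two nearly coincident points plus three corners
cluster = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
reps = max_min_select(cluster, 3)  # -> [0, 4, 2]
```

Because each new representative maximizes its distance to the already chosen ones, the near-duplicate point at index 1 is picked last, which is the kind of dispersion the abstract says plain CURE representatives lack.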

7.
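A text classification method based on fuzzy soft set theory   Total citations: 3, self-citations: 0, citations by others: 3
To improve text classification accuracy, a text classification method based on fuzzy soft set theory is proposed. The method represents the text training set as a fuzzy soft set table and determines the category of a text to be classified through reduction and the construction of a soft set comparison table. To address the drop in classification accuracy caused by similar features during text feature extraction, a feature selection algorithm based on regularized mutual information is given, which effectively solves this problem. Compared with the traditional KNN and SVM classification algorithms, the fuzzy soft set method improves both the precision and the accuracy of text classification.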

8.
We propose a framework for learning good prototypes, called prototype generation and filtering (PGF), which integrates the strengths of instance-filtering and instance-abstraction techniques using two different integration methods. The two integration methods differ in filtering granularity as well as in the degree of coupling between the techniques. To characterize the effect of integration, we categorize instance-filtering techniques into three kinds: (1) removing border instances, (2) retaining border instances, and (3) retaining center instances. The effect of using different kinds of filtering in different variants of our PGF framework is investigated. We have conducted experiments on 35 real-world benchmark data sets. We found that our PGF framework maintains or achieves better classification accuracy and gains a significant improvement in data reduction compared with pure filtering and pure abstraction techniques as well as KNN and C4.5.

9.
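Research on an incremental learning vector quantization algorithm based on sample density and classification error rate   Total citations: 1, self-citations: 0, citations by others: 1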
李娟  王宇平 《自动化学报》2015,41(6):1187-1200
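As a simple and mature classification method, the K-nearest-neighbor (KNN) algorithm has been widely applied in data mining, pattern recognition, and other fields, but it still suffers from heavy computation, high space consumption, and long running time. To address these problems, this paper builds on the single-layer competitive learning of incremental learning vector quantization (ILVQ), incorporates a neighborhood idea based on sample density and classification error rate, and proposes a new incremental learning vector quantization method. Using a competitive learning strategy, the method adaptively inserts, deletes, merges, and splits representative points in their neighborhoods to quickly obtain a prototype set of the original data set, thereby achieving high compression of large-scale data while preserving classification accuracy. In addition, the traditional nearest-neighbor classification algorithm is improved by incorporating the sample density and classification error rate of the prototype neighbor set into the nearest-neighbor decision rule. The proposed algorithm quickly generates an effective set of representative prototypes in a single pass over the training set and has good generality. Experimental results show that, compared with other algorithms, the method not only maintains or even improves classification accuracy and compression ratio but also offers fast classification.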
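For orientation, the competitive learning that learning vector quantization builds on can be sketched with the classic LVQ1 winner-take-all update. This is a swapped-in textbook rule, not the incremental method of the abstract above: the insert, delete, merge, and split operations and the density and error-rate terms are not specified there and are not modeled here. The names `lvq1_step`, `protos`, and `lr` are illustrative assumptions.

```python
def lvq1_step(prototypes, labels, x, y, lr=0.1):
    """One classic LVQ1 update (assumption: squared Euclidean distance).

    The nearest prototype moves toward x if its label matches y,
    and away from x otherwise. Returns the index of the winner.
    """
    win = min(
        range(len(prototypes)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(prototypes[i], x)),
    )
    sign = 1.0 if labels[win] == y else -1.0
    prototypes[win] = [p + sign * lr * (a - p) for p, a in zip(prototypes[win], x)]
    return win

# Hypothetical two-prototype codebook
protos = [[0.0, 0.0], [1.0, 1.0]]
labels = ["A", "B"]
winner = lvq1_step(protos, labels, x=[0.2, 0.0], y="A", lr=0.5)
# The winner (index 0) has the matching label, so it moves halfway toward x.
```

Only the winning prototype changes on each sample, which is why a codebook of this kind can be trained in a single pass over the data.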

10.
周靖  刘晋胜 《计算机应用》2011,31(7):1785-1788
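Poor generalization of feature-parameter-based classification and heavy classification computation degrade the classification performance of K-nearest neighbors (KNN). An improved KNN algorithm based on joint entropy under dimensionality reduction is proposed. Specifically, the joint entropy of the feature parameters corresponding to any two condition attributes is computed to measure how strongly the data features influence classification, an intrinsic link between the classification characteristics of features and the actual classification process is established, and a method is given for reducing condition attributes according to the set of feature joint entropies. Theoretical analysis and simulation experiments show that the proposed algorithm achieves better classification performance than classical KNN and other algorithms.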
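The joint entropy of two attributes referred to above can be estimated from paired samples as H(X, Y) = -Σ p(x, y) log2 p(x, y). The sketch below shows only this standard estimate for discrete values; the helper name `joint_entropy` and the toy attribute columns are assumptions, and the paper's rule for reducing attributes from these entropies is not reproduced because the abstract does not state it.

```python
import math
from collections import Counter

def joint_entropy(xs, ys):
    """Empirical Shannon joint entropy H(X, Y) in bits for paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    counts = Counter(zip(xs, ys))  # frequency of each (x, y) pair
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical attribute columns
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
h_ab = joint_entropy(a, b)  # four equally likely pairs -> 2.0 bits
h_aa = joint_entropy(a, a)  # a duplicated attribute adds nothing -> 1.0 bit
```

A pair of attributes whose joint entropy barely exceeds the entropy of one of them is largely redundant, which is one plausible reading of how such values could guide attribute reduction.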

11.
In solving pattern recognition problems, many classification methods, such as the nearest-neighbor (NN) rule, need to determine prototypes from a training set. To improve the performance of these classifiers in finding an efficient set of prototypes, this paper introduces a training sample sequence planning method. In particular, by estimating the relative nearness of the training samples to the decision boundary, the proposed approach incrementally increases the number of prototypes until the desired classification accuracy has been reached. This approach has been tested with an NN classification method and a neural network training approach. Studies based on both artificial and real data demonstrate that higher classification accuracy can be achieved with fewer prototypes.

12.
Allograph prototype approaches for writer identification have been gaining popularity recently due to their simplicity and promising identification rates. Character prototypes that are used as allographs produce a consistent set of templates that models the handwriting styles of writers, thereby allowing high accuracies to be attained. We hypothesize that the alphabet knowledge inherent in such character prototypes can provide additional writer information pertaining to their styles of writing and their identities. This paper utilizes a character prototype approach to establish evidence that knowledge of the alphabet offers additional clues which help in the writer identification process. This paper then introduces an alphabet information coefficient (AIC) to better exploit such alphabet knowledge for writer identification. Our experiments showed an increase in writer identification accuracy from 66.0% to 87.0% on a database of 200 reference writers when alphabet knowledge was used. Experiments related to the reduction in dimensionality of the writer identification system are also reported. Our results show that the discriminative power of the alphabet can be used to reduce the complexity while maintaining the same level of performance for the writer identification system.

13.
A prototype reduction algorithm is proposed that simultaneously trains both a reduced set of prototypes and a suitable local metric for these prototypes. Starting with an initial selection of a small number of prototypes, it iteratively adjusts both the position (features) of these prototypes and the corresponding local-metric weights. The resulting prototype/metric combination minimizes a suitable estimate of the classification error probability. The performance of this algorithm is assessed through experiments with a number of benchmark data sets and with a real task consisting of the verification of images of human faces.
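To make the notion of per-prototype local-metric weights concrete, here is a minimal sketch of classification with such prototypes. It assumes a weighted squared Euclidean distance with one weight per feature per prototype, which the abstract does not confirm, and it omits the training loop that adjusts positions and weights. `weighted_sq_dist`, `classify`, and the sample prototypes are hypothetical.

```python
def weighted_sq_dist(x, proto, weights):
    """Weighted squared Euclidean distance (assumed form of the local metric)."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, proto, weights))

def classify(x, prototypes):
    """Return the label of the nearest prototype under its own local metric.

    prototypes: list of (position, weights, label) tuples.
    """
    best = min(prototypes, key=lambda p: weighted_sq_dist(x, p[0], p[1]))
    return best[2]

# Hypothetical prototypes: "B" down-weights the first feature
prototypes = [
    ((0.0, 0.0), (1.0, 1.0), "A"),
    ((2.0, 0.0), (0.1, 1.0), "B"),
]
label = classify((0.8, 0.0), prototypes)  # -> "B"
```

With uniform weights the query (0.8, 0.0) would be closer to "A"; the local weights of "B" change the decision, which is the effect a learned local metric is meant to provide.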

14.
Prototype-based classification relies on the distances between the examples to be classified and carefully chosen prototypes. A small set of prototypes is of interest to keep the computational complexity low while maintaining high classification accuracy. An experimental study of some old and new prototype optimisation techniques is presented, in which the prototypes are either selected or generated from the given data. These condensing techniques are evaluated on real data, represented in vector spaces, by comparing their resulting reduction rates and classification performance. Usually the determination of prototypes is studied in relation to the nearest neighbour rule. We will show that the use of more general dissimilarity-based classifiers can be more beneficial. An important point in our study is that the adaptive condensing schemes discussed here allow the user to choose the number of prototypes freely according to need. If such techniques are combined with linear dissimilarity-based classifiers, they provide the best trade-off between small condensed sets and high classification accuracy.

15.
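To address the slow convergence of backpropagation and the tendency to fall into local optima in stacked autoencoders used for emotion recognition, an emotional state recognition method based on a stacked denoising autoencoder (SDAE) and a regularized extreme learning machine (RELM) is proposed. Initial features characterizing the emotional state are extracted from the time, frequency, and time-frequency domains of EEG signals, and the SDAE performs unsupervised feature learning to extract high-level abstract representations of the initial features. In the regression layer of the network, the RELM performs emotion classification. Experimental results on the DEAP data set show that, compared with SDAE and with traditional machine-learning methods such as DT and KNN, the method improves markedly in real-time performance, accuracy, and generalization.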

16.
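By studying the use and analysis of the KNN (K-nearest neighbor) algorithm in disease prediction, this paper identifies two shortcomings of KNN, makes corresponding improvements, and proposes the F_KNN (cyclic nearest neighbor search) algorithm. (1) To address KNN's heavy computation and low efficiency, FLANN (fast nearest neighbor search) is used to cyclically search for the points closest to the sample under test; several nearest neighbors are recorded as a nearest-neighbor subset, and this subset replaces the full set in computations for the test sample, which reduces computation and greatly improves the efficiency of KNN. (2) To address KNN's difficulty in classifying high-dimensional data sets, AHP (analytic hierarchy process) is used to study the relevance of the samples' feature attributes, and weights are assigned using suitable parameters, which improves the accuracy of KNN. The optimized algorithm is tested on a stroke data set, and the results show that F_KNN reaches an accuracy of 96.2%. Compared with traditional KNN, F_KNN improves classification performance and greatly improves efficiency. F_KNN has a clear advantage on high-dimensional, relatively large data sets and has good application prospects.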

17.
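The magnesia smelting process is characterized by multiple operating conditions, parallel production across a group of furnaces, and high energy consumption. Under the plant-wide power supply capacity constraint, maximizing energy efficiency requires allocating electrical energy in real time according to changes in the operating condition of each furnace, so as to achieve multi-objective optimization of the plant-wide unit energy consumption and average grade of magnesia. This paper builds a plant-wide electrical energy allocation optimization model for the magnesia smelting process based on least-squares support vector machines. According to how strongly reducing a furnace's power supply affects the smelting process under different operating conditions, an allocation strategy based on operating-condition priority is proposed. From an analysis of the characteristics of the magnesia yield and grade index functions under the main smelting condition, a condition is derived for reducing the dimension of the decision variables of the allocation model under that condition. To improve the running efficiency of the multi-objective optimization algorithm, a fast method for constructing the non-dominated solution set is designed, which improves the search efficiency of the traditional multi-objective particle swarm optimization algorithm. The proposed method is tested on standard test problems and on a real on-site example. The results on the on-site example show that the proposed method prevents the plant from exceeding its power capacity and improves plant-wide electricity efficiency.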

18.
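Representative point selection is an important part of data preprocessing for data mining and pattern recognition, and an important way to improve the accuracy and execution efficiency of classifiers. A representative point selection algorithm based on a voting mechanism is proposed. The algorithm places the selected representative points on class boundaries as far as possible, and the voting-based selection readily excludes outliers and reduces data volume, which helps improve the classification accuracy and efficiency of the nearest-neighbor classifier. Experimental comparison with several classical representative point selection algorithms shows that the proposed voting-based algorithm has certain advantages in improving both the classification accuracy of the nearest-neighbor classifier and the data reduction rate.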

19.
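As one of the top ten algorithms in data mining, the K-nearest-neighbor (KNN) algorithm is widely used because it is nonparametric, requires no training time, and is simple and effective. However, when KNN faces large, high-dimensional training sets, its high classification time complexity becomes a bottleneck. In addition, class imbalance caused by an uneven class distribution of training samples also degrades its classification performance. To address these two problems, a new redundancy-based training sample pruning algorithm for KNN classifiers (RBKNN for short) is proposed. RBKNN introduces a preprocessing step on the training set that computes the redundancy of each training sample and randomly prunes some of the highly redundant samples, thereby reducing the size of the training set and balancing the sample distribution. Experimental results show that RBKNN significantly improves the classification efficiency of KNN while maintaining or improving classification accuracy.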

20.
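To address the low inference accuracy of extended belief rule base (EBRB) systems when there are too many inconsistent activated rules, the fast elitist non-dominated sorting genetic algorithm (NSGA-II) is introduced, and an NSGA-II-based multi-objective optimization method for activated rules is proposed. The method first binary-encodes the rules whose activation weight is greater than zero (the activated rules), and takes the inconsistency of the set of activated rules that finally participate in the combined inference, together with the sum of their activation weights, as the objective functions of a multi-objective optimization problem. NSGA-II is then used to find a set of activated rules with lower inconsistency, reducing the impact of inconsistent activated rules on the inference accuracy of the EBRB system. To verify the effectiveness and feasibility of the method, a nonlinear function and an oil pipeline leak detection case are used for testing. Experimental results show that the NSGA-II-based multi-objective optimization of activated rules effectively improves the inference capability of EBRB systems.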
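The non-dominated sorting at the core of NSGA-II can be sketched as below. The sketch assumes every objective is to be minimized and uses made-up two-objective values; how the abstract's two objectives (inconsistency and activation-weight sum) are actually oriented and encoded is not stated there, and crowding distance, selection, and the genetic operators are left out. The names `dominates` and `non_dominated_sort` are illustrative.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.

    Assumption: all objectives are minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(objs):
    """Split solutions into Pareto fronts; front 0 is the non-dominated set."""
    n = len(objs)
    dominated = [[] for _ in range(n)]  # dominated[i]: indices that i dominates
    counts = [0] * n                    # counts[i]: how many solutions dominate i
    for i in range(n):
        for j in range(n):
            if i != j and dominates(objs[i], objs[j]):
                dominated[i].append(j)
                counts[j] += 1
    fronts = []
    current = [i for i in range(n) if counts[i] == 0]
    while current:
        fronts.append(current)
        nxt = []
        for i in current:
            for j in dominated[i]:
                counts[j] -= 1
                if counts[j] == 0:  # all dominators of j are already ranked
                    nxt.append(j)
        current = nxt
    return fronts

# Hypothetical objective vectors (both minimized)
objs = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
fronts = non_dominated_sort(objs)  # -> [[0, 1, 3], [2], [4]]
```

Ranking by fronts lets the algorithm prefer candidate solutions that no other candidate beats on every objective, without collapsing the objectives into a single weighted score.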
