Similar literature
 20 similar documents found (search time: 31 ms)
1.
The prototype selection problem is to find a minimum-cardinality subset of the learning sample that attains the optimum of a given learning quality functional. This article considers two-class classification with the nearest-neighbor method and three quality functionals: the error frequency on the entire sample, leave-one-out cross-validation, and complete cross-validation with k held-out objects. It is shown that the prototype selection problem is NP-complete in all three cases, which justifies the use of well-known heuristic methods for prototype search.
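As an illustration of two of the quality functionals named in entry 1, the sketch below evaluates a candidate prototype subset under the 1-nearest-neighbor rule. It is only a minimal sketch: the function names, the data layout (a list of `(point, label)` tuples), and the exact leave-one-out convention (an object is removed from the prototypes when it is classified) are assumptions made here, not definitions taken from the article.

```python
from math import dist

def nn_label(x, prototypes):
    """Label of the prototype nearest to x; prototypes is a list of (point, label)."""
    return min(prototypes, key=lambda p: dist(x, p[0]))[1]

def sample_error(sample, chosen):
    """Error frequency on the whole sample, using the prototypes at the chosen indices."""
    protos = [sample[i] for i in chosen]
    wrong = sum(nn_label(x, protos) != y for x, y in sample)
    return wrong / len(sample)

def loo_error(sample, chosen):
    """Leave-one-out error: each object is classified without itself as a prototype.

    Assumption: an object left with no prototype at all counts as an error.
    """
    wrong = 0
    for i, (x, y) in enumerate(sample):
        protos = [sample[j] for j in chosen if j != i]
        if not protos or nn_label(x, protos) != y:
            wrong += 1
    return wrong / len(sample)

sample = [((0.0,), 0), ((1.0,), 0), ((4.0,), 1), ((5.0,), 1)]
print(sample_error(sample, [0, 3]))  # two prototypes, one per class
print(loo_error(sample, [0, 3]))     # the same subset under leave-one-out
```

A subset that is perfect on the whole sample can still score poorly under leave-one-out, because removing a prototype leaves its own class unrepresented. This is one reason the three functionals are treated as separate cases.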

2.
3.
This paper proposes new rank methods for selecting the best prototypes from a training set, so that its size can be set by an external parameter while maintaining classification accuracy. Traditional methods that filter the training set in a classification task, such as editing or condensing, apply rules to the set in order to remove outliers or keep prototypes that help in the classification. In our approach, new voting methods are proposed to compute the probability that a prototype helps to classify a new sample correctly. This probability is the key to sorting the training set: a relevance factor from 0 to 1 is used to select, for each class, the best candidates whose accumulated probabilities are less than that parameter. This approach makes it possible to select the number of prototypes necessary to maintain or even increase the classification accuracy. The results obtained on different high-dimensional databases show that these methods maintain the final error rate while reducing the size of the training set.
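The selection step described in entry 3 can be sketched roughly as follows. The abstract does not specify the voting methods, so the vote scores are taken as a given input here. The function name `select_by_relevance`, the per-class normalization, and the stopping rule (stop once the accumulated probability reaches the relevance factor) are all assumptions for illustration, not the authors' algorithm.

```python
from collections import defaultdict

def select_by_relevance(votes, labels, relevance):
    """Return the sorted indices of the prototypes that are kept.

    votes[i]  : non-negative vote score of prototype i (hypothetical input)
    labels[i] : class of prototype i
    relevance : external parameter in (0, 1]
    """
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    kept = []
    for members in by_class.values():
        total = sum(votes[i] for i in members)
        if total == 0:
            continue  # no votes in this class, so nothing to rank
        # Rank the class members from most to least voted.
        ranked = sorted(members, key=lambda i: votes[i], reverse=True)
        accumulated = 0.0
        for i in ranked:
            if accumulated >= relevance:
                break  # the accumulated probability has reached the parameter
            kept.append(i)
            accumulated += votes[i] / total  # per-class normalized probability
    return sorted(kept)

votes = [6, 3, 1, 5, 5]
labels = ["a", "a", "a", "b", "b"]
print(select_by_relevance(votes, labels, 0.5))
print(select_by_relevance(votes, labels, 0.8))
```

Under these assumptions a larger relevance factor keeps more prototypes per class, which matches the abstract's description of setting the training-set size through an external parameter.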

4.
The k-nearest neighbour rule is commonly considered for classification tasks given its straightforward implementation and good performance in many applications. However, its efficiency represents an obstacle in real-case scenarios because the classification requires computing a distance to every single prototype of the training set. Prototype Selection (PS) is a typical approach to alleviate this problem, which focuses on reducing the size of the training set by selecting the most interesting prototypes. In this context, rank methods have been postulated as a good solution: following some heuristics, these methods order the prototypes according to their relevance in the classification task, and this ordering is then used to select the most relevant ones. This work presents a significant improvement of existing rank methods by proposing two extensions: (i) greater robustness against label-level noise, obtained by considering the parameter ‘k’ of the classification in the selection process; and (ii) a new parameter-free rule to select the prototypes once they have been ordered. Experiments in different scenarios and on different datasets demonstrate the benefit of these extensions. It is also shown empirically that the full new approach is competitive with existing PS algorithms.

5.
6.
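Unsupervised character classification based on fuzzy-model similarity measurement   Cited by: 2 (self-citations: 0, citations by others: 2)
This paper proposes a character pre-classification method for text analysis systems, based on fuzzy-model similarity measurement, for the unsupervised classification of characters, with the aim of improving the speed, accuracy, and robustness of the overall character recognition system. Building on a grouping of characters by their typographical structure, the authors use template matching to convert each class of characters into a set of fuzzy templates based on a nonlinear weighted similarity function. Unsupervised classification of fuzzy characters is a natural paradigm of character matching and advances research on weighted fuzzy similarity measurement. The paper discusses the properties of the fuzzy model and the rules for fuzzy template matching, and applies them to speed up character classification. After classification, character recognition becomes easier and faster because it only needs to consider a smaller set of fuzzy templates.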

7.
Since classification data often contains redundant, useless or misleading features, feature selection is an important pre-processing step for solving classification problems. This problem is often solved by applying evolutionary algorithms to reduce the number of features involved. The primary objective is to remove irrelevant features from the feature space and correctly identify the relevant ones, which can increase classification accuracy. In this paper, a novel QBGSA–K-NN hybrid system is proposed, which hybridizes the quantum-inspired binary gravitational search algorithm (QBGSA) with the K-nearest neighbor (K-NN) method with leave-one-out cross-validation (LOOCV). The main aim of this system is to improve classification accuracy with an appropriate feature subset in binary problems. We evaluate the proposed hybrid system on several UCI machine learning benchmark examples. The experimental results show that the proposed method is able to select the discriminating input features correctly and achieve high classification accuracy which is comparable to or better than well-known similar classifier systems.

8.
Multimodal biometrics can overcome the limitations of a single biometric trait and give better classification accuracy. This paper proposes a face-iris multimodal biometric system based on fusion at the matching score level using a support vector machine (SVM). The performance of face and iris recognition can be enhanced using a proposed feature selection method to select an optimal subset of features. In addition, a simple computation speed-up method is proposed for the SVM. The results show that the proposed feature selection method is able to improve the classification accuracy in terms of total error rate. The SVM-based fusion method also gave very promising results.

9.
Most of the prototype reduction schemes (PRS) that have been reported in the literature process the data in its entirety to yield a subset of prototypes that are useful in nearest-neighbor-like classification. Foremost among these are the prototypes for nearest neighbor classifiers, the vector quantization technique, and the support vector machines. These methods suffer from a major disadvantage, namely, the excessive computational burden of processing all the data. In this paper, we suggest a recursive and computationally superior mechanism referred to as adaptive recursive partitioning (ARP)_PRS. Rather than process all the data using a PRS, we propose that the data be recursively subdivided into smaller subsets. This recursive subdivision can be arbitrary, and need not utilize any underlying clustering philosophy. The advantage of ARP_PRS is that the PRS processes subsets of data points that effectively sample the entire space to yield smaller subsets of prototypes. These prototypes are then, in turn, gathered and processed by the PRS to yield more refined prototypes. In this manner, prototypes which are in the interior of the Voronoi spaces, and thus ineffective in the classification, are eliminated at the subsequent invocations of the PRS. We are unaware of any PRS that employs such a recursive philosophy. Although we marginally forfeit accuracy in return for computational efficiency, our experimental results demonstrate that the proposed recursive mechanism yields classification comparable to the best prototype condensation schemes reported to date. Indeed, this is true for both artificial data sets and for samples involving real-life data sets. The results especially demonstrate that a fair computational advantage can be obtained by using such a recursive strategy for "large" data sets, such as those involved in data mining and text categorization applications.

10.
This paper presents a new efficient technique for supervised pixel-based classification of textured images. A prototype selection algorithm that relies on the normalized cut criterion is utilized for automatically determining a subset of prototypes in order to characterize each texture class at the local level based on the outcome of a multichannel Gabor filter bank. Then, a simple minimum distance classifier fed with the previously determined prototypes is used to classify every image pixel into one of the given texture classes. Multi-sized evaluation windows following a top-down approach are used during classification in order to improve accuracy near the boundaries between regions of different texture. Results with standard Brodatz, VisTex and MeasTex compositions and with complex real images are presented and discussed. The proposed technique is also compared with alternative texture classifiers.

11.
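Human-machine dialogue technology has attracted wide attention from academia and industry in recent years. A key task of a dialogue system is to have the chatbot understand the intent of a user's question and correctly classify the input into the corresponding domain; its performance directly affects the quality of domain-specific dialogue. Dialogue questions are short and have pronounced local features, and a single-channel convolutional neural network (CNN) has a single view that cannot fully learn the feature and semantic information of a question. Building on a study and analysis of the CNN algorithm, this paper proposes the Intent Classification Dual-channel Convolutional Neural Networks (ICDCNN) algorithm. The method first uses the Word2Vec tool and an Embedding layer to train word vectors and extract semantic features from the question. It then performs convolution over two different channels, one fed character-level vectors and the other word-level vectors, so that the fine-grained character-level vectors help the word-level vectors capture deeper semantic information in natural-language questions. Finally, convolution kernels of different sizes are used to learn deeper abstract features within the question. Comparative experiments show that the algorithm achieves high accuracy on the selected Chinese dataset and has some advantage over other algorithms.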

12.
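To address the low detection accuracy and high misjudgment rate of most current phishing website detection methods, a detection method based on feature selection and ensemble learning is proposed. The method first performs feature selection with the FSIGR algorithm, which combines the advantages of the filter and wrapper models: it measures features jointly in terms of information relevance and classification ability, selects features with a forward-incremental, backward-recursive elimination strategy, and evaluates feature subsets using classification accuracy as the criterion, thereby obtaining the optimal feature subset. The data restricted to the optimal feature subset is then used to train a random forest classification model. Experiments on UCI datasets show that the proposed method effectively improves the accuracy of phishing website detection, reduces the misjudgment rate, and is of practical value.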

13.
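Inconsistency and redundant features in a sample dataset reduce the quality and efficiency of classification. A consistency-based feature selection and reduction method is proposed. Based on Bayes' formula and a threshold, the method assigns inconsistent data to the most probable class, making the dataset consistent. On the consistent dataset, a class discernibility matrix is then used to select the minimal set of feature variables that accurately distinguishes the classes. The heuristic search strategy presented and an application example show that the method effectively eliminates inconsistency in classification datasets, selects the optimal feature variables, reduces the dimensionality of the data, and reduces redundant information in the dataset.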

14.
Past work on object detection has emphasized the issues of feature extraction and classification; however, relatively little attention has been given to the critical issue of feature selection. The main trend in feature extraction has been representing the data in a lower dimensional space, for example, using principal component analysis (PCA). Without using an effective scheme to select an appropriate set of features in this space, however, these methods rely mostly on powerful classification algorithms to deal with redundant and irrelevant features. In this paper, we argue that feature selection is an important problem in object detection and demonstrate that genetic algorithms (GAs) provide a simple, general, and powerful framework for selecting good subsets of features, leading to improved detection rates. As a case study, we have considered PCA for feature extraction and support vector machines (SVMs) for classification. The goal is to search the PCA space using GAs to select a subset of eigenvectors encoding important information about the target concept of interest. This is in contrast to traditional methods, which select some percentage of the top eigenvectors to represent the target concept, independently of the classification task. We have tested the proposed framework on two challenging applications: vehicle detection and face detection. Our experimental results illustrate significant performance improvements in both cases.

15.
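阚峻岭, 李锋刚. 《计算机工程》 (Computer Engineering), 2010, 36(24): 167-168
Attribute selection and evaluation are important tasks in the design of knowledge-based systems and key factors in system performance. To this end, a novel attribute selection strategy is proposed that uses the genetic-operator search mechanism of a genetic algorithm together with a correlation-analysis heuristic as the evaluation mechanism, in order to select the optimal attribute subset for a given case from the attribute set. Experimental results show that the method can identify the attribute subset most relevant to classification and prediction, while greatly reducing the attribute representation space with almost no loss of classification accuracy.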

16.
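With the advent of DNA microarray technology, a large number of gene expression profile datasets for different tumors have been published online, making tumor feature-gene selection and subtype classification a hot topic in bioinformatics. Based on the Lasso (least absolute shrinkage and selection operator) method, a K-split Lasso feature selection method is proposed. Its basic idea is to divide the dataset evenly into K parts, apply Lasso feature selection to each part, merge the feature subsets selected from the parts, and perform feature selection again on the merged set to obtain the final feature genes. Experiments using a support vector machine as the classifier show that K-split Lasso reduces redundant features, improves classification accuracy, and has good stability. Because the dimensionality of each computation is lower, K-split Lasso resolves the problem of excessive computational cost and, to some extent, mitigates overfitting. K-split Lasso is therefore an effective method for tumor feature-gene selection.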

17.
Granular prototyping in fuzzy clustering   Cited by: 5 (self-citations: 0, citations by others: 5)
We introduce a logic-driven clustering in which prototypes are formed and evaluated in a sequential manner. The structure in the data is revealed by maximizing a certain performance index (objective function) that takes into consideration an overall level of matching (to be maximized) and a similarity level between the prototypes (the component to be minimized). The prototypes identified in the process come with the optimal weight vector that serves to indicate the significance of the individual features (coordinates) in the data grouping represented by the prototype. Since the topologies of these groupings are in general quite diverse, the optimal weight vectors reflect the anisotropy of the feature space, i.e., they show some local ranking of features in the data space. Having found the prototypes, we consider an inverse similarity problem and show how the relevance of the prototypes translates into their granularity.

18.
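Feature selection is a persistent and important problem in data mining, machine learning, and pattern recognition. To address the selection bias of traditional information gain when classes and features are unevenly distributed, this paper proposes a feature selection algorithm based on information gain ratio and random forest. The algorithm combines the advantages of the Filter and Wrapper models: it first measures features jointly in terms of information relevance and classification ability, then selects features with a Sequential Forward Selection (SFS) strategy, using classification accuracy as the criterion for evaluating feature subsets, to obtain the optimal feature subset. Experimental results show that the algorithm not only reduces the dimensionality of the feature space but also effectively improves the classification performance and recall of the classification algorithm.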
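For readers unfamiliar with the filter measure named in entry 18, the sketch below computes the standard information gain ratio of one discrete feature: information gain divided by the entropy of the feature itself (the split information). It illustrates only this textbook quantity. How the paper combines it with random forest and sequential forward selection is not given in the abstract and is not reproduced here, and the function names are chosen for illustration.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by the
    entropy of the feature (split information)."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += count / n * entropy(subset)  # weighted H(labels | value)
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    # A constant feature has zero split information and carries no signal.
    return info_gain / split_info if split_info > 0 else 0.0

labels = [0, 0, 1, 1]
print(gain_ratio(["a", "a", "b", "b"], labels))  # feature separates the classes
print(gain_ratio(["a", "b", "a", "b"], labels))  # feature is uninformative
```

Dividing by the split information penalizes features with many distinct values, which is the usual motivation for preferring gain ratio over plain information gain.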

19.
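This paper proposes a new method for pre-classifying handwritten Chinese characters. The method proceeds in two steps. First, stroke-density features are extracted and fuzzy rules are used to produce four pre-classification groups. Then, fuzzy logic processing converts the characters of each group into fuzzy templates based on a nonlinear weighting function, and pre-classification is carried out through a matching algorithm based on fuzzy similarity measurement and hierarchical classification of the similarity-measurement templates. Test results show that the method works well, with a pre-classification accuracy of 98.17%.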

20.