Similar literature
1.
Handwritten digit recognition has long been an open problem in pattern classification and is of great importance in industry. The heart of the problem lies in designing an efficient algorithm that can recognize digits written and submitted by users via a tablet, scanner, or other digital device. From an engineering point of view, it is desirable to achieve good performance within limited resources. To this end, we have developed a new approach to handwritten digit recognition that uses a small number of patterns in the training phase. To improve overall classification performance, the literature suggests combining the decisions of multiple classifiers rather than using the output of the best classifier in the ensemble; accordingly, the new approach uses an ensemble of classifiers to recognize handwritten digits. The classifiers in the proposed system are based on the singular value decomposition (SVD) algorithm. The experimental results and the literature show that the SVD algorithm is well suited to sparse matrices such as handwritten digit images. The decisions of the SVD classifiers are combined by a novel combination rule, which we call reliable multi-phase particle swarm optimization (PSO). We call the method “reliable” because we introduce a novel reliability parameter that tackles the problem of PSO becoming trapped in local minima. Compared with previous methods, a significant advantage of the proposed method is that it is not sensitive to the size of the training set: it uses just 15% of the dataset for training, whereas other methods usually use 60–75% of the whole dataset. To evaluate the proposed method, we tested our algorithm on a Farsi/Arabic handwritten digit dataset.
What makes the recognition of handwritten Farsi/Arabic digits more challenging is that some of the digits can legitimately be written in different shapes. Therefore, 6000 hard samples (600 samples per class) were chosen with the K-nearest neighbor algorithm from the HODA dataset, a standard Farsi/Arabic digit dataset. Experimental results show that the proposed method is fast, accurate, and robust against the local minima of PSO. Finally, the proposed method is compared with state-of-the-art methods and with ensemble classifiers based on MLP, RBF, and ANFIS with various combination rules.

2.
This paper presents a novel method for facial expression recognition that combines two different feature sets in an ensemble approach. A pool of base support vector machine classifiers is created using Gabor filters and Local Binary Patterns. A multi-objective genetic algorithm then searches for the best ensemble, using as objective functions the minimization of both the error rate and the size of the ensemble. Experimental results on the JAFFE and Cohn-Kanade databases show the efficiency of the proposed strategy in finding powerful ensembles, which improve recognition rates by 5% to 10% over conventional approaches that employ single feature sets and single classifiers.
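The two objectives named in this abstract (error rate and ensemble size, both minimized) can be compared with Pareto dominance, which is the usual basis of multi-objective search. The sketch below is only an illustration of that idea, not the paper's genetic algorithm; the candidate values and the function names `dominates` and `pareto_front` are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better
    in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(solutions):
    """Keep the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical candidate ensembles as (error rate, ensemble size).
candidates = [(0.10, 12), (0.08, 20), (0.10, 15), (0.12, 5), (0.09, 25)]
front = pareto_front(candidates)
```

Here `front` keeps the three candidates that trade error against size; the other two are each beaten on both objectives by some member of the front.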

3.
In this paper, the task of finding an appropriate classifier ensemble for named entity recognition is posed as a multiobjective optimization (MOO) problem. Our underlying assumption is that, instead of searching for the best-fitting feature set for a particular classifier, ensembling several classifiers that are trained on different feature representations could be a more fruitful approach, but it is crucial to determine the subset of classifiers most suitable for the ensemble. We use three heterogeneous classifiers, namely maximum entropy, conditional random field, and support vector machine, to build a number of models depending on the various representations of the available features. The proposed MOO-based ensemble technique is evaluated for three resource-constrained languages, namely Bengali, Hindi, and Telugu. Evaluation yields recall, precision, and F-measure values of 92.21, 92.72, and 92.46%, respectively, for Bengali; 97.07, 89.63, and 93.20%, respectively, for Hindi; and 80.79, 93.18, and 86.54%, respectively, for Telugu. We also evaluate the proposed technique on the CoNLL-2003 shared task English data sets, which yield recall, precision, and F-measure values of 89.72, 89.84, and 89.78%, respectively. Experimental results show that the classifier ensemble identified by the proposed MOO-based approach outperforms all the individual classifiers, two different conventional baseline ensembles, and the classifier ensemble identified by a single-objective-based approach. In one part of the paper, we formulate the problem of feature selection in any classifier under the MOO framework and show that the proposed classifier ensemble attains superior performance to it.

4.
Ping  Tien D.  Ching Y. 《Pattern recognition》2007,40(12):3415-3429
This paper presents a novel cascade ensemble classifier system for the recognition of handwritten digits. The new system aims to attain a very high recognition rate and a very high reliability at the same time, in other words, excellent recognition performance on handwritten digits. The trade-offs among the recognition, error, and rejection rates of the new system are analyzed. Three solutions are proposed: (i) extracting more discriminative features to attain a high recognition rate, (ii) using ensemble classifiers to suppress the error rate, and (iii) employing a novel cascade system to enhance the recognition rate and reduce the rejection rate. Based on these strategies, seven sets of discriminative features and three sets of random hybrid features are extracted and used in the different layers of the cascade recognition system. Novel gating networks (GNs) are used to combine the confidence values of three parallel artificial neural network (ANN) classifiers. The weights of the GNs are trained by genetic algorithms (GAs) to achieve the overall optimal performance. Experiments on the MNIST handwritten numeral database give encouraging results: a high reliability of 99.96% with minimal rejection, or a 99.59% correct recognition rate without rejection in the last cascade layer.
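The trade-off among recognition, error, and rejection rates can be made concrete with a confidence threshold. The sketch below is a hedged illustration, not the paper's cascade: it assumes the definition of reliability common in the reject-option literature, recognition / (recognition + error), and the sample data and the function name `rates_with_rejection` are hypothetical.

```python
def rates_with_rejection(results, threshold):
    """Compute rates for a list of (confidence, is_correct) pairs.

    Samples whose confidence is below `threshold` are rejected.
    Reliability is assumed to be recognized / accepted.
    """
    n = len(results)
    accepted = [(conf, ok) for conf, ok in results if conf >= threshold]
    recognized = sum(1 for _, ok in accepted if ok)
    errors = len(accepted) - recognized
    rejected = n - len(accepted)
    reliability = recognized / len(accepted) if accepted else 0.0
    return {
        "recognition": recognized / n,
        "error": errors / n,
        "rejection": rejected / n,
        "reliability": reliability,
    }


# Hypothetical classifier outputs: (confidence, whether the label was correct).
results = [
    (0.99, True), (0.95, True), (0.90, True), (0.85, False), (0.70, True),
    (0.60, False), (0.97, True), (0.92, True), (0.55, False), (0.98, True),
]
no_reject = rates_with_rejection(results, threshold=0.0)
strict = rates_with_rejection(results, threshold=0.9)
```

On this toy data, raising the threshold removes every error at the cost of rejecting part of the input, which is the kind of trade-off the abstract describes.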

5.
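To select a set of highly diverse sub-classifiers from a classifier ensemble system and thereby improve the ensemble's generalization ability, an intuitionistic fuzzy kernel matching pursuit algorithm based on a hybrid selection strategy is proposed. The basic idea is to generate a set of sub-classifiers by perturbing the training set and the feature space; the k-means clustering algorithm is then used to prune the resulting sub-classifiers, deleting the redundant ones; finally, a combination of classifiers with higher recognition rates is selected dynamically according to the actual recognition target, so that the size of the selective ensemble adapts to the complexity of the target, and iterative ensembling is carried out based on the expected recognition accuracy. Experimental results show that, compared with other commonly used classifier selection methods, the proposed method is flexible and efficient and has better recognition performance and generalization ability.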

6.
Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because of the difficulty in directly implementing the maximal dependency condition, we first derive an equivalent form, called the minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection. Then, we present a two-stage feature selection algorithm that combines mRMR with other, more sophisticated feature selectors (e.g., wrappers). This allows us to select a compact set of superior features at very low cost. We perform an extensive experimental comparison of our algorithm and other methods using three different classifiers (naive Bayes, support vector machine, and linear discriminant analysis) and four different data sets (handwritten digits, arrhythmia, NCI cancer cell lines, and lymphoma tissues). The results confirm that mRMR leads to promising improvement in feature selection and classification accuracy.
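A minimal sketch of first-order incremental mRMR selection follows, assuming discrete features and the difference form of the criterion (relevance to the class minus mean redundancy with the features already selected). The toy data, the feature names, and the function names `mutual_information` and `mrmr` are hypothetical, and the two-stage wrapper step is not shown.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (nxy / n) * log2(nxy * n / (px[x] * py[y]))
        for (x, y), nxy in pxy.items()
    )


def mrmr(features, labels, k):
    """Greedily pick k feature names, maximizing relevance minus redundancy.

    features: dict mapping feature name -> list of discrete values.
    """
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected = []

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while len(selected) < k:
        candidates = [name for name in features if name not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Hypothetical toy data: "a_dup" is nearly a copy of "a"; "b" carries
# different information about the labels.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 0],
    "a_dup": [0, 0, 0, 1, 1, 1, 1, 0],
    "b": [0, 1, 0, 0, 1, 1, 0, 1],
}
chosen = mrmr(features, labels, k=2)
```

In this toy case `a_dup` and `b` are equally relevant to the labels, so the second pick is decided by redundancy alone: `b` wins because it overlaps far less with `a`.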

7.
Independent component analysis (ICA) has been widely used to tackle the microarray dataset classification problem, but one problem remains unsolved: the independent component (IC) sets may not be reproducible after different ICA transformations. Inspired by the idea of ensemble feature selection, we design an ICA-based ensemble learning system to fully exploit the differences among IC sets. In this system, several IC sets are first generated by different ICA transformations. A multi-objective genetic algorithm (MOGA) is designed to select different biologically significant IC subsets from these IC sets, which are then used to build base classifiers. Three schemes are used to fuse these base classifiers. The first fusion scheme combines all individuals in the final generation of the MOGA. In addition, during the evolution, a global-recording technique records the best IC subsets of each IC set in a global-recording list. The IC subsets in the list are then used to build base classifiers, which implements the second fusion scheme. Furthermore, by pruning about half of the less accurate base classifiers obtained by the second scheme, a compact and more accurate ensemble system is built, which is regarded as the third fusion scheme. Three microarray datasets are used to test the ensemble systems, and the results demonstrate that these ensemble schemes can further improve the performance of the ICA-based classification model, and that the third fusion scheme leads to the most accurate ensemble system with the smallest ensemble size.

8.
《Information Fusion》2005,6(1):83-98
Ensembles of learnt models constitute one of the main current directions in machine learning and data mining. Ensembles allow us to achieve higher accuracy, which is often not achievable with single models. It was shown theoretically and experimentally that in order for an ensemble to be effective, it should consist of base classifiers that have diversity in their predictions. One technique that has proved effective for constructing an ensemble of diverse base classifiers is the use of different feature subsets, or so-called ensemble feature selection. Many ensemble feature selection strategies incorporate diversity as an objective in the search for the best collection of feature subsets. A number of ways are known to quantify diversity in ensembles of classifiers, but little research has been done on their appropriateness for ensemble feature selection. In this paper, we compare five measures of diversity with regard to their possible use in ensemble feature selection. We conduct experiments on 21 data sets from the UCI machine learning repository, comparing the ensemble accuracy and other characteristics for the ensembles built with ensemble feature selection based on the considered measures of diversity. We consider four search strategies for ensemble feature selection together with simple random subspacing: genetic search, hill-climbing, and ensemble forward and backward sequential selection. In the experiments, we show that, in some cases, the ensemble feature selection process can be sensitive to the choice of the diversity measure, and that the question of the superiority of a particular measure depends on the context of the use of diversity and on the data being processed. In many cases and on average, the plain disagreement measure is the best. Genetic search, kappa, and dynamic voting with selection form the best combination of a search strategy, diversity measure, and integration method.
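The plain disagreement measure mentioned above is usually defined as the fraction of instances on which two classifiers predict different labels, averaged over all pairs of base classifiers. The sketch below assumes that definition; the prediction lists and function names are hypothetical.

```python
from itertools import combinations


def plain_disagreement(preds_a, preds_b):
    """Fraction of instances on which two classifiers predict different labels."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have equal length")
    differing = sum(1 for a, b in zip(preds_a, preds_b) if a != b)
    return differing / len(preds_a)


def ensemble_diversity(all_preds):
    """Average pairwise plain disagreement over every pair of base classifiers."""
    pairs = list(combinations(all_preds, 2))
    return sum(plain_disagreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical predictions of three base classifiers on five instances.
predictions = [
    [0, 1, 1, 0, 2],
    [0, 1, 0, 0, 2],
    [1, 1, 1, 0, 0],
]
diversity = ensemble_diversity(predictions)
```

The measure needs only the predicted labels, not the true ones, so it can be evaluated on unlabeled data.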

9.
A novel ensemble of classifiers for microarray data classification   Cited by: 1 (self-citations: 0; citations by others: 1)
Yuehui  Yaou   《Applied Soft Computing》2008,8(4):1664-1669
Microarray data are often extremely asymmetric in dimensionality, with thousands or even tens of thousands of genes and only a few hundred samples. Such extreme asymmetry between the dimensionality of genes and samples presents several challenges to conventional clustering and classification methods. In this paper, a novel ensemble method is proposed. First, in order to extract useful features and reduce dimensionality, different feature selection methods, such as correlation analysis and the Fisher ratio, are used to form different feature subsets. Then a pool of candidate base classifiers is generated to learn the subsets, which are re-sampled from the different feature subsets with the PSO (Particle Swarm Optimization) algorithm. Finally, appropriate classifiers are selected to construct the classification committee using EDAs (Estimation of Distribution Algorithms). Experiments show that the proposed method produces the best recognition rates on four benchmark databases.

10.
Rotation Forest, an effective ensemble classifier generation technique, works by using principal component analysis (PCA) to rotate the original feature axes so that different training sets for learning base classifiers can be formed. This paper presents a variant of Rotation Forest, which can be viewed as a combination of Bagging and Rotation Forest. Bagging is used here to inject more randomness into Rotation Forest in order to increase the diversity among the ensemble members. Experiments conducted on 33 benchmark classification data sets from the UCI repository, with a classification tree as the base learning algorithm, demonstrate that the proposed method generally produces ensemble classifiers with lower error than Bagging, AdaBoost, and Rotation Forest. The bias–variance analysis of error performance shows that the proposed method improves the prediction error of a single classifier by reducing the variance term much more than the other ensemble procedures considered. Furthermore, the results computed on data sets with artificial classification noise indicate that the new method is more robust to noise. Kappa-error diagrams are employed to investigate the diversity–accuracy patterns of the ensemble classifiers.

11.
The automatic detection of construction materials in images acquired on a construction site has been regarded as a critical topic. Recently, several data mining techniques have been used to solve the problem of detecting construction materials. These studies have applied single classifiers to detect construction materials, and to distinguish them from the background, by using color as a feature. Recent studies suggest that combining multiple classifiers (into what is called a heterogeneous ensemble classifier) would show better performance than using a single classifier. However, the performance of ensemble classifiers in construction material detection is not fully understood. In this study, we investigated the performance of six single classifiers and potential ensemble classifiers on three data sets: one each for concrete, steel, and wood. A heterogeneous voting-based ensemble classifier was created by selecting base classifiers that are diverse and accurate; their prediction probabilities for each target class were averaged to yield a final decision for that class. In comparison with the single classifiers, the ensemble classifiers performed better on the three data sets overall. This suggests that it is better to use an ensemble classifier to enhance the detection of construction materials in images acquired on a construction site.
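The averaging rule described in this abstract is often called soft voting. The sketch below illustrates it under assumptions: the two-class setup ("concrete" versus "background"), the probability values, and the function name `soft_vote` are hypothetical and are not taken from the study.

```python
def soft_vote(prob_rows):
    """Average per-class probabilities from several classifiers.

    prob_rows: one list of class probabilities per classifier.
    Returns (index of the winning class, averaged probabilities).
    """
    n_classifiers = len(prob_rows)
    n_classes = len(prob_rows[0])
    averaged = [
        sum(row[c] for row in prob_rows) / n_classifiers
        for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=lambda c: averaged[c])
    return winner, averaged


# Hypothetical outputs of three base classifiers for one image region.
CLASSES = ["concrete", "background"]
probs = [
    [0.6, 0.4],
    [0.3, 0.7],
    [0.8, 0.2],
]
winner, averaged = soft_vote(probs)
label = CLASSES[winner]
```

Averaging lets a confident classifier outweigh a hesitant one, which hard majority voting over labels cannot do.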

12.
Feature selection is an important data preprocessing step for the construction of an effective bankruptcy prediction model. The prediction performance can be affected by the employed feature selection and classification techniques. However, there have been very few studies of bankruptcy prediction that identify the best combination of feature selection and classification techniques. In this study, two types of feature selection methods, including filter‐ and wrapper‐based methods, are considered, and two types of classification techniques, including statistical and machine learning techniques, are employed in the development of the prediction methods. In addition, bagging and boosting ensemble classifiers are also constructed for comparison. The experimental results based on three related datasets that contain different numbers of input features show that the genetic algorithm as the wrapper‐based feature selection method performs better than the filter‐based one by information gain. It is also shown that the lowest prediction error rates for the three datasets are provided by combining the genetic algorithm with the naïve Bayes and support vector machine classifiers without bagging and boosting.

13.
14.
In this paper, we fill a gap in the literature by studying the problem of Arabic handwritten digit recognition. We report the performance of different classification and feature extraction techniques on Arabic digits to serve as a benchmark for future work on the problem. Alongside well-known classifiers and feature extraction techniques, we present a novel feature extraction technique that gives high accuracy and competes with state-of-the-art techniques. A total of 54 different classifier/feature combinations are evaluated on Arabic digits in terms of accuracy and classification time. The results are analyzed, and the problem of the digit ‘0’ is identified together with a proposed method to solve it. Moreover, we propose a strategy for selecting and designing an optimal two-stage system from our study and, hence, suggest a fast two-stage classification system for Arabic digits that achieves accuracy as high as the best classifier/feature combination but with much less recognition time.

15.
Projection functions have been widely used for facial feature extraction and optical/handwritten character recognition because of their simplicity and efficiency. Because these transformations are not one-to-one, they may map distinct points to a single point and consequently lose detailed information. Here, we solve this problem by defining an N-dimensional space to represent a single image. Then, we propose a one-to-one transformation in this new image space. The proposed method, which we refer to as Linear Principal Transformation (LPT), uses eigenanalysis to extract the vector with the highest eigenvalue. The extrema of this vector are then analyzed to extract the features of interest. To evaluate the proposed method, we performed two sets of experiments, on facial feature extraction and on optical character recognition, in three different data sets. The results show that the proposed algorithm outperforms the other algorithms examined in the paper, with accuracy gains from 1.4% up to 14%, while having comparable time complexity and efficiency.

16.
李琼  陈利  王维虎 《微机发展》2014,(2):205-208
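Handwritten digit recognition is one of the research hotspots of high practical value in image processing and pattern recognition. To increase the speed of handwritten digit recognition while maintaining high recognition accuracy, a fast SVM-based handwritten digit recognition method is proposed. The method determines the optimal SVM kernel parameter from the separability strength of the classes in feature space and then quickly trains an SVM classifier to classify handwritten digits. Because computing the separability strength is a simple iterative process that takes far less time than training the corresponding SVM classifiers in traditional parameter optimization methods, the time needed to determine the parameter is greatly reduced and training is correspondingly faster, which speeds up handwritten digit recognition while preserving good classification accuracy. Experiments on the MNIST handwritten digit database show that the algorithm is feasible and effective.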

17.
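Multi-feature, multi-classifier design for handwritten character recognition   Cited by: 4 (self-citations: 0; citations by others: 4)
Feature selection and classifier design are the keys to designing a character recognition system. For the recognition of a mixed character set of handwritten Chinese characters and Arabic numerals, this paper proposes a design method that selects different character features according to different classification requirements and performs recognition with multiple neural network classifiers. Experimental results show that the method is effective for recognizing handwritten mixed character sets.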

18.
In recent years, ensemble learning has become a prolific area of study in pattern recognition, based on the assumption that using and combining different learning models in the same problem could lead to better performance results than using a single model. This idea of ensemble learning has traditionally been used for classification tasks, but has more recently been adapted to other machine learning tasks such as clustering and feature selection. We propose several feature selection ensemble configurations based on combining rankings of features from individual rankers according to the combination method and threshold value used. The performance of each proposed ensemble configuration was tested for synthetic datasets (to assess the adequacy of the selection), real classical datasets (with more samples than features), and DNA microarray datasets (with more features than samples). Five different classifiers were studied in order to test the suitability of the proposed ensemble configurations and assess the results.
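One way to combine feature rankings from several rankers is to average each feature's position and keep the top features up to a threshold. The sketch below assumes that mean-rank rule purely for illustration; the abstract does not say which combination methods or thresholds were used, and the rankings and the function name `combine_rankings` are hypothetical.

```python
def combine_rankings(rankings, threshold):
    """Combine rankings by mean position and keep the top `threshold` features.

    rankings: list of lists, each ordering the same feature names best-first.
    """
    features = rankings[0]
    mean_rank = {
        f: sum(r.index(f) for r in rankings) / len(rankings) for f in features
    }
    # Sort by mean position; ties are broken by feature name for determinism.
    ordered = sorted(features, key=lambda f: (mean_rank[f], f))
    return ordered[:threshold]


# Hypothetical rankings produced by three individual rankers.
rankings = [
    ["f1", "f2", "f3", "f4"],
    ["f2", "f1", "f4", "f3"],
    ["f1", "f3", "f2", "f4"],
]
selected = combine_rankings(rankings, threshold=2)
```

Other combination rules (median rank, best rank, and so on) fit the same interface by changing how `mean_rank` is computed.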

19.
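To make traditional Chinese medicine (TCM) diagnosis more intelligent and syndrome differentiation more accurate, an ensemble learning algorithm based on a multi-modal perturbation strategy (the MPEL algorithm) is proposed. First, repeated sampling in the sample domain produces different sample subspaces. Second, an improved hierarchical-clustering feature selection algorithm is applied in the attribute domain to partition different attribute subspaces, from which base classifiers with large diversity are trained. Then, a greedy strategy selects the optimal combination of base classifiers to improve the overall performance of the algorithm. The algorithm is validated on TCM asthma symptom-syndrome case records and compared with other ensemble learning algorithms. The experimental results show that the improved ensemble learning algorithm trains faster and achieves higher recognition accuracy in asthma symptom-syndrome classification, with a highest recognition rate of 98.16%.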
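The abstract does not spell out the greedy strategy, so the sketch below shows one common variant as an assumption: forward selection that keeps adding the base classifier whose inclusion most improves validation accuracy of the averaged output, and stops when nothing improves. The binary setup, the probability values, and all names are hypothetical.

```python
def ensemble_accuracy(member_probs, truth):
    """Accuracy of the averaged positive-class probability, thresholded at 0.5."""
    correct = 0
    for i, label in enumerate(truth):
        mean_prob = sum(probs[i] for probs in member_probs) / len(member_probs)
        correct += int((1 if mean_prob > 0.5 else 0) == label)
    return correct / len(truth)


def greedy_select(candidates, truth):
    """Forward greedy selection of base classifiers by validation accuracy."""
    chosen, best = [], 0.0
    remaining = dict(candidates)
    while remaining:
        scores = {
            name: ensemble_accuracy([candidates[c] for c in chosen] + [probs], truth)
            for name, probs in remaining.items()
        }
        name = max(scores, key=scores.get)
        if scores[name] <= best:
            break  # no candidate improves the current ensemble
        chosen.append(name)
        best = scores[name]
        del remaining[name]
    return chosen, best


# Hypothetical positive-class probabilities on six validation samples.
truth = [1, 1, 0, 0, 1, 0]
candidates = {
    "c1": [0.9, 0.4, 0.2, 0.3, 0.8, 0.6],
    "c2": [0.6, 0.9, 0.4, 0.6, 0.3, 0.1],
    "c3": [0.2, 0.3, 0.7, 0.8, 0.4, 0.9],
}
chosen, best = greedy_select(candidates, truth)
```

In this toy case two individually imperfect classifiers correct each other's mistakes, while the third is left out because adding it cannot raise accuracy further.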

20.
Artificial neural networks have been recognized as a powerful tool for pattern classification problems, but a number of researchers have also suggested that straightforward neural-network approaches to pattern recognition are largely inadequate for difficult problems such as handwritten numeral recognition. In this paper, we present three sophisticated neural-network classifiers for solving complex pattern recognition problems: a multiple multilayer perceptron (MLP) classifier, a hidden Markov model (HMM)/MLP hybrid classifier, and a structure-adaptive self-organizing map (SOM) classifier. To verify the superiority of the proposed classifiers, experiments were performed with the unconstrained handwritten numeral database of Concordia University, Montreal, Canada. The three methods achieved recognition rates of 97.35%, 96.55%, and 96.05%, respectively, which are better than those of several previous methods reported in the literature on the same database.
