The aim of bankruptcy prediction in data mining and machine learning is to develop an effective model that provides higher prediction accuracy. In the prior literature, various classification techniques have been developed and studied, among which classifier ensembles that combine multiple classifiers have been shown to outperform many single classifiers. However, in constructing classifier ensembles, there are three critical issues that can affect their performance. The first is the classification technique adopted; the other two are the method used to combine multiple classifiers and the number of classifiers to be combined. Since few relevant studies have examined these issues, this paper conducts a comprehensive comparison of classifier ensembles built from three widely used classification techniques, namely multilayer perceptron (MLP) neural networks, support vector machines (SVM), and decision trees (DT), using two well-known combination methods, bagging and boosting, and different numbers of combined classifiers. Our experimental results on three public datasets show that DT ensembles composed of 80–100 classifiers using the boosting method perform best. The Wilcoxon signed-rank test also demonstrates that DT ensembles built by boosting perform significantly differently from the other classifier ensembles. Moreover, a further study on a real-world Taiwan bankruptcy dataset was conducted, which also demonstrates the superiority of DT ensembles by boosting over the others.

Extreme Learning Machine (ELM) is a supervised learning technique for a class of feedforward neural networks with random weights that has recently been used with success for the classification of hyperspectral images. In this work, we show that the morphological techniques can be integrated in this kind of classifiers using several composite feature mappings which are proposed for ELM. In particular, we present a spectral–spatial ELM-based classifier for hyperspectral remote-sensing images that integrates the information provided by extended morphological profiles. The proposed spectral–spatial classifier allows different weights for both spatial and spectral features, outperforming other ELM-based classifiers in terms of accuracy for land-cover applications. The accuracy classification results are also better than those obtained by equivalent spectral–spatial Support-Vector-Machine-based classifiers.  相似文献   

Non-parametric classification procedures based on a certainty measure and the nearest neighbour rule for motor unit potential (MUP) classification during electromyographic (EMG) signal decomposition were explored. A diversity-based classifier fusion approach was developed and evaluated to achieve improved classification performance. The developed system allows the construction of a set of non-parametric base classifiers and then automatically chooses, from the pool of base classifiers, subsets of classifiers to form candidate classifier ensembles. The system selects the classifier ensemble members by exploiting a diversity measure for selecting classifier teams. The kappa statistic is used as the diversity measure to estimate the level of agreement between base classifier outputs, i.e., to measure the degree of decision similarity between base classifiers. The pool of base classifiers consists of two kinds of classifiers, adaptive certainty-based classifiers (ACCs) and adaptive fuzzy k-NN classifiers (AFNNCs), which utilize different types of features. Once the patterns are assigned to their classes by the classifier fusion system, firing pattern consistency statistics for each class are calculated to detect classification errors in an adaptive fashion. Performance of the developed system was evaluated using real and simulated EMG signals and was compared with the performance of the constituent base classifiers and the performance of the fixed ensemble containing the full set of base classifiers. Across the EMG signal data sets used, the diversity-based classifier fusion approach had better average classification performance overall, especially in terms of reducing classification errors.
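
The kappa statistic mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the standard Cohen's kappa between the predicted labels of two base classifiers; the abstract does not say which kappa variant the authors use, and the function name `cohen_kappa` is hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between the label sequences of two classifiers.

    Assumption: this is the standard two-rater kappa, not necessarily
    the exact variant used in the paper.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of samples where both classifiers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each classifier's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both classifiers always output the same single class
    return (observed - expected) / (1.0 - expected)
```

A kappa near 1 means the two classifiers make nearly identical decisions, while a value near 0 means they agree no more often than chance, which is the kind of pair a diversity-driven selection would favour.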

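Ensemble classification combines several weak classifiers according to some rule and can effectively improve classification performance. In the combination process, the weak classifiers usually differ in how much they contribute to the classification result. The extreme learning machine (ELM) is a recently proposed learning algorithm for training single-hidden-layer feedforward neural networks. Taking ELM as the base classifier, a weighted ELM ensemble method based on differential evolution is proposed. The proposed method uses the differential evolution algorithm to optimize the weight of each base classifier in the ensemble. Experimental results show that, compared with ensembles based on simple voting and on Adaboost, the method achieves higher classification accuracy and better generalization ability.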
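
The weighted combination described in the abstract above can be sketched as follows. This is a rough illustration only: it shows a weighted vote and the validation accuracy that a weight optimizer such as differential evolution could maximize, but it does not implement the optimizer. The function names `weighted_vote` and `ensemble_accuracy` and the data layout are assumptions, not taken from the paper.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: one predicted label per base classifier.
    weights: one non-negative weight per base classifier.
    """
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # max() keeps the first maximal key, so ties go to the earliest-seen label.
    return max(scores, key=scores.get)


def ensemble_accuracy(all_predictions, weights, truth):
    """Accuracy of the weighted ensemble on a labelled validation set.

    all_predictions[i][j] is the label from classifier j on sample i.
    A weight optimizer would treat this value as the fitness of `weights`.
    """
    correct = sum(
        weighted_vote(preds, weights) == target
        for preds, target in zip(all_predictions, truth)
    )
    return correct / len(truth)
```

With equal weights this reduces to simple majority voting, which is one of the baselines the abstract compares against.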

This study deals with the evaluation of accuracy benefits offered by a fuzzy classifier as compared to hard classifiers using satellite imagery for thematic mapping applications. When a crisp classifier approach is adopted to classify moderate resolution data, the presence of mixed coverage pixels implies that the final product will have errors, either of omission or commission, which are not avoidable and are solely due to the spatial resolution of the data. Theoretically, a soft classifier is not affected by such errors, and in principle can produce a classification that is more accurate than any hard classifier. In this study we use the Pareto boundary of optimal solutions as a quantitative method to compare the performance of a fuzzy statistical classifier to the one of two hard classifiers, and to determine the highest accuracy which could be achieved by hard classifiers. As an application, the method is applied to a case of snow mapping from Moderate-Resolution Imaging Spectroradiometer (MODIS) data on two alpine sites, validated with contemporaneous fine-resolution Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data. The results for this case study showed that the soft classifier not only outperformed the two crisp classifiers, but also yielded higher accuracy than the maximum theoretical accuracy of any crisp classifier on the study areas. While providing a general assessment framework for the performance of soft classifiers, the results obtained by this inter-comparison exercise showed that soft classifiers can be an effective solution to overcome errors which are intrinsic in the classification of coarse and moderate resolution data.  相似文献   

Multiple classifier systems (MCSs) based on the combination of outputs of a set of different classifiers have been proposed in the field of pattern recognition as a method for the development of high performance classification systems. Previous work clearly showed that multiple classifier systems are effective only if the classifiers forming them are accurate and make different errors. Therefore, the fundamental need for methods aimed to design “accurate and diverse” classifiers is currently acknowledged. In this paper, an approach to the automatic design of multiple classifier systems is proposed. Given an initial large set of classifiers, our approach is aimed at selecting the subset made up of the most accurate and diverse classifiers. A proof of the optimality of the proposed design approach is given. Reported results on the classification of multisensor remote sensing images show that this approach allows the design of effective multiple classifier systems.  相似文献   

Automatic text classification is one of the most important tools in Information Retrieval. This paper presents a novel text classifier using positive and unlabeled examples. The primary challenge of this problem as compared with the classical text classification problem is that no labeled negative documents are available in the training example set. Firstly, we identify many more reliable negative documents by an improved 1-DNF algorithm with a very low error rate. Secondly, we build a set of classifiers by iteratively applying the SVM algorithm on a training data set, which is augmented during iteration. Thirdly, different from previous PU-oriented text classification works, we adopt the weighted vote of all classifiers generated in the iteration steps to construct the final classifier instead of choosing one of the classifiers as the final classifier. Finally, we discuss an approach to evaluate the weighted vote of all classifiers generated in the iteration steps to construct the final classifier based on PSO (Particle Swarm Optimization), which can discover the best combination of the weights. In addition, we built a focused crawler based on link-contexts guided by different classifiers to evaluate our method. Several comprehensive experiments have been conducted using the Reuters data set and thousands of web pages. Experimental results show that our method increases the performance (F1-measure) compared with PEBL, and a focused web crawler guided by our PSO-based classifier outperforms other several classifiers both in harvest rate and target recall.  相似文献   

It has been widely accepted that the classification accuracy can be improved by combining outputs of multiple classifiers. However, how to combine multiple classifiers with various (potentially conflicting) decisions is still an open problem. A rich collection of classifier combination procedures, many of which are heuristic in nature, have been developed for this goal. In this brief, we describe a dynamic approach to combine classifiers that have expertise in different regions of the input space. To this end, we use local classifier accuracy estimates to weight classifier outputs. Specifically, we estimate local recognition accuracies of classifiers near a query sample by utilizing its nearest neighbors, and then use these estimates to find the best weights of classifiers to label the query. The problem is formulated as a convex quadratic optimization problem, which returns optimal nonnegative classifier weights with respect to the chosen objective function, and the weights ensure that locally most accurate classifiers are weighted more heavily for labeling the query sample. Experimental results on several data sets indicate that the proposed weighting scheme outperforms other popular classifier combination schemes, particularly on problems with complex decision boundaries. Hence, the results indicate that local classification-accuracy-based combination techniques are well suited for decision making when the classifiers are trained by focusing on different regions of the input space.
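
The local accuracy estimate described in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions: it only computes each classifier's accuracy over the k validation points nearest to the query, using Euclidean distance, and it does not solve the convex quadratic program the authors formulate for the final weights. The name `local_accuracies` and the data layout are hypothetical.

```python
import math


def local_accuracies(query, val_points, val_correct, k):
    """Estimate each classifier's accuracy in the neighbourhood of `query`.

    val_points: validation samples as coordinate tuples.
    val_correct[i][j]: True if classifier j labelled validation point i correctly.
    k: number of nearest validation points to use.
    """
    # Rank validation points by Euclidean distance to the query.
    order = sorted(
        range(len(val_points)),
        key=lambda i: math.dist(query, val_points[i]),
    )
    neighbours = order[:k]
    n_classifiers = len(val_correct[0])
    # Fraction of neighbours each classifier got right.
    return [
        sum(val_correct[i][j] for i in neighbours) / len(neighbours)
        for j in range(n_classifiers)
    ]
```

The returned values could serve directly as voting weights in a crude scheme, whereas the paper derives the weights from an optimization problem built on such estimates.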

This paper describes a performance evaluation study in which some efficient classifiers are tested in handwritten digit recognition. The evaluated classifiers include a statistical classifier (modified quadratic discriminant function, MQDF), three neural classifiers, and an LVQ (learning vector quantization) classifier. They are efficient in that high accuracies can be achieved at moderate memory space and computation cost. The performance is measured in terms of classification accuracy, sensitivity to training sample size, ambiguity rejection, and outlier resistance. The outlier resistance of neural classifiers is enhanced by training with synthesized outlier data. The classifiers are tested on a large data set extracted from NIST SD19. The results show that the test accuracies of the evaluated classifiers are comparable to or higher than those of the nearest neighbor (1-NN) rule and regularized discriminant analysis (RDA). It is shown that neural classifiers are more susceptible to small sample size than MQDF, although they yield higher accuracies on large sample size. As a neural classifier, the polynomial classifier (PC) gives the highest accuracy and performs best in ambiguity rejection. On the other hand, MQDF is superior in outlier rejection even though it is not trained with outlier data. The results indicate that pattern classifiers have complementary advantages and that they should be appropriately combined to achieve higher performance. Received: July 18, 2001 / Accepted: September 28, 2001

In machine learning, a combination of classifiers, known as an ensemble classifier, often outperforms individual ones. While many ensemble approaches exist, it remains, however, a difficult task to find a suitable ensemble configuration for a particular dataset. This paper proposes a novel ensemble construction method that uses PSO generated weights to create ensemble of classifiers with better accuracy for intrusion detection. Local unimodal sampling (LUS) method is used as a meta-optimizer to find better behavioral parameters for PSO. For our empirical study, we took five random subsets from the well-known KDD99 dataset. Ensemble classifiers are created using the new approaches as well as the weighted majority algorithm (WMA) approach. Our experimental results suggest that the new approach can generate ensembles that outperform WMA in terms of classification accuracy.  相似文献   

In integrated segmentation and recognition of character strings, the underlying classifier is trained to be resistant to noncharacters. We evaluate the performance of state-of-the-art pattern classifiers of this kind. First, we build a baseline numeral string recognition system with simple but effective presegmentation. The classification scores of the candidate patterns generated by presegmentation are combined to evaluate the segmentation paths and the optimal path is found using the beam search strategy. Three neural classifiers, two discriminative density models, and two support vector classifiers are evaluated. Each classifier has some variations depending on the training strategy: maximum likelihood, discriminative learning both with and without noncharacter samples. The string recognition performances are evaluated on the numeral string images of the NIST special database 19 and the zipcode images of the CEDAR CDROM-1. The results show that noncharacter training is crucial for neural classifiers and support vector classifiers, whereas, for the discriminative density models, the regularization of parameters is important. The string recognition results compare favorably to the best ones reported in the literature though we totally ignored the geometric context. The best results were obtained using a support vector classifier, but the neural classifiers and discriminative density models show better trade-off between accuracy and computational overhead.  相似文献   

The ability to accurately predict business failure is a very important issue in financial decision-making. Incorrect decision-making in financial institutions is very likely to cause financial crises and distress. Bankruptcy prediction and credit scoring are two important problems facing financial decision support. While many related studies develop financial distress models using machine learning techniques, more advanced machine learning techniques, such as classifier ensembles and hybrid classifiers, have not been fully assessed. The aim of this paper is to develop a novel hybrid financial distress model based on combining the clustering technique and classifier ensembles. In addition, single baseline classifiers, hybrid classifiers, and classifier ensembles are developed for comparisons. In particular, two clustering techniques, Self-Organizing Maps (SOMs) and k-means, and three classification techniques, logistic regression, multilayer-perceptron (MLP) neural network, and decision trees, are used to develop these four different types of bankruptcy prediction models. As a result, 21 different models are compared in terms of average prediction accuracy and Type I & II errors. Across five related datasets, combining Self-Organizing Maps (SOMs) with MLP classifier ensembles performs the best, providing higher prediction accuracy and lower Type I & II errors.

A modified k-nearest neighbour (k-NN) classifier is proposed for supervised remote sensing classification of hyperspectral data. To compare its performance in terms of classification accuracy and computational cost, k-NN and a back-propagation neural network classifier were used. A classification accuracy of 91.2% was achieved by the proposed classifier with the data set used. Results from this study suggest that the accuracy achieved with this classifier is significantly better than the k-NN and comparable to a back-propagation neural network. Comparison in terms of computational cost also suggests the effectiveness of modified k-NN classifier for hyperspectral data classification. A fuzzy entropy-based filter approach was used for feature selection to compare the performance of modified and k-NN classifiers with a reduced data set. The results suggest a significant increase in classification accuracy by the modified k-NN classifier in comparison with k-NN classifier with selected features.  相似文献   

Ensemble methods, in particular a hierarchical approach, are one solution to the classification problem. This method is based on dynamically splitting the original problem during training into smaller subproblems which should be easier to train. The answers are then combined to obtain the final classification. The main problem here is how to divide (cluster) the original problem to obtain the best possible accuracy expressed in terms of the risk function value. The exact value for a given clustering is known only after the whole training process. In this paper we propose a risk estimation method based on the analysis of the root classifier. This makes it possible to evaluate the risks for all subproblems without any training of child classifiers. Together with some earlier theoretical results on the hierarchical approach, we show how to use the proposed method to evaluate the risk for the whole ensemble. A variant, which uses a genetic algorithm (GA), is proposed. We compare this method with an earlier one, based on the Bayes law. We show that the subproblem risk evaluation is highly correlated with the true risk, and that the Bayes/GA approaches give hierarchical classifiers which are superior to single ones. Our method works for any classifier which returns a class probability vector for a given example.

A new approach for estimating classification errors is presented. In the model, there are two types of classification error: empirical and generalization error. The first is the error observed over the training samples, and the second is the discrepancy between the error probability and empirical error. In this research, the Vapnik and Chervonenkis dimension (VCdim) is used as a measure for classifier complexity. Based on this complexity measure, an estimate for generalization error is developed. An optimal classifier design criterion (the generalized minimum empirical error criterion (GMEE)) is used. The GMEE criterion consists of two terms: the empirical and the estimate of generalization error. As an application, the criterion is used to design the optimal neural network classifier. A corollary to the Γ optimality of neural-network-based classifiers is proven. Thus, the approach provides a theoretic foundation for the connectionist approach to optimal classifier design. Experimental results to validate this approach  相似文献   

Using neural network ensembles for bankruptcy prediction and credit scoring   Cited by: 2 (self-citations: 0, citations by others: 2)
Bankruptcy prediction and credit scoring have long been regarded as critical topics and have been studied extensively in the accounting and finance literature. Artificial intelligence and machine learning techniques have been used to solve these financial decision-making problems. The multilayer perceptron (MLP) network trained by the back-propagation learning algorithm is the most widely used technique for financial decision-making problems. In addition, it is usually superior to other traditional statistical models. Recent studies suggest that combining multiple classifiers (or classifier ensembles) should be better than single classifiers. However, the performance of multiple classifiers in bankruptcy prediction and credit scoring is not fully understood. In this paper, we investigate the performance of a single classifier as the baseline classifier to compare with multiple classifiers and diversified multiple classifiers by using neural networks based on three datasets. Compared with the single classifier as the benchmark in terms of average prediction accuracy, the multiple classifiers perform better in only one of the three datasets. The diversified multiple classifiers, trained with not only different classifier parameters but also different sets of training data, perform worse in all datasets. However, for the Type I and Type II errors, there is no exact winner. We suggest that it is better to consider these three classifier architectures to make the optimal financial decision.

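Precipitation prediction based on an improved Adaboost-BP model   Cited by: 1 (self-citations: 0, citations by others: 1)
Wang Jun, Fei Kai, Cheng Yong. 《计算机应用》 (Journal of Computer Applications), 2017, 37(9): 2689-2693
To address the low generalization ability and insufficient accuracy of current classification algorithms in precipitation prediction, a combined classification model is proposed that integrates back-propagation (BP) neural networks through an improved Adaboost algorithm. The model constructs multiple neural-network weak classifiers, assigns each a weight, and linearly combines them into a strong classifier. The improved Adaboost algorithm takes optimization of the normalization factor as its objective and adjusts the sample-weight update strategy during boosting so as to minimize the normalization factor, which ensures that the estimated upper bound on the error decreases as the number of weak classifiers increases; the final ensembled strong classifier thereby improves the generalization ability and classification accuracy of the model. Daily meteorological data from 6 stations in Jiangsu Province were selected as experimental data, forecast models for 7 precipitation levels were built, and, from the many factors affecting rainfall, 12 attributes strongly correlated with precipitation were selected as predictors. Statistics over repeated experiments show that the improved Adaboost-BP combined model performs well, adapting particularly well to station 58259, with an overall classification accuracy of 81%. Among the 7 levels, prediction accuracy is best for level-0 rainfall, and accuracy for the other levels improves to varying degrees. Theoretical derivation and experimental results show that the improvement can raise prediction accuracy.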
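
The role of the normalization factor in boosting can be illustrated with a minimal sketch of one round of the standard binary AdaBoost update. This is the textbook algorithm, not the improved weight-update strategy the paper proposes, whose details the abstract does not give. The function name `adaboost_round` and the use of labels in {-1, +1} are assumptions made for illustration.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One round of standard binary AdaBoost (labels in {-1, +1}).

    weights: current sample weights, summing to 1.
    Returns (alpha, z, new_weights), where alpha is the weak classifier's
    weight and z is the normalization factor of this round.
    """
    # Weighted error of the weak classifier on the current distribution.
    err = sum(w for w, y, h in zip(weights, y_true, y_pred) if y != h)
    if not 0.0 < err < 0.5:
        raise ValueError("weak learner must have weighted error in (0, 0.5)")
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified samples are scaled up, correctly classified ones down.
    unnormalised = [
        w * math.exp(-alpha * y * h)
        for w, y, h in zip(weights, y_true, y_pred)
    ]
    z = sum(unnormalised)  # normalization factor
    return alpha, z, [u / z for u in unnormalised]
```

In standard AdaBoost the training error of the combined classifier is bounded above by the product of the per-round normalization factors, which is why a smaller factor in each round tightens the error bound.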

An electrocardiogram (ECG) is an electrical recording of the heart and is used in the investigation of heart disease. ECG signals can be classified as normal or abnormal. This classification is presently performed with the support vector machine (SVM), but the generalization performance of the SVM classifier is not sufficient for the correct classification of ECG signals. To overcome this problem, the Extreme Learning Machine (ELM) classifier is used, which works by searching for the best values of the parameters that tune its discriminant function and, upstream, by looking for the best subset of features to feed the classifier. The experiments were conducted on ECG data from the Physionet arrhythmia database to classify five kinds of abnormal waveforms and normal beats. In this paper, a thorough experimental study shows the superior generalization capability of the ELM compared with the SVM approach in the automatic classification of ECG beats. In particular, the sensitivity of the ELM classifier is tested and compared with that of the SVM and of two other classifiers, the k-nearest neighbor classifier and the radial basis function neural network classifier, with respect to the curse of dimensionality and the number of available training beats. The obtained results clearly confirm the superiority of the ELM approach over traditional classifiers.

Credit-risk evaluation is a very challenging and important problem in the domain of financial analysis. Many classification methods have been proposed in the literature to tackle this problem. Statistical and neural network based approaches are among the most popular paradigms. However, most of these methods produce so-called “hard” classifiers, which generate decisions without any accompanying confidence measure. In contrast, “soft” classifiers, such as those designed using a fuzzy set theoretic approach, produce a measure of support for the decision (and also for alternative decisions) that provides the analyst with greater insight. In this paper, we propose a method of building credit-scoring models using fuzzy rule based classifiers. First, the rule base is learned from the training data using a SOM based method. Then the fuzzy k-nn rule is incorporated with it to design a contextual classifier that integrates the context information from the training set for more robust and qualitatively better classification. Further, a method of seamlessly integrating business constraints into the model is also demonstrated.

During the last few years there has been marked attention towards hybrid and ensemble systems development, having proved their ability to be more accurate than single classifier models. However, among the hybrid and ensemble models developed in the literature there has been little consideration given to: 1) combining data filtering and feature selection methods; 2) combining classifiers of different algorithms; and 3) exploring different classifier output combination techniques other than the traditional ones found in the literature. In this paper, the aim is to improve predictive performance by presenting a new hybrid ensemble credit scoring model through the combination of two data pre-processing methods based on Gabriel Neighbourhood Graph editing (GNG) and Multivariate Adaptive Regression Splines (MARS) in the hybrid modelling phase. In addition, a new classifier combination rule based on the consensus approach (ConsA) of different classification algorithms during the ensemble modelling phase is proposed. Several comparisons will be carried out in this paper, as follows: 1) Comparison of individual base classifiers with the GNG and MARS methods applied separately and combined in order to choose the best results for the ensemble modelling phase; 2) Comparison of the proposed approach with all the base classifiers and ensemble classifiers with the traditional combination methods; and 3) Comparison of the proposed approach with recent related studies in the literature. Five of the well-known base classifiers are used, namely, neural networks (NN), support vector machines (SVM), random forests (RF), decision trees (DT), and naïve Bayes (NB). The experimental results, analysis and statistical tests prove the ability of the proposed approach to improve prediction performance against all the base classifiers, hybrid and the traditional combination methods in terms of average accuracy, the area under the curve (AUC), H-measure and the Brier Score. The model was validated over seven real world credit datasets.
