Similar Documents
 20 similar documents retrieved.
1.
Credit scoring focuses on the development of empirical models to support the financial decision-making processes of financial institutions and credit industries. It makes use of applicants' historical data and statistical or machine learning techniques to assess the risk associated with an applicant. However, the historical data may contain redundant and noisy features that degrade the performance of credit scoring models. The main focus of this paper is to develop a hybrid model, combining feature selection and a multilayer ensemble classifier framework, to improve the predictive performance of credit scoring. The proposed hybrid credit scoring model is built in three phases. The initial phase performs preprocessing and assigns ranks and weights to classifiers. In the next phase, the ensemble feature selection approach is applied to the preprocessed dataset. Finally, in the last phase, the dataset with the selected features is used in a multilayer ensemble classifier framework. In addition, a classifier placement algorithm based on the Choquet integral value is designed, since classifier placement affects the predictive performance of the ensemble framework. The proposed hybrid credit scoring model is validated on real-world credit scoring datasets, namely the Australian, Japanese, German-categorical, and German-numerical datasets.
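A minimal Python sketch of the classifier-weighting idea above, using scikit-learn on synthetic data: base classifiers are ranked by cross-validated accuracy before being placed in a stacked ensemble. The accuracy-based ordering is a simplified stand-in for the paper's Choquet-integral placement, and the classifier pool and dataset are illustrative.

```python
# Sketch: rank base classifiers by CV accuracy and place the strongest last
# in a stacked ensemble. Accuracy ranking stands in for the paper's
# Choquet-integral placement; the dataset and classifier pool are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)

pool = {
    "nb": GaussianNB(),
    "dt": DecisionTreeClassifier(random_state=0),
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
}
# Weight each candidate by its mean 5-fold CV accuracy.
scores = {name: cross_val_score(clf, X, y, cv=5).mean() for name, clf in pool.items()}
ordered = sorted(pool, key=scores.get)          # weakest first, strongest last

stack = StackingClassifier(
    estimators=[(name, pool[name]) for name in ordered],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X, y)
print(scores, stack.score(X, y))
```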

2.
Feature selection is an important data preprocessing step for the construction of an effective bankruptcy prediction model. The prediction performance can be affected by the feature selection and classification techniques employed. However, very few bankruptcy prediction studies have identified the best combination of feature selection and classification techniques. In this study, two types of feature selection methods, filter-based and wrapper-based, are considered, and two types of classification techniques, statistical and machine learning, are employed in the development of the prediction methods. In addition, bagging and boosting ensemble classifiers are constructed for comparison. The experimental results on three related datasets containing different numbers of input features show that the genetic algorithm, as the wrapper-based feature selection method, performs better than the information-gain filter. The lowest prediction error rates for the three datasets are obtained by combining the genetic algorithm with the naïve Bayes and support vector machine classifiers, without bagging or boosting.
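A hedged sketch of a GA wrapper for feature selection with a naive Bayes fitness function, in the spirit of the combination reported above; all GA settings (population size, mutation rate, number of generations) and the synthetic dataset are illustrative choices, not the study's configuration.

```python
# Sketch of a GA wrapper: each individual is a boolean feature mask and its
# fitness is the cross-validated accuracy of a naive Bayes classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=30, n_informative=8,
                           random_state=0)
n_feat, pop_size, n_gen = X.shape[1], 20, 15

def fitness(mask):
    if not mask.any():
        return 0.0
    return cross_val_score(GaussianNB(), X[:, mask], y, cv=3).mean()

pop = rng.random((pop_size, n_feat)) < 0.5        # random initial masks
for _ in range(n_gen):
    fit = np.array([fitness(m) for m in pop])

    def pick():                                   # tournament selection
        i, j = rng.integers(pop_size, size=2)
        return pop[i] if fit[i] >= fit[j] else pop[j]

    children = []
    for _ in range(pop_size):
        a, b = pick(), pick()
        cross = rng.random(n_feat) < 0.5          # uniform crossover
        child = np.where(cross, a, b)
        child ^= rng.random(n_feat) < 0.02        # bit-flip mutation
        children.append(child)
    pop = np.array(children)

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected features:", np.flatnonzero(best))
```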

3.
The aim of this paper is to propose a new hybrid data mining model based on a combination of feature selection and ensemble learning classification algorithms, in order to support the decision-making process. The model is built in several stages. In the first stage, the initial dataset is preprocessed and, apart from applying different preprocessing techniques, particular attention is paid to feature selection. Five feature selection algorithms are applied, and their results, assessed by the ROC and accuracy measures of a logistic regression algorithm, are combined using different voting schemes. We also propose a new voting method, called if_any, which outperformed all other voting methods as well as each single feature selection algorithm. In the next stage, four classification algorithms, namely a generalized linear model, support vector machine, naive Bayes, and decision tree, are trained on the dataset obtained from the feature selection process. These classifiers are combined into eight different ensemble models using soft voting. Experimental results on a real dataset show that the hybrid model based on features selected by the if_any voting method together with the ensemble GLM + DT model achieves the best performance and outperforms all other ensemble and single-classifier models.
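A small sketch of one reading of the if_any rule, assuming it keeps a feature whenever any selector picks it (the union of the masks), followed by a soft-voting GLM + decision tree ensemble in scikit-learn; the selectors, subset sizes, and data are illustrative.

```python
# Sketch: combine feature-selection masks with an "if any selector kept it"
# union, then train a soft-voting GLM + decision tree ensemble on the result.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.feature_selection import SelectKBest, chi2, f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=25, n_informative=6,
                           random_state=0)
X_pos = MinMaxScaler().fit_transform(X)          # chi2 needs non-negative input

selectors = [
    SelectKBest(f_classif, k=10),
    SelectKBest(mutual_info_classif, k=10),
    SelectKBest(chi2, k=10),
]
masks = [s.fit(X_pos, y).get_support() for s in selectors]
if_any = np.any(masks, axis=0)                   # union of the selected sets

ensemble = VotingClassifier(
    estimators=[("glm", LogisticRegression(max_iter=1000)),
                ("dt", DecisionTreeClassifier(max_depth=5, random_state=0))],
    voting="soft",
)
ensemble.fit(X[:, if_any], y)
print("kept", if_any.sum(), "features, train acc", ensemble.score(X[:, if_any], y))
```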

4.
To select a subset of diverse sub-classifiers from a classifier ensemble and thereby improve the generalization ability of the ensemble, an intuitionistic fuzzy kernel matching pursuit algorithm based on a hybrid selection strategy is proposed. The basic idea is to generate a pool of sub-classifiers by perturbing the training set and the feature space; the pool is then pruned with the k-means clustering algorithm to remove redundant classifiers; finally, a combination of classifiers with high recognition rates is selected dynamically according to the actual recognition target, so that the size of the selective ensemble adapts to the complexity of the target, and the ensemble is iterated based on the expected recognition accuracy. Experimental results show that, compared with other commonly used classifier selection methods, the proposed method is flexible and efficient and achieves better recognition performance and generalization ability.
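A rough Python sketch of the k-means pruning step only: a pool of classifiers generated by perturbing the training set is clustered by their validation predictions, and the most accurate member of each cluster is kept. Ordinary scikit-learn learners stand in for the paper's intuitionistic fuzzy kernel matching pursuit classifiers; the pool size and cluster count are illustrative.

```python
# Sketch: prune a bagged classifier pool by clustering prediction vectors
# on a validation set and keeping the best member of each cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Pool generated by perturbing the training set (bagging of decision trees).
pool = BaggingClassifier(DecisionTreeClassifier(), n_estimators=30,
                         random_state=0).fit(X_tr, y_tr).estimators_

preds = np.array([clf.predict(X_val) for clf in pool])   # one row per classifier
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(preds)

pruned = []
for c in range(5):
    members = np.flatnonzero(labels == c)
    if members.size == 0:
        continue
    accs = [(preds[i] == y_val).mean() for i in members]
    pruned.append(pool[members[np.argmax(accs)]])         # best of each cluster
print("kept", len(pruned), "of", len(pool), "classifiers")
```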

5.
Protein function prediction is an important problem in functional genomics. Typically, protein sequences are represented by feature vectors. A major problem of protein datasets that increases the complexity of classification models is their large number of features. Feature selection (FS) techniques are used to deal with this high-dimensional feature space. In this paper, we propose a novel feature selection algorithm that combines genetic algorithms (GA) and ant colony optimization (ACO) for faster and better search capability. The hybrid algorithm exploits the advantages of both ACO and GA. The proposed algorithm is easy to implement and, because it uses a simple classifier, its computational complexity is very low. Its performance is compared with that of two prominent population-based algorithms, ACO and genetic algorithms. Experiments are carried out on two challenging biological datasets, involving the hierarchical functional classification of GPCRs and enzymes. The criteria used for comparison are maximizing predictive accuracy and finding the smallest subset of features. The results of the experiments indicate the superiority of the proposed algorithm.

6.
Multi-class audio example recognition based on fractal Brownian motion and AdaBoost
An audio feature extraction and recognition method based on fractal Brownian motion is proposed. The method uses the fractal Brownian motion model to compute the fractal dimension of an audio example, which serves as its fractal feature. Since the audio fractal features follow a Gaussian distribution, the AdaBoost algorithm is used for feature reduction. An AdaBoost-weighted Gaussian classifier and a support vector machine are then applied to classify the reduced-feature audio, and a multi-class classification model is built on top of the binary classifiers. Experiments show that the reduced audio fractal features outperform other audio features in the classification of music and speech.

7.
To address the problem of feature redundancy in medical disease data, the XGBoost feature selection method is used to measure feature importance, remove redundant features, and select the best features for classification. To address the problem of limited recognition accuracy, the Stacking method is used to integrate several heterogeneous classifiers such as XGBoost and LightGBM, and the better-performing CatBoost classifier is introduced into the heterogeneous pool to improve the accuracy of the ensemble. To avoid overfitting, the class probabilities output by the base-level classifiers are used as the input of the high-level classifier. Experimental results show that the proposed XLC-Stacking method based on XGBoost feature selection achieves a considerable improvement over current mainstream classification algorithms as well as the single XGBoost algorithm and the plain Stacking method, reaching an accuracy of 97.73% and an F1-score of 98.21%, making it better suited to disease diagnosis.
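A sketch of the XLC-Stacking pipeline described above, assuming the xgboost, lightgbm, and catboost packages are installed: features are filtered by XGBoost importance, and the base learners feed class probabilities to a logistic-regression meta-learner. Hyperparameters and the synthetic data are illustrative.

```python
# Sketch: XGBoost-importance feature selection followed by stacking of
# heterogeneous boosters whose class probabilities feed the meta-learner.
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

X, y = make_classification(n_samples=800, n_features=40, n_informative=10,
                           random_state=0)

# Step 1: drop low-importance features using XGBoost importances.
selector = SelectFromModel(XGBClassifier(n_estimators=200, random_state=0),
                           threshold="median").fit(X, y)
X_sel = selector.transform(X)

# Step 2: stack heterogeneous boosters; base-level class probabilities are
# the inputs of the high-level learner, which limits overfitting.
stack = StackingClassifier(
    estimators=[("xgb", XGBClassifier(n_estimators=200, random_state=0)),
                ("lgbm", LGBMClassifier(n_estimators=200, random_state=0)),
                ("cat", CatBoostClassifier(iterations=200, verbose=0))],
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X_sel, y)
print("training accuracy:", stack.score(X_sel, y))
```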

8.
Because high-dimensional data usually contain redundancy and noise, a cover model built directly on such data cannot fully reflect the distribution of the data, which degrades classifier performance. To address this, an ensemble classification method based on pruned random subspaces and multiple trees is proposed. The method first generates several random subspaces and builds an independent minimum-spanning-tree cover model in each subspace. Next, the classification models built in the subspaces are pruned: the resulting one-class classifiers are trimmed according to an evaluation criterion (the AUC value). Finally, the remaining classifiers are fused by averaging into a single ensemble classifier. Experimental results show that, compared with other direct cover classification models and the bagging algorithm, the multi-tree ensemble cover classifier achieves higher classification accuracy.

9.
Gender recognition has been playing a very important role in various applications such as human-computer interaction, surveillance, and security. Nonlinear support vector machines (SVMs) were investigated for the identification of gender using the Face Recognition Technology (FERET) face image database. It was shown that SVM classifiers outperform the traditional pattern classifiers (linear, quadratic, Fisher linear discriminant, and nearest neighbour). In this context, this paper aims to improve SVM classification accuracy in the gender classification system and proposes new models for better performance. We evaluated different SVM learning algorithms; the SVM with a radial basis function kernel and a 5% outlier fraction outperformed the other SVM classifiers. We examined the effectiveness of different feature selection methods; AdaBoost performed better than the others at selecting the most discriminating features. We propose two classification methods that focus on training subsets of images among the training images: method 1 combines the outcomes of different classifiers built on different image subsets, whereas method 2 clusters the training data and builds a classifier for each cluster. Experimental results show that both methods increase the classification accuracy.
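A brief sketch of the AdaBoost feature-screening step only, on synthetic data: boosted decision stumps provide a feature_importances_ ranking, and the top-ranked features feed an RBF-SVM. The subset size and all hyperparameters are illustrative, not the paper's settings.

```python
# Sketch: rank features by AdaBoost (decision stump) importances and train
# an RBF-SVM on the top-ranked subset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=50, n_informative=8,
                           random_state=0)

booster = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=200, random_state=0).fit(X, y)
top = np.argsort(booster.feature_importances_)[::-1][:15]   # 15 best features

# Downstream RBF-SVM on the reduced representation.
svm = SVC(kernel="rbf", probability=True).fit(X[:, top], y)
print("selected:", sorted(top.tolist()), "train acc:", svm.score(X[:, top], y))
```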

10.
We present attribute bagging (AB), a technique for improving the accuracy and stability of classifier ensembles induced using random subsets of features. AB is a wrapper method that can be used with any learning algorithm. It establishes an appropriate attribute subset size and then randomly selects subsets of features, creating projections of the training set on which the ensemble classifiers are built. The induced classifiers are then used for voting. This article compares the performance of our AB method with bagging and other algorithms on a hand-pose recognition dataset. It is shown that AB gives consistently better results than bagging, both in accuracy and stability. The performance of ensemble voting in bagging and the AB method as a function of the attribute subset size and the number of voters for both weighted and unweighted voting is tested and discussed. We also demonstrate that ranking the attribute subsets by their classification accuracy and voting using only the best subsets further improves the resulting performance of the ensemble.
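A compact sketch of attribute bagging with scikit-learn's BaggingClassifier: each base learner sees the full training set but only a random subset of attributes, and the ensemble votes. The subset size (max_features) is the knob the article studies; 0.5 here is an illustrative value.

```python
# Sketch: attribute bagging = full training set per learner, random subset
# of attributes per learner, majority voting over the ensemble.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=700, n_features=30, n_informative=10,
                           random_state=0)

ab = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=50,
    max_samples=1.0, bootstrap=False,            # keep all training points ...
    max_features=0.5, bootstrap_features=False,  # ... but subsample attributes
    random_state=0,
)
print("attribute bagging CV accuracy:", cross_val_score(ab, X, y, cv=5).mean())
```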

11.
In fault detection systems, a massive amount of data gathered over the life-cycle of equipment is often used to learn models or classifiers that aim at diagnosing different kinds of errors or failures. Among this huge quantity of information, some features (or sets of features) are more correlated with one kind of failure than another, and the presence of irrelevant features can affect the performance of the classifier. Feature selection is hence a key step in improving the performance of a detection system. We propose in this paper an algorithm named STRASS, which aims at detecting relevant features for classification purposes. In certain cases, when there exists a strong correlation between some features and the associated class, conventional feature selection algorithms fail to select the most relevant features. To cope with this problem, the STRASS algorithm uses k-way correlation between features and the class to select relevant features. To assess the performance of STRASS, we apply it to simulated data collected from the Tennessee Eastman chemical plant simulator. The Tennessee Eastman process (TEP) has been used in many fault detection studies, and three specific faults are not well discriminated by conventional algorithms. The results obtained by STRASS are compared with those obtained with reference feature selection algorithms. We show that the features selected by STRASS always improve the performance of a classifier compared to the whole set of original features, and that the resulting classification is better than with most of the other feature selection algorithms.

12.
In recent years, fuzzy rough set theory has emerged as a suitable tool for performing feature selection. Fuzzy rough feature selection enables us to analyze the discernibility of the attributes, highlighting the most attractive features for the construction of classifiers. However, its results can be enhanced even further if other data reduction techniques, such as instance selection, are considered. In this work, a hybrid evolutionary algorithm for data reduction, using both instance and feature selection, is presented. A global instance selection process, carried out by a steady-state genetic algorithm, is combined with a fuzzy rough set based feature selection process, which searches for the most interesting features to enhance both the evolutionary search process and the final preprocessed data set. The experimental study, whose results have been contrasted through nonparametric statistical tests, shows that our proposal obtains high reduction rates on training sets, which greatly enhances the behavior of the nearest neighbor classifier.

13.
Ensemble methods have proven to be highly effective in improving the performance of base learners under most circumstances. In this paper, we propose a new algorithm that combines the merits of some existing techniques, namely bagging, arcing, and stacking. The basic structure of the algorithm resembles bagging. However, the misclassification cost of each training point is repeatedly adjusted according to its observed out-of-bag vote margin. In this way, the method gains the advantage of arcing (building the classifiers the ensemble needs) without fixating on potentially noisy points. Computational experiments show that this algorithm performs consistently better than bagging and arcing with linear and nonlinear base classifiers. In view of the characteristics of bacing, a hybrid ensemble learning strategy, which combines bagging and different versions of bacing, is proposed and studied empirically.
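A rough, hedged sketch of the idea (not the paper's exact cost-update rule): trees are bagged, out-of-bag votes are tracked, and the sampling weight of each training point is increased when its out-of-bag vote margin is small. The exponential reweighting and all settings are assumptions for illustration.

```python
# Sketch: bagging with sampling weights driven by out-of-bag vote margins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
n = len(y)
w = np.full(n, 1.0 / n)                       # sampling distribution
votes = np.zeros((n, 2))                      # out-of-bag votes per class
trees = []

for t in range(50):
    idx = rng.choice(n, size=n, replace=True, p=w)
    tree = DecisionTreeClassifier(random_state=t).fit(X[idx], y[idx])
    trees.append(tree)

    oob = np.setdiff1d(np.arange(n), idx)     # points left out of this bootstrap
    votes[oob, tree.predict(X[oob])] += 1

    seen = votes.sum(axis=1) > 0
    margin = np.zeros(n)
    margin[seen] = (votes[seen, y[seen]] - votes[seen, 1 - y[seen]]) \
                   / votes[seen].sum(axis=1)
    w = np.exp(-margin)                       # low-margin points get sampled more
    w /= w.sum()

# Final prediction: plain majority vote over the bagged trees.
pred = (np.mean([clf.predict(X) for clf in trees], axis=0) > 0.5).astype(int)
print("training accuracy:", (pred == y).mean())
```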

14.
A novel self-organizing neuro-fuzzy multilayered classifier (SONeFMUC) is introduced in this paper, with feature selection capabilities, for the classification of an IKONOS image. The structure of the proposed network is developed in a sequential fashion using the group method of data handling (GMDH) algorithm. The node models, regarded as generic classifiers, are represented by fuzzy rule-based systems, combined with a fusion scheme. A data splitting mechanism is incorporated to discriminate between correctly classified and ambiguous pixels. The classifier was tested on the wetland of international importance of Lake Koronia, Greece, and the surrounding agricultural area. To achieve higher classification accuracy, the image was decomposed into two zones: the wetland and the agricultural zones. Apart from the initial bands, additional input features were considered: textural features, intensity-hue-saturation (IHS) and tasseled cap transformation. To assess the quality of the suggested model, the SONeFMUC was compared with a maximum likelihood classifier (MLC). The experimental results show that the SONeFMUC exhibited superior performance to the MLC, providing less confusion of the dominant classes in both zones. In the wetland zone, an overall accuracy of 89.5% was attained.

15.
A genetic algorithm-based method for feature subset selection
As a commonly used technique in data preprocessing, feature selection selects a subset of informative attributes or variables to build models describing the data. By removing redundant, irrelevant, or noisy features, feature selection can improve the predictive accuracy and the comprehensibility of the predictors or classifiers. Many feature selection algorithms with different selection criteria have been introduced by researchers; however, no single criterion has proven best for all applications. In this paper, we propose a framework based on a genetic algorithm (GA) for feature subset selection that combines various existing feature selection methods. The advantages of this approach include the ability to accommodate multiple feature selection criteria and to find small subsets of features that perform well for the particular inductive learning algorithm used to build the classifier. We conducted experiments using three data sets and three existing feature selection methods. The experimental results demonstrate that our approach is robust and effective in finding subsets of features with higher classification accuracy and/or smaller size than each individual feature selection algorithm.

16.
This paper presents the cluster-based ensemble classifier, an approach to generating an ensemble of classifiers using multiple clusters within the classified data. Clustering is incorporated to partition the data set into multiple clusters of highly correlated data that are difficult to separate otherwise, and different base classifiers are used to learn class boundaries within the clusters. As the different base classifiers engage different difficult-to-classify subsets of the data, the learning of the base classifiers is more focused and accurate. A selection approach, rather than fusion, produces the final verdict on patterns of unknown classes. The impact of clustering on the learning parameters and accuracy of a number of learning algorithms, including neural network, support vector machine, decision tree, and k-NN classifiers, is investigated. A number of benchmark data sets from the UCI machine learning repository were used to evaluate the cluster-based ensemble classifier, and the experimental results demonstrate its superiority over bagging and boosting.
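A minimal sketch of a cluster-based ensemble in this spirit: the training data is partitioned with k-means, one base classifier is fitted per cluster, and each test point is routed to the classifier of its nearest cluster centre (selection rather than fusion). The cluster count, base learner, and data are illustrative.

```python
# Sketch: partition the training data with k-means, train one classifier per
# cluster, and route each test point to its nearest cluster's classifier.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=800, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

n_clusters = 4
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_tr)

experts = []
for c in range(n_clusters):
    Xc, yc = X_tr[km.labels_ == c], y_tr[km.labels_ == c]
    if len(np.unique(yc)) > 1:
        experts.append(SVC(kernel="rbf").fit(Xc, yc))
    else:                                    # degenerate cluster: constant label
        experts.append(yc[0])

assign = km.predict(X_te)                    # selection, not fusion
pred = np.empty(len(X_te), dtype=y.dtype)
for i, (x, c) in enumerate(zip(X_te, assign)):
    expert = experts[c]
    pred[i] = expert.predict(x[None, :])[0] if hasattr(expert, "predict") else expert
print("test accuracy:", (pred == y_te).mean())
```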

17.
Web page classification has become a challenging task due to the exponential growth of the World Wide Web. Uniform Resource Locator (URL)-based web page classification systems play an important role, but high accuracy may not be achievable because a URL contains minimal information. Nevertheless, URL-based classifiers combined with a rejection framework can be used as a first-level filter in a multistage classifier, with costlier feature extraction from page contents performed in later stages. However, the noisy and irrelevant features present in URLs demand feature selection methods for URL classification. We therefore propose a supervised feature selection method in which relevant URL features are identified using statistical methods. We propose a new feature weighting method for a Naive Bayes classifier that embeds the term goodness obtained from the feature selection method. We also propose a rejection framework for the Naive Bayes classifier that uses the posterior probability to determine the confidence score. The proposed method is evaluated on the Open Directory Project and WebKB data sets. Experimental results show that our method can be an effective first-level filter, and McNemar tests confirm that our approach significantly improves performance.
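A small sketch of the rejection idea only, on synthetic data with a Gaussian naive Bayes stand-in for the URL-based classifier: the first-level filter abstains whenever its top posterior probability falls below a confidence threshold, deferring those pages to a costlier content-based stage. The 0.8 threshold is illustrative.

```python
# Sketch: naive Bayes with posterior-probability rejection; low-confidence
# predictions are deferred to a later, content-based stage.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

nb = GaussianNB().fit(X_tr, y_tr)
proba = nb.predict_proba(X_te)
confidence = proba.max(axis=1)

threshold = 0.8                                   # illustrative confidence cut-off
accepted = confidence >= threshold
pred = proba.argmax(axis=1)

print(f"accepted {accepted.mean():.0%} of pages; "
      f"accuracy on accepted: {(pred[accepted] == y_te[accepted]).mean():.3f}; "
      f"{(~accepted).sum()} pages deferred to the content-based stage")
```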

18.
Feature selection plays an important role in the classification accuracy and generalization performance of a classifier. Current multi-label feature selection algorithms mainly rely on the maximum-relevance minimum-redundancy criterion over the full feature set and do not take expert features into account, so they tend to have long running times and high complexity. In practice, however, experts can often determine the overall prediction direction directly from a few key features. Extracting and focusing on this information can reduce the computation time of feature selection and may even improve classifier performance. On this basis, a multi-label feature selection algorithm based on conditional mutual information with expert features is proposed. First, the expert features are combined with the remaining features; conditional mutual information is then used to produce a feature sequence ordered from strong to weak relevance to the label set; finally, highly redundant features are removed by partitioning subspaces. The algorithm is compared experimentally on seven multi-label datasets, and the results show that it has certain advantages over other feature selection algorithms; statistical hypothesis testing and stability analysis further demonstrate the effectiveness and rationality of the proposed algorithm.

19.
This paper proposes a hybrid-boost learning algorithm for multi-pose face detection and facial expression recognition. To speed up the detection process, the system searches the entire frame for potential face regions using skin color detection and segmentation. It then scans the skin color segments of the image and applies the weak classifiers along with the strong classifier for face detection and expression classification. The system detects human faces at different scales and in various poses, with different expressions, partial occlusion, and defocus. Our major contribution is the selection of weak hybrid classifiers based on Haar-like (local) features and Gabor (global) features. The multi-pose face detection algorithm can also be adapted for facial expression recognition. The experimental results show that our face detection and facial expression recognition systems perform better than the other classifiers.
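A hedged sketch of the skin-colour pre-filter stage only, assuming OpenCV is installed and a hypothetical input image frame.jpg exists; the Cr/Cb bounds and the minimum blob area are commonly used illustrative values, not the paper's calibrated ones.

```python
# Sketch: threshold the YCrCb chroma channels to a typical skin range and
# keep large connected blobs as candidate face regions for later classifiers.
import cv2
import numpy as np

img = cv2.imread("frame.jpg")                       # hypothetical input frame
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)

# Skin mask from chroma bounds, cleaned with a morphological opening.
mask = cv2.inRange(ycrcb, np.array([0, 133, 77]), np.array([255, 173, 127]))
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

# Candidate regions = skin blobs above a minimum area; only these would be
# passed to the Haar-like/Gabor weak classifiers in a full pipeline.
n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
for i in range(1, n):                               # label 0 is the background
    x, y, w, h, area = stats[i]
    if area > 400:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("candidates.jpg", img)
```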

20.
王丽娟 《计算机工程》2010,36(16):166-168
To mitigate the effect of the curse of dimensionality on the K-nearest-neighbor classifier, a multi-perturbation K-nearest-neighbor fusion algorithm based on a genetic algorithm (GA), called GA-MKNNC, is proposed. Target perturbation divides the recognition problem into several sub-classification problems that are handled separately. For each sub-problem, data perturbation selects the relevant data, feature perturbation determines the relevant features, and parameter perturbation fixes the relevant parameter values. Data perturbation is determined by the Bagging algorithm, while feature and parameter perturbations are learned by the GA. The decisions of the sub-classification problems are fused by the maximum rule to obtain the final decision. Experimental results show that the proposed algorithm outperforms the K-nearest-neighbor classifier and majority-fusion algorithms, and uses fewer sub-classifiers than the FASBIR algorithm.
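A loose Python sketch of a multi-perturbation k-NN ensemble following the description above: each member is built on a bootstrap sample (data perturbation) with a random feature subset (feature perturbation) and a random k (parameter perturbation); the GA search of the paper is replaced here by random choices, and the decisions are fused by the maximum rule.

```python
# Sketch: multi-perturbation k-NN ensemble with max-rule fusion.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

members = []
for _ in range(15):
    boot = rng.choice(len(X_tr), size=len(X_tr), replace=True)   # data perturbation
    feats = rng.choice(X_tr.shape[1], size=10, replace=False)    # feature perturbation
    k = int(rng.integers(3, 12))                                 # parameter perturbation
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr[boot][:, feats], y_tr[boot])
    members.append((clf, feats))

# Max-rule fusion: each test point takes the class with the largest posterior
# probability reported by any single member.
probas = np.stack([clf.predict_proba(X_te[:, feats]) for clf, feats in members])
pred = probas.max(axis=0).argmax(axis=1)
print("test accuracy:", (pred == y_te).mean())
```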
