Similar literature
20 similar documents found
1.
Feature selection is the process of choosing a relevant subset of features from a high-dimensional dataset to enhance the performance of a classifier. Much research has been carried out on feature selection. Algorithms such as Naïve Bayes (NB), decision tree, and genetic algorithm are applied to high-dimensional datasets to select the relevant features and to increase computational speed. The proposed model presents a solution for feature selection using ensemble classifier algorithms. The proposed algorithm is a combination of minimum redundancy and maximum relevance (mRMR) and the forest optimization algorithm (FOA). Ensemble-based algorithms such as support vector machine (SVM), K-nearest neighbor (KNN), and NB are further used to enhance the performance of the classifier. The mRMR-FOA is used to select the relevant features from various datasets, and a 21% to 24% improvement is recorded in feature selection. The ensemble classifier algorithms further improve performance and provide an accuracy of 96%.
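The mRMR criterion mentioned in the abstract above can be illustrated with a minimal sketch. It assumes discrete feature columns and uses the common greedy "relevance minus mean redundancy" form of mRMR; the function names `mutual_information` and `mrmr` are hypothetical, and the forest optimization step of mRMR-FOA is not reproduced here.

```python
from collections import Counter
from math import log2


def mutual_information(a, b):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * log2(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())


def mrmr(columns, target, k):
    """Greedily pick k column indices by relevance minus mean redundancy.

    Assumption: this is the textbook greedy mRMR rule, not the
    mRMR-FOA procedure of the paper.
    """
    selected = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            relevance = mutual_information(columns[j], target)
            if not selected:
                return relevance
            redundancy = sum(mutual_information(columns[j], columns[s])
                             for s in selected) / len(selected)
            return relevance - redundancy

        best = max(remaining, key=score)  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected
```

In this sketch a feature that duplicates an already selected one scores poorly on the second pass, because its redundancy cancels its relevance.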

2.
3.
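To improve the classification accuracy of multiple classifier systems, a classifier ensemble method based on rough set attribute reduction, MCS_ARS, is proposed. The method uses rough set attribute reduction and data subset partitioning to obtain several reduced feature subsets and data subsets, on which the base classifiers are trained. It then uses the similarity of classification results to obtain several predicted classes for the validation set. Finally, majority voting yields the final class for the validation set. The performance of MCS_ARS is tested on UCI benchmark data sets. Experimental results show that, compared with classical ensemble methods, MCS_ARS achieves higher classification accuracy and stability.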
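The final majority-voting step can be sketched as follows. This is only an illustration of plain majority voting over base-classifier outputs; the helper names `majority_vote` and `ensemble_predict` are hypothetical, and the rough set reduction and result-similarity steps of MCS_ARS are not shown.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(base_predictions):
    """Combine per-classifier predictions into one label per sample.

    base_predictions[i][j] is the label that base classifier i
    assigns to sample j.
    """
    return [majority_vote(column) for column in zip(*base_predictions)]
```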

4.
5.
In this paper, the concept of finding an appropriate classifier ensemble for named entity recognition is posed as a multiobjective optimization (MOO) problem. Our underlying assumption is that instead of searching for the best-fitting feature set for a particular classifier, ensembling several classifiers that are trained using different feature representations could be a more fruitful approach, but it is crucial to determine the appropriate subset of classifiers that are most suitable for the ensemble. We use three heterogeneous classifiers, namely maximum entropy, conditional random field, and support vector machine, to build a number of models depending upon the various representations of the available features. The proposed MOO-based ensemble technique is evaluated for three resource-constrained languages, namely Bengali, Hindi, and Telugu. Evaluation results yield recall, precision, and F-measure values of 92.21, 92.72, and 92.46%, respectively, for Bengali; 97.07, 89.63, and 93.20%, respectively, for Hindi; and 80.79, 93.18, and 86.54%, respectively, for Telugu. We also evaluate our proposed technique with the CoNLL-2003 shared task English data sets, which yield recall, precision, and F-measure values of 89.72, 89.84, and 89.78%, respectively. Experimental results show that the classifier ensemble identified by our proposed MOO-based approach outperforms all the individual classifiers, two different conventional baseline ensembles, and the classifier ensemble identified by a single-objective-based approach. In a part of the paper, we formulate the problem of feature selection in any classifier under the MOO framework and show that our proposed classifier ensemble attains superior performance to it.

6.
In recent years, computational paralinguistics has emerged as a new topic within speech technology. It concerns extracting non-linguistic information from speech (such as emotions, the level of conflict, or whether the speaker is drunk). It was shown recently that many methods applied here can be assisted by speaker clustering; for example, the features extracted from the utterances could be normalized speaker-wise instead of using a global method. In this paper, we propose a speaker clustering algorithm based on standard clustering approaches like K-means and feature selection. By applying this speaker clustering technique in two paralinguistic tasks, we were able to significantly improve the accuracy scores of several machine learning methods, and we also obtained an insight into which features could be efficiently used to separate the different speakers.

7.
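To address the high classification-performance requirements and the complex integration process in classifier ensembles, the strengths and weaknesses of conventional ensemble methods are analyzed and existing classifier diversity measures are studied. A hierarchical classifier ensemble system is proposed that is built by selecting, as base classifiers, classifiers that differ from one another as much as possible. Different base classifiers are constructed, those with higher accuracy are kept as candidates, their diversity is analyzed, and the classifiers with large diversity are selected as the base classifiers that make up the ensemble system. Experiments on UCI data sets yield good classification and recognition results, verifying the advantages of this classifier ensemble system.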
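A rough sketch of the "filter by accuracy, then pick by diversity" idea follows. The abstract does not name its diversity measure, so the pairwise disagreement measure used here is an assumption, as are the names `select_diverse`, `min_accuracy`, and `k`; the hierarchical structure of the proposed system is not modeled.

```python
def accuracy(pred, truth):
    """Fraction of samples predicted correctly."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def disagreement(pred_a, pred_b):
    """Fraction of samples on which two classifiers output different labels."""
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)


def select_diverse(preds, truth, min_accuracy, k):
    """Pick up to k classifier names from preds (name -> predictions).

    Step 1: keep candidates whose accuracy is at least min_accuracy.
    Step 2: start from the most accurate candidate, then repeatedly add
    the candidate that disagrees most with those already chosen.
    """
    candidates = [n for n in preds if accuracy(preds[n], truth) >= min_accuracy]
    if not candidates:
        return []
    chosen = [max(candidates, key=lambda n: accuracy(preds[n], truth))]
    while len(chosen) < min(k, len(candidates)):
        rest = [n for n in candidates if n not in chosen]
        best = max(rest, key=lambda n: sum(disagreement(preds[n], preds[c])
                                           for c in chosen))
        chosen.append(best)
    return chosen
```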

8.
Recognition of protein folding patterns is an important step in protein structure and function predictions. The traditional sequence similarity-based approach fails to yield convincing predictions when proteins have low sequence identities, while the taxonometric approach is a reliable alternative. From a pattern recognition perspective, protein fold recognition involves a large number of classes with only a small number of training samples, and multiple heterogeneous feature groups derived from different propensities of amino acids. This raises the need for a classification method that is able to handle the data complexity with a high prediction accuracy for practical applications. To this end, a novel ensemble classifier, called MarFold, is proposed in this paper, which combines three margin-based classifiers for protein fold recognition. The effectiveness of our method is demonstrated with the benchmark D-B dataset with 27 classes. The overall prediction accuracy obtained by MarFold is 71.7%, which surpasses the existing fold recognition methods by 3.1–15.7%. Moreover, one component classifier for MarFold, called ALH, has obtained a prediction accuracy of 65.5%, which is 4.7–9.5% higher than the prediction accuracies for the published methods using single classifiers. Additionally, the feature set of pairwise frequency information about the amino acids, which is adopted by MarFold, is found to be important for discriminating folding patterns. These results imply that the MarFold method and its operation engine ALH might become useful vehicles for protein fold recognition, as well as other bioinformatics tasks. The MarFold method and the datasets can be obtained from: (http://www-staff.it.uts.edu.au/~lbcao/publication/MarFold.7z).

9.

In dynamic ensemble selection (DES) techniques, only the most competent classifiers, for the classification of a specific test sample, are selected to predict the sample’s class labels. The key in DES techniques is estimating the competence of the base classifiers for the classification of each specific test sample. The classifiers’ competence is usually estimated according to a given criterion, which is computed over the neighborhood of the test sample defined on the validation data, called the region of competence. A problem arises when there is a high degree of noise in the validation data, causing the samples belonging to the region of competence to not represent the query sample. In such cases, the dynamic selection technique might select the base classifier that overfitted the local region rather than the one with the best generalization performance. In this paper, we propose two modifications in order to improve the generalization performance of any DES technique. First, a prototype selection technique is applied over the validation data to reduce the amount of overlap between the classes, producing smoother decision borders. During generalization, a local adaptive K-Nearest Neighbor algorithm is used to minimize the influence of noisy samples in the region of competence. Thus, DES techniques can better estimate the classifiers’ competence. Experiments are conducted using 10 state-of-the-art DES techniques over 30 classification problems. The results demonstrate that the proposed scheme significantly improves the classification accuracy of dynamic selection techniques.
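The region-of-competence idea can be sketched generically as below. This assumes Euclidean distance, base classifiers given as plain callables, and a simple "keep the locally most accurate classifiers" rule; the names `region_of_competence` and `select_competent` are hypothetical, and neither the prototype selection nor the adaptive K-Nearest Neighbor modification proposed in the paper is implemented.

```python
import math


def region_of_competence(query, val_X, k):
    """Indices of the k validation samples closest to the query."""
    order = sorted(range(len(val_X)), key=lambda i: math.dist(query, val_X[i]))
    return order[:k]


def select_competent(query, val_X, val_y, classifiers, k=3):
    """Return the classifiers with the highest accuracy near the query.

    Each classifier is a callable mapping a feature tuple to a label.
    Competence is the accuracy over the k nearest validation samples.
    """
    region = region_of_competence(query, val_X, k)
    scores = [sum(clf(val_X[i]) == val_y[i] for i in region) / len(region)
              for clf in classifiers]
    best = max(scores)
    return [clf for clf, s in zip(classifiers, scores) if s == best]
```

With noisy validation labels inside the region, the accuracy estimates above become unreliable, which is the failure mode the abstract describes.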


10.
11.
《微型机与应用》2016,(13):51-54
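To address the excessively high data dimensionality of telecom customer churn data sets and the weak predictive performance of a single classifier, a telecom customer churn prediction model is proposed that combines a stepwise feature selection method, based on the Fisher ratio and the prediction risk criterion, with an ensemble classifier. It draws on the advantages of filter and wrapper feature selection methods and on the higher predictive ability of ensemble classifiers. First, features with high discriminative ability are extracted from the original feature set based on the Fisher ratio. Second, the prediction risk criterion is used to further select the features that most affect the predictive performance of the classification model. Finally, ensemble classifiers based on average probability output and weighted probability output are built to further improve churn prediction. Experimental results show that, compared with single-step feature extraction and single-classifier models, the method improves customer churn prediction.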
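The first and last steps can be illustrated with a small sketch. It assumes binary labels (1 for churn, 0 otherwise) and a common two-class form of the Fisher ratio, (difference of class means) squared over the sum of class variances; the abstract does not give its exact formula, so this form and the names `fisher_ratio`, `rank_features`, and `average_probability` are assumptions. The prediction risk step is not shown.

```python
from statistics import mean, pvariance


def fisher_ratio(values, labels):
    """Two-class Fisher ratio of one feature (assumed form, see lead-in)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    gap = (mean(pos) - mean(neg)) ** 2
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0:
        # Both classes are constant: perfectly separating or useless.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def rank_features(X, labels):
    """Feature indices sorted from most to least discriminative."""
    columns = list(zip(*X))
    scores = [fisher_ratio(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)


def average_probability(probabilities, weights=None):
    """Plain or weighted mean of the churn probabilities of several models."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    return sum(w * p for w, p in zip(weights, probabilities)) / sum(weights)
```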

12.
Neural Computing and Applications - Breast cancer is one of the leading causes of death among women worldwide. Many methods have been proposed for automatic breast cancer diagnosis. One popular...

13.
A dynamic classifier ensemble selection approach for noise data   Cited by: 2 (self-citations: 0, citations by others: 2)
Dynamic classifier ensemble selection (DCES) plays a strategic role in the field of multiple classifier systems. The real data to be classified often include a large amount of noise, so it is important to study the noise immunity of various DCES strategies. This paper introduces the group method of data handling (GMDH) to DCES, and proposes a novel dynamic classifier ensemble selection strategy, GDES-AD. It considers both accuracy and diversity in the process of ensemble selection. We experimentally test GDES-AD and six other ensemble strategies over 30 UCI data sets in three cases: the data sets do not include artificial noise, include class noise, and include attribute noise. Statistical analysis results show that GDES-AD has stronger noise immunity than the other strategies. In addition, we find that Random Subspace is more suitable for GDES-AD than Bagging. Further, the bias-variance decomposition experiments for the classification errors of various strategies show that the stronger noise immunity of GDES-AD is mainly due to the fact that it reduces the bias in classification error more effectively.

14.
A data driven ensemble classifier for credit scoring analysis   Cited by: 2 (self-citations: 0, citations by others: 2)
This study focuses on predicting whether a credit applicant can be categorized as good, bad or borderline from information initially supplied. This is essentially a classification task for credit scoring. Given its importance, many researchers have recently worked on an ensemble of classifiers. However, to the best of our knowledge, unrepresentative samples drastically reduce the accuracy of the deployment classifier. Few have attempted to preprocess the input samples into more homogeneous cluster groups and then fit the ensemble classifier accordingly. For this reason, we introduce the concept of class-wise classification as a preprocessing step in order to obtain an efficient ensemble classifier. This strategy would work better than a direct ensemble of classifiers without the preprocessing step. The proposed ensemble classifier is constructed by incorporating several data mining techniques, mainly involving optimal associate binning to discretize continuous values; neural network, support vector machine, and Bayesian network are used to augment the ensemble classifier. In particular, the Markov blanket concept of Bayesian network allows for a natural form of feature selection, which provides a basis for mining association rules. The learned knowledge is represented in multiple forms, including causal diagram and constrained association rules. The data driven nature of the proposed system distinguishes it from existing hybrid/ensemble credit scoring systems.

15.
Most studies on streaming data classification are based on the assumption that data can be fully labeled. However, in real-life applications, it is impractical and time-consuming to manually label the entire stream for training. It is very common that only a small amount of positive data and a large amount of unlabeled data are available in data stream environments. In this case, applying traditional streaming algorithms with straightforward adaptation to a positive and unlabeled stream may lead to poor performance. In this paper, we propose a Dynamic Classifier Ensemble method for Positive and Unlabeled text stream (DCEPU) classification scenarios. We address the problem of classifying positive and unlabeled text streams with various kinds of concept drift by constructing an appropriate validation set and designing a novel dynamic weighting scheme in the classification phase. Experimental results on the benchmark dataset RCV1-v2 demonstrate that the proposed method DCEPU outperforms the existing LELC (Li et al. 2009b), DVS (with necessary adaptation) (Tsymbal et al. in Inf Fusion 9(1):56–68, 2008), and Stacking-style ensemble-based algorithm (Zhang et al. 2008b).

16.

This paper presents a random boosting ensemble (RBE) classifier for remote sensing image classification, which introduces random projection feature selection and bootstrap methods to obtain base classifiers for the classifier ensemble. The RBE method is built on an improved boosting framework, which is quite efficient for the few-shot problem due to the bootstrap in use. In RBE, the kernel extreme learning machine (KELM) is applied to design base classifiers, which makes RBE quite efficient due to feature reduction. The experimental results on remote scene image classification demonstrate that RBE can effectively improve classification performance, resulting in better generalization ability on the 21-class land-use dataset and the India pine satellite scene dataset.


17.
The concept of classifier competence is fundamental to multiple classifier systems (MCSs). In this study, a method for calculating classifier competence is developed using a probabilistic model. In the method, a randomised reference classifier (RRC), whose class supports are realisations of random variables with beta probability distributions, is first constructed. The parameters of the distributions are chosen in such a way that, for each feature vector in a validation set, the expected values of the class supports produced by the RRC and the class supports produced by a modelled classifier are equal. This allows the probability of correct classification of the RRC to be used as the competence of the modelled classifier. The competences calculated for a validation set are then generalised to the entire feature space by constructing a competence function based on a potential function model or regression. Three systems based on dynamic classifier selection and dynamic ensemble selection (DES) were constructed using the method developed. The DES-based system had a statistically significantly higher average rank than eight benchmark MCSs for 22 data sets and a heterogeneous ensemble. The results obtained indicate that the full vector of class supports should be used for evaluating classifier competence, as this potentially improves the performance of MCSs.

18.
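The extreme learning machine (ELM) algorithm is combined with the rotation forest algorithm to propose RF-ELM, an ensemble learning model that uses ELM as the base classifier within the rotation forest framework. Three groups of prediction experiments are carried out on 8 data sets. Based on the results, the influence of the number of hidden-layer neurons in ELM on the prediction results is discussed, along with the instability of the predictions of a single ELM model. RF-ELM is compared with a single ELM model and with an ELM ensemble built with the Bagging algorithm. Two groups of comparative experiments, on stability and on prediction accuracy, show that ensemble learning over ELM can effectively improve the performance of ELM models, and that RF-ELM has better stability and higher accuracy than the other two models, verifying that RF-ELM is an effective ELM ensemble learning model.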

19.
It is challenging to use traditional data mining techniques to deal with real-time data stream classification. Existing mining classifiers need to be updated frequently to adapt to the changes in data streams. To address this issue, in this paper we propose an adaptive ensemble approach for classification and novel class detection in concept-drifting data streams. The proposed approach uses traditional mining classifiers and updates the ensemble model automatically so that it represents the most recent concepts in data streams. For novel class detection we consider the idea that data points belonging to the same class should be closer to each other and should be far apart from the data points belonging to other classes. If a data point is well separated from the existing data clusters, it is identified as a novel class instance. We tested the performance of this proposed stream classification model against that of existing mining algorithms using real benchmark datasets from the UCI (University of California, Irvine) machine learning repository. The experimental results indicate that our approach shows great flexibility and robustness in novel class detection under concept drift and outperforms traditional classification models in challenging real-life data stream applications.
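The separation test for novel class instances might look like the following sketch. The abstract does not state how "well separated" is measured, so the rule here (a point is novel when it lies outside every cluster's radius, scaled by a `slack` factor) is an assumption, as are the names `centroid` and `is_novel`.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def is_novel(point, clusters, slack=1.0):
    """Return True if the point lies outside every known cluster.

    clusters maps a class name to its member points. A cluster's radius
    is the largest distance from its centroid to any member (assumed rule).
    """
    for members in clusters.values():
        center = centroid(members)
        radius = max(math.dist(center, m) for m in members)
        if math.dist(point, center) <= slack * radius:
            return False
    return True
```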

20.
The problem addressed in this letter concerns multiclassifier generation by a random subspace method (RSM). In the RSM, the classifiers are constructed in random subspaces of the data feature space. In this letter, we propose an evolved feature weighting approach: in each subspace, the features are multiplied by a weight factor to minimize the error rate on the training set. An efficient method based on particle swarm optimization (PSO) is proposed here for finding a set of weights for each feature in each subspace. The performance improvement with respect to state-of-the-art approaches is validated through experiments with several benchmark data sets.
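As a rough illustration of the random subspace construction and per-subspace feature weighting, consider the sketch below. The function names `random_subspaces` and `project` are hypothetical, the weights are supplied by the caller, and the PSO search that the letter uses to find them is not implemented.

```python
import random


def random_subspaces(n_features, subspace_size, n_subspaces, seed=0):
    """Draw n_subspaces random feature-index subsets without replacement."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subspace_size))
            for _ in range(n_subspaces)]


def project(x, subspace, weights=None):
    """Restrict a sample to one subspace and scale each kept feature.

    weights[i] multiplies the feature at index subspace[i]; with no
    weights given, the projection is the plain random subspace method.
    """
    if weights is None:
        weights = [1.0] * len(subspace)
    return [w * x[j] for j, w in zip(subspace, weights)]
```

Each base classifier would then be trained on the projected samples of its own subspace, and the ensemble would combine their outputs.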
