Found 20 similar documents; search took 15 ms
1.
Microarray data are often extremely asymmetric in dimensionality: thousands or even tens of thousands of genes against only a few hundred samples. Such extreme asymmetry between the dimensionality of genes and samples presents several challenges to conventional clustering and classification methods. In this paper, a novel ensemble method is proposed. First, in order to extract useful features and reduce dimensionality, different feature selection methods such as correlation analysis and Fisher ratio are used to form different feature subsets. Then a pool of candidate base classifiers is generated to learn the subsets, which are re-sampled from the different feature subsets with the PSO (Particle Swarm Optimization) algorithm. Finally, appropriate classifiers are selected to construct the classification committee using EDAs (Estimation of Distribution Algorithms). Experiments show that the proposed method produces the best recognition rates on four benchmark databases.
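As a rough illustration of the Fisher-ratio filter this abstract mentions (the scoring rule and toy data below are illustrative, not taken from the paper), a feature can be ranked by between-class separation over within-class spread:

```python
# Hedged sketch of Fisher-ratio feature scoring for a two-class problem.
from statistics import mean, pvariance

def fisher_ratio(feature_values, labels):
    """Score one feature: squared mean gap over pooled within-class variance."""
    a = [v for v, y in zip(feature_values, labels) if y == 0]
    b = [v for v, y in zip(feature_values, labels) if y == 1]
    spread = pvariance(a) + pvariance(b)
    return (mean(a) - mean(b)) ** 2 / spread if spread else float("inf")

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 5.0], [0.2, 4.9], [0.9, 5.1], [1.0, 5.0]]
y = [0, 0, 1, 1]
scores = [fisher_ratio([row[j] for row in X], y) for j in range(2)]
top = max(range(2), key=scores.__getitem__)
print(top)  # feature 0 ranks first
```

Each such filter criterion would yield its own ranking, from which the candidate feature subsets are formed.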
2.
In this paper, a new framework for feature selection consisting of an ensemble of filters and classifiers is described. Five filters, based on different metrics, were employed. Each filter selects a different subset of features, which is used to train and test a specific classifier. The outputs of these five classifiers are combined by simple voting. In this study three well-known classifiers were employed for the classification task: C4.5, naive Bayes and IB1. The rationale of the ensemble is to reduce the variability of the features selected by filters in different classification domains. Its adequacy was demonstrated by employing 10 microarray data sets.
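The simple-voting combination step described above can be sketched as follows; the classifier outputs here are invented for illustration:

```python
# Minimal sketch of simple (majority) voting over per-filter classifiers.
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier labels by simple voting."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical labels from three classifiers, each trained on a
# different filter-selected feature subset, for one test sample.
votes = ["cancer", "normal", "cancer"]
print(majority_vote(votes))
```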
3.
Data classification is usually based on measurements recorded at the same time. This paper considers temporal data classification where the input is a temporal database that describes measurements over a period of time in history while the predicted class is expected to occur in the future. We describe a new temporal classification method that improves the accuracy of standard classification methods. The benefits of the method are tested on weather forecasting using the meteorological database from the Texas Commission on Environmental Quality and on influenza using the Google Flu Trends database.
5.
Multi-label classification is a challenging research problem in which each instance may belong to more than one class. Recently, a considerable amount of research has been concerned with the development of “good” multi-label learning methods. Despite the extensive research effort, many scientific challenges posed by, e.g., highly imbalanced training sets and correlation among labels remain to be addressed. The aim of this paper is to use a heterogeneous ensemble of multi-label learners to simultaneously tackle both the sample imbalance and label correlation problems. This is different from existing work in the sense that we propose to combine state-of-the-art multi-label methods by ensemble techniques instead of focusing on ensemble techniques within a multi-label learner. The proposed ensemble approach (EML) is applied to six publicly available multi-label data sets from various domains including computer vision, biology and text, using several evaluation criteria. We validate the advocated approach experimentally and demonstrate that it yields significant performance gains when compared with state-of-the-art multi-label methods.
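One common way to combine heterogeneous multi-label learners, shown here only as a sketch (the per-label scores and threshold are made up, not taken from the EML paper), is to average per-label scores across members and threshold:

```python
# Hedged sketch: average per-label scores from ensemble members, then
# threshold to produce a binary label vector.
def combine_multilabel(score_lists, threshold=0.5):
    n_labels = len(score_lists[0])
    avg = [sum(s[j] for s in score_lists) / len(score_lists)
           for j in range(n_labels)]
    return [int(a >= threshold) for a in avg]

# Three hypothetical members scoring three labels for one instance.
member_scores = [[0.9, 0.2, 0.6], [0.7, 0.4, 0.3], [0.8, 0.1, 0.7]]
print(combine_multilabel(member_scores))  # -> [1, 0, 1]
```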
6.
This paper presents a system to predict the gender of individuals from offline handwriting samples. The technique relies on extracting a set of textural features from handwriting samples of male and female writers and training multiple classifiers to learn to discriminate between the two gender classes. The features include local binary patterns (LBP), histogram of oriented gradients (HOG), statistics computed from gray-level co-occurrence matrices (GLCM) and features extracted through segmentation-based fractal texture analysis (SFTA). For classification, we employ artificial neural networks (ANN), support vector machines (SVM), nearest neighbor classifiers (NN), decision trees (DT) and random forests (RF). Classifiers are then combined using bagging, voting and stacking techniques to enhance the overall system performance. The realized classification rates are significantly better than those of the state-of-the-art systems on this problem, validating the ideas put forward in this study.
7.
In this paper, we present a novel approach for classification named Probabilistic Semi-supervised Random Subspace Sparse Representation (P-RSSR). In many random-subspace-based methods, all features have the same probability of being selected to compose the random subspace. However, in the real world, and especially in images, some regions or features are important for classification and some are not. In the proposed P-RSSR, we first calculate the distribution probability over the image and use it to determine which features are selected to compose the random subspace. Then, we use Sparse Representation (SR) to construct graphs that characterize the distribution of samples in the random subspaces, and train classifiers under the framework of Manifold Regularization (MR) in these random subspaces. Finally, we fuse the results from all random subspaces and obtain the final classification through majority voting. Experimental results on face image datasets demonstrate the effectiveness of the proposed P-RSSR.
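The probability-weighted subspace step can be sketched as below; this is one plausible reading of the idea (drawing features with probability proportional to an importance score rather than uniformly), with illustrative scores, not the paper's actual sampling rule:

```python
# Hedged sketch: sample a feature subspace without replacement, where
# each feature's chance of selection is proportional to an importance score.
import random

def weighted_subspace(importance, k, seed=0):
    rng = random.Random(seed)
    features = list(range(len(importance)))
    chosen = []
    for _ in range(k):
        pick = rng.choices(features,
                           weights=[importance[f] for f in features])[0]
        features.remove(pick)  # without replacement
        chosen.append(pick)
    return sorted(chosen)

# Illustrative importance scores for four features.
sub = weighted_subspace([0.1, 0.6, 0.1, 0.2], k=2)
print(sub)
```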
8.
Decoding perceptual or cognitive states based on brain activity measured using functional magnetic resonance imaging (fMRI) can be achieved using machine learning algorithms to train classifiers of specific stimuli. However, the high dimensionality and intrinsically low signal-to-noise ratio (SNR) of fMRI data pose great challenges to such techniques. The problem is aggravated in the case of multiple-subject experiments because of the high inter-subject variability in brain function. To address these difficulties, the majority of current approaches use a single classifier. Since, in many cases, different stimuli activate different brain areas, it makes sense to use a set of classifiers, each specialized in a different stimulus. Therefore, in this paper we propose using an ensemble of classifiers for decoding fMRI data. Each classifier in the ensemble has a favorite class or stimulus and uses an optimized feature set for that particular stimulus. The output for each individual stimulus is therefore obtained from the corresponding classifier and the final classification is achieved by simply selecting the best score. The method was applied to three empirical fMRI datasets from multiple subjects performing visual tasks with four classes of stimuli. Ensembles of GNB and k-NN base classifiers were tested. The ensemble of classifiers systematically outperformed a single classifier for the two most challenging datasets. In the remaining dataset, a ceiling effect was observed which probably precluded a clear distinction between the two classification approaches. Our results may be explained by the fact that different visual stimuli elicit specific patterns of brain activation, and indicate that an ensemble of classifiers provides an advantageous alternative to commonly used single classifiers, particularly when decoding stimuli associated with specific brain areas.
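The best-score decision rule described above reduces to an arg-max over the per-stimulus classifier scores; the class names and scores in this sketch are invented:

```python
# Minimal sketch of the one-classifier-per-stimulus decision: each
# specialised classifier scores its own stimulus; pick the highest.
def decode(scores_by_class):
    """Return the stimulus whose specialised classifier scored highest."""
    return max(scores_by_class, key=scores_by_class.get)

# Hypothetical scores from four stimulus-specialised classifiers.
scores = {"faces": 0.72, "houses": 0.41, "tools": 0.55, "words": 0.30}
print(decode(scores))
```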
9.
Instance selection aims at filtering out noisy data (or outliers) from a given training set, which not only reduces the need for storage space, but can also ensure that the classifier trained on the reduced set provides similar or better performance than the baseline classifier trained on the original set. However, since there are numerous instance selection algorithms, there is no concrete winner that is best across the various problem-domain datasets. In other words, instance selection performance is algorithm and dataset dependent. One main reason is that it is very hard to define what the outliers are over different datasets. It should be noted that, using a specific instance selection algorithm, over-selection may occur by filtering out too many ‘good’ data samples, which leads to the classifier performing worse than the baseline. In this paper, we introduce a dual classification (DuC) approach, which aims to deal with this potential drawback of over-selection. Specifically, after performing instance selection over a given training set, two classifiers are trained on the ‘good’ and ‘noisy’ sets, respectively, identified by the instance selection algorithm. Then, the similarities between a test sample and the data in the good and noisy sets are compared, and this comparison guides the input of the test sample to one of the two classifiers. The experiments are conducted using 50 small-scale and 4 large-scale datasets, and the results demonstrate the superior performance of the proposed DuC approach over the baseline instance selection approach.
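The routing step of this kind of dual scheme can be sketched with a nearest-neighbour similarity, as below; the distance choice and data are illustrative assumptions, not necessarily what DuC uses:

```python
# Hedged sketch: route a test sample to the 'good'-set or 'noisy'-set
# classifier according to which set holds its nearest neighbour.
def nearest_set(x, good, noisy):
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    d_good = min(dist(x, g) for g in good)
    d_noisy = min(dist(x, n) for n in noisy)
    return "good" if d_good <= d_noisy else "noisy"

# Synthetic 2-D points standing in for the two identified sets.
good_set = [(0.0, 0.0), (1.0, 1.0)]
noisy_set = [(5.0, 5.0)]
print(nearest_set((0.2, 0.1), good_set, noisy_set))
```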
10.
Pattern classification methods are a crucial direction in the current study of brain–computer interface (BCI) technology. A simple yet effective ensemble approach for electroencephalogram (EEG) signal classification, named the random electrode selection ensemble (RESE), is developed, which aims to surmount the instability of Fisher discriminant feature extraction for BCI applications. Through the random selection of recording electrodes corresponding to the physiological background of user-intended mental activities, multiple individual classifiers are constructed. In a feature subspace determined by a pair of randomly selected electrodes, principal component analysis (PCA) is first used to carry out dimensionality reduction. Fisher discriminant analysis is then adopted for feature extraction, and a Bayesian classifier with a Gaussian mixture model (GMM) approximating the feature distribution is trained. For a test sample, the outputs from all the Bayesian classifiers are combined to give the final prediction for its label. Theoretical analysis and classification experiments with real EEG signals indicate that the RESE approach is both effective and efficient.
11.
This paper presents a random boosting ensemble (RBE) classifier for remote sensing image classification, which introduces random projection feature selection and bootstrap methods to obtain base classifiers for the ensemble. The RBE method is built on an improved boosting framework that is quite efficient for the few-shot problem thanks to the bootstrap in use. In RBE, the kernel extreme learning machine (KELM) is applied to design base classifiers, which makes RBE efficient due to feature reduction. The experimental results on remote scene image classification demonstrate that RBE can effectively improve classification performance, resulting in better generalization ability on the 21-class land-use dataset and the Indian Pines satellite scene dataset.
12.
Graph structure is vital to graph-based semi-supervised learning. However, the problem of constructing a graph that reflects the underlying data distribution has seldom been investigated in semi-supervised learning, especially for high-dimensional data. In this paper, we focus on graph construction for semi-supervised learning and propose a novel method called Semi-Supervised Classification based on Random Subspace Dimensionality Reduction (SSC-RSDR for short). Different from traditional methods that perform graph-based dimensionality reduction and classification in the original space, SSC-RSDR performs these tasks in subspaces. More specifically, SSC-RSDR generates several random subspaces of the original space and applies graph-based semi-supervised dimensionality reduction in these random subspaces. It then constructs graphs in these processed random subspaces and trains semi-supervised classifiers on the graphs. Finally, it combines the resulting base classifiers into an ensemble classifier. Experimental results on face recognition tasks demonstrate that SSC-RSDR not only has superior recognition performance with respect to competitive methods, but is also robust against a wide range of input parameter values.
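The random-subspace generation and final combination steps common to methods like this can be sketched as follows (the sizes, votes, and trivial base learners are illustrative only):

```python
# Hedged sketch of random-subspace ensembling: draw several random
# feature subsets, give each to a base learner, merge votes.
import random
from collections import Counter

def random_subspaces(n_features, k, n_subspaces, seed=0):
    """Draw n_subspaces random k-feature subsets of {0..n_features-1}."""
    rng = random.Random(seed)
    return [rng.sample(range(n_features), k) for _ in range(n_subspaces)]

subspaces = random_subspaces(n_features=10, k=3, n_subspaces=5)
assert all(len(s) == 3 for s in subspaces)

# Each subspace would feed one base classifier; their (hypothetical)
# votes for a test sample are merged by majority:
votes = [1, 0, 1, 1, 0]
print(Counter(votes).most_common(1)[0][0])
```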
13.
Various measures, such as Margin and Bias/Variance, have been proposed with the aim of gaining a better understanding of why Multiple Classifier Systems (MCS) perform as well as they do. While these measures provide different perspectives for MCS analysis, it is not clear how to use them for MCS design. In this paper, a different measure based on a spectral representation is proposed for two-class problems. It incorporates terms representing the positive and negative correlation of pairs of training patterns with respect to class labels. Experiments employing MLP base classifiers, in which parameters are fixed but systematically varied, demonstrate the sensitivity of the proposed measure to base classifier complexity.
14.
Along with the increase of data and information, incremental learning ability is becoming more and more important for machine learning approaches. Online algorithms, as opposed to classic batch learning algorithms, try not to retain irrelevant information rather than synthesizing all available information. Today, combining classifiers is seen as a promising avenue for improving classification accuracy. However, most ensemble algorithms operate in batch mode. For this reason, we propose an incremental ensemble that combines, via the voting methodology, five classifiers that can operate incrementally: Naive Bayes, Averaged One-Dependence Estimators (AODE), 3-Nearest Neighbors, Non-Nested Generalised Exemplars (NNGE) and KStar. We performed a large-scale comparison of the proposed ensemble with other state-of-the-art algorithms on several datasets, and the proposed method produces better accuracy in most cases.
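The defining property of such ensemble members is a per-example update step; a minimal sketch of that interface (with a deliberately trivial base learner, not any of the five listed algorithms) looks like this:

```python
# Hedged sketch of an incremental learner interface: the model can be
# updated one example at a time, as an incremental ensemble requires.
from collections import Counter

class MajorityClassLearner:
    """Toy base learner: predicts the most frequent class seen so far."""
    def __init__(self):
        self.counts = Counter()

    def update(self, x, y):
        # Incorporate a single labelled example without revisiting old data.
        self.counts[y] += 1

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

m = MajorityClassLearner()
for x, y in [(1, "a"), (2, "a"), (3, "b")]:
    m.update(x, y)
print(m.predict(99))  # -> "a"
```

An incremental ensemble would keep several such members and merge their `predict` outputs by voting, updating every member as each new example arrives.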
15.
Linear discriminant analysis (LDA) often suffers from the small sample size problem when dealing with high-dimensional face data. Random subspace can effectively solve this problem by random sampling on face features. However, it remains a problem how to construct an optimal random subspace for discriminant analysis and perform the most efficient discriminant analysis on the constructed random subspace. In this paper, we propose a novel framework, random discriminant analysis (RDA), to handle this problem. Under the most suitable situation of the principal subspace, the optimal reduced dimension of the face sample is discovered to construct a random subspace where all the discriminative information in the face space is distributed in the two principal subspaces of the within-class and between-class matrices. Then we apply Fisherface and direct LDA, respectively, to the two principal subspaces for simultaneous discriminant analysis. The two sets of discriminant analysis features from the dual principal subspaces are first combined at the feature level, and then all the random subspaces are further integrated at the decision level. With the discriminating information fused at the two levels, our method can take full advantage of the useful discriminant information in the face space. Extensive experiments on different face databases demonstrate its effectiveness.
16.
In this paper, a measure of competence based on random classification (MCR) for classifier ensembles is presented. The measure dynamically selects (i.e., for each test example) a subset of classifiers from the ensemble that perform better than a random classifier. Therefore, weak (incompetent) classifiers that would adversely affect the performance of the classification system are eliminated. When all classifiers in the ensemble are evaluated as incompetent, the classification accuracy of the system can be increased by using the random classifier instead. Theoretical justification for using the measure with the majority voting rule is given. Two MCR-based systems were developed and their performance was compared against six multiple classifier systems using data sets taken from the UCI Machine Learning Repository and the Ludmila Kuncheva Collection. The systems developed typically had the highest classification accuracies regardless of the ensemble type used (homogeneous or heterogeneous).
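The competence filter can be sketched as below; comparing each classifier's local accuracy to the 1/n_classes accuracy of a random classifier is an illustrative simplification of the MCR measure, and the numbers are made up:

```python
# Hedged sketch: keep only classifiers whose (locally estimated)
# accuracy beats a random classifier; fall back to random guessing
# when none qualify.
def competent(local_accuracies, n_classes):
    baseline = 1.0 / n_classes  # expected accuracy of a random classifier
    keep = [i for i, acc in enumerate(local_accuracies) if acc > baseline]
    return keep if keep else ["random-classifier"]

# Three ensemble members evaluated near one test example (2 classes).
print(competent([0.9, 0.4, 0.6], n_classes=2))  # -> [0, 2]
```

The surviving members would then vote by majority, as the paper's theoretical analysis assumes.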
17.
Imbalanced streaming data is commonly encountered in real-world data mining and machine learning applications, and has attracted much attention in recent years. In practice, imbalanced data and streaming data are normally encountered together; however, little research has addressed the two types of data jointly. In this paper, we propose a multi-window based ensemble learning method for the classification of imbalanced streaming data. Three types of windows are defined to store the current batch of instances, the latest minority instances, and the ensemble classifier. The ensemble classifier consists of a set of the latest sub-classifiers together with the instances employed to train each sub-classifier. All sub-classifiers are weighted prior to predicting the class labels of newly arriving instances, and new sub-classifiers are trained only when precision falls below a predefined threshold. Extensive experiments on synthetic and real-world datasets demonstrate that the new approach can efficiently and effectively classify imbalanced streaming data, and generally outperforms existing approaches.
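The weighted-voting step over sub-classifiers can be sketched as follows; the labels and weights are illustrative, not values from the paper:

```python
# Hedged sketch: each sub-classifier's vote counts with its weight;
# the label with the largest total weight wins.
def weighted_vote(votes, weights):
    tally = {}
    for label, w in zip(votes, weights):
        tally[label] = tally.get(label, 0.0) + w
    return max(tally, key=tally.get)

# Hypothetical sub-classifier votes and weights for one new instance.
print(weighted_vote(["minority", "majority", "minority"], [0.5, 0.9, 0.6]))
# "minority": 0.5 + 0.6 = 1.1 outweighs "majority": 0.9
```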
18.
In kernel-based nonlinear subspace (KNS) methods, the subspace dimensions have a strong influence on the performance of the subspace classifier. In order to get a high classification accuracy, a large dimension is generally required. However, if the chosen subspace dimension is too large, it leads to low performance due to the overlapping of the resultant subspaces and, if it is too small, it increases the classification error due to the poor resulting approximation. The most common approach is of an ad hoc nature, selecting the dimensions based on the so-called cumulative proportion computed from the kernel matrix for each class. We propose a new method of systematically and efficiently selecting optimal or near-optimal subspace dimensions for KNS classifiers using a search strategy and a heuristic function termed the overlapping criterion. The rationale for this function is motivated in the body of the paper. The task of selecting optimal subspace dimensions is reduced to finding the best ones from a given problem-domain solution space using this criterion as a heuristic function. Thus, the search space can be pruned to find the best solution very efficiently. Our experimental results demonstrate that the proposed mechanism selects the dimensions efficiently without sacrificing the classification accuracy.
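The ad hoc cumulative-proportion baseline that this paper improves on can be sketched as below; the eigenvalues and threshold are illustrative assumptions:

```python
# Hedged sketch of dimension selection by cumulative proportion:
# take the smallest dimension whose leading eigenvalues of the
# (per-class) kernel matrix account for a target fraction of the total.
def dim_by_cumulative_proportion(eigenvalues, threshold=0.9):
    total = sum(eigenvalues)
    running = 0.0
    for d, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += ev
        if running / total >= threshold:
            return d
    return len(eigenvalues)

# Illustrative spectrum: 3 components cover 95% of the total.
print(dim_by_cumulative_proportion([5.0, 3.0, 1.5, 0.4, 0.1]))  # -> 3
```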
19.
In this paper, a novel spectral-spatial hyperspectral image classification method is proposed by designing a hierarchical subspace switch ensemble learning algorithm. First, the hyperspectral images are processed by fast bilateral filtering to obtain the spatial features. The spectral and spatial features are combined to form the initial feature set. Second, hierarchical instance learning based on an iterative means clustering method is designed to obtain the hierarchical instance space. Third, the random subspace method (RSM) is used to sample the features and samples, forming multiple sub-sample sets. After that, semi-supervised learning (S2L) is applied to choose test samples for improving classification performance without touching the class labels. Then, micro noise linear dimension reduction (mNLDR) is used for dimension reduction. Afterwards, an ensemble multiple-kernel SVM (EMK_SVM) is used for stable classification results. Finally, the final classification results are obtained by combining the classification results with a voting strategy. Experimental results on real hyperspectral scenes demonstrate that the proposed method can effectively improve classification performance.
20.
Pattern Analysis and Applications - Classification is one of the most important topics in machine learning. However, most of these works focus on the two-class classification (i.e., classification...