20 similar documents were retrieved.
1.
In many important application domains such as text categorization, biomolecular analysis, scene classification and medical diagnosis, examples are naturally associated with more than one class label, giving rise to multi-label classification problems. This fact has led, in recent years, to a substantial amount of research on feature selection methods that allow the identification of relevant and informative features for multi-label classification. However, the methods proposed for this task are scattered in the literature, with no common framework to describe them and to allow an objective comparison. Here, we revisit a categorization of existing multi-label classification methods and, as our main contribution, we provide a comprehensive survey and novel categorization of the feature selection techniques that have been created for the multi-label classification setting. We conclude this work with concrete suggestions for future research in multi-label feature selection which have been derived from our categorization and analysis.
2.
This paper proposes a novel feature selection method that combines a self-representation loss function, a graph regularization term and an \({l_{2,1}}\)-norm regularization term. Unlike the traditional least square loss function, which focuses on minimizing the regression error between the class labels and their corresponding predictions, the proposed self-representation loss function represents each feature as a linear combination of its relevant features, aiming to effectively select representative features and to ensure robustness to outliers. The graph regularization term encodes two kinds of inherent information, i.e., the relationship between samples (the sample–sample relation for short) and the relationship between features (the feature–feature relation for short). The feature–feature relation reflects the similarity between two features, while the sample–sample relation reflects the similarity between two samples; both relations are preserved in the coefficient matrix. The \({l_{2,1}}\)-norm regularization term is used to conduct feature selection, aiming to select the features that satisfy the characteristics mentioned above. Furthermore, we put forward a new optimization method to solve our objective function. Finally, we feed the reduced data into a support vector machine (SVM) to conduct classification on real datasets. The experimental results show that the proposed method outperforms state-of-the-art methods such as k-nearest neighbors, ridge regression and SVM.
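A plausible form of the resulting objective, written here as a hedged reconstruction since the abstract does not state the exact formulation; the trade-off weights \(\alpha, \beta, \gamma\) and the graph Laplacians \(L_f, L_s\) of the feature and sample graphs are assumptions:

```latex
\min_{W}\;
\underbrace{\lVert X - XW \rVert_F^2}_{\text{self-representation loss}}
+ \alpha\,\underbrace{\operatorname{tr}\!\left(W^{\top} L_f W\right)}_{\text{feature--feature relation}}
+ \beta\,\underbrace{\operatorname{tr}\!\left((XW)^{\top} L_s\, XW\right)}_{\text{sample--sample relation}}
+ \gamma\,\lVert W \rVert_{2,1}
```

Here \(X\) is the data matrix and \(W\) the coefficient matrix; under the \({l_{2,1}}\) penalty, rows of \(W\) with large norm would indicate the selected features.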
3.
In multi-label learning, the training set is made up of instances, each associated with a set of labels, and the task is to predict the label sets of unseen instances. In this paper, this learning problem is addressed by a method called MLNB, which adapts traditional naive Bayes classifiers to deal with multi-label instances. Feature selection mechanisms are incorporated into MLNB to improve its performance, as sketched below. First, feature extraction techniques based on principal component analysis are applied to remove irrelevant and redundant features. After that, feature subset selection techniques based on genetic algorithms are used to choose the most appropriate subset of features for prediction. Experiments on synthetic and real-world data show that MLNB achieves comparable performance to other well-established multi-label learning algorithms.
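A minimal sketch of that two-stage pipeline, under stated assumptions: PCA and a deliberately simplified genetic-style search stand in for the paper's exact operators, and one Gaussian naive Bayes classifier per label approximates MLNB's internal multi-label adaptation. All names and hyperparameters are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

def evaluate(mask, X_tr, Y_tr, X_va, Y_va):
    """Fitness of a feature subset: mean label-wise accuracy of per-label NB."""
    if not mask.any():
        return 0.0
    scores = [GaussianNB().fit(X_tr[:, mask], Y_tr[:, j]).score(X_va[:, mask], Y_va[:, j])
              for j in range(Y_tr.shape[1])]
    return float(np.mean(scores))

def mlnb_like_selection(X, Y, n_components=20, pop_size=20, generations=15, seed=0):
    rng = np.random.default_rng(seed)
    Z = PCA(n_components=n_components).fit_transform(X)          # stage 1: PCA
    X_tr, X_va, Y_tr, Y_va = train_test_split(Z, Y, test_size=0.3, random_state=seed)
    population = rng.random((pop_size, n_components)) < 0.5      # stage 2: GA over masks
    half = n_components // 2
    for _ in range(generations):
        fitness = np.array([evaluate(m, X_tr, Y_tr, X_va, Y_va) for m in population])
        parents = population[np.argsort(fitness)[-pop_size // 2:]]   # keep fittest half
        children = parents.copy()
        mates = rng.integers(0, len(parents), len(parents))
        children[:, half:] = parents[mates, half:]                   # one-point crossover
        children ^= rng.random(children.shape) < 0.05                # bit-flip mutation
        population = np.vstack([parents, children])
    fitness = np.array([evaluate(m, X_tr, Y_tr, X_va, Y_va) for m in population])
    return population[fitness.argmax()]                              # best feature mask
```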
6.
Previous spectral feature selection methods generate the similarity graph while ignoring both the negative effect of noise and redundancy in the original feature space and the association between graph matrix learning and feature selection, and thus easily produce suboptimal results. To address these issues, this paper couples graph learning and feature selection in a single framework to obtain optimal selection performance. More specifically, we use a least square loss function and an \({l_{2,1}}\)-norm regularization to remove the effect of noisy and redundant features, and use the resulting local correlations among the features to dynamically learn a graph matrix from a low-dimensional space of the original data. Experimental results on real data sets show that our method outperforms state-of-the-art feature selection methods for classification tasks.
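A hedged reconstruction of what such a joint objective might look like; the abstract does not give the formulation, and the learned graph \(S\), its simplex constraints and the weights \(\alpha, \beta\) are assumptions:

```latex
\min_{W,\,S}\;
\lVert Y - XW \rVert_F^2
+ \alpha\,\lVert W \rVert_{2,1}
+ \beta \sum_{i,j} s_{ij} \left\lVert W^{\top} x_i - W^{\top} x_j \right\rVert_2^2
\qquad \text{s.t.}\;\; S\mathbf{1} = \mathbf{1},\; s_{ij} \ge 0
```

Alternating between updating \(W\) (feature selection) and \(S\) (graph learning in the projected low-dimensional space) would realize the dynamic coupling the abstract describes.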
7.
In multi-label classification, examples can be associated with multiple labels simultaneously. The task of learning from multi-label data can be addressed by methods that transform the multi-label classification problem into several single-label classification problems. The binary relevance approach is one of these methods: the multi-label learning task is decomposed into several independent binary classification problems, one for each label in the set of labels, and the final labels for each example are determined by aggregating the predictions of all binary classifiers. However, this approach fails to consider any dependency among the labels. Aiming to accurately predict label combinations, in this paper we propose a simple approach that enables the binary classifiers to discover existing label dependencies by themselves. An experimental study using decision trees, a kernel method and naïve Bayes as base learning techniques shows the potential of the proposed approach to improve multi-label classification performance.
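One common way to let binary classifiers discover label dependencies by themselves is to stack a second round of binary relevance whose inputs are augmented with the first round's predictions of the other labels. The sketch below illustrates that idea; it is an assumption-laden stand-in, not necessarily the paper's exact scheme.

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

class DependentBinaryRelevance:
    """Two-round binary relevance: round 1 is standard BR; in round 2 each
    label's classifier also sees the round-1 predictions of the *other*
    labels, letting it pick up label dependencies on its own."""

    def __init__(self, base=DecisionTreeClassifier()):
        self.base = base

    def fit(self, X, Y):
        n_labels = Y.shape[1]
        self.round1 = [clone(self.base).fit(X, Y[:, j]) for j in range(n_labels)]
        P = np.column_stack([c.predict(X) for c in self.round1])
        self.round2 = []
        for j in range(n_labels):
            others = np.delete(P, j, axis=1)       # predictions of the other labels
            self.round2.append(clone(self.base).fit(np.hstack([X, others]), Y[:, j]))
        return self

    def predict(self, X):
        P = np.column_stack([c.predict(X) for c in self.round1])
        out = [clf.predict(np.hstack([X, np.delete(P, j, axis=1)]))
               for j, clf in enumerate(self.round2)]
        return np.column_stack(out)
```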
8.
Applied Intelligence - Multi-label learning has been widely applied in machine learning and data mining. The purpose of feature selection is to select an approximately optimal feature subset to...
9.
Feature selection plays an important role in classifying systems such as neural networks (NNs). Attribute sets typically contain relevant, irrelevant and redundant attributes, and from the viewpoint of managing a dataset, which can be huge, reducing the number of attributes by selecting only the relevant ones is desirable. In doing so, higher performance with lower computational effort is expected. In this paper, we propose two feature selection algorithms. The limitation of the mutual information feature selector (MIFS) is analyzed and a method to overcome this limitation is studied. One of the proposed algorithms makes more considered use of the mutual information between input attributes and output classes than MIFS does. We demonstrate that the proposed method can match the performance of the ideal greedy selection algorithm when information is distributed uniformly, while its computational load is nearly the same as that of MIFS. In addition, another feature selection algorithm using the Taguchi method is proposed, addressing the question of how to identify good features with as few experiments as possible. The proposed algorithms are applied to several classification problems and compared with MIFS. The two algorithms can be combined to complement each other's limitations; the combined algorithm performed well in several experiments and should prove to be a useful method for selecting features for classification problems.
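For reference, the greedy MIFS criterion that these algorithms build on selects, at each step, the feature maximizing relevance to the class minus β times the accumulated redundancy with the already-selected features. A minimal sketch, with the MI estimators and the histogram discretization as illustrative assumptions:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def mifs(X, y, n_select, beta=0.5, bins=10):
    """Greedy MIFS-style selection: argmax_f  I(f; C) - beta * sum_s I(f; s)."""
    n_features = X.shape[1]
    relevance = mutual_info_classif(X, y)          # I(f; C) per feature
    # discretise features so pairwise I(f; s) can be estimated with histograms
    Xd = np.column_stack([np.digitize(X[:, j], np.histogram_bin_edges(X[:, j], bins))
                          for j in range(n_features)])
    selected, remaining = [], set(range(n_features))
    redundancy = np.zeros(n_features)              # running sum of I(f; s)
    for _ in range(n_select):
        score = {f: relevance[f] - beta * redundancy[f] for f in remaining}
        best = max(score, key=score.get)
        selected.append(best)
        remaining.remove(best)
        for f in remaining:                        # update redundancy sums
            redundancy[f] += mutual_info_score(Xd[:, f], Xd[:, best])
    return selected
```

The variant studied in this paper rescales the redundancy update using class information; that refinement is omitted here for brevity.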
10.
Recent developments in texture classification have shown that the proper integration of texture methods from different families leads to significant improvements in classification rate compared with the use of a single family of texture methods. In order to reduce the computational burden of that integration process, a selection stage is necessary. In general, a large number of feature selection techniques have been proposed; however, a specific texture feature selection must typically be applied to each particular set of texture patterns to be classified. This paper describes a new texture feature selection algorithm that is independent of specific classification problems/applications and thus must only be run once for a given set of available texture methods. The proposed application-independent selection scheme has been evaluated and compared to previous proposals on both Brodatz compositions and complex real images.
11.
In this paper, we present the MIFS-C variant of the mutual information feature-selection algorithms. We present an algorithm to find the optimal value of the redundancy parameter, which is a key parameter in MIFS-type algorithms. Furthermore, we present an algorithm that speeds up the execution time of all the MIFS variants. Overall, the presented MIFS-C has comparable classification accuracy (in some cases even better) compared with other MIFS algorithms, while its running time is faster. We compared this feature selector with other feature selectors and found that it performs better in most cases. MIFS-C performed especially well on the breakeven and F-measure, because the algorithm can be tuned to optimise these evaluation measures.
Jan Bakus received the B.A.Sc. and M.A.Sc. degrees in electrical engineering from the University of Waterloo, Waterloo, ON, Canada, in 1996 and 1998, respectively, and the Ph.D. degree in systems design engineering in 2005. He is currently working at Maplesoft, Waterloo, ON, Canada as an applications engineer, where he is responsible for the development of application-specific toolboxes for the Maple scientific computing software. His research interests are in the areas of feature selection for text classification, text classification, text clustering, and information retrieval. He is the recipient of the Carl Pollock Fellowship award from the University of Waterloo and the Datatel Scholars Foundation scholarship from Datatel.
Mohamed S. Kamel holds a Ph.D. in computer science from the University of Toronto, Canada. He is at present Professor and Director of the Pattern Analysis and Machine Intelligence Laboratory in the Department of Electrical and Computing Engineering, University of Waterloo, Canada. Professor Kamel holds a Canada Research Chair in Cooperative Intelligent Systems. Dr. Kamel's research interests are in machine intelligence, neural networks and pattern recognition with applications in robotics and manufacturing. He has authored and coauthored over 200 papers in journals and conference proceedings, 2 patents and numerous technical and industrial project reports. Under his supervision, 53 Ph.D. and M.A.Sc. students have completed their degrees. Dr. Kamel is a member of ACM, AAAI, CIPS and APEO and has been named a Fellow of the IEEE (2005). He is the Editor-in-Chief of the International Journal of Robotics and Automation, Associate Editor of IEEE SMC, Part A, the International Journal of Image and Graphics, and Pattern Recognition Letters, and is a member of the editorial board of Intelligent Automation and Soft Computing. He has served as a consultant to many companies, including NCR, IBM, Nortel, VRP and CSA. He is a member of the board of directors and cofounder of Virtek Vision International in Waterloo.
12.
The classification of functional or high-dimensional data requires selecting a reduced subset of features among the initial set, both to help fight the curse of dimensionality and to help interpret the problem and the model. The mutual information criterion may be used in that context, but it suffers from the difficulty of its estimation through a finite set of samples. Efficient estimators are not designed specifically to be applied in a classification context, and thus suffer from further drawbacks and difficulties. This paper presents an estimator of mutual information that is specifically designed for classification tasks, including multi-class ones. It is combined with a recently published stopping criterion in a traditional forward feature selection procedure. Experiments on both traditional benchmarks and on an industrial functional classification problem show the added value of this estimator.
13.
A graph-based approach to document classification is described in this paper. The graph representation offers the advantage that it allows for a much more expressive document encoding than the more standard bag-of-words/phrases approach, and consequently gives improved classification accuracy. Document sets are represented as graph sets to which a weighted graph mining algorithm is applied to extract frequent subgraphs, which are then further processed to produce feature vectors (one per document) for classification. Weighted subgraph mining is used to ensure classification effectiveness and computational efficiency; only the most significant subgraphs are extracted. The approach is validated and evaluated using several popular classification algorithms together with a real-world textual data set. The results demonstrate that the approach can outperform existing text classification algorithms on some datasets; when the size of the dataset increases, further processing of the extracted frequent features becomes essential.
14.
Neural Computing and Applications - The multi-label classification problem involves finding a multi-valued decision function that maps an instance to a vector of binary classes. Two methods are...
15.
The widely known binary relevance method for multi-label classification, which considers each label as an independent binary problem, has often been overlooked in the literature due to the perceived inadequacy of not directly modelling label correlations. Most current methods invest considerable complexity to model interdependencies between labels. This paper shows that binary relevance-based methods have much to offer, and that high predictive performance can be obtained without impeding scalability to large datasets. We exemplify this with a novel classifier chains method that can model label correlations while maintaining acceptable computational complexity. We extend this approach further in an ensemble framework. An extensive empirical evaluation covers a broad range of multi-label datasets with a variety of evaluation metrics. The results illustrate the competitiveness of the chaining method against related and state-of-the-art methods, both in terms of predictive performance and time complexity.
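The chaining idea described here is available in scikit-learn; a minimal usage sketch on synthetic data follows (the dataset and base classifier are placeholders, not those from the paper's evaluation):

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

X, Y = make_multilabel_classification(n_samples=500, n_labels=3, random_state=0)

# Each classifier in the chain sees the original features plus the
# predictions for all earlier labels in the (here random) chain order.
chain = ClassifierChain(LogisticRegression(max_iter=1000), order='random', random_state=0)
chain.fit(X, Y)
print(chain.predict(X[:5]))
```

Averaging the votes of several chains trained with different orders mirrors the ensemble extension described in the abstract.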
16.
Ensemble methods have been shown to be an effective tool for solving multi-label classification tasks. In the RAndom k-labELsets (RAkEL) algorithm, each member of the ensemble is associated with a small randomly selected subset of k labels, and a single-label classifier is trained on each combination of elements in the subset. In this paper we adopt a similar approach; however, instead of randomly choosing subsets, we select the minimum required subsets of k labels that cover all labels and meet additional constraints, such as coverage of inter-label correlations. Construction of the cover is achieved by formulating the subset selection as a minimum set covering problem (SCP) and solving it with approximation algorithms. Each cover needs to be prepared only once, by offline algorithms; once prepared, a cover may be applied to the classification of any multi-label dataset whose properties conform with those of the cover. The contribution of this paper is two-fold. First, we introduce SCP as a general framework for constructing label covers while allowing the user to incorporate cover construction constraints. We demonstrate the effectiveness of this framework by proposing two construction constraints whose enforcement produces covers that improve the prediction performance of random selection by achieving better coverage of labels and inter-label correlations. Second, we provide theoretical bounds that quantify the probability of random selection producing covers that meet the proposed construction criteria. The experimental results indicate that the proposed methods improve multi-label classification accuracy and stability compared with the RAkEL algorithm and other state-of-the-art algorithms.
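A minimal sketch of the standard greedy approximation for set covering, applied to k-labelsets; the paper's additional constraints (e.g. inter-label correlation coverage) would enter through the gain function, which here counts newly covered labels only:

```python
from itertools import combinations

def greedy_label_cover(n_labels, k):
    """Greedily pick k-labelsets until every label is covered at least once."""
    universe = set(range(n_labels))
    candidates = [frozenset(c) for c in combinations(range(n_labels), k)]
    cover, covered = [], set()
    while covered != universe:
        # standard greedy SCP step: pick the subset covering most new labels
        best = max(candidates, key=lambda s: len(s - covered))
        cover.append(best)
        covered |= best
    return cover

print(greedy_label_cover(6, 3))   # e.g. [frozenset({0, 1, 2}), frozenset({3, 4, 5})]
```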
17.
Feature selection has received much attention from researchers due to its ability to overcome the curse of dimensionality, reduce computational costs, increase the performance of the subsequent classification algorithm and output results with better interpretability. To remove redundant and noisy features from the original feature set, we define a local density and a discriminant distance for each feature vector: the local density measures the representative ability of each feature vector, and the discriminant distance measures the redundancy and similarity between features. Based on these two quantities, the decision graph score is proposed as the evaluation criterion for unsupervised feature selection. The method is intuitive and simple, and its performance is evaluated in data classification experiments. Statistical tests on the averaged classification accuracies over 16 real-life datasets show that the proposed method obtains better or comparable discriminant feature selection ability in 98% of the cases, compared with state-of-the-art methods.
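The density/distance construction is reminiscent of density-peaks clustering; a hedged sketch of such a decision-graph score over feature vectors follows. The Gaussian kernel, the cutoff heuristic and the product ranking are assumptions for illustration, not the paper's exact definitions.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def decision_graph_scores(X, dc=None):
    """Rank features (columns of X) by local density times distance to the
    nearest denser feature; high scores suggest representative, non-redundant
    features, low scores suggest redundant or noisy ones."""
    D = squareform(pdist(X.T))                     # pairwise distances between features
    dc = dc or np.percentile(D, 2)                 # cutoff distance (common heuristic)
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1   # local density of each feature
    delta = np.empty(len(rho))
    for i in range(len(rho)):
        denser = D[i, rho > rho[i]]                # distances to denser features
        delta[i] = denser.min() if denser.size else D[i].max()
    return rho * delta                             # decision graph score; higher = keep
```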
18.
Feature selection is an important preprocessing step for building efficient, generalizable and interpretable classifiers on high-dimensional data sets. Under the assumption of sufficient labelled samples, the Markov blanket provides a complete and sound solution to the selection of optimal features by exploring the conditional independence relationships among the features. In real-world applications, unfortunately, it is usually easy to get unlabelled samples but expensive to obtain the corresponding accurate labels, which leads to a potential waste of the valuable classification information buried in unlabelled samples. In this paper, we propose a new BAyesian Semi-SUpervised Method, or BASSUM in short, to exploit the value of unlabelled samples in the classification feature selection problem. Generally speaking, the inclusion of unlabelled samples helps the feature selection algorithm by (1) pinpointing more specific conditional independence tests involving fewer variable features and (2) improving the robustness of individual conditional independence tests with additional statistical information. Our experimental results show that BASSUM enhances the efficiency of traditional feature selection methods and overcomes the difficulties with redundant features in existing semi-supervised solutions.
19.
Pixel-based texture classifiers and segmenters are typically based on the combination of texture feature extraction methods that belong to a single family (e.g., Gabor filters). However, combining texture methods from different families has proven to produce better classification results both quantitatively and qualitatively. Given a set of multiple texture feature extraction methods from different families, this paper presents a new texture feature selection scheme that automatically determines a reduced subset of methods whose integration produces classification results comparable to those obtained when all the available methods are integrated, but with a significantly lower computational cost. Experiments with both Brodatz and real outdoor images show that the proposed selection scheme is more advantageous than well-known general purpose feature selection algorithms applied to the same problem.
20.
This paper proposes a framework for selecting the Laplacian eigenvalues of 3D shapes that are most relevant for shape characterization and classification. We demonstrate the redundancy of the information coded by the shape spectrum and discuss the shape characterization capability of the selected eigenvalues. The feature selection methods used to demonstrate our claim are the AdaBoost algorithm and the Support Vector Machine. The efficacy of the selection is shown by comparing the results of the selected eigenvalues on shape characterization and classification with those obtained with the first k eigenvalues, varying k over the cardinality of the spectrum. Our experiments, performed on 3D objects represented either as triangle meshes or point clouds, show that working directly with point clouds provides classification results comparable to those obtained with surface-based representations. Finally, we discuss the stability of the computation of the Laplacian spectrum under matrix perturbations.