期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Feature subset selection in large dimensionality domains

Iffat A. Gheyas^{Author Vitae} Leslie S. Smith Author Vitae 《Pattern recognition》2010,43(1):5-13

Searching for an optimal feature subset from a high dimensional feature space is known to be an NP-complete problem. We present a hybrid algorithm, SAGA, for this task. SAGA combines the ability to avoid being trapped in a local minimum of simulated annealing with the very high rate of convergence of the crossover operator of genetic algorithms, the strong local search ability of greedy algorithms and the high computational efficiency of generalized regression neural networks. We compare the performance over time of SAGA and well-known algorithms on synthetic and real datasets. The results show that SAGA outperforms existing algorithms. 相似文献

2.

Feature subset selection and feature ranking for multivariate time series 总被引：4，自引：0，他引：4

Yoon H. Yang K. Shahabi C. 《Knowledge and Data Engineering, IEEE Transactions on》2005,17(9):1186-1198

Feature subset selection (FSS) is a known technique to preprocess the data before performing any data mining tasks, e.g., classification and clustering. FSS provides both cost-effective predictors and a better understanding of the underlying process that generated the data. We propose a family of novel unsupervised methods for feature subset selection from multivariate time series (MTS) based on common principal component analysis, termed CLeVer. Traditional FSS techniques, such as recursive feature elimination (RFE) and Fisher criterion (FC), have been applied to MTS data sets, e.g., brain computer interface (BCI) data sets. However, these techniques may lose the correlation information among features, while our proposed techniques utilize the properties of the principal component analysis to retain that information. In order to evaluate the effectiveness of our selected subset of features, we employ classification as the target data mining task. Our exhaustive experiments show that CLeVer outperforms RFE, FC, and random selection by up to a factor of two in terms of the classification accuracy, while taking up to 2 orders of magnitude less processing time than RFE and FC. 相似文献

3.

Feature ranking and best feature subset using mutual information

Shuang?Cang Email author Derek?Partridge 《Neural computing & applications》2004,13(3):175-184

A new algorithm for ranking the input features and obtaining the best feature subset is developed and illustrated in this paper. The asymptotic formula for mutual information and the expectation maximisation (EM) algorithm are used to developing the feature selection algorithm in this paper. We not only consider the dependence between the features and the class, but also measure the dependence among the features. Even for noisy data, this algorithm still works well. An empirical study is carried out in order to compare the proposed algorithm with the current existing algorithms. The proposed algorithm is illustrated by application to a variety of problems. 相似文献

4.

Feature dimensionality reduction for the verification of handwritten numerals 总被引：2，自引：2，他引：2

P. Zhang T. D. Bui C. Y. Suen 《Pattern Analysis & Applications》2004,7(3):296-307

A novel method based on multi-modal discriminant analysis is proposed to reduce feature dimensionality. First, each class is divided into several clusters by the k-means algorithm. The optimal discriminant analysis is implemented by multi-modal mapping. Our method utilizes only those training samples on and near the effective decision boundary to generate a between-class scatter matrix, which requires less CPU time than other nonparametric discriminant analysis (NDA) approaches [Fukunaga and Mantock in IEEE Trans PAMI 5(6):671–677, 1983; Bressan and Vitria in Pattern Recognit Lett 24(5):2473–2749, 2003]. In addition, no prior assumptions about class and cluster densities are needed. In order to achieve a high verification performance of confusing handwritten numeral pairs, a hybrid feature extraction scheme is developed, which consists of a set of gradient-based wavelet features and a set of geometric features. Our proposed dimensionality reduction algorithm is used to congregate features, and it outperforms the principal component analysis (PCA) and other NDA approaches. Experiments proved that our proposed method could achieve a high feature compression performance without sacrificing its discriminant ability for classification. As a result, this new method can reduce artificial neural network (ANN) training complexity and make the ANN classifier more reliable. 相似文献

5.

Foraging theory for dimensionality reduction of clustered data

Luis Felipe Giraldo Fernando Lozano Nicanor Quijano 《Machine Learning》2011,82(1):71-90

We present a bioinspired algorithm which performs dimensionality reduction on datasets for visual exploration, under the assumption that they have a clustered structure. We formulate a decision-making strategy based on foraging theory, where a software agent is viewed as an animal, a discrete space as the foraging landscape, and objects representing points from the dataset as nutrients or prey items. We apply this algorithm to artificial and real databases, and show how a multi-agent system addresses the problem of mapping high-dimensional data into a two-dimensional space. 相似文献

6.

A fast approach for dimensionality reduction with image data

Mnica Daniel 《Pattern recognition》2005,38(12):2400-2408

An important objective in image analysis is dimensionality reduction. The most often used data-exploratory technique with this objective is principal component analysis, which performs a singular value decomposition on a data matrix of vectorized images. When considering an array data or tensor instead of a matrix, the high-order generalization of PCA for computing principal components offers multiple ways to decompose tensors orthogonally. As an alternative, we propose a new method based on the projection of the images as matrices and show that it leads to a better reconstruction of images than previous approaches. 相似文献

7.

Feature reduction and selection for EMG signal classification

Angkoon Phinyomark Pornchai PhukpattaranontChusak Limsakul 《Expert systems with applications》2012,39(8):7420-7431

Feature extraction is a significant method to extract the useful information which is hidden in surface electromyography (EMG) signal and to remove the unwanted part and interferences. To be successful in classification of the EMG signal, selection of a feature vector ought to be carefully considered. However, numerous studies of the EMG signal classification have used a feature set that have contained a number of redundant features. In this study, most complete and up-to-date thirty-seven time domain and frequency domain features have been proposed to be studied their properties. The results, which were verified by scatter plot of features, statistical analysis and classifier, indicated that most time domain features are superfluity and redundancy. They can be grouped according to mathematical property and information into four main types: energy and complexity, frequency, prediction model, and time-dependence. On the other hand, all frequency domain features are calculated based on statistical parameters of EMG power spectral density. Its performance in class separability viewpoint is not suitable for EMG recognition system. Recommendation of features to avoid the usage of redundant features for classifier in EMG signal classification applications is also proposed in this study. 相似文献

8.

Semi-supervised dimensionality reduction for analyzing high-dimensional data with constraints 总被引：1，自引：0，他引：1

Su YanAuthor Vitae Sofien Bouaziz^{Author Vitae} 《Neurocomputing》2012,76(1):114-124

In this paper, we present a novel semi-supervised dimensionality reduction technique to address the problems of inefficient learning and costly computation in coping with high-dimensional data. Our method named the dual subspace projections (DSP) embeds high-dimensional data in an optimal low-dimensional space, which is learned with a few user-supplied constraints and the structure of input data. The method projects data into two different subspaces respectively the kernel space and the original input space. Each projection is designed to enforce one type of constraints and projections in the two subspaces interact with each other to satisfy constraints maximally and preserve the intrinsic data structure. Compared to existing techniques, our method has the following advantages: (1) it benefits from constraints even when only a few are available; (2) it is robust and free from overfitting; and (3) it handles nonlinearly separable data, but learns a linear data transformation. As a conclusion, our method can be easily generalized to new data points and is efficient in dealing with large datasets. An empirical study using real data validates our claims so that significant improvements in learning accuracy can be obtained after the DSP-based dimensionality reduction is applied to high-dimensional data. 相似文献

9.

Distance-preserving projection of high-dimensional data for nonlinear dimensionality reduction 总被引：3，自引：0，他引：3

Yang L 《IEEE transactions on pattern analysis and machine intelligence》2004,26(9):1243-1246

A distance-preserving method is presented to map high-dimensional data sequentially to low-dimensional space. It preserves exact distances of each data point to its nearest neighbor and to some other near neighbors. Intrinsic dimensionality of data is estimated by examining the preservation of interpoint distances. The method has no user-selectable parameter. It can successfully project data when the data points are spread among multiple clusters. Results of experiments show its usefulness in projecting high-dimensional data. 相似文献

10.

Visualizing dimensionality reduction of systems biology data

Andreas Lehrmann Michael Huber Aydin C. Polatkan Albert Pritzkau Kay Nieselt 《Data mining and knowledge discovery》2013,27(1):146-165

One of the challenges in analyzing high-dimensional expression data is the detection of important biological signals. A common approach is to apply a dimension reduction method, such as principal component analysis. Typically, after application of such a method the data is projected and visualized in the new coordinate system, using scatter plots or profile plots. These methods provide good results if the data have certain properties which become visible in the new coordinate system but which were hard to detect in the original coordinate system. Often however, the application of only one method does not suffice to capture all important signals. Therefore several methods addressing different aspects of the data need to be applied. We have developed a framework for linear and non-linear dimension reduction methods within our visual analytics pipeline SpRay. This includes measures that assist the interpretation of the factorization result. Different visualizations of these measures can be combined with functional annotations that support the interpretation of the results. We show an application to high-resolution time series microarray data in the antibiotic-producing organism Streptomyces coelicolor as well as to microarray data measuring expression of cells with normal karyotype and cells with trisomies of human chromosomes 13 and 21. 相似文献

11.

Greedy column subset selection for large-scale data sets

Ahmed K. Farahat Ahmed Elgohary Ali Ghodsi Mohamed S. Kamel 《Knowledge and Information Systems》2015,45(1):1-34

相似文献

12.

Feature subset selection applied to model-free gait recognition

Y. Dupuis X. Savatier P. Vasseur 《Image and vision computing》2013

In this paper, we tackle the problem of gait recognition based on the model-free approach. Numerous methods exist; they all lead to high dimensional feature spaces. To address the problem of high dimensional feature space, we propose the use of the Random Forest algorithm to rank features' importance. In order to efficiently search throughout subspaces, we apply a backward feature elimination search strategy. Our first experiments are carried out on unknown covariate conditions. Our first results suggest that the selected features contribute to increase the CCR of different existing classification methods. Secondary experiments are performed on unknown covariate conditions and viewpoints. Inspired by the location of our first experiments' features, we proposed a simple mask. Experimental results demonstrate that the proposed mask gives satisfactory results for all angles of the probe and consequently is not view specific. We also show that our mask performs well when an uncooperative experimental setup is considered as compared to the state-of-the art methods. As a consequence, we propose a panoramic gait recognition framework on unknown covariate conditions. Our results suggest that panoramic gait recognition can be performed under unknown covariate conditions. Our approach can greatly reduce the complexity of the classification problem while achieving fair correct classification rates when gait is captured with unknown conditions. 相似文献

13.

Effective and efficient dimensionality reduction for large-scale and streaming data preprocessing 总被引：3，自引：0，他引：3

Jun Yan Benyu Zhang Ning Liu Shuicheng Yan Qiansheng Cheng Fan W. Qiang Yang Xi W. Zheng Chen 《Knowledge and Data Engineering, IEEE Transactions on》2006,18(3):320-333

Dimensionality reduction is an essential data preprocessing technique for large-scale and streaming data classification tasks. It can be used to improve both the efficiency and the effectiveness of classifiers. Traditional dimensionality reduction approaches fall into two categories: feature extraction and feature selection. Techniques in the feature extraction category are typically more effective than those in feature selection category. However, they may break down when processing large-scale data sets or data streams due to their high computational complexities. Similarly, the solutions provided by the feature selection approaches are mostly solved by greedy strategies and, hence, are not ensured to be optimal according to optimized criteria. In this paper, we give an overview of the popularly used feature extraction and selection algorithms under a unified framework. Moreover, we propose two novel dimensionality reduction algorithms based on the orthogonal centroid algorithm (OC). The first is an incremental OC (IOC) algorithm for feature extraction. The second algorithm is an orthogonal centroid feature selection (OCFS) method which can provide optimal solutions according to the OC criterion. Both are designed under the same optimization criterion. Experiments on Reuters Corpus Volume-1 data set and some public large-scale text data sets indicate that the two algorithms are favorable in terms of their effectiveness and efficiency when compared with other state-of-the-art algorithms. 相似文献

14.

A unified dimensionality reduction framework for semi-paired and semi-supervised multi-view data

Xiaohong Chen Songcan Chen Hui Xue Xudong Zhou 《Pattern recognition》2012,45(5):2005-2018

Canonical correlation analysis (CCA) is a popular and powerful dimensionality reduction method to analyze paired multi-view data. However, when facing semi-paired and semi-supervised multi-view data which widely exist in real-world problems, CCA usually performs poorly due to its requirement of data pairing between different views and un-supervision in nature. Recently, several extensions of CCA have been proposed, however, they just handle the semi-paired scenario by utilizing structure information in each view or just deal with semi-supervised scenario by incorporating the discriminant information. In this paper, we present a general dimensionality reduction framework for semi-paired and semi-supervised multi-view data which naturally generalizes existing related works by using different kinds of prior information. Based on the framework, we develop a novel dimensionality reduction method, termed as semi-paired and semi-supervised generalized correlation analysis (S²GCA). S²GCA exploits a small amount of paired data to perform CCA and at the same time, utilizes both the global structural information captured from the unlabeled data and the local discriminative information captured from the limited labeled data to compensate the limited pairedness. Consequently, S²GCA can find the directions which make not only maximal correlation between the paired data but also maximal separability of the labeled data. Experimental results on artificial and four real-world datasets show its effectiveness compared to the existing related dimensionality reduction methods. 相似文献

15.

A scalable supervised algorithm for dimensionality reduction on streaming data

Jun Yan Shuicheng Yan Qiang Yang Hua Li Wei-Ying Ma 《Information Sciences》2006,176(14):2042-2065

Algorithms on streaming data have attracted increasing attention in the past decade. Among them, dimensionality reduction algorithms are greatly interesting due to the desirability of real tasks. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are two of the most widely used dimensionality reduction approaches. However, PCA is not optimal for general classification problems because it is unsupervised and ignores valuable label information for classification. On the other hand, the performance of LDA is degraded when encountering limited available low-dimensional spaces and singularity problem. Recently, Maximum Margin Criterion (MMC) was proposed to overcome the shortcomings of PCA and LDA. Nevertheless, the original MMC algorithm could not satisfy the streaming data model to handle large-scale high-dimensional data set. Thus an effective, efficient and scalable approach is needed. In this paper, we propose a supervised incremental dimensionality reduction algorithm and its extension to infer adaptive low-dimensional spaces by optimizing the maximum margin criterion. Experimental results on a synthetic dataset and real datasets demonstrate the superior performance of our proposed algorithm on streaming data. 相似文献

16.

Feature subset selection for predicting the success of crowdfunding project campaigns

Ryoba Michael J. Qu Shaojian Zhou Yongyi 《Electronic Markets》2021,31(3):671-684

Electronic Markets - Statistics from crowdfunding platforms show that a small percent of crowdfunding projects succeed in securing funds. This makes project creators eager to know the probability... 相似文献

17.

Feature extraction and dimensionality reduction algorithms and their applications in vowel recognition

Xuechuan WangAuthor Vitae Kuldip K. Paliwal Author Vitae 《Pattern recognition》2003,36(10):2429-2439

Feature extraction is an important component of a pattern recognition system. It performs two tasks: transforming input parameter vector into a feature vector and/or reducing its dimensionality. A well-defined feature extraction algorithm makes the classification process more effective and efficient. Two popular methods for feature extraction are linear discriminant analysis (LDA) and principal component analysis (PCA). In this paper, the minimum classification error (MCE) training algorithm (which was originally proposed for optimizing classifiers) is investigated for feature extraction. A generalized MCE (GMCE) training algorithm is proposed to mend the shortcomings of the MCE training algorithm. LDA, PCA, and MCE and GMCE algorithms extract features through linear transformation. Support vector machine (SVM) is a recently developed pattern classification algorithm, which uses non-linear kernel functions to achieve non-linear decision boundaries in the parametric space. In this paper, SVM is also investigated and compared to linear feature extraction algorithms. 相似文献

18.

Decision tree approach for classification and dimensionality reduction of electronic nose data 总被引：2，自引：0，他引：2

Jung Hwan Cho Pradeep U. KurupAuthor vitae 《Sensors and actuators. B, Chemical》2011,160(1):542

This paper presents a decision tree approach using two different tree models, C4.5 and CART, for use in the classification and dimensionality reduction of electronic nose (EN) data. The decision tree is a tree structure consisting of internal and terminal nodes which process the data to ultimately yield a classification. The decision tree is proficient at both maintaining the role of dimensionality reduction and at organizing optimally sized classification trees, and therefore it could be a promising approach to analyze EN data. In the experiments conducted, six sensor response parameters were extracted from the dynamic sensor responses of each of the four metal oxide gas sensors. The six parameters observed were the rising time (T_r), falling time (T_f), total response time (T_t), normalized peak voltage change (y_p,n), normalized curve integral (C_I), and triangle area (T_A). One sensor parameter from each metal oxide sensor was used for the classification trees, and the best classification accuracy of 97.78% was achieved by CART using the C_I parameter. However, the accuracy of CART was improved using all of the sensor parameters as inputs to the classification tree. The improved results of CART, having an accuracy of 98.89%, was comparable to that of two popular classifiers, the multilayer perceptron (MLP) neural network and the fuzzy ARTMAP network (accuracy of 98.89%, and 100%, respectively). Furthermore, as a dimensionality reduction method the decision tree has shown a better discrimination accuracy of 100% for the MLP classifier and 98.89% for the fuzzy ARTMAP classifier as compared to those achieved with principle component analysis (PCA) giving 81.11% and 97.78%, and a variable selection method giving 92.22% and 93.33% (for the same MLP and fuzzy ARTMAP classifiers). Therefore, a decision tree could be a promising technique for a pattern recognition system for EN data in terms of two functions; as classifier which is an optimally organized classification tree, and as dimensionality reduction method for other pattern recognition techniques. 相似文献

19.

A nonlinear method for dimensionality reduction of data using reference nodes

E. V. Myasnikov 《Pattern Recognition and Image Analysis》2012,22(2):337-345

A nonlinear method for dimensionality reduction based on the hierarchical clusterization of data and the Sammon mapping is proposed in the work. An essential element is the use of lists of reference nodes created based on the results of the hierarchical clusterization of data in the original multidimensional space in the dimensionality reduction. The work quality of the proposed method has been analyzed for a number of feature systems extracted from digital images, as well as for collections of images of various volumes. 相似文献

20.

Feature subset selection for support vector machines through discriminative function pruning analysis 总被引：10，自引：0，他引：10

Mao K.Z. 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2004,34(1):60-67

In many pattern classification applications, data are represented by high dimensional feature vectors, which induce high computational cost and reduce classification speed in the context of support vector machines (SVMs). To reduce the dimensionality of pattern representation, we develop a discriminative function pruning analysis (DFPA) feature subset selection method in the present study. The basic idea of the DFPA method is to learn the SVM discriminative function from training data using all input variables available first, and then to select feature subset through pruning analysis. In the present study, the pruning is implement using a forward selection procedure combined with a linear least square estimation algorithm, taking advantage of linear-in-the-parameter structure of the SVM discriminative function. The strength of the DFPA method is that it combines good characters of both filter and wrapper methods. Firstly, it retains the simplicity of the filter method avoiding training of a large number of SVM classifier. Secondly, it inherits the good performance of the wrapper method by taking the SVM classification algorithm into account. 相似文献