首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
2.
Feature selection is an important preprocessing step for dealing with high dimensional data. In this paper, we propose a novel unsupervised feature selection method by embedding a subspace learning regularization (i.e., principal component analysis (PCA)) into the sparse feature selection framework. Specifically, we select informative features via the sparse learning framework and consider preserving the principal components (i.e., the maximal variance) of the data at the same time, such that improving the interpretable ability of the feature selection model. Furthermore, we propose an effective optimization algorithm to solve the proposed objective function which can achieve stable optimal result with fast convergence. By comparing with five state-of-the-art unsupervised feature selection methods on six benchmark and real-world datasets, our proposed method achieved the best result in terms of classification performance.  相似文献   

3.
Qian  Youcheng  Yin  Xueyan  Gao  Wei 《Multimedia Tools and Applications》2019,78(23):33593-33615
Multimedia Tools and Applications - Feature selection aims to select the optimal feature subset which can reduce time complexity, save storage space and improve the performances of various tasks....  相似文献   

4.
Reducing the dimensionality of the data has been a challenging task in data mining and machine learning applications. In these applications, the existence of irrelevant and redundant features negatively affects the efficiency and effectiveness of different learning algorithms. Feature selection is one of the dimension reduction techniques, which has been used to allow a better understanding of data and improve the performance of other learning tasks. Although the selection of relevant features has been extensively studied in supervised learning, feature selection in the absence of class labels is still a challenging task. This paper proposes a novel method for unsupervised feature selection, which efficiently selects features in a greedy manner. The paper first defines an effective criterion for unsupervised feature selection that measures the reconstruction error of the data matrix based on the selected subset of features. The paper then presents a novel algorithm for greedily minimizing the reconstruction error based on the features selected so far. The greedy algorithm is based on an efficient recursive formula for calculating the reconstruction error. Experiments on real data sets demonstrate the effectiveness of the proposed algorithm in comparison with the state-of-the-art methods for unsupervised feature selection.  相似文献   

5.
Exploratory data analysis methods are essential for getting insight into data. Identifying the most important variables and detecting quasi-homogenous groups of data are problems of interest in this context. Solving such problems is a difficult task, mainly due to the unsupervised nature of the underlying learning process. Unsupervised feature selection and unsupervised clustering can be successfully approached as optimization problems by means of global optimization heuristics if an appropriate objective function is considered. This paper introduces an objective function capable of efficiently guiding the search for significant features and simultaneously for the respective optimal partitions. Experiments conducted on complex synthetic data suggest that the function we propose is unbiased with respect to both the number of clusters and the number of features.  相似文献   

6.
This paper proposes a novel unsupervised feature selection method by jointing self-representation and subspace learning. In this method, we adopt the idea of self-representation and use all the features to represent each feature. A Frobenius norm regularization is used for feature selection since it can overcome the over-fitting problem. The Locality Preserving Projection (LPP) is used as a regularization term as it can maintain the local adjacent relations between data when performing feature space transformation. Further, a low-rank constraint is also introduced to find the effective low-dimensional structures of the data, which can reduce the redundancy. Experimental results on real-world datasets verify that the proposed method can select the most discriminative features and outperform the state-of-the-art unsupervised feature selection methods in terms of classification accuracy, standard deviation, and coefficient of variation.  相似文献   

7.
Toward unsupervised correlation preserving discretization   总被引:2,自引:0,他引:2  
Discretization is a crucial preprocessing technique used for a variety of data warehousing and mining tasks. In this paper, we present a novel PCA-based unsupervised algorithm for the discretization of continuous attributes in multivariate data sets. The algorithm leverages the underlying correlation structure in the data set to obtain the discrete intervals and ensures that the inherent correlations are preserved. Previous efforts on this problem are largely supervised and consider only piecewise correlation among attributes. We consider the correlation among continuous attributes and, at the same time, also take into account the interactions between continuous and categorical attributes. Our approach also extends easily to data sets containing missing values. We demonstrate the efficacy of the approach on real data sets and as a preprocessing step for both classification and frequent itemset mining tasks. We show that the intervals are meaningful and can uncover hidden patterns in data. We also show that large compression factors can be obtained on the discretized data sets. The approach is task independent, i.e., the same discretized data set can be used for different data mining tasks. Thus, the data sets can be discretized, compressed, and stored once and can be used again and again.  相似文献   

8.
Dimensionality reduction is an important and challenging task in machine learning and data mining. Feature selection and feature extraction are two commonly used techniques for decreasing dimensionality of the data and increasing efficiency of learning algorithms. Specifically, feature selection realized in the absence of class labels, namely unsupervised feature selection, is challenging and interesting. In this paper, we propose a new unsupervised feature selection criterion developed from the viewpoint of subspace learning, which is treated as a matrix factorization problem. The advantages of this work are four-fold. First, dwelling on the technique of matrix factorization, a unified framework is established for feature selection, feature extraction and clustering. Second, an iterative update algorithm is provided via matrix factorization, which is an efficient technique to deal with high-dimensional data. Third, an effective method for feature selection with numeric data is put forward, instead of drawing support from the discretization process. Fourth, this new criterion provides a sound foundation for embedding kernel tricks into feature selection. With this regard, an algorithm based on kernel methods is also proposed. The algorithms are compared with four state-of-the-art feature selection methods using six publicly available datasets. Experimental results demonstrate that in terms of clustering results, the proposed two algorithms come with better performance than the others for almost all datasets we experimented with here.  相似文献   

9.
The optimum finite set of linear observables for discriminating two Gaussian stochastic processes is derived using classical methods and distribution function theory. The results offer a new, accurate information-theoretic strategy and are superior to well-known conventional methods using statistical distance measures.  相似文献   

10.
Principal components analysis (PCA) is probably the best-known approach to unsupervised dimensionality reduction. However, axes of the lower-dimensional space, ie., principal components (PCs), are a set of new variables carrying no clear physical meanings. Thus, interpretation of results obtained in the lower-dimensional PCA space and data acquisition for test samples still involve all of the original measurements. To deal with this problem, we develop two algorithms to link the physically meaningless PCs back to a subset of original measurements. The main idea of the algorithms is to evaluate and select feature subsets based on their capacities to reproduce sample projections on principal axes. The strength of the new algorithms is that the computaion complexity involved is significantly reduced, compared with the data structural similarity-based feature evaluation.  相似文献   

11.
Pattern Analysis and Applications - Search-based methods that use matrix- or vector-based representations of the dataset are commonly employed to solve the problem of feature selection. These...  相似文献   

12.

In hyperspectral image (HSI) analysis, high-dimensional data may contain noisy, irrelevant and redundant information. To mitigate the negative effect from these information, feature selection is one of the useful solutions. Unsupervised feature selection is a data preprocessing technique for dimensionality reduction, which selects a subset of informative features without using any label information. Different from the linear models, the autoencoder is formulated to nonlinearly select informative features. The adjacency matrix of HSI can be constructed to extract the underlying relationship between each data point, where the latent representation of original data can be obtained via matrix factorization. Besides, a new feature representation can be also learnt from the autoencoder. For a same data matrix, different feature representations should consistently share the potential information. Motivated by these, in this paper, we propose a latent representation learning based autoencoder feature selection (LRLAFS) model, where the latent representation learning is used to steer feature selection for the autoencoder. To solve the proposed model, we advance an alternative optimization algorithm. Experimental results on three HSI datasets confirm the effectiveness of the proposed model.

  相似文献   

13.
An algebraic approach to feature interactions   总被引:1,自引:0,他引:1  
The various approaches proposed to provide communication between CAD systems and process planning systems share the major problem that, due to geometric interactions among features, there may be several equally valid sets of manufacturable features describing the same part, and different sets of features may differ in their manufacturability. Thus, to produce a good process plan-or, in some cases, any plan at ll-it may be necessary to interpret the part as a different set of features than the one initially obtained from the CAD model. This is addressed using an algebra of features. Given a set of features describing a machinable part, other equally valid interpretations of the part can be produced by performing operations in the algebra. This will enable automated process planning systems to examine these interpretations in order to see which one is most appropriate for use in manufacturing. The feature algebra has been implemented for a restricted domain and integrated with the Protosolid solid modeling system and the EFHA process planning system  相似文献   

14.
A Bayesian approach to joint feature selection and classifier design   总被引:5,自引:0,他引:5  
This paper adopts a Bayesian approach to simultaneously learn both an optimal nonlinear classifier and a subset of predictor variables (or features) that are most relevant to the classification task. The approach uses heavy-tailed priors to promote sparsity in the utilization of both basis functions and features; these priors act as regularizers for the likelihood function that rewards good classification on the training data. We derive an expectation- maximization (EM) algorithm to efficiently compute a maximum a posteriori (MAP) point estimate of the various parameters. The algorithm is an extension of recent state-of-the-art sparse Bayesian classifiers, which in turn can be seen as Bayesian counterparts of support vector machines. Experimental comparisons using kernel classifiers demonstrate both parsimonious feature selection and excellent classification accuracy on a range of synthetic and benchmark data sets.  相似文献   

15.
Neural Computing and Applications - Classification problems such as gene expression array analysis, text processing of Internet document, combinatorial chemistry, software defect prediction and...  相似文献   

16.
Mobile context modeling is a process of recognizing and reasoning about contexts and situations in a mobile environment, which is critical for the success of context-aware mobile services. While there are prior works on mobile context modeling, the use of unsupervised learning techniques for mobile context modeling is still under-explored. Indeed, unsupervised techniques have the ability to learn personalized contexts, which are difficult to be predefined. To that end, in this paper, we propose an unsupervised approach to modeling personalized contexts of mobile users. Along this line, we first segment the raw context data sequences of mobile users into context sessions where a context session contains a group of adjacent context records which are mutually similar and usually reflect the similar contexts. Then, we exploit two methods for mining personalized contexts from context sessions. The first method is to cluster context sessions and then to extract the frequent contextual feature-value pairs from context session clusters as contexts. The second method leverages topic models to learn personalized contexts in the form of probabilistic distributions of raw context data from the context sessions. Finally, experimental results on real-world data show that the proposed approach is efficient and effective for mining personalized contexts of mobile users.  相似文献   

17.
This paper presents a novel feature selection approach for backpropagation neural networks (NNs). Previously, a feature selection technique known as the wrapper model was shown effective for decision trees induction. However, it is prohibitively expensive when applied to real-world neural net training characterized by large volumes of data and many feature choices. Our approach incorporates a weight analysis-based heuristic called artificial neural net input gain measurement approximation (ANNIGMA) to direct the search in the wrapper model and allows effective feature selection feasible for neural net applications. Experimental results on standard datasets show that this approach can efficiently reduce the number of features while maintaining or even improving the accuracy. We also report two successful applications of our approach in the helicopter maintenance applications.  相似文献   

18.
Axiomatic approach to feature subset selection based on relevance   总被引:7,自引:0,他引:7  
Relevance has traditionally been linked with feature subset selection, but formalization of this link has not been attempted. In this paper, we propose two axioms for feature subset selection-sufficiency axiom and necessity axiom-based on which this link is formalized: The expected feature subset is the one which maximizes relevance. Finding the expected feature subset turns out to be NP-hard. We then devise a heuristic algorithm to find the expected subset which has a polynomial time complexity. The experimental results show that the algorithm finds good enough subset of features which, when presented to C4.5, results in better prediction accuracy  相似文献   

19.
20.
Neural Computing and Applications - The “curse of dimensionality” issue caused by high-dimensional datasets not only imposes high memory and computational costs but also deteriorates...  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号