
1.

Erratum: correction to "unsupervised feature selection using feature similarity"

Mitra P. Pal S.K. 《IEEE transactions on pattern analysis and machine intelligence》2002,24(6):721-721


2.

Self-representation and PCA embedding for unsupervised feature selection

Yonghua Zhu Xuejun Zhang Ruili Wang Wei Zheng Yingying Zhu 《World Wide Web》2018,21(6):1675-1688

Feature selection is an important preprocessing step for dealing with high-dimensional data. In this paper, we propose a novel unsupervised feature selection method by embedding a subspace learning regularization (i.e., principal component analysis (PCA)) into the sparse feature selection framework. Specifically, we select informative features via the sparse learning framework while preserving the principal components (i.e., the maximal variance) of the data, thereby improving the interpretability of the feature selection model. Furthermore, we propose an effective optimization algorithm to solve the proposed objective function, which achieves a stable optimal result with fast convergence. Compared with five state-of-the-art unsupervised feature selection methods on six benchmark and real-world datasets, our proposed method achieved the best classification performance.

3.

Robust inner product regularized unsupervised feature selection

Qian Youcheng Yin Xueyan Gao Wei 《Multimedia Tools and Applications》2019,78(23):33593-33615

Feature selection aims to select the optimal feature subset which can reduce time complexity, save storage space and improve the performances of various tasks....

4.

Efficient greedy feature selection for unsupervised learning

Ahmed K. Farahat Ali Ghodsi Mohamed S. Kamel 《Knowledge and Information Systems》2013,35(2):285-310

Reducing the dimensionality of the data has been a challenging task in data mining and machine learning applications. In these applications, the existence of irrelevant and redundant features negatively affects the efficiency and effectiveness of different learning algorithms. Feature selection is one of the dimension reduction techniques, which has been used to allow a better understanding of data and improve the performance of other learning tasks. Although the selection of relevant features has been extensively studied in supervised learning, feature selection in the absence of class labels is still a challenging task. This paper proposes a novel method for unsupervised feature selection, which efficiently selects features in a greedy manner. The paper first defines an effective criterion for unsupervised feature selection that measures the reconstruction error of the data matrix based on the selected subset of features. The paper then presents a novel algorithm for greedily minimizing the reconstruction error based on the features selected so far. The greedy algorithm is based on an efficient recursive formula for calculating the reconstruction error. Experiments on real data sets demonstrate the effectiveness of the proposed algorithm in comparison with the state-of-the-art methods for unsupervised feature selection.

5.

A unifying criterion for unsupervised clustering and feature selection

Mihaela Breaban Henri Luchian 《Pattern recognition》2011,44(4):854-865

Exploratory data analysis methods are essential for getting insight into data. Identifying the most important variables and detecting quasi-homogeneous groups of data are problems of interest in this context. Solving such problems is a difficult task, mainly due to the unsupervised nature of the underlying learning process. Unsupervised feature selection and unsupervised clustering can be successfully approached as optimization problems by means of global optimization heuristics if an appropriate objective function is considered. This paper introduces an objective function capable of efficiently guiding the search for significant features and simultaneously for the respective optimal partitions. Experiments conducted on complex synthetic data suggest that the function we propose is unbiased with respect to both the number of clusters and the number of features.

6.

Joint self-representation and subspace learning for unsupervised feature selection

Ruili Wang Ming Zong 《World Wide Web》2018,21(6):1745-1758

This paper proposes a novel unsupervised feature selection method that combines self-representation and subspace learning. In this method, we adopt the idea of self-representation and use all the features to represent each feature. A Frobenius norm regularization is used for feature selection since it can overcome the over-fitting problem. The Locality Preserving Projection (LPP) is used as a regularization term as it can maintain the local adjacent relations between data when performing feature space transformation. Further, a low-rank constraint is also introduced to find the effective low-dimensional structures of the data, which can reduce the redundancy. Experimental results on real-world datasets verify that the proposed method can select the most discriminative features and outperform the state-of-the-art unsupervised feature selection methods in terms of classification accuracy, standard deviation, and coefficient of variation.

7.

Toward unsupervised correlation preserving discretization (total citations: 2, self-citations: 0, citations by others: 2)

Mehta S. Parthasarathy S. Hui Yang 《IEEE Transactions on Knowledge and Data Engineering》2005,17(9):1174-1185

Discretization is a crucial preprocessing technique used for a variety of data warehousing and mining tasks. In this paper, we present a novel PCA-based unsupervised algorithm for the discretization of continuous attributes in multivariate data sets. The algorithm leverages the underlying correlation structure in the data set to obtain the discrete intervals and ensures that the inherent correlations are preserved. Previous efforts on this problem are largely supervised and consider only piecewise correlation among attributes. We consider the correlation among continuous attributes and, at the same time, also take into account the interactions between continuous and categorical attributes. Our approach also extends easily to data sets containing missing values. We demonstrate the efficacy of the approach on real data sets and as a preprocessing step for both classification and frequent itemset mining tasks. We show that the intervals are meaningful and can uncover hidden patterns in data. We also show that large compression factors can be obtained on the discretized data sets. The approach is task independent, i.e., the same discretized data set can be used for different data mining tasks. Thus, the data sets can be discretized, compressed, and stored once and can be used again and again.

8.

Subspace learning for unsupervised feature selection via matrix factorization

Shiping Wang Witold Pedrycz Qingxin Zhu William Zhu 《Pattern recognition》2015

Dimensionality reduction is an important and challenging task in machine learning and data mining. Feature selection and feature extraction are two commonly used techniques for decreasing dimensionality of the data and increasing efficiency of learning algorithms. Specifically, feature selection realized in the absence of class labels, namely unsupervised feature selection, is challenging and interesting. In this paper, we propose a new unsupervised feature selection criterion developed from the viewpoint of subspace learning, which is treated as a matrix factorization problem. The advantages of this work are four-fold. First, dwelling on the technique of matrix factorization, a unified framework is established for feature selection, feature extraction and clustering. Second, an iterative update algorithm is provided via matrix factorization, which is an efficient technique to deal with high-dimensional data. Third, an effective method for feature selection with numeric data is put forward, instead of drawing support from the discretization process. Fourth, this new criterion provides a sound foundation for embedding kernel tricks into feature selection. In this regard, an algorithm based on kernel methods is also proposed. The algorithms are compared with four state-of-the-art feature selection methods using six publicly available datasets. Experimental results demonstrate that, in terms of clustering results, the two proposed algorithms perform better than the others on almost all of the datasets used in the experiments.

9.

A unified approach to optimal feature selection

Salvatore D Morgera 《Pattern recognition letters》1983,2(2):61-68

The optimum finite set of linear observables for discriminating two Gaussian stochastic processes is derived using classical methods and distribution function theory. The results offer a new, accurate information-theoretic strategy and are superior to well-known conventional methods using statistical distance measures.

10.

Identifying critical variables of principal components for unsupervised feature selection.

K Z Mao 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2005,35(2):339-344

Principal components analysis (PCA) is probably the best-known approach to unsupervised dimensionality reduction. However, axes of the lower-dimensional space, i.e., principal components (PCs), are a set of new variables carrying no clear physical meanings. Thus, interpretation of results obtained in the lower-dimensional PCA space and data acquisition for test samples still involve all of the original measurements. To deal with this problem, we develop two algorithms to link the physically meaningless PCs back to a subset of original measurements. The main idea of the algorithms is to evaluate and select feature subsets based on their capacities to reproduce sample projections on principal axes. The strength of the new algorithms is that the computational complexity involved is significantly reduced, compared with the data structural similarity-based feature evaluation.

11.

An approach of feature selection using graph-theoretic heuristic and hill climbing

Goswami Saptarsi Das Amit Kumar Guha Priyanka Tarafdar Arunabha Chakraborty Sanjay Chakrabarti Amlan Chakraborty Basabi 《Pattern Analysis & Applications》2019,22(2):615-631

Search-based methods that use matrix- or vector-based representations of the dataset are commonly employed to solve the problem of feature selection. These...

12.

Latent representation learning based autoencoder for unsupervised feature selection in hyperspectral imagery

Wang Xinxin Wang Zhenyu Zhang Yongshan Jiang Xinwei Cai Zhihua 《Multimedia Tools and Applications》2022,81(9):12061-12075

In hyperspectral image (HSI) analysis, high-dimensional data may contain noisy, irrelevant and redundant information. To mitigate the negative effect of such information, feature selection is one of the useful solutions. Unsupervised feature selection is a data preprocessing technique for dimensionality reduction, which selects a subset of informative features without using any label information. Different from the linear models, the autoencoder is formulated to nonlinearly select informative features. The adjacency matrix of HSI can be constructed to extract the underlying relationship between each data point, where the latent representation of original data can be obtained via matrix factorization. Besides, a new feature representation can be also learnt from the autoencoder. For the same data matrix, different feature representations should consistently share the potential information. Motivated by these observations, in this paper, we propose a latent representation learning based autoencoder feature selection (LRLAFS) model, where the latent representation learning is used to steer feature selection for the autoencoder. To solve the proposed model, we advance an alternative optimization algorithm. Experimental results on three HSI datasets confirm the effectiveness of the proposed model.

13.

An algebraic approach to feature interactions (total citations: 1, self-citations: 0, citations by others: 1)

Karinthi R.R. Nau D. 《IEEE transactions on pattern analysis and machine intelligence》1992,14(4):469-484

The various approaches proposed to provide communication between CAD systems and process planning systems share the major problem that, due to geometric interactions among features, there may be several equally valid sets of manufacturable features describing the same part, and different sets of features may differ in their manufacturability. Thus, to produce a good process plan (or, in some cases, any plan at all), it may be necessary to interpret the part as a different set of features than the one initially obtained from the CAD model. This is addressed using an algebra of features. Given a set of features describing a machinable part, other equally valid interpretations of the part can be produced by performing operations in the algebra. This will enable automated process planning systems to examine these interpretations in order to see which one is most appropriate for use in manufacturing. The feature algebra has been implemented for a restricted domain and integrated with the Protosolid solid modeling system and the EFHA process planning system.

14.

A Bayesian approach to joint feature selection and classifier design (total citations: 5, self-citations: 0, citations by others: 5)

Krishnapuram B Hartemink AJ Carin L Figueiredo MA 《IEEE transactions on pattern analysis and machine intelligence》2004,26(9):1105-1111

This paper adopts a Bayesian approach to simultaneously learn both an optimal nonlinear classifier and a subset of predictor variables (or features) that are most relevant to the classification task. The approach uses heavy-tailed priors to promote sparsity in the utilization of both basis functions and features; these priors act as regularizers for the likelihood function that rewards good classification on the training data. We derive an expectation-maximization (EM) algorithm to efficiently compute a maximum a posteriori (MAP) point estimate of the various parameters. The algorithm is an extension of recent state-of-the-art sparse Bayesian classifiers, which in turn can be seen as Bayesian counterparts of support vector machines. Experimental comparisons using kernel classifiers demonstrate both parsimonious feature selection and excellent classification accuracy on a range of synthetic and benchmark data sets.

15.

An approach for feature selection using local searching and global optimization techniques

Tiwari Sadhana Singh Birmohan Kaur Manpreet 《Neural computing & applications》2017,28(10):2915-2930

Classification problems such as gene expression array analysis, text processing of Internet document, combinatorial chemistry, software defect prediction and...

16.

An unsupervised approach to modeling personalized contexts of mobile users

Tengfei Bao Huanhuan Cao Enhong Chen Jilei Tian Hui Xiong 《Knowledge and Information Systems》2012,31(2):345-370

Mobile context modeling is a process of recognizing and reasoning about contexts and situations in a mobile environment, which is critical for the success of context-aware mobile services. While there are prior works on mobile context modeling, the use of unsupervised learning techniques for mobile context modeling is still under-explored. Indeed, unsupervised techniques have the ability to learn personalized contexts, which are difficult to predefine. To that end, in this paper, we propose an unsupervised approach to modeling personalized contexts of mobile users. Along this line, we first segment the raw context data sequences of mobile users into context sessions, where a context session contains a group of adjacent context records which are mutually similar and usually reflect similar contexts. Then, we exploit two methods for mining personalized contexts from context sessions. The first method is to cluster context sessions and then to extract the frequent contextual feature-value pairs from context session clusters as contexts. The second method leverages topic models to learn personalized contexts in the form of probabilistic distributions of raw context data from the context sessions. Finally, experimental results on real-world data show that the proposed approach is efficient and effective for mining personalized contexts of mobile users.

17.

The ANNIGMA-wrapper approach to fast feature selection for neural nets

Chun-Nan Hsu Hung-Ju Huang Dietrich S. 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2002,32(2):207-212

This paper presents a novel feature selection approach for backpropagation neural networks (NNs). Previously, a feature selection technique known as the wrapper model was shown to be effective for decision tree induction. However, it is prohibitively expensive when applied to real-world neural net training characterized by large volumes of data and many feature choices. Our approach incorporates a weight analysis-based heuristic called artificial neural net input gain measurement approximation (ANNIGMA) to direct the search in the wrapper model and makes effective feature selection feasible for neural net applications. Experimental results on standard datasets show that this approach can efficiently reduce the number of features while maintaining or even improving the accuracy. We also report two successful applications of our approach in helicopter maintenance.

18.

Axiomatic approach to feature subset selection based on relevance (total citations: 7, self-citations: 0, citations by others: 7)

Hui Wang Bell D. Murtagh F. 《IEEE transactions on pattern analysis and machine intelligence》1999,21(3):271-277

Relevance has traditionally been linked with feature subset selection, but formalization of this link has not been attempted. In this paper, we propose two axioms for feature subset selection, the sufficiency axiom and the necessity axiom, based on which this link is formalized: the expected feature subset is the one which maximizes relevance. Finding the expected feature subset turns out to be NP-hard. We then devise a heuristic algorithm with polynomial time complexity to find the expected subset. The experimental results show that the algorithm finds a good enough subset of features which, when presented to C4.5, results in better prediction accuracy.

19.

An unsupervised grid-based approach for clustering analysis (total citations: 1, self-citations: 0, citations by others: 1)

YUE ShiHong WANG JeenShing TAO Gao WANG HuaXiang 《Science China Information Sciences (中国科学:信息科学, English edition)》2010,(7):1345-1357


20.

Fast unsupervised feature selection based on the improved binary ant system and mutation strategy

Manbari Zhaleh Akhlaghian Tab Fardin Salavati Chiman 《Neural computing & applications》2019,31(9):4963-4982

The “curse of dimensionality” issue caused by high-dimensional datasets not only imposes high memory and computational costs but also deteriorates...