Similar literature
 20 similar documents found (search time: 632 ms)
1.
Normalized Mutual Information Feature Selection   Total citations: 6, self-citations: 0, citations by others: 6
A filter method of feature selection based on mutual information, called normalized mutual information feature selection (NMIFS), is presented. NMIFS is an enhancement over Battiti's MIFS, MIFS-U, and mRMR methods. The average normalized mutual information is proposed as a measure of redundancy among features. NMIFS outperformed MIFS, MIFS-U, and mRMR on several artificial and benchmark data sets without requiring a user-defined parameter. In addition, NMIFS is combined with a genetic algorithm to form a hybrid filter/wrapper method called GAMIFS. This includes an initialization procedure and a mutation operator based on NMIFS to speed up the convergence of the genetic algorithm. GAMIFS overcomes the limitations of incremental search algorithms that are unable to find dependencies between groups of features.
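
The greedy selection idea described in this abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes discrete-valued features, assumes that the normalized mutual information divides by the smaller of the two feature entropies, and uses hypothetical function names (`entropy`, `mutual_info`, `normalized_mi`, `nmifs_select`). Each step picks the feature whose relevance to the class, minus its average normalized redundancy with the features already chosen, is largest.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())


def mutual_info(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def normalized_mi(xs, ys):
    """MI divided by the smaller entropy (assumed normalization)."""
    denom = min(entropy(xs), entropy(ys))
    return mutual_info(xs, ys) / denom if denom > 0 else 0.0


def nmifs_select(features, target, k):
    """Greedily pick k feature names from a dict {name: values}."""
    remaining = dict(features)
    selected = []
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_info(remaining[name], target)
            if not selected:
                return relevance
            # Average normalized MI with the already-selected features.
            redundancy = sum(
                normalized_mi(remaining[name], features[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


# Toy data (made up for illustration): the class is determined by a and b,
# a_copy duplicates a, and noise carries no class information.
toy = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 1, 1, 0, 0, 1, 1],
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],
}
labels = [0, 0, 1, 1, 2, 2, 3, 3]
print(nmifs_select(toy, labels, 2))
```

On this toy data the duplicated feature is penalized by its redundancy with `a`, so the second pick is `b` rather than `a_copy`. No user-defined weight such as β appears in the score, which matches the abstract's claim.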

2.
Feature selection based on information-theoretic criteria in classification analysis   Total citations: 3, self-citations: 1, citations by others: 2
Feature selection aims to reduce the dimensionality of patterns for classificatory analysis by selecting the most informative features rather than irrelevant and/or redundant ones. In this study, two novel information-theoretic measures for feature ranking are presented. One is an improved formula for estimating the conditional mutual information between the candidate feature f_i and the target class C given the subset of selected features S, i.e., I(C; f_i | S), under the assumption that the information of features is distributed uniformly. The other is a mutual information (MI) based constructive criterion that can capture both irrelevant and redundant input features under arbitrary distributions of the information of features. With these two measures, two new feature selection algorithms are proposed, called the quadratic MI-based feature selection (QMIFS) approach and the MI-based constructive criterion (MICC) approach, respectively, in which no parameter such as β in Battiti's MIFS and Kwak and Choi's MIFS-U methods needs to be preset. Thus, the intractable problem of choosing an appropriate value of β to trade off relevance to the target classes against redundancy with the already-selected features is avoided completely. Experimental results demonstrate the good performance of QMIFS and MICC on both synthetic and benchmark data sets.

3.
Qinghua Hu, Jinfu Liu, Daren Yu. Knowledge, 2008, 21(4): 294-304
Feature subset selection presents a common challenge in applications where data with tens or hundreds of features are available. Existing feature selection algorithms are mainly designed for numerical or categorical attributes. However, data in real-world applications usually come in a mixed format. In this paper, we generalize Pawlak’s rough set model into the δ neighborhood rough set model and the k-nearest-neighbor rough set model, where objects with numerical attributes are granulated with δ neighborhood relations or k-nearest-neighbor relations, while objects with categorical features are granulated with equivalence relations. The induced information granules are then used to approximate the decision with lower and upper approximations. We compute the lower approximations of the decision to measure the significance of attributes. Based on the proposed models, we define the significance of mixed features and construct a greedy attribute reduction algorithm. We compare the proposed algorithm with others in terms of the number of selected features and classification performance. Experiments show that the proposed technique is effective.

4.
Element size transitioning in the construction of spatial meshes for finite element models is often controlled by biasing the concentration of nodes, towards one end or the other, along each of a set of curves in the model. A simple, common and efficient scheme to implement such nodal concentration biasing along a given curve is to require that the nodal spacings δ_i be the terms b^i δ_0 of a geometric sequence. Current practice takes the parameter value b, or its equivalent, as an independent input, so that the initial nodal spacing δ_0 must be a computed output. This is the most straightforward approach, but the lack of direct control over the value δ_0 is a significant shortcoming. In an element size transitioning scenario, δ_0 is often a parameter for which the model builder/analyst has independent quantitative information. It may represent the a priori known thickness of a thin bond or weld, for example. A more rational choice for these cases, proposed by this paper, is a scheme for which δ_0 is an independent input parameter instead of b. The parameter b is computed by a convergence-guaranteed algorithm for which the existence of b as a single-valued function of its input is proven.
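
To make the idea of treating the first spacing as the input concrete, here is a small sketch. It is not the paper's algorithm: it assumes the curve length and the number of intervals n are also given (these inputs and the function names `series_sum`, `solve_ratio`, and `node_spacings` are hypothetical), and it substitutes plain bisection as the root finder. The spacings must sum to the curve length, so b solves 1 + b + ... + b^(n-1) = length / first spacing. The left-hand side is strictly increasing in b for n ≥ 2, which is why a single solution exists whenever the first spacing is shorter than the curve.

```python
def series_sum(b, n):
    """Return 1 + b + ... + b**(n-1), evaluated with Horner's rule."""
    total = 0.0
    for _ in range(n):
        total = total * b + 1.0
    return total


def solve_ratio(length, n, delta0, tol=1e-12):
    """Find the ratio b such that delta0 * series_sum(b, n) == length."""
    if n < 2 or not (0.0 < delta0 < length):
        raise ValueError("need n >= 2 and 0 < delta0 < length")
    target = length / delta0
    lo, hi = 0.0, 1.0
    # Grow the upper bracket until it encloses the root.
    while series_sum(hi, n) < target:
        hi *= 2.0
    # Bisection: series_sum is monotone in b, so the bracket always shrinks
    # around the unique root.
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if series_sum(mid, n) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def node_spacings(length, n, delta0):
    """Spacings delta0 * b**i for i = 0..n-1 that fill the curve length."""
    b = solve_ratio(length, n, delta0)
    return [delta0 * b**i for i in range(n)]


# Example: a curve of length 7 split into 3 intervals, first spacing 1.
print(solve_ratio(7.0, 3, 1.0))
print(node_spacings(7.0, 3, 1.0))
```

In this example the equation is 1 + b + b² = 7, so the ratio converges to 2 and the spacings are approximately 1, 2, and 4.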

5.
Feature selection plays an important role in pattern recognition systems. In this study, we explored the problem of selecting effective heart rate variability (HRV) features for recognizing congestive heart failure (CHF) based on mutual information (MI). The MI-based greedy feature selection approach proposed by Battiti was adopted. The mutual information conditioned on the first-selected feature was used as the criterion for feature selection, and the uniform distribution assumption was used to reduce the computational load. In addition, a logarithmic exponent weighting was added to model the relative importance of the MI with respect to the number of already-selected features. The CHF recognition system contained a feature extractor that generated 50 features in four categories from the input HRV sequences. The proposed feature selector, termed UCMIFS, then selected the most effective features for the subsequent support vector machine (SVM) classifier. Prior to feature selection, the 50 features produced a high accuracy of 96.38%, which confirmed the representativeness of the original feature set. The UCMIFS selector was shown to outperform other MI-based feature selectors, including MIFS-U, CMIFS, and mRMR. Compared with other outstanding selectors published in the literature, the proposed UCMIFS outperformed them, reaching an accuracy as high as 97.59% in recognizing CHF using only 15 features. The results demonstrated the advantage of using the recruited features in characterizing HRV sequences for CHF recognition. The UCMIFS selector further improved the efficiency of the recognition system, with a substantially lower feature dimension and a higher recognition rate.

6.
Input feature selection for classification problems   Total citations: 30, self-citations: 0, citations by others: 30
Feature selection plays an important role in classifying systems such as neural networks (NNs). A dataset may contain attributes that are relevant, irrelevant, or redundant, and because the dataset can be huge, it is desirable to reduce the number of attributes by selecting only the relevant ones. In doing so, higher performance with lower computational effort is expected. In this paper, we propose two feature selection algorithms. The limitation of the mutual information feature selector (MIFS) is analyzed, and a method to overcome this limitation is studied. One of the proposed algorithms makes more considered use of the mutual information between input attributes and output classes than MIFS does. It is demonstrated that the proposed method can match the performance of the ideal greedy selection algorithm when information is distributed uniformly. The computational load of this algorithm is nearly the same as that of MIFS. In addition, another feature selection algorithm using the Taguchi method is proposed, as a solution to the question of how to identify good features with as few experiments as possible. The proposed algorithms are applied to several classification problems and compared with MIFS. The two algorithms can be combined to complement each other's limitations. The combined algorithm performed well in several experiments and should prove to be a useful method for selecting features in classification problems.

7.
A method for the analysis and discrimination of textures, based on C-calculus, is proposed.

The concepts of C-space and C-transform of a digitized signal are introduced as simple tools, well suited to the visualization of the filtering properties of C-calculus: C-filters are thus also defined and the “natural” role they seem to play in problems concerning textures is investigated in some practical instances. In particular, C-transforms of some sample textures are provided and texture classification in C-space is performed. Discrimination of objects against textural background is obtained by C-filtering, in an inherently parallel fashion.

Finally, the philosophy behind this approach is briefly discussed in comparison with some existing methods.


8.
A novel feature selection method using the concept of mutual information (MI) is proposed in this paper. In all MI-based feature selection methods, effective and efficient estimation of high-dimensional MI is crucial. In this paper, a pruned Parzen window estimator and the quadratic mutual information (QMI) are combined to address this problem. The results show that the proposed approach can estimate the MI effectively and efficiently. With this contribution, a novel feature selection method is developed to identify the salient features one by one. The appropriate feature subsets for classification can also be reliably estimated. The proposed methodology is thoroughly tested in four different classification applications in which the number of features ranges from fewer than 10 to over 15,000. The presented results are very promising and corroborate the contribution of the proposed feature selection methodology.

9.
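Kernel selection directly affects the performance of kernel methods. Existing Gaussian kernel selection methods have a computational complexity of Ω(n²), which hinders the development of large-scale kernel methods. This paper proposes a linearity property testing method for Gaussian kernel selection. Unlike traditional kernel selection methods, its query complexity is O(ln(1/δ)/ε²) and its computational complexity is independent of the sample size. The paper first defines the ε-linearity level of a function and proves that the ε-linearity level can approximately measure the distance between a function and the class of linear functions. On this basis, it proposes a linearity property testing criterion for Gaussian kernel selection. The criterion is then applied to evaluate and select Gaussian kernels effectively in the random Fourier feature space. Theoretical analysis and experiments show that Gaussian kernel selection via property testing is effective and feasible.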

10.
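An important problem in Chinese sentiment analysis is sentiment polarity classification. Sentiment feature selection is the prerequisite and foundation of machine-learning-based sentiment polarity classification; its role is to reduce the dimensionality of the feature set by removing irrelevant or redundant features. A hybrid sentiment feature selection method that combines the Lasso algorithm with filter feature selection methods is proposed. First, the Lasso penalized regression algorithm is used to screen the original feature set, yielding a sentiment classification feature subset with low redundancy. Next, filter methods such as CHI, MI, and IG are applied to this subset to evaluate the dependency weight between candidate feature words and text categories, and candidate feature words with low relevance are removed accordingly. Finally, the proposed method is compared with DF, MI, IG, and CHI for different numbers of feature words, using an SVM classifier with a Gaussian kernel. Experiments on a microblog short-text corpus show that the proposed algorithm is effective and efficient. Moreover, when the dimensionality of the feature subset is smaller than the number of samples, the proposed hybrid method improves to some degree on the feature selection results of DF, MI, IG, and CHI. A comparison of recognition rate and recall shows that the Lasso-MI method is more effective than MI and the other filter methods.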

11.
To approach a simple game Δ2 of P and E = {E1, E2} with no a priori evaders' role assignment and the payoff equal to the distance to one evader at the instant of catching the other, we introduce a concept of casting and study the games Δ1,2 and Δ2,1 for preassigned casting procedures and Δp2 for open-loop casting procedures. Since Δp2 reduces to Δ1,2 or Δ2,1, which in turn differ only in notation, we focus mainly on Δ1,2. According to the tenet of transition, Δ1,2 is divided into a concatenation of the games Δ1,2b (basic) and Δ1,2a (auxiliary), which model the problem before and after the first instant of E1's capture. The games Δ1,2a, Δ1,2b, and Δ1,2 are studied one after another using Isaacs' approach as extended by Berkowitz, Breakwell, Bernhard et al.

12.
This paper presents an iris recognition system that uses an automatic scale selection algorithm for iris feature extraction. The proposed system first filters the given iris image with a bank of Laplacian of Gaussian (LoG) filters at many different scales and computes the normalized response of every filter. The parameter γ used to normalize the filter responses is derived by analyzing the scale-space maxima of the blob feature detector responses. The maximum normalized responses over scales at each point are then selected together as the optimal filter outputs for the given iris image, and the binary codes for iris feature representation are obtained by encoding these optimal outputs with a zero threshold. Comparative experimental results clearly demonstrate the efficient performance of the proposed algorithm.

13.
Light use efficiency (LUE) algorithms are a potentially effective approach to monitoring global net primary production (NPP) using satellite-borne sensors such as the Moderate Resolution Imaging Spectroradiometer (MODIS). However, these algorithms are applied at relatively coarse spatial resolutions (≥1 km), which may subsume significant heterogeneity in vegetation LUE (ε_n, g MJ⁻¹) and, hence, introduce error. To examine the effects of spatial heterogeneity on a LUE algorithm, imagery from the Advanced Very High Resolution Radiometer (AVHRR) at ≈1-km resolution was used to implement a LUE approach for NPP estimation over a 25-km² area of corn (Zea mays L.) and soybean (Glycine max Merr.) in central Illinois, USA. Results from several ε_n formulations were compared with an NPP reference surface based on measured NPPs and a high spatial resolution land cover surface derived from Landsat ETM+. Determination of ε_n based on measurements of biomass production and monitoring of absorbed photosynthetically active radiation (APAR) revealed that ε_n of soybean was 68% of that for corn. When a LUE algorithm for estimating NPP was implemented in the study area using the assumption of homogeneous cropland and the ε_n for corn, the estimate for total biomass production was 126% of that from the NPP reference surface. Because of counteracting errors, total biomass production using the soybean ε_n was closer (86%) to that from the NPP reference surface. Retention of high spatial resolution land cover to assign ε_n resulted in a total NPP very similar to the reference NPP because differences in leaf phenology between the crop types were small except early in the growing season. These results suggest several alternative approaches to accounting for land cover heterogeneity in ε_n when implementing LUE algorithms at coarse resolution.

14.
Feature selection is one of the fundamental problems in pattern recognition and data mining. A popular and effective approach to feature selection is based on information theory, namely the mutual information of features and class variable. In this paper we compare eight different mutual information-based feature selection methods. Based on the analysis of the comparison results, we propose a new mutual information-based feature selection method. By taking into account both the class-dependent and class-independent correlation among features, the proposed method selects a less redundant and more informative set of features. The advantage of the proposed method over other methods is demonstrated by the results of experiments on UCI datasets (Asuncion and Newman, 2010 [1]) and object recognition.

15.
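Feature selection aims to select the most representative features of the data to be processed and to reduce the dimensionality of the feature space. This paper proposes a semi-supervised feature selection method based on local discriminative constraints. It makes full use of both labeled and unlabeled samples to train the feature selection model, and it exploits local discriminative information between neighboring data points to improve the accuracy of the model. An l2,1 constraint is introduced to improve the discriminability among features and to avoid noise interference. Finally, experiments verify the effectiveness of the proposed method.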

16.
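Most graph-based unsupervised feature selection methods use l2,1-norm sparse regularization of the projection matrix in place of the non-convex l2,0-norm constraint. However, the l2,1-norm regularization approach selects features one by one according to their scores and does not consider the correlation among features. This paper therefore proposes a graph-optimized unsupervised group feature selection method based on l2,0-norm sparsity and fuzzy similarity, which performs graph learning and feature selection simultaneously. In graph learning, a similarity matrix with exact connected components is learned. In feature selection, the number of non-zero rows of the projection matrix is constrained in order to achieve group feature selection. To handle the non-convex l2,0-norm constraint, a feature selection vector whose elements are 0 or 1 is introduced, which converts the l2,0-norm constrained problem into a 0-1 integer programming problem; the discrete 0-1 integer constraint is then converted into two continuous constraints for solving. Finally, a fuzzy similarity factor is introduced to extend the method and learn a more accurate graph structure. Experiments on real datasets demonstrate the effectiveness of the proposed method.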
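
The two row-sparsity measures contrasted in this abstract can be illustrated with a short sketch. It assumes the usual definitions, with each row of the projection matrix corresponding to one feature: the l2,1 norm is the sum of the Euclidean norms of the rows, and the l2,0 "norm" is the number of rows that are not entirely zero. The function names and the example matrix are hypothetical and only serve the illustration.

```python
from math import sqrt


def row_norms(W):
    """Euclidean norm of each row of a matrix given as a list of lists."""
    return [sqrt(sum(v * v for v in row)) for row in W]


def l21_norm(W):
    """Sum of the row norms: the convex surrogate used by many methods."""
    return sum(row_norms(W))


def l20_norm(W, eps=0.0):
    """Number of rows whose norm exceeds eps, i.e. the non-zero rows."""
    return sum(1 for r in row_norms(W) if r > eps)


def selected_features(W, eps=0.0):
    """Indices of the features (rows) kept by the projection matrix."""
    return [i for i, r in enumerate(row_norms(W)) if r > eps]


# Made-up projection matrix: 3 features projected to 2 dimensions.
# The second feature has an all-zero row, so it is effectively discarded.
W = [
    [3.0, 4.0],
    [0.0, 0.0],
    [0.0, 5.0],
]
print(l21_norm(W), l20_norm(W), selected_features(W))
```

Constraining the l2,0 value directly fixes how many features survive as a group, whereas the l2,1 penalty only shrinks row norms and leaves the final count to a later ranking step.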

17.
Feature selection is one of the major problems in an intrusion detection system (IDS), since the data contain additional and irrelevant features. This problem causes incorrect classification and a low detection rate in such systems. In this article, four feature selection algorithms, named multivariate linear correlation coefficient (MLCFS), feature grouping based on multivariate mutual information (FGMMI), feature grouping based on linear correlation coefficient (FGLCC), and feature grouping based on pairwise MI, are proposed to solve this problem. These algorithms can be implemented in any IDS. Both linear and nonlinear measures are used, in the sense that the correlation coefficient and the multivariate correlation coefficient are linear, whereas the MI and the multivariate MI are nonlinear. A Least Square Support Vector Machine (LS-SVM) is used as the intrusion classifier to evaluate the selected features. Experimental results on the KDDcup99 and Network Security Laboratory-Knowledge Discovery and Data Mining (NSL) datasets showed that the proposed feature selection methods achieve a higher detection rate and accuracy and a lower false-positive rate than the pairwise linear correlation coefficient and the pairwise MI employed in several previous algorithms.

18.
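This paper compares document frequency, a feature selection method that is independent of class information, with information gain, mutual information, and the χ² statistic, which depend on class information. On this basis, it analyzes the drawbacks of previous approaches that directly combine these two kinds of feature selection methods and proposes a joint feature selection algorithm based on relevance and redundancy. The algorithm combines document frequency with information gain, mutual information, and the χ² statistic, respectively, for feature selection. It aims to remove redundant features while retaining features that benefit classification, thereby improving text sentiment classification. Experimental results show that the joint feature selection method performs well and effectively reduces the feature dimensionality.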

19.
In this paper a general theorem on |A,δ|_k-summability methods has been proved. This theorem includes, as a special case, a known result in [E. Savas, Factors for |A|_k Summability of infinite series, Comput. Math. Appl. 53 (2007) 1045–1049].

20.
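An unsupervised feature selection method based on K-means clustering   Total citations: 11, self-citations: 1, citations by others: 10
Feature selection is one of the first problems a pattern recognition method must solve. Many current methods address feature selection for supervised learning, but feature selection for unsupervised learning has received little attention. Considering both the influence of features on the classification result and an analysis of the correlation among features, a feature selection algorithm based on K-means clustering is proposed for the unsupervised learning setting.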
