Similar Literature
1.
葛倩  张光斌  张小凤 《计算机应用》2022,42(10):3046-3053
To address the poor stability of the ReliefF feature selection algorithm when Euclidean distance is used to pick nearest-neighbor samples, and the low classification accuracy of the feature subsets it selects, an MICReliefF algorithm that uses the Maximal Information Coefficient (MIC) as the nearest-neighbor selection criterion was proposed. In addition, the classification accuracy of a Support Vector Machine (SVM) model was used as the evaluation index and optimization was repeated to automatically determine the optimal feature subset, realizing interactive optimization of the MICReliefF algorithm and the classification model, namely the MICReliefF-SVM automatic feature selection algorithm. The performance of MICReliefF-SVM was verified on several public UCI datasets. Experimental results show that MICReliefF-SVM not only filters out more redundant features but also selects feature subsets with good stability and generalization ability. Compared with classical feature selection algorithms such as Random Forest (RF), max-relevance min-redundancy (mRMR) and Correlation-based Feature Selection (CFS), MICReliefF-SVM achieves higher classification accuracy.
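As a rough illustration of the interactive optimization described above, the Python sketch below assumes a precomputed array of feature weights standing in for the MIC-based ReliefF scores (which are not reproduced here); it sweeps the subset size and keeps the one maximizing SVM cross-validation accuracy. It is a minimal sketch, not the authors' implementation.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    def select_subset_by_svm(X, y, weights, cv=5):
        # Rank features by the given (e.g. MIC-ReliefF-style) weights, then
        # sweep the subset size and keep the best SVM cross-validation score.
        order = np.argsort(weights)[::-1]
        best_k, best_acc = 1, -np.inf
        for k in range(1, X.shape[1] + 1):
            acc = cross_val_score(SVC(), X[:, order[:k]], y, cv=cv).mean()
            if acc > best_acc:
                best_k, best_acc = k, acc
        return order[:best_k], best_acc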

2.
Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because of the difficulty in directly implementing the maximal dependency condition, we first derive an equivalent form, called the minimal-redundancy-maximal-relevance (mRMR) criterion, for first-order incremental feature selection. Then, we present a two-stage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers). This allows us to select a compact set of superior features at very low cost. We perform an extensive experimental comparison of our algorithm and other methods using three different classifiers (naive Bayes, support vector machine, and linear discriminant analysis) and four different data sets (handwritten digits, arrhythmia, NCI cancer cell lines, and lymphoma tissues). The results confirm that mRMR leads to promising improvements in feature selection and classification accuracy.
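For reference, the first-order incremental step of the mRMR criterion described above is commonly written as (notation ours):

    \max_{x_j \in X \setminus S_{m-1}} \Big[ I(x_j; c) - \frac{1}{m-1} \sum_{x_i \in S_{m-1}} I(x_j; x_i) \Big]

where I(.;.) is mutual information, c the class variable and S_{m-1} the set of m-1 already selected features: the next feature maximizes its relevance to the class minus its average redundancy with the selected set.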

3.
A novel feature selection method based on normalized mutual information
In this paper, a novel feature selection method based on the normalization of the well-known mutual information measurement is presented. Our method is derived from an existing approach, the max-relevance and min-redundancy (mRMR) approach. We propose, however, to normalize the mutual information used in the method so that neither the relevance term nor the redundancy term can dominate. We use several commonly used recognition models, including Support Vector Machine (SVM), k-Nearest-Neighbor (kNN), and Linear Discriminant Analysis (LDA), to compare our algorithm with the original mRMR and with a recently improved version of mRMR, the Normalized Mutual Information Feature Selection (NMIFS) algorithm. To avoid data-specific statements, we conduct our classification experiments using various datasets from the UCI machine learning repository. The results confirm that our feature selection method is more robust than the others with regard to classification accuracy.
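For illustration, one common way to normalize the mutual information between variables x and y (the exact normalization used in this paper may differ) is

    \hat{I}(x; y) = \frac{I(x; y)}{\min\{H(x), H(y)\}}

which bounds both the relevance and redundancy terms to [0, 1] so that neither can dominate the selection criterion; this is, for example, the normalization used by the NMIFS method mentioned above.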

4.
Mutual information (MI) is used in feature selection to evaluate two key properties of optimal features: the relevance of a feature to the class variable and the redundancy among similar features. Conditional mutual information (CMI), i.e., the MI of a candidate feature with the class variable conditioned on the features already selected, is a natural extension of MI but has so far not been applied due to estimation complications for high-dimensional distributions. We propose a nearest-neighbor estimate of CMI, appropriate for high-dimensional variables, and build an iterative scheme for sequential feature selection with a termination criterion, called CMINN. We show that CMINN is equivalent to MI filter feature selection methods, such as mRMR and MaxiMin, in the presence of solely single-feature effects, and is more appropriate for combined feature effects. We compare CMINN to mRMR and MaxiMin on simulated datasets involving combined effects and confirm the superiority of CMINN in selecting the correct features (indicated also by the termination criterion) and in giving the best classification accuracy. The application to ten benchmark databases shows that CMINN obtains the same or higher classification accuracy than mRMR and MaxiMin with a smaller cardinality of the selected feature subset.
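For reference, the conditional mutual information used as the selection score can be written as (notation ours):

    I(X; C \mid S) = H(X \mid S) - H(X \mid C, S)

where X is the candidate feature, C the class variable and S the set of already selected features. A feature scores highly only if it carries information about the class beyond what S already provides, which is what the nearest-neighbor estimator in CMINN approximates in high dimensions.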

5.
Because gene expression data are high-dimensional, noisy and have small sample sizes, gene selection has long been a major challenge in tumor classification. To improve tumor classification accuracy while keeping gene selection efficient, an adaptive particle swarm optimization (APSO) algorithm combining Relief-F and CART decision trees (R-C-APSO) is proposed. The method first uses Relief-F to quickly filter out large numbers of irrelevant genes and noise, narrowing the gene selection range; it then uses a CART decision tree as the fitness function and performs the final gene search with APSO. Experiments on six datasets show that R-C-APSO achieves high classification accuracy and fast gene selection, with good stability.
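A minimal sketch of the wrapper stage described above, assuming the Relief-F prefiltering has already reduced X to the candidate genes (the APSO update rules themselves are omitted): the fitness of one binary particle is the cross-validated accuracy of a CART tree on the genes the particle switches on.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score

    def cart_fitness(mask, X, y, cv=5):
        # Fitness of one particle: CV accuracy of a CART tree trained on
        # the subset of genes encoded by the binary mask.
        selected = np.flatnonzero(mask)
        if selected.size == 0:
            return 0.0
        return cross_val_score(DecisionTreeClassifier(), X[:, selected], y, cv=cv).mean()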

6.
This paper proposes an object-oriented change detection method for bi-temporal high-resolution remote sensing imagery that combines neighborhood correlation images (NCI) with max-relevance min-redundancy (mRMR) feature selection. To verify the effectiveness of the method, three comparison experiments were designed: (1) mRMR feature selection alone versus no mRMR feature selection; (2) NCI combined with mRMR feature selection versus NCI alone; (3) NCI combined with mRMR feature selection versus mRMR feature selection alone. The results show that combining NCI with mRMR feature selection produces better change detection than using either NCI or mRMR alone, and better still than using neither. Keywords: remote sensing imagery, high resolution, object-oriented, change detection, neighborhood correlation image, feature selection

7.
Churn prediction in telecom has recently gained substantial interest from stakeholders because of the associated revenue losses. Predicting telecom churners is a challenging problem due to the enormous size of telecom datasets. In this regard, we propose an intelligent churn prediction system for telecom by employing an efficient feature extraction technique and ensemble methods. We use Random Forest, Rotation Forest, RotBoost and DECORATE ensembles in combination with minimum redundancy and maximum relevance (mRMR), Fisher's ratio and F-score methods to model the telecom churn prediction problem. We observe that the mRMR method returns the most explanatory features compared to Fisher's ratio and F-score, which significantly reduces the computation and helps the ensembles attain improved performance. In comparison to Random Forest, Rotation Forest and DECORATE, RotBoost in combination with mRMR features attains better prediction performance on the standard telecom datasets. The better performance of the RotBoost ensemble is largely attributed to the rotation of the feature space, which enables the base classifier to learn different aspects of the churners and non-churners. Moreover, the Adaboosting process in RotBoost also contributes to higher prediction accuracy by handling hard instances. The performance evaluation is conducted on standard telecom datasets using AUC-, sensitivity- and specificity-based measures. Simulation results reveal that the proposed approach based on RotBoost in combination with mRMR features (CP-MRB) is effective in handling the high dimensionality of telecom datasets. CP-MRB offers higher accuracy in predicting churners and is thus quite promising for modeling the challenging problem of customer churn prediction in telecom.

8.
Choosing suitable features is crucial for successful land-cover classification. For the problem of macro-scale land-cover classification with MODIS data, three typical feature selection methods were compared. The results show that the branch-and-bound (BB) method is best suited to this land-cover classification problem, while the accuracies of the ReliefF and mRMR methods are very close in the target application. The results also show that feature selection is necessary: it greatly reduces computational complexity while keeping classification accuracy unchanged or even improving it.

9.
This paper presents an effective mutual information-based feature selection approach for the EMG-based motion classification task. The wavelet packet transform (WPT) is used to decompose the four-class motion EMG signals into successive, non-overlapping sub-bands, and the energy of each sub-band is used to construct the initial full feature set. To reduce computational complexity, mutual information (MI) theory is used to obtain a reduced feature set without compromising classification accuracy. Comparison experiments with widely used feature reduction methods such as principal component analysis (PCA), sequential forward selection (SFS) and backward elimination (BE) demonstrate its superiority in terms of computation time and classification accuracy. The proposed feature extraction and reduction strategy is a filter-based algorithm that is independent of the classifier design. Since classification performance varies with the classifier, we compare fuzzy least squares support vector machines (LS-SVMs) with a conventional, widely used neural network classifier. Further experiments show that the combination of MI-based feature selection and SVM techniques outperforms other commonly used combinations such as PCA with a neural network, and that diverse motions can be identified with high accuracy.

Compared with the combination of PCA-based feature selection and a classical neural network classifier, the superior performance of the proposed scheme illustrates the potential of SVM techniques combined with WPT and MI in EMG motion classification.
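A minimal sketch of the sub-band energy feature construction described above, using PyWavelets; the wavelet ("db4") and decomposition depth are placeholder choices, and the MI-based reduction step is not shown.

    import numpy as np
    import pywt

    def wpt_energy_features(signal, wavelet="db4", level=4):
        # Decompose one EMG channel with the wavelet packet transform and
        # return the energy of each terminal sub-band as the feature vector.
        wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
        nodes = wp.get_level(level, order="freq")
        return np.array([np.sum(np.asarray(node.data) ** 2) for node in nodes])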


10.
The evaluation of feature selection methods for text classification with small sample datasets must consider classification performance, stability, and efficiency; it is thus a multiple criteria decision-making (MCDM) problem. Yet there has been little research on evaluating feature selection with MCDM methods that consider multiple criteria. Therefore, we use MCDM-based methods to evaluate feature selection methods for text classification with small sample datasets. An experimental study is designed to validate the proposed approach, comparing five MCDM methods across 10 feature selection methods, nine evaluation measures for binary classification, seven evaluation measures for multi-class classification, and three classifiers on 10 small datasets. Based on the ranked results of the five MCDM methods, we make recommendations concerning feature selection methods. The results demonstrate the effectiveness of the MCDM-based methods in evaluating feature selection.

11.
Credit risk assessment has been a crucial issue as it forecasts whether an individual will default on a loan. Classifying an applicant as a good or bad debtor helps the lender make a wise decision. Modern data mining and machine learning techniques have proved very useful and accurate for credit risk prediction and decision making. Classification is one of the most widely used techniques in machine learning. To increase the prediction accuracy of standalone classifiers while keeping overall cost to a minimum, feature selection techniques have been utilized, as feature selection removes redundant and irrelevant attributes from the dataset. This paper first introduces Bolasso (Bootstrap-Lasso), which selects consistent and relevant features from a pool of features; consistent feature selection is defined as the robustness of the selected features with respect to changes in the dataset. The Bolasso-generated shortlisted features are then fed to various classification algorithms, namely Random Forest (RF), Support Vector Machine (SVM), Naïve Bayes (NB) and K-Nearest Neighbors (K-NN), to test their predictive accuracy. It is observed that the Bolasso-enabled Random Forest algorithm (BS-RF) provides the best results for credit risk evaluation. The classifiers are built on a 70:30 training/test partition of three datasets (Lending Club's peer-to-peer dataset, Kaggle's Bank loan status dataset and the German credit dataset from UCI). The performance of the Bolasso-enabled classification algorithms is then compared with that of baseline feature selection methods such as Chi Square, Gain Ratio and ReliefF, and with standalone classifiers (no feature selection applied). The experimental results show that Bolasso provides phenomenal feature stability compared with the other algorithms, as assessed by the Jaccard Stability Measure (JSM). Moreover, BS-RF has good classification accuracy and outperforms the other methods in terms of AUC and accuracy, effectively improving the decision-making process of lenders.
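A minimal sketch of the Bolasso idea described above. Bach's original formulation uses the plain Lasso; an L1-penalized logistic regression is substituted here because the task is classification (binary target assumed), and the number of bootstraps, regularization strength and frequency threshold are placeholder choices.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.utils import resample

    def bolasso_support(X, y, n_bootstraps=32, C=0.1, threshold=1.0):
        # Fit an L1-penalized model on bootstrap resamples and keep the
        # features selected in (almost) every run.
        counts = np.zeros(X.shape[1])
        for seed in range(n_bootstraps):
            Xb, yb = resample(X, y, random_state=seed)
            clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
            counts += clf.fit(Xb, yb).coef_.ravel() != 0
        return np.flatnonzero(counts / n_bootstraps >= threshold)

The selected columns can then be passed to a Random Forest or any of the other classifiers mentioned above.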

12.
Feature selection is a common way to reduce the dimensionality of high-dimensional big data, but the multiple mutually conflicting objectives used to evaluate feature subsets are difficult to balance. To trade off these subset evaluation criteria and optimize subset performance, a feature selection framework based on multi-objective optimization of subset evaluation is proposed, with emphasis on applying multi-objective particle swarm optimization (MOPSO) to feature subset evaluation. The framework designs multi-objective functions based on subset sparsity, classification ability and information loss, then searches for feature weight vectors with a multi-objective optimization algorithm, determines the optimal vector by picking the knee point of the Pareto set of weight vectors, and finally selects features by ranking them according to this weight vector. Experiments comparing the MOPSO-based feature selection (FS_MOPSO) with four classical methods on multiple datasets show that FS_MOPSO achieves higher classification accuracy in lower-dimensional spaces with less information loss.
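Schematically, the framework searches a feature weight vector w under three simultaneous objectives; this is only the general shape of the problem, the concrete objective definitions being the paper's:

    \min_{\mathbf{w}} \; \big( f_1(\mathbf{w}),\; f_2(\mathbf{w}),\; f_3(\mathbf{w}) \big)

where f_1 measures the sparsity of the induced subset, f_2 its classification error and f_3 its information loss; the knee point of the resulting Pareto set then gives the final weight vector used for ranking.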

13.
Network Intrusion Detection Systems (NIDS) are often used to classify network traffic in an attempt to protect computer systems from various network attacks. A major component of building an efficient intrusion detection system is the preprocessing of network traffic and the identification of the essential features needed to build a robust classifier. In this study, a NIDS based on a deep learning model optimized with rule-based hybrid feature selection is proposed. The architecture is divided into three phases: hybrid feature selection, rule evaluation, and detection. Several search methods and attribute evaluators were combined for feature selection to enhance experimentation and comparison. The results showed that the number of selected features does not affect the detection accuracy of the feature selection algorithms but is directly related to the performance of the base classifier. The performance comparison showed that the proposed method outperforms other related methods, with a false alarm rate of 1.2%, an accuracy of 98.8%, and training and testing times of 7.17 s and 3.11 s, respectively. Finally, simulation experiments on standard evaluation metrics showed that the proposed method is suitable for attack classification in NIDS.

14.
雍菊亚  周忠眉 《计算机应用》2020,40(12):3478-3484
To address the complexity of removing redundancy when many features are selected, and the fact that some features are only strongly relevant to the label when combined with other features, a multi-level feature selection algorithm based on mutual information (MI_MLFS) is proposed. First, features are divided into strongly relevant, sub-strongly relevant and other features according to their relevance to the label. Second, after the strongly relevant features are selected, features with low redundancy are chosen from the sub-strongly relevant group. Finally, features that increase the relevance between the selected feature set and the label are added. MI_MLFS was compared with ReliefF, max-relevance min-redundancy (mRMR), joint mutual information (JMI), conditional mutual information maximization (CMIM) and double input symmetrical relevance (DISR) on 15 datasets; it achieved the highest classification accuracy on 13 and 11 of them with support vector machine (SVM) and classification and regression tree (CART) classifiers, respectively. Compared with these classical feature selection methods, MI_MLFS delivers better classification performance.
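A minimal sketch of the first stage described above (only the relevance grouping; the redundancy filtering among sub-strongly relevant features and the joint-relevance stage are omitted, and the two thresholds are invented for illustration):

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    def group_by_relevance(X, y, strong=0.2, sub_strong=0.05):
        # Split features into strongly relevant, sub-strongly relevant and
        # remaining groups by their estimated MI with the label.
        mi = mutual_info_classif(X, y)
        strong_idx = np.flatnonzero(mi >= strong)
        sub_idx = np.flatnonzero((mi >= sub_strong) & (mi < strong))
        rest_idx = np.flatnonzero(mi < sub_strong)
        return strong_idx, sub_idx, rest_idx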

15.
With the advent of technology in various scientific fields, high-dimensional data are becoming abundant. A general approach to tackling the resulting challenges is to reduce data dimensionality through feature selection. Traditional feature selection approaches concentrate on selecting relevant features and discarding irrelevant or redundant ones; however, most of them neglect feature interactions. Furthermore, some datasets have imbalanced classes, which may bias results towards the majority class. The main goal of this paper is to propose a novel feature selection method based on interaction information (II) to provide higher-level interaction analysis and improve the search procedure in the feature space. An evolutionary feature subset selection algorithm based on interaction information is proposed, which consists of three stages. In the first stage, candidate features and candidate feature pairs are identified using traditional feature weighting approaches such as symmetric uncertainty (SU) and bivariate interaction information. In the second stage, candidate feature subsets are formed and evaluated using multivariate interaction information. Finally, the best candidate feature subsets are selected using dominant/dominated relationships. The proposed algorithm is compared with other feature selection algorithms, including mRMR, WJMI, IWFS, IGFS, DCSF, K_OFSD, WFLNS, Information Gain and ReliefF, in terms of the number of selected features, classification accuracy, F-measure and algorithm stability, using three different classifiers (KNN, NB, and CART). The results confirm the improvement in classification accuracy and the robustness of the proposed method in comparison with the other approaches.
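For reference, the bivariate interaction information between two features X_i, X_j and the class C can be written as (sign conventions vary in the literature; notation ours):

    II(X_i; X_j; C) = I(X_i, X_j; C) - I(X_i; C) - I(X_j; C) = I(X_i; X_j \mid C) - I(X_i; X_j)

A positive value indicates a synergy, i.e. the pair is more informative about the class than the sum of its parts, which is the kind of combined effect the proposed method is designed to capture.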

16.
Feature selection is the process of choosing a relevant subset of features from a high-dimensional dataset to enhance classifier performance, and much research has been devoted to it. Algorithms such as Naïve Bayes (NB), decision trees, and genetic algorithms are applied to high-dimensional datasets to select relevant features and to increase computational speed. The proposed model presents a solution for feature selection using ensemble classifier algorithms. The proposed algorithm is a combination of minimum redundancy and maximum relevance (mRMR) and the forest optimization algorithm (FOA). An ensemble of support vector machine (SVM), K-nearest neighbor (KNN), and NB classifiers is further used to enhance performance. mRMR-FOA is used to select relevant features from the various datasets, and a 21% to 24% improvement is recorded in feature selection. The ensemble classifier algorithms further improve performance and provide an accuracy of 96%.

17.
This article proposes two novel feature selection methods for dimension reduction based on max-min associated indices derived from Cramer's V-test coefficient. The proposed methods incrementally select features that simultaneously satisfy the criteria of a statistically maximal association (A) between target labels and features and a minimal association (R) with the already selected features, with respect to the Cramer's V-test value. Two indices are developed from different combinations of the A and R conditions: one maximizes A/R and the other maximizes A − λR, referred to as the MMAIQ and MMAIS methods, respectively. Since the proposed feature selection algorithms are filter methods, determining the best number of features is another important issue; this article adopts an information loss criterion that measures the variation between χ2 and β statistics to optimize the number of selected features in conjunction with the Gaussian maximum likelihood classifier (GMLC). To validate the proposed methods, experiments are conducted on both a hyperspectral image dataset and a high spatial resolution image dataset. The results demonstrate that the proposed methods provide an effective tool for feature selection and significantly improve classification accuracy. Furthermore, the proposed methods are evaluated against well-known feature selection methods, namely the mutual information-based max-dependency criterion (mRMR) and sequential forward selection (SFS), and offer better results in terms of the kappa coefficient and overall classification accuracy.
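For reference, Cramer's V between two discrete variables with an n-sample contingency table of r by c categories is

    V = \sqrt{ \frac{\chi^2}{n \, \min(r-1,\, c-1)} }

So, with A the V-value between a candidate feature and the labels and R a measure of its association with the already selected features (the precise aggregation is the paper's), MMAIQ picks the feature maximizing A/R while MMAIS picks the one maximizing A − λR.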

18.
Over the last few years, the dimensionality of datasets involved in data mining applications has increased dramatically. In this situation, feature selection becomes indispensable as it allows for dimensionality reduction and relevance detection. The research proposed in this paper broadens the scope of feature selection by taking into consideration not only the relevance of the features but also their associated costs. A new general framework is proposed, which consists of adding a new term to the evaluation function of a filter feature selection method so that the cost is taken into account. Although the proposed methodology could be applied to any feature selection filter, in this paper the approach is applied to two representative filter methods: Correlation-based Feature Selection (CFS) and Minimal-Redundancy-Maximal-Relevance (mRMR), as an example of use. The behavior of the proposed framework is tested on 17 heterogeneous classification datasets, employing a Support Vector Machine (SVM) as a classifier. The results of the experimental study show that the approach is sound and that it allows the user to reduce the cost without compromising the classification error.
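Schematically, the framework replaces the filter's merit J(x_j) for a candidate feature x_j by a cost-aware score of the form (one natural way to write it; the paper's exact formulation may differ):

    J_{\text{cost}}(x_j) = J(x_j) - \lambda \, c(x_j)

where c(x_j) is the cost of acquiring or computing the feature and λ trades classification merit against cost; setting λ = 0 recovers the original CFS or mRMR criterion.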

19.
When analyzing high-dimensional data such as images, gene expression data and text, redundant features greatly increase the complexity of the analysis, so removing them before analysis is particularly important. Feature selection methods based on mutual information (MI) can effectively reduce data dimensionality and improve the accuracy of analysis results; however, existing methods judge whether a feature is redundant by a single criterion, cannot reasonably exclude redundant features, and ultimately degrade the results. To address this, a feature selection method based on maximum joint conditional mutual information (MCJMI) is proposed. MCJMI considers two factors when selecting features, the overall joint mutual information and the conditional mutual information, and fuses them to strengthen the selection constraint. In average prediction accuracy, MCJMI improves on information gain (IG) and min-redundancy max-relevance (mRMR) feature selection by 6 percentage points, on joint mutual information (JMI) and joint mutual information maximization (JMIM) by 2 percentage points, and on the LW sequential forward search method (SFS-LW) by 1 percentage point. In stability, MCJMI reaches 0.92, better than JMI, JMIM and SFS-LW. The experimental results show that MCJMI effectively improves the accuracy and stability of feature selection.
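For reference, the two quantities being fused are, for a candidate feature x_j, a selected feature x_i and the class c (notation ours; the abstract does not give the exact combination rule):

    joint MI:        I(x_j, x_i; c)
    conditional MI:  I(x_j; c \mid x_i)

JMI-style criteria sum the first quantity over the selected set, while CMIM-style criteria take the minimum of the second; MCJMI constrains the selection with both.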
