Similar Documents
1.
Most recent research on feature selection has focused on classification, owing to its popularity in the data-mining community, whereas feature selection for nonlinear system estimation has received very little attention. It is therefore worthwhile to devise a feature selection approach that is computationally efficient in the context of nonlinear system estimation. This paper proposes a novel feature selection approach, Monte Carlo evaluative selection (MCES). MCES is an objective sampling method that yields a better estimate of the relevancy measure, and it is designed to apply to both classification and nonlinear regression tasks. MCES is shown to perform well in four sets of experiments, two classification tasks and two regression tasks. The results demonstrate that MCES has the following strong advantages: 1) it identifies correlated and irrelevant features based on weight ranking, 2) it applies to both nonlinear system estimation and classification tasks, and 3) it is independent of the underlying induction algorithms used to derive the performance measures.
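The abstract does not give MCES's actual relevancy formula, so the following is only a generic Monte Carlo subset-sampling sketch of the idea, under stated assumptions: `score` is a hypothetical callback that trains any induction algorithm on a feature subset and returns a performance measure, and a feature's weight is assumed to be the mean score of sampled subsets that contain it minus the mean score of sampled subsets that do not.

```python
import random
from typing import Callable, Sequence


def monte_carlo_relevance(
    n_features: int,
    score: Callable[[Sequence[int]], float],
    n_samples: int = 500,
    subset_prob: float = 0.5,
    seed: int = 0,
) -> list[float]:
    """Generic Monte Carlo relevance estimate (an assumption, not the MCES formula).

    weight_j = mean score of sampled subsets containing j
             - mean score of sampled subsets not containing j
    """
    rng = random.Random(seed)
    in_sum = [0.0] * n_features
    in_cnt = [0] * n_features
    out_sum = [0.0] * n_features
    out_cnt = [0] * n_features
    for _ in range(n_samples):
        # draw a random subset: each feature is included independently
        subset = [j for j in range(n_features) if rng.random() < subset_prob]
        s = score(subset)  # any induction algorithm can sit behind `score`
        chosen = set(subset)
        for j in range(n_features):
            if j in chosen:
                in_sum[j] += s
                in_cnt[j] += 1
            else:
                out_sum[j] += s
                out_cnt[j] += 1
    weights = []
    for j in range(n_features):
        mean_in = in_sum[j] / in_cnt[j] if in_cnt[j] else 0.0
        mean_out = out_sum[j] / out_cnt[j] if out_cnt[j] else 0.0
        weights.append(mean_in - mean_out)
    return weights
```

Because the scorer is a black box, the same loop works for classification accuracy or for a regression measure such as negative mean squared error, which mirrors the algorithm independence the abstract claims.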

2.
Xu Ruohao, Li Mengmeng, Yang Zhongliang, Yang Lifang, Qiao Kangjia, Shang Zhigang 《Applied Intelligence》2021,51(10):7233-7244

Feature selection is a technique for improving the classification accuracy of classifiers and a convenient method for data visualization. Because Q-learning is an incremental, task-oriented, model-free learning algorithm, it is well suited to feature selection, so this study proposes a dynamic feature selection algorithm that combines feature selection and Q-learning in a single framework. First, Q-learning is used to construct a discriminant function for each class of the data. Next, features are ranked by jointly considering the discriminant function vectors of all classes, and the ranking is carried out while these vectors are being updated. Finally, experiments compare the proposed algorithm with four feature selection algorithms. The results on benchmark data sets verify its effectiveness: its classification performance is better than that of the other feature selection algorithms, it also performs well in removing redundant features, and experiments on the effect of the learning rate show that choosing the algorithm's parameters is very simple.

3.
Detecting informative features for explainable analysis of high-dimensional data is a significant and challenging task, especially when the number of samples is very small. Feature selection, especially unsupervised feature selection, is an appropriate way to meet this challenge. This paper therefore proposes two unsupervised spectral feature selection algorithms. They group features using an advanced self-tuning spectral clustering algorithm based on local standard deviation, so as to find globally optimal feature clusters as far as possible. Two feature ranking techniques, cosine-similarity-based ranking and entropy-based ranking, are then proposed to detect the representative feature of each cluster; these representatives form the feature subset on which the explainable classification system is built. The proposed algorithms are tested on high-dimensional benchmark omics data sets and compared with peer methods, and statistical tests are conducted to determine whether they differ significantly from the peer methods. Extensive experiments show that the proposed algorithms outperform the peer methods, especially the one based on cosine-similarity ranking, while the statistical test results show that the entropy-based variant performs best. The detected features show strong discriminative capability in downstream classifiers for omics data, so AI systems built on them can be reliable and explainable. This is especially significant for building transparent and trustworthy medical diagnostic systems from an interpretable-AI perspective.
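A minimal sketch of the cosine-similarity ranking step, assuming the feature clusters are already available (the self-tuning spectral clustering itself is not reproduced) and assuming that a cluster's representative is the member with the highest mean cosine similarity to the other members. `representatives` and `cosine` are illustrative names, not the paper's code.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two feature columns (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def representatives(columns, labels):
    """Pick one representative feature per cluster.

    columns: list of feature columns (each a list of sample values).
    labels:  cluster id of each feature, assumed to come from a prior clustering step.
    """
    clusters = defaultdict(list)
    for j, c in enumerate(labels):
        clusters[c].append(j)
    reps = {}
    for c, members in clusters.items():
        if len(members) == 1:
            reps[c] = members[0]
            continue

        def centrality(j):
            # mean cosine similarity of feature j to the rest of its cluster
            others = [k for k in members if k != j]
            return sum(cosine(columns[j], columns[k]) for k in others) / len(others)

        reps[c] = max(members, key=centrality)
    return reps
```

The selected representatives would then form the reduced feature subset handed to the downstream classifier.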

4.
In this article, we deal with the problem of measuring the importance of features that determine whether a product is purchased after exposure to an advertisement. We use an algorithm called Monte Carlo feature selection, which is based on repeated use of decision trees, to rank the variables from questionnaire data. Our data generation process relies on low involvement during the advertisement-watching phase, and advertised products are compared based on purchases in a virtual shop.

5.
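A Two-Stage Feature Selection Method Based on Mutual Information and Genetic Algorithm   Citations: 2 in total (0 self-citations, 2 by others)
To obtain a better feature subset during feature selection, a new two-stage feature selection method combining normalized mutual information and a genetic algorithm is proposed. The method first ranks features by normalized mutual information, then uses the top-ranked features to initialize part of the population of the second-stage genetic algorithm. The initial population therefore contains good search starting points, and the genetic algorithm needs fewer generations to find a better feature subset. Experiments show that the proposed method performs well in feature reduction and classification.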
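A sketch of the two stages, assuming discrete features and using I(X;Y) / sqrt(H(X) H(Y)) as the normalization (one common choice; the paper's exact normalization is not stated). Only the ranking and the seeding of the initial population are shown; the genetic operators and fitness function are left out, and `seed_frac` and the 10% chance of adding extra features to a seeded individual are hypothetical parameters.

```python
import math
import random
from collections import Counter


def entropy(xs):
    """Shannon entropy (nats) of a discrete sequence."""
    n = len(xs)
    return -sum(c / n * math.log(c / n) for c in Counter(xs).values())


def mutual_info(xs, ys):
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def nmi(xs, ys):
    """Normalized mutual information; 0.0 if either variable is constant."""
    hx, hy = entropy(xs), entropy(ys)
    return mutual_info(xs, ys) / math.sqrt(hx * hy) if hx > 0 and hy > 0 else 0.0


def seeded_population(columns, y, pop_size, top_k, seed_frac=0.5, seed=0):
    """Stage 1: rank features by NMI. Stage 2 setup: seed part of the GA population."""
    rng = random.Random(seed)
    n = len(columns)
    ranking = sorted(range(n), key=lambda j: nmi(columns[j], y), reverse=True)
    top = set(ranking[:top_k])
    n_seeded = int(pop_size * seed_frac)
    pop = []
    for i in range(pop_size):
        if i < n_seeded:
            # seeded individual: all top-k features, plus a few random extras
            mask = [1 if j in top or rng.random() < 0.1 else 0 for j in range(n)]
        else:
            # ordinary random individual
            mask = [rng.randint(0, 1) for _ in range(n)]
        pop.append(mask)
    return ranking, pop
```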

6.
Online learning is a growing branch of data mining that allows traditional data mining techniques to be applied to an online data stream in real time. In this paper, we present a fast and efficient online sensitivity-based feature ranking method (SFR) that is updated incrementally. We use the concept of global sensitivity and rank features by their impact on the outcome of the classification model. For feature selection, we use a two-stage filtering method that first eliminates highly correlated and redundant features and then, in the second stage, eliminates irrelevant features. An important advantage of our algorithm is its generality: it works on correlated feature spaces without preprocessing. It can be used together with any single-pass online classification method that has a separating hyperplane, such as SVMs. Although the method is primarily developed for online tasks, it achieves very significant experimental results compared with popular batch feature ranking/selection methods. We also compare the method experimentally with available online feature ranking methods. Empirical results suggest that our method can be used successfully in either batch or online mode.
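The abstract does not define its global sensitivity measure. For a linear separating hyperplane f(x) = w·x + b, one simple proxy is |w_j| times the standard deviation of feature j, since that is roughly how much the decision value moves when feature j varies. The sketch below assumes that proxy and tracks each standard deviation in a single pass with Welford's algorithm, so the ranking can be refreshed after every sample. The correlation-based first filtering stage is not shown, and the weight vector `w` is assumed to come from whatever online classifier is in use.

```python
import math


class OnlineSensitivityRanker:
    """Rank features by |w_j| * std_j, with std_j tracked incrementally (Welford)."""

    def __init__(self, n_features):
        self.n = 0
        self.mean = [0.0] * n_features
        self.m2 = [0.0] * n_features  # running sum of squared deviations

    def update(self, x):
        # process one sample in O(d) time and memory
        self.n += 1
        for j, v in enumerate(x):
            delta = v - self.mean[j]
            self.mean[j] += delta / self.n
            self.m2[j] += delta * (v - self.mean[j])

    def std(self, j):
        """Sample standard deviation of feature j seen so far."""
        return math.sqrt(self.m2[j] / (self.n - 1)) if self.n > 1 else 0.0

    def rank(self, w):
        """Return feature indices sorted by sensitivity (highest first) and the scores."""
        scores = [abs(w[j]) * self.std(j) for j in range(len(w))]
        order = sorted(range(len(w)), key=lambda j: scores[j], reverse=True)
        return order, scores
```

Note that a large weight on a constant feature yields zero sensitivity, which is why the weight alone is not used for ranking in this sketch.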

7.
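Classification problems are ubiquitous in modern industrial production. Using feature selection to filter out useful information before a classification task can effectively improve classification efficiency and accuracy. The minimum redundancy maximum relevance (mRMR) algorithm maximizes the relevance between features and the class while minimizing the redundancy among features, and can select a feature subset effectively; however, its feature importance estimates deviate considerably in the middle and later stages, and it cannot output a feature subset directly. To address these problems, this paper proposes a feature selection algorithm that combines the discernibility matrix of neighborhood rough sets with the mRMR principle. Following the principles of maximum relevance and minimum redundancy, feature importance is defined using neighborhood entropy and neighborhood mutual information so as to better handle mixed data types. A dynamic discernibility set is defined on the basis of the discernibility matrix; its dynamic evolution effectively removes redundant attributes, narrows the search space, and optimizes the feature subset, and the discernibility matrix also determines when the iteration stops. SVM, J48, KNN, and MLP are used as classifiers to evaluate the algorithm. Experimental results on public data sets show that, compared with existing algorithms, the proposed algorithm improves average classification accuracy by about 2% and effectively shortens feature selection time on data sets with many features. The proposed algorithm inherits the advantages of the discernibility matrix and of mRMR and handles feature selection problems effectively.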
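For reference, plain greedy mRMR in its difference form (relevance minus mean redundancy) can be sketched as follows. This uses ordinary Shannon mutual information on discrete features, not the neighborhood entropy, neighborhood mutual information, or discernibility-matrix stopping rule proposed in the paper, and it simply stops after a fixed number `k` of features.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy (nats) of a discrete sequence."""
    n = len(xs)
    return -sum(c / n * math.log(c / n) for c in Counter(xs).values())


def mutual_info(xs, ys):
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def mrmr(columns, y, k):
    """Greedy mRMR: repeatedly add the feature maximizing I(f;y) - mean_s I(f;s)."""
    relevance = [mutual_info(c, y) for c in columns]
    selected = []
    remaining = set(range(len(columns)))
    while remaining and len(selected) < k:

        def criterion(j):
            if not selected:
                return relevance[j]  # first pick: pure relevance
            redundancy = sum(
                mutual_info(columns[j], columns[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy

        # iterate in sorted order so ties are broken deterministically
        best = max(sorted(remaining), key=criterion)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With a duplicated feature in the pool, the copy scores near zero once its twin is selected, which is exactly the redundancy penalty mRMR is built around.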

8.
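Given the characteristics of the label ranking problem, a feature selection algorithm for label ranking data sets (Label Ranking Based Feature Selection, LRFS) is proposed. The algorithm first defines a new neighborhood information measure based on neighborhood rough sets that can directly measure the relevance, redundancy, and association among continuous, discrete, and ranking features. On this basis, a label ranking feature selection algorithm based on a neighborhood association weight factor is proposed. Experimental results show that LRFS can effectively remove irrelevant or redundant features from label ranking data sets without reducing ranking accuracy.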
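To make the neighborhood idea concrete, here is a sketch of one common neighborhood-rough-set formulation, not LRFS's new measure: the δ-neighborhood of a sample under a feature subset B is every sample within distance δ on B, neighborhood entropy is NH(B) = -(1/n) Σ_i log(|δ_B(x_i)| / n), and neighborhood mutual information with the class D is NH(B) + NH(D) - NH(B ∪ D). The Chebyshev distance and the use of same-class sets as the neighborhoods for D are assumptions of this sketch.

```python
import math


def neighborhoods(rows, features, delta):
    """delta-neighborhood of each sample on the given features (Chebyshev distance)."""
    n = len(rows)
    result = []
    for i in range(n):
        nb = set()
        for k in range(n):
            if all(abs(rows[i][f] - rows[k][f]) <= delta for f in features):
                nb.add(k)
        result.append(nb)
    return result


def neighborhood_entropy(nbs):
    """NH = -(1/n) * sum_i log(|nb_i| / n)."""
    n = len(nbs)
    return -sum(math.log(len(nb) / n) for nb in nbs) / n


def neighborhood_mi(rows, labels, features, delta):
    """Neighborhood mutual information between a feature subset and the class."""
    n = len(rows)
    nb_f = neighborhoods(rows, features, delta)
    # class "neighborhood" of a sample: all samples sharing its label
    nb_d = [{k for k in range(n) if labels[k] == labels[i]} for i in range(n)]
    # joint neighborhood: close on the features AND same class
    nb_joint = [a & b for a, b in zip(nb_f, nb_d)]
    return (
        neighborhood_entropy(nb_f)
        + neighborhood_entropy(nb_d)
        - neighborhood_entropy(nb_joint)
    )
```

Because neighborhoods are defined by a distance threshold rather than exact equality, the same measure applies to continuous features without discretization.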

9.
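In the multi-label learning framework, feature selection is an effective means of addressing the curse of dimensionality and improving multi-label classifiers. A multi-label feature selection algorithm that fuses feature rankings is proposed. The algorithm first granulates the samples adaptively under each label to construct the neighborhood mutual information between features and class labels. Next, the neighborhood mutual information values are sorted, giving a feature ranking for each class label. Finally, these independent feature rankings are fused by clustering into a single new feature ranking. Experimental results on four multi-label data sets and four evaluation metrics show that the proposed algorithm outperforms several currently popular multi-label dimensionality reduction methods.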

10.
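A feature selection method for microcalcifications in mammograms is presented, which uses an immune algorithm based on a weighted mutation operator to select an optimal feature set. The weighted mutation operator dynamically adjusts the mutation rate of each part of an antibody, searching a small neighborhood around high-affinity antibodies and jumping widely around low-affinity antibodies. To stay consistent with the classification criterion of the support vector machine, the immune algorithm computes affinity in feature space through a kernel function. In experiments the method selected among 20 commonly used microcalcification features; the result agrees broadly with the empirical feature set but is more compact and improves computational efficiency, making it a feasible feature selection method.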
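A minimal sketch of an affinity-weighted mutation on bit-string antibodies (1 = feature selected), assuming the mutation probability falls linearly from `p_max` at zero affinity to `p_min` at the best affinity, so good antibodies are perturbed locally and poor ones jump. The paper also varies the rate across positions within an antibody and computes affinity through an SVM kernel; neither is reproduced here, and the parameter values are hypothetical.

```python
import random


def weighted_mutation(antibody, affinity, max_affinity, rng, p_min=0.02, p_max=0.5):
    """Flip each bit with a probability that shrinks as affinity grows.

    affinity / max_affinity = 1 -> rate p_min (small local search)
    affinity / max_affinity = 0 -> rate p_max (large jump)
    """
    ratio = affinity / max_affinity if max_affinity > 0 else 0.0
    p = p_max - (p_max - p_min) * ratio
    return [1 - b if rng.random() < p else b for b in antibody]
```

Passing the random generator in explicitly keeps runs reproducible when the operator is called inside a larger immune-algorithm loop.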

11.
This correspondence presents a novel hybrid wrapper-filter feature selection algorithm for classification problems using a memetic framework. It incorporates a filter ranking method into the traditional genetic algorithm to improve classification performance and accelerate the search for the core feature subsets. In particular, the method adds a feature to, or deletes a feature from, a candidate feature subset based on univariate feature ranking information. An empirical study on commonly used data sets from the University of California, Irvine repository and on microarray data sets shows that the proposed method outperforms existing methods in classification accuracy, number of selected features, and computational efficiency. Furthermore, we investigate several major issues of the memetic algorithm (MA) to find a good balance between local search and genetic search that maximizes search quality and efficiency in the hybrid filter-wrapper MA.
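The ranking-guided local search described in this abstract can be sketched as an add/delete loop, under assumptions: `ranking` is a univariate filter ranking (most relevant first), `fitness` is a hypothetical callback standing in for the wrapper's classifier accuracy on a feature mask, and only one add move and one delete move are tried per step. In a memetic algorithm this routine would be applied to some individuals of each genetic-algorithm generation.

```python
def ranked_local_search(mask, ranking, fitness, max_steps=10):
    """Improve a 0/1 feature mask with add/delete moves ordered by a filter ranking.

    ranking: feature indices from most to least relevant.
    fitness: callable(mask) -> float, higher is better.
    """
    best = list(mask)
    best_fit = fitness(best)
    for _ in range(max_steps):
        improved = False
        # add move: try the best-ranked feature that is not yet selected
        for j in ranking:
            if not best[j]:
                cand = list(best)
                cand[j] = 1
                f = fitness(cand)
                if f > best_fit:
                    best, best_fit, improved = cand, f, True
                break
        # delete move: try the worst-ranked feature that is currently selected
        for j in reversed(ranking):
            if best[j]:
                cand = list(best)
                cand[j] = 0
                f = fitness(cand)
                if f > best_fit:
                    best, best_fit, improved = cand, f, True
                break
        if not improved:
            break  # local optimum with respect to these two moves
    return best, best_fit
```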

12.
In routing problems, the aim is to determine a minimum-cost traversal of a graph that satisfies some specified constraints. Most of these problems are NP-hard, and many different heuristic algorithms have been proposed. The name Monte Carlo (MC) applies to a set of heuristic procedures that share the use of random numbers to simulate a given process. The MC approach has not been applied to routing problems in the literature. The purpose of this paper is to show that MC methods can be useful in implementing heuristic algorithms for routing problems. In particular, we design an efficient MC heuristic for the well-known Rural Postman Problem (RPP), for which a set of instances with known optimal solutions is available in the literature.

The Rural Postman Problem (RPP) consists of finding a minimum-cost traversal of a specified subset of arcs of a graph. Since the RPP is NP-hard, heuristic algorithms are of interest both for handling large instances and for providing upper bounds that can be used in branch-and-cut procedures. In this paper we propose a heuristic algorithm for the RPP based on Monte Carlo methods. We simulate a vehicle travelling randomly over the graph, jumping from one node to another according to certain probabilities. Monte Carlo methods provide a simple approach to many different routing problems and are easy to implement in computer code. Applying this algorithm to a set of RPP instances from the literature demonstrates that, with appropriate probabilities, such methods are also efficient.
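A sketch of the random-walk idea for the undirected RPP, with assumptions that go beyond the abstract: an untraversed required edge is `bias` times more likely to be chosen than any other incident edge (the paper's actual probabilities are not given), the walk returns to the depot along a shortest path once every required edge has been covered, and the cheapest of `n_walks` walks is kept. Edge keys in `required` must use the same orientation as in `edges`.

```python
import heapq
import random


def shortest_path_cost(adj, src, dst):
    """Dijkstra distance from src to dst (inf if unreachable)."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, c in adj[u]:
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def mc_rural_postman(edges, required, depot, n_walks=200, bias=5.0, seed=0, max_steps=10_000):
    """Monte Carlo heuristic for the undirected Rural Postman Problem.

    edges:    {(u, v): cost} for an undirected graph.
    required: set of (u, v) keys from `edges` that must be traversed.
    Returns the cost of the cheapest closed tour found (an upper bound).
    """
    rng = random.Random(seed)
    adj = {}
    for (u, v), c in edges.items():
        adj.setdefault(u, []).append((v, c))
        adj.setdefault(v, []).append((u, c))

    def key(u, v):
        # normalize an edge to the orientation used in `edges`
        return (u, v) if (u, v) in edges else (v, u)

    best = float("inf")
    for _ in range(n_walks):
        todo = set(required)
        node, cost, steps = depot, 0.0, 0
        while todo and steps < max_steps:
            nbrs = adj[node]
            weights = [bias if key(node, v) in todo else 1.0 for v, _ in nbrs]
            v, c = rng.choices(nbrs, weights=weights)[0]
            todo.discard(key(node, v))
            cost += c
            node = v
            steps += 1
        if todo:
            continue  # walk gave up before covering every required edge
        cost += shortest_path_cost(adj, node, depot)
        best = min(best, cost)
    return best
```

Any tour the walk produces is feasible, so the returned value is always a valid upper bound of the kind the abstract mentions for branch-and-cut procedures.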

13.
许召召 (Xu Zhaozhao), 申德荣 (Shen Derong), 聂铁铮 (Nie Tiezheng), 寇月 (Kou Yue) 《Journal of Software》2022,33(3):1128-1140
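With the development of information technology and the adoption of electronic medical records and case files in medical institutions, hospital databases have accumulated large amounts of medical data. Decision trees are widely used in medical data analysis because of their high classification accuracy, fast computation, and simple, easy-to-understand classification rules. However, the inherently high-dimensional feature space and high feature redundancy of medical data make the classification accuracy of traditional decision trees on such data unsatisfactory. To address this, the paper proposes a ... that fuses information gain ratio ranking-based grouping and group...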
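The abstract is truncated, so only its first ingredient can be illustrated with confidence: the C4.5-style information gain ratio of a discrete feature. The round-robin split of the ranked features into groups in `ranked_groups` is a hypothetical grouping scheme, not the paper's.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a discrete sequence."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(column, labels):
    """Information gain of a discrete feature divided by its split information."""
    n = len(labels)
    parts = defaultdict(list)
    for v, y in zip(column, labels):
        parts[v].append(y)
    cond = sum(len(p) / n * entropy(p) for p in parts.values())
    gain = entropy(labels) - cond
    split_info = entropy(column)
    return gain / split_info if split_info > 0 else 0.0


def ranked_groups(columns, labels, n_groups):
    """Rank features by gain ratio, then deal them into groups round-robin (assumed scheme)."""
    order = sorted(
        range(len(columns)), key=lambda j: gain_ratio(columns[j], labels), reverse=True
    )
    groups = [[] for _ in range(n_groups)]
    for pos, j in enumerate(order):
        groups[pos % n_groups].append(j)
    return order, groups
```

Dividing by the split information penalizes features with many distinct values, which plain information gain tends to favor.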

14.
武妍 (Wu Yan), 杨洋 (Yang Yang) 《Journal of Computer Applications》2006,26(2):433-435
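To obtain a set of important features, a feature selection method based on a discriminant analysis algorithm and neural networks is proposed. The neural network is trained by minimizing an extended cross-entropy error function; this error function reduces the derivative of the network's transfer function and lowers its output sensitivity. The method first uses the discriminant analysis algorithm to obtain an ordered feature queue, then selects features with a regularized neural network, where the selection is based on the change in classification error on a validation set caused by removing a single feature. Compared with four other methods based on different principles, experimental results show that networks trained with this algorithm achieve higher classification accuracy.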
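The selection stage can be sketched as a backward pass over an ordered feature queue, assuming the queue is already sorted from least to most important (standing in for the discriminant-analysis ordering) and that `val_error` is a hypothetical callback returning the validation error of a model (in the paper, a regularized neural network) trained on a given feature set. A feature is dropped only if removing it raises the error by at most `tol`.

```python
def backward_eliminate(order, val_error, tol=0.0):
    """Drop features one at a time, least important first.

    order:     features sorted from least to most important.
    val_error: callable(frozenset_of_features) -> validation error.
    Returns the kept feature set and its validation error.
    """
    kept = set(order)
    current = val_error(frozenset(kept))
    for j in order:
        if len(kept) == 1:
            break  # never remove the last feature
        trial = frozenset(kept - {j})
        err = val_error(trial)
        if err <= current + tol:
            kept.discard(j)  # removal did not hurt (beyond tol): keep it removed
            current = err
    return kept, current
```

Each removal requires retraining or re-evaluating the model, so this pass costs one evaluation per feature in the queue.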

15.
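Large search systems must respond to user queries quickly, and computing the feature relevance of candidate documents must obey strict back-end latency constraints. Feature selection improves the efficiency of machine learning. Observing that fast feature selection methods for learning to rank usually start from the single feature with the best ranking performance, this paper first proposes an algorithm that uses hierarchical clustering to generate starting points for feature selection and applies it to two existing fast feature selection methods. In addition, a new method that makes full use of the feature clusters is proposed for feature selection. Experiments on two standard data sets show that the algorithm can both obtain smaller feature subsets without loss of accuracy and achieve the best ranking accuracy on medium-sized subsets.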
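A sketch of generating starting points by clustering features, assuming single-linkage clustering on the distance 1 - |Pearson r| cut at a fixed `threshold` (computed via union-find, which yields the same clusters as cutting a single-linkage dendrogram at that height) and assuming each cluster contributes the member with the best single-feature ranking score. `single_scores` is a hypothetical input, for example each feature's NDCG when used alone.

```python
import math


def pearson(a, b):
    """Pearson correlation of two columns (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def start_points(columns, single_scores, threshold=0.2):
    """Cluster features, then return each cluster's best feature, best first."""
    d = len(columns)
    parent = list(range(d))

    def find(i):
        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(d):
        for j in range(i + 1, d):
            if 1 - abs(pearson(columns[i], columns[j])) <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters
    clusters = {}
    for i in range(d):
        clusters.setdefault(find(i), []).append(i)
    starts = [max(members, key=lambda j: single_scores[j]) for members in clusters.values()]
    return sorted(starts, key=lambda j: single_scores[j], reverse=True)
```

Starting from one feature per cluster, rather than only the single best feature, gives a greedy selector several diverse seeds.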

16.
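To address the low accuracy and premature convergence of the simple genetic algorithm in feature selection, a new genetic algorithm, the chain-structured agent genetic algorithm (LAGA), is proposed and combined with multiple criteria (MC), giving a chain-structured agent genetic algorithm with a multi-criteria competition strategy (LAGA MC) for feature selection. LAGA introduces a chain-structured agent topology in which agents perform competitive selection and adaptive crossover with one another and adaptive mutation on themselves, so the algorithm obtains more accurate search results. MC examines the feature bits of the feature subsets obtained by selection under each single criterion to determine the final feature subset, giving a more comprehensive evaluation of the selection results and yielding feature subsets with more stable recognition rates. Experimental results show that LAGA has higher search accuracy and that the feature subsets obtained by LAGA MC give higher and more stable classification accuracy.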
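The abstract does not specify how MC judges the feature bits. A majority vote over the single-criterion masks is one simple assumed rule, sketched here with `min_votes` as a hypothetical parameter.

```python
def fuse_by_vote(masks, min_votes=None):
    """Keep feature bits set in at least `min_votes` of the single-criterion masks.

    masks: equal-length 0/1 lists, one per selection criterion.
    Defaults to a strict majority of the masks.
    """
    n_masks = len(masks)
    if min_votes is None:
        min_votes = n_masks // 2 + 1
    votes = [sum(col) for col in zip(*masks)]  # per-feature vote count
    return [1 if v >= min_votes else 0 for v in votes]
```

Raising `min_votes` toward the number of masks keeps only features every criterion agrees on, trading subset size for stability.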

17.
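For data whose relevant features may be either individual or synergistic, a feature selection method for individual features (FSKDE-RP) is first proposed, based on squared-error-criterion kernel density estimation and random permutation theory. Then, for synergistic features, a multidimensional synergistic feature selection algorithm (SFSKDE-MRP) is proposed by extending random permutation theory, and the classification accuracy of a kernel neural network (KNN) classifier is used to select the optimal feature subset. Experimental results on simulated and real data sets demonstrate the effectiveness of the proposed algorithms.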
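A sketch of the random-permutation idea, with substitutions stated plainly: it measures the drop in accuracy of a k-nearest-neighbors classifier when one feature column of the test data is shuffled (standard permutation importance), not the squared-error kernel density estimation criterion or the kernel neural network from the paper, and it covers only single features, not the multidimensional synergistic case. Rows are assumed to be Python lists.

```python
import random


def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k nearest training rows (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y) for row, y in zip(train_X, train_y)
    )
    top = [y for _, y in dists[:k]]
    return max(set(top), key=top.count)


def accuracy(train_X, train_y, test_X, test_y, k=3):
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def permutation_importance(train_X, train_y, test_X, test_y, n_repeats=10, k=3, seed=0):
    """Mean accuracy drop when each feature column of the test set is shuffled."""
    rng = random.Random(seed)
    base = accuracy(train_X, train_y, test_X, test_y, k)
    d = len(test_X[0])
    importances = []
    for j in range(d):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in test_X]
            rng.shuffle(col)  # break the link between feature j and the label
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(test_X, col)]
            drops.append(base - accuracy(train_X, train_y, permuted, test_y, k))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature whose shuffling leaves accuracy unchanged gets importance near zero; averaging over several shuffles reduces the noise of any single permutation.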

19.
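Ranking is a very important topic in information retrieval. Although learning-to-rank algorithms have long been studied in depth, there has been little research on feature selection for learning to rank. In practice, many feature selection methods designed for classification are applied directly to learning to rank. Because ranking differs significantly from classification, feature selection algorithms specific to ranking should be developed. After introducing commonly used feature selection methods for learning to rank, this paper proposes a new feature selection method for learning to rank that is suited to QA problems: tournament ranking feature selection. Experimental results show that this new method significantly improves feature extraction efficiency and reduces feature vector dimensionality.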

20.
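In cross-scene pedestrian recognition, fusing multiple features with fixed weights leads to low recognition rates and slow recognition. To solve this, a dynamic weighted average ranking pedestrian recognition method based on adaptive feature selection is proposed. First, the GrabCut algorithm is combined with a manifold-ranking-based saliency detection algorithm to improve the accuracy of pedestrian appearance feature extraction. Then, an adaptive salient feature selection method is proposed to extract pedestrian feature descriptions effectively. Finally, the multiple features are fused by a dynamic weighted average ranking model. Experiments show that the proposed method improves pedestrian recognition accuracy and is robust to pose changes.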
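A minimal sketch of weighted-average-rank fusion, under assumptions: each feature type yields a similarity score for every gallery identity, identities are ranked per feature, and the fused order sorts identities by the weighted sum of their ranks (equivalent in order to a weighted mean). The paper's dynamic weighting is not described in the abstract; `margin_weights`, which gives more weight to a feature whose top match stands out from the runner-up, is purely hypothetical.

```python
def margin_weights(score_lists):
    """Hypothetical dynamic weights from the top-1 vs top-2 score margin of each feature."""
    margins = []
    for scores in score_lists:
        top = sorted(scores.values(), reverse=True)
        margins.append(top[0] - top[1] if len(top) > 1 else 1.0)
    total = sum(margins)
    n = len(margins)
    return [m / total for m in margins] if total > 0 else [1.0 / n] * n


def weighted_average_rank(score_lists, weights):
    """Fuse per-feature similarity scores by a weighted sum of ranks (lower is better).

    score_lists: one {gallery_id: similarity} dict per feature type.
    """
    fused = {}
    for scores, w in zip(score_lists, weights):
        ranked = sorted(scores, key=scores.get, reverse=True)  # most similar first
        for r, gid in enumerate(ranked, start=1):
            fused[gid] = fused.get(gid, 0.0) + w * r
    return sorted(fused, key=fused.get)
```

Fusing ranks rather than raw scores sidesteps the fact that different feature types produce similarity scores on different scales.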
