Similar literature
 Found 20 similar articles (search time: 234 ms)
1.
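Research on predicting unstart of hypersonic inlets mainly involves determining pressure-sensor locations and building a start/unstart classification surface, which correspond to the feature selection and classification problems in machine learning. The commonly used feature selection algorithm, support-vector-machine recursive feature elimination (SVM-RFE), is a single method and is time-consuming. To find a better feature selection algorithm, a two-dimensional hypersonic inlet/isolator model is built, and pressure data samples on the upper surface of the internal flow path are obtained by numerical simulation. Feature selection is performed with three combined algorithms built on Relief and SVM-RFE: Relief-Corre, Relief-SVMRFE, and Relief-PSO-SVMRFE. An SVM is then trained to obtain the classification surface. The results show that Relief-SVMRFE performs best: it improves running efficiency by about a factor of 3 over SVM-RFE and is more accurate than the other Relief-based combined methods. The classification surface obtained from the optimal features shows good generalization and robustness, which demonstrates its effectiveness.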

2.
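The shell-nearest-neighbor classification algorithm overcomes the possible bias of k-nearest-neighbor (kNN) classification in neighbor selection, so it outperforms kNN on large datasets. To further improve its classification performance, a shell-nearest-neighbor classification algorithm based on Relief feature weighting is proposed. The algorithm computes feature weights for the training set using Relief and uses those weights to improve the distance metric and the voting mechanism. Experimental results show that the algorithm outperforms both kNN and shell-nearest-neighbor classification on small and large datasets.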
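As a rough illustration of how feature weights can enter both the distance metric and the vote, here is a minimal sketch. It is a plain kNN rather than the shell-neighbor selection rule, which the abstract does not specify. The function names, the `1 / distance` vote, and the toy weights are all assumptions made for this example.

```python
import math
from collections import defaultdict

def weighted_distance(a, b, weights):
    """Euclidean distance in which each feature is scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def weighted_knn_predict(train_X, train_y, query, weights, k=3):
    """Plain kNN (not shell neighbors) with a feature-weighted metric.

    Each of the k nearest neighbors votes with strength 1 / distance,
    which is an assumed voting rule, not the one from the abstract.
    """
    neighbours = sorted(
        (weighted_distance(x, query, weights), label)
        for x, label in zip(train_X, train_y)
    )[:k]
    votes = defaultdict(float)
    for dist, label in neighbours:
        votes[label] += 1.0 / (dist + 1e-9)  # small constant avoids division by zero
    return max(votes, key=votes.get)

# Toy data: feature 0 separates the classes, feature 1 is noise,
# so a Relief-style weighting would give feature 1 a weight near zero.
train_X = [[0.0, 5.0], [0.1, -5.0], [1.0, 5.1], [0.9, -5.1]]
train_y = ["a", "a", "b", "b"]
print(weighted_knn_predict(train_X, train_y, [0.9, 5.0], [1.0, 0.0], k=3))
```

In practice the weight vector would come from a Relief pass over the training set instead of being set by hand.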

3.
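A SNP is a variation of a base pair in a gene sequence. Genotyping human SNP loci can effectively help identify genes associated with genetic diseases. However, genotyping all SNP loci is too expensive. Studies show that a subset of the most representative SNPs (tag SNPs) is enough to distinguish genetic variation, so selecting high-quality tag SNPs has become a hot topic in genetic research. This paper proposes a method based on linkage...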

4.
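Recursive feature elimination based on support vector machines (SVM-RFE) is one of the most widely used gene selection methods. It was designed for binary classification and must be extended for multiclass problems. Starting from the concept of Pareto optimality (Pareto optimum), this paper explains the limitations of common gene selection methods on multiclass problems, proposes a class-based gene selection procedure, and derives a new SVM-RFE design from it. Experiments on eight cancer and tumor gene expression datasets show that the new method outperforms two other recursive feature elimination methods: by searching for the optimal genes for each class separately, it achieves higher classification accuracy.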
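For orientation, the binary SVM-RFE loop that such work builds on can be sketched as follows: train a linear SVM, drop the feature with the smallest squared weight, and repeat. This is not the class-based multiclass variant from the abstract. The linear SVM is a small hand-written subgradient-descent trainer used in place of a library SVM, and all names and hyperparameters are illustrative assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularised hinge loss (no bias term).

    Labels must be +1 or -1. This is a stand-in for a library SVM.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:  # sample violates the margin and contributes to the subgradient
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def svm_rfe(X, y):
    """Return feature indices ranked from most to least important."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > 1:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        # Ranking criterion of linear SVM-RFE: the squared weight of each feature.
        worst = min(range(len(remaining)), key=lambda j: w[j] ** 2)
        eliminated.append(remaining.pop(worst))
    return remaining + eliminated[::-1]

# Toy data: feature 0 carries the label, features 1 and 2 carry no signal.
X = [[1.0, 0.1, 0.0], [1.0, -0.1, 0.0], [-1.0, 0.1, 0.0], [-1.0, -0.1, 0.0]]
y = [1, 1, -1, -1]
print(svm_rfe(X, y)[0])  # index of the top-ranked feature
```

Retraining after every elimination is what makes the procedure expensive on high-dimensional data, which is the cost several of the abstracts on this page try to reduce.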

5.
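SVM-RFE is an effective feature selection method with high practical value. To address the difficulty of determining the SVM parameters (γ and C) in traditional SVM-RFE, this paper uses particle swarm optimization to search for them. The feature vectors are then mapped into the kernel space determined by the SVM parameter γ, where feature selection is performed, which ties feature selection to the design of the SVM classifier. Simulation results show that the dataset after feature selection still allows the SVM classifier to achieve high classification accuracy.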

6.
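Relief feature selection algorithms for imbalanced datasets   Cited by: 1 (self-citations: 0, citations by others: 1)
Relief is a family of feature selection methods that includes the original Relief algorithm and its later extension, ReliefF. Its core idea is to assign larger weights to features that contribute more to classification. It is simple and efficient, and is therefore widely used. However, applying Relief directly to datasets with interference or to imbalanced datasets gives unsatisfactory results. Based on Relief, a feature selection algorithm for data with interference, called threshold-Relief, is proposed; it removes the influence of interfering data on the classification results. Combined with K-means, two feature selection algorithms for imbalanced datasets are proposed, called K-means-ReliefF and K-means-Relief sampling, which compensate for the shortcomings of Relief on imbalanced datasets. Experiments demonstrate the effectiveness of the proposed algorithms.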
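The core weighting idea of the original (binary-class) Relief can be sketched in a few lines: a feature gains weight when it differs on the nearest instance of the other class (the "near miss") and loses weight when it differs on the nearest instance of the same class (the "near hit"). This sketch visits every instance once instead of sampling at random, which is a simplification. It does not implement the threshold-Relief or K-means variants from the abstract. Each class needs at least two instances.

```python
def relief_weights(X, y):
    """Basic two-class Relief on numeric features, visiting every instance once."""
    n, d = len(X), len(X[0])
    # Feature ranges, used to scale differences into [0, 1]; a zero range falls back to 1.
    spans = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        near_hit = min(hits, key=lambda k: dist(X[i], X[k]))
        near_miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            # Reward separation from the other class, penalise spread within the class.
            weights[j] += (diff(j, X[i], X[near_miss]) - diff(j, X[i], X[near_hit])) / n
    return weights

# Toy data: feature 0 tracks the class, feature 1 does not.
X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
print(relief_weights(X, y))
```

On this toy data the first weight comes out positive and the second negative, so a threshold on the weights would keep only feature 0.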

7.
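Feature selection plays a vital role in machine learning and data mining. Relief is an efficient filter-style feature selection algorithm that can handle many types of data and tolerates noise well, so it is widely used. However, the classical Relief algorithm evaluates discrete features in a fairly simple way and does not fully exploit the latent relationship between features and class labels during selection, which leaves considerable room for improvement. To address this weakness, a discrete-feature evaluation method based on label correlation is proposed. The algorithm takes the characteristics of different feature types into account, gives a distance metric for mixed features, and redefines how Relief evaluates discrete features on the basis of the correlation between discrete features and labels. Experimental results show that, compared with classical Relief and several existing feature selection algorithms for mixed data, the improved Relief algorithm raises classification accuracy to varying degrees and performs well.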

8.
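To address the classification accuracy of current feature selection algorithms for cortical thickness data in Alzheimer's disease, a fused feature selection algorithm is proposed. First, MRI cortical thickness data from people with mild cognitive impairment and from normal elderly people are analyzed and processed. On these data, the minimum-redundancy maximum-relevance method is fused with Relief, and particle swarm optimization is used to find the optimal weights. The two methods are then fused with these weights to select brain-region features of cortical thickness, choosing the features that yield higher classification accuracy. Leave-one-out validation is used to evaluate the results. The selected features classify people with mild cognitive impairment versus normal elderly people better than currently popular feature selection methods.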

9.
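SVM-RFE has high algorithmic complexity and takes too long to select features. To shorten feature selection time, an algorithm is proposed for radial-basis-function SVM classifiers that selects features according to the mean inter-class distance in kernel space. The paper first analyzes the relationship between the RBF kernel parameter and the mean inter-class distance of a dataset in kernel space, then proposes an algorithm that ranks feature importance by how much each single feature contributes to that distance, and finally runs feature selection experiments with this algorithm and with SVM-RFE on eight UCI datasets. The results confirm the correctness and effectiveness of the algorithm, and its feature selection time is much shorter than that of SVM-RFE.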
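One way to read "inter-class distance in kernel space" is the squared distance between the two class means in the RBF feature space, which can be computed from kernel values alone: mean K(A, A) + mean K(B, B) - 2 · mean K(A, B). The sketch below uses that reading. It scores each feature by how much the distance drops when the feature is left out. Both the interpretation and the leave-one-feature-out scoring are assumptions, since the abstract does not give the exact formula.

```python
import math

def rbf(a, b, gamma):
    """RBF kernel value exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_kernel(A, B, gamma):
    """Average kernel value over all pairs (a, b) with a in A and b in B."""
    return sum(rbf(a, b, gamma) for a in A for b in B) / (len(A) * len(B))

def class_mean_distance_sq(A, B, gamma):
    """Squared distance between the class means in the RBF feature space."""
    return mean_kernel(A, A, gamma) + mean_kernel(B, B, gamma) - 2 * mean_kernel(A, B, gamma)

def rank_features(A, B, gamma=1.0):
    """Rank features by the distance lost when each one is removed (assumed criterion)."""
    d = len(A[0])
    full = class_mean_distance_sq(A, B, gamma)

    def without(rows, j):
        return [[v for k, v in enumerate(r) if k != j] for r in rows]

    contrib = [
        full - class_mean_distance_sq(without(A, j), without(B, j), gamma)
        for j in range(d)
    ]
    return sorted(range(d), key=lambda j: contrib[j], reverse=True)

# Toy data: the classes differ only in feature 0.
A = [[0.0, 0.0], [0.0, 1.0]]
B = [[2.0, 0.0], [2.0, 1.0]]
print(rank_features(A, B))
```

Unlike SVM-RFE, this kind of ranking needs no classifier training, only kernel evaluations, which is consistent with the shorter running time the abstract reports.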

10.
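To address the high feature dimensionality and severe data redundancy encountered when classifying high-resolution remote sensing images, a feature optimization method based on a hybrid particle swarm optimization and genetic algorithm is proposed from a machine learning perspective. The method combines the strengths of the two machine learning algorithms: ReliefF performs a preliminary feature screening, and a new binary particle swarm optimization genetic algorithm then determines the optimized feature set, which is fed to a random forest classifier to extract urban land-use information. The method is compared with three feature extraction approaches (all features, ReliefF, and GABPSO) to verify its advantage. The results show that the hybrid feature selection method based on ReliefF and GANBPSO achieves high accuracy with fewer feature variables: overall accuracy is 91.17% and the Kappa coefficient is 0.874, a better classification result than traditional methods.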

11.
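Genome-wide association studies are an effective means of studying the genetic effects of complex diseases and traits. Existing association analyses mainly use marginal statistical tests, which ignore correlations among features and suffer from unstable threshold selection. Taking cardiovascular and cerebrovascular disease as the subject, this paper proposes a new genome-wide association analysis method based on multi-step screening. The method has two steps: the Gini index is first used for initial feature screening to obtain a candidate subset of single nucleotide polymorphisms (SNPs), and random-forest-based recursive cluster elimination is then used to find the associated SNPs within that subset. Experimental results show that multi-step screening works better than single-step feature selection, and that the SNP subset selected by Gini-index screening followed by random-forest-based recursive cluster elimination is more strongly associated with the disease.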
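The first screening step can be illustrated with the Gini impurity decrease obtained by splitting samples on a SNP genotype. The sketch below covers only that step; the random-forest-based recursive cluster elimination is not reproduced. The SNP names, the 0/1/2 genotype coding, and the `keep` cutoff are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(genotypes, labels):
    """Impurity decrease when samples are grouped by genotype value."""
    n = len(labels)
    groups = {}
    for g, lab in zip(genotypes, labels):
        groups.setdefault(g, []).append(lab)
    weighted = sum(len(grp) / n * gini(grp) for grp in groups.values())
    return gini(labels) - weighted

def screen_snps(snp_matrix, labels, keep):
    """Keep the `keep` SNPs with the largest Gini gain (candidate subset)."""
    gains = {name: gini_gain(col, labels) for name, col in snp_matrix.items()}
    return sorted(gains, key=gains.get, reverse=True)[:keep]

# Hypothetical toy data: 1 = case, 0 = control; genotypes coded 0/1/2.
labels = [1, 1, 1, 0, 0, 0]
snps = {
    "snp_a": [2, 2, 2, 0, 0, 0],  # perfectly associated
    "snp_b": [0, 1, 2, 0, 1, 2],  # unrelated
    "snp_c": [2, 2, 1, 0, 0, 1],  # partly associated
}
print(screen_snps(snps, labels, keep=2))  # -> ['snp_a', 'snp_c']
```

A screen of this kind scores each SNP on its own, so correlations among SNPs are left to the second step.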

12.
TagSNP selection, which aims to select a small subset of informative single nucleotide polymorphisms (SNPs) to represent the whole large SNP set, has played an important role in current genomic research. Not only can it cut the cost of genotyping by filtering out a large number of redundant SNPs, but it can also accelerate the study of genome-wide disease association. In this paper, we propose a new hybrid method called CMDStagger that combines clustering and graph algorithms to find the minimum set of tagSNPs. The proposed algorithm uses linkage disequilibrium association and haplotype diversity to reduce the information loss in tagSNP selection, and places no limit on block partitioning. The approach is tested on eight benchmark datasets from HapMap and chromosome 5q31. Experimental results show that the algorithm reduces selection time and obtains fewer tagSNPs with high prediction accuracy, which indicates that it performs better than previous methods.

13.
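XIE Qi (谢琪), XU Xu (徐旭), CHENG Gengguo (程耕国), CHEN Heping (陈和平). Journal of Computer Applications (《计算机应用》), 2020, 40(5): 1266-1271
To address problems in the initialization, candidate forest generation, and update stages of the traditional feature selection algorithm based on the forest optimization algorithm, a new feature selection algorithm based on forest optimization is proposed. In the initialization stage, it replaces the random initialization strategy with the Pearson correlation coefficient and L1 regularization. In the candidate forest generation stage, it separates good and bad trees and makes up the shortfall to solve the incompleteness of good and bad trees. In the update stage, trees with the same accuracy as the best tree but a different dimensionality are added to the forest. In the experiments, the proposed algorithm uses the same data and parameters as the traditional algorithm and is tested on low-, medium-, and high-dimensional data. The results show that on two high-dimensional and two medium-dimensional datasets, the proposed algorithm achieves higher classification accuracy and greater dimensionality reduction than the traditional algorithm, which verifies its effectiveness for feature selection.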

14.
SNPs are positions in DNA sequences where the differences among individuals are embedded. Knowledge of such SNPs is crucial for disease association studies, but even though the number of such positions is low (about 1% of the entire sequence), the cost of extracting the complete information is very high. Recent studies have shown that DNA sequences are structured into blocks of positions that are conserved during evolution and in which the values (alleles) of different loci are strongly correlated. To reduce the cost of extracting SNP information, this block structure suggests limiting the process to a subset of SNPs, the so-called Tag SNPs, that retain most of the information contained in the whole sequence. In this paper, we apply a feature selection technique based on integer programming to the problem of Tag SNP selection. Moreover, to test the quality of our approach, we also consider the problem of SNP reconstruction, i.e. the problem of deriving unknown SNPs from the values of Tag SNPs, and propose two reconstruction methods, one based on a majority vote and the other on a machine learning approach. We test our algorithm on two public datasets of different nature, obtaining results that are, where comparable, in line with the related literature. One interesting aspect of the proposed method is its ability to handle very large SNP sets and, in addition, to provide highly informative reconstruction rules in the form of logic formulas.

15.
A novel feature selection approach: Combining feature wrappers and filters   Cited by: 2 (self-citations: 0, citations by others: 2)
Feature selection is one of the most important issues in research fields such as system modelling, data mining and pattern recognition. In this study, a new feature selection algorithm that combines feature wrapper and feature filter approaches is proposed in order to identify the significant input variables in systems with continuous domains. The proposed method uses the functional dependency concept, correlation coefficients and the K-nearest neighbour (KNN) method to implement the feature filter and feature wrappers. Four feature selection methods independently select the significant input variables, and the input variable combination that yields the best result with respect to its corresponding evaluation function is selected as the winner. This is similar to the basic information fusion notion of integrating information collected from different sources. All four feature selection methods are performed in two stages: (i) pre-selection, (ii) selection. Two of the four methods use KNN to evaluate the candidates; in the pre-selection stage they use sequential forward and sequential backward search, respectively. The third method uses correlation coefficients in the pre-selection stage. Outliers and noise are common in real-life data. To make the proposed feature selection algorithm resistant to noise and outliers, approximate functional dependencies are used, with membership values that inherently cope with uncertainty in the data. Thus, the fourth method uses approximate functional dependencies to evaluate candidates in the pre-selection stage. All four methods apply KNN with an exhaustive search strategy to find the most suitable input variable combination with respect to a performance measure.

16.
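Genome-wide association studies (GWAS) perform association analysis at the gene level to find disease-causing genes. Traditional methods do not consider interactions between genes, and their efficiency and accuracy are often low when many complex factors are involved. To address these difficulties, this paper proposes a method based on mutual information for selecting a set of structural key SNPs. Building on mutual information theory and simulated data, a SNP mutual information network is constructed in reverse. Given a range of mutual information thresholds, the relevant statistics at each threshold are compared and analyzed to choose a suitable threshold. With the chosen threshold, the "structural key SNPs" that have a marked effect on the network structure are screened out. Experimental results show that the parameter selection method used in this paper can accurately and quickly screen out the key SNPs that markedly affect the network structure.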
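To make the network construction concrete, the following sketch computes pairwise mutual information between SNP genotype vectors and keeps an edge wherever it reaches a threshold. It is a minimal illustration: the SNP names, the toy genotypes, and the fixed threshold are hypothetical. The paper's procedure for comparing statistics across a threshold range and its definition of "structural key SNPs" are not reproduced.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def build_mi_network(snps, threshold):
    """Return the edges (pairs of SNP names) whose mutual information >= threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(snps), 2)
        if mutual_information(snps[a], snps[b]) >= threshold
    ]

# Hypothetical genotype vectors for three SNPs over four samples.
snps = {"s1": [0, 0, 1, 1], "s2": [0, 0, 1, 1], "s3": [0, 1, 0, 1]}
print(build_mi_network(snps, threshold=0.5))  # -> [('s1', 's2')]
```

Sweeping `threshold` over a range and recording, for example, the number of edges at each value is one simple way to examine how the network structure changes.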

17.
In this paper, we propose a method based on association rule mining to enhance the diagnosis of medical images (mammograms). It combines low-level features automatically extracted from images with high-level knowledge from specialists to search for patterns. Our method analyzes medical images and automatically generates diagnosis suggestions by mining association rules. The suggestions are used to accelerate the image analysis performed by specialists and to give them an alternative to work on. The proposed method uses two new algorithms, PreSAGe and HiCARe. The PreSAGe algorithm combines feature selection and discretization in a single step and reduces the mining complexity. Experiments performed on PreSAGe show that this algorithm is highly suitable for feature selection and discretization in medical images. HiCARe is a new associative classifier. It has an important property that makes it unique: it assigns multiple keywords per image to suggest a diagnosis with high accuracy. Our method was applied to real datasets, and the results show high sensitivity (up to 95%) and accuracy (up to 92%), allowing us to claim that association rules are a powerful means of assisting in the diagnosing task.

18.
Abstract: We used several machine learning techniques to predict susceptibility to chronic hepatitis from single nucleotide polymorphism (SNP) data. These are integrated with several feature selection algorithms to identify a set of SNPs relevant to the disease. In addition, we apply a backtracking technique to two feature selection algorithms, forward selection and backward elimination, and show experimentally that it helps find better solutions. The experimental results show that the decision rule is able to distinguish chronic hepatitis from normal with a maximum accuracy of 73.20%, whereas the accuracy of the support vector machine is 67.53% and that of the decision tree is 72.68%. It is also shown that the decision tree and decision rule are potential tools for predicting susceptibility to chronic hepatitis from SNP data.

19.
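To address the poor support for sensitive-information detection in current email security gateways, the Winnow algorithm and the Markov model are studied in depth, and an email feature selection method called Markov-Gram is proposed on the basis of the N-Gram language model. The method selects feature terms sentence by sentence, which not only preserves more semantic information but also reduces the number of feature terms, addressing the "curse of dimensionality". A method for generating the initial weights used in Winnow training is also proposed; it incorporates the structural characteristics of email and ...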
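For readers unfamiliar with Winnow, a minimal sketch of the standard algorithm with multiplicative promotion and demotion follows. It uses uniform initial weights by default. The `init_weights` parameter marks where a custom initial-weight scheme such as the one mentioned in the abstract could plug in, but that scheme itself is not reproduced here. The threshold `n_features / 2`, the factor `alpha = 2`, and the toy samples are assumptions.

```python
def winnow_train(samples, n_features, alpha=2.0, epochs=10, init_weights=None):
    """Train Winnow on (active_feature_indices, label) pairs with boolean labels."""
    weights = list(init_weights) if init_weights else [1.0] * n_features
    theta = n_features / 2  # assumed decision threshold
    for _ in range(epochs):
        for active, label in samples:
            predicted = sum(weights[i] for i in active) > theta
            if predicted and not label:      # false positive: demote active features
                for i in active:
                    weights[i] /= alpha
            elif label and not predicted:    # false negative: promote active features
                for i in active:
                    weights[i] *= alpha
    return weights, theta

def winnow_predict(weights, theta, active):
    """Classify a sample given the indices of its active (present) features."""
    return sum(weights[i] for i in active) > theta

# Hypothetical toy data: feature 0 stands for a sensitive term.
samples = [((0, 1), True), ((1, 2), False), ((0, 3), True), ((2, 3), False)]
weights, theta = winnow_train(samples, n_features=4)
print(weights)  # -> [2.0, 1.0, 0.5, 1.0]
```

Because updates are multiplicative and touch only the active features, Winnow copes well with sparse, high-dimensional inputs such as N-gram features.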

20.
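Under the operating mechanism of the discovering feature sub-space model (DFSSM), a structural model for mining unstructured data, a new text classification algorithm is proposed: text classification based on DFSSM (TCDFSSM). In addition to the text training and classification stages, the algorithm adds an automatic feedback stage, which gives TCDFSSM a self-learning ability, and an algorithm for choosing the feedback threshold in the classification process is given. The results show that the algorithm classifies well and that its self-learning ability, adaptability, and robustness are superior.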
