首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 377 毫秒
1.
Feature selection plays an important role in the machine-vision-based online detection of foreign fibers in cotton because of improvement detection accuracy and speed. Feature sets of foreign fibers in cotton belong to multi-character feature sets. That means the high-quality feature sets of foreign fibers in cotton consist of three classes of features which are respectively the color, texture and shape features. The multi-character feature sets naturally contain a space constraint which lead to the smaller feature space than the general feature set with the same number of features, however the existing algorithms do not consider the space characteristic of multi-character feature sets and treat the multi-character feature sets as the general feature sets. This paper proposed an improved ant colony optimization for feature selection, whose objective is to find the (near) optimal subsets in multi-character feature sets. In the proposed algorithm, group constraint is adopted to limit subset constructing process and probability transition for reducing the effect of invalid subsets and improve the convergence efficiency. As a result, the algorithm can effectively find the high-quality subsets in the feature space of multi-character feature sets. The proposed algorithm is tested in the datasets of foreign fibers in cotton and comparisons with other methods are also made. The experimental results show that the proposed algorithm can find the high-quality subsets with smaller size and high classification accuracy. This is very important to improve performance of online detection systems of foreign fibers in cotton.  相似文献   

2.
Feature Subset Selection within a Simulated Annealing Data Mining Algorithm   总被引:2,自引:0,他引:2  
An overview of the principle feature subset selection methods isgiven. We investigate a number of measures of feature subset quality, usinglarge commercial databases. We develop an entropic measure, based upon theinformation gain approach used within ID3 and C4.5 to build trees, which isshown to give the best performance over our databases. This measure is usedwithin a simple feature subset selection algorithm and the technique is usedto generate subsets of high quality features from the databases. A simulatedannealing based data mining technique is presented and applied to thedatabases. The performance using all features is compared to that achievedusing the subset selected by our algorithm. We show that a substantialreduction in the number of features may be achieved together with animprovement in the performance of our data mining system. We also present amodification of the data mining algorithm, which allows it to simultaneouslysearch for promising feature subsets and high quality rules. The effect ofvarying the generality level of the desired pattern is alsoinvestigated.  相似文献   

3.
杨柳  李云 《计算机应用》2021,41(12):3521-3526
K-匿名算法通过对数据的泛化、隐藏等手段使得数据达到K-匿名条件,在隐藏特征的同时考虑数据的隐私性与分类性能,可以视为一种特殊的特征选择方法,即K-匿名特征选择。K-匿名特征选择方法结合K-匿名与特征选择的特点使用多个评价准则选出K-匿名特征子集。过滤式K-匿名特征选择方法难以搜索到所有满足K-匿名条件的候选特征子集,不能保证得到的特征子集的分类性能最优,而封装式特征选择方法计算成本很大,因此,结合过滤式特征排序与封装式特征选择的特点,改进已有方法中的前向搜索策略,设计了一种混合式K-匿名特征选择算法,使用分类性能作为评价准则选出分类性能最好的K-匿名特征子集。在多个公开数据集上进行实验,结果表明,所提算法在分类性能上可以超过现有算法并且信息损失更小。  相似文献   

4.
一种基于信息增益及遗传算法的特征选择算法   总被引:8,自引:0,他引:8  
特征选择是模式识别及数据挖掘等领域的重要问题之一。针对高维数据对象,特征选择一方面可以提高分类精度和效率,另一方面可以找出富含信息的特征子集。针对此问题,本文提出一种综合了filter模型及wrapper模型的特征选择方法,首先基于特征之间的信息增益进行特征分组及筛选,然后针对经过筛选而精简的特征子集采用遗传算法进行随机搜索,并采用感知器模型的分类错误率作为评价指标。实验结果表明,该算法可有效地找出具有较好的线性可分离性的特征子集,从而实现降维并提高分类精度。  相似文献   

5.
Feature selection is an important filtering method for data analysis, pattern classification, data mining, and so on. Feature selection reduces the number of features by removing irrelevant and redundant data. In this paper, we propose a hybrid filter–wrapper feature subset selection algorithm called the maximum Spearman minimum covariance cuckoo search (MSMCCS). First, based on Spearman and covariance, a filter algorithm is proposed called maximum Spearman minimum covariance (MSMC). Second, three parameters are proposed in MSMC to adjust the weights of the correlation and redundancy, improve the relevance of feature subsets, and reduce the redundancy. Third, in the improved cuckoo search algorithm, a weighted combination strategy is used to select candidate feature subsets, a crossover mutation concept is used to adjust the candidate feature subsets, and finally, the filtered features are selected into optimal feature subsets. Therefore, the MSMCCS combines the efficiency of filters with the greater accuracy of wrappers. Experimental results on eight common data sets from the University of California at Irvine Machine Learning Repository showed that the MSMCCS algorithm had better classification accuracy than the seven wrapper methods, the one filter method, and the two hybrid methods. Furthermore, the proposed algorithm achieved preferable performance on the Wilcoxon signed-rank test and the sensitivity–specificity test.  相似文献   

6.
特征选择是处理高维大数据常用的降维手段,但其中牵涉到的多个彼此冲突的特征子集评价目标难以平衡。为综合考虑特征选择中多种子集评价方式间的折中,优化子集性能,提出一种基于子集评价多目标优化的特征选择框架,并重点对多目标粒子群优化(MOPSO)在特征子集评价中的应用进行了研究。该框架分别根据子集的稀疏度、分类能力和信息损失度设计多目标优化函数,继而基于多目标优化算法进行特征权值向量寻优,并通过权值向量Pareto解集膝点选取确定最优向量,最终实现基于权值向量排序的特征选择。设计实验对比了基于多目标粒子群优化算法的特征选择(FS_MOPSO)与四种经典方法的性能,多个数据集上的结果表明,FS_MOPSO在低维空间表现出更高的分类精度,并保证了更少的信息损失。  相似文献   

7.
基于遗传算法及聚类的基因表达数据特征选择   总被引:1,自引:0,他引:1  
特征选择是模式识别及数据挖掘等领域的重要问题之一。针对高维数据对象(如基因表达数据)的特征选择,一方面可以提高分类及聚类的精度和效率,另一方面可以找出富含信息的特征子集,如发现与疾病密切相关的重要基因。针对此问题,本文提出了一种新的面向基因表达数据的特征选择方法,在特征子集搜索上采用遗传算法进行随机搜索,在特征子集评价上采用聚类算法及聚类错误率作为学习算法及评价指标。实验结果表明,该算法可有效地找出具有较好可分离性的特征子集,从而实现降维并提高聚类及分类精度。  相似文献   

8.
This correspondence presents a novel hybrid wrapper and filter feature selection algorithm for a classification problem using a memetic framework. It incorporates a filter ranking method in the traditional genetic algorithm to improve classification performance and accelerate the search in identifying the core feature subsets. Particularly, the method adds or deletes a feature from a candidate feature subset based on the univariate feature ranking information. This empirical study on commonly used data sets from the University of California, Irvine repository and microarray data sets shows that the proposed method outperforms existing methods in terms of classification accuracy, number of selected features, and computational efficiency. Furthermore, we investigate several major issues of memetic algorithm (MA) to identify a good balance between local search and genetic search so as to maximize search quality and efficiency in the hybrid filter and wrapper MA  相似文献   

9.
杨震宇  叶军  季雨瑄  敖家欣  王磊 《计算机应用研究》2022,39(4):1118-1123+1131
目前已有蚁群算法优化的特征选择方法,大多采用的是以属性依赖度和信息熵属性重要度作为路径上启发搜索因子,但这类搜索方法在某些决策表中存在算法早熟或搜索到的特征子集包含了冗余特征,从而导致选择精度显著下降。针对此类问题,根据条件属性在分辨矩阵中的占比提出了一种属性重要度的度量方法,以分辨矩阵重要度作为路径上启发因子,设计了一种基于分辨矩阵与蚁群算法优化的特征子集搜索方法。该算法从特征核出发,蚁群依次选择概率大的特征加入特征核集,直至找到最小特征子集算法终止。通过实例验证和UCI数据集实验结果表明,与基于属性依赖度和信息熵属性重要度的特征选择方法相比,在通常情况下,该算法能较小代价找到最小特征子集,并且可以有效减少计算工作量。  相似文献   

10.
特征选择方法与算法的研究   总被引:1,自引:0,他引:1  
特征选择的主要思想是通过去除一些包含少量或不相关的信息的特征去选择特征子集。特征选择方法可分为三大类:一是过滤式,二是封装式,三是嵌入式。鉴于目前存在大量的特征选择算法,为了能够适当地决定在特定的情况下使用哪种算法,需要提出可以依赖或判定的标准。文中的主要工作就是综述一些基本特征选择算法,根据文献中已有的理论和实验结果对特征选择方法和算法进行比较分类,然后提出一种可以依赖或判定的标准。  相似文献   

11.
提出了一种新的面向高维数据的特征选择方法,在特征子集搜索上采用遗传算法进行随机搜索,在特征子集评价上采用基于边界点的可分性度量作为评价指标及适应度。实验结果表明,该算法可有效地找出具有较好的可分离性的特征子集,从而实现降维并提高分类 精度。  相似文献   

12.
This correspondence presents a novel hybrid wrapper and filter feature selection algorithm for a classification problem using a memetic framework. It incorporates a filter ranking method in the traditional genetic algorithm to improve classification performance and accelerate the search in identifying the core feature subsets. Particularly, the method adds or deletes a feature from a candidate feature subset based on the univariate feature ranking information. This empirical study on commonly used data sets from the University of California, Irvine repository and microarray data sets shows that the proposed method outperforms existing methods in terms of classification accuracy, number of selected features, and computational efficiency. Furthermore, we investigate several major issues of memetic algorithm (MA) to identify a good balance between local search and genetic search so as to maximize search quality and efficiency in the hybrid filter and wrapper MA.  相似文献   

13.
为了有效从收集的恶意数据中选择特征去分析,保障网络系统的安全与稳定,需要进行网络入侵检测模型研究;但目前方法是采用遗传算法找出网络入侵的特征子集,再利用粒子群算法进行进一步选择,找出最优的特征子集,最后利用极限学习机对网络入侵进行分类,但该方法准确性较低;为此,提出一种基于特征选择的网络入侵检测模型研究方法;该方法首先以增强寻优性能为目标对网络入侵检测进行特征选择,结合分析出的特征选择利用特征属性的Fisher比构造出特征子集的评价函数,然后结合计算出的特征子集评价函数进行支持向量机完成对基于特征选择的网络入侵检测模型研究方法;仿真实验表明,利用支持向量机对网络入侵进行检测能有效地提高入侵检测的速度以及入侵检测的准确性。  相似文献   

14.
基于相关性分析及遗传算法的高维数据特征选择   总被引:4,自引:0,他引:4  
特征选择是模式识别及数据挖掘等领域的重要问题之一。针对高维数据对象,特征选择一方面可以提高分类精度和效率,另一方面可以找出富含信息的特征子集。针对此问题,提出了一种综合了filter模型及wrapper模型的特征选择方法,首先基于特征与类别标签的相关性分析进行特征筛选,只保留与类别标签具有较强相关性的特征,然后针对经过筛选而精简的特征子集采用遗传算法进行随机搜索,并采用感知器模型的分类错误率作为评价指标。实验结果表明,该算法可有效地找出具有较好的线性可分离性的特征子集,从而实现降维并提高分类精度。  相似文献   

15.
针对高维复杂的符号数据集在聚类中的聚类效果差和计算耗时过大的问题,首先提出了一种基于邻域距离的无监督特征选择算法,然后在选择到的特征子集上进行重新聚类,从而有效提高了聚类结果的精度,降低了聚类计算的计算耗时。实验结果表明,该算法可以找到有效的特征子集,提高数据集的聚类精度,降低面对高维复杂数据集聚类的计算耗时。  相似文献   

16.
随着互联网和物联网技术的发展,数据的收集变得越发容易。但是,高维数据中包含了很多冗余和不相关的特征,直接使用会徒增模型的计算量,甚至会降低模型的表现性能,故很有必要对高维数据进行降维处理。特征选择可以通过减少特征维度来降低计算开销和去除冗余特征,以提高机器学习模型的性能,并保留了数据的原始特征,具有良好的可解释性。特征选择已经成为机器学习领域中重要的数据预处理步骤之一。粗糙集理论是一种可用于特征选择的有效方法,它可以通过去除冗余信息来保留原始特征的特性。然而,由于计算所有的特征子集组合的开销较大,传统的基于粗糙集的特征选择方法很难找到全局最优的特征子集。针对上述问题,文中提出了一种基于粗糙集和改进鲸鱼优化算法的特征选择方法。为避免鲸鱼算法陷入局部优化,文中提出了种群优化和扰动策略的改进鲸鱼算法。该算法首先随机初始化一系列特征子集,然后用基于粗糙集属性依赖度的目标函数来评价各子集的优劣,最后使用改进鲸鱼优化算法,通过不断迭代找到可接受的近似最优特征子集。在UCI数据集上的实验结果表明,当以支持向量机为评价所用的分类器时,文中提出的算法能找到具有较少信息损失的特征子集,且具有较高的分类精度。因此,所提算法在特征选择方面具有一定的优势。  相似文献   

17.
翟俊海    刘博  张素芳 《智能系统学报》2017,12(3):397-404
特征选择是指从初始特征全集中,依据既定规则筛选出特征子集的过程,是数据挖掘的重要预处理步骤。通过剔除冗余属性,以达到降低算法复杂度和提高算法性能的目的。针对离散值特征选择问题,提出了一种将粗糙集相对分类信息熵和粒子群算法相结合的特征选择方法,依托粒子群算法,以相对分类信息熵作为适应度函数,并与其他基于进化算法的特征选择方法进行了实验比较,实验结果表明本文提出的方法具有一定的优势。  相似文献   

18.
Using Rough Sets with Heuristics for Feature Selection   总被引:32,自引:0,他引:32  
Practical machine learning algorithms are known to degrade in performance (prediction accuracy) when faced with many features (sometimes attribute is used instead of feature) that are not necessary for rule discovery. To cope with this problem, many methods for selecting a subset of features have been proposed. Among such methods, the filter approach that selects a feature subset using a preprocessing step, and the wrapper approach that selects an optimal feature subset from the space of possible subsets of features using the induction algorithm itself as a part of the evaluation function, are two typical ones. Although the filter approach is a faster one, it has some blindness and the performance of induction is not considered. On the other hand, the optimal feature subsets can be obtained by using the wrapper approach, but it is not easy to use because of the complexity of time and space. In this paper, we propose an algorithm which is using rough set theory with greedy heuristics for feature selection. Selecting features is similar to the filter approach, but the evaluation criterion is related to the performance of induction. That is, we select the features that do not damage the performance of induction.  相似文献   

19.
传统的基于特征选择的分类算法中,由于其采用的冗余度和相关度评价标准单一,从而使得此类算法应用范围受限。针对这个问题,本文提出一种新的最大相关最小冗余特征选择算法,该算法在度量特征之间冗余度的评价准则中引入了两种不同的评价准则;在度量特征与类别之间的相关度中引入了4种不同的评价准则,衍生出8种不同的特征选择算法,从而使得该算法应用范围增大。此外,由于传统的最大相关最小冗余特征选择算法不能根据用户实际需求的数据维度进行特征选择。所以,引入了指示向量 $\lambda $ 来刻画用户实际的数据维度需求,提出了一种新的目标函数来求解最优特征子集,利用支持向量机对4个UCI数据集的特征子集进行了实验,最后,利用分类正确率、成对单边T检验充分验证了该算法的有效性。  相似文献   

20.
特征选择作为一个数据预处理过程,在数据挖掘、模式识别和机器学习中有着重要地位。通过特征选择,可以降低问题的复杂度,提高学习算法的预测精度、鲁棒性和可解释性。介绍特征选择方法框架,重点描述生成特征子集、评价准则两个过程;根据特征选择和学习算法的不同结合方式对特征选择算法分类,并分析各种方法的优缺点;讨论现有特征选择算法存在的问题,提出一些研究难点和研究方向。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号