Similar literature
 Found 20 similar documents (search time: 165 ms)
1.
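Research on an Information-Entropy-Based Heuristic Algorithm for Feature Subset Selection   Total citations: 2, self-citations: 0, citations by others: 2
Feature subset selection is an important problem in machine learning and pattern recognition, and optimal feature subset selection has been proven NP-hard. Existing heuristic algorithms for feature subset selection, however, assume that positive and negative examples are consistent and ignore the effect of noisy data in real applications, which makes it very hard to select a good feature subset. The paper first analyzes the effect of noise on feature subset selection from a statistical perspective and defines a consistent feature subset that allows for an error rate. It then uses information entropy and the Laplace error estimate to construct EFS, a heuristic algorithm for feature subset selection.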

2.
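Research on a Noise-Tolerant Feature Subset Selection Algorithm   Total citations: 4, self-citations: 0, citations by others: 4
Feature subset selection has long been an important research topic in artificial intelligence, and in recent years research on feature subset selection algorithms has become a hot spot in machine learning and data mining. A new algorithm, noise-tolerant feature subset selection (NFS), is proposed. It brings the idea of clustering into the handling of noise and applies the Gini index and the Mexican hat function to feature selection, so that feature subsets can be selected from data sets that contain noise. Experiments in real-world domains show that NFS tolerates noise well, selects representative features, and runs fast, so it can be applied effectively in practice.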

3.
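An Incremental Feature Subset Selection Algorithm Based on the Extension Matrix   Total citations: 2, self-citations: 0, citations by others: 2
Feature subset selection has long been an important research topic in artificial intelligence, and in recent years research on feature subset selection algorithms has become a hot spot in machine learning and data mining. Building on the extension matrix, this paper proposes the concept of a class extension matrix and applies weighted expected information and an inconsistency error rate function to feature subset selection. The result is IFSS_EM, an incremental feature subset selection algorithm that can handle noise. Experiments in real-world domains show that IFSS_EM runs efficiently and selects fairly representative features, so it is well suited to practical use.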

4.
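Feature Selection for Gene Expression Data Based on Genetic Algorithms and Clustering   Total citations: 1, self-citations: 0, citations by others: 1
Feature selection is one of the important problems in pattern recognition and data mining. For high-dimensional data such as gene expression data, feature selection can improve the accuracy and efficiency of classification and clustering, and it can also identify information-rich feature subsets, for example important genes closely related to a disease. This paper proposes a new feature selection method for gene expression data. A genetic algorithm performs a randomized search over feature subsets, and each subset is evaluated with a clustering algorithm as the learning algorithm and the clustering error rate as the evaluation measure. Experimental results show that the algorithm effectively finds feature subsets with good separability, reducing dimensionality and improving clustering and classification accuracy.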

5.
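A Feature Selection Algorithm Based on Information Gain and Genetic Algorithms   Total citations: 8, self-citations: 0, citations by others: 8
Feature selection is one of the important problems in pattern recognition and data mining. For high-dimensional data, feature selection can improve classification accuracy and efficiency, and it can also identify information-rich feature subsets. This paper proposes a feature selection method that combines the filter model and the wrapper model. Features are first grouped and screened according to the information gain between features. A genetic algorithm then performs a randomized search over the reduced feature set, with the classification error rate of a perceptron model as the evaluation measure. Experimental results show that the algorithm effectively finds feature subsets with good linear separability, reducing dimensionality and improving classification accuracy.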
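The filter stage in the abstract above relies on information gain. A minimal sketch of that quantity follows, assuming discrete-valued variables and the standard definition IG(Y; X) = H(Y) - H(Y | X). The paper's actual grouping and screening rule is not given in the abstract, and the function names `entropy` and `information_gain` are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(values).values()
    )


def information_gain(x, y):
    """Information gain of y given x: H(y) - H(y | x).

    Assumption: x and y are equal-length sequences of discrete values.
    """
    total = len(x)
    # Group the y values by the x value they co-occur with.
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    # Conditional entropy H(y | x) is the size-weighted entropy of each group.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(y) - conditional
```

A value near zero suggests that the two variables carry little information about each other, while a value near H(y) suggests that x almost determines y. How such values are thresholded for grouping is a design choice the abstract leaves open.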

6.
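When a feature set contains a subset of strongly correlated features that jointly make an important contribution to classification, traditional methods usually pick one feature from that subset at random, which reduces both the interpretability of the data and classification performance. To address this for multi-class problems, a feature selection algorithm based on support vector machines is proposed, together with a fast iterative algorithm. The algorithm automatically selects or discards strongly correlated feature subsets, obtaining effective features while reducing feature dimensionality. Experiments on synthetic and benchmark data sets show that the algorithm performs well in terms of both the feasibility and the effectiveness of feature selection.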

7.
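Feature Selection for High-Dimensional Data Based on Correlation Analysis and Genetic Algorithms   Total citations: 4, self-citations: 0, citations by others: 4
Feature selection is one of the important problems in pattern recognition and data mining. For high-dimensional data, feature selection can improve classification accuracy and efficiency, and it can also identify information-rich feature subsets. A feature selection method that combines the filter model and the wrapper model is proposed. Features are first screened by analyzing their correlation with the class label, and only features strongly correlated with the class label are kept. A genetic algorithm then performs a randomized search over the reduced feature set, with the classification error rate of a perceptron model as the evaluation measure. Experimental results show that the algorithm effectively finds feature subsets with good linear separability, reducing dimensionality and improving classification accuracy.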

8.
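A New Algorithm for Fuzzy Feature Selection: II*   Total citations: 3, self-citations: 0, citations by others: 3
Within-class and between-class distances are computed with a fuzzy likelihood function, yielding a fuzzy feature selection coefficient for any feature subset. The coefficient is used to select the feature subset that best discriminates and characterizes the pattern classes. An example illustrates how the method is used and indicates that it is practical.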

9.
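Feature selection can effectively alleviate the curse of dimensionality, and many search strategies have been applied to it. To address the weak search ability of the harmony search feature selection algorithm, a harmony search feature selection algorithm based on global adaptive distance adjustment (HSFS-GPA) is proposed. A definition of the distance between feature sets is introduced into the feature selection problem. During the search, randomly generated new harmonies are adjusted using global information: with a certain probability, the distance between the candidate harmony and the current best harmony is reduced to speed up the search, or the distance between the candidate harmony and the worst harmony is reduced to avoid getting trapped in a local optimum. In addition, a competitive selection scheme keeps the global information of the harmony memory up to date, and the improved update mechanism of the harmony memory raises search quality. HSFS-GPA is compared experimentally with the original harmony search feature selection algorithm, particle swarm optimization, and a genetic algorithm. The feature subsets selected by HSFS-GPA are 15% smaller than those of the original harmony search algorithm, and the average subset evaluation value rises to 0.98. The results show that HSFS-GPA finds better feature subsets under the same conditions.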

10.
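Feature Selection Based on the Fisher Criterion and Feature Clustering   Total citations: 2, self-citations: 0, citations by others: 2
王飒 (Wang Sa), 郑链 (Zheng Lian). 《计算机应用》 (Journal of Computer Applications), 2007, 27(11): 2812-2813
Feature selection is one of the important problems in machine learning and pattern recognition. For high-dimensional data, a feature selection method based on the Fisher criterion and feature clustering is proposed. A subset of features with strong discriminative power is first preselected using the Fisher criterion. The features in the preselected subset are then grouped by hierarchical clustering, which removes irrelevant and redundant features. Experimental results show that the method is effective.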
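The preselection step in the abstract above can be illustrated with a per-feature Fisher score. This is a minimal sketch, assuming the commonly used form of the score (between-class scatter divided by within-class scatter, computed one feature at a time). The abstract does not state which variant the authors use, the later hierarchical clustering step is not shown, and the names `fisher_scores` and `preselect` are hypothetical.

```python
from collections import defaultdict


def fisher_scores(X, y):
    """Return one Fisher score per feature.

    X is a list of rows (each a list of numbers); y holds the class labels.
    """
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = 0.0  # scatter of the class means around the overall mean
        within = 0.0   # scatter of the samples around their own class mean
        for rows in rows_by_class.values():
            values = [row[j] for row in rows]
            class_mean = sum(values) / len(values)
            between += len(values) * (class_mean - overall_mean) ** 2
            within += sum((v - class_mean) ** 2 for v in values)
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfectly separating or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def preselect(X, y, k):
    """Indices of the k features with the highest Fisher score."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]
```

For example, with `X = [[0, 5], [1, 6], [10, 5], [11, 6]]` and `y = [0, 0, 1, 1]`, the first feature separates the classes and the second does not, so `preselect(X, y, 1)` keeps only feature 0.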

11.
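Distributing video over multiple paths is an important mechanism in peer-to-peer networks. Finding several qualifying paths between a pair of nodes is not difficult, but how the sender should choose the optimal subset from the set of available paths and allocate sending rates optimally across it remains a hard problem. To this end, a new end-to-end optimal multipath selection and rate allocation (OMPSRA) algorithm for peer-to-peer networks is proposed. First, queueing theory is used to build the OMPSRA model and to derive a new OMPSRA formula. The formula gives both a method for computing the optimal allocation and the relationship between each path's optimal rate allocation and its maximum available bandwidth, and this relationship can be used to select the optimal path subset. Finally, the OMPSRA algorithm is implemented on the basis of the formula. Theoretical analysis and simulation results show that the proposed algorithm allocates traffic in a globally optimal way, minimizes the end-to-end delay of video transmission, effectively improves video transmission quality, and outperforms comparable algorithms.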

12.
Incremental Feature Selection   Total citations: 6, self-citations: 3, citations by others: 6
Feature selection is the problem of finding relevant features. When the number of features of a dataset is large and its number of patterns is huge, an effective method of feature selection can help in dimensionality reduction. An incremental probabilistic algorithm is designed and implemented as an alternative to the exhaustive and heuristic approaches. Theoretical analysis is given to support the idea of the probabilistic algorithm in finding an optimal or near-optimal subset of features. Experimental results suggest that (1) the probabilistic algorithm is effective in obtaining optimal/suboptimal feature subsets; (2) its incremental version expedites feature selection further when the number of patterns is large and can scale up without sacrificing the quality of selected features.

13.
Using Rough Sets with Heuristics for Feature Selection   Total citations: 32, self-citations: 0, citations by others: 32
Practical machine learning algorithms are known to degrade in performance (prediction accuracy) when faced with many features (sometimes "attribute" is used instead of "feature") that are not necessary for rule discovery. To cope with this problem, many methods for selecting a subset of features have been proposed. Two typical ones are the filter approach, which selects a feature subset in a preprocessing step, and the wrapper approach, which selects an optimal feature subset from the space of possible subsets using the induction algorithm itself as part of the evaluation function. Although the filter approach is faster, it is somewhat blind, because the performance of induction is not considered. The wrapper approach, on the other hand, can obtain optimal feature subsets, but it is not easy to use because of its time and space complexity. In this paper, we propose an algorithm that uses rough set theory with greedy heuristics for feature selection. Selecting features is similar to the filter approach, but the evaluation criterion is related to the performance of induction. That is, we select the features that do not damage the performance of induction.

14.
This paper presents a technique for selecting an optimal number of features from the original set of features. Due to the large number of features considered, it is computationally more efficient to select a subset of features that can discriminate as well as the original set. The subset of features is determined using stepwise discriminant analysis. The results of using such a scheme to classify scaled, rotated, and translated binary images, and also images that have been perturbed with random noise, are reported. The features used in this study are Zernike moments, which are the mapping of the image onto a set of complex orthogonal polynomials. The performance of the subset is examined by comparing it with the original set. The classifiers used in this study are a neural network and a statistical nearest neighbor classifier. The back-propagation learning algorithm is used in training the neural network. The classifiers are trained with some noiseless images and are tested with the remaining data set. When an optimal subset of features is used, the classifiers perform almost as well as when trained with the original set of features.

15.
In applications of learning from examples to real-world tasks, feature subset selection is important to speed up training and to improve generalization performance. Ideally, an inductive algorithm should use as small a subset of features as possible. In this paper, however, the authors show that the problem of selecting the minimum subset of features is NP-hard. The paper then presents a greedy algorithm for feature subset selection. The result of running the greedy algorithm on a handwritten numeral recognition problem is also given.

16.
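An Optimal Data Allocation Algorithm Based on Multiple Paths*   Total citations: 1, self-citations: 0, citations by others: 1
Distributing video over multiple paths is an important mechanism in peer-to-peer networks. Finding several qualifying paths between a pair of nodes is not difficult, but how the sender should choose an optimal subset from the set of available paths and allocate sending rates and data optimally across it remains a hard problem. To this end, an optimal data allocation algorithm based on multiple path (ODAABMP) is proposed. First, mathematical programming theory is used to build an optimal data allocation model. ODAABMP is then derived from the model, and the optimality of its output is proved. Finally, experiments verify the effectiveness of the algorithm.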

17.
A new improved forward floating selection (IFFS) algorithm for selecting a subset of features is presented. Our proposed algorithm improves the state-of-the-art sequential forward floating selection algorithm. The improvement is to add an additional search step called "replacing the weak feature" to check whether removing any feature in the currently selected feature subset and adding a new one at each sequential step can improve the current feature subset. Our method provides the optimal or quasi-optimal (close to optimal) solutions for many selected subsets and requires significantly less computational load than optimal feature selection algorithms. Our experimental results for four different databases demonstrate that our algorithm consistently selects better subsets than other suboptimal feature selection algorithms do, especially when the original number of features of the database is large.
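The search described in the abstract above can be sketched as a forward step, a conditional backward step, and the extra "replace the weak feature" step. This is a simplified illustration, not the authors' implementation: the acceptance rule (accept a backward or replace move only if it beats the best subset of that size seen so far) is the usual floating-search convention and is an assumption here, as are the function name `iffs`, the `score` callback (higher is better), and the requirement that 1 <= `target_size` <= number of features.

```python
def iffs(features, score, target_size):
    """Simplified floating forward selection with a replace step.

    score(subset) takes a frozenset of features and returns a number.
    """
    features = list(features)
    best = {}  # subset size -> (best subset found so far, its score)

    def record_if_better(subset):
        value = score(subset)
        size = len(subset)
        if size not in best or value > best[size][1]:
            best[size] = (subset, value)
            return True
        return False

    current = frozenset()
    while len(current) < target_size:
        # Forward step: add the single feature that scores best.
        grown = max((current | {f} for f in features if f not in current),
                    key=score)
        if not record_if_better(grown):
            grown = best[len(grown)][0]  # fall back to the best of this size
        current = grown

        changed = True
        while changed and len(current) > 1:
            changed = False
            # Backward step: drop one feature if that beats the best smaller subset.
            shrunk = max((current - {f} for f in current), key=score)
            if record_if_better(shrunk):
                current, changed = shrunk, True
                continue
            # Replace step: swap one selected feature for one unselected feature.
            swaps = [(current - {old}) | {new}
                     for old in current
                     for new in features if new not in current]
            if swaps:
                swapped = max(swaps, key=score)
                if record_if_better(swapped):
                    current, changed = swapped, True
    return best[target_size][0]
```

Every accepted backward or replace move strictly improves the best score recorded for some subset size, so the loop terminates. The replace step matters when the best single feature does not belong to the best pair: plain forward selection would be stuck with it, while a swap can recover the better pair.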

18.
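Gene data are high-dimensional, small-sample, and very noisy, so processing them easily leads to the curse of dimensionality and overfitting. To address this, a new feature selection method for gene data sets is proposed. In the first step, the ReliefF algorithm screens gene features by weight importance. In the second step, the mRMR algorithm is applied to the screened feature set, retaining gene features that are highly correlated with the target class but only weakly correlated with one another. In the third step, a neighborhood rough set feature selection algorithm searches the reduced gene data set for the optimal subset of feature genes. To demonstrate the effectiveness of the new method, an SVM is used as the classifier and external cross-validation is applied to the whole process.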

19.
A key problem in computational geometry is the identification of subsets of a point set having particular properties. We study this problem for the properties of convexity and emptiness. We show that finding empty triangles is related to the problem of determining pairs of vertices that see each other in a star-shaped polygon. A linear-time algorithm for this problem, which is of independent interest, yields an optimal algorithm for finding all empty triangles. This result is then extended to an algorithm for finding empty convex r-gons (r > 3) and for determining a largest empty convex subset. Finally, extensions to higher dimensions are mentioned.

20.