首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 391 毫秒
1.
Feature selection is a challenging problem in areas such as pattern recognition, machine learning and data mining. Considering a consistency measure introduced in rough set theory, the problem of feature selection, also called attribute reduction, aims to retain the discriminatory power of original features. Many heuristic attribute reduction algorithms have been proposed however, quite often, these methods are computationally time-consuming. To overcome this shortcoming, we introduce a theoretic framework based on rough set theory, called positive approximation, which can be used to accelerate a heuristic process of attribute reduction. Based on the proposed accelerator, a general attribute reduction algorithm is designed. Through the use of the accelerator, several representative heuristic attribute reduction algorithms in rough set theory have been enhanced. Note that each of the modified algorithms can choose the same attribute reduct as its original version, and hence possesses the same classification accuracy. Experiments show that these modified algorithms outperform their original counterparts. It is worth noting that the performance of the modified algorithms becomes more visible when dealing with larger data sets.  相似文献   

2.
Rough set theory is one of the effective methods to feature selection, which can preserve the meaning of the features. The essence of rough set approach to feature selection is to find a subset of the original features. Since finding a minimal subset of the features is a NP-hard problem, it is necessary to investigate effective and efficient heuristic algorithms. Ant colony optimization (ACO) has been successfully applied to many difficult combinatorial problems like quadratic assignment, traveling salesman, scheduling, etc. It is particularly attractive for feature selection since there is no heuristic information that can guide search to the optimal minimal subset every time. However, ants can discover the best feature combinations as they traverse the graph. In this paper, we propose a new rough set approach to feature selection based on ACO, which adopts mutual information based feature significance as heuristic information. A novel feature selection algorithm is also given. Jensen and Shen proposed a ACO-based feature selection approach which starts from a random feature. Our approach starts from the feature core, which changes the complete graph to a smaller one. To verify the efficiency of our algorithm, experiments are carried out on some standard UCI datasets. The results demonstrate that our algorithm can provide efficient solution to find a minimal subset of the features.  相似文献   

3.
Neighbourhood rough set theory has proven already, as an efficient tool for knowledge discovering from heterogeneous data. However, some types of the data are incomplete and noisy in practical environments, such as signal analysis, fault diagnosis etc. To solve this problem, a universal neighbourhood rough sets model (variable precision tolerance neighbourhood rough sets [VPTNRS] model) is proposed based on a tolerance neighbourhood relation and the probabilistic theory. The proposed model can be inducing a family of much more comprehensive information granules to characterize arbitrary concepts in complex universe. In this paper, we discussed the properties of the model as well as some important relevant theorems are also introduced and proved. Furthermore, a heuristic heterogeneous feature selection algorithm is given based on the model. The experimental results with 10 choices University of California Irvine (UCI) standard data sets showed that the universal model performed well both in feature selection and classification, especially in incomplete environment.  相似文献   

4.
Feature selection plays a vital role in many areas of pattern recognition and data mining. The effective computation of feature selection is important for improving the classification performance. In rough set theory, many feature selection algorithms have been proposed to process static incomplete data. However, feature values in an incomplete data set may vary dynamically in real-world applications. For such dynamic incomplete data, a classic (non-incremental) approach of feature selection is usually computationally time-consuming. To overcome this disadvantage, we propose an incremental approach for feature selection, which can accelerate the feature selection process in dynamic incomplete data. We firstly employ an incremental manner to compute the new positive region when feature values with respect to an object set vary dynamically. Based on the calculated positive region, two efficient incremental feature selection algorithms are developed respectively for single object and multiple objects with varying feature values. Then we conduct a series of experiments with 12 UCI real data sets to evaluate the efficiency and effectiveness of our proposed algorithms. The experimental results show that the proposed algorithms compare favorably with that of applying the existing non-incremental methods.  相似文献   

5.
基于模糊粗糙集信息熵的蚁群特征选择方法   总被引:1,自引:0,他引:1  
赵军阳  张志利 《计算机应用》2009,29(1):109-111,
目前针对高维数据特征选择提出的启发式算法多数容易陷入局部最优,无法对整个特征空间进行有效搜索。为了提高对特征域的并行搜索能力,基于模糊粗糙集的信息熵原理,对蚁群模型的搜索策略、信息素更新和状态转移规则等进行了改进,提出蚁群特征选择方法。经UCI数据实验验证,该算法比传统的特征选择算法具有更好的选择效果,是有效的。  相似文献   

6.
实际应用中,数据常常表现出不完备性和动态性的特点。针对动态不完备数据中的特征选择问题,提出了一种基于相容粗糙集模型和信息熵理论的增量式特征选择方法。首先,建立了不完备信息系统中特征值动态更新时论域上条件划分与决策分类的动态更新模式,分析了作为特征重要度评价准则的不完备相容信息熵的增量计算机制,并将该机制引入到启发式最优特征子集搜索过程中特征重要度的迭代计算,进一步设计了不完备数据中面向特征值动态更新的增量式特征选择算法。最后,在标准UCI数据集上从分类精度、决策性能和计算效率3个方面对文中所提出的增量算法的有效性和高效性进行了实验验证。  相似文献   

7.
模糊粗糙集由于能够处理实数值数据,甚至是混合值数据中的不确定性受到人们的广泛关注,其最重要的应用之一是特征选择,相关的特征选择方法已有不少研究,但其快速的特征选择算法研究很少。实际中的数据一般含有噪声点或信息含量低的样例,如果对数据集先筛选出代表样例,再对筛选的样例集进行数据挖掘便会降低挖掘计算量。本文基于模糊粗糙集,先根据样例的模糊下近似值对样例进行筛选,然后利用筛选样例的模糊粗糙信息熵构造特征选择的评估度量,并给出相应的特征选择算法,从而降低了算法的计算复杂度。数值试验表明该快速算法具有有效性,并且对控制筛选样例个数的参数给出了建议。  相似文献   

8.
针对传统聚类算法中只注重数据间的距离关系,而忽视数据全局性分布结构的问题,提出一种基于EK-medoids聚类和邻域距离的特征选择方法。首先,用稀疏重构的方法计算数据样本之间的有效距离,构建基于有效距离的相似性矩阵;然后,将相似性矩阵应用到K-medoids聚类算法中,获取新的聚类中心,进而提出EK-medoids聚类算法,可有效对原始数据集进行聚类;最后,根据划分结果所构成簇的邻域距离给出确定数据集中的属性重要度定义,应用启发式搜索方法设计一种EK-medoids聚类和邻域距离的特征选择算法,降低了聚类算法的时间复杂度。实验结果表明,该算法不仅有效地提高了聚类结果的精度,而且也可选择出分类精度较高的特征子集。  相似文献   

9.
Rough set theory has been proven to be an effective tool to feature subset selection. Current research usually employ hill-climbing as search strategy to select feature subset. However, they are inadequate to find the optimal feature subset since no heuristic can guarantee optimality. Due to this, many researchers study stochastic methods. Since previous works of combination of genetic algorithm and rough set theory do not show competitive performance compared with some other stochastic methods, we propose a hybrid genetic algorithm for feature subset selection in this paper, called HGARSTAR. Different from previous works, HGARSTAR embeds a novel local search operation based on rough set theory to fine-tune the search. This aims to enhance GA’s intensification ability. Moreover, all candidates (i.e. feature subsets) generated in evolutionary process are enforced to include core features to accelerate convergence. To verify the proposed algorithm, experiments are performed on some standard UCI datasets. Experimental results demonstrate the efficiency of our algorithm.  相似文献   

10.
基于粗糙集与蚁群优化算法的特征选择方法研究*   总被引:1,自引:0,他引:1  
已有的基于蚁群优化算法的特征选择方法是从随机点出发,寻找最优的特征组合。讨论和分析了粗糙集理论中的特征核思想,结合蚁群优化算法的全局寻优特点,以特征重要度作为启发式搜索信息,提出从特征核出发基于粗糙集理论与蚁群优化的特征选择算法,简化蚁群完全图搜索的规模。在标准UCI数据集上进行测试,实验验证了新算法对于特征选择的有效性。  相似文献   

11.
孙林  赵婧  徐久成  王欣雅 《计算机应用》2022,42(5):1355-1366
针对经典的帝王蝶优化(MBO)算法不能很好地处理连续型数据,以及粗糙集模型对于大规模、高维复杂的数据处理能力不足等问题,提出了基于邻域粗糙集(NRS)和MBO的特征选择算法。首先,将局部扰动和群体划分策略与MBO算法结合,并构建传输机制以形成一种二进制MBO(BMBO)算法;其次,引入突变算子增强算法的探索能力,设计了基于突变算子的BMBO(BMBOM)算法;然后,基于NRS的邻域度构造适应度函数,并对初始化的特征子集的适应度值进行评估并排序;最后,使用BMBOM算法通过不断迭代搜索出最优特征子集,并设计了一种元启发式特征选择算法。在基准函数上评估BMBOM算法的优化性能,并在UCI数据集上评价所提出的特征选择算法的分类能力。实验结果表明,在5个基准函数上,BMBOM算法的最优值、最差值、平均值以及标准差明显优于MBO和粒子群优化(PSO)算法;在UCI数据集上,与基于粗糙集的优化特征选择算法、结合粗糙集与优化算法的特征选择算法、结合NRS与优化算法的特征选择算法、基于二进制灰狼优化的特征选择算法相比,所提特征选择算法在分类精度、所选特征数和适应度值这3个指标上表现良好,能够选择特征数少且分类精度高的最优特征子集。  相似文献   

12.
模糊决策粗糙集是决策粗糙集理论在模糊集环境下的重要延伸,然而该模型对含噪声的数据不具有很好的容忍性。为此在传统的模糊相似关系中引入一个限定阈值,提出一种改进的模糊相似关系。在其基础上对原始的模糊决策粗糙集进行重构,提出一种改进的模糊决策粗糙集模型。根据不同的特征选择方式,利用所提出的改进模型设计出两种搜索策略的最小化决策代价特征选择算法。实验分析表明,该算法比传统算法具有更高的优越性。  相似文献   

13.
数值型不完备信息系统的特征选择方法大多是以容差关系为基础,但是这种处理方式存在数据相似性刻画过于宽松的缺陷.文中提出邻域量化容差关系的粗糙集模型,在该模型的基础上定义邻域量化容差条件熵,分析相关性质,根据邻域量化容差条件熵的单调性构造相应的特征选择算法.实验表明,文中算法在特征选择结果、运行时间和分类精度方面具有优越性.  相似文献   

14.
粗糙集理论是一种有效的信息处理工具,属性约简是粗糙集理论研究的一个核心内容。为了能够较为有效地获得不相容决策表较优的属性约简,在对文献[7]中属性约简算法分析的基础上,根据不相容决策表约简不改变决策表正域的原则,仅考虑相对差异比较表中与正域相关的实例对,同时结合属性重要性作为特征选取的启发式信息,提出了一种改进的启发式属性约简算法。该算法在不增加算法时间复杂度的前提下能够处理不相容决策表。最后,通过实例完整演示了该方法,表明该算法是有效的。  相似文献   

15.
粗糙集理论是一种有效的信息处理工具,属性约简是粗糙集理论研究的一个核心内容.为了能够较为有效地获得不相容决策表较优的属性约简,在对文献[7]中属性约简算法分析的基础上,根据不相容决策表约简不改变决策表正域的原则,仅考虑相对差异比较表中与正域相关的实例对,同时结合属性重要性作为特征选取的启发式信息,提出了一种改进的启发式属性约简算法.该算法在不增加算法时间复杂度的前提下能够处理不相容决策表.最后,通过实例完整演示了该方法,表明该算法是有效的.  相似文献   

16.
The degree of malignancy in brain glioma is assessed based on magnetic resonance imaging (MRI) findings and clinical data before operation. These data contain irrelevant features, while uncertainties and missing values also exist. Rough set theory can deal with vagueness and uncertainty in data analysis, and can efficiently remove redundant information. In this paper, a rough set method is applied to predict the degree of malignancy. As feature selection can improve the classification accuracy effectively, rough set feature selection algorithms are employed to select features. The selected feature subsets are used to generate decision rules for the classification task. A rough set attribute reduction algorithm that employs a search method based on particle swarm optimization (PSO) is proposed in this paper and compared with other rough set reduction algorithms. Experimental results show that reducts found by the proposed algorithm are more efficient and can generate decision rules with better classification performance. The rough set rule-based method can achieve higher classification accuracy than other intelligent analysis methods such as neural networks, decision trees and a fuzzy rule extraction algorithm based on Fuzzy Min-Max Neural Networks (FRE-FMMNN). Moreover, the decision rules induced by rough set rule induction algorithm can reveal regular and interpretable patterns of the relations between glioma MRI features and the degree of malignancy, which are helpful for medical experts.  相似文献   

17.
作为粗糙集理论的一个核心内容,属性约简致力于根据给定的约束条件删除数据中的冗余属性。基于贪心策略的启发式算法是求解约简的一种有效手段,这一手段通常使用数据中的全部样本来度量属性的重要度从而进一步得到约简子集。但实际上,不同样本对于属性重要度计算的贡献是不同的,有些样本对重要度贡献不高甚至几乎没有贡献,且当数据中的样本数过大时,利用全部样本进行约简求解会使得时间消耗过大而难以接受。为了解决这一问题,提出了一种基于一致性样本的属性约简策略。具体算法大致由3个步骤组成,首先,将满足一致性原则的样本挑选出来;其次,将这些选中的样本组成新的决策系统;最后,利用启发式框架在新的决策系统中求解约简。实验结果表明:与基于聚类采样的属性约简算法相比,所提方法能够提供更高的分类精度。  相似文献   

18.
Solving the feature selection problem is considered an important issue when addressing data from real applications that contain a large number of features. However, not all of these features are important; therefore, the redundant features must be removed because they affect the accuracy of the data representation and introduce time complexity into the analysis of these data. For these reasons, the feature selection problem is considered an NP-complete nonlinearly constrained optimization problem. The rough set (RS) and neighborhood rough set (NRS) are the most powerful methods used to solve the feature selection problem; however, both approaches suffer from high time complexity. To avoid these limitations, we combined the RS and NRS with a new metaheuristic algorithm called the runner-root algorithm (RRA). The spirit of the RRA originated from real-life plants called running plants, which have roots and runners that spread the plants in search of minerals and water resources through their root and runner development. To validate the proposed algorithm, several UCI Machine Learning Repository datasets are used to compute the performance of our algorithm employing two effective classifiers, the random forest and the K-nearest neighbor, in addition to some other measures for the performance evaluation. The experimental results illustrate that the proposed algorithm is superior to the state-of-the-art metaheuristic algorithms in terms of the performance measures. Additionally, the NRS increases the performance of the proposed method more than the RS as an objective function.  相似文献   

19.
本文在基于粗糙集理论的最小差异表MDL上,使用增量方式构造了与MDL相类似的简单差异矩阵SDM,以SDM近似约简集为起点对属性子集空间进行前向搜索,提出了一种基于粗糙集的混合特征选择算法。该算法大大提高了特征选择的效率和准确性,适用于数据挖掘的预处理过程。  相似文献   

20.
多标签数据广泛存在于现实世界中,多标签特征选择是多标签学习中重要的预处理步骤.基于模糊粗糙集模型,研究人员已经提出了一些多标签特征选择算法,但是这些算法大多没有关注标签之间的共现特性.为了解决这一问题,基于样本标签间的共现关系评价样本在标签集下的相似关系,利用这种关系定义了特征与标签之间的模糊互信息,并结合最大相关与最小冗余原则设计了一种多标签特征选择算法LC-FS.在5个公开数据集上进行了实验,实验结果表明了所提算法的有效性.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号