Similar literature
Found 20 similar documents; search time: 31 ms
1.
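A Rough Set Method for Extracting Text Classification Rules Based on CHI-Value Feature Selection   Total citations: 7, self-citations: 1, citations by others: 6
王明春  王正欧  张楷  郝玺龙, Journal of Computer Applications (《计算机应用》), 2005, 25(5): 1026-1028, 1033
Taking into account the characteristics of rule extraction for text classification, a definition of approximate rules is given. The method first uses CHI values to select features and to supply feature-importance information for the next selection step, then applies rough sets to continue feature selection on the discrete decision table, and finally uses rough sets to extract exact or approximate rules. By combining CHI-value feature selection with rough set theory, the method avoids rough set attribute reduction on a large-scale decision table and also avoids discretizing the decision table. It improves the efficiency of text rule extraction and makes it more practical. Experimental results demonstrate the effectiveness and practicality of the method.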
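The abstract does not give its CHI formula, so the following is only a minimal sketch of the standard chi-square term score from a 2x2 term/category contingency table, which is the usual meaning of "CHI value" in text classification. The function names `chi_square` and `select_top_terms` and the table layout are assumptions for illustration, not the authors' implementation.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a category from a 2x2 contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table carries no evidence of association.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_top_terms(tables, k):
    """Rank terms by chi-square score and keep the k highest (hypothetical helper).

    tables maps each term to its (a, b, c, d) counts.
    """
    scored = sorted(tables.items(), key=lambda kv: chi_square(*kv[1]), reverse=True)
    return [term for term, _ in scored[:k]]
```

For example, a term that appears in all 10 documents of a category and in none of the 10 documents outside it scores 20.0, while a term spread evenly across both groups scores 0.0. The ranking also provides the kind of feature-importance information that the abstract says is passed on to the rough set step.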

2.
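孙林  赵婧  徐久成  王欣雅, Journal of Computer Applications (《计算机应用》), 2022, 42(5): 1355-1366
To address the problems that the classical monarch butterfly optimization (MBO) algorithm cannot handle continuous data well and that the rough set model lacks the capacity to process large-scale, high-dimensional, complex data, a feature selection algorithm based on neighborhood rough sets (NRS) and MBO is proposed. First, local disturbance and group division strategies are combined with the MBO algorithm, and a transfer mechanism is constructed to form a binary MBO (BMBO) algorithm. Second, a mutation operator is introduced to strengthen the algorithm's exploration ability, yielding a BMBO algorithm based on the mutation operator (BMBOM). Then, a fitness function is constructed from the neighborhood degree of NRS, and the fitness values of the initialized feature subsets are evaluated and sorted. Finally, the BMBOM algorithm searches for the optimal feature subset through repeated iteration, giving a metaheuristic feature selection algorithm. The optimization performance of BMBOM is evaluated on benchmark functions, and the classification ability of the proposed feature selection algorithm is evaluated on UCI datasets. Experimental results show that, on 5 benchmark functions, the best value, worst value, mean and standard deviation obtained by BMBOM are clearly better than those of MBO and particle swarm optimization (PSO). On the UCI datasets, compared with rough-set-based optimized feature selection algorithms, feature selection algorithms that combine rough sets with optimization algorithms, feature selection algorithms that combine NRS with optimization algorithms, and a feature selection algorithm based on binary grey wolf optimization, the proposed algorithm performs well on three metrics (classification accuracy, number of selected features and fitness value) and can select an optimal feature subset with few features and high classification accuracy.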

3.
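Rule Acquisition with a Core-Attribute Ant Colony Algorithm   Total citations: 1, self-citations: 0, citations by others: 1
The ant colony algorithm is a novel simulated evolutionary algorithm; studies have shown that it has many desirable properties, and it has found many applications in optimization. Rough set theory, a new mathematical tool for intelligent data analysis and data mining, has the main advantage that it requires no prior or additional knowledge about the data being processed. This paper studies a classification rule mining method based on rough set theory and the ant colony algorithm from two aspects: rule acquisition and rule optimization. By studying decision tables and decision rule coefficients, a knowledge theory based on rough set representation and measurement is established. Rough set theory is then fused with the ant colony algorithm: rough set theory performs attribute reduction and the ant colony algorithm obtains the optimal classification rules, so the two complement each other. Experimental comparisons show that the classification rules obtained by the algorithm have good predictive ability and a more concise representation.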

4.
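A Simplest-Rule Extraction Algorithm Based on Value Reduction and Decision Trees   Total citations: 7, self-citations: 0, citations by others: 7
罗秋瑾  陈世联, Journal of Computer Applications (《计算机应用》), 2005, 25(8): 1853-1855
Value reduction in rough set theory and decision trees in data mining are both effective classification methods, but each has its limitations. The two methods are combined to produce a new value-core-based minimization method for pruning decision trees. A criterion for judging reduced rules is proposed, which narrows the scope of reduction. Finally, the generated rules are maximized to ensure the consistency of the information they cover. Experiments verify the effectiveness of the algorithm.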

5.
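Existing knowledge discovery models for hybrid information systems mostly cover symbolic and numerical condition attributes together with symbolic decision attributes, and most of them focus on attribute reduction or feature selection; comparatively little research addresses rule extraction. A dynamic rule extraction model is built for hybrid information systems that cover more data types. First, the existing formula for the distance between attribute values is revised and a definition of distance is given for split-level attribute values, which yields a new hybrid distance. Second, three methods are proposed for inducing decision classes from numerical decision attributes. Then a generalized neighborhood rough set model is constructed, upper and lower approximations under dynamic granularity and a rule extraction algorithm are proposed, and a dynamic rule extraction model based on neighborhood granulation is built. The model can be used for rule extraction from information systems with the following characteristics: (1) the condition attribute set may include single-level symbolic, split-level symbolic, numerical, interval-valued, set-valued and unknown-type attributes; (2) the decision attribute set may include symbolic and numerical attributes. Comparative experiments were run on datasets from the UCI repository, and the classification accuracy demonstrates the effectiveness of the rule extraction algorithm.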

6.
The incremental technique is a way to handle newly added data in a dynamic database without rerunning the original algorithm. There are numerous studies of incremental rough set based approaches. However, these approaches are applied to traditional rough set based rule induction, which may generate redundant rules without focus, and they do not verify the classification of a decision table. In addition, these previous incremental approaches are not efficient in a large database. In this paper, an incremental rule-extraction algorithm based on the previous rule-extraction algorithm is proposed to resolve these aforementioned issues. With this algorithm, when a new object is added to an information system, it is unnecessary to recompute rule sets from the very beginning. The proposed approach updates rule sets by partially modifying the original rule sets, which increases efficiency. This is especially useful when extracting rules from a large database.

7.
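An Intelligent Decision Method for Image Segmentation Based on Rough Set Theory   Total citations: 4, self-citations: 0, citations by others: 4
Although many image segmentation algorithms exist today, no single method suits all images. So that an image tracking system can adaptively choose a segmentation algorithm according to image features, an intelligent decision method for image segmentation based on rough set theory is presented. The method first selects several representative segmentation algorithms to form an algorithm library and uses them to segment a variety of sample images. It then builds a decision information table from numerical features extracted from the sample images together with the optimal segmentation algorithm for each sample image, as judged by image segmentation quality evaluation criteria. Finally, rough set theory is applied to discretize the decision information table and reduce its attributes, generating decision rules for choosing the segmentation algorithm. The decision method resolves a series of difficulties in selecting segmentation algorithms in image tracking systems. Experiments show that the method can fairly effectively select the best segmentation algorithm in the library according to the features of the images being processed, and that it can meet the real-time requirements of a vehicle-mounted image tracking system.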

8.
We present a data mining method which integrates discretization, generalization and rough set feature selection. Our method reduces the data horizontally and vertically. In the first phase, discretization and generalization are integrated. Numeric attributes are discretized into a few intervals. The primitive values of symbolic attributes are replaced by high level concepts and some obviously superfluous or irrelevant symbolic attributes are also eliminated. The horizontal reduction is done by merging identical tuples after substituting an attribute value by its higher level value in a predefined concept hierarchy for symbolic attributes, or after the discretization of continuous (or numeric) attributes. This phase greatly decreases the number of tuples we consider further in the database(s). In the second phase, a novel context-sensitive feature merit measure is used to rank features, and a subset of relevant attributes is chosen based on rough set theory and the merit values of the features. A reduced table is obtained by removing those attributes which are not in the relevant attribute subset, and the data set is further reduced vertically without changing the interdependence relationships between the classes and the attributes. Finally, the tuples in the reduced relation are transformed into different knowledge rules based on different knowledge discovery algorithms. Based on these principles, a prototype knowledge discovery system DBROUGH-II has been constructed by integrating discretization, generalization, rough set feature selection and a variety of data mining algorithms. Tests on a telecommunication customer data warehouse demonstrate that different kinds of knowledge rules, such as characteristic rules, discriminant rules, maximal generalized classification rules, and data evolution regularities, can be discovered efficiently and effectively.

9.
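Based on the characteristics of medical image data, a new data mining method that combines rough sets and decision trees is proposed. The method discretizes medical image features with a rough set discretization method based on attribute importance, reduces the attributes with rough sets to obtain low-dimensional training data, and then generates decision rules with the SLIQ decision tree algorithm. Experiments show that combining rough set theory with SLIQ preserves the internal characteristics of the original data while removing redundant features that are irrelevant or only weakly related to classification, thereby improving classification accuracy and efficiency.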

10.
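刘洋  张卓  周清雷, Computer Science (《计算机科学》), 2014, 41(12): 164-167
Medical and health data usually have many attributes and are hybrid data in which continuous and discrete attributes coexist, which greatly limits the efficiency of knowledge discovery methods in mining such data. Based on fuzzy rough set theory, a method for mining classification rules from hybrid data is studied. A generalization threshold is introduced into the rule acquisition algorithm to control the size and complexity of the acquired rule set, which improves the classification efficiency of rough set knowledge discovery methods on medical and health data. Finally, comparative experiments verify the effectiveness of the algorithm in mining rules from medical decision tables.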

11.
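In multi-label learning, dimensionality reduction is an important and challenging task, and feature selection is an efficient dimensionality reduction technique. Based on neighborhood rough set theory, a label-specific feature selection method for multi-label data is proposed. The method guarantees in theory that the label-specific features obtained are strongly correlated with the corresponding labels, which improves the reduction result. First, the method uses a rough set reduction algorithm to remove redundant attributes and obtains the label-specific features while keeping classification ability unchanged. Then, based on the concepts of neighborhood accuracy and neighborhood roughness, the calculation of dependency and significance based on neighborhood rough sets is redefined and the relevant properties of the model are discussed. Finally, a multi-label label-specific feature selection model based on neighborhood rough sets is constructed and a feature selection algorithm for multi-label classification is implemented. Simulation experiments on several public datasets show that the algorithm is effective.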

12.
Attribute selection is one of the important problems encountered in pattern recognition, machine learning, data mining, and bioinformatics. It refers to the problem of selecting those input attributes or features that are most effective for predicting the sample categories. In this regard, rough set theory has been shown to be successful for selecting relevant and nonredundant attributes from a given data set. However, classical rough sets are unable to handle real-valued noisy features. This problem can be addressed by fuzzy-rough sets, which are a generalization of classical rough sets. A feature selection method based on fuzzy-rough sets is presented here that maximizes both the relevance and the significance of the selected features. This paper also presents different feature evaluation criteria such as dependency, relevance, redundancy, and significance for the attribute selection task using fuzzy-rough sets. The performance of different rough set models is compared with that of some existing feature evaluation indices based on the predictive accuracy of the nearest neighbor rule, support vector machine, and decision tree. The effectiveness of the fuzzy-rough set based attribute selection method, along with a comparison with existing feature evaluation indices and different rough set models, is demonstrated on a set of benchmark and microarray gene expression data sets.

13.
The dominance-based rough set approach is proposed as a methodology for plunge grinding process diagnosis. The process is analyzed, and its diagnosis is then treated as a multi-criteria decision making problem based on modelling the relationships between different process states and their symptoms using a set of rules induced from measured process data. The development of the diagnostic system is characterized by three phases. Firstly, the process experimental data is prepared in the form of a decision table. Using selected methods of signal processing, each process run is described by 17 process state features (condition attributes) and 5 criteria evaluating process state and results (decision attributes). The semantic correlation between all the attributes is modelled. Next, the phases of condition attribute selection and knowledge extraction are strictly integrated with the phase of model evaluation using an iterative approach. After each loop of the iterative feature selection procedure, rules are induced using the VC-DomLEM algorithm. The classification capability of the induced rules is evaluated using the leave-one-out method and a set of measures. The classification accuracy of individual models is in the range of 80.77–98.72%. The induced set of rules constitutes a classifier for the assessment of new process run cases.

14.
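In medical image classification, feature selection is a very important factor affecting classification accuracy. Given the particular nature of medical images and the unsatisfactory results of existing feature selection algorithms when applied to medical image classification, a fuzzy rough set model based on neighborhood relations is proposed. A feature selection algorithm is given based on this model and applied to breast X-ray images. Experimental results show that, compared with existing algorithms, the method selects features effectively and improves classification accuracy considerably.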

15.
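高飞  周学广  孙艳, Computer Engineering (《计算机工程》), 2012, 38(10): 63-66
Given that training sets for topic classification are small and the topics are highly similar, a topic feature extraction method based on association rules and rough sets is proposed. On the basis of the vector space model, association rule mining is used to generate rule sets and text subjects. Topics at different granularity levels are found by adjusting the minimum support and minimum confidence of the transaction subjects, and rough set theory is used to reduce the attributes of word features and association features. Experimental results show that the method can extract the comment topics described in a text collection and achieves high topic classification accuracy.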

16.
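For the problem of rule extraction in inconsistent decision systems, an algorithm for extracting consistent rules is proposed. Building on a granular computing description in the rough set setting, the membership degree of an object with respect to a target concept is defined by the inclusion degree between the condition information granule containing the object and the target concept, which extends the traditional rough approximations. A description of the algorithm for obtaining consistent rules from inconsistent decision systems is given, together with its time complexity. Comparative analysis and an illustrative example verify the effectiveness and feasibility of the algorithm.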
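The membership definition in this abstract can be illustrated with the standard rough membership function, where the inclusion degree of a granule in the target concept is taken as |granule ∩ target| / |granule|. This is only a sketch under that assumption; the abstract does not state which inclusion degree the authors use, and the names `condition_granules` and `membership` are hypothetical.

```python
from collections import defaultdict


def condition_granules(table, condition_attrs):
    """Group object ids into granules of objects with equal condition values."""
    granules = defaultdict(set)
    for obj_id, row in table.items():
        key = tuple(row[a] for a in condition_attrs)
        granules[key].add(obj_id)
    return list(granules.values())


def membership(table, condition_attrs, target):
    """Map each object to the inclusion degree of its granule in target.

    Assumed inclusion degree: |granule & target| / |granule|.
    """
    result = {}
    for granule in condition_granules(table, condition_attrs):
        degree = len(granule & target) / len(granule)
        for obj_id in granule:
            result[obj_id] = degree
    return result
```

Under this reading, the classical lower approximation is the set of objects with membership 1 and the upper approximation is the set with membership greater than 0. Objects 1 and 2 in a table where both share the same condition values but differ in decision would each get membership 0.5 for the concept containing object 1, which is exactly the inconsistency the abstract targets.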

17.
Feature selection (attribute reduction) from large-scale incomplete data is a challenging problem in areas such as pattern recognition, machine learning and data mining. In rough set theory, feature selection from incomplete data aims to retain the discriminatory power of the original features. Many feature selection algorithms have been proposed to address this issue; however, these algorithms are often computationally expensive. To overcome this shortcoming, we introduce in this paper a theoretical framework based on rough set theory, called positive approximation, which can be used to accelerate a heuristic process for feature selection from incomplete data. As an application of the proposed accelerator, a general feature selection algorithm is designed. By integrating the accelerator into a heuristic algorithm, we obtain several modified representative heuristic feature selection algorithms in rough set theory. Experiments show that these modified algorithms outperform their original counterparts. It is worth noting that the performance gain of the modified algorithms becomes more pronounced when dealing with larger data sets.

18.
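A Simplest Decision Rule Mining Algorithm Based on Rough Set Theory   Total citations: 1, self-citations: 2, citations by others: 1
钱进  孟祥萍  刘大有  叶飞跃, Control and Decision (《控制与决策》), 2007, 22(12): 1368-1372
The discernibility matrix in rough set theory is studied, the class feature matrix is extended, and a simplest decision rule algorithm based on rough set theory is proposed. The algorithm divides the original decision table into several equivalent sub-decision tables according to the decision attribute, and mines the simplest decision rules from each class feature matrix with the help of core attributes and an attribute frequency function. Compared with the discernibility matrix, the class feature matrix effectively reduces storage space and time complexity and strengthens the generalization ability of the rules. Experimental results show that the rules obtained by the proposed algorithm are more concise and efficient.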
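The abstract does not define its class feature matrix, so the sketch below covers only the standard discernibility matrix that it starts from, together with the usual way of reading core attributes off singleton entries in a consistent decision table. The names `discernibility_entries` and `core_attributes` are hypothetical, and this is not the authors' extended structure.

```python
from itertools import combinations


def discernibility_entries(rows, condition_attrs, decision_attr):
    """Yield one discernibility-matrix entry per pair of objects.

    Only pairs with different decisions are considered; each entry is the
    set of condition attributes on which the two objects differ.
    """
    for x, y in combinations(rows, 2):
        if x[decision_attr] != y[decision_attr]:
            yield {a for a in condition_attrs if x[a] != y[a]}


def core_attributes(rows, condition_attrs, decision_attr):
    """Collect attributes that occur as singleton entries of the matrix."""
    core = set()
    for entry in discernibility_entries(rows, condition_attrs, decision_attr):
        if len(entry) == 1:
            core |= entry
    return core
```

Enumerating all object pairs costs quadratic time and space in the number of objects, which is the overhead the abstract says the class feature matrix is meant to reduce.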

19.
Rough set reduction has been used as an important preprocessing tool for pattern recognition, machine learning and data mining. As the classical Pawlak rough sets can only be used to evaluate categorical features, a neighborhood rough set model is introduced to deal with numerical data sets. Three-way decision theory, proposed by Yao, comes from Pawlak rough sets and probabilistic rough sets and trades off different types of classification error in order to obtain a minimum cost ternary classifier. In this paper, we discuss reduction problems based on three-way decisions and neighborhood rough sets. First, the three-way decision reducts of positive region preservation, boundary region preservation and negative region preservation are introduced into the neighborhood rough set model. Second, three conditional entropy measures are constructed based on three-way decision regions by considering variants of neighborhood classes. The monotonic principles of the entropy measures are proved, from which we can obtain heuristic reduction algorithms in neighborhood systems. Finally, the experimental results show that the three-way decision reduction approaches are effective feature selection techniques for addressing numerical data sets.
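To make the neighborhood rough set idea concrete, here is a minimal sketch of the positive region and the resulting dependency degree for numerical data. It assumes Euclidean distance and a fixed radius `delta`, neither of which is stated in the abstract, and it does not reproduce the paper's three-way regions or entropy measures. The function names are hypothetical.

```python
import math


def neighborhood(samples, i, attrs, delta):
    """Indices of samples within Euclidean distance delta of sample i on attrs."""
    xi = samples[i]
    return [
        j for j, xj in enumerate(samples)
        if math.sqrt(sum((xi[a] - xj[a]) ** 2 for a in attrs)) <= delta
    ]


def dependency(samples, labels, attrs, delta):
    """Fraction of samples whose whole neighborhood shares their label.

    Such samples form the positive region of the decision with respect
    to the chosen attribute subset.
    """
    positive = 0
    for i in range(len(samples)):
        if all(labels[j] == labels[i] for j in neighborhood(samples, i, attrs, delta)):
            positive += 1
    return positive / len(samples)
```

With four samples `(0.0, 0.0)`, `(0.1, 0.9)`, `(0.9, 0.1)`, `(1.0, 1.0)` labelled `0, 0, 1, 1` and `delta = 0.2`, attribute 0 alone gives dependency 1.0 while attribute 1 alone gives 0.0, so a heuristic reduct search would keep attribute 0.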

20.
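To obtain more reliable classification rules when processing noisy data, a rough rule mining algorithm is proposed. Using an uncertainty measure for rough rule sets, and building on the analysis of approximate reduction under variable precision rough set theory, information entropy is introduced and a measure for decision tables in the variable precision sense is established. Using discrete particle swarm optimization, an approximate reduction algorithm for rough set knowledge based on particle swarm optimization is proposed, from which the rough rule set is derived. A worked example shows that the algorithm not only tolerates a certain degree of noise but also yields rules with high accuracy and coverage, thereby ensuring classification accuracy.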
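The noise tolerance mentioned here comes from the variable precision rough set model, in which an equivalence class is accepted into the lower approximation when most, rather than all, of its objects belong to the target concept. The sketch below assumes the inclusion-threshold formulation with `beta` in (0.5, 1]; the abstract does not give the authors' exact parameterisation, and `beta_lower_approximation` is a hypothetical name. The entropy measure and particle swarm search are not shown.

```python
def beta_lower_approximation(classes, target, beta):
    """Union of equivalence classes whose inclusion in target is at least beta.

    classes: iterable of non-empty sets partitioning the universe
    target:  set of objects forming the target concept
    beta:    inclusion threshold, assumed to lie in (0.5, 1]
    """
    lower = set()
    for cls in classes:
        if len(cls & target) / len(cls) >= beta:
            lower |= cls
    return lower
```

Setting `beta = 1.0` recovers the classical lower approximation, while a smaller value such as 0.8 lets a class with one mislabelled object out of five still support a rule.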
