Similar Documents
 20 similar documents found (search time: 250 ms)
1.
We present a data mining method which integrates discretization, generalization and rough set feature selection. Our method reduces the data both horizontally and vertically. In the first phase, discretization and generalization are integrated: numeric attributes are discretized into a few intervals, the primitive values of symbolic attributes are replaced by high-level concepts, and some obviously superfluous or irrelevant symbolic attributes are eliminated. The horizontal reduction is done by merging identical tuples after substituting an attribute value with its higher-level value in a pre-defined concept hierarchy for symbolic attributes, or after the discretization of continuous (numeric) attributes. This phase greatly decreases the number of tuples considered further in the database(s). In the second phase, a novel context-sensitive feature merit measure is used to rank features, and a subset of relevant attributes is chosen based on rough set theory and the merit values of the features. A reduced table is obtained by removing the attributes that are not in the relevant-attribute subset, so the data set is further reduced vertically without changing the interdependence relationships between the classes and the attributes. Finally, the tuples in the reduced relation are transformed into different knowledge rules by different knowledge discovery algorithms. Based on these principles, a prototype knowledge discovery system, DBROUGH-II, has been constructed by integrating discretization, generalization, rough set feature selection and a variety of data mining algorithms. Tests on a telecommunication customer data warehouse demonstrate that different kinds of knowledge rules, such as characteristic rules, discriminant rules, maximal generalized classification rules, and data evolution regularities, can be discovered efficiently and effectively.
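The horizontal-reduction step this abstract describes — generalize attribute values through a concept hierarchy, then merge identical tuples — can be sketched as follows. The concept hierarchy and attribute values below are hypothetical illustrations, not taken from DBROUGH-II:

```python
# Sketch of horizontal reduction: replace primitive values by their parents
# in a (hypothetical) concept hierarchy, then merge identical tuples,
# keeping a support count for each merged tuple.
from collections import Counter

# Hypothetical concept hierarchy mapping primitive values to concepts.
HIERARCHY = {"DSL": "broadband", "cable": "broadband", "dialup": "narrowband"}

def generalize_and_merge(tuples, attr_index, hierarchy):
    """Replace the value at attr_index by its high-level concept,
    then merge identical tuples with a support count."""
    generalized = []
    for t in tuples:
        t = list(t)
        t[attr_index] = hierarchy.get(t[attr_index], t[attr_index])
        generalized.append(tuple(t))
    return Counter(generalized)

rows = [("DSL", "urban"), ("cable", "urban"), ("dialup", "rural")]
merged = generalize_and_merge(rows, 0, HIERARCHY)
# The two broadband tuples collapse into one tuple with support 2.
```

The support counts preserved by the merge are what let later phases weight rules by how many original tuples they cover.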

2.
Existing knowledge discovery models for hybrid information systems mostly cover symbolic and numeric condition attributes and symbolic decision attributes, and most of them focus on attribute reduction or feature selection; research on rule extraction is comparatively scarce. This paper constructs a dynamic rule extraction model for hybrid information systems covering more data types. First, the existing formula for the distance between attribute values is revised, a definition is given for the distance between cross-level attribute values, and on this basis a new hybrid distance is defined. Second, three methods are proposed for inducing decision classes from numeric decision attributes. A generalized neighborhood rough set model is then constructed, upper and lower approximations under dynamic granularity and a rule extraction algorithm are proposed, and a dynamic rule extraction model based on neighborhood granulation is built. The model can extract rules from information systems with the following characteristics: (1) the condition attribute set may include single-level symbolic, cross-level symbolic, numeric, interval-valued, set-valued, and unknown attributes; (2) the decision attribute set may include symbolic and numeric attributes. Comparative experiments on data sets from the UCI repository were conducted, and the classification accuracies demonstrate the effectiveness of the rule extraction algorithm.

3.
This paper proposes a rough wavelet network classifier model. The procedure is as follows: rough set theory is used to acquire classification knowledge, and the rough wavelet network classifier is constructed from the training samples via attribute value discretization, attribute reduction, and value reduction. The classifier effectively overcomes the poor noise tolerance and weak rule generalization of rough-set rule matching, while also simplifying the structure of the wavelet network and speeding up its training. The steps for applying the classifier to intrusion data recognition and the simulation results are described in detail.

4.
A rough set method for extracting text classification rules based on CHI-value feature selection   Cited by: 7 (self-citations: 1, by others: 6)
王明春  王正欧  张楷  郝玺龙 《计算机应用》2005,25(5):1026-1028,1033
Combining the characteristics of text classification rule extraction, a definition of approximate rules is given. The method first uses CHI values for feature selection, which also supplies feature-importance information for the next step; rough sets then continue feature selection on the discrete decision table; finally, rough sets extract exact or approximate rules. By fully combining CHI-value feature selection with rough set theory, the method avoids both rough-set attribute reduction over a large-scale decision table and the discretization of the decision table. It improves the efficiency of text rule extraction and makes it more practical. Experimental results demonstrate the effectiveness and practicality of the method.
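The CHI-value scoring used in the first step can be illustrated with the standard chi-square statistic on a 2×2 term/class contingency table; the counts below are invented for illustration:

```python
def chi_square(A, B, C, D):
    """CHI statistic for one term and one class from a 2x2 contingency table:
    A: docs of the class containing the term, B: other docs containing it,
    C: docs of the class lacking it,      D: other docs lacking it."""
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - C * B) ** 2 / denom if denom else 0.0

# Toy corpus of 100 documents.
score_relevant = chi_square(A=40, B=5, C=10, D=45)    # term tracks the class
score_irrelevant = chi_square(A=25, B=25, C=25, D=25)  # term independent of class
# A term distributed independently of the class scores 0.
```

Ranking terms by this score and keeping the top-k is the usual CHI feature-selection step; the abstract's method feeds the surviving features into the rough-set stage.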

5.
An association rule mining method based on rough sets   Cited by: 1 (self-citations: 0, by others: 1)
This paper studies rough sets and proposes an association rule mining method based on rough set theory. The method first applies a rough-set attribute reduction algorithm over the characteristic attributes, then runs an improved Apriori algorithm on the reduced decision table to mine association rules. Its advantages are that unimportant attributes are eliminated, reducing the number of attributes and candidate itemsets, and that decision rules can be generated with a single scan of the decision table. An application example and experimental analysis show that the method is an effective and fast association rule mining method.
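The attribute reduction this method starts from rests on the rough-set positive region: an attribute is dispensable if dropping it leaves the positive region unchanged. A minimal sketch, on a toy decision table not taken from the paper:

```python
# Positive region: objects whose equivalence class under the chosen
# condition attributes has a single, unambiguous decision value.
def positive_region(rows, attrs, decision):
    """Indices of objects whose equivalence class under `attrs` is decision-pure."""
    classes = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, []).append(i)
    pos = set()
    for members in classes.values():
        if len({rows[i][decision] for i in members}) == 1:
            pos.update(members)
    return pos

# Toy decision table: columns 0-2 are condition attributes, column 3 the decision.
table = [
    ("a", "x", 1, "yes"),
    ("a", "y", 1, "no"),
    ("b", "x", 0, "yes"),
    ("b", "y", 0, "yes"),
]
full = positive_region(table, [0, 1, 2], 3)
without_2 = positive_region(table, [0, 1], 3)
# Attribute 2 is superfluous here: the positive region is unchanged without it.
```

Running Apriori only on the columns that survive this test is what shrinks the candidate itemsets in the abstract's method.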

6.
Fuzzy rough sets generalize crisp rough sets to data sets with real-valued attributes. A primary use of fuzzy rough set theory is attribute reduction for decision systems with numerical condition attribute values and crisp (symbolic) decision attributes. In this paper we define inconsistent fuzzy decision systems and their reducts, and develop discernibility-matrix-based algorithms to find the reducts. Two heuristic algorithms are then developed, and a comparison study with existing attribute reduction algorithms based on fuzzy rough sets is provided. The proposed method can deal with decision systems whose decision attributes are fuzzy rather than crisp, in addition to numerical condition attribute values. Experimental results imply that our attribute reduction algorithm with general fuzzy rough sets is feasible and valid.

7.
To address the impact of excessive redundant and irrelevant attributes on lung tumor diagnosis, and the heavy loss of original information caused by Pawlak rough sets being able to handle only discrete variables, a high-dimensional feature selection algorithm for lung tumors combining information gain and neighborhood rough sets (IG-NRS-SVM, Information gain-neighborhood rough set-support vector machine) is proposed. The algorithm first extracts 104-dimensional features from 3,000 lung tumor CT images to build a decision information table. Information gain is used to select a highly relevant feature subset, and neighborhood rough sets then remove highly redundant attributes, yielding an optimal feature subset after these two rounds of attribute reduction. Finally, a support vector machine tuned by grid search builds the classification model for distinguishing benign from malignant lung tumors. Feasibility and effectiveness are validated from both the reduction and the classification perspectives, with comparisons against no reduction, Pawlak rough sets, information gain alone, and neighborhood rough sets alone. The results show that the hybrid algorithm outperforms the compared algorithms, reaching an accuracy of 96.17% while effectively reducing time complexity, providing a useful reference for computer-aided diagnosis of lung tumors.
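The information-gain ranking used in the first reduction stage can be sketched as follows; the feature values and labels are illustrative, not the 104 CT features of the paper:

```python
# Information gain of a discrete feature: entropy of the class labels
# minus their entropy conditioned on the feature's values.
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

labels = ["malignant", "malignant", "benign", "benign"]
informative = ["high", "high", "low", "low"]   # predicts the label perfectly
uninformative = ["a", "b", "a", "b"]           # independent of the label
# IG(informative) = 1 bit, IG(uninformative) = 0 bits.
```

Features are ranked by this gain and the low-scoring ones discarded before the neighborhood-rough-set stage removes the redundant survivors.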

8.
Traditional rough set theory mainly studies attribute reduction and decision rule acquisition for single-level decision tables, yet tree-structured taxonomies of attribute values are common in practice. For condition attributes with such attribute-value taxonomies, a multi-level rough set model is proposed based on the full-subtree generalization scheme, and the properties of decision tables under generalization spaces at different levels are analyzed. Combining this with positive-region-based attribute reduction theory, the concept of attribute-value generalization reduction is proposed and its relationship to attribute reduction is discussed; computing a generalization reduct is proved to be NP-hard. A positive-region-based heuristic generalization reduction algorithm is therefore proposed; using a top-down stepwise-refinement search strategy, it generalizes all attribute values of the decision table to their best levels while keeping the positive region of the original decision table unchanged. Theoretical analysis and simulation experiments show that the generalization reduction method raises the level and generalization capability of knowledge discovery.

9.
A graph-theoretic method for discretizing decision tables with continuous attributes   Cited by: 1 (self-citations: 0, by others: 1)
By studying the relationship between rough sets and graph theory, the concept of a weighted complete multipartite multigraph with set-valued weights is proposed, and a mapping between weighted complete multipartite graphs and decision tables with continuous attributes is obtained. New quantitative definitions of cut-point significance and cut-point efficiency are given and their properties derived. A graph-theoretic formulation of decision-table information systems with continuous attributes and a graph-theoretic discretization method for such decision tables are then presented. Experimental results from an implementation show that the method yields a small cut-point set with no residual attribute values while keeping the discretized decision table consistent.

10.
Knowledge reduction based on genetic algorithms and fuzzy rough sets   Cited by: 4 (self-citations: 0, by others: 4)
朱江华  李海波  潘丰 《计算机仿真》2007,24(1):86-89,119
Although rough set theory provides a good tool for handling discrete attributes, it cannot be applied directly to data with continuous variables, which real-world data contain in abundance. To perform effective knowledge reduction on continuous attribute sets, the global optimization and parallelism of genetic algorithms are combined with fuzzy rough set theory. Compared with classical rough sets, this avoids the discretization of continuous attributes, reduces information loss, speeds up reduction, and improves decision support. A simulation example first verifies the effectiveness and speed of the algorithm; it is then applied to reducing a diesel engine fault data set, where the reduction identifies the main input variables affecting the output fault modes. This completes the data preprocessing and provides a precondition for diesel engine fault-mode diagnosis.

11.
A minimal decision rule mining algorithm based on rough set theory   Cited by: 1 (self-citations: 2, by others: 1)
钱进  孟祥萍  刘大有  叶飞跃 《控制与决策》2007,22(12):1368-1372
The discernibility matrix in rough set theory is studied, the class feature matrix is extended, and a minimal decision rule mining algorithm based on rough set theory is proposed. The algorithm partitions the original decision table into several equivalent sub-decision-tables according to the decision attribute; with the help of core attributes and an attribute frequency function, minimal decision rules are mined from each class feature matrix. Compared with the discernibility matrix, the class feature matrix effectively reduces storage space and time complexity and strengthens the generalization ability of the rules. Experimental results show that the rules obtained by the proposed algorithm are more concise and efficient.

12.
A discretization method for continuous attributes in Rough Set theory   Cited by: 95 (self-citations: 0, by others: 95)
苗夺谦 《自动化学报》2001,27(3):296-302
Rough Set (RS) theory is a new mathematical tool for handling imprecise, incomplete, and inconsistent knowledge. Traditional RS theory can only process discrete attributes, while the vast majority of real databases contain both discrete and continuous attributes. To remedy this shortcoming, a domain-independent discretization algorithm for continuous attributes based on dynamic hierarchical clustering is proposed, driven by feedback on the consistency of the decision table. The method provides a unified framework in which RS theory handles both discrete and continuous attributes, greatly broadening its range of application. Comparisons with existing methods on several examples yield encouraging results.

13.
Decision rules with rough operators and soft computing in data mining   Cited by: 28 (self-citations: 3, by others: 25)
This paper discusses decision rules and their relationship to the modus ponens rule of deductive inference. Through soft computing in data mining, the attributes of a decision table are simplified and attribute values are converted to intervals, yielding an implicit data format with broad expressive power, from which representative rules are selected and redundant or superfluous rules are deleted while preserving the original purpose and performance of the decision table. This soft computing process is illustrated through the development of an expert system for traditional Chinese medicine diagnosis, computing production rules with certainty factors by statistical or expert methods, and decision rules with rough operators by rough set methods.

14.
Data Mining in Large Databases Using Domain Generalization Graphs   Cited by: 5 (self-citations: 0, by others: 5)
Attribute-oriented generalization summarizes the information in a relational database by repeatedly replacing specific attribute values with more general concepts according to user-defined concept hierarchies. We introduce domain generalization graphs (DGGs) for controlling the generalization of a set of attributes and show how they are constructed. We then present serial and parallel versions of the Multi-Attribute Generalization algorithm for traversing the generalization state space described by joining the domain generalization graphs for multiple attributes. Based upon a generate-and-test approach, the algorithm generates all possible summaries consistent with the domain generalization graphs. Our experimental results show that significant speedups are possible by partitioning path combinations from the DGGs across multiple processors. We also rank the interestingness of the resulting summaries using measures based upon variance and relative entropy. Our experimental results further show that these measures provide an effective basis for analyzing summary data generated from relational databases. Variance appears more useful because it tends to rank the less complex summaries (i.e., those with few attributes and/or tuples) as more interesting.
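One attribute-oriented generalization step — replacing specific values with more general concepts from a hierarchy until the attribute is coarse enough — can be sketched as follows; the hierarchy is an invented example, not a DGG from the paper:

```python
# Climb a (hypothetical) concept hierarchy until the attribute has at
# most `threshold` distinct values, as in attribute-oriented induction.
PARENT = {"Toronto": "Ontario", "Ottawa": "Ontario",
          "Vancouver": "British Columbia",
          "Ontario": "Canada", "British Columbia": "Canada"}

def generalize(values, parent, threshold):
    values = list(values)
    while len(set(values)) > threshold and any(v in parent for v in values):
        values = [parent.get(v, v) for v in values]
    return values

cities = ["Toronto", "Ottawa", "Vancouver"]
# threshold 2 stops at the province level; threshold 1 climbs to the country level.
```

A DGG generalizes this single chain to a graph of alternative generalization paths per attribute, which is what the Multi-Attribute Generalization algorithm traverses.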

15.
When symbolic AI approaches are applied to handle continuous-valued attributes, the continuous attribute values must be transformed into symbolic data. In this paper, a novel distribution-index-based discretizer is proposed for such a transformation. Based on definitions of dichotomic entropy and a compound distributional index, a simple criterion is applied to discretize continuous attributes adaptively. The dichotomic entropy indicates the homogeneity degree of the decision value distribution, and is applied to determine the best splitting point. The compound distributional index combines the homogeneity degrees of both the attribute value distributions and the decision value distribution, and is applied to determine which interval should be split further; thus, a potentially improved solution of the discretization problem can be found efficiently. Based on multiple reducts in rough set theory, a multiknowledge approach can attain high decision accuracy for information systems with a large number of attributes and missing values. In this paper, our discretizer is combined with the multiknowledge approach to further improve decision accuracy for information systems with continuous attributes. Experimental results on benchmark data sets show that the new discretizer can improve not only the multiknowledge approach, but also the naive Bayes classifier and the C5.0 decision tree.
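The entropy-guided choice of the best splitting point can be sketched with a plain size-weighted class entropy of the two candidate intervals; this is a simplification of the paper's dichotomic entropy, and the data are invented:

```python
# Choose the midpoint cut minimizing the size-weighted class entropy
# of the two intervals it induces.
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_cut(points):
    """points: (value, label) pairs sorted by value; returns the best midpoint cut."""
    n = len(points)
    labels = [y for _, y in points]
    best = None
    for i in range(1, n):
        if points[i - 1][0] == points[i][0]:
            continue  # no cut between equal values
        e = (i / n) * entropy(labels[:i]) + ((n - i) / n) * entropy(labels[i:])
        cut = (points[i - 1][0] + points[i][0]) / 2
        if best is None or e < best[0]:
            best = (e, cut)
    return best[1]

data = [(1.0, "a"), (2.0, "a"), (3.0, "b"), (4.0, "b")]
# The cleanest cut separates the two classes at 2.5.
```

The paper's compound distributional index then decides which of the resulting intervals deserves a further split, rather than splitting greedily everywhere.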

16.
Relief is a measure of attribute quality which is often used for feature subset selection. However, its use in the induction of classification trees and rules, in discretization, and in other methods has been hindered by its inability to suggest subsets of values of discrete attributes and thresholds for splitting continuous attributes into intervals. We present efficient algorithms for both tasks.
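The core Relief weight update (nearest hit versus nearest miss) can be sketched as follows; this is the basic numeric-attribute variant on toy data, not the extended algorithms the paper contributes:

```python
# Basic Relief: a feature's weight rises when it differs from the
# instance's nearest miss and falls when it differs from its nearest hit.
def relief(X, y, ranges):
    n, d = len(X), len(X[0])
    diff = lambda f, a, b: abs(a[f] - b[f]) / ranges[f]
    dist = lambda a, b: sum(diff(f, a, b) for f in range(d))
    w = [0.0] * d
    for i, x in enumerate(X):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(x, X[j]))
        miss = min(misses, key=lambda j: dist(x, X[j]))
        for f in range(d):
            w[f] += (diff(f, x, X[miss]) - diff(f, x, X[hit])) / n
    return w

# Feature 0 determines the class; feature 1 is noise.
X = [(0.0, 0.3), (0.1, 0.9), (1.0, 0.2), (0.9, 0.8)]
y = [0, 0, 1, 1]
weights = relief(X, y, ranges=[1.0, 1.0])
# weights[0] should clearly exceed weights[1].
```

The limitation the paper addresses is visible here: the update scores whole attributes, saying nothing about which value subsets or numeric thresholds to split on.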

17.
This paper presents a hybrid soft computing modeling approach: a neurofuzzy system based on rough set theory and genetic algorithms (GA). To address the curse-of-dimensionality problem in neurofuzzy systems, rough sets are used to obtain a reduced fuzzy rule set, decreasing both the number of condition attributes and the number of rules. A genetic algorithm is used to obtain the optimal discretization of continuous attributes. The fuzzy system is then represented as an equivalent artificial neural network (ANN). Because the initial parameters of the ANN are reasonable, its training converges quickly; after the rules are reduced, the structure of the ANN is small and not fully weight-connected. The neurofuzzy approach based on RST and GA has been applied to the practical task of building a soft-sensor model for estimating the freezing point of light diesel fuel in a fluid catalytic cracking unit.

18.
The dominance-based rough set approach is proposed as a methodology for plunge grinding process diagnosis. The process is analyzed, and its diagnosis is then treated as a multi-criteria decision making problem based on modelling the relationships between different process states and their symptoms with a set of rules induced from measured process data. The development of the diagnostic system comprises three phases. First, the experimental process data are prepared in the form of a decision table: using selected signal processing methods, each process run is described by 17 process state features (condition attributes) and 5 criteria evaluating the process state and results (decision attributes), and the semantic correlation between all attributes is modelled. Next, condition attribute selection and knowledge extraction are tightly integrated with model evaluation in an iterative approach: after each loop of the iterative feature selection procedure, rules are induced with the VC-DomLEM algorithm, and the classification capability of the induced rules is assessed with the leave-one-out method and a set of measures. The classification accuracy of the individual models is in the range 80.77–98.72%. The induced set of rules constitutes a classifier for assessing new process run cases.

19.
Intrusion detection data often contain many redundant and noisy features as well as continuous attributes. To improve network intrusion detection, neighborhood rough sets are used to perform attribute reduction on the intrusion detection data set, eliminating redundant attributes and noise while avoiding the information loss that classical rough sets incur when discretizing continuous attributes. Particle swarm optimization is then used to tune the kernel and penalty parameters of a support vector machine, avoiding the low accuracy risked by subjective parameter choices and further improving detection performance. Simulation results show that the algorithm effectively improves intrusion detection accuracy and has good generalization and stability.

20.
The real world contains massive amounts of incomplete, fuzzy, and imprecise data and objects, which has made fuzzy information granulation a research trend in recent years. Using a fuzzy equivalence relation on the universe, the fuzzy knowledge granularity of a fuzzy granular world is defined, and new attribute reduction conditions and a core-attribute computation method are given, so as to better mine potentially valuable information. To address the information loss that rough sets cause when reducing continuous attributes and their inability to handle fuzzy attributes, a heuristic reduction algorithm for hybrid decision systems based on fuzzy knowledge granularity is proposed. It dispenses with the discretization of continuous attributes, reduces computation, and provides a unified reduction method for discrete and hybrid value domains. An example finally verifies its effectiveness.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号