Found 20 similar articles (search time: 15 ms)
Khiops: A Statistical Discretization Method of Continuous Attributes   Cited by: 6 (self: 1, others: 6)
In supervised machine learning, some algorithms are restricted to discrete data and have to discretize continuous attributes. Many discretization methods, based on statistical criteria, information content, or other specialized criteria, have been studied in the past. In this paper, we propose the discretization method Khiops, based on the chi-square statistic. In contrast with related methods ChiMerge and ChiSplit, this method optimizes the chi-square criterion in a global manner on the whole discretization domain and does not require any stopping criterion. A theoretical study followed by experiments demonstrates the robustness and the good predictive performance of the method.
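
As a rough illustration of the chi-square criterion that this family of methods relies on, the sketch below computes the Pearson chi-square statistic for the class counts of two adjacent intervals. It is not the Khiops algorithm itself (the abstract does not give its details); the function name `chi_square` and the list-of-labels input format are assumptions made for this example.

```python
from collections import Counter


def chi_square(left_labels, right_labels):
    """Pearson chi-square statistic for the 2 x k table of class counts
    observed in two adjacent intervals (hypothetical helper, not Khiops)."""
    rows = [Counter(left_labels), Counter(right_labels)]
    classes = set(rows[0]) | set(rows[1])
    total = len(left_labels) + len(right_labels)
    if total == 0:
        return 0.0
    stat = 0.0
    for row in rows:
        row_total = sum(row.values())
        for c in classes:
            col_total = rows[0][c] + rows[1][c]
            expected = row_total * col_total / total
            if expected > 0:  # skip cells that carry no expected mass
                stat += (row[c] - expected) ** 2 / expected
    return stat
```

A low value suggests that the two intervals have similar class distributions and are candidates for merging; a high value suggests that the boundary between them separates the classes well.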

Our objective is a comparison of two data mining approaches to dealing with imbalanced data sets. The first approach is based on saving the original rule set, induced by the LEM2 (Learning from Example Module) algorithm, and changing the rule strength for all rules for the smaller class (concept) during classification. In the second approach, rule induction is split: the rule set for the larger class is induced by LEM2, while the rule set for the smaller class is induced by EXPLORE, another data mining algorithm. Results of our experiments show that both approaches increase the sensitivity compared to the original LEM2. However, the difference in performance between the two approaches is statistically insignificant. Thus the appropriate approach for dealing with imbalanced data sets should be selected individually for a specific data set.

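Liu Yang, Zhang Zhuo, Zhou Qinglei, 《计算机科学》 (Computer Science), 2014, 41(12): 164-167
Medical and health data usually have many attributes and mix continuous and discrete values, which greatly limits the efficiency of knowledge discovery methods on such data. Based on fuzzy rough set theory, this paper studies classification rule mining on mixed data. A generalization threshold is introduced into the rule acquisition algorithm to control the size and complexity of the acquired rule set, improving the classification efficiency of rough-set knowledge discovery on medical and health data. Finally, comparative experiments verify the effectiveness of the algorithm for mining rules from medical decision tables.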

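The rough set method is an important tool for handling uncertain or vague knowledge. Building on existing default rule mining algorithms based on rough set theory, this paper extends the concept of single-attribute information gain to the multi-attribute case and proposes an information-gain-based search strategy and mining method for default rules. Experiments show that the method discovers concise, understandable, and practical rules while keeping computational complexity low.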
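
One plausible reading of multi-attribute information gain is to treat a set of attributes as a single compound attribute whose value is the tuple of the individual values. The sketch below follows that reading; it is an assumption for illustration, not the paper's definition, and the names `entropy` and `information_gain` as well as the list-of-dicts table format are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attrs, decision):
    """Information gain of a SET of attributes with respect to the decision.

    rows: list of dicts mapping attribute name -> value.
    attrs: sequence of condition attribute names, treated jointly.
    """
    groups = defaultdict(list)
    for row in rows:
        # Objects that agree on every attribute in attrs fall into one block.
        groups[tuple(row[a] for a in attrs)].append(row[decision])
    n = len(rows)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([row[decision] for row in rows]) - conditional
```

With a single attribute in `attrs` this reduces to the usual ID3-style information gain. On an XOR-like table, each attribute alone has zero gain while the pair has a gain of one bit, which is the kind of case a multi-attribute measure is meant to capture.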

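Research and Application of Association Rule Mining Based on Rough Set Theory   Cited by: 2 (self: 0, others: 2)
An association rule algorithm based on rough set theory is proposed. Rough set theory is used to preprocess the data, and attribute constraints are used to avoid mining useless association rules. The mined association rules are classification rules and can be used to classify unseen data. Rule filtering removes redundant rules so that only essential, general rules are kept. Experiments analyzing network security audit data show that the method is effective.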

This paper addresses an important problem related to the use of induction systems in analyzing real-world data. The problem is the quality and reliability of the rules generated by the systems. We discuss the significance of having a reliable and efficient rule quality measure. Such a measure can provide useful support in interpreting, ranking and applying the rules generated by an induction system. A number of rule quality and statistical measures are selected from the literature and their performance is evaluated on four sets of semiconductor data. The primary goal of this testing and evaluation has been to investigate the performance of these quality measures based on: (i) accuracy, (ii) coverage, (iii) positive error ratio, and (iv) negative error ratio of the rule selected by each measure. Moreover, the sensitivity of these quality measures to different data distributions is examined. In conclusion, we recommend Cohen's statistic as being the best quality measure examined for the domain. Finally, we explain some future work to be done in this area.

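A New Method for Discretization of Continuous Attributes   Cited by: 6 (self: 0, others: 6)
A method for discretizing continuous attributes is proposed that is based on clustering and combined with rough set theory. Rough set theory has an important concept, attribute significance, which is often used as the heuristic evaluation function for generating good reducts. Inspired by this, attribute significance can be used for attribute selection during discretization: the attribute with the highest significance is selected from the set of already discretized attributes and is then clustered together with the continuous attribute to be discretized, yielding the discrete intervals of that continuous attribute. The paper describes the algorithm and compares it experimentally with other algorithms. The results show that, because the method incorporates rough set ideas into discretization and takes the interaction between attributes into account, it produces more reasonable cut points and improves the classification accuracy of the rules.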

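IT Project Risk Rule Mining Based on Rough Sets and Bayesian Theory   Cited by: 2 (self: 0, others: 2)
Because risk decision-making in IT projects involves a large amount of uncertain and incomplete information, this paper introduces Bayesian theory into IT project risk management on the basis of traditional rough sets. It proposes algorithms for obtaining rule support, the certainty factor, and the coverage factor, builds a risk rule mining model that combines rough sets with Bayesian theory, and analyzes the model in detail through a worked example.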
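
The three quantities named in this abstract have standard definitions in the rough set literature, which the sketch below follows: support is the number of objects matching both sides of a rule, the certainty factor divides that by the number matching the condition, and the coverage factor divides it by the number matching the decision. Whether the paper uses exactly these definitions is an assumption; the function name `rule_factors`, the table format, and the sample attributes are hypothetical.

```python
def rule_factors(rows, condition, decision):
    """Return (support, certainty, coverage) of the rule condition -> decision.

    rows: list of dicts mapping attribute name -> value.
    condition, decision: dicts mapping attribute name -> required value.
    """
    def matches(row, pattern):
        return all(row[a] == v for a, v in pattern.items())

    cond = [r for r in rows if matches(r, condition)]
    dec = [r for r in rows if matches(r, decision)]
    support = sum(1 for r in cond if matches(r, decision))
    certainty = support / len(cond) if cond else 0.0  # estimate of P(decision | condition)
    coverage = support / len(dec) if dec else 0.0     # estimate of P(condition | decision)
    return support, certainty, coverage


# Hypothetical toy decision table for illustration only.
projects = [
    {"budget": "over", "risk": "high"},
    {"budget": "over", "risk": "high"},
    {"budget": "over", "risk": "low"},
    {"budget": "ok", "risk": "high"},
    {"budget": "ok", "risk": "low"},
]
support, certainty, coverage = rule_factors(
    projects, {"budget": "over"}, {"risk": "high"}
)
```

Read as conditional probabilities, certainty and coverage are related through Bayes' theorem, which is the usual bridge between rough set decision rules and Bayesian reasoning.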

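This article introduces techniques commonly used in data mining and the structure of a data warehouse, and discusses the application of data mining techniques such as the rough set method, the decision tree method, and the association rule method to mining insurance risk rules.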

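To address the "sharp boundary" problem in traditional data mining, fuzzy theory is combined with association rule mining. Building on an improved version of the traditional Apriori algorithm and on multi-level association rule mining, a fuzzy multi-level association rule mining algorithm is proposed. The basic concepts of fuzzy multi-level association rule mining are defined and the algorithm is described in detail. Finally, the algorithm is implemented in Visual FoxPro 6.0, and mining experiments on a transaction database show that it is effective.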

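Data Preprocessing Based on Rough Set   Cited by: 2 (self: 0, others: 2)
Rough set theory is a new mathematical tool for handling imprecise, incomplete, and inconsistent knowledge. Data preprocessing is an indispensable step in data mining, and its results directly affect the subsequent mining. This paper uses properties of rough sets to process the KDD99 data set and, in view of the characteristics of the data set, implements data discretization, attribute reduction, and related processing. These steps lay the foundation for the subsequent data mining.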

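A method combining the basic theories of information theory and set theory is proposed for discovering classification rule knowledge from databases. The method discovers knowledge quickly, and the discovered knowledge is concise and reliable.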

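Research on Heuristic Knowledge Acquisition Methods   Cited by: 3 (self: 0, others: 3)
Inductive learning is an effective way to acquire knowledge automatically. To address problems in the ID3 algorithm, rough-set-based inductive learning, and some other inductive learning methods, a new inductive learning algorithm, ITIL, is proposed. The algorithm uses information gain as its heuristic, selects as few important attributes or attribute combinations as possible, and extracts rules on the basis of discernibility. Many examples show that these rules are simple and have little redundancy. As part of a knowledge acquisition module, ITIL has been integrated into the dynamic knowledge base subsystem of a "knowledge-discovery-based medical diagnosis support system".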

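To overcome the limitation that rough set theory can handle only discrete data, a decision-based peeling method for discretizing continuous attributes is proposed. Departing from the traditional way of obtaining the candidate cut point set, it performs an initial discretization directly by analyzing the value range of each continuous attribute within each decision class and by computing attribute significance. In addition, a shifting principle for the candidate cut point set is proposed, which gradually narrows the range of candidate cut points. Because each step refines only the samples that cannot yet be classified unambiguously, the system converges quickly as the candidate cut point set shrinks and the number of unambiguously classified samples grows. The discretized decision table is always consistent, which, compared with the many current discretization methods that ignore decision consistency, preserves the useful information in the system to the greatest extent. The proposed method is domain independent, requires no domain knowledge, and can be applied to discretizing continuous attributes in different domains.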

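Rough-Set-Based Extraction of Flight Data Pattern Features   Cited by: 4 (self: 0, others: 4)
Xie Chuan, Ni Shihong, Zhang Zonglin, 《计算机工程》 (Computer Engineering), 2005, 31(12): 169-171
To address the difficulty of acquiring knowledge for expert systems, a method is proposed that uses rough set theory to extract pattern features from flight data. A self-organizing map is used to discretize the attributes, a genetic algorithm is used for attribute reduction, and finally the expert's prior domain knowledge is introduced into the processed results. Experimental results show that the method is feasible.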

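Comparison of Three Discernibility Matrices   Cited by: 8 (self: 0, others: 8)
The discernibility matrix is one of the important concepts in rough set theory; it can be used to compute the core and the reducts of a decision table. There are currently several ways to define a discernibility matrix, and the reason for these multiple definitions is the inconsistency of decision tables. This paper analyzes the relationship between consistent and inconsistent decision tables, gives a method for converting an inconsistent decision table into a consistent one, and gives an equivalence definition for discernibility matrices. On this basis, the relationships among the three discernibility matrices are discussed and proved. The results show that, with this conversion method and the equivalence definition, the three discernibility matrices can be unified, so that a single method can be used to construct the discernibility matrix in practical applications.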
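
For orientation, the sketch below builds the classic discernibility matrix of a consistent decision table and reads the core off its singleton entries. The abstract does not spell out the three definitions it compares, so this shows only the textbook variant under the assumption that the table is consistent; the names `discernibility_matrix` and `core` and the list-of-dicts table format are hypothetical.

```python
from itertools import combinations


def discernibility_matrix(rows, cond_attrs, decision):
    """Map each object pair (i, j) with different decision values to the
    set of condition attributes on which the two objects differ."""
    matrix = {}
    for (i, x), (j, y) in combinations(enumerate(rows), 2):
        if x[decision] != y[decision]:
            matrix[(i, j)] = frozenset(a for a in cond_attrs if x[a] != y[a])
    return matrix


def core(matrix):
    """Attributes that appear as singleton entries: removing any of them
    would make some pair of objects indiscernible."""
    return {next(iter(entry)) for entry in matrix.values() if len(entry) == 1}


# Hypothetical consistent decision table for illustration only.
table = [
    {"a": 0, "b": 0, "d": 0},
    {"a": 0, "b": 1, "d": 1},
    {"a": 1, "b": 1, "d": 1},
]
m = discernibility_matrix(table, ["a", "b"], "d")
```

On an inconsistent table, two objects can agree on every condition attribute yet differ in the decision, which produces an empty entry; handling that case is where the competing definitions diverge.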

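Research on Obtaining Minimal Decision Rules in Rough Set Theory   Cited by: 1 (self: 0, others: 1)
This paper discusses how to obtain minimal decision rules in rough set theory. It proposes a definition of decision dependency degree and attempts to extract as many decision rules as possible from the shortest combinations of condition attributes. The length of the decision rules is increased only when rules of the current length cannot cover all samples. Three schemes for reducing computational complexity are also proposed: 1) introducing a jump coefficient λ; 2) partitioning into equivalence classes only the samples that share the same decision value, thereby avoiding useless partitioning of equivalence classes that contain different decision values; 3) designing a Remain set and partitioning into equivalence classes only the samples in it, so that the amount of computation drops sharply as the number of samples in Remain decreases. In addition, the proposed jump-style decision rule extraction method based on decision dependency degree can be applied directly to incomplete information systems and therefore has good practical value.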

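Rough-Set-Based Mining of Association Rules with a Conclusion Domain   Cited by: 2 (self: 0, others: 2)
This paper builds a rough set (RS) based model for mining strong association rules with a conclusion domain. It uses a reduced decision table and an improved Apriori algorithm to mine association rules, which improves both the efficiency and the quality of the mining, and it proposes and implements a solution for mining association rules with a conclusion domain.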

A Bayesian Method for the Induction of Probabilistic Networks from Data   Cited by: 108 (self: 3, others: 108)
This paper presents a Bayesian method for constructing probabilistic networks from databases. In particular, we focus on constructing Bayesian belief networks. Potential applications include computer-assisted hypothesis testing, automated scientific discovery, and automated construction of probabilistic expert systems. We extend the basic method to handle missing data and hidden (latent) variables. We show how to perform probabilistic inference by averaging over the inferences of multiple belief networks. Results are presented of a preliminary evaluation of an algorithm for constructing a belief network from a database of cases. Finally, we relate the methods in this paper to previous work, and we discuss open problems.

Artificial Intelligence (AI)-based rule induction techniques such as IXL and ID3 are powerful tools that can be used to classify firms as acquisition candidates or not, based on financial and other data. The purpose of this paper is to develop an expert system that employs uncertainty representation and predicts acquisition targets. In this paper we outline the features of IXL, a machine learning technique that we use to induce rules. We also discuss how uncertainty is handled by IXL and describe the use of confidence factors. Rules generated by IXL are incorporated into a prototype expert system, ACQTARGET, which evaluates corporate acquisitions. The use of confidence factors in ACQTARGET allows investors to specifically incorporate uncertainties into the decision-making process. A set of training examples comprising 65 acquired and 65 non-acquired real-world firms is used to generate the rules, and a separate holdout sample containing 32 acquired and 32 non-acquired real-world firms is used to validate the expert system results. The performance of the expert system is also compared with a conventional discriminant analysis model and a logit model using the same data. The results show that the expert system, ACQTARGET, performs as well as the statistical models and is a useful evaluation tool to classify firms into acquisition and non-acquisition target categories. This rule induction technique can be a valuable decision aid to help financial analysts and investors in their buy/sell decisions.
