Similar Literature
1.
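The influence of clustering-based data preprocessing on fuzzy decision trees   Cited by: 1 (self-citations: 1, citations by others: 0)
In fuzzy decision tree induction, the data are usually fuzzified in a preprocessing step using triangular membership functions. The center-point parameters of these functions determine the quality of the fuzzification and, in turn, the efficiency, accuracy, and size of the resulting fuzzy decision tree. Kohonen's feature-maps clustering algorithm can be used to select the center points for continuous attribute values. Experiments show that the center points it selects give the fuzzy subsets unequal coverage ranges, which represents the overlap between fuzzy concepts more reasonably. A comparison with other algorithms shows that it allows the fuzzy decision tree to reach higher classification accuracy.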
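The role of the center points can be illustrated with a minimal sketch. It assumes the centers have already been chosen (for example by a clustering step) and are distinct; the function name `triangular_memberships` and the shoulder behavior outside the outermost centers are illustrative assumptions, not details taken from the paper.

```python
def triangular_memberships(x, centers):
    """Membership degree of x in each fuzzy subset, one subset per center.

    Assumption: each triangle peaks at its own center and falls to zero at
    the neighboring centers; values beyond the outermost centers get full
    membership in the outermost subset.
    """
    centers = sorted(centers)
    degrees = []
    for i, c in enumerate(centers):
        left = centers[i - 1] if i > 0 else None
        right = centers[i + 1] if i < len(centers) - 1 else None
        if x == c:
            mu = 1.0
        elif x < c:
            # Rising edge between the left neighbor and this center.
            mu = 1.0 if left is None else max(0.0, (x - left) / (c - left))
        else:
            # Falling edge between this center and the right neighbor.
            mu = 1.0 if right is None else max(0.0, (right - x) / (right - c))
        degrees.append(mu)
    return degrees


# Unevenly spaced centers give the subsets unequal coverage ranges.
low_mid_high = triangular_memberships(7.0, [0.0, 4.0, 10.0])
```

With unevenly spaced centers such as `[0.0, 4.0, 10.0]`, the middle subset covers a wider range on its right side than on its left, which is the kind of unequal coverage the abstract describes.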

2.
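This paper examines several factors that reduce the running efficiency of existing decision tree classification algorithms, discusses how to avoid such repetition and related problems, and proposes a dataset-based decision tree classification algorithm. The algorithm builds a group of attribute statistics tables in a single scan of the data, which alleviates the problem of the AVC growing too large to fit in memory when a continuous attribute has too many values. Experiments verify the effectiveness of the improved algorithm and show that it reduces the time complexity.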

3.
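To address the tendency of the traditional C4.5 algorithm to produce redundant rules, oversized decision trees, and slow classification, an improved C4.5 decision tree algorithm based on cosine similarity is proposed. The information entropy and gain ratio of each attribute are computed. If the entropies of any two values of an attribute differ by less than a small tolerance, the cosine similarity of the two values is computed. Attribute values whose similarity falls within a threshold range are merged, the gain ratio of the merged attribute is recomputed, and the procedure then continues as in traditional C4.5. A simulation on general physical examination data from a hospital shows that the proposed algorithm effectively reduces the dimensionality of the split attributes, shrinks the decision tree, removes redundant rules, and speeds up classification.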
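The merge test described above might be sketched as follows. The abstract does not say which vectors the cosine similarity is computed on, so this sketch assumes each attribute value is represented by its vector of class counts. The function `should_merge` and the default values of `entropy_eps` and `sim_threshold` are hypothetical.

```python
import math


def entropy(counts):
    """Shannon entropy (base 2) of a class-count vector."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def should_merge(counts_a, counts_b, entropy_eps=0.05, sim_threshold=0.99):
    """Decide whether two attribute values are candidates for merging.

    Assumption: counts_a and counts_b are per-class record counts for the
    two attribute values; both thresholds are illustrative.
    """
    # Step 1: entropies must be close before similarity is even considered.
    if abs(entropy(counts_a) - entropy(counts_b)) > entropy_eps:
        return False
    # Step 2: the class distributions must also point in the same direction.
    return cosine_similarity(counts_a, counts_b) >= sim_threshold


same_shape = should_merge([30, 10], [60, 20])   # same class proportions
mirrored = should_merge([30, 10], [10, 30])     # equal entropy, opposite classes
```

The second call shows why an entropy check alone would not be enough under this assumption: the two values have identical entropy but favor opposite classes, and the similarity test keeps them apart.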

4.
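None of the existing methods for selecting the test attribute of a decision tree has been reported to integrate the handling of missing attribute values into the selection process, and the existing methods for handling missing attribute values all introduce bias to some degree. A method is therefore proposed that uses an information gain ratio based on joint entropy as the test-attribute selection criterion, so as to eliminate the influence of missing values on test-attribute selection while the tree is being built. Comparative experiments on the WEKA machine learning platform show that the improved algorithm raises overall execution efficiency and classification accuracy.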

5.
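None of the existing methods for selecting the test attribute of a decision tree has been reported to integrate the handling of missing attribute values into the selection process, and the existing methods for handling missing attribute values all introduce bias to some degree. A method is therefore proposed that uses an information gain ratio based on joint entropy as the test-attribute selection criterion, so as to eliminate the influence of missing values on test-attribute selection while the tree is being built. Comparative experiments on the WEKA machine learning platform show that the improved algorithm raises overall execution efficiency and classification accuracy.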

6.
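A discussion of rough-set-based decision tree construction   Cited by: 1 (self-citations: 0, citations by others: 1)
Yang Baohua (杨宝华). 微机发展 (Microcomputer Development), 2006, 16(8): 83-84
Decision trees are a method for classifying and predicting unknown data. The key to top-down decision tree generation is the choice of the attribute at each node. Approximation accuracy is a parameter in rough set (RS) theory that describes the degree of vagueness of an information system and characterizes a rough set precisely. Building on the classic ID3 algorithm, the paper proposes an RS-based algorithm that selects the attribute with the largest approximation accuracy as the root node and generates the branches by classification. The algorithm is computationally simple, and the classification makes both the decision tree and the rough set easier to understand.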
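Approximation accuracy is the standard rough-set ratio of the size of the lower approximation to the size of the upper approximation. A minimal sketch for a single attribute and a single target concept follows; how the paper aggregates this over several decision classes is not stated in the abstract, and the function name and toy data are hypothetical.

```python
from collections import defaultdict


def approximation_accuracy(objects, attribute, target):
    """|lower approximation| / |upper approximation| of `target`.

    objects:   dict mapping object name -> dict of attribute values
    attribute: the attribute whose values define the equivalence classes
    target:    set of object names belonging to the concept
    """
    # Group objects that are indiscernible with respect to the attribute.
    blocks = defaultdict(set)
    for name, row in objects.items():
        blocks[row[attribute]].add(name)

    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:   # block lies entirely inside the concept
            lower |= block
        if block & target:    # block overlaps the concept
            upper |= block
    # By convention an empty concept is treated as perfectly approximated.
    return len(lower) / len(upper) if upper else 1.0


toy = {
    "o1": {"color": "red", "size": "small"},
    "o2": {"color": "red", "size": "large"},
    "o3": {"color": "blue", "size": "small"},
    "o4": {"color": "blue", "size": "large"},
}
positive = {"o1", "o2"}
by_color = approximation_accuracy(toy, "color", positive)
by_size = approximation_accuracy(toy, "size", positive)
```

On this toy table, `color` describes the concept exactly while `size` leaves every object in the boundary region, so a selection rule that prefers high approximation accuracy would pick `color` for the root.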

7.
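Decision trees are a method for classifying and predicting unknown data. The key to top-down decision tree generation is the choice of the attribute at each node. Approximation accuracy is a parameter in rough set (RS) theory that describes the degree of vagueness of an information system and characterizes a rough set precisely. Building on the classic ID3 algorithm, the paper proposes an RS-based algorithm that selects the attribute with the largest approximation accuracy as the root node and generates the branches by classification. The algorithm is computationally simple, and the classification makes both the decision tree and the rough set easier to understand.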

8.
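C4.5 is a classic algorithm for generating decision trees. Although it handles noise well, its classification accuracy drops markedly when the rate of missing attribute values is high, and building the tree requires repeated scans and sorts of the data set as well as frequent logarithm evaluations. To address these shortcomings, this paper proposes an improved classification algorithm. Missing attribute values are handled with a method based on the naive Bayes theorem, which improves classification accuracy. The calculation formulas are simplified so that the four basic arithmetic operations replace the original logarithm operations, which reduces the time needed to build the tree. Experiments on five data sets from the UCI repository show that the improved algorithm greatly increases running efficiency.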

9.
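In data mining, the decision tree is a classification method. A decision tree is a tree-shaped structure in which tests are applied during mining and the nodes carry class information. Data can be classified by following the path selected by the attribute tests in the model and reading the value recorded at the leaf node as the final class. The paper describes the basic concepts of decision trees, applies decision tree algorithms to mine data, selects the best scheme among the different algorithms, and offers recommendations to practitioners.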

10.
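The paper argues that decision tree classifiers remain important among data mining techniques for classification, discusses the ID3 and C4.5 algorithms for decision tree induction, and explains the rules for selecting the decision attribute. Worked examples trace how ID3 and C4.5 operate. The results show that C4.5 is superior to ID3, and in particular that it handles attributes with many values more reasonably and correctly.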
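The difference on many-valued attributes can be seen in a small sketch comparing the ID3 criterion (information gain) with the C4.5 criterion (gain ratio, which divides the gain by the split information). The toy data and the names `record_id` and `outlook` are hypothetical and are not taken from the paper's worked examples.

```python
import math
from collections import Counter, defaultdict


def entropy(items):
    """Shannon entropy (base 2) of the value distribution in `items`."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def info_gain_and_ratio(values, labels):
    """Return (information gain, gain ratio) of splitting labels by values."""
    n = len(labels)
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    # Expected entropy of the class labels after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information: the entropy of the attribute's own values.
    split_info = entropy(values)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


labels = ["yes", "yes", "no", "no"]
record_id = [1, 2, 3, 4]          # a unique value per record
outlook = ["a", "a", "b", "b"]    # two values that separate the classes

gain_id, ratio_id = info_gain_and_ratio(record_id, labels)
gain_outlook, ratio_outlook = info_gain_and_ratio(outlook, labels)
```

Both attributes reach the same information gain, so ID3 cannot tell them apart even though `record_id` merely memorizes the records. The gain ratio penalizes `record_id` for its larger split information, which is why C4.5 prefers `outlook` here.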

11.
The Default&Refine algorithm is a new rule-based learning algorithm that was developed as an accurate and efficient pronunciation prediction mechanism for speech processing systems. The algorithm exhibits a number of attractive properties including rapid generalisation from small training sets, good asymptotic accuracy, robustness to noise in the training data, and the production of compact rule sets. We describe the Default&Refine algorithm in detail and demonstrate its performance on two benchmarked pronunciation databases (the English OALD and Flemish FONILEX pronunciation dictionaries) as well as a newly-developed Afrikaans pronunciation dictionary. We find that the algorithm learns more efficiently (achieves higher accuracy on smaller data sets) than any of the alternative pronunciation prediction algorithms considered. In addition, we demonstrate the ability of the algorithm to generate an arbitrarily small rule set in such a way that the trade-off between rule set size and accuracy is well controlled. A conceptual comparison with alternative algorithms (including Dynamically Expanding Context, Transformation-Based Learning and Pronunciation by Analogy) clarifies the competitive performance obtained with Default&Refine.

12.
Our objective is a comparison of two data mining approaches to dealing with imbalanced data sets. The first approach is based on saving the original rule set, induced by the LEM2 (Learning from Example Module) algorithm, and changing the rule strength for all rules for the smaller class (concept) during classification. In the second approach, rule induction is split: the rule set for the larger class is induced by LEM2, while the rule set for the smaller class is induced by EXPLORE, another data mining algorithm. Results of our experiments show that both approaches increase the sensitivity compared to the original LEM2. However, the difference in performance between the two approaches is statistically insignificant. Thus the appropriate approach for dealing with imbalanced data sets should be selected individually for a specific data set.

13.
The Fuzzy C-Means (FCM) algorithm is a widely used objective function-based clustering method exploited in numerous applications. In order to improve the quality of clustering algorithms, this study develops a novel approach, in which a transformed data-based FCM is developed. Two data transformation methods are proposed, with which the original data are projected in a nonlinear fashion onto a new space of the same dimensionality as the original one. Next, clustering is carried out on the transformed data. Two optimization criteria, namely a classification error and a reconstruction error, are introduced and utilized to guide the optimization of the performance of the new clustering algorithm and a transformation of the original data space. Unlike other data transformation methods that require some prior knowledge, in this study, Particle Swarm Optimization (PSO) is used to determine the optimal transformation realized on the basis of a certain performance index. Experimental studies completed for a synthetic data set and a number of data sets coming from the Machine Learning Repository demonstrate the performance of the FCM with transformed data. The experiments show that the proposed fuzzy clustering method achieves better performance (in terms of the clustering accuracy and the reconstruction error) in comparison with the outcomes produced by the generic version of the FCM algorithm.

14.
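SVDD (support vector data description) classification algorithms based on the SVM (support vector machine) suffer from high computational complexity and low classification accuracy. To address the nonlinear, noisy nature of stock data, the fast classification algorithm based on fuzzy kernel hyperspheres (FCABFKH) extends the traditional SVDD classification algorithm: it finds the set of hyperspheres by a merging method and builds the classifier on the maximum-membership principle. This eliminates outliers and the overlap among hyperspheres and avoids complex quadratic programming, giving fast classification with high accuracy. Data on companies listed on the Shanghai stock market in China are used to validate the method. The experimental results show that the portfolio return obtained with FCABFKH exceeds the market benchmark.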

15.
In recent years, a few sequential covering algorithms for classification rule discovery based on the ant colony optimization meta-heuristic (ACO) have been proposed. This paper proposes a new ACO-based classification algorithm called AntMiner-C. Its main feature is a heuristic function based on the correlation among the attributes. Other highlights include the manner in which class labels are assigned to the rules prior to their discovery, a strategy for dynamically stopping the addition of terms in a rule’s antecedent part, and a strategy for pruning redundant rules from the rule set. We study the performance of our proposed approach for twelve commonly used data sets and compare it with the original AntMiner algorithm, decision tree builder C4.5, Ripper, logistic regression technique, and a SVM. Experimental results show that the accuracy rate obtained by AntMiner-C is better than that of the compared algorithms. However, the average number of rules and average terms per rule are higher.

16.
Developing rule extraction algorithms from machine learning techniques such as artificial neural networks and support vector machines (SVMs), which are considered incomprehensible black-box models, is an important topic in current research. This study proposes a rule extraction algorithm from SVMs that uses a kernel-based clustering algorithm to integrate all support vectors and genetic algorithms into extracted rule sets. This study uses measurements of accuracy, sensitivity, specificity, coverage, fidelity and comprehensibility to evaluate the performance of the proposed method on the public credit screening data sets. Results indicate that the proposed method performs better than other rule extraction algorithms. Thus, the proposed algorithm is an essential analysis tool that can be effectively used in data mining fields.

17.
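An intelligent vectorization method for maps and engineering drawings   Cited by: 2 (self-citations: 0, citations by others: 2)
Conventional vectorization suffers from noticeable geometric distortion and weak resistance to noise. To address these shortcomings, this paper proposes an intelligent vectorization method that tracks strokes from skeleton points and uses a predefined rule base for noise filtering and shape correction. At the successive stages of vectorization, the method locates skeleton points with dynamic templates, applies criteria designed from knowledge of typical noise and distortion to filter noise and correct shapes automatically, and vectorizes with the maximum-distance method, so as to overcome typical noise and interference.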

18.
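Building on rough set theory, this work studies the data in the boundary region of a decision information system and proposes an algorithm for mining decision rules from that region: the approximate sequence decision rule mining algorithm. Tests on 16 UCI data sets show that it outperforms the ID3 algorithm on two measures, rule accuracy and average antecedent length, and that it mines all decision rules in a decision information system concisely and efficiently, offering a new approach to mining unknown knowledge. For the full set of mined decision rules, new certainty and consistency measures are proposed to reflect rule performance accurately.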

19.
Ant colony optimization (ACO) algorithms have been successfully applied in data classification, where they aim at discovering a list of classification rules. However, due to the essentially random search in ACO algorithms, the lists of classification rules constructed by ACO-based classification algorithms are not fixed and may be distinctly different even when the same training set is used. Those differences are generally ignored and some beneficial information cannot be mined from the different data sets, which may lower the predictive accuracy. To overcome this shortcoming, this paper proposes a novel classification rule discovery algorithm based on ACO, named AntMinermbc, in which a new model of multiple rule sets is presented to produce multiple lists of rules. Multiple base classifiers are built in AntMinermbc, and each base classifier is expected to remedy the weaknesses of the other base classifiers, which can improve the predictive accuracy by exploiting the useful information from various base classifiers. A new heuristic function for ACO is also designed in our algorithm, which considers both correlation and coverage in order to avoid deceptively high accuracy. The performance of our algorithm is studied experimentally on 19 publicly available data sets and further compared to several state-of-the-art classification approaches. The experimental results show that the predictive accuracy obtained by our algorithm is statistically higher than that of the compared targets.

20.
In this paper, a method for constructing a Takagi-Sugeno (TS) fuzzy system from data is proposed with the objective of preserving TS submodel comprehensibility, in which linguistic modifiers are suggested to characterize the fuzzy sets. A good property held by the proposed linguistic modifiers is that they can broaden the cores of fuzzy sets while contracting the overlaps of adjoining membership functions (MFs) during identification of fuzzy systems from data. As a result, the TS submodels identified tend to dominate the system behaviors by automatically matching the global model (GM) in corresponding subareas, which leads to good TS model interpretability while producing distinguishable input space partitioning. However, GM accuracy and model interpretability are two conflicting modeling objectives: improving the interpretability of fuzzy models generally degrades their GM performance, and vice versa. Hence, one challenging problem is how to construct a TS fuzzy model with not only good global performance but also good submodel interpretability. In order to achieve a good tradeoff between GM performance and submodel interpretability, a regularization learning algorithm is presented in which the GM objective function is combined with a local model objective function defined in terms of an extended index of fuzziness of identified MFs. Moreover, a parsimonious rule base is obtained by adopting a QR decomposition method to select the important fuzzy rules and reduce the redundant ones. Experimental studies have shown that the TS models identified by the suggested method possess good submodel interpretability and satisfactory GM performance with parsimonious rule bases.
