1.
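Classification is an important task in data mining. Many current classification algorithms require data with discrete attributes. This paper proposes a new algorithm based on composite particle swarm optimization that can classify mixed data containing both continuous and discrete attribute values. To improve classification accuracy and efficiency, the basic particle swarm uses a composite-structure encoding: the particle swarm algorithm finds candidate cut points for discretizing the continuous attributes and performs the classification, which turns mixed-data classification into a 0-1 combinatorial optimization problem. Experimental results show that the algorithm classifies well and converges quickly.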
2.
3.
4.
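Research on a Classification Association Rule Mining Algorithm Based on Rough Sets (基于粗糙集的分类关联规则挖掘算法研究) Total citations: 1, self-citations: 0, citations by others: 1
This paper presents an algorithm (CARMA) for a new classification mining system that combines attribute reduction with classification association rule mining. Using rough set theory, it partitions a relational database into equivalence classes by attribute value and removes redundant and dependent attributes. On the reduced target relation table it then finds the strong classes, whose classification support exceeds a threshold, and the strong features, whose feature confidence exceeds a threshold, and so efficiently obtains decision association rules for the strong features within the strong classes. Experimental results show that CARMA classifies data effectively, with higher classification accuracy and efficiency than other algorithms. It overcomes the redundancy, complexity, and poor scalability to large data volumes of the ID3 family of algorithms, classifies incremental data well, and has broad application prospects. The paper focuses on the algorithm itself, the system framework, and worked examples.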
5.
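A decision tree is a classification algorithm in data mining: an instance-based inductive learning algorithm for discovering patterns and rules in data. This paper introduces the definition and categories of data mining and describes the ID3 decision tree algorithm in detail. It then applies ID3 to a large sample of teaching evaluation data collected at colleges, computes the information gain of each attribute, and builds the final decision tree, which can be converted into a set of if-then rules. The resulting rules and tree are then used to analyze and predict new data. Modeling the data in this way reveals regularities and patterns and extracts valuable information, avoiding the unreasonable aspects of current teaching quality evaluation. Case verification and analysis show that the method is effective. It provides sound, scientific decision support for teaching quality evaluation, helping to raise teaching quality and improve teaching outcomes.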
6.
沈旭昌 《计算机工程与设计》2005,26(3):750-751,767
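Privacy preservation is a significant research direction in data mining. M. Kantarcioglu et al. proposed a privacy-preserving association rule mining algorithm for horizontally partitioned data. This paper explores how to run data mining algorithms on the joint sample set of two vertically distributed private databases while ensuring that neither party reveals to the other any database data unrelated to the result. For association rule mining, which is very widely used among data classification algorithms, it uses a secure two-party computation protocol to give a privacy-preserving association rule mining protocol.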
7.
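Traditional rule mining algorithms usually reduce attributes first and attribute values second. This approach involves redundant computation, and its complexity grows sharply as the sample set grows. To address this, the paper proposes a granular-computing-based algorithm for mining minimal decision rules. First, the granule relation matrix between condition granules and decision granules is computed in different granularity spaces. Then the information H1 and H2 implicit in the granule relation matrix is used as heuristic operators to reduce attribute values granule by granule. Finally, redundant attributes are removed and a termination condition is set, yielding fast mining of decision rules. Theoretical analysis and experimental results show that the proposed algorithm obtains more concise rules with stronger generalization ability.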
8.
庄卿卿 《电脑与微电子技术》2009,(5):43-46
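Decision trees are an important data mining method, commonly used to build classifiers and prediction models. ID3, the core decision tree algorithm, is widely used because it is simple and efficient. However, it tends to choose attributes with many distinct values as splitting attributes and may therefore miss attributes with strong classification power. This paper improves ID3's splitting strategy by also taking into account each attribute's class discrimination. Experimental comparison shows that the new method improves the accuracy of the decision tree and simplifies it.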
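The bias toward many-valued attributes mentioned in this abstract can be illustrated with a minimal sketch of the standard information gain measure used by ID3. The toy data and the names `weather` and `row_id` are hypothetical, and the paper's own class-discrimination correction is not reproduced here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on the attribute `values`."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: 8 records, two classes.
labels = ["yes", "yes", "yes", "no", "no", "no", "yes", "no"]
weather = ["sun", "sun", "sun", "rain", "rain", "rain", "rain", "sun"]  # 2 values
row_id = list(range(8))  # 8 distinct values, no predictive meaning

gain_weather = information_gain(weather, labels)
gain_row_id = information_gain(row_id, labels)
print(f"weather: {gain_weather:.4f}, row_id: {gain_row_id:.4f}")
```

Because every `row_id` value isolates a single record, each subset is pure and the gain equals the full class entropy, so plain ID3 would prefer this attribute even though it cannot generalize.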
9.
10.
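1 Introduction
Data mining is the non-trivial process of extracting valid, novel, potentially useful, and ultimately understandable patterns from data. The knowledge it can uncover includes association rules, characterization rules, classification rules, clustering rules, and trend rules. Data mining is an interdisciplinary field that draws on statistics, databases, artificial intelligence, data visualization, and other disciplines. Within data mining research, association rule mining has been studied particularly actively and in depth. Association rule mining aims to find relationships hidden among data items. It reveals previously unknown dependencies in the data, so that information about one data object can be used to infer information about another. References [8-13] report significant work on association rule mining: R. Agrawal et al. proposed the Apriori algorithm and the Cumulate and Stratify algorithms for mining multi-level association rules, J. S. Park et al. proposed the DHP algorithm, and J. Han et al. proposed ML_T2L1, an association rule mining algorithm oriented toward attribute induction.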
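As a rough illustration of what "finding relationships hidden among data items" means in practice, the sketch below computes the standard support and confidence measures on which Apriori-style algorithms are built. The transactions and function names are hypothetical and are not taken from the cited works.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

A rule such as bread -> milk is usually reported only when both values exceed user-chosen thresholds.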
11.
Discretisation, as one of the basic data preparation techniques, has played an important role in data mining. This article introduces a new hypercube division-based (HDD) algorithm for supervised discretisation. The algorithm considers the distribution of both class and continuous attributes and the underlying correlation structure in the data set. It tries to find a minimal set of cut points, which divides the continuous attribute space into a finite number of hypercubes, and the objects within each hypercube belong to the same decision class. Finally, tests are performed on seven mix-mode data sets, and the C5.0 algorithm is used to generate classification rules from the discretised data. Compared with the other three well-known discretisation algorithms, the HDD algorithm can generate a better discretisation scheme, which improves the accuracy of classification and reduces the number of classification rules.
12.
13.
Associative classification is a new classification approach that integrates association mining and classification. It has become a significant tool for knowledge discovery and data mining. However, high-order association mining is time consuming when the number of attributes becomes large. The recent development of the AdaBoost algorithm indicates that boosting simple rules can often achieve better classification results than using complex rules. In view of this, we apply the AdaBoost algorithm to an associative classification system to reduce learning time and improve accuracy. In addition to exploring the advantages of the boosted associative classification system, this paper also proposes a new weighting strategy for voting among multiple classifiers.
14.
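Decision trees are a common classification method in data mining. When a decision tree is built, the criterion for choosing the splitting attribute directly affects classification quality, and traditional decision tree algorithms usually rely on information-theoretic measures. Drawing on rough set theory, this paper proposes a decision tree rule extraction algorithm that uses attribute significance and dependency as the attribute selection criterion. The algorithm extracts explicit classification rules, produces a simpler structure than the traditional ID3 algorithm, and improves classification efficiency.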
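For readers unfamiliar with the rough set measures named in this abstract, here is a minimal sketch of the textbook definitions: the dependency degree is the fraction of objects whose equivalence class under the condition attributes has a single decision value, and an attribute's significance is the drop in dependency when that attribute is removed. The decision table and function names are hypothetical, and the paper's actual selection criterion may differ in detail.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Equivalence classes: indices of rows that agree on every attribute in `attrs`."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, cond_attrs, decision):
    """Share of rows lying in equivalence classes with one decision value (positive region)."""
    positive = 0
    for block in partition(rows, cond_attrs):
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)

def significance(rows, cond_attrs, attr, decision):
    """Loss of dependency when `attr` is dropped from the condition attributes."""
    rest = [a for a in cond_attrs if a != attr]
    return dependency(rows, cond_attrs, decision) - dependency(rows, rest, decision)

# Hypothetical decision table; the first and last rows conflict on purpose.
rows = [
    {"fever": "yes", "cough": "yes", "flu": "yes"},
    {"fever": "yes", "cough": "no", "flu": "yes"},
    {"fever": "no", "cough": "yes", "flu": "no"},
    {"fever": "no", "cough": "no", "flu": "no"},
    {"fever": "yes", "cough": "yes", "flu": "no"},
]
conds = ["fever", "cough"]
gamma = dependency(rows, conds, "flu")
sig_fever = significance(rows, conds, "fever", "flu")
sig_cough = significance(rows, conds, "cough", "flu")
print(gamma, sig_fever, sig_cough)
```

In this toy table `fever` is the more significant attribute, so a tree builder following such a criterion would split on it first.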
15.
16.
A multi-objective GRASP for partial classification Total citations: 4, self-citations: 1, citations by others: 3
Alan P. Reynolds, Beatriz de la Iglesia 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2009,13(3):227-243
Metaheuristic algorithms have been used successfully in a number of data mining contexts and specifically in the production
of classification rules. Classification rules describe a class of interest or a subset of this class, and as such may also
be used as an aid in prediction. The production and selection of classification rules for a particular class of the database
is often referred to as partial classification. Since partial classification rules are often evaluated according to a number
of conflicting objectives, the generation of such rules is a task that is well suited to a multi-objective (MO) metaheuristic
approach. In this paper we discuss how to adapt well known MO algorithms for the task of partial classification. Additionally,
we introduce a new MO algorithm for this task based on a greedy randomized adaptive search procedure (GRASP). GRASP has been
applied to a number of problems in combinatorial optimization, but it has very seldom been used in a MO setting, and generally
only through repeated optimization of single objective problems, using either linear combinations of the objectives or additional
constraints. The approach presented takes advantage of some specific characteristics of the data mining problem being solved,
allowing for the very effective construction of a set of solutions that form the starting point for the local search phase
of the GRASP. The resulting algorithm is guided solely by the concepts of dominance and Pareto-optimality. We present experimental
results for our partial classification GRASP and other MO metaheuristics. These show that such algorithms are generally very
well suited to this data mining task and furthermore, the GRASP brings additional efficiency to the search for partial classification
rules.
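The dominance and Pareto-optimality concepts that guide the search can be sketched as follows. This is only an illustration of the general definitions, not the authors' GRASP: the two objectives (confidence and coverage, both maximized) and the rule scores are assumptions made for the example.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Points that no other point dominates (all objectives are maximized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (confidence, coverage) scores for five candidate rules.
rules = [(0.90, 0.10), (0.80, 0.30), (0.70, 0.25), (0.60, 0.50), (0.60, 0.40)]
front = pareto_front(rules)
print(front)
```

Here the rules scoring (0.70, 0.25) and (0.60, 0.40) are discarded because another rule is at least as good on both objectives, while the remaining three represent different trade-offs.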
17.
18.
In this paper, we present a recursive algorithm for extracting classification rules from feedforward neural networks (NNs) that have been trained on data sets having both discrete and continuous attributes. The novelty of this algorithm lies in the conditions of the extracted rules: the rule conditions involving discrete attributes are disjoint from those involving continuous attributes. The algorithm starts by first generating rules with discrete attributes only to explain the classification process of the NN. If the accuracy of a rule with only discrete attributes is not satisfactory, the algorithm refines this rule by recursively generating more rules with discrete attributes not already present in the rule condition, or by generating a hyperplane involving only the continuous attributes. We show that for three real-life credit scoring data sets, the algorithm generates rules that are not only more accurate but also more comprehensible than those generated by other NN rule extraction methods.
19.
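A Classification System Based on Fuzzy Classification Association Rules (基于模糊分类关联规则的分类系统) Total citations: 9, self-citations: 0, citations by others: 9
To build a high-performance classification system, fuzzy sets are used to soften the partition boundaries of quantitative attributes, and a mining algorithm for fuzzy classification association rules is proposed. Because fuzzy sets match human reasoning well, the mined fuzzy classification association rules are easy to understand. A classification system based on these rules is then proposed and trained with a genetic optimization algorithm. Case analysis shows that the system achieves good accuracy and interpretability.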
20.
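A Data Mining Method Based on an Improved Rough Approximation Measure (基于改进粗糙逼近近似度量的数据挖掘方法) Total citations: 1, self-citations: 0, citations by others: 1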
张文宇 《计算机工程与应用》2005,41(23):203-205
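Data mining is an important problem in knowledge discovery. Rough set theory is a data mining method with fuzzy boundaries that is widely used to extract classification rules in decision systems. Building on measures of condition attribute importance in decision tables, and using a measure of how well the condition attributes approximate the partition into decision classes, the paper proposes an attribute reduction method for data mining based on an improved rough approximation measure, and verifies the soundness and feasibility of the algorithm with a worked example.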