首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
针对现有关联分类技术的不足,提出了一种适用于关联分类的增量更新算法IUAC。该算法是基于频繁模式树挖掘和更新关联规则的,并使用一种树形结构来存储最终用于分类的关联规则。同时,增加了对分类规则的约束条件,进一步控制了用于分类的关联规则的数量。最后,对算法整体进行了分析和讨论。  相似文献   

2.
The paper focuses on the adaptive relational association rule mining problem. Relational association rules represent a particular type of association rules which describe frequent relations that occur between the features characterizing the instances within a data set. We aim at re-mining an object set, previously mined, when the feature set characterizing the objects increases. An adaptive relational association rule method, based on the discovery of interesting relational association rules, is proposed. This method, called ARARM (Adaptive Relational Association Rule Mining) adapts the set of rules that was established by mining the data before the feature set changed, preserving the completeness. We aim to reach the result more efficiently than running the mining algorithm again from scratch on the feature-extended object set. Experiments testing the method's performance on several case studies are also reported. The obtained results highlight the efficiency of the ARARM method and confirm the potential of our proposal.  相似文献   

3.
Simple association rules (SAR) and the SAR-based rule discovery   总被引:13,自引:0,他引:13  
Association rule mining is one of the most important fields in data mining and knowledge discovery in databases. Rules explosion is a problem of concern, as conventional mining algorithms often produce too many rules for decision makers to digest. Instead, this paper concentrates on a smaller set of rules, namely, a set of simple association rules each with its consequent containing only a single attribute. Such a rule set can be used to derive all other association rules, meaning that the original rule set based on conventional algorithms can be ‘recovered’ from the simple rules without any information loss. The number of simple rules is much less than the number of all rules. Moreover, corresponding algorithms are developed such that certain forms of rules (e.g. ‘P?’ or ‘?Q’) can be generated in a more efficient manner based on simple rules.  相似文献   

4.
We present a data mining method which integrates discretization, generalization and rough set feature selection. Our method reduces the data horizontally and vertically. In the first phase, discretization and generalization are integrated. Numeric attributes are discretized into a few intervals. The primitive values of symbolic attributes are replaced by high level concepts and some obvious superfluous or irrelevant symbolic attributes are also eliminated. The horizontal reduction is done by merging identical tuples after substituting an attribute value by its higher level value in a pre- defined concept hierarchy for symbolic attributes, or the discretization of continuous (or numeric) attributes. This phase greatly decreases the number of tuples we consider further in the database(s). In the second phase, a novel context- sensitive feature merit measure is used to rank features, a subset of relevant attributes is chosen, based on rough set theory and the merit values of the features. A reduced table is obtained by removing those attributes which are not in the relevant attributes subset and the data set is further reduced vertically without changing the interdependence relationships between the classes and the attributes. Finally, the tuples in the reduced relation are transformed into different knowledge rules based on different knowledge discovery algorithms. Based on these principles, a prototype knowledge discovery system DBROUGH-II has been constructed by integrating discretization, generalization, rough set feature selection and a variety of data mining algorithms. Tests on a telecommunication customer data warehouse demonstrates that different kinds of knowledge rules, such as characteristic rules, discriminant rules, maximal generalized classification rules, and data evolution regularities, can be discovered efficiently and effectively.  相似文献   

5.
On optimal rule discovery   总被引:4,自引:0,他引:4  
In machine learning and data mining, heuristic and association rules are two dominant schemes for rule discovery. Heuristic rule discovery usually produces a small set of accurate rules, but fails to find many globally optimal rules. Association rule discovery generates all rules satisfying some constraints, but yields too many rules and is infeasible when the minimum support is small. Here, we present a unified framework for the discovery of a family of optimal rule sets and characterize the relationships with other rule-discovery schemes such as nonredundant association rule discovery. We theoretically and empirically show that optimal rule discovery is significantly more efficient than association rule discovery independent of data structure and implementation. Optimal rule discovery is an efficient alternative to association rule discovery, especially when the minimum support is low.  相似文献   

6.
7.
基于Rough集的数据挖掘模型研究   总被引:13,自引:0,他引:13  
这项工作的主要目的是表明怎样能够有效地实现基于Rough集的数据挖掘技术,在这篇论文里,我们详细讨论了Rough集理论,为了从基于Rough集的数据库中发现新的规则,研究了一种适合数据挖掘的面向对象的软件系结构,给出了数据挖掘算法、规则发现算法和规则约简算法,从初始数据库的信息出发,依次建造差别矩阵、约简表和规则表,最后给出了一个模拟实例,表明我们的模型和算法是可行的。  相似文献   

8.
现有的混合信息系统知识发现模型涵盖的数据类型大多为符号型、数值型条件属性及符号型决策属性,且大多数模型的关注点是属性约简或特征选择,针对规则提取的研究相对较少。针对涵盖更多数据类型的混合信息系统构建一个动态规则提取模型。首先修正了现有的属性值距离的计算公式,对错层型属性值的距离给出了一种定义形式,从而定义了一个新的混合距离。其次提出了针对数值型决策属性诱导决策类的3种方法。其后构造了广义邻域粗糙集模型,提出了动态粒度下的上下近似及规则提取算法,构建了基于邻域粒化的动态规则提取模型。该模型可用于具有以下特点的信息系统的规则提取: (1)条件属性集可包括单层符号型、错层符号型、数值型、区间型、集值型、未知型等; (2)决策属性集可包括符号型、数值型。利用UCI数据库中的数据集进行了对比实验,分类精度表明了规则提取算法的有效性。  相似文献   

9.
一个最优分类关联规则算法   总被引:1,自引:0,他引:1  
分类和关联规则发现是数据挖掘中的两个重要领域。使用关联规则算法挖掘分类规则被叫做分类关联规则算法,是一个有较好前景的方法。本文提出了一个最优分类关联规则算法——OCARA。该算法使用最优关联规则挖掘算法挖掘分类规则,并对最优规则集排序,从而获得一个分类精度较高的分类器。将OCARA与传统分类算法C4.5和一般分类关联规则算法CBA、RMR在8个UCI数据集上进行实验比较,结果显示OCARA具有更好的性能,证明OCARA是一个有效的分类关联规则挖掘算法。  相似文献   

10.
刘洋  张卓  周清雷 《计算机科学》2014,41(12):164-167
医疗健康数据通常属性较多,且存在连续型、离散型并存的混合数据,这在很大程度上限制了知识发现方法对医疗健康数据的挖掘效率。以模糊粗糙集理论为基础,研究混合数据上的分类规则挖掘方法,通过引入规则获取算法的泛化阈值,来控制获取规则集的大小和复杂程度,提高粗糙集知识发现方法在医疗健康数据上的分类效率。最后通过对比实验验证了该算法在医疗决策表上挖掘规则的有效性。  相似文献   

11.
新型决策树构造方法   总被引:1,自引:0,他引:1       下载免费PDF全文
决策树是一种重要的数据挖掘工具,但构造最优决策树是一个NP-完全问题。提出了一种基于关联规则挖掘的决策树构造方法。首先定义了高可信度的近似精确规则,给出了挖掘这类规则的算法;在近似精确规则的基础上产生新的属性,并讨论了新生成属性的评价方法;然后利用新生成的属性和数据本身的属性共同构造决策树;实验结果表明新的决策树构造方法具有较高的精度。  相似文献   

12.
关联规则挖掘是数据挖掘的重要领域之一,利用粗糙集理论来挖掘关联规则的方法已经得到广泛关注.针对不完备信息系统,提出了基于粗糙集理论的快速ORD关联规则挖掘算法.该算法首先采用基于粗糙集理论的属性约简算法进行属性约简,然后采用快速、高效的冗余项集和冗余规则修剪算法--ORD算法获取关联规则.将该算法与其它同类流行的算法在4个UCI数据集上进行实验比较,结果表明该算法性能良好.  相似文献   

13.
空间数据挖掘是从空间数据库中抽取隐含知识、空间关系及空间数据库中存储的其它信息的方法。空间关联规则是空间数据挖掘的一个重要研究领域,利用空间关联规则把空间数据库中的数据转化为知识是一个很好的方法。在分析空间关联规则的基础上,用基于关联规则的逐步求精挖掘算法,得出空间数据库中的隐含知识,通过实例证明其方法的可行性。  相似文献   

14.
15.
基于关联规则的空间数据知识发现及实现   总被引:4,自引:0,他引:4  
空间数据挖掘就是从空间数据库中抽取隐含知识、空间关系及空间数据库中存储的其它模式的方法。空间关联规则是空间数据挖掘的一个重要表现形式,利用空间关联规则把空间数据库中的数据转化为知识是一个很好的方法。本文在分析空间关联规则的基础上,用基于关联规则的逐步求精挖掘算法,得出空间数据库中的知识,通过实例证明其方法的可行性。  相似文献   

16.
To date, association rule mining has mainly focused on the discovery of frequent patterns. Nevertheless, it is often interesting to focus on those that do not frequently occur. Existing algorithms for mining this kind of infrequent patterns are mainly based on exhaustive search methods and can be applied only over categorical domains. In a previous work, the use of grammar-guided genetic programming for the discovery of frequent association rules was introduced, showing that this proposal was competitive in terms of scalability, expressiveness, flexibility and the ability to restrict the search space. The goal of this work is to demonstrate that this proposal is also appropriate for the discovery of rare association rules. This approach allows one to obtain solutions within specified time limits and does not require large amounts of memory, as current algorithms do. It also provides mechanisms to discard noise from the rare association rule set by applying four different and specific fitness functions, which are compared and studied in depth. Finally, this approach is compared with other existing algorithms for mining rare association rules, and an analysis of the mined rules is performed. As a result, this approach mines rare rules in a homogeneous and low execution time. The experimental study shows that this proposal obtains a small and accurate set of rules close to the size specified by the data miner.  相似文献   

17.
The purpose of the work described in this paper is to provide an intelligent intrusion detection system (IIDS) that uses two of the most popular data mining tasks, namely classification and association rules mining together for predicting different behaviors in networked computers. To achieve this, we propose a method based on iterative rule learning using a fuzzy rule-based genetic classifier. Our approach is mainly composed of two phases. First, a large number of candidate rules are generated for each class using fuzzy association rules mining, and they are pre-screened using two rule evaluation criteria in order to reduce the fuzzy rule search space. Candidate rules obtained after pre-screening are used in genetic fuzzy classifier to generate rules for the classes specified in IIDS: namely Normal, PRB-probe, DOS-denial of service, U2R-user to root and R2L-remote to local. During the next stage, boosting genetic algorithm is employed for each class to find its fuzzy rules required to classify data each time a fuzzy rule is extracted and included in the system. Boosting mechanism evaluates the weight of each data item to help the rule extraction mechanism focus more on data having relatively more weight, i.e., uncovered less by the rules extracted until the current iteration. Each extracted fuzzy rule is assigned a weight. Weighted fuzzy rules in each class are aggregated to find the vote of each class label for each data item.  相似文献   

18.
The application of certainty factors to neural computing for rulediscovery   总被引:1,自引:0,他引:1  
Discovery of domain principles has been a major long-term goal for scientists. The paper presents a system called DOMRUL for learning such principles in the form of rules. A distinctive feature of the system is the integration of the certainty factor (CF) model and a neural network. These two elements complement each other. The CF model offers the neural network better semantics and generalization advantage, and the neural network overcomes possible limitations such as inaccuracies and overcounting of evidence associated with certainty factors. It is a major contribution of the paper to show mathematically the quantizability nature of the CFNet since previously the quantizability of the CF model was demonstrated only empirically. The rule discovery system can be applied to any domain without restriction on both the rule number and rule size. In a hypothetical domain, DOMRUL discovered complex domain rules at a considerably higher accuracy than a commonly used rule-learning program C4.5 in both normal and noisy conditions. The scalability in a large domain is also shown. On a real data set concerning promoters prediction in molecular biology, DOMRUL learned rules with more complete semantics than C4.5.  相似文献   

19.
In many data mining applications, the objective is to select data cases of a target class. For example, in direct marketing, marketers want to select likely buyers of a particular product for promotion. In such applications, it is often too difficult to predict who will definitely be in the target class (e.g., the buyer class) because the data used for modeling is often very noisy and has a highly imbalanced class distribution. Traditionally, classification systems are used to solve this problem. Instead of classifying each data case to a definite class (e.g., buyer or non-buyer), a classification system is modified to produce a class probability estimate (or a score) for the data case to indicate the likelihood that the data case belongs to the target class (e.g., the buyer class). However, existing classification systems only aim to find a subset of the regularities or rules that exist in data. This subset of rules only gives a partial picture of the domain. In this paper, we show that the target selection problem can be mapped to association rule mining to provide a more powerful solution to the problem. Since association rule mining aims to find all rules in data, it is thus able to give a complete picture of the underlying relationships in the domain. The complete set of rules enables us to assign a more accurate class probability estimate to each data case. This paper proposes an effective and efficient technique to compute class probability estimates using association rules. Experiment results using public domain data and real-life application data show that in general the new technique performs markedly better than the state-of-the-art classification system C4.5, boosted C4.5, and the Naïve Bayesian system.  相似文献   

20.
一种新的关联规则挖掘方法   总被引:1,自引:0,他引:1       下载免费PDF全文
关联规则挖掘是数据挖掘的主要任务之一。为了进一步提高关联规则挖掘算法的认知特性和运算效果,提出了一种新的关联规则挖掘思想并由此构造了一种基于规则模糊认知图的关联规则挖掘算法。该算法使用规则模糊认知图进行知识表示,对每个挖掘到的关联规则进行可达模糊推理,从而减少了与数据库交互的次数。实验证明该方法与Apriori的关联规则算法相比,提高了关联规则挖掘的效率,增强了智能化程度。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号