Similar Articles
20 similar articles found.
1.
In a recent paper, Toloo et al. [Toloo, M., Sohrabi, B., & Nalchigar, S. (2009). A new method for ranking discovered rules from data mining by DEA. Expert Systems with Applications, 36, 8503–8508] proposed a new integrated data envelopment analysis (DEA) model to find the most efficient association rule in data mining. Using this model, they then developed an algorithm for ranking association rules by considering multiple criteria. In this paper, we show that their model selects only one efficient association rule, chosen by chance, and that the selection depends entirely on the solution method or software used to solve the problem. In addition, we show that their proposed algorithm can rank efficient rules only randomly and fails to rank inefficient decision-making units (DMUs). We also point out some other drawbacks of that paper and propose another approach to establish a full ranking of the association rules. A numerical example illustrates the main points of the paper.

2.
Association rule mining can provide genuine insight into the data being analysed; however, rule sets can be extremely large, and therefore difficult and time-consuming for the user to interpret. We propose reducing the size of Apriori rule sets by removing overlapping rules, and compare this approach with two standard methods for reducing rule set size: increasing the minimum confidence parameter and increasing the minimum antecedent support parameter. We evaluate the rule sets in terms of confidence and coverage, as well as two rule interestingness measures that favour rules whose antecedent conditions are poor individual predictors of the target class, since we assume that these represent potentially interesting rules. We also examine the distribution of the rules graphically, to assess whether particular classes of rules are eliminated. We show that removing overlapping rules substantially reduces rule set size in most cases, and alters the character of a rule set less than using the standard parameters to constrain the rule set to the same size. Based on our results, we aim to extend the Apriori algorithm to incorporate the suppression of overlapping rules.
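As a rough illustration of the two standard size-reduction knobs named above, the Python sketch below computes antecedent support and confidence for class rules and filters a rule set by minimum confidence or minimum antecedent support. The record format, the toy data, and the definition of coverage as the fraction of records matched by at least one rule are our own assumptions; removal of overlapping rules is not shown, because the abstract does not define overlap.

```python
from typing import FrozenSet, List, NamedTuple, Tuple

# Assumed record format: (set of attribute=value items, class label).
Record = Tuple[FrozenSet[str], str]


class Rule(NamedTuple):
    antecedent: FrozenSet[str]  # conditions that must all hold
    consequent: str             # predicted target class


def antecedent_support(rule: Rule, records: List[Record]) -> float:
    """Fraction of records that satisfy every antecedent condition."""
    hits = sum(1 for items, _ in records if rule.antecedent <= items)
    return hits / len(records)


def confidence(rule: Rule, records: List[Record]) -> float:
    """Fraction of covered records whose class matches the consequent."""
    covered = [label for items, label in records if rule.antecedent <= items]
    if not covered:
        return 0.0
    return sum(1 for label in covered if label == rule.consequent) / len(covered)


def coverage(rules: List[Rule], records: List[Record]) -> float:
    """Fraction of records matched by at least one rule in the set (assumed definition)."""
    matched = sum(1 for items, _ in records if any(r.antecedent <= items for r in rules))
    return matched / len(records)


def constrain(rules, records, min_conf=0.0, min_ante_support=0.0):
    """The two standard knobs: raising either threshold drops rules."""
    return [r for r in rules
            if confidence(r, records) >= min_conf
            and antecedent_support(r, records) >= min_ante_support]


# Hypothetical toy data.
records = [
    (frozenset({"a", "b"}), "yes"),
    (frozenset({"a"}), "no"),
    (frozenset({"a", "b", "c"}), "yes"),
    (frozenset({"c"}), "no"),
]
rules = [
    Rule(frozenset({"a"}), "yes"),
    Rule(frozenset({"a", "b"}), "yes"),
    Rule(frozenset({"c"}), "no"),
]
print(constrain(rules, records, min_conf=0.9))  # only {a, b} -> yes survives
```

Raising the confidence threshold keeps the more specific rule, while raising the antecedent-support threshold keeps the more general one, which is one way the two knobs can change the character of a rule set differently.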

3.
Relational rule learning algorithms are typically designed to construct classification and prediction rules. However, relational rule learning can also be adapted to subgroup discovery. This paper proposes a propositionalization approach to relational subgroup discovery, achieved by appropriately adapting rule learning and first-order feature construction. The proposed approach was successfully applied to standard ILP problems (East-West trains, King-Rook-King chess endgame, and mutagenicity prediction) and to two real-life problems (analysis of telephone calls and traffic accident analysis). Editors: Hendrik Blockeel, David Jensen and Stefan Kramer. An erratum to this article is available at .
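A minimal sketch of the propositionalization idea, assuming a toy version of the East-West trains data: hypothetical first-order features (existential queries over a train's cars) are evaluated to give one boolean row per train, and each feature is then scored as a subgroup. The abstract does not name a quality measure; we assume weighted relative accuracy (WRAcc), a common subgroup discovery heuristic.

```python
from typing import Callable, Dict, List

# Hypothetical relational example: a train is a list of cars, each car a dict of attributes.
Train = List[Dict[str, str]]

# Hypothetical first-order features, each an existential query over a train's cars.
FEATURES: Dict[str, Callable[[Train], bool]] = {
    "has_short_closed_car": lambda t: any(c["length"] == "short" and c["roof"] != "none" for c in t),
    "has_long_car": lambda t: any(c["length"] == "long" for c in t),
    "has_open_car": lambda t: any(c["roof"] == "none" for c in t),
}


def propositionalize(trains: List[Train]) -> List[Dict[str, bool]]:
    """Flatten each relational example into one row of boolean features."""
    return [{name: f(t) for name, f in FEATURES.items()} for t in trains]


def wracc(rows, labels, feature, target):
    """Weighted relative accuracy of the subgroup 'feature is true' for class `target`."""
    n = len(rows)
    covered = [lab for row, lab in zip(rows, labels) if row[feature]]
    if not covered:
        return 0.0
    cov = len(covered) / n
    precision = sum(1 for lab in covered if lab == target) / len(covered)
    prior = sum(1 for lab in labels if lab == target) / n
    return cov * (precision - prior)


trains = [
    [{"length": "short", "roof": "flat"}, {"length": "long", "roof": "none"}],
    [{"length": "short", "roof": "peaked"}],
    [{"length": "long", "roof": "none"}],
    [{"length": "long", "roof": "flat"}, {"length": "short", "roof": "none"}],
]
labels = ["east", "east", "west", "west"]
rows = propositionalize(trains)
best = max(FEATURES, key=lambda f: wracc(rows, labels, f, "east"))
print(best)  # has_short_closed_car
```

Once the relational data are flattened, any attribute-value subgroup learner can run on `rows`; that separation is the point of the propositionalization step.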

4.
A New Association Rule Mining Method   Total citations: 1, self-citations: 0, citations by others: 1
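Association rule mining is one of the main tasks of data mining. To further improve the cognitive properties and computational performance of association rule mining algorithms, a new approach to association rule mining is proposed, and from it an association rule mining algorithm based on rule fuzzy cognitive maps is constructed. The algorithm uses rule fuzzy cognitive maps for knowledge representation and performs reachability-based fuzzy inference on each mined association rule, thereby reducing the number of interactions with the database. Experiments show that, compared with the Apriori association rule algorithm, the method improves mining efficiency and increases the degree of intelligence.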

5.
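Existing association rule mining algorithms do not consider the non-uniform distribution of sessions in data streams or the role of historical data, and they ignore the "sharp boundary" problem that arises when processing continuous attributes. To address these problems, this paper proposes a fuzzy session association rule mining algorithm based on a time-decay model. First, to handle the non-uniform distribution of sessions in the data stream, sessions are partitioned into time slices, which fully preserves the correlation information among sessions within each slice. Next, fuzzy sets are used to process the continuous attributes of sessions, which increases the interestingness and comprehensibility of the rules. Finally, taking into account the role of historical data and an allowable error, critical frequent itemsets and fuzzy association rules are mined from the data stream using the time-decay model. Experimental results show that the method has clear advantages in time efficiency, redundancy reduction, and rule interestingness.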
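The time-decay and fuzzy-set steps can be sketched as follows, under our own assumptions: a triangular membership function for a hypothetical "long session" fuzzy set, a single fuzzy item rather than full itemsets, and an exponential decay factor that discounts older time slices when computing support. The `is_critical` check mirrors the idea of keeping items whose support falls within an allowed error below the threshold; it is not the paper's exact criterion.

```python
from typing import List


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular fuzzy membership: 0 at a and c, 1 at the peak b (requires a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


class DecayedFuzzySupport:
    """Time-decayed fuzzy support of one fuzzy item over a stream of time slices."""

    def __init__(self, decay: float):
        self.decay = decay            # multiplier applied to history per new slice, 0 < decay <= 1
        self.weighted_count = 0.0     # decayed sum of membership degrees
        self.weighted_sessions = 0.0  # decayed number of sessions

    def add_slice(self, memberships: List[float]) -> None:
        """Age the history by one slice, then add this slice's sessions."""
        self.weighted_count = self.weighted_count * self.decay + sum(memberships)
        self.weighted_sessions = self.weighted_sessions * self.decay + len(memberships)

    def support(self) -> float:
        return self.weighted_count / self.weighted_sessions if self.weighted_sessions else 0.0


def is_critical(support: float, min_sup: float, error: float) -> bool:
    """Keep items whose support is within the allowed error below the threshold."""
    return support >= min_sup - error


def long_session(seconds: float) -> float:
    """Hypothetical fuzzy set 'long session' over session duration in seconds."""
    return triangular(seconds, 60, 180, 300)


tracker = DecayedFuzzySupport(decay=0.5)
tracker.add_slice([long_session(d) for d in [10, 200]])   # older slice
tracker.add_slice([long_session(d) for d in [150, 180]])  # newest slice
print(round(tracker.support(), 4))  # 0.7222
```

A session lasting 150 seconds counts as 0.75 of a "long session" rather than being forced into one crisp bin, which is how fuzzy sets soften the sharp-boundary problem.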

6.
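An Association Pattern Mining Algorithm Based on Multi-Dimensional Sets   Total citations: 2, self-citations: 0, citations by others: 2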
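Most inter-dimensional association rule mining algorithms, such as data-cube-based association rule mining algorithms, assume that each attribute of an object takes only a single value. This paper extends attribute values to multiple values and, on that basis, proposes the concept of a multi-dimensional set and the semantic characteristics of association rules based on multi-dimensional sets. Under these semantics, an association rule mining algorithm for multi-dimensional sets is proposed. By exploiting the constraint characteristics of multi-dimensional set association rules, the algorithm performs triple pruning of candidate sets while reducing the data set, and therefore performs better than directly applying algorithms such as Apriori. The performance, correctness, and completeness of the algorithm are analyzed, and its effectiveness is compared experimentally.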

7.
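Association rules are a commonly used data mining method. Building on the concept of frequent itemsets in the Apriori algorithm and introducing the concepts of meta-vectors, sub-rules, and parent rules, this paper proposes an improved association rule mining method (the Improve algorithm). The method overcomes shortcomings of traditional association rule mining methods by mining rules while generating frequent itemsets, thereby improving mining efficiency.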
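For reference, the Apriori frequent-itemset framework this method builds on can be sketched as below: level-wise join and prune steps, followed by rule generation from the frequent itemsets. This is a plain Apriori sketch on hypothetical basket data, not the paper's Improve algorithm, which interleaves rule mining with itemset generation using meta-vectors and sub-/parent rules.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple


def apriori(transactions: Iterable[Iterable[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Return every frequent itemset with its support count (level-wise Apriori)."""
    db = [frozenset(t) for t in transactions]

    def count(itemset):
        return sum(1 for t in db if itemset <= t)

    items = {i for t in db for i in t}
    level = {c: count(c) for c in (frozenset([i]) for i in items)}
    level = {c: n for c, n in level.items() if n >= min_count}
    frequent: Dict[FrozenSet[str], int] = {}
    k = 1
    while level:
        frequent.update(level)
        # Join: combine frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        counted = {c: count(c) for c in candidates}
        level = {c: n for c, n in counted.items() if n >= min_count}
        k += 1
    return frequent


def generate_rules(frequent, min_conf) -> List[Tuple[FrozenSet[str], FrozenSet[str], float]]:
    """Derive antecedent -> consequent rules whose confidence reaches min_conf."""
    rules = []
    for itemset, n in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                ante = frozenset(ante)
                conf = n / frequent[ante]  # every subset of a frequent set is frequent
                if conf >= min_conf:
                    rules.append((ante, itemset - ante, conf))
    return rules


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
freq = apriori(baskets, min_count=2)
print(generate_rules(freq, min_conf=0.9))  # [(frozenset({'butter'}), frozenset({'bread'}), 1.0)]
```

Rule generation here is a separate pass over the finished itemsets; that second pass is the step the Improve algorithm aims to fold into itemset generation.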

8.
One strategy for increasing the efficiency of rule discovery in data mining is to target a restricted class of rules, such as exact or almost exact rules, rules with a limited number of conditions, or rules in which each condition, on its own, eliminates a competing outcome class. An algorithm is presented for discovering rules in which each condition is a distinctive feature of the outcome class on its right-hand side, within the subset of the data set defined by the conditions (if any) that precede it. Such a rule is said to be characteristic for the outcome class. A feature is defined as distinctive for an outcome class if it maximises a well-known measure of rule interest or is unique to the outcome class in the data set. In the special case of data mining that arises when each outcome class is represented by a single instance in the data set, a feature of an object is shown to be distinctive if and only if no other feature is shared by fewer objects in the data set.
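The special case in the last sentence suggests a simple greedy construction, sketched here under our own assumptions (objects represented as feature sets, ties broken alphabetically): repeatedly add the target's feature shared by the fewest objects in the current subset, then restrict the subset to the objects having that feature. The general case, which relies on an unnamed rule-interest measure, is not shown.

```python
from typing import Dict, FrozenSet, List


def characteristic_rule(objects: Dict[str, FrozenSet[str]], target: str) -> List[str]:
    """Greedily build a rule for `target` when each class is a single object.

    At each step the chosen condition is the target's feature shared by the fewest
    objects in the subset defined by the conditions chosen so far.
    """
    subset = set(objects)
    remaining = set(objects[target])
    conditions: List[str] = []

    def shared_by(feature: str) -> int:
        return sum(1 for o in subset if feature in objects[o])

    while len(subset) > 1 and remaining:
        best = min(sorted(remaining), key=shared_by)  # sorted() makes ties deterministic
        if shared_by(best) == len(subset):
            break  # no remaining feature eliminates any competitor
        conditions.append(best)
        remaining.discard(best)
        subset = {o for o in subset if best in objects[o]}
    return conditions


# Hypothetical objects, one per outcome class.
toy = {
    "A": frozenset({"x", "y"}),
    "B": frozenset({"x", "z"}),
    "C": frozenset({"y", "z"}),
    "D": frozenset({"x", "y", "z"}),
}
print(characteristic_rule(toy, "D"))  # ['x', 'y', 'z']
```

For object `A` the rule stops at `['x', 'y']` without isolating it, because every feature of `A` is also a feature of `D`.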

9.
Although knowledge discovery from large relational databases has gained popularity and its significance is well recognized, the prohibitive cost of extracting such knowledge and the lack of suitable declarative query language support act as limiting factors. Surprisingly, little or no relational technology has yet been significantly exploited in data mining, even though data often reside in relational tables. Consequently, no relational optimization has yet been possible for data mining. We exploit the transitive nature of large itemsets and the so-called anti-monotonicity property of support thresholds of large itemsets to develop a natural least fixpoint operator for set-oriented data mining from relational databases. The proposed operator has several advantages, including optimization opportunities and large itemset generation free of the candidate sets used by traditional methods. We present an SQL3 expression for association rule mining and discuss its mapping to the least fixpoint operator developed in this paper.
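The least fixpoint formulation can be illustrated in Python, as a sketch rather than the paper's SQL3 expression: the operator below grows each known large itemset by one item, keeping an extension only if all its immediate subsets are already large (anti-monotonicity) and it meets the support threshold. Because the operator only ever adds sets, iterating it from the empty set stops at its least fixpoint, which is the collection of all large itemsets. The basket data are hypothetical.

```python
from typing import FrozenSet, Iterable


def large_itemsets_lfp(transactions: Iterable[Iterable[str]], min_count: int) -> FrozenSet[FrozenSet[str]]:
    """Compute all large itemsets as the least fixpoint of an inflationary operator."""
    db = [frozenset(t) for t in transactions]
    items = sorted({i for t in db for i in t})

    def support(itemset: FrozenSet[str]) -> int:
        return sum(1 for t in db if itemset <= t)

    def step(large: FrozenSet[FrozenSet[str]]) -> FrozenSet[FrozenSet[str]]:
        """One application of the operator: grow each known large set by one item."""
        result = set(large)
        for base in [frozenset()] + list(large):
            for item in items:
                grown = base | {item}
                if item in base or grown in result:
                    continue
                # Anti-monotonicity: every immediate subset must already be large.
                subsets_ok = len(grown) == 1 or all(grown - {j} in large for j in grown)
                if subsets_ok and support(grown) >= min_count:
                    result.add(grown)
        return frozenset(result)

    current: FrozenSet[FrozenSet[str]] = frozenset()
    while True:
        nxt = step(current)
        if nxt == current:  # fixpoint reached
            return current
        current = nxt


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
all_large = large_itemsets_lfp(baskets, min_count=2)
print(sorted(sorted(s) for s in all_large))
# [['bread'], ['bread', 'butter'], ['bread', 'milk'], ['butter'], ['milk']]
```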

10.
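Rule extraction based on a single-level hierarchy suffers from low accuracy and long running times and struggles to meet user needs. To address these problems, a quantitative data mining algorithm based on improved multi-level fuzzy association rules is proposed. Using high-frequency itemsets, a top-down mining process is formed through progressively deepening iteration; fuzzy set theory, data mining algorithms, and multi-level classification techniques are integrated to find fuzzy association rules in transaction data sets, uncovering the implicit knowledge in the quantitative values stored in multi-level transaction databases and meeting users' customized information mining needs. Experimental results show that the proposed algorithm has clear advantages over other algorithms in mining accuracy and running time, and may open new room for the practical application of multi-level association rule extraction methods.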
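A minimal sketch of top-down, progressively deepening fuzzy mining over a two-level taxonomy, restricted to single items for brevity. The taxonomy, the "high quantity" membership function, the rule that a category's quantity is the sum of its children's quantities, and the per-level thresholds are all hypothetical; the paper's actual fuzzy sets and iteration scheme are not given in the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical two-level taxonomy: top-level category -> specific items.
TAXONOMY: Dict[str, List[str]] = {"drink": ["milk", "juice"], "bread": ["white", "wheat"]}
PARENT = {child: parent for parent, kids in TAXONOMY.items() for child in kids}


def high_quantity(qty: float) -> float:
    """Membership in the fuzzy set 'high quantity': 0 at 1 unit, rising to 1 at 5 units."""
    return min(1.0, max(0.0, (qty - 1) / 4))


def fuzzy_support(transactions: List[Dict[str, float]], node: str, level: int) -> float:
    """Average membership of `node` in 'high quantity'; a category sums its children's quantities."""
    total = 0.0
    for t in transactions:
        if level == 1:
            qty = sum(q for item, q in t.items() if PARENT[item] == node)
        else:
            qty = t.get(node, 0)
        total += high_quantity(qty)
    return total / len(transactions)


def top_down(transactions, min_sup: Tuple[float, float]):
    """Progressive deepening: only children of frequent categories are examined."""
    top = [c for c in TAXONOMY if fuzzy_support(transactions, c, 1) >= min_sup[0]]
    low = [item for c in top for item in TAXONOMY[c]
           if fuzzy_support(transactions, item, 2) >= min_sup[1]]
    return top, low


# Hypothetical transactions: item -> purchased quantity.
purchases = [{"milk": 3, "juice": 2, "white": 1}, {"milk": 5}, {"wheat": 2, "juice": 1}]
print(top_down(purchases, (0.5, 0.3)))  # (['drink'], ['milk'])
```

Because "bread" fails at the top level, its children are never scored; skipping whole subtrees in this way is where a top-down scheme saves work.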

11.
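Association rule mining is a classic data mining method, and more and more enterprises regard it as an indispensable strategic analysis tool. Current association rule mining methods produce too many rules, which makes them hard for users to understand and apply, so methods for reducing association rule sets have practical value. This paper studies how the prime attributes contained in the keys of a database schema affect the association rules produced by Apriori-based association rule mining: partial functional dependencies cause redundant information to appear frequently in the data set being mined and produce association rules with no practical value, and identifying and eliminating such rules reduces the rule set. Finding all prime attributes, like finding all candidate keys, is NP-hard, so an algorithm is proposed that determines prime attributes by verification against a single candidate key. On this basis, an association rule reduction algorithm based on prime attribute determination is designed and implemented, and experiments verify its effectiveness.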
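The key observation, that a partial functional dependency on part of a key yields trivially true rules, can be checked with the textbook attribute-closure algorithm. The sketch below, on a hypothetical Enrollment schema, verifies that a given attribute set is a candidate key and lists the non-key attributes determined by a proper subset of it; a mined rule such as `student=s1 → student_name=Alice` would then be a candidate for removal. It is an illustration under these assumptions, not the paper's prime-attribute verification algorithm.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (left-hand side, right-hand side)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All attributes functionally determined by `attrs` (textbook closure algorithm)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_candidate_key(key: FrozenSet[str], attrs: Set[str], fds: List[FD]) -> bool:
    """A candidate key determines every attribute and no proper subset does."""
    if closure(key, fds) != attrs:
        return False
    return all(closure(key - {a}, fds) != attrs for a in key)


def partial_dependencies(key: FrozenSet[str], fds: List[FD]) -> Set[str]:
    """Non-key attributes already determined by a proper subset of the key."""
    found: Set[str] = set()
    for a in key:
        found |= closure(key - {a}, fds) - key
    return found


# Hypothetical schema Enrollment(student, course, student_name, grade).
ATTRS = {"student", "course", "student_name", "grade"}
FDS: List[FD] = [
    (frozenset({"student"}), frozenset({"student_name"})),
    (frozenset({"student", "course"}), frozenset({"grade"})),
]
KEY = frozenset({"student", "course"})
print(is_candidate_key(KEY, ATTRS, FDS), partial_dependencies(KEY, FDS))  # True {'student_name'}
```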

12.
Mining association rules plays an important role in data mining and knowledge discovery, since it can reveal strong associations between items in databases. Nevertheless, an important problem with traditional association rule mining methods is that, depending on how the parameters are set, they can generate a huge number of association rules. Users, however, are often interested only in the strongest rules, and do not want to go through a large number of rules or wait for them to be generated. To address these needs, algorithms have been proposed to mine the top-k association rules in databases, where users directly set a parameter k to obtain the k most frequent rules. A major issue with these techniques, however, is that they remain very costly in terms of execution time and memory. To address this issue, this paper presents a novel algorithm named ETARM (Efficient Top-k Association Rule Miner) that efficiently finds the complete set of top-k association rules. The proposed algorithm integrates two novel candidate pruning properties to reduce the search space more effectively. These properties are applied during candidate selection to identify, based on a rule's confidence, items that should not be used to expand it, thereby reducing the number of candidates. An extensive experimental evaluation on six standard benchmark datasets shows that the proposed approach outperforms the state-of-the-art TopKRules algorithm in both runtime and memory usage.
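To make the top-k task concrete, here is an exhaustive Python baseline, not ETARM or TopKRules: it enumerates every rule over a small hypothetical transaction set, keeps those meeting the minimum confidence, and returns the k with the highest support count. Its cost is exponential in the number of items, which is the expense that pruning properties such as those described above are meant to avoid.

```python
import heapq
from itertools import combinations
from typing import Iterable, List, Tuple

# (support count, confidence, antecedent, consequent)
Rule = Tuple[int, float, Tuple[str, ...], Tuple[str, ...]]


def top_k_rules(transactions: Iterable[Iterable[str]], k: int, min_conf: float) -> List[Rule]:
    """Exhaustive baseline: the k rules with the highest support among those meeting min_conf."""
    db = [frozenset(t) for t in transactions]
    items = sorted({i for t in db for i in t})

    def count(itemset) -> int:
        return sum(1 for t in db if itemset <= t)

    candidates: List[Rule] = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            n = count(frozenset(combo))
            if n == 0:
                continue
            for r in range(1, size):
                for ante in combinations(combo, r):
                    conf = n / count(frozenset(ante))
                    if conf >= min_conf:
                        cons = tuple(i for i in combo if i not in ante)
                        candidates.append((n, conf, ante, cons))
    # Ties on support are broken by confidence, then lexicographically.
    return heapq.nlargest(k, candidates)


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
top = top_k_rules(baskets, k=2, min_conf=0.6)
print(top[0])  # (2, 1.0, ('butter',), ('bread',))
```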

13.
14.

With millions of Web users visiting Web servers each day, Web logs contain valuable information about users' browsing behavior. In this work, we use association rule mining to construct sequential classifiers that predict users' next visits based on their current actions. The characteristics of the Web-log mining domain lead us to adopt a special kind of association rule, which we call latest-substring rules, that takes into account temporal information as well as correlation information. Furthermore, when constructing the classification model, we adopt a pessimistic selection method for choosing among alternative predictions. To make such prediction models useful, especially for small devices with limited memory and bandwidth, we also introduce a model compression method that removes redundant association rules from the model. We empirically show that the resulting prediction model performs very well.
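A minimal sketch of latest-substring rules, assuming sessions are lists of page identifiers: every suffix (up to a hypothetical `max_len`) of the pages seen so far is treated as a rule antecedent, and the next page as its consequent. For pessimistic selection we assume a Wilson score lower bound on confidence, so a rule backed by few observations is discounted; the paper's exact estimate and its compression step are not reproduced here.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

RuleTable = Dict[Tuple[str, ...], Counter]


def build_rules(sessions: List[List[str]], max_len: int = 3) -> RuleTable:
    """Count latest-substring rules: (last L pages seen) -> next page, for L = 1..max_len."""
    rules: RuleTable = defaultdict(Counter)
    for s in sessions:
        for j in range(1, len(s)):
            for length in range(1, min(max_len, j) + 1):
                rules[tuple(s[j - length:j])][s[j]] += 1
    return rules


def pessimistic(hits: int, n: int, z: float = 1.0) -> float:
    """Wilson score lower bound on confidence hits/n (our stand-in for pessimistic selection)."""
    p = hits / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - spread) / denom


def predict(rules: RuleTable, current: List[str], max_len: int = 3) -> Optional[str]:
    """Among all rules matching a suffix of `current`, return the prediction with the best lower bound."""
    best, best_score = None, -1.0
    for length in range(1, min(max_len, len(current)) + 1):
        counts = rules.get(tuple(current[-length:]))
        if not counts:
            continue
        n = sum(counts.values())
        for page, hits in counts.items():
            score = pessimistic(hits, n)
            if score > best_score:
                best, best_score = page, score
    return best


# Hypothetical click sessions.
sessions = [
    ["home", "news", "sports"],
    ["home", "news", "sports"],
    ["home", "news", "weather"],
    ["shop", "news", "weather"],
]
model = build_rules(sessions)
print(predict(model, ["home", "news"]))  # sports
print(predict(model, ["shop", "news"]))  # weather
```

The single page `news` alone is a tie between `sports` and `weather`; the longer latest substring breaks the tie, which is why the temporal context matters.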

15.
Knowledge, 2002, 15(7): 399-405
We define an optimal class association rule set as the minimum rule set with the same predictive power as the complete class association rule set. Using this rule set instead of the complete class association rule set avoids the redundant computation that would otherwise be required for mining predictive association rules, and hence significantly improves the efficiency of the mining process. We present an efficient algorithm for mining the optimal class association rule set that uses an upward closure property to prune weak rules before they are actually generated. We have implemented the algorithm, and our experimental results show that it generates the optimal class association rule set, which on average is smaller than 1/17 of the complete class association rule set, in significantly less time than generating the complete class association rule set. The proposed criterion proved very effective for pruning weak rules in dense databases.
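One plausible reading of the pruning idea, written as a post-hoc filter rather than the paper's generation-time pruning: a rule is dropped when a strictly more general rule (one whose antecedent is a proper subset) predicting the same class is at least as confident, since the more specific rule then adds no predictive power. The rule representation and the toy rules are assumptions.

```python
from typing import FrozenSet, List, NamedTuple


class ClassRule(NamedTuple):
    antecedent: FrozenSet[str]
    label: str
    confidence: float


def optimal_rule_set(rules: List[ClassRule]) -> List[ClassRule]:
    """Drop every rule for which a strictly more general rule for the same class is at least as confident."""
    kept = []
    for r in rules:
        dominated = any(
            g.label == r.label and g.antecedent < r.antecedent and g.confidence >= r.confidence
            for g in rules
        )
        if not dominated:
            kept.append(r)
    return kept


rules = [
    ClassRule(frozenset({"a"}), "yes", 0.90),
    ClassRule(frozenset({"a", "b"}), "yes", 0.85),  # more specific but weaker: removed
    ClassRule(frozenset({"a", "c"}), "yes", 0.95),  # more specific and stronger: kept
    ClassRule(frozenset({"a", "b"}), "no", 0.15),   # no more general rule for "no": kept
]
print([sorted(r.antecedent) for r in optimal_rule_set(rules)])  # [['a'], ['a', 'c'], ['a', 'b']]
```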

16.
Introduction: An important quality of association rules is novelty. However, evaluating rule novelty is AI-hard and has been a serious challenge for most data mining systems.
Objective: In this paper, we introduce functional novelty, a new non-pairwise approach to evaluating rule novelty. A functionally novel rule is interesting because it suggests previously unknown relations between user hypotheses.
Methods: We developed a novel domain-driven KDD framework for discovering functionally novel association rules. Association rules were mined from cardiovascular data sets. During post-processing, domain-knowledge-compliant rules were identified by semantic filtering based on the UMLS ontology. Their knowledge compliance scores were computed against medical knowledge in the PubMed literature. A cardiologist explored possible relationships between several pairs of unknown hypotheses. The functional novelty of each rule was computed from its likelihood of mediating these relationships.
Results: Highly interesting rules were successfully discovered. For instance, the common rule diabetes mellitus → coronary arteriosclerosis was functionally novel because it mediated a rare association between von Willebrand factor and intracardiac thrombus.
Conclusion: The proposed post-mining, domain-driven rule evaluation technique and measures proved useful for identifying candidate functionally novel rules, with the results validated by a cardiologist.

17.
In this paper, we deal with the problem of mining for approximate dependencies (ADs) in relational databases. We introduce a definition of AD based on the concept of association rule, by means of suitable definitions of the concepts of item and transaction. This definition allows us to measure both the accuracy and the support of an AD. We provide an interpretation of the new measures based on the complexity of the theory (set of rules) that describes the dependence, and we use this interpretation to compare the new measures with existing ones. We also introduce a methodology for adapting existing association rule mining algorithms to the task of discovering ADs. The adapted algorithms obtain the set of ADs that hold in a relation with accuracy and support greater than user-defined thresholds. Our experiments show that the approach performs reasonably well on large databases with real-world data.
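A sketch of one way to read approximate dependencies as association rules, under our own assumption that each pair of tuples is a transaction and an item is "the pair agrees on attribute A". Support and confidence of X → Y then follow directly; the paper's accuracy measure may differ from the plain confidence used here, and the address data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def ad_measures(rows: List[Dict[str, str]], lhs: Sequence[str], rhs: Sequence[str]) -> Tuple[float, float]:
    """Support and confidence of the dependency lhs -> rhs, treating each pair of tuples as a transaction."""
    pairs = list(combinations(rows, 2))
    agree_lhs = [(t, u) for t, u in pairs if all(t[a] == u[a] for a in lhs)]
    agree_both = [(t, u) for t, u in agree_lhs if all(t[a] == u[a] for a in rhs)]
    support = len(agree_both) / len(pairs)
    confidence = len(agree_both) / len(agree_lhs) if agree_lhs else 0.0
    return support, confidence


addresses = [
    {"zip": "1000", "city": "A"},
    {"zip": "1000", "city": "A"},
    {"zip": "1000", "city": "B"},  # one inconsistent tuple makes zip -> city only approximate
    {"zip": "2000", "city": "C"},
    {"zip": "2000", "city": "C"},
]
print(ad_measures(addresses, ["zip"], ["city"]))  # (0.2, 0.5)
```

An exact functional dependency would reach confidence 1.0; a single inconsistent tuple already halves it here, which shows why thresholds on accuracy are needed.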

18.
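Association analysis is an important data mining technique. Taking into account the characteristics of the real estate industry, this paper applies association analysis to the study of consumers' home-buying behavior. In practice, the traditional association rule mining algorithm, Apriori, suffers from heavy computation, low mining efficiency, and the generation of many irrelevant association rules. To reduce computation, improve mining efficiency, and discover valuable association rules, a method combining grey relational analysis with the Apriori algorithm is proposed. First, grey relational analysis is used to identify the key factors that affect consumers' housing demand and preferences; then the Apriori algorithm is used to mine association rules between the key factors and the target factors. A model was built from consumer records collected through a questionnaire survey in one city, and the results show that this association analysis method achieves high mining efficiency and that its findings are reasonable and accurate.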
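The first stage, grey relational analysis, can be sketched with the standard grey relational coefficient (Δmin + ρΔmax) / (Δ + ρΔmax), with distinguishing coefficient ρ = 0.5, averaged into one grade per factor. The factor names, the normalised data, and the 0.6 cut-off for key factors are hypothetical; the selected factors would then be passed to Apriori as in the second stage.

```python
from typing import Dict, List


def grey_relational_grades(reference: List[float], factors: Dict[str, List[float]],
                           rho: float = 0.5) -> Dict[str, float]:
    """Grey relational grade of each factor sequence against the reference (target) sequence.

    Sequences are assumed to be normalised already and of equal length; rho is the
    distinguishing coefficient, conventionally 0.5.
    """
    deltas = {name: [abs(r - x) for r, x in zip(reference, seq)] for name, seq in factors.items()}
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:
        return {name: 1.0 for name in factors}  # every factor matches the reference exactly
    grades = {}
    for name, ds in deltas.items():
        coefficients = [(d_min + rho * d_max) / (d + rho * d_max) for d in ds]
        grades[name] = sum(coefficients) / len(coefficients)
    return grades


target = [1.0, 0.8, 0.6]  # hypothetical normalised demand scores
candidates = {"price": [1.0, 0.8, 0.6], "distance": [1.0, 0.5, 1.0]}
grades = grey_relational_grades(target, candidates)
key_factors = [name for name, g in grades.items() if g >= 0.6]
print(key_factors)  # ['price']
```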

19.
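Spatial data mining is the extraction of implicit knowledge, spatial relationships, and other information stored in spatial databases. Spatial association rules are an important research area of spatial data mining, and using them to turn the data in a spatial database into knowledge is an effective approach. Based on an analysis of spatial association rules, a progressive-refinement mining algorithm based on association rules is used to derive the implicit knowledge in a spatial database, and an example demonstrates the feasibility of the method.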

20.

Association rule mining is a popular data mining modeling tool. It discovers interesting associations or correlations among a large set of data items, showing attribute values that occur frequently together in a given dataset. Despite their great potential benefit, current association rule modeling tools are far from optimal. This article studies how visualization techniques can be applied to facilitate the association rule modeling process, particularly which visualization elements should be incorporated and how they should be displayed. This research proposes original designs for visualizing rules, for integrating data and rule visualizations, and for visualizing the rule derivation process, in support of interactive visual association rule modeling. Experimental results indicate that, compared with an automatic association rule modeling process, the proposed interactive visual modeling can significantly improve modeling effectiveness, enhance understanding of the applied algorithm, and bring users greater satisfaction with the task. The proposed integration of data and rule visualizations also significantly facilitates understanding of rules compared with a non-integrated counterpart.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (ICP registration: 京ICP备09084417号)