Similar Literature
 19 similar documents found
1.
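Traditional positive association rules consider only the items listed in a transaction, whereas negative association rules consider both the items a transaction contains and the items that exist in the database but are absent from the transaction. This paper first discusses the definitions related to negative association rules, together with their support, confidence, and correlation, and analyzes the algorithm of the PNARC model. It closes with an analysis of research directions for negative association rules.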
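
The support, confidence, and correlation measures mentioned in this abstract can be illustrated with a small sketch. The abstract does not give formulas, so the code below assumes the definitions commonly used in the negative association rule literature: supp(A ⇒ ¬B) = supp(A) − supp(A ∪ B), conf(A ⇒ ¬B) = supp(A ⇒ ¬B) / supp(A), and corr(A, B) = supp(A ∪ B) / (supp(A) · supp(B)). The function names and the toy database are hypothetical and are not taken from the paper.

```python
from typing import Dict, FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], db: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in db if items <= t) / len(db)


def rule_measures(a: Iterable[str], b: Iterable[str],
                  db: List[Transaction]) -> Dict[str, float]:
    """Measures for A => B and A => not B (assumed textbook definitions)."""
    a, b = frozenset(a), frozenset(b)
    s_a, s_b, s_ab = support(a, db), support(b, db), support(a | b, db)
    if s_a == 0 or s_b == 0:
        raise ValueError("antecedent and consequent must both occur in db")
    return {
        "supp_pos": s_ab,                # supp(A => B)
        "conf_pos": s_ab / s_a,          # conf(A => B)
        "supp_neg": s_a - s_ab,          # supp(A => not B)
        "conf_neg": (s_a - s_ab) / s_a,  # conf(A => not B) = 1 - conf(A => B)
        "corr": s_ab / (s_a * s_b),      # < 1 suggests a negative correlation
    }


# Hypothetical toy database of four transactions.
toy_db = [frozenset(t) for t in (
    {"tea", "coffee"}, {"tea"}, {"tea", "milk"}, {"coffee"},
)]
measures = rule_measures({"tea"}, {"coffee"}, toy_db)
```

Under these assumed definitions, a correlation below 1 (as for tea and coffee in the toy data) is the usual signal that the negative rule, rather than the positive one, is worth examining.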

2.
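Traditional association rule mining studies the associations among the items contained in transactions, whereas negative association rule mining must consider both the items a transaction contains and the items it does not contain. The paper defines complete negative association rules and proposes a tree-based algorithm, Free-PNP, which mines the negative frequent patterns in a database and derives the complete negative association rules from them. Experiments verify the effectiveness of the algorithm.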

3.
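To address the main problems of the Apriori algorithm, two improved association rule mining algorithms are proposed. The high-dimensional decomposition method scans the transaction database to form high-dimensional frequent itemsets and association rules, then directly decomposes the high-dimensional rules to obtain low-dimensional ones. The prefix generalized linked-list method first scans the transaction database to build a prefix linked list, then scans the database again while traversing the prefix linked list, finding association rules by checking whether each transaction coincides fully or partially with one of its paths. Both algorithms greatly reduce the number of transaction database scans and the generation of large candidate sequence sets, improving mining efficiency and simplifying rule generation.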

4.
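Association rule mining based on size reduction and multiple supports   Total citations: 1, self-citations: 0, citations by others: 1
Shi Yuan, Lu Hanrong, Luo Jing, Gao Ting. Computer Engineering and Design (《计算机工程与设计》), 2006, 27(21): 4105-4107, 4114
The classic algorithm for association rule mining is Apriori, but it has two prominent problems: it scans the transaction database many times, and it uses a single support threshold. As a result, search time grows with the size of the transaction database, and redundant rules are generated or valid rules are discarded. Previous improved algorithms addressed only one of these problems. This paper considers both together and presents an association rule mining algorithm based on size reduction and multiple supports. Analysis and experiments show an improvement in efficiency.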

5.
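Li Xiaohong, Yang You. Computer Science (《计算机科学》), 2007, 34(9): 142-144
Association rule mining is an important research direction in data mining. Its main algorithms are Apriori and FP-growth, both of which scan the transaction database multiple times, which seriously affects efficiency. To reduce the number of scans, this paper proposes the LL algorithm, based on a linear linked list (LinearLinker). It scans the transaction database only once, converts the database into a linear linked list LL, and then mines association rules from LL. Experiments show that the time cost of the LL algorithm is clearly lower than that of Apriori and FP-growth, and that, by defining standby candidate frequent itemsets, the LL algorithm effectively supports update mining of association rules.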

6.
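For the problem of mining association rules in databases with few items and many transactions, a dynamic association rule mining algorithm based on binary tree encoding is proposed. Dynamic mining is achieved in four steps: building a binary tree corresponding to the items of the transaction database, defining a count array corresponding to the itemset codes, scanning the records against the binary tree and counting, and analyzing and computing the association rules. The algorithm makes full use of the coding properties of the binary tree, effectively reduces I/O load, makes it easy to add and delete transactions and to partition and merge databases, and is widely applicable.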

7.
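Building on a study of the properties of negative association rules, this paper introduces the vector inner product into the field and proposes an inner-product-based algorithm for mining positive and negative association rules with multiple minimum supports. Because itemsets are unevenly distributed in a transaction database, a single minimum support is hard to set, so a multiple-minimum-support strategy is adopted. The algorithm mines frequent and infrequent itemsets simultaneously and derives positive and negative association rules from them. Experimental results show that the algorithm needs only one database scan, prunes dynamically, keeps no intermediate candidates, and saves a large amount of memory, making it significant for mining negative association rules in transaction databases.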

8.
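Frequent itemset mining algorithm using frequent-item linked-list transformation   Total citations: 1, self-citations: 0, citations by others: 1
Generating frequent itemsets is the key problem in association rule mining. Classic association rule mining algorithms work by scanning the transaction database many times. Recent research has begun to explore data structures that allow very few scans of the transaction database, reducing the heavy I/O cost of mining and achieving higher efficiency. Using a frequent-item linked-list data structure, this paper presents an association rule mining algorithm, called FILLT, that scans the transaction database only twice. The algorithm follows a divide-and-conquer strategy, partitioning and transforming the frequent-item linked list to mine association rules. The paper concludes with a theoretical analysis and experimental verification of the algorithm's efficiency.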

9.
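To improve the mining efficiency of the classic Apriori association rule algorithm and address its bottleneck, an association rule algorithm is proposed that stores frequent itemsets in a chained structure and generates maximal frequent itemsets. The algorithm stores transactions as bit vectors. While generating frequent itemsets, it attaches the transactions containing each frequent item to that item as a linked list, and then generates the maximal frequent itemsets. The algorithm reduces both the number of transaction database scans and the number of candidate itemsets generated, shortening the time needed to produce the maximal frequent itemsets. Experimental results show that it improves computational efficiency.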
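
The bit-vector idea in this abstract can be sketched as follows. The abstract does not specify the layout, so this sketch assumes a vertical one: each item gets one integer bitmap in which bit `tid` is set when transaction `tid` contains the item, and the support count of an itemset is the number of set bits in the AND of its items' bitmaps. The function names are hypothetical, and the paper's linked-list structure and maximal-itemset generation are not reproduced here.

```python
from typing import Dict, Iterable, List, Set


def build_bitmaps(db: List[Set[str]]) -> Dict[str, int]:
    """One pass over the database: map each item to a transaction bitmap."""
    bitmaps: Dict[str, int] = {}
    for tid, transaction in enumerate(db):
        for item in transaction:
            # Set bit `tid` in the bitmap of every item the transaction holds.
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support_count(itemset: Iterable[str], bitmaps: Dict[str, int],
                  n_transactions: int) -> int:
    """Count transactions containing all items by AND-ing their bitmaps."""
    mask = (1 << n_transactions) - 1  # start with every transaction selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)  # unknown items clear the mask
    return bin(mask).count("1")


# Hypothetical toy database of four transactions.
toy_db = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
toy_bitmaps = build_bitmaps(toy_db)
ab_count = support_count({"a", "b"}, toy_bitmaps, len(toy_db))
```

After the single pass that builds the bitmaps, every later support count is computed in memory, which is one way a bit-vector representation can cut down on database scans.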

10.
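Mining frequent itemsets is a key problem in association rule mining. Typical algorithms rely on multiple database scans, cannot reflect database changes immediately, and generate frequent itemsets based only on how often items occur in the database, ignoring the importance of items. This paper proposes an algorithm, based on a frequent linked list, for mining fully weighted frequent itemsets. The algorithm reflects database changes dynamically, needs only one database scan to mine frequent itemsets, and assigns each item a weight according to its importance so as to mine association rules of greater interest to users.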

11.
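Analysis and comparison of typical association rule mining algorithms   Total citations: 3, self-citations: 0, citations by others: 3
Feng Jie, Tao Hongcai. Microcomputer Development (《微机发展》), 2007, 17(3): 121-124
Discovering association rules is an important aspect of data mining, and many researchers are working on fast algorithms for mining them. This paper introduces several typical algorithms for mining all association rules in large transaction databases. Focusing on two key factors that affect mining efficiency, namely the size of the candidate frequent itemsets generated and the number of transaction database scans required, it analyzes the strategy each algorithm adopts and its limitations, and compares their time and space efficiency. It closes with an outlook on research directions for association rule mining algorithms.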

12.
We examine the issue of mining association rules among items in a large database of sales transactions. Mining association rules means, given a database of sales transactions, discovering all associations among items such that the presence of some items in a transaction implies the presence of other items in the same transaction. The mining of association rules can be mapped into the problem of discovering large itemsets, where a large itemset is a group of items that appear in a sufficient number of transactions. The problem of discovering large itemsets can be solved by first constructing a candidate set of itemsets and then identifying, within this candidate set, those itemsets that meet the large itemset requirement. Generally, this is done iteratively for each large k-itemset in increasing order of k, where a large k-itemset is a large itemset with k items. Determining large itemsets from a huge number of candidate sets in early iterations is usually the dominating factor for the overall data mining performance. To address this issue, we develop an effective algorithm for candidate set generation. It is a hash-based algorithm and is especially effective for the generation of a candidate set for large 2-itemsets. Explicitly, the number of candidate 2-itemsets generated by the proposed algorithm is orders of magnitude smaller than that of previous methods, thus resolving the performance bottleneck. Note that the generation of smaller candidate sets enables us to effectively trim the transaction database size at a much earlier stage of the iterations, thereby reducing the computational cost for later iterations significantly. The advantage of the proposed algorithm also provides us the opportunity of reducing the amount of disk I/O required. An extensive simulation study is conducted to evaluate the performance of the proposed algorithm.
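
The hash-based filtering of candidate 2-itemsets described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the bucket function (a CRC32 of the joined item names), the bucket count, and all function names are hypothetical. During the first pass, every 2-subset of each transaction is hashed into a bucket counter; a pair of frequent items is then kept as a candidate only if its bucket reached the minimum count.

```python
import zlib
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

Pair = Tuple[str, str]


def bucket_of(pair: Pair, n_buckets: int) -> int:
    """Deterministic bucket index for a sorted item pair (assumed hash)."""
    return zlib.crc32("|".join(pair).encode("utf-8")) % n_buckets


def hashed_candidate_pairs(db: List[Set[str]], min_count: int,
                           n_buckets: int = 8) -> Set[Pair]:
    """Pass 1: count single items and hash every 2-subset into buckets."""
    item_counts: Counter = Counter()
    bucket_counts = [0] * n_buckets
    for transaction in db:
        items = sorted(transaction)
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket_of(pair, n_buckets)] += 1
    frequent_items = sorted(i for i, c in item_counts.items() if c >= min_count)
    # A bucket count is an upper bound on the count of any pair hashed into
    # it, so dropping pairs in light buckets never loses a frequent pair.
    return {pair for pair in combinations(frequent_items, 2)
            if bucket_counts[bucket_of(pair, n_buckets)] >= min_count}


def count_candidates(db: List[Set[str]], candidates: Set[Pair]) -> Counter:
    """Pass 2: exact counts, restricted to the surviving candidates."""
    counts: Counter = Counter()
    for transaction in db:
        for pair in combinations(sorted(transaction), 2):
            if pair in candidates:
                counts[pair] += 1
    return counts
```

Hash collisions can let some infrequent pairs through, so the second pass is still needed for exact counts. The benefit is that the candidate set it has to count is usually much smaller than the set of all pairs of frequent items.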

13.
Mining Fuzzy Multiple-Level Association Rules from Quantitative Data   Total citations: 2, self-citations: 0, citations by others: 2
Machine-learning and data-mining techniques have been developed to turn data into useful task-oriented knowledge. Most algorithms for mining association rules identify relationships among transactions using binary values and find rules at a single-concept level. Transactions with quantitative values and items with hierarchical relationships are, however, commonly seen in real-world applications. This paper proposes a fuzzy multiple-level mining algorithm for extracting knowledge implicit in transactions stored as quantitative values. The proposed algorithm adopts a top-down progressively deepening approach to finding large itemsets. It integrates fuzzy-set concepts, data-mining technologies and multiple-level taxonomy to find fuzzy association rules from transaction data sets. Each item uses only the linguistic term with the maximum cardinality in later mining processes, thus making the number of fuzzy regions to be processed the same as the number of original items. The algorithm therefore focuses on the most important linguistic terms for reduced time complexity.

14.
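To discover association rules in transaction databases, practical mining applications often need to judge the importance of different items by different criteria, manage the taxonomic relationships among items, and handle quantitative data sets. This paper therefore proposes a multiple-level fuzzy association rule mining algorithm that uses multiple minimum supports on a quantitative transaction database to extract implicit knowledge from itemsets. The algorithm uses two kinds of support constraints and a top-down, progressively refining approach to derive frequent itemsets, and it can also discover cross-level fuzzy association rules. An example shows that the multiple-level fuzzy association rules derived under multiple minimum support constraints are easy to understand and meaningful, and that the algorithm has good efficiency and scalability.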

15.
16.
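Methods and algorithms for discovering constrained association rules   Total citations: 47, self-citations: 0, citations by others: 47
This paper studies the problem of discovering association rules with constraints in large transaction databases and proposes two methods for discovering constrained association rules efficiently: the database-filtering algorithm Filtering and the frequent-itemset generation algorithm Separate. The two methods can be used together and are significantly more efficient than existing algorithms.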

17.
In this paper, we study the issues of mining and maintaining association rules in a large database of customer transactions. The problem of mining association rules can be mapped into the problem of finding large itemsets, which are sets of items brought together in a sufficient number of transactions. We revise a graph-based algorithm to further speed up the process of itemset generation. In addition, we extend our revised algorithm to maintain discovered association rules when incremental or decremental updates are made to the databases. Experimental results show the efficiency of our algorithms. The revised algorithm is a significant improvement over the original one on mining association rules. The algorithms for maintaining association rules are more efficient than re-running the mining algorithms for the whole updated database and outperform previously proposed algorithms that need multiple passes over the database. Received 4 August 1999 / Revised 18 March 2000 / Accepted in revised form 18 October 2000

18.
A genetic-fuzzy mining approach for items with multiple minimum supports   Total citations: 2, self-citations: 2, citations by others: 0
Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Mining association rules from transaction data is the most commonly seen of the mining techniques. Most previous mining approaches set a single minimum support threshold for all the items and identify the relationships among transactions using binary values. In the past, we proposed a genetic-fuzzy data-mining algorithm for extracting both association rules and membership functions from quantitative transactions under a single minimum support. In real applications, different items may have different criteria to judge their importance. In this paper, we thus propose an algorithm which combines clustering, fuzzy and genetic concepts for extracting reasonable multiple minimum support values, membership functions and fuzzy association rules from quantitative transactions. It first uses the k-means clustering approach to gather similar items into groups. All items in the same cluster are considered to have similar characteristics and are assigned similar values for initializing a better population. Each chromosome is then evaluated by the criteria of requirement satisfaction and suitability of membership functions to estimate its fitness value. Experimental results also show the effectiveness and the efficiency of the proposed approach.

19.
Mining associations with the collective strength approach   Total citations: 1, self-citations: 0, citations by others: 1
The large itemset model has been proposed in the literature for finding associations in a large database of sales transactions. A different method for evaluating and finding itemsets, referred to as strongly collective itemsets, is proposed. We propose a criterion stressing the importance of the actual correlation of the items with one another rather than their absolute level of presence. Previous techniques for finding correlated itemsets are not necessarily applicable to very large databases. We provide an algorithm which offers very good computational efficiency while maintaining statistical robustness. The fact that this algorithm relies on relative measures rather than absolute measures such as support also implies that the method can be applied to find association rules in data sets in which items may appear in a sizeable percentage of the transactions (dense data sets), data sets in which the items have varying density, or even negative association rules.
