1.
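An efficient association rule mining algorithm   Cited by: 2 (self-citations: 0, citations by others: 2)
To address the shortcomings of the Apriori algorithm, which scans the database multiple times and generates a large number of candidate itemsets, a database optimization strategy is proposed and combined with frequent-itemset pruning and join optimization strategies to obtain a new association rule mining algorithm, NApriori. The algorithm reduces the size of the database and the number of candidate itemsets, and avoids repeated comparison of identical items during the join step. Experiments show that this method performs better than Apriori.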
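For context, the baseline that this abstract compares against can be sketched as follows. This is a minimal illustration of plain Apriori (one full database scan per level, join-and-prune candidate generation), not the NApriori algorithm itself, whose details the abstract does not give. The function name `apriori` and the toy transactions are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support count} for all frequent itemsets (plain Apriori)."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items with one scan of the database.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # One more full scan to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy data, not taken from the paper.
toy_transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
frequent_itemsets = apriori(toy_transactions, min_support=2)
```

The repeated scan inside the `while` loop and the size of `candidates` are the two costs that the abstract says NApriori reduces.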
2.
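An optimized algorithm for RFID complex event processing under memory constraints*   Cited by: 2 (self-citations: 0, citations by others: 2)
Complex event processing is a key technology for RFID data management. Because memory is limited, part of the intermediate results produced when processing massive real-time raw RFID streams can only be stored in external storage, which creates a memory bottleneck and severely limits large-scale RFID deployment. To address this, a complex event processing algorithm based on a B+-tree time-partitioned optimized index (BIOT) is proposed. Under memory constraints, the data stream is partitioned in temporal order and indexed in interval blocks with B+-trees; the statistical distribution of the RFID data stream is then used for complex event lookup and matching, which avoids frequent accesses to external storage, greatly reduces I/O overhead, and improves throughput. Comparative experiments verify the effectiveness of the algorithm.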
3.
Loan T.T. Nguyen, Bay Vo, Tzung-Pei Hong, Hoang Chi Thanh 《Expert systems with applications》2013,40(6):2305-2311
Building a high-accuracy classifier is a common problem in real applications. One kind of high-accuracy classifier is based on association rules. Earlier studies showed that classification based on association rules (class-association rules, or CARs) achieves higher accuracy than other rule-based methods such as ILA and C4.5. However, mining CARs takes more time because it mines a complete rule set. Improving the execution time of CAR mining is therefore one of the main problems to be solved for this method. In this paper, we propose a new method for mining class-association rules. First, we design a tree structure for storing the frequent itemsets of a dataset. We then develop theorems for pruning nodes and computing information in the tree and, based on these theorems, propose an efficient algorithm for mining CARs. Experimental results show that our approach is more efficient than previous ones.
4.
In this paper, the problem of discovering association rules between items in a large database of sales transactions is discussed, and a novel algorithm, BitMatrix, is proposed. The proposed algorithm is fundamentally different from the known algorithms Apriori and AprioriTid. Empirical evaluation shows that the algorithm outperforms the known ones for large databases. Scale-up experiments show that the algorithm scales linearly with the number of transactions.
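The abstract gives no details of how BitMatrix works internally. The sketch below only illustrates a common bitmap technique for support counting, in which each item maps to a bit vector over the transactions and an itemset's support is the number of set bits in the AND of its items' vectors. Whether BitMatrix works this way is not stated in the source; the function names and toy data are assumptions.

```python
def build_item_bitmaps(transactions):
    """Map each item to an int whose bit i is set when transaction i contains it."""
    bitmaps = {}
    for i, t in enumerate(transactions):
        for item in t:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << i)
    return bitmaps


def bitmap_support(bitmaps, itemset, n_transactions):
    """Support count of `itemset`: popcount of the AND of its items' bitmaps."""
    acc = (1 << n_transactions) - 1  # start with every transaction selected
    for item in itemset:
        acc &= bitmaps.get(item, 0)  # unknown items match no transaction
    return bin(acc).count("1")


# Hypothetical toy data for illustration only.
toy = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
toy_bitmaps = build_item_bitmaps(toy)
support_ab = bitmap_support(toy_bitmaps, {"a", "b"}, len(toy))
```

With this layout the database is read once to build the bitmaps, and every later support count is a handful of integer AND operations.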
5.
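Discovering association rules among items in transaction datasets is a classic data mining problem, but traditional association rule mining methods are relatively inefficient on large transaction datasets. Previous research has shown that sampling can effectively improve mining efficiency. Based on an analysis of existing sampling methods, a new efficient sampling-based association rule mining algorithm, ESMA, is proposed. The algorithm adopts a more effective bidirectional sampling strategy. Experimental analysis shows that the algorithm markedly speeds up sampling in large transaction databases, thereby reducing CPU time, and that it scales well.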
6.
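Traditional association rule mining is one-directional: it cannot identify mutually dependent rules, and the rules it finds are not necessarily meaningful and may even be wrong. In view of this, and on the basis of this analysis, the paper proposes a bidirectional association rule mining algorithm and uses the correlation between items to identify the rules that are meaningful to us.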
7.
Clusters of mobile elements, such as vehicles and humans, are a common mobility pattern of interest for many applications. Detecting them on-line from large position streams of mobile entities is a challenging task because it requires algorithms that can continuously and efficiently process a high volume of position updates in a timely manner. Currently, the majority of approaches for cluster detection operate in batch mode, where position updates are recorded during time periods of a certain length and then batch processed by an external routine, thus delaying the result of the cluster detection until the end of the time period. However, if the monitoring application requires results at a higher frequency than the one delivered by batch algorithms, then results might not reflect the current clustering state of the entities. To overcome this limitation, in this paper we propose DG2CEP, an algorithm that combines the well-known density-based clustering algorithm DBSCAN with the data stream processing paradigm Complex Event Processing (CEP) to achieve continuous, on-line detection of clusters. Our experiments with synthetic and real-world datasets indicate that DG2CEP is able to detect the formation and dispersion of clusters with small latency and higher similarity to DBSCAN's output than batch-based approaches.
8.
9.
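An improvement of the AprioriTid algorithm for association rule mining   Cited by: 7 (self-citations: 0, citations by others: 7)
An improved algorithm is proposed that combines the AprioriTid algorithm with transaction compression and item compression. In this algorithm, candidate itemsets and their support counts are produced by joins after each transaction has been compressed, and candidate itemsets are identified by keys, which eliminates the pruning and string pattern matching steps of AprioriTid. Experimental results show that the improved algorithm is clearly more efficient than AprioriTid.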
10.
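This work studies alarm correlation analysis for communication networks based on data mining. Because communication networks change dynamically, an adaptive association rule algorithm for dynamic network resources and services needs to make full use of and maintain the existing rules when discovering new ones, so that both the network structure and the rule base can be updated quickly. To this end, a new dynamic association rule mining algorithm, IDARM, is proposed. Theoretical analysis and simulation experiments both show that the algorithm performs well, scales well, and can significantly improve efficiency in certain specific cases.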
11.
We present a new distributed association rule mining (D-ARM) algorithm that demonstrates superlinear speed-up with the number of computing nodes. The algorithm is the first D-ARM algorithm to perform a single scan over the database. As such, its performance is unmatched by any previous algorithm. Scale-up experiments over standard synthetic benchmarks demonstrate stable run time regardless of the number of computers. Theoretical analysis reveals a tighter bound on error probability than the one shown in the corresponding sequential algorithm. As a result of this tighter bound and by utilizing the combined memory of several computers, the algorithm generates far fewer candidates than comparable sequential algorithms, on the same order of magnitude as the optimum.
12.
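A hybrid algorithm for privacy-preserving association rule mining*   Cited by: 3 (self-citations: 2, citations by others: 1)
To address the inability of existing privacy-preserving association rule mining algorithms to strike a good trade-off between efficiency and accuracy, a hybrid algorithm combining secure multi-party computation with random perturbation is proposed. The algorithm assumes the semi-honest model. It first uses an itemset random perturbation matrix to transform and hide the data at each distributed site, and then proposes a method for recovering the global support counts of itemsets. Because perturbation is applied to itemsets, it overcomes the drawback of traditional methods, which perturb each item independently and thereby destroy the correlations between items and reduce recovery accuracy. Itemsets below the threshold are pruned, and secure multi-party computation is then used to find the global frequent itemsets exactly in the pruned space, from which global association rules are generated. Experiments show that the algorithm achieves a good trade-off between accuracy and efficiency while maintaining the level of privacy.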
13.
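Apriori and FP-tree, the two classic algorithms for mining association rules, both process all transactions in batch mode. In practice, however, new transactions arrive frequently, so association rules must be updated continually. To improve update efficiency and effectively reduce the number of scans of the original database, an improved algorithm is proposed that builds on the fast updated frequent pattern tree (FUFP-tree) algorithm and uses the concept of sub-frequent items. Experimental results show that the new algorithm performs well.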
14.
《Applied Soft Computing》2008,8(1):646-656
In this paper, a Pareto-based multi-objective differential evolution (DE) algorithm is proposed as a search strategy for mining accurate and comprehensible numeric association rules (ARs) that are optimal in the wider sense that no other rules are superior to them when all objectives are considered simultaneously. The proposed DE guides the search for ARs toward the global Pareto-optimal set while maintaining adequate population diversity to capture as many high-quality ARs as possible. The AR mining problem is formulated as a four-objective optimization problem. Support, confidence, and the comprehensibility of the rule are maximization objectives, while the amplitude of the intervals that make up the itemset and rule is a minimization objective. The algorithm is designed to search simultaneously for intervals of numeric attributes and for the ARs that these intervals form, in only a single run of DE. Contrary to the usual methods, ARs are mined directly without generating frequent itemsets. The proposed DE is a database-independent approach that does not rely on minimum support and minimum confidence thresholds, which are hard to determine for each database. The efficiency of the proposed DE is validated on synthetic and real databases.
15.
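Many algorithms have been proposed for efficiently discovering association rules in large databases, but they all focus on finding the frequent itemsets that satisfy the minimum support; none studies how to efficiently derive, from the frequent itemsets, the association rules that satisfy the minimum confidence. To address this, an efficient association rule mining algorithm, EA, is proposed, which solves the problem of efficiently mining the rules that satisfy the minimum confidence.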
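The second phase that this abstract refers to, deriving rules that meet a minimum confidence from already-mined frequent itemsets, can be sketched in its straightforward textbook form. This is not the EA algorithm, which the abstract does not describe; it is the baseline enumeration that such an algorithm would aim to speed up. The function name `generate_rules` and the toy support counts are assumptions.

```python
from itertools import combinations


def generate_rules(supports, min_confidence):
    """Return (antecedent, consequent, confidence) triples from itemset supports.

    `supports` maps frozenset -> support count and must contain every non-empty
    subset of each itemset it lists, as the output of a frequent-itemset miner does.
    """
    rules = []
    for itemset, count in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = count / supports[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical support counts for illustration only.
toy_supports = {
    frozenset({"a"}): 4,
    frozenset({"b"}): 2,
    frozenset({"a", "b"}): 2,
}
toy_rules = generate_rules(toy_supports, min_confidence=0.8)
```

Here `a -> b` has confidence 2/4 and is rejected, while `b -> a` has confidence 2/2 and is kept. The cost grows with the number of subsets of each frequent itemset, which is why the confidence phase can matter for long itemsets.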
16.
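A distributed association rule mining algorithm based on frequent pattern trees   Cited by: 1 (self-citations: 0, citations by others: 1)
A distributed association rule mining algorithm based on frequent pattern trees (DMARF) is proposed. DMARF designates a central node and uses local frequent pattern trees so that each computing node can quickly obtain its local frequent itemsets; the nodes then interact with the central node to aggregate the data and finally obtain the global frequent itemsets. DMARF adopts top and bottom strategies, which substantially reduce the number of candidate itemsets and the communication volume. Both theoretical analysis and experimental results show that DMARF is fast and effective.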
17.
One of the major challenges in data mining is the extraction of comprehensible knowledge from recorded data. In this paper, a coevolutionary-based classification technique, namely COevolutionary Rule Extractor (CORE), is proposed to discover classification rules in data mining. Unlike existing approaches where candidate rules and rule sets are evolved at different stages in the classification process, the proposed CORE coevolves rules and rule sets concurrently in two cooperative populations to confine the search space and to produce good rule sets that are comprehensive. The proposed coevolutionary classification technique is extensively validated on seven datasets obtained from the University of California, Irvine (UCI) machine learning repository, which are representative artificial and real-world data from various domains. Comparison results show that the proposed CORE produces comprehensive and good classification rules for most datasets, which are competitive with existing classifiers in the literature. Simulation results obtained from box plots also reveal that CORE is relatively robust and invariant to random partitioning of datasets.
18.
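Constrained association mining restricts items or itemsets to one or more user-specified conditions. It is an important type of association mining with many real-world applications. However, because most algorithms handle only a single type of constraint, a multi-constraint association mining algorithm is proposed. Based on FP-growth, the algorithm builds conditional databases for itemsets. It exploits the properties of non-monotone and monotone constraints and applies several pruning strategies to locate constraint points quickly. Experiments show that the algorithm can effectively mine association rules under multiple constraints and scales well.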
19.
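An efficient incremental updating algorithm for association rules   Cited by: 3 (self-citations: 0, citations by others: 3)
The key ideas and performance of the FUP algorithm for mining association rules are studied, and an improved FUP algorithm, SFUP, is proposed. The algorithm makes full use of the support counts of candidate frequent itemsets in the previous mining results and can effectively reduce the number of repeated database scans. The two algorithms are compared experimentally, and the results show that SFUP is clearly more efficient than FUP.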
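The incremental idea behind FUP-style updating can be sketched as follows. This is a simplified illustration and not the SFUP algorithm, whose details the abstract does not give. It assumes that `old_counts` holds the support counts of all itemsets that were frequent in the original database, and for brevity it looks for new candidates among single items only. All names and the toy data are hypothetical.

```python
def fup_style_update(old_counts, old_size, increment, min_ratio):
    """Update known frequent itemsets after appending `increment` transactions.

    Returns (still_frequent, needs_rescan):
      still_frequent: itemsets confirmed frequent using only stored counts
                      plus a scan of the increment;
      needs_rescan:   single items frequent in the increment whose counts in
                      the original database are unknown.
    """
    increment = [frozenset(t) for t in increment]
    threshold = min_ratio * (old_size + len(increment))

    # Previously frequent itemsets: old count + increment count decides.
    still_frequent = {}
    for itemset, old_count in old_counts.items():
        total = old_count + sum(1 for t in increment if itemset <= t)
        if total >= threshold:
            still_frequent[itemset] = total

    # An itemset that was not frequent before can only become frequent
    # overall if it is frequent within the increment itself.
    inc_counts = {}
    for t in increment:
        for item in t:
            key = frozenset([item])
            if key not in old_counts:
                inc_counts[key] = inc_counts.get(key, 0) + 1
    inc_threshold = min_ratio * len(increment)
    needs_rescan = {k for k, c in inc_counts.items() if c >= inc_threshold}
    return still_frequent, needs_rescan


# Hypothetical toy data: 4 old transactions, 2 new ones, 50% minimum support.
old = {frozenset({"a"}): 3, frozenset({"b"}): 2}
kept, rescan = fup_style_update(old, old_size=4, increment=[{"a", "c"}, {"c"}], min_ratio=0.5)
```

In the toy run, `a` stays frequent (4 of 6), `b` drops out (2 of 6), and `c` is the only item for which the original database would have to be consulted again.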