1.

A unified view of the apriori-based algorithms for frequent episode discovery

Avinash Achar Srivatsan Laxman P. S. Sastry 《Knowledge and Information Systems》2012,31(2):223-250

The frequent episode discovery framework is popular in temporal data mining and has many applications. Over the years, many different notions of frequencies of episodes have been proposed along with different algorithms for episode discovery. In this paper, we present a unified view of all the apriori-based discovery methods for serial episodes under these different notions of frequencies. Specifically, we present a unified view of the various frequency counting algorithms. We propose a generic counting algorithm such that all current algorithms are special cases of it. This unified view allows one to gain insights into different frequencies, and we present quantitative relationships among different frequencies. Our unified view also helps in obtaining correctness proofs for various counting algorithms as we show here. It also aids in understanding and obtaining the anti-monotonicity properties satisfied by the various frequencies, the properties exploited by the candidate generation step of any apriori-based method. We also point out how our unified view of counting helps to consider generalization of the algorithm to count episodes with general partial orders.

2.

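A frequent episode counting algorithm for event streams

黄鹏 王鹏 汪卫 《计算机科学与探索》2010,4(10):909-917

Mining frequent episodes over event streams has recently become an active research topic and plays an important role in many applications. Previous work proposed several mining algorithms, including methods based on sliding windows and methods based on non-overlapped occurrences. However, these algorithms are inefficient, or fail altogether, when support is counted by distinct occurrences of an episode. To address this, the paper proposes a finite-state automaton model that incorporates state counts and uses it to build an efficient mining algorithm. The efficiency and effectiveness of the algorithm are analyzed theoretically, and experimental results confirm that it is both effective and efficient.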

3.

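Temporal-dependency episode discovery based on time-lag features

顾佩月 刘峥 李云 李涛 《计算机应用》2019,39(2):421-428

For discovering temporal dependencies in event sequences, traditional frequent episode discovery methods have two shortcomings: their time-window mechanism mines only simple associative dependencies between events, and they cannot effectively handle interleaved temporal associations among events. To address these problems, the paper proposes the concept of time-lag episode discovery and, building on frequent episode discovery, designs a time-lag episode discovery algorithm based on adjacent event matching (AEM) sets. First, a probabilistic statistical model of the time lag is introduced to match events in the sequence, which avoids presetting a time window and handles possible interleaved associations. Then, time-lag mining is cast as an optimization problem, and the distribution of the time intervals between time-lag episodes is obtained iteratively. Finally, hypothesis testing is used to distinguish serial time-lag episodes from parallel ones. Theoretical analysis and experimental results show that, compared with the iterative closest event (ICE) algorithm, the most recent time-lag mining method, the average KL distance between the time-lag distribution modeled by the AEM-based algorithm and the true time-lag distribution is 0.056, a reduction of 20.68%. By using the probabilistic statistical model of the time lag to weigh the likelihood of the different ways events can be matched, the AEM-based algorithm obtains one-to-many adjacent event matching sets, which model real conditions more effectively than the one-to-one matching in ICE.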

4.

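An incremental algorithm for discovering frequent serial episodes in event sequences

魏正红 欧阳为民 蔡庆生 《小型微型计算机系统》1999,(9)

This paper studies the problem of discovering frequent episodes in event sequences and proposes an incremental algorithm for discovering frequent serial episodes. Once the frequent episodes in an event sequence and their frequencies of occurrence have been found, episode rules that describe or predict the behavior of the sequence can be generated.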

5.

Fast exhaustive subgroup discovery with numerical target concepts

Florian Lemmerich Martin Atzmueller Frank Puppe 《Data mining and knowledge discovery》2016,30(3):711-762

6.

Discovering frequent episodes and learning hidden Markov models: a formal connection (total citations: 7; self-citations: 0; citations by others: 7)

Srivatsan Laxman Sastry P.S. Unnikrishnan K.P. 《Knowledge and Data Engineering, IEEE Transactions on》2005,17(11):1505-1517

This paper establishes a formal connection between two common, but previously unconnected methods for analyzing data streams: discovering frequent episodes in a computer science framework and learning generative models in a statistics framework. We introduce a special class of discrete hidden Markov models (HMMs), called episode generating HMMs (EGHs), and associate each episode with a unique EGH. We prove that, given any two episodes, the EGH that is more likely to generate a given data sequence is the one associated with the more frequent episode. To be able to establish such a relationship, we define a new measure of frequency of an episode, based on what we call nonoverlapping occurrences of the episode in the data. An efficient algorithm is proposed for counting the frequencies for a set of episodes. Through extensive simulations, we show that our algorithm is both effective and more efficient than current methods for frequent episode discovery. We also show how the association between frequent episodes and EGHs can be exploited to assess the significance of frequent episodes discovered and illustrate empirically how this idea may be used to improve the efficiency of frequent episode discovery.
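
To make the nonoverlapping-occurrence frequency in this abstract concrete, the following is a minimal Python sketch that counts such occurrences for a single serial episode with a greedy left-to-right scan. It is an illustration only, not the paper's algorithm (which counts a whole set of episodes at once); the function name `count_nonoverlapped` and the plain list-of-event-types input are assumptions made for this example.

```python
from typing import Hashable, Sequence


def count_nonoverlapped(events: Sequence[Hashable],
                        episode: Sequence[Hashable]) -> int:
    """Count non-overlapped occurrences of a serial episode.

    Hypothetical helper for illustration: `events` is the event-type
    sequence in time order, `episode` is the ordered list of event types.
    """
    if not episode:
        raise ValueError("episode must contain at least one event type")
    count = 0
    pos = 0  # index of the next episode event we are waiting for
    for ev in events:
        if ev == episode[pos]:
            pos += 1
            if pos == len(episode):
                # A full occurrence has completed; start a fresh one so
                # that consecutive occurrences cannot overlap.
                count += 1
                pos = 0
    return count
```

For example, in the sequence A, A, B, B the episode A then B has two overlapping occurrences, and the scan counts one non-overlapped occurrence.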

7.

From association to classification: inference using weight of evidence (total citations: 1; self-citations: 0; citations by others: 1)

Wang Y. Wong A.K.C. 《Knowledge and Data Engineering, IEEE Transactions on》2003,15(3):764-767

Association and classification are two important tasks in data mining and knowledge discovery. Intensive studies have been carried out in both areas. However, how to apply discovered event associations to classification is still seldom addressed in current publications. Trying to bridge this gap, this paper extends our previous paper on significant event association discovery to classification. We propose to use weight of evidence to evaluate the evidence of a significant event association in support of, or against, a certain class membership. Traditional weight of evidence in information theory is extended here to measure the event associations of different orders with respect to a certain class. After the discovery of significant event associations inherent in a data set, it is easy and efficient to apply the weight of evidence measure for classifying an observation according to any attribute. With this approach, we achieve flexible prediction.
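
As a rough illustration of the traditional weight of evidence that this abstract builds on, the sketch below computes the log-likelihood ratio of an observed event for a class versus its complement from raw counts. It shows only the standard form; the paper's extension to event associations of different orders is not reproduced. The function name, the count-based interface, and the base-2 logarithm are assumptions made for this example.

```python
import math


def weight_of_evidence(n_x_and_c: int, n_c: int,
                       n_x_and_not_c: int, n_not_c: int) -> float:
    """Weight of evidence of event x in favor of class c, in bits.

    Computed as log2( P(x | c) / P(x | not c) ) from counts.
    Positive values support c, negative values count against it.
    Hypothetical helper; zero counts are rejected rather than smoothed.
    """
    if min(n_x_and_c, n_c, n_x_and_not_c, n_not_c) <= 0:
        raise ValueError("all counts must be positive in this sketch")
    p_x_given_c = n_x_and_c / n_c
    p_x_given_not_c = n_x_and_not_c / n_not_c
    return math.log2(p_x_given_c / p_x_given_not_c)
```

With 8 of 10 class members and 2 of 10 non-members showing the event, the ratio is 4, so the evidence is about 2 bits in favor of the class.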

8.

Discovery of Frequent Episodes in Event Sequences (total citations: 48; self-citations: 4; citations by others: 44)

Heikki Mannila Hannu Toivonen A. Inkeri Verkamo 《Data mining and knowledge discovery》1997,1(3):259-289

Sequences of events describing the behavior and actions of users or systems can be collected in several domains. An episode is a collection of events that occur relatively close to each other in a given partial order. We consider the problem of discovering frequently occurring episodes in a sequence. Once such episodes are known, one can produce rules for describing or predicting the behavior of the sequence. We give efficient algorithms for the discovery of all frequent episodes from a given class of episodes, and present detailed experimental results. The methods are in use in telecommunication alarm management.
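
The idea of events occurring "relatively close to each other" is commonly made precise with a sliding time window: an episode's frequency is the fraction of windows in which it occurs. The brute-force Python sketch below illustrates this for a serial episode. It is not the efficient algorithm of the paper, and it assumes integer, distinct timestamps and the convention that windows of width `win` start at every integer from `t_start - win + 1` to `t_end - 1`; all names are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple


def occurs_serially(types: Sequence[Hashable],
                    episode: Sequence[Hashable]) -> bool:
    """True if `episode` appears as an ordered subsequence of `types`."""
    pos = 0
    for ev in types:
        if pos < len(episode) and ev == episode[pos]:
            pos += 1
    return pos == len(episode)


def window_frequency(events: List[Tuple[Hashable, int]],
                     episode: Sequence[Hashable],
                     t_start: int, t_end: int, win: int) -> float:
    """Fraction of width-`win` windows that contain the serial episode.

    `events` holds (event_type, time) pairs sorted by time, with
    t_start <= time < t_end. Each window covers [ws, ws + win).
    """
    if win < 1 or t_end <= t_start:
        raise ValueError("need win >= 1 and t_end > t_start")
    hits = 0
    total = 0
    for ws in range(t_start - win + 1, t_end):
        total += 1
        inside = [e for e, t in events if ws <= t < ws + win]
        if occurs_serially(inside, episode):
            hits += 1
    return hits / total
```

For the two events A at time 1 and B at time 2, with `t_start=1`, `t_end=3` and `win=2`, there are three windows and only the one starting at time 1 contains A followed by B, so the frequency is 1/3.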

9.

Exploration of Ordinal Data Using Association Rules

Oliver Büchter Rüdiger Wirth 《Knowledge and Information Systems》1999,1(4):393-414

The discovery of association rules is a very efficient data mining technique that is especially suitable for large amounts of categorical data. This paper shows how the discovery of association rules can be of benefit for numeric data as well. Based on a review of previous approaches, we introduce Q2, a faster algorithm for the discovery of multi-dimensional association rules over ordinal data. We experimentally compare the new algorithm with the previous approach, obtaining performance improvements of more than an order of magnitude on supermarket data. In addition, a new absolute measure for the interestingness of quantitative association rules is introduced. It is based on the view that quantitative association rules have to be interpreted with respect to their Boolean generalizations. This measure has two major benefits compared to the previously used relative interestingness measure: first, it speeds up rule extraction and evaluation, and second, it is easier for a user to interpret. Finally, we introduce a rule browser which supports the exploration of ordinal data with quantitative association rules.

10.

OPTIMONOTONE MEASURES FOR OPTIMAL RULE DISCOVERY

Yannick Le Bras Philippe Lenca Stéphane Lallich 《Computational Intelligence》2012,28(4):475-504

Many studies have shown the limits of the support/confidence framework used in Apriori-like algorithms to mine association rules. There are a lot of efficient implementations based on the antimonotony property of the support, but candidate set generation (e.g., frequent itemset mining) is still costly. In addition, many rules are uninteresting or redundant and one can miss interesting rules like nuggets. We are thus facing a complexity issue and a quality issue. One solution is to not use frequent itemset mining and to focus as soon as possible on interesting rules using additional interestingness measures. We present here a formal framework that allows us to make a link between analytic and algorithmic properties of interestingness measures. We introduce the notion of optimonotony in relation with the optimal rule discovery framework. We then demonstrate a necessary and sufficient condition for the existence of optimonotony. This result can thus be applied to classify the measures. We study the case of 39 classical measures and show that 31 of them are optimonotone. These optimonotone measures can thus be used with an underlying pruning strategy. Empirical evaluations show that the pruning strategy is efficient and leads to the discovery of nuggets using an optimonotone measure and without the support constraint.

11.

Mining multiple-level association rules in large databases (total citations: 2; self-citations: 0; citations by others: 2)

Jiawei Han Yongjian Fu 《Knowledge and Data Engineering, IEEE Transactions on》1999,11(5):798-805

A top-down progressive deepening method is developed for efficient mining of multiple-level association rules from large transaction databases based on the a priori principle. A group of variant algorithms is proposed based on the ways of sharing intermediate results, with the relative performance tested and analyzed. The enforcement of different interestingness measurements to find more interesting rules, and the relaxation of rule conditions for finding “level-crossing” association rules, are also investigated. The study shows that efficient algorithms can be developed from large databases for the discovery of interesting and strong multiple-level association rules.

12.

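An interesting-rule mining algorithm based on a genetic algorithm

武永成 刘钊 《微计算机应用》2007,28(2):117-120

Data mining is the discovery of hidden structures and patterns in data. Many of the discovered patterns, however, may already be known to the user, which makes them meaningless and uninteresting. The literature mostly emphasizes the accuracy and comprehensibility of classification rules, yet discovering interesting rules remains a daunting challenge for data mining algorithms. This paper adopts a genetic data mining method that measures the interestingness of classification rules as they are generated, producing interesting rules directly. Experiments show that the method is feasible and efficient.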

13.

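A frequent episode mining algorithm integrating multiple support definitions

朱辉生 陈琳 倪艺洋 汪卫 施伯乐 《软件学报》2020,31(7):2169-2183

The frequent episodes hidden in an event sequence characterize the behavioral regularities of users or systems. Existing frequent episode mining algorithms perform well under their own support definitions, but when the support definition changes they can mine frequent episodes directly only with difficulty, if at all. To meet users' changing requirements for support definitions, the paper proposes a frequent episode mining algorithm, FEM-DFS (frequent episode mining-depth first search). The algorithm scans the event sequence in a single pass, discovers frequent episodes by depth-first search, stores them in a shared prefix/suffix tree, and compresses the search space of frequent episodes using monotonicity, prefix monotonicity, or suffix monotonicity. Experimental evaluation confirms the effectiveness of the proposed algorithm.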

14.

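A parallel mining algorithm for weighted association rules (total citations: 4; self-citations: 1; citations by others: 4)

杨泽民 陈莉 范全润 《计算机工程与应用》2003,39(8):192-193

Association rules are one of the important research topics in data mining, yet traditional algorithms are all serial and treat all database items as equals. This paper proposes a parallel algorithm for mining weighted association rules, discusses the relevant data structures, and gives a qualitative analysis of the algorithm.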

15.

Mining interestingness measures for string pattern mining

M. Baena-García R. Morales-Bueno 《Knowledge》2012,25(1):45-50

16.

A framework for mining interesting high utility patterns with a strong frequency affinity

Chowdhury Farhan Ahmed Ho-Jin Choi 《Information Sciences》2011,181(21):4878-4894

High utility pattern (HUP) mining is one of the most important research issues in data mining. Although HUP mining extracts important knowledge from databases, it requires long calculations and multiple database scans. Therefore, HUP mining is often unsuitable for real-time data processing schemes such as data streams. Furthermore, many HUPs may be unimportant due to the poor correlations among the items inside of them. Hence, the fast discovery of fewer but more important HUPs would be very useful in many practical domains. In this paper, we propose a novel framework to introduce a very useful measure, called frequency affinity, among the items in a HUP and the concept of interesting HUP with a strong frequency affinity for the fast discovery of more applicable knowledge. Moreover, we propose a new tree structure, utility tree based on frequency affinity (UTFA), and a novel algorithm, high utility interesting pattern mining (HUIPM), for single-pass mining of HUIPs from a database. Our approach mines fewer but more valuable HUPs, significantly reduces the overall runtime of existing HUP mining algorithms and is applicable to real-time data processing. Extensive performance analyses show that the proposed HUIPM algorithm is very efficient and scalable for interesting HUP mining with a strong frequency affinity.

17.

Using interesting sequences to interactively build Hidden Markov Models

Szymon Jaroszewicz 《Data mining and knowledge discovery》2010,21(1):186-220

The paper presents a method of interactive construction of global Hidden Markov Models (HMMs) based on local sequence patterns discovered in data. The method is based on finding interesting sequences whose frequency in the database differs from that predicted by the model. The patterns are then presented to the user who updates the model using their intelligence and their understanding of the modelled domain. It is demonstrated that such an approach leads to more understandable models than automated approaches. Two variants of the problem are considered: mining patterns occurring only at the beginning of sequences and mining patterns occurring at any position; both are practically meaningful. For each variant, algorithms have been developed allowing for efficient discovery of all sequences with given minimum interestingness. Applications to modelling webpage visitors' behavior and to modelling protein secondary structure are presented, validating the proposed approach.

18.

Discovering Frequent Generalized Episodes When Events Persist for Different Durations (total citations: 2; self-citations: 0; citations by others: 2)

Laxman S. Sastry P.S. Unnikrishnan K.P. 《Knowledge and Data Engineering, IEEE Transactions on》2007,19(9):1188-1201

This paper is concerned with the framework of frequent episode discovery in event sequences. A new temporal pattern, called the generalized episode, is defined, which extends this framework by incorporating event duration constraints explicitly into the pattern's definition. This new formalism facilitates extension of the technique of episodes discovery to applications where data appears as a sequence of events that persist for different durations (rather than being instantaneous). We present efficient algorithms for episode discovery in this new framework. Through extensive simulations, we show the expressive power of the new formalism. We also show how the duration constraint possibilities can be used as a design choice to properly focus the episode discovery process. Finally, we briefly discuss some interesting results obtained on data from manufacturing plants of General Motors.

19.

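Mining item-item positively correlated association rules based on an improved FP-tree

刘上力 杨清 《计算机工程与科学》2011,33(7):183

Interestingness measures are commonly used in association rule mining to find potentially interesting patterns. FP-growth, which is based on the FP-tree structure, is currently one of the more efficient association rule mining algorithms, but it is inefficient when mining potentially valuable low-support patterns. To address this, the paper proposes a new interestingness measure, the item-item positive correlation measure, which has good anti-monotonicity; in the resulting patterns, the occurrence of any one item in a transaction raises the likelihood that the remaining items of the pattern occur. The paper also proposes an improved FP mining algorithm that uses a compressed FP-tree structure and a non-recursive method to reduce the overhead of building additional conditional pattern trees during mining. More importantly, a pruning strategy based on the item-item positive correlation measure is introduced into frequent itemset mining; it effectively filters out long patterns that are not positively correlated as well as invalid itemsets, and widens the range of support thresholds that can be mined. Experimental results show that the algorithm is effective and feasible.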

20.

Mining bridging rules between conceptual clusters

Shichao Zhang Feng Chen Xindong Wu Chengqi Zhang Ruili Wang 《Applied Intelligence》2012,36(1):108-118

Bridging rules take the antecedent and action from different conceptual clusters. They are distinguished from association rules (frequent itemsets) because (1) they can be generated by the infrequent itemsets that are pruned in association rule mining, and (2) they are measured by their importance including the distance between two conceptual clusters, whereas frequent itemsets are measured only by their support. In this paper, we first design two algorithms for mining bridging rules between clusters, and then propose two non-linear metrics to measure their interestingness. We evaluate these algorithms experimentally and demonstrate that our approach is promising.