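Process mining aims to rediscover process models from the event logs recorded by information systems. Although information systems generate many kinds of event logs, only a small fraction of them are used for process analysis. This paper proposes a new process mining algorithm based on successor tasks (the χ algorithm). The algorithm can mine not only causal dependencies directly from successor tasks but also potential concurrency relations. The causal dependencies are of two kinds: explicit dependencies and implicit dependencies (the latter arising from non-free-choice constructs). In addition, the χ algorithm can correctly mine SWF-nets, most non-SWF-nets with implicit dependencies, some workflow nets that are not well-handled, and some workflow nets containing implicit places. Because the event logs used by the χ algorithm contain an additional event type, the successor task, the χ algorithm can handle a wider range of workflow nets.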

A Model Update Method for Incremental Process Mining
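Correctly discovering how a process actually runs is important for workflow management. Process mining extracts information from system logs to discover the model by which a process really operates. Much current research in this area focuses on mining a workflow model from a single log. However, these methods consider only the log and ignore the prior knowledge of process designers. Moreover, logs contain large amounts of information, so each mining run is expensive. It is therefore desirable to update a workflow model by combining the existing model with newly added log information. Previous work has proposed incremental mining algorithms over models and logs. However, business processes change over time: an existing task may be cancelled, so that it is no longer recorded in the newly added log segment. Because the task was recorded in the earlier log, existing mining algorithms, including incremental ones, still discover it in the updated model. This paper proposes an improved incremental mining algorithm for model updating that decides whether a task has been cancelled using the process designer's prior knowledge and statistics on task occurrence frequency. An experiment verifies the feasibility of the algorithm.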
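The abstract does not state the decision rule. A minimal Python sketch of one plausible frequency-based check, assuming a task is flagged as possibly cancelled when its relative frequency in the new log segment is at or below a threshold, and using a hypothetical `protected` set as a stand-in for the designer's prior knowledge (tasks known to still be active). The names `flag_possibly_cancelled`, `min_ratio` and `protected` are illustrative, not taken from the paper.

```python
from collections import Counter


def flag_possibly_cancelled(old_log, new_log, min_ratio=0.0, protected=frozenset()):
    """Return tasks seen in old_log whose relative frequency in new_log is <= min_ratio.

    old_log, new_log: lists of traces, each trace a list of task names.
    protected: hypothetical stand-in for designer knowledge (tasks known to be active).
    """
    old_tasks = {task for trace in old_log for task in trace}
    new_counts = Counter(task for trace in new_log for task in trace)
    total = sum(new_counts.values()) or 1  # avoid division by zero on an empty log
    return {
        task
        for task in old_tasks - set(protected)
        if new_counts[task] / total <= min_ratio
    }
```

With `old_log = [["a", "b", "c"], ["a", "c"]]` and `new_log = [["a", "c"], ["a", "c"]]`, task `b` is flagged; passing `protected={"b"}` suppresses the flag, which is where designer knowledge would override the statistics.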

A New Algorithm for Mining Frequent Workflow Patterns
Gao Ang, Yang Yang, Wang Yuewei. Computer Science (计算机科学), 2009, 36(9): 231-233
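To improve the accuracy of workflow model mining, a new algorithm for mining frequent workflow patterns is proposed. First, the dependency matrix of a workflow model is defined and built from the workflow log. Then, taking the dependency relations between activities as frequent itemsets, an algorithm is designed that automatically generates frequent itemsets from the dependency matrix. Finally, the frequent itemsets are processed to obtain the final frequent workflow patterns. The algorithm can handle overlapping relations between activities as well as workflow models with sequential and parallel relations, which the authors present as its main advantage.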
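The abstract does not define the dependency matrix precisely. A minimal Python sketch, assuming the entry for (a, b) counts how often activity b directly follows activity a in the log, and that a dependency counts as "frequent" when it reaches a minimum count; `dependency_matrix`, `frequent_dependencies` and `min_count` are hypothetical names, and the paper's later processing of itemsets into patterns is not shown.

```python
def dependency_matrix(log):
    """Count direct successions a -> b in a workflow log (a list of traces)."""
    activities = sorted({a for trace in log for a in trace})
    index = {a: i for i, a in enumerate(activities)}
    matrix = [[0] * len(activities) for _ in activities]
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            matrix[index[a]][index[b]] += 1
    return activities, matrix


def frequent_dependencies(activities, matrix, min_count):
    """Keep dependency pairs observed at least min_count times (candidate frequent items)."""
    return [
        (activities[i], activities[j])
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if count >= min_count
    ]
```

For the log `[["A","B","C","D"], ["A","C","B","D"], ["A","B","C","D"]]`, B and C appear in both orders (a hint of parallelism), while `min_count=2` keeps only the dominant chain A, B, C, D.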

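Mining incomplete logs is a difficult problem in process mining. Building on the classic α algorithm, this paper proposes a process mining algorithm for incomplete logs. Starting from the parallel relation defined by the α algorithm, it introduces a set of derivation rules for implicit parallel relations, uses known task relations to infer the implicit parallel relations missing from the log, and then constructs the structural model of the process. Experimental results show that the algorithm mines incomplete logs more effectively than the traditional α-series algorithms.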
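For reference, a minimal Python sketch of the standard α-algorithm ordering relations (causality `->`, parallelism `||`, choice `#`) that the derivation rules build on. The paper's inference rules for implicit parallelism are not reproduced; the sketch only shows why an incomplete log hides parallelism.

```python
def alpha_relations(log):
    """Classify every activity pair by the alpha-algorithm footprint relations."""
    follows = {(a, b) for trace in log for a, b in zip(trace, trace[1:])}
    activities = sorted({a for trace in log for a in trace})
    relations = {}
    for a in activities:
        for b in activities:
            ab, ba = (a, b) in follows, (b, a) in follows
            if ab and not ba:
                relations[(a, b)] = "->"  # causality
            elif ba and not ab:
                relations[(a, b)] = "<-"  # reverse causality
            elif ab and ba:
                relations[(a, b)] = "||"  # parallel
            else:
                relations[(a, b)] = "#"   # never directly adjacent
    return relations
```

With the complete log `[["A","B","C","D"], ["A","C","B","D"]]` the pair `("B","C")` is classified as `||`; with only `[["A","B","C","D"]]` it is classified as `->`, which is exactly the kind of missing parallel relation an incompleteness-aware algorithm has to infer.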

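To address the α algorithm's weaknesses in mining short-loop structures and other constructs, an improved α algorithm is proposed. The algorithm first defines ordering relations between tasks based on event types; it then uses these relations to shrink the log step by step and derive an ordering-relation matrix. Finally, it generates a workflow net from this matrix through a sequence of formal steps. A case study illustrates how the algorithm executes, and simulation experiments verify its feasibility and effectiveness. The results show that, compared with the α algorithm, the improved algorithm performs better in the model structures it can handle, the range of models it can rediscover, and mining quality.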

A Collaborative Reconstruction Algorithm for Streaming Processes Based on Temporal Behavior
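Process stream data are real-time, continuous, and time-ordered, which makes it hard for traditional process mining algorithms to uncover hidden information and the evolution of a process. To meet the requirements of dynamic evolution and reconstruction of streaming process models, an adaptive hybrid heuristic collaborative optimization algorithm based on temporal behavior analysis is proposed. First, an evolving streaming process model is defined, and the heuristic mining rules for process logic are improved using implicit dependencies between log activities. Then, an aging factor based on temporal behavior is defined, and an adaptive multi-swarm cooperation strategy with Gaussian mutation is introduced to improve both the global and the local search precision of particle swarm optimization, so that the process model can be optimized and reconstructed. Comparative experiments on four standard benchmark functions show that the algorithm achieves better convergence and stability in streaming process mining.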
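A minimal single-swarm Python sketch of particle swarm optimization with Gaussian mutation, minimizing the sphere function as a stand-in benchmark. It assumes standard inertia-weight PSO with fixed parameters (`w`, `c1`, `c2`) and a fixed mutation rate; the paper's multi-swarm cooperation, adaptive strategy and temporal aging factor are not modeled, and all parameter values are illustrative.

```python
import random


def sphere(x):
    """Benchmark function with its minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso_gaussian(f, dim, n_particles=20, iters=200, lo=-5.0, hi=5.0,
                 w=0.7, c1=1.5, c2=1.5, mut_rate=0.1, mut_sigma=0.1, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            # Gaussian mutation: occasionally perturb a particle to keep exploring
            if rng.random() < mut_rate:
                sigma = mut_sigma * (hi - lo)
                pos[i] = [min(hi, max(lo, v + rng.gauss(0.0, sigma))) for v in pos[i]]
            val = f(pos[i])
            if val < pbest_val[i]:  # personal and global bests only ever improve
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Because personal and global bests are updated only on improvement, the mutation adds exploration without losing the best solution found so far.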

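Business process mining aims to discover, from recorded event logs, process models that meet users' needs. Previous methods mostly build process models from direct dependencies between events, which limits what they can discover. This paper proposes an optimization method for process mining based on quasi-indirect dependencies. Starting from the event log, an initial model is built on the basis of behavioral profiles. Then, on the execution log, the basic constraint set of the integer linear programming (ILP) based process discovery algorithm is used to find transition pairs with quasi-indirect dependencies, which are used to refine the model and yield an optimized model. A worked example demonstrates the effectiveness of the method.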

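To improve the accuracy and noise robustness of process mining, and to address the limited set of supported basic structures, weak noise resistance, and long computation times of current methods, a process mining method based on probability statistics of adjacent events is proposed. Using a set of mining rules, the method needs only a single pass over the log and simple matrix operations to generate the process model. Experiments comparing it with the α algorithm and the heuristic miner show that it can mine the basic structures of sequence, choice, parallelism, short loops, and recursion, and that it has low computational complexity and strong noise resistance.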
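The abstract does not give the mining rules. A minimal Python sketch of the single-pass statistic, assuming the probability for (a, b) is estimated as the fraction of occurrences of a that are directly followed by b, and that noise is filtered with a hypothetical probability `threshold`; the function names are illustrative.

```python
from collections import Counter


def succession_probabilities(log):
    """One pass over the log: P(b directly follows a) = count(a, b) / count(a)."""
    occurrences = Counter()
    pairs = Counter()
    for trace in log:
        for i, a in enumerate(trace):
            occurrences[a] += 1
            if i + 1 < len(trace):
                pairs[(a, trace[i + 1])] += 1
    return {(a, b): n / occurrences[a] for (a, b), n in pairs.items()}


def filter_noise(probabilities, threshold):
    """Drop successions whose estimated probability is below the noise threshold."""
    return {pair: p for pair, p in probabilities.items() if p >= threshold}
```

On a log with nine traces `A, B, C` and one noisy trace `A, C, B`, the pairs (A, B) and (B, C) get probability 0.9 while (A, C) and (C, B) get 0.1, so a threshold of 0.2 keeps only the dominant sequence.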

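Process mining is very helpful for deploying new business processes and for auditing, analyzing, and improving existing ones. Business process system logs contain many same-name tasks and duplicate tasks, which existing mining algorithms cannot distinguish well; as a result, process mining often produces inaccurate process models. To improve the accuracy of process mining, an improved method is proposed that can mine not only complex structures such as loops and non-free-choice constructs, but also same-name tasks and duplicate tasks in the log.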

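In cross-organization, cross-system environments, process data are usually recorded in separate event logs, which makes it impossible to mine complete end-to-end executions. This algorithm therefore merges logs using only the event name and timestamp attributes. First, the process models of the two systems are obtained, together with a merged model derived from the cross-system follows-dependencies between activities. Next, traces from the two systems are merged pairwise and sorted by timestamp, and only merged traces consistent with paths in the merged model are kept. From these traces, one-to-one instance pairs are extracted, that is, pairs in which a unique main trace can be merged only with a unique sub-trace. Time constraints between activities are then mined from these instance pairs and used to merge the remaining logs. The last two steps are repeated until all logs are merged or no further one-to-one merge is possible. Experiments on real event logs show satisfactory merging results with high precision and recall.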
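A minimal Python sketch of the candidate-merging and one-to-one pairing steps, assuming each trace is a timestamp-sorted list of `(timestamp, event_name)` tuples and approximating "consistent with the merged model" by a hypothetical set `allowed` of permitted directly-follows pairs. The time-constraint mining used for the remaining logs is not shown, and all names are illustrative.

```python
import heapq
from collections import Counter


def merge_traces(main_trace, sub_trace):
    """Interleave two timestamp-sorted traces of (timestamp, event_name) tuples."""
    return list(heapq.merge(main_trace, sub_trace, key=lambda event: event[0]))


def consistent_with(merged, allowed):
    """Hypothetical model check: every adjacent pair of names must be an allowed succession."""
    names = [name for _, name in merged]
    return all((a, b) in allowed for a, b in zip(names, names[1:]))


def one_to_one_pairs(mains, subs, allowed):
    """Keep (main, sub) index pairs where each side has exactly one consistent partner."""
    ok = {
        (i, j)
        for i, m in enumerate(mains)
        for j, s in enumerate(subs)
        if consistent_with(merge_traces(m, s), allowed)
    }
    main_degree = Counter(i for i, _ in ok)
    sub_degree = Counter(j for _, j in ok)
    return sorted((i, j) for i, j in ok if main_degree[i] == 1 and sub_degree[j] == 1)
```

Pairs whose main or sub trace has several consistent partners are left for the later, time-constraint-based round.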

The more knowledge industrial practitioners have of their production processes, the better they are able to improve those processes. Nonetheless, some process characteristics and dependencies, such as routing-dependent attributes, are not easily extracted from business models. This paper introduces an algorithm-driven framework for determining whether process path decisions influence attributes in non-direct sequences; for example, deploying machine A instead of machine B may affect the percentage of rejected parts four stages further down the line. This problem is shown to bear similarities to sequential pattern mining problems. The solution framework builds on process mining and data mining techniques. The proposed approach is applied to a real industrial log, revealing deficiencies in the system and yielding further improvement recommendations.

Mining process models with non-free-choice constructs
Process mining aims at extracting information from event logs to capture the business process as it is being executed. Process mining is particularly useful in situations where events are recorded but there is no system forcing people to work in a particular way. Consider, for example, a hospital where diagnosis and treatment activities are recorded in the hospital information system, but where health-care professionals determine the "careflow." Many process mining approaches have been proposed in recent years. However, despite the persistent efforts of many researchers, several challenging problems remain. In this paper, we focus on mining non-free-choice constructs, i.e., situations where there is a mixture of choice and synchronization. Although most real-life processes exhibit non-free-choice behavior, existing algorithms are unable to deal adequately with such constructs. Using a Petri-net-based representation, we show that there are two kinds of causal dependencies between tasks: explicit and implicit ones. We propose an algorithm that is able to deal with both kinds of dependencies. The algorithm has been implemented in the ProM framework, and experimental results show that it significantly improves on existing process mining techniques.

Discovering branching and fractional dependencies in databases
The discovery of dependencies between attributes in databases is an important problem in data mining and can be applied to support future decision-making. In this paper, some properties of branching dependencies are examined. We define a minimal branching dependency and propose an algorithm for finding all minimal branching dependencies between a given set of attributes and a given attribute in a database relation. Our study of branching dependencies is motivated by their application to a database storing realized product sales. For example, finding out that any p products have together attracted at most q new users can be crucial in supporting decision-making. In addition, we consider fractional and fractional branching dependencies and examine some of their properties. An algorithm for finding all fractional dependencies between a given set of attributes and a given attribute in a database relation is proposed. We examine the general case of an arbitrary relation, as well as a special case in which discovering fractional dependencies is considerably simplified.

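After studying existing MapReduce-based parallel frequent itemset mining algorithms, this paper proposes a parallel closed frequent itemset mining algorithm based on a suffix-item table. By introducing the suffix-item table and mining closed frequent itemsets, it reduces the amount of data transferred between components and improves mining efficiency. Experiments show that the algorithm effectively shortens the average mining time and performs well on high-dimensional big data.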
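To make the notion concrete, a minimal single-machine Python sketch of what "closed frequent itemset" means: an itemset is closed if no proper superset has the same support, so the closed sets losslessly summarize all frequent itemsets with fewer entries. This brute-force enumeration is a swapped-in illustration, not the paper's MapReduce suffix-item-table algorithm, and is only suitable for tiny datasets.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise enumeration of itemsets with support >= min_support."""
    items = sorted({i for t in transactions for i in t})
    tsets = [frozenset(t) for t in transactions]
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            s = frozenset(combo)
            support = sum(1 for t in tsets if s <= t)
            if support >= min_support:
                result[s] = support
                found = True
        if not found:  # anti-monotonicity: no frequent k-set means no frequent (k+1)-set
            break
    return result


def closed_itemsets(freq):
    """Keep itemsets with no proper superset of equal support."""
    return {
        s: sup
        for s, sup in freq.items()
        if not any(s < other and sup == osup for other, osup in freq.items())
    }
```

For the transactions `{a,b,c}, {a,b}, {a,c}, {a}` with minimum support 2, five itemsets are frequent but only `{a}`, `{a,b}` and `{a,c}` are closed; `{b}` and `{c}` are absorbed because their supersets have the same support.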

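As network security attracts more and more attention, privacy protection in data mining has become an active research topic. A key open question is how to avoid leaking private information or sensitive data during mining while still obtaining reasonably accurate results. This paper analyzes several key current privacy-preserving methods, organized by data distribution and by the mining algorithms involved, evaluates these algorithms, and finally outlines future research directions for privacy-preserving data mining.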

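Data mining is an interdisciplinary subject and one of the core courses in information science. It covers the basic concepts, principles, methods, and techniques of data mining, and because it spans many disciplines and algorithms it is difficult to teach. The large number of complex mining methods and algorithms leaves students without a grasp of the overall data mining workflow, so that, as the classical Chinese poem puts it, they "cannot see the true face of Mount Lu because they stand within the mountain." Using a time-series forecasting project on clothing sales as a teaching case, students first learn the standard data mining process; the course then focuses on the mining methods and algorithms involved and on how to develop them in a real mining environment, with the goal of "climbing to the summit and seeing all the mountains below." Teaching practice has shown good results.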

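Workflow management systems are driven by workflow models, but industrial practice shows that defining workflow models is both time-consuming and error-prone. Workflow mining can help solve this problem and can also inform the analysis and optimization of existing workflows. This paper briefly introduces three typical workflow model mining algorithms of practical value and uses one of them to discuss a real workflow model mining process in detail. The process starts from the workflow log files of a Staffware system and comprises three main steps: data preprocessing, mining of the initial workflow model, and simplification of the initial workflow model. It can be implemented with the help of a workflow model mining subsystem.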

In this paper, we propose an efficient rule discovery algorithm, called FD_Mine, for mining functional dependencies from data. By exploiting Armstrong’s Axioms for functional dependencies, we identify equivalences among attributes, which can be used to reduce both the size of the dataset and the number of functional dependencies to be checked. We first describe four effective pruning rules that reduce the size of the search space. In particular, the number of functional dependencies to be checked is reduced by skipping the search for FDs that are logically implied by already discovered FDs. Then, we present the FD_Mine algorithm, which incorporates the four pruning rules into the mining process. We prove the correctness of FD_Mine, that is, we show that the pruning does not lead to the loss of useful information. We report the results of a series of experiments. These experiments show that the proposed algorithm is effective on 15 UCI datasets and synthetic data.
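A minimal Python sketch of a partition-based functional dependency test: X -> A holds exactly when grouping the rows by X yields the same number of groups as grouping by X together with A. This is a common check in level-wise FD discovery; FD_Mine's four pruning rules are not reproduced, and the function names are illustrative. Two attributes that determine each other correspond to the attribute equivalences the abstract mentions.

```python
def partition(rows, attrs):
    """Group row indices by their values on attrs (the partition pi_attrs)."""
    groups = {}
    for idx, row in enumerate(rows):
        groups.setdefault(tuple(row[a] for a in attrs), []).append(idx)
    return list(groups.values())


def holds_fd(rows, lhs, rhs):
    """X -> A holds iff |pi_X| == |pi_(X union A)|."""
    return len(partition(rows, lhs)) == len(partition(rows, list(lhs) + [rhs]))


def equivalent(rows, x, y):
    """x and y are equivalent if each functionally determines the other."""
    return holds_fd(rows, [x], y) and holds_fd(rows, [y], x)
```

Once `city` and `zip` are found equivalent, a search only needs to consider one of them, which is the kind of reduction the abstract describes.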

Today, the development of e-commerce has produced many transaction databases containing useful information for investigators exploring dependencies among items. In data mining, dependencies among items can be expressed as association rules. The new fuzzy-genetic (FG) approach is designed to mine fuzzy association rules from a quantitative transaction database. Using the FG approach has three important advantages: (1) association rules can be extracted from a transaction database with quantitative values; (2) learning suitable membership functions and support thresholds with the genetic algorithm improves the mining results; (3) association rules expressed in fuzzy form are easier for humans to understand. In this paper, we design a comprehensive and fast algorithm that mines level-crossing fuzzy association rules on multiple concept levels, learning support thresholds and membership functions with a cluster-based master–slave integrated FG approach. Mining fuzzy association rules on multiple concept levels helps uncover more important, useful, accurate, and practical information.
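A minimal Python sketch of how fuzzy support is commonly computed for a fuzzy itemset: each quantity is mapped to a linguistic term by a triangular membership function, the minimum is taken across the items in a transaction, and the result is averaged over transactions. The membership-function parameters are hand-picked assumptions; in the FG approach they and the support thresholds would be learned by the genetic algorithm, which is not shown.

```python
def triangular(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b (requires a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzy_support(transactions, terms):
    """terms: list of (item, (a, b, c)). Min t-norm per transaction, averaged over all."""
    total = 0.0
    for t in transactions:
        total += min(triangular(t.get(item, 0), *mf) for item, mf in terms)
    return total / len(transactions)
```

For the transactions `{milk: 5, bread: 3}`, `{milk: 2, bread: 4}` and `{milk: 8}`, with hypothetical "Middle" terms `(0, 5, 10)` for milk and `(0, 4, 8)` for bread, the per-transaction memberships are 0.75, 0.4 and 0, giving a fuzzy support of about 0.383.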

