Similar Documents
1.
Empirical evidence shows that repositories of business process models used in industrial practice contain significant amounts of duplication. This duplication arises for example when the repository covers multiple variants of the same processes or due to copy-pasting. Previous work has addressed the problem of efficiently retrieving exact clones that can be refactored into shared subprocess models. This paper studies the broader problem of approximate clone detection in process models. The paper proposes techniques for detecting clusters of approximate clones based on two well-known clustering algorithms: DBSCAN and Hierarchical Agglomerative Clustering (HAC). The paper also defines a measure of standardizability of an approximate clone cluster, meaning the potential benefit of replacing the approximate clones with a single standardized subprocess. Experiments show that both techniques, in conjunction with the proposed standardizability measure, accurately retrieve clusters of approximate clones that originate from copy-pasting followed by independent modifications to the copied fragments. Additional experiments show that both techniques produce clusters that match those produced by human subjects and that are perceived to be standardizable.
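The density-based clustering step can be sketched with a minimal DBSCAN over a precomputed distance matrix between process fragments. The function below is a generic sketch, not the paper's implementation; the distance values and parameters are illustrative, and real implementations would use something like a graph-edit distance between fragments.

```python
def dbscan(dist, eps, min_pts):
    """Minimal DBSCAN over a precomputed distance matrix dist[i][j]
    (e.g., a graph-edit distance between process model fragments).
    Returns one cluster label per fragment; -1 marks noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1

    def neighbors(i):
        return [j for j in range(n) if dist[i][j] <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1          # noise (may later become a border point)
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(nb)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:     # previously noise: absorb as border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(nb_j)
    return labels
```

Fragments whose pairwise distance stays below `eps` end up in one cluster, which can then be scored for standardizability.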

2.
The increased adoption of business process management approaches, tools, and practices has led organizations to accumulate large collections of business process models. These collections can easily include from a hundred to a thousand models, especially in the context of multinational corporations or as a result of organizational mergers and acquisitions. A concrete problem is thus how to maintain these large repositories in such a way that their complexity does not hamper their practical usefulness as a means to describe and communicate business operations. This paper proposes a technique to automatically infer suitable names for business process models and fragments thereof. This technique is useful for model abstraction scenarios, as for instance when user-specific views of a repository are required, or as part of a refactoring initiative aimed to simplify the repository's complexity. The technique is grounded in an adaptation of the theory of meaning to the realm of business process models. We implemented the technique in a prototype tool and conducted an extensive evaluation using three process model collections from practice and a case study involving process modelers with different experience.

3.

Context

Large organizations often run hundreds or even thousands of different business processes. Managing such large collections of business process models is a challenging task. Software can assist in performing that task, by supporting common management functions such as storage, search and version management of models. It can also provide advanced functions that are specific for managing collections of process models, such as managing the consistency of public and private processes. Software that supports the management of large collections of business process models is called business process model repository software.

Objective

This paper contributes to the development of business process model repositories, by analyzing the state of the art.

Method

To perform the analysis, a literature survey and a comparison of existing (business process model) repository technology were carried out.

Result

The results of the state of the art analysis are twofold. First, a framework for business process model repositories is presented, which consists of a management model and a reference architecture. The management model lists the functionality that can be provided and the reference architecture presents the components that provide that functionality. Second, an analysis is presented of the extent to which existing business process model repositories implement the functionality from the framework.

Conclusion

The results presented in the paper are valuable as a comprehensive overview of business process model repository functionality. In addition, they form a basis for a future research agenda. We conclude that existing repositories focus on traditional functionality rather than exploiting the full potential of information management tools; thus we show that there is a strong basis for further research.

4.
Traditional token-based clone detection methods exploit the sequential nature of code strings and can detect clones in large code repositories quickly. However, compared with methods based on abstract syntax trees (AST) or program dependence graphs (PDG), they lack syntactic and semantic information, which makes it difficult to detect clones whose texts differ substantially. To address this, a token-based clone detection method enriched with semantic information is proposed. First, the abstract syntax tree is analyzed, and AST paths are used to abstract the semantic information of the tokens located at leaf nodes. Then, a low-cost index is built over tokens that play the role of function names and type names, so that candidate clone fragments can be screened quickly and effectively. Finally, the semantically enriched tokens are used to judge the similarity between code blocks. Experiments on the public large-scale dataset BigCloneBench show that the method significantly outperforms mainstream approaches, including NiCad, Deckard, and CCAligner, on Moderately Type-3 and Weakly Type-3/Type-4 clones with low textual similarity, while requiring less detection time on large code repositories.
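The two-stage scheme described above (a low-cost index over function-name and type-name tokens to shortlist candidates, then a token-level similarity check) can be sketched as follows. The identifiers, token lists, and threshold are illustrative assumptions, not taken from the paper.

```python
from collections import Counter, defaultdict

def similarity(tokens_a, tokens_b):
    """Multiset-overlap similarity between two token sequences."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    return sum((ca & cb).values()) / max(len(tokens_a), len(tokens_b))

def find_clone_pairs(fragments, key_tokens, threshold=0.7):
    """fragments: {id: token list}; key_tokens: {id: set of function-name and
    type-name tokens used for the low-cost index}. Only fragments that share
    at least one key token are compared in full."""
    index = defaultdict(set)
    for fid, keys in key_tokens.items():
        for k in keys:
            index[k].add(fid)
    pairs = set()
    for fid, keys in key_tokens.items():
        candidates = {cid for k in keys for cid in index[k] if cid != fid}
        for cid in candidates:
            if fid < cid and similarity(fragments[fid], fragments[cid]) >= threshold:
                pairs.add((fid, cid))
    return pairs
```

The index prunes most comparisons up front, which is where the speed-up on large repositories would come from.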

5.
Context

Mining software repositories has emerged as a research direction over the past decade, achieving substantial success in both research and practice to support various software maintenance tasks. Software repositories include bug repository, communication archives, source control repository, etc. When using these repositories to support software maintenance, inclusion of irrelevant information in each repository can lead to decreased effectiveness or even wrong results.

Objective

This article aims at selecting the relevant information from each of the repositories to improve effectiveness of software maintenance tasks.

Method

For a maintenance task at hand, maintainers need to implement the maintenance request on the current system. In this article, we propose an approach, MSR4SM, to extract the relevant information from each software repository based on the maintenance request and the current system. That is, if the information in a software repository is relevant to either the maintenance request or the current system, this information should be included to perform the current maintenance task. MSR4SM uses the topic model to extract the topics from these software repositories. Then, relevant information in each software repository is extracted based on the topics.

Results

MSR4SM is evaluated for two software maintenance tasks, feature location and change impact analysis, which are based on four subject systems, namely jEdit, ArgoUML, Rhino and KOffice. The empirical results show that the effectiveness of traditional software repositories based maintenance tasks can be greatly improved by MSR4SM.

Conclusions

There is a lot of irrelevant information in software repositories. Before we use them to implement a maintenance task at hand, we need to preprocess them. Then, the effectiveness of the software maintenance tasks can be improved.

6.
Ling Jimin, Zhang Li. Journal of Software (《软件学报》), 2015, 26(3): 460-473
As process models continuously accumulate and evolve, enterprises often own and maintain hundreds or even thousands of business process models. Differences in modeling goals and application scenarios, the tailoring and customization of reference models, and model updates and modifications mean that a process model repository may contain large numbers of similar process model variants. This work focuses on how to effectively manage and identify the commonalities and differences between process variants, that is, how to automatically construct matching relations between process model variants. To support complex correspondences while keeping the matching search efficient and its results valid, a technique is proposed for constructing matching relations between model elements based on the process structure tree, and a process model similarity measure based on tree edit distance is further presented. Experimental evaluation on real-world process model collections shows that the method performs well in terms of recall and precision.
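A tree-edit-distance-based similarity between process structure trees can be sketched in a few lines. The recursion below aligns child sequences with a Levenshtein-style DP and recurses into matched subtrees; this is a simplification of exact tree edit distance (which also allows matches across levels), and the node labels are invented for illustration.

```python
def size(t):
    """Number of nodes in tree t = (label, children)."""
    return 1 + sum(size(c) for c in t[1])

def tree_dist(t1, t2):
    """Heuristic edit distance between ordered trees: rename cost at the
    root plus a Levenshtein-style alignment of the child sequences."""
    (l1, k1), (l2, k2) = t1, t2
    rename = 0 if l1 == l2 else 1
    m, n = len(k1), len(k2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + size(k1[i - 1])      # delete whole subtree
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + size(k2[j - 1])      # insert whole subtree
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + size(k1[i - 1]),
                          d[i][j - 1] + size(k2[j - 1]),
                          d[i - 1][j - 1] + tree_dist(k1[i - 1], k2[j - 1]))
    return rename + d[m][n]

def tree_similarity(t1, t2):
    """Normalize the distance into [0, 1]."""
    return 1 - tree_dist(t1, t2) / max(size(t1), size(t2))
```

Normalizing by the larger tree size keeps the score comparable across model variants of different sizes.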

7.
Business process management technology is becoming increasingly popular, resulting in more and more business process models being created. Hence, there is a need for these business process models to be managed effectively. For effective business process model management, being able to efficiently query large amounts of business process models is essential. For example, it is preferable to find a similar or related model to customize, rather than building a new one from scratch. This would not only save time, but would also be less error-prone and more coherent with the existing models of the enterprise. Querying large amounts of business process models efficiently is also vital during company amalgamation, in which business process models from multiple companies need to be examined and integrated. This paper provides: an overview of the field of querying business process models; a summary of its literature; and a list of challenges (and some potential solutions) that have yet to be addressed. In particular, we aim to compare the differences between querying business process models and general graph querying. We also discuss literature work from graph querying research that can be used when querying business process models.

8.
It is common for large organizations to maintain repositories of business process models in order to document and to continuously improve their operations. Given such a repository, this paper deals with the problem of retrieving those models in the repository that most closely resemble a given process model or fragment thereof. Up to now, there is a notable research gap on comparing different approaches to this problem and on evaluating them in the same setting. Therefore, this paper presents three similarity metrics that can be used to answer queries on process repositories: (i) node matching similarity that compares the labels and attributes attached to process model elements; (ii) structural similarity that compares element labels as well as the topology of process models; and (iii) behavioral similarity that compares element labels as well as causal relations captured in the process model. These metrics are experimentally evaluated in terms of precision and recall. The results show that all three metrics yield comparable results, with structural similarity slightly outperforming the other two metrics. Also, all three metrics outperform text-based search engines when it comes to searching through a repository for similar business process models.
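The first metric, node matching similarity, can be sketched by pairing node labels using a normalized string edit distance. The greedy pairing below is a stand-in for an optimal assignment, and the labels and cutoff are illustrative assumptions rather than the paper's exact formulation.

```python
def lev(a, b):
    """Levenshtein distance between two label strings (single-row DP)."""
    m, n = len(a), len(b)
    d = list(range(n + 1))
    for i in range(1, m + 1):
        prev, d[0] = d[0], i
        for j in range(1, n + 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1,
                                   prev + (a[i - 1] != b[j - 1]))
    return d[n]

def label_sim(a, b):
    return 1 - lev(a, b) / max(len(a), len(b), 1)

def node_matching_similarity(nodes1, nodes2, cutoff=0.5):
    """Greedily match the best-scoring label pairs above the cutoff and
    normalize the total matched score by the larger model's size."""
    pairs = sorted(((label_sim(a, b), a, b) for a in nodes1 for b in nodes2),
                   reverse=True)
    used1, used2, total = set(), set(), 0.0
    for s, a, b in pairs:
        if s < cutoff:
            break
        if a not in used1 and b not in used2:
            used1.add(a)
            used2.add(b)
            total += s
    return total / max(len(nodes1), len(nodes2))
```

A query then ranks the repository's models by this score against the query model.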

9.
The stochastic dynamics of biochemical reaction networks can be modeled using a number of succinct formalisms all of whose semantics are expressed as Continuous Time Markov Chains (CTMC). While some kinetic parameters for such models can be measured experimentally, most are estimated by either fitting to experimental data or by performing ad hoc, and often manual search procedures. We consider an alternative strategy to the problem, and introduce algorithms for automatically synthesizing the set of all kinetic parameters such that the model satisfies a given high-level behavioral specification. Our algorithms, which integrate statistical model checking and abstraction refinement, can also report the infeasibility of the model if no such combination of parameters exists. Behavioral specifications can be given in any finitely monitorable logic for stochastic systems, including the probabilistic and bounded fragments of linear and metric temporal logics. The correctness of our algorithms is established using a novel combination of arguments based on survey sampling and uniform continuity. We prove that the probability of a measurable set of paths is uniformly and jointly continuous with respect to the kinetic parameters. Under a suitable technical condition, we also show that the unbiased statistical estimator for the probability of a measurable set of paths is monotonic in the parameter space. We apply our algorithms to two benchmark models of biochemical signaling, and demonstrate that they can efficiently find parameter regimes satisfying a given high-level behavioral specification. In particular, we show that our algorithms can synthesize up to 6 parameters, simultaneously, which is more than that reported by any other synthesis algorithm for stochastic systems. 
Moreover, when parameter estimation is desired, as opposed to synthesis, we show that our approach can scale to even higher dimensional spaces, by identifying the single parameter combination that maximizes the probability of the behavior being true in an 11-dimensional system.

10.
Model repositories play a central role in the model driven development of complex software-intensive systems by offering means to persist and manipulate models obtained from heterogeneous languages and tools. Complex models can be assembled by interconnecting model fragments by hard links, i.e., regular references, where the target end points to external resources using storage-specific identifiers. This approach, in certain application scenarios, may prove to be a too rigid and error prone way of interlinking models. As a flexible alternative, we propose to combine derived features with advanced incremental model queries as means for soft interlinking of model elements residing in different model resources. These soft links can be calculated on-demand with graceful handling for temporarily unresolved references. In the background, the links are maintained efficiently and flexibly by using incremental model query evaluation. The approach is applicable to modeling environments or even property graphs for representing query results as first-class relations, which also allows the chaining of soft links that is useful for modular applications. The approach is evaluated using the Eclipse Modeling Framework (EMF) and EMF-IncQuery in two complex industrial case studies. The first case study is motivated by a knowledge management project from the financial domain, involving a complex interlinked structure of concept and business process models. The second case study is set in the avionics domain with strict traceability requirements enforced by certification standards (DO-178b). It consists of multiple domain models describing the allocation scenario of software functions to hardware components.

11.
Efficient retrieval of ontology fragments using an interval labeling scheme
Nowadays very large domain ontologies are being developed in life-science areas like Biomedicine, Agronomy, Astronomy, etc. Users and applications can benefit enormously from these ontologies in very different tasks, such as visualization, vocabulary homogenizing and data classification. However, due to their large size, they are often unmanageable for these applications. Instead, it is necessary to provide small and useful fragments of these ontologies so that the same tasks can be performed as if the whole ontology were being used. In this work we present a novel method for efficiently indexing and generating ontology fragments according to the user requirements. Moreover, the generated fragments preserve relevant inferences that can be made with the selected symbols in the original ontology. The method relies on an interval labeling scheme that efficiently manages the transitive relationships present in the ontologies. Additionally, we provide an interval algebra to compute some logical operations over the ontology concepts. We have evaluated the proposed method over several well-known biomedical ontologies. Results show very good performance and scalability, demonstrating the applicability of the proposed method in real scenarios.
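The core trick of an interval labeling scheme can be sketched in a few lines: a depth-first traversal assigns each concept a [start, end) interval such that transitive subclass queries reduce to O(1) interval containment. The toy ontology below is an assumption for illustration.

```python
def label_tree(root, children):
    """Assign [start, end) intervals by DFS so that a is an ancestor of d
    (or a == d) iff interval(a) contains interval(d)."""
    intervals, counter = {}, [0]

    def dfs(node):
        start = counter[0]
        counter[0] += 1
        for c in children.get(node, []):
            dfs(c)
        # end is exclusive: every descendant's start falls in [start, end)
        intervals[node] = (start, counter[0])

    dfs(root)
    return intervals

def is_subclass(intervals, a, b):
    """True iff a is b or a transitive descendant of b: one containment
    test replaces a walk over the transitive closure."""
    sa, ea = intervals[a]
    sb, eb = intervals[b]
    return sb <= sa and ea <= eb
```

Extracting a fragment then only needs the intervals of the selected symbols, not the whole hierarchy.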

12.
In using pooling designs to identify clones containing a specific subsequence called positive clones, sometimes there exist nonpositive clones which can cancel the effect of positive clones. Various models have been studied which differ in the power of cancellation. Although the various models pose interesting mathematical problems, and ingenious constructions of pooling designs have been proposed, in practice we rarely are sure about the true model and thus about which pooling design to use. In this paper we give a pooling design which fits all inhibitor models, and does not use more tests than in the more specific models. In particular, we obtain a 1-round pooling design for the k-inhibitor model for which only sequential designs are currently known.

13.
14.
Recent developments in the analysis of large Markov models facilitate the fast approximation of transient characteristics of the underlying stochastic process. Fluid analysis makes it possible to consider previously intractable models whose underlying discrete state space grows exponentially as model components are added. In this work, we show how fluid-approximation techniques may be used to extract passage-time measures from performance models. We focus on two types of passage measure: passage times involving individual components, as well as passage times which capture the time taken for a population of components to evolve. Specifically, we show that for models of sufficient scale, global passage-time distributions can be well approximated by a deterministic fluid-derived passage-time measure. Where models are not of sufficient scale, we are able to generate upper and lower approximations for the entire cumulative distribution function of these passage-time random variables, using moment-based techniques. Additionally, we show that, for passage-time measures involving individual components, the cumulative distribution function can be directly approximated by fluid techniques. Finally, using the GPA tool, we take advantage of the rapid fluid computation of passage times to show how a multi-class client-server system can be optimised to satisfy multiple service level agreements.

15.
A string similarity join finds similar pairs between two collections of strings. Many applications, e.g., data integration and cleaning, can significantly benefit from an efficient string-similarity-join algorithm. In this paper, we study string similarity joins with edit-distance constraints. Existing methods usually employ a filter-and-refine framework and suffer from the following limitations: (1) They are inefficient for the data sets with short strings (the average string length is not larger than 30); (2) They involve large indexes; (3) They are expensive to support dynamic update of data sets. To address these problems, we propose a novel method called trie-join, which can generate results efficiently with small indexes. We use a trie structure to index the strings and utilize the trie structure to efficiently find similar string pairs based on subtrie pruning. We devise efficient trie-join algorithms and pruning techniques to achieve high performance. Our method can be easily extended to support dynamic update of data sets efficiently. We conducted extensive experiments on four real data sets. Experimental results show that our algorithms outperform state-of-the-art methods by an order of magnitude on the data sets with short strings.
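The subtrie-pruning idea can be illustrated with a classic trie walk that carries a Levenshtein DP row down each path and abandons a subtrie once every cell of the row exceeds the edit-distance threshold. This is a sketch of the general technique, not the paper's trie-join algorithm.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.word = None  # set to the full string at word-ending nodes

def build_trie(words):
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root

def search(root, query, k):
    """All indexed strings within edit distance k of query. A subtrie is
    pruned as soon as every cell in its DP row exceeds k, since the
    distance can only grow along that path."""
    results = []
    first_row = list(range(len(query) + 1))

    def walk(node, ch, prev):
        row = [prev[0] + 1]
        for j in range(1, len(query) + 1):
            row.append(min(row[j - 1] + 1,                     # insertion
                           prev[j] + 1,                        # deletion
                           prev[j - 1] + (query[j - 1] != ch)))  # substitution
        if node.word is not None and row[-1] <= k:
            results.append(node.word)
        if min(row) <= k:  # subtrie pruning
            for c, child in node.children.items():
                walk(child, c, row)

    for c, child in root.children.items():
        walk(child, c, first_row)
    return results
```

The same row-sharing effect is what makes joining two tries cheaper than comparing all string pairs.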

16.
In this paper we propose a novel framework for contour based object detection from cluttered environments. Given a contour model for a class of objects, it is first decomposed into fragments hierarchically. Then, we group these fragments into part bundles, where a part bundle can contain overlapping fragments. Given a new image with a set of edge fragments, we develop an efficient voting method using local shape similarity between part bundles and edge fragments that generates high quality candidate part configurations. We then use global shape similarity between the part configurations and the model contour to find the optimal configuration. Furthermore, we show that appearance information can be used for improving detection for objects with distinctive texture when the model contour does not sufficiently capture deformation of the objects.

17.
There seems to be a never ending stream of new process modeling notations. Some of these notations are foundational and have been around for decades (e.g., Petri nets). Other notations are vendor specific, incremental, or are only popular for a short while. Discussions on the various competing notations concealed the more important question “What makes a good process model?”. Fortunately, large scale experiences with process mining allow us to address this question. Process mining techniques can be used to extract knowledge from event data, discover models, align logs and models, measure conformance, diagnose bottlenecks, and predict future events. Today’s processes leave many trails in data bases, audit trails, message logs, transaction logs, etc. Therefore, it makes sense to relate these event data to process models independent of their particular notation. Process models discovered based on the actual behavior tend to be very different from the process models made by humans. Moreover, conformance checking techniques often reveal important deviations between models and reality. The lessons that can be learned from process mining shed a new light on process model quality. This paper discusses the role of process models and lists seven problems related to process modeling. Based on our experiences in over 100 process mining projects, we discuss these problems. Moreover, we show that these problems can be addressed by exposing process models and modelers to event data.

18.
Efficient queries over Web views
Large Web sites are becoming repositories of structured information that can benefit from being viewed and queried as relational databases. However, querying these views efficiently requires new techniques. Data usually resides at a remote site and is organized as a set of related HTML documents, with network access being a primary cost factor in query evaluation. This cost can be reduced by exploiting the redundancy often found in site design. We use a simple data model, a subset of the Araneus data model, to describe the structure of a Web site. We augment the model with link and inclusion constraints that capture the redundancies in the site. We map relational views of a site to a navigational algebra and show how to use the constraints to rewrite algebraic expressions, reducing the number of network accesses. We show that similar techniques can be used to maintain materialized views over sets of HTML pages.

19.
Mathematical modeling of cerebral tumor growth is of great importance in clinics. It can help in understanding the physiology of tumor growth, future prognosis of tumor shape and volume, quantifying tumor aggressiveness, and the response to therapy. This can be achieved at the macroscopic level using medical imaging techniques (particularly, magnetic resonance imaging (MRI) and diffusion tensor imaging (DTI)) and complex mathematical models which are either diffusive or biomechanical. We propose an optimized generic modeling framework that couples tumor diffusivity and infiltration with the induced mass effect. Tumor cell diffusivity and infiltration are captured using a modified reaction-diffusion model with a logistic proliferation term. On the other hand, tumor mass effect is modeled using a continuum mechanics formulation. In addition, we consider the treatment effects of both radiotherapy and chemotherapy. The efficacy of chemotherapy is included via an adaptively modified log-kill method to consider tissue heterogeneity, while the efficacy of radiotherapy is considered using the linear quadratic model. Moreover, our model efficiently utilizes the diffusion tensor of the diffusion tensor imaging. Furthermore, we optimize the tumor growth parameters to be patient-specific using a bio-inspired particle swarm optimization (PSO) algorithm. Our model is tested on an atlas and real MRI scans of 8 low-grade glioma subjects. Experimental results show that our model efficiently incorporates both treatment effects in the growth modeling process. In addition, simulated growths of our model have high accuracy in terms of Dice coefficient (average 87.1%) and Jaccard index (77.14%) when compared with the follow up scans (ground truth) and other models as well.

20.
Using background knowledge to rank itemsets
Assessing the quality of discovered results is an important open problem in data mining. Such assessment is particularly vital when mining itemsets, since commonly many of the discovered patterns can be easily explained by background knowledge. The simplest approach to screen uninteresting patterns is to compare the observed frequency against the independence model. Since the parameters for the independence model are the column margins, we can view such screening as a way of using the column margins as background knowledge. In this paper we study techniques for more flexible approaches for infusing background knowledge. Namely, we show that we can efficiently use additional knowledge such as row margins, lazarus counts, and bounds of ones. We demonstrate that these statistics describe forms of data that occur in practice and have been studied in data mining. To infuse the information efficiently we use a maximum entropy approach. In its general setting, solving a maximum entropy model is infeasible, but we demonstrate that for our setting it can be solved in polynomial time. Experiments show that more sophisticated models fit the data better and that using more information improves the frequency prediction of itemsets.
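The independence-model baseline mentioned above is easy to make concrete: the expected support of an itemset is the product of its items' marginal probabilities, and the ratio of observed to expected support (lift) screens out patterns that the column margins alone already explain. The margin values below are illustrative.

```python
def expected_support(margins, itemset, n):
    """Expected support of `itemset` in a database of n rows under the
    independence model, whose only parameters are the column margins:
    E[supp] = n * prod(margins[i] / n)."""
    p = 1.0
    for item in itemset:
        p *= margins[item] / n
    return p * n

def lift(observed, margins, itemset, n):
    """Observed over expected support; values near 1 mean the pattern is
    already explained by the column margins alone."""
    return observed / expected_support(margins, itemset, n)
```

The maximum entropy approach in the paper generalizes exactly this comparison by adding further constraints (row margins, lazarus counts, bounds of ones) to the background model.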

