Detecting approximate clones in business process model repositories |
| |
Affiliation: | 1. School of Computer Science and Technology, Shandong University, Jinan, China; 2. School of Engineering, Brown University, Providence, United States; 3. Engineering Research Center of Digital Media Technology, Ministry of Education of PRC, Jinan, China; 1. Laboratoire de Conception et Application de Molécules Bioactives, CNRS - Université de Strasbourg UMR 7199, Faculté de Pharmacie, 74, Route du Rhin, F-67400 Illkirch, France; 2. Ecole Supérieure de Biotechnologie de Strasbourg, CNRS - Université de Strasbourg UMR, Bld Sébastien Brant, F-67412 Illkirch, France; 3. Laboratoire de Biophotonique et Pharmacologie, CNRS - Université de Strasbourg UMR 7213, Faculté de Pharmacie, 74, Route du Rhin, F-67400 Illkirch, France; 4. Université de Lorraine, CNRS, CRAN, UMR 7039, Campus Sciences, 54500 Vandoeuvre les Nancy, France |
| |
Abstract: | Empirical evidence shows that repositories of business process models used in industrial practice contain significant amounts of duplication. This duplication arises, for example, when the repository covers multiple variants of the same process or when fragments are copy-pasted. Previous work has addressed the problem of efficiently retrieving exact clones that can be refactored into shared subprocess models. This paper studies the broader problem of approximate clone detection in process models. It proposes techniques for detecting clusters of approximate clones based on two well-known clustering algorithms: DBSCAN and Hierarchical Agglomerative Clustering (HAC). It also defines a measure of the standardizability of an approximate clone cluster, that is, the potential benefit of replacing the approximate clones with a single standardized subprocess. Experiments show that both techniques, in conjunction with the proposed standardizability measure, accurately retrieve clusters of approximate clones that originate from copy-pasting followed by independent modifications to the copied fragments. Additional experiments show that both techniques produce clusters that match those produced by human subjects and that are perceived to be standardizable. |
| |
Keywords: | Business process model; Clone detection; Model collection; Repository; Standardization |
This article is indexed in ScienceDirect and other databases. |
|