Found 20 similar documents; search took 0 ms
1.
Discovering expressive process models by clustering log traces  Total citations: 5 (self-citations: 0, other citations: 5)
Greco G. Guzzo A. Pontieri L. Sacca D. 《IEEE Transactions on Knowledge and Data Engineering》2006,18(8):1010-1027
Process mining techniques have recently received notable attention in the literature for their ability to assist in the (re)design of complex processes by automatically discovering models that explain the events registered in some log traces provided as input. Following this line of research, the paper investigates an extension of such basic approaches, where the identification of different variants for the process is explicitly accounted for, based on the clustering of log traces. Indeed, modeling each group of similar executions with a different schema allows us to single out "conformant" models, which, specifically, minimize the number of modeled enactments that are extraneous to the process semantics. Therefore, a novel process mining framework is introduced and some relevant computational issues are studied in depth. As finding an exact solution to such an enhanced process mining problem is proven to require high computational costs in most practical cases, a greedy approach is devised. This is founded on an iterative, hierarchical refinement of the process model, where, at each step, traces sharing similar behavior patterns are clustered together and equipped with a specialized schema. The algorithm guarantees that each refinement leads to an increasingly sound model, thus attaining a monotonic search. Experimental results evidence the validity of the approach with respect to both effectiveness and scalability.
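The clustering step described above can be illustrated with a minimal sketch: represent each trace by its set of direct-follows pairs and greedily group traces whose behaviour overlaps. All names and the Jaccard threshold are illustrative assumptions; this is not the paper's hierarchical mining algorithm.

```python
# Hypothetical sketch of trace clustering by shared behaviour patterns.

def follows_pairs(trace):
    """Set of (a, b) pairs where activity b directly follows a."""
    return {(a, b) for a, b in zip(trace, trace[1:])}

def jaccard(s, t):
    """Jaccard similarity of two sets (1.0 for two empty sets)."""
    return len(s & t) / len(s | t) if s | t else 1.0

def cluster_traces(traces, threshold=0.5):
    """Greedily assign each trace to the first cluster whose
    representative shares enough direct-follows behaviour."""
    clusters = []  # list of (representative_pairs, member_traces)
    for trace in traces:
        pairs = follows_pairs(trace)
        for rep, members in clusters:
            if jaccard(pairs, rep) >= threshold:
                members.append(trace)
                break
        else:
            clusters.append((pairs, [trace]))
    return [members for _, members in clusters]
```

Each resulting cluster would then be mined separately to obtain its own specialized schema.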
2.
3.
Of the many problems facing the casual user of a data-base enquiry system probably the most difficult is gaining a competent understanding of the associated query language. Given that he manages to construct a well-formed query expression there is no guarantee that it exactly reflects the original question. The research described here concerns the design of an interpreter from a formal query language to natural language to aid query verification in a relational data-base environment. The system is being developed to work in conjunction with the ICL Natural Language enquiry interface NEL which translates English query expressions into the formal query language QUERYMASTER. The requirements of a natural-language paraphraser are first discussed and the nature of an intermediate representation is defined and motivated with respect to an applied relational calculus. Consideration is then given to choosing a suitable underlying framework with which to underpin the practical work and the choice of Lexical Functional Grammar as the guiding theory is explained. Finally, the research is set in the context of a longer-term programme to construct a multi-purpose user interface incorporating facilities for handling data-base metaknowledge and query building.
4.
This paper proposes a method for semi-automatically learning OWL ontologies from relational databases. Building on formal representations of relational database schemas and OWL ontologies, the method follows a set of general mapping methods and rules from relational database schemas to OWL ontologies, and a prototype tool, OntoLeamer, has been implemented on the Java 2 platform. Typical case studies conducted with OntoLeamer demonstrate the effectiveness of the method.
5.
Dongyi Wang Jidong Ge Hao Hu Bin Luo Liguo Huang 《Expert systems with applications》2012,39(15):11970-11978
The aim of process mining is to discover the process model from the event log recorded by an information system. Typical steps of a process mining algorithm can be described as: (1) generating event traces from the event log, (2) analyzing event traces and obtaining ordering relations of tasks, (3) generating a process model from the ordering relations of tasks. The first two steps can be very time consuming when millions of events and thousands of event traces are involved. This paper presents a novel algorithm (λ-algorithm) which almost eliminates these two steps (generating event traces from the event log and analyzing event traces) so as to reduce the running time of process mining. Firstly, we retrieve the event multiset (the input data of the algorithm, marked as MS), which records the frequency of each event but ignores their order when extracted from the event log. Each event in the event multiset carries information about its post-activities. Secondly, we obtain ordering relations from the event multiset. The ordering relations comprise causal dependency, potential parallelism and non-potential parallelism. Finally, we discover a process model from the ordering relations. The complexity of the λ-algorithm depends only on the number of event classes (the set of distinct events in the event log), which significantly improves on the performance of existing process mining algorithms and is expected to be more practical for real-world process mining based on event logs; the algorithm is also able to detect SWF-nets, short loops and most implicit dependencies (generated by non-free-choice constructs).
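The first two ideas above (an event multiset annotated with post-activities, and ordering relations derived from it) can be sketched as follows. The relation rules here are a deliberate simplification for illustration, not the λ-algorithm itself; all function names are assumptions.

```python
from collections import Counter

def event_multiset(traces):
    """Frequency of each (activity, post-activity) observation,
    ignoring the global order of events within traces."""
    ms = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            ms[(a, b)] += 1  # activity a observed with post-activity b
    return ms

def ordering_relations(ms):
    """Derive simplified ordering relations from the multiset:
    causal dependency if a precedes b but never the reverse,
    potential parallelism if both directions occur."""
    causal, parallel = set(), set()
    for (a, b) in ms:
        if (b, a) in ms:
            parallel.add(frozenset((a, b)))
        else:
            causal.add((a, b))
    return causal, parallel
```

A process model would then be assembled from these relations in the final step.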
6.
Discovering frequent episodes and learning hidden Markov models: a formal connection  Total citations: 7 (self-citations: 0, other citations: 7)
Srivatsan Laxman Sastry P.S. Unnikrishnan K.P. 《IEEE Transactions on Knowledge and Data Engineering》2005,17(11):1505-1517
This paper establishes a formal connection between two common, but previously unconnected methods for analyzing data streams: discovering frequent episodes in a computer science framework and learning generative models in a statistics framework. We introduce a special class of discrete hidden Markov models (HMMs), called episode generating HMMs (EGHs), and associate each episode with a unique EGH. We prove that, given any two episodes, the EGH that is more likely to generate a given data sequence is the one associated with the more frequent episode. To be able to establish such a relationship, we define a new measure of frequency of an episode, based on what we call nonoverlapping occurrences of the episode in the data. An efficient algorithm is proposed for counting the frequencies for a set of episodes. Through extensive simulations, we show that our algorithm is both effective and more efficient than current methods for frequent episode discovery. We also show how the association between frequent episodes and EGHs can be exploited to assess the significance of discovered frequent episodes and illustrate empirically how this idea may be used to improve the efficiency of frequent episode discovery.
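The non-overlapped frequency measure mentioned above can be illustrated for a single serial episode with a greedy left-to-right scan; counting for a whole set of episodes at once, as in the paper, is more involved. The function name is an assumption.

```python
def nonoverlapping_count(sequence, episode):
    """Greedy count of non-overlapping occurrences of a serial
    episode (an ordered pattern of symbols) in an event sequence."""
    count, i = 0, 0  # i = next episode position awaited
    for symbol in sequence:
        if symbol == episode[i]:
            i += 1
            if i == len(episode):  # full occurrence completed
                count += 1
                i = 0  # restart, so occurrences never overlap
    return count
```

For example, the episode "ab" occurs three times without overlap in "abacbab".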
7.
Michael Jamieson Yulia Eskin Afsaneh Fazly Suzanne Stevenson Sven J. Dickinson 《Computer Vision and Image Understanding》2012,116(7):842-853
We address the problem of automatically learning the recurring associations between the visual structures in images and the words in their associated captions, yielding a set of named object models that can be used for subsequent image annotation. In previous work, we used language to drive the perceptual grouping of local features into configurations that capture small parts (patches) of an object. However, model scope was poor, leading to poor object localization during detection (annotation), and ambiguity was high when part detections were weak. We extend and significantly revise our previous framework by using language to drive the perceptual grouping of parts, each a configuration in the previous framework, into hierarchical configurations that offer greater spatial extent and flexibility. The resulting hierarchical multipart models remain scale, translation and rotation invariant, but are more reliable detectors and provide better localization. Moreover, unlike typical frameworks for learning object models, our approach requires no bounding boxes around the objects to be learned, can handle heavily cluttered training scenes, and is robust in the face of noisy captions, i.e., where objects in an image may not be named in the caption, and objects named in the caption may not appear in the image. We demonstrate improved precision and recall in annotation over the non-hierarchical technique and also show extended spatial coverage of detected objects.
8.
Henning Fernau 《Information and Computation》2009,207(4):521-541
We describe algorithms that directly infer very simple forms of 1-unambiguous regular expressions from positive data. Thus, we characterize the regular language classes that can be learned this way, both in terms of regular expressions and in terms of (not necessarily minimal) deterministic finite automata.
9.
Minoru Ito Motoaki Iwasaki Kenichi Taniguchi Tadao Kasami 《Theoretical computer science》1984,34(3):315-335
In relational databases, a query can be formulated in terms of a relational algebra expression using projection, selection, restriction, cross product and union. In this paper, we consider a problem, called the membership problem, of determining whether a given dependency d is valid in a given relational expression E over a given database scheme R; that is, whether every instance of the view scheme defined by E satisfies d (assuming that the underlying constraints in R are always satisfied).

Consider the case where each relation scheme in R is associated with functional dependencies (FDs) as constraints, and d is an FD. Then the complement of the membership problem is NP-complete. However, if E contains no union, then the membership problem can be solved in polynomial time. Furthermore, if E contains neither a union nor a projection, then we can construct in polynomial time a cover for the valid FDs in E, that is, a set of FDs which implies every valid FD in E.

Consider the case where each relation scheme in R is associated with multivalued dependencies (MVDs) as well as FDs, and d is an FD or an MVD. Even if E consists of selections and cross products only, the membership problem is NP-hard. However, if E contains no union, and each relation scheme name in R occurs in E at most once, then the membership problem can be solved in polynomial time. As a corollary of this result, it can be determined in polynomial time whether a given FD or MVD is valid in Ri ⋈ Rj, where R1,…,Rs are relation schemes with FDs and MVDs, and Ri ⋈ Rj denotes the natural join of Ri and Rj.
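The simplest building block behind FD membership tests is the standard attribute-closure computation on a single scheme, sketched below. This is textbook machinery, not the paper's algorithm for view expressions, and the function names are assumptions.

```python
def attribute_closure(attrs, fds):
    """Closure X+ of an attribute set X under FDs given
    as (lhs_set, rhs_set) pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # fire the FD if its left side is covered but it
            # still adds new attributes
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

def fd_implied(fds, fd):
    """Does the FD set imply lhs -> rhs?  True iff rhs ⊆ lhs+."""
    lhs, rhs = fd
    return rhs <= attribute_closure(lhs, fds)
```

For instance, {A → B, B → C} implies A → C but not C → A.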
10.
The built-up in Information Technology capital fueled by the Internet and cost-effectiveness of new telecommunications technologies has led to a proliferation of information systems that are in dire need to exchange information but incapable of doing so due to the lack of semantic interoperability. It is now evident that physical connectivity (the ability to exchange bits and bytes) is no longer adequate: the integration of data from autonomous and heterogeneous systems calls for the prior identification and resolution of semantic conflicts that may be present. Unfortunately, this requires the system integrator to sift through the data from disparate systems in a painstaking manner. We suggest that this process can be partially automated by presenting a methodology and technique for the discovery of potential semantic conflicts as well as the underlying data transformation needed to resolve the conflicts. Our methodology begins by classifying data value conflicts into two categories: context independent and context dependent. While context independent conflicts are usually caused by unexpected errors, the context dependent conflicts are primarily a result of the heterogeneity of underlying data sources. To facilitate data integration, data value conversion rules are proposed to describe the quantitative relationships among data values involving context dependent conflicts. A general approach is proposed to discover data value conversion rules from the data. The approach consists of five major steps: relevant attribute analysis, candidate model selection, conversion function generation, conversion function selection and conversion rule formation. It is being implemented in a prototype system, DIRECT, for business data using statistics-based techniques. A preliminary study using both synthetic and real-world data indicated that the proposed approach is promising.
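The "conversion function generation" step can be illustrated with the simplest statistical model: fitting a linear conversion y ≈ a·x + b between matching values from two sources by ordinary least squares. This is a hypothetical illustration of one candidate model, not the DIRECT system's actual procedure.

```python
def fit_linear_conversion(xs, ys):
    """Least-squares fit of ys ≈ slope * xs + intercept, e.g. a
    unit or currency conversion between two data sources."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept
```

A discovered rule would then state, for values in conflict, how to rewrite one source's value into the other's context using the fitted function.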
11.
Recent meta-learning approaches are oriented towards algorithm selection, optimization or recommendation of existing algorithms. In this article we show how data-tailored algorithms can be constructed from building blocks on small data sub-samples. Building blocks, typically weak learners, are optimized and evolved into data-tailored hierarchical ensembles. Good-performing algorithms discovered by the evolutionary algorithm can be reused on data sets of comparable complexity. Furthermore, these algorithms can be scaled up to model large data sets. We demonstrate how one particular template (a simple ensemble of fast sigmoidal regression models) outperforms state-of-the-art approaches on the Airline data set. Evolved hierarchical ensembles can therefore be beneficial as algorithmic building blocks in meta-learning, including meta-learning at scale.
12.
Sriram Srinivasan Charles Dickens Eriq Augustine Golnoosh Farnadi Lise Getoor 《Machine Learning》2022,111(8):2799-2838
Statistical relational learning (SRL) frameworks are effective at defining probabilistic models over complex relational data. They often use weighted first-order logical rules...
13.
Sriraam Natarajan Tushar Khot Kristian Kersting Bernd Gutmann Jude Shavlik 《Machine Learning》2012,86(1):25-56
Dependency networks approximate a joint probability distribution over multiple random variables as a product of conditional distributions. Relational Dependency Networks (RDNs) are graphical models that extend dependency networks to relational domains. This higher expressivity, however, comes at the expense of a more complex model-selection problem: an unbounded number of relational abstraction levels might need to be explored. Whereas current learning approaches for RDNs learn a single probability tree per random variable, we propose to turn the problem into a series of relational function-approximation problems using gradient-based boosting. In doing so, one can easily induce highly complex features over several iterations and in turn estimate quickly a very expressive model. Our experimental results in several different data sets show that this boosting method results in efficient learning of RDNs when compared to state-of-the-art statistical relational learning approaches.
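The boosting idea above, repeatedly fitting a weak regressor to the residuals of the current model, can be shown in its generic numeric form with regression stumps. The paper boosts relational regression trees; this propositional sketch only demonstrates the functional-gradient loop, and all names are assumptions.

```python
def fit_stump(xs, ys):
    """Best single-threshold regression stump (squared error)."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((y - lm) ** 2 for y in left)
               + sum((y - rm) ** 2 for y in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x, t=t, lm=lm, rm=rm: lm if x <= t else rm

def boost(xs, ys, rounds=10, lr=0.5):
    """Gradient boosting for squared loss: each round fits a stump
    to the current residuals and adds it with a learning rate."""
    preds = [0.0] * len(xs)
    models = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        models.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: sum(lr * m(x) for m in models)
```

On a simple step function the boosted sum converges toward the targets over the rounds.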
14.
RRL is a relational reinforcement learning system based on Q-learning in relational state-action spaces. It aims to enable agents to learn how to act in an environment that has no natural representation as a tuple of constants. For relational reinforcement learning, the learning algorithm used to approximate the mapping between state-action pairs and their so-called Q(uality)-value has to be very reliable, and it has to be able to handle the relational representation of state-action pairs. In this paper we investigate the use of Gaussian processes to approximate the Q-values of state-action pairs. In order to employ Gaussian processes in a relational setting we propose graph kernels as a covariance function between state-action pairs. The standard prediction mechanism for Gaussian processes requires a matrix inversion which can become unstable when the kernel matrix has low rank. These instabilities can be avoided by employing QR-factorization. This leads to better and more stable performance of the algorithm and a more efficient incremental update mechanism. Experiments conducted in the blocks world and with the Tetris game show that Gaussian processes with graph kernels can compete with, and often improve on, regression trees and instance-based regression as a generalization algorithm for RRL.
Editors: David Page and Akihiro Yamamoto
15.
Organization of relational models for scene analysis  Total citations: 1 (self-citations: 0, other citations: 0)
Shapiro L.G. Haralick R.M. 《IEEE Transactions on Pattern Analysis and Machine Intelligence》1982,(6):595-602
Relational models are commonly used in scene analysis systems. Most such systems are experimental and deal with only a small number of models. Unknown objects to be analyzed are usually sequentially compared to each model. In this paper, we present some ideas for organizing a large database of relational models. We define a simple relational distance measure, prove it is a metric, and using this measure, describe two organizational/access methods: clustering and binary search trees. We illustrate these methods with a set of randomly generated graphs.
16.
Identifier attributes—very high-dimensional categorical attributes such as particular product ids or people's names—rarely are incorporated in statistical modeling. However, they can play an important role in relational modeling: it may be informative to have communicated with a particular set of people or to have purchased a particular set of products. A key limitation of existing relational modeling techniques is how they aggregate bags (multisets) of values from related entities. The aggregations used by existing methods are simple summaries of the distributions of features of related entities: e.g., MEAN, MODE, SUM, or COUNT. This paper's main contribution is the introduction of aggregation operators that capture more information about the value distributions, by storing meta-data about value distributions and referencing this meta-data when aggregating, for example by computing class-conditional distributional distances. Such aggregations are particularly important for aggregating values from high-dimensional categorical attributes, for which the simple aggregates provide little information. In the first half of the paper we provide general guidelines for designing aggregation operators, introduce the new aggregators in the context of the relational learning system ACORA (Automated Construction of Relational Attributes), and provide theoretical justification. We also conjecture special properties of identifier attributes, e.g., that they proxy for unobserved attributes and for information deeper in the relationship network. In the second half of the paper we provide extensive empirical evidence that the distribution-based aggregators indeed facilitate modeling with high-dimensional categorical attributes, and in support of the aforementioned conjectures.
Editors: Hendrik Blockeel, David Jensen and Stefan Kramer
An erratum to this article is available at .
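The contrast above between simple aggregates (MODE, COUNT) and distribution-based ones can be sketched as follows: summarize a bag of categorical values as a distribution and measure its distance to a reference (e.g. class-conditional) distribution. The L1 distance and all names are illustrative assumptions, not ACORA's exact operators.

```python
from collections import Counter

def bag_distribution(bag):
    """Empirical value distribution of a bag (multiset) of
    categorical values from related entities."""
    total = len(bag)
    return {v: n / total for v, n in Counter(bag).items()}

def l1_distance(p, q):
    """L1 distance between two value distributions; could serve as
    a class-conditional distributional aggregate."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

Unlike MODE or COUNT, the distance retains information about the whole shape of the bag's distribution relative to the reference.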
17.
Jure Žabkar Martin Možina Ivan Bratko Janez Demšar 《Artificial Intelligence》2011,175(9-10):1604-1619
Qualitative models describe relations between the observed quantities in qualitative terms. In predictive modelling, a qualitative model tells whether the output increases or decreases with the input. We describe Padé, a new method for qualitative learning which estimates partial derivatives of the target function from training data and uses them to induce qualitative models of the target function. We formulated three methods for computation of derivatives, all based on using linear regression on local neighbourhoods. The methods were empirically tested on artificial and real-world data. We also provide a case study which shows how the developed methods can be used in practice.
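The core idea, estimating a derivative by linear regression on a local neighbourhood, can be sketched in one dimension: fit a line through the k training points nearest to a query point and read off its slope. This is a simplified illustration of the approach, not Padé's actual estimators; the function name is an assumption.

```python
def local_slope(xs, ys, x0, k=5):
    """Estimate dy/dx at x0 by least-squares linear regression
    over the k nearest training points."""
    idx = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    px = [xs[i] for i in idx]
    py = [ys[i] for i in idx]
    mx, my = sum(px) / len(px), sum(py) / len(py)
    sxx = sum((x - mx) ** 2 for x in px)
    sxy = sum((x - mx) * (y - my) for x, y in zip(px, py))
    return sxy / sxx
```

The sign of the estimated slope then yields the qualitative statement (output increases or decreases with the input) at that point.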
18.
Association Link Network (ALN) is a kind of Semantic Link Network built by mining the association relations among multimedia Web resources, intended to support intelligent Web applications such as Web-based learning and semantic search. This paper explores the Small-World properties of ALN to provide theoretical support for association learning (i.e., learning from Web resources). First, a filtering algorithm for ALN is proposed to generate the filtered status of ALN, aiming to observe the Small-World properties of ALN at a given network size and filtering parameter. A comparison of the Small-World properties of ALN and a random graph shows that ALN exhibits a prominent Small-World characteristic. Then, we investigate the evolution of the Small-World properties over time at several incremental network sizes. The average path length of ALN scales with the network size, while the clustering coefficient of ALN is independent of the network size. We also find that ALN has a smaller average path length and a higher clustering coefficient than the WWW at the same network size and average degree. Finally, based on the Small-World characteristic of ALN, we present an Association Learning Model (ALM), which can efficiently support association learning of Web resources in breadth or depth for learners.
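One of the two Small-World measures discussed above, the clustering coefficient, is easy to compute directly; a sketch for undirected graphs follows (standard graph-theory definition, independent of the paper's ALN data).

```python
from itertools import combinations

def clustering_coefficient(adj):
    """Average local clustering coefficient of an undirected graph
    given as {node: set_of_neighbours}."""
    total = 0.0
    for node, nbrs in adj.items():
        if len(nbrs) < 2:
            continue  # locally undefined; contributes 0
        # count edges among the node's neighbours
        links = sum(1 for u, v in combinations(sorted(nbrs), 2)
                    if v in adj[u])
        total += 2.0 * links / (len(nbrs) * (len(nbrs) - 1))
    return total / len(adj)
```

A high clustering coefficient together with a short average path length, relative to a random graph of the same size and degree, is the usual Small-World signature.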
19.
Kate Smith-Miles Jano van Hemert 《Annals of Mathematics and Artificial Intelligence》2011,61(2):87-104
The suitability of an optimisation algorithm selected from within an algorithm portfolio depends upon the features of the particular instance to be solved. Understanding the relative strengths and weaknesses of different algorithms in the portfolio is crucial for effective performance prediction, automated algorithm selection, and to generate knowledge about the ideal conditions for each algorithm to influence better algorithm design. Relying on well-studied benchmark instances, or randomly generated instances, limits our ability to truly challenge each of the algorithms in a portfolio and determine these ideal conditions. Instead we use an evolutionary algorithm to evolve instances that are uniquely easy or hard for each algorithm, thus providing a more direct method for studying the relative strengths and weaknesses of each algorithm. The proposed methodology ensures that the meta-data is sufficient to be able to learn the features of the instances that uniquely characterise the ideal conditions for each algorithm. A case study is presented based on a comprehensive study of the performance of two heuristics on the Travelling Salesman Problem. The results show that prediction of search effort as well as the best performing algorithm for a given instance can be achieved with high accuracy.
20.
Marco A. Casanova 《Information Processing Letters》1983,16(3):153-160
A formal system for reasoning about functional dependencies (FDs) and subset dependencies (SDs) defined over relational expressions is described. An FD e:X → Y indicates that Y is functionally dependent on X in the relation denoted by expression e; an SD e ⊆ f indicates that the relation denoted by e is a subset of that denoted by f. The system is shown to be sound and complete by resorting to the analytic tableaux method. Applications of the system include the problem of determining if a constraint of a subschema is implied by the constraints of the base schema and the development of database design methodologies similar to normalization.