Similar Documents
20 similar documents found (search time: 140 ms)
1.
We describe an approach to machine learning from numerical data that combines qualitative and numerical learning. This approach is carried out in two stages: (1) induction of a qualitative model from numerical examples of the behaviour of a physical system, and (2) induction of a numerical regression function that both respects the qualitative constraints and fits the training data numerically. We call this approach Q2 learning, which stands for Qualitatively faithful Quantitative learning. Induced numerical models are “qualitatively faithful” in the sense that they respect qualitative trends in the learning data. The advantages of Q2 learning are that the induced qualitative model enables a (possibly causal) explanation of the relations among the variables in the modelled system, and that numerical predictions are guaranteed to be qualitatively consistent with the qualitative model, which eases the interpretation of the predictions. Moreover, as we show experimentally, the qualitative model's guidance of the quantitative modelling process leads to predictions that may be considerably more accurate than those obtained by state-of-the-art numerical learning methods. The experiments include an application of Q2 learning to the identification of a car wheel suspension system—a complex, industrially relevant mechanical system.
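The core idea of qualitative faithfulness can be illustrated with a monotone fit. Below is a minimal sketch, assuming the induced qualitative constraint is simply "y increases monotonically with x"; it uses scikit-learn's IsotonicRegression as a stand-in for the paper's Q2 machinery, not the authors' implementation.

```python
# Minimal sketch of a "qualitatively faithful" fit: the qualitative
# constraint "y increases with x" is enforced on the numerical model.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 100))
y = np.log1p(x) + rng.normal(0, 0.1, 100)   # noisy but monotone trend

# Qualitative model (assumed, induced from data): y = M+(x).
# Fit the numbers subject to that qualitative trend.
model = IsotonicRegression(increasing=True, out_of_bounds="clip")
y_fit = model.fit_transform(x, y)

# Predictions are guaranteed consistent with the qualitative model:
grid = np.linspace(0, 10, 50)
assert np.all(np.diff(model.predict(grid)) >= 0)
print("fit MSE:", float(np.mean((y_fit - y) ** 2)))
```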

2.
We revisit an application developed originally using abductive Inductive Logic Programming (ILP) for modeling inhibition in metabolic networks. The example data were derived from studies of the effects of toxins on rats, using Nuclear Magnetic Resonance (NMR) time-trace analysis of their biofluids together with background knowledge representing a subset of the Kyoto Encyclopedia of Genes and Genomes (KEGG). We now apply two Probabilistic ILP (PILP) approaches—abductive Stochastic Logic Programs (SLPs) and PRogramming In Statistical modeling (PRISM)—to the application. Both approaches support abductive learning and probability predictions. Abductive SLPs are a PILP framework that provides possible-worlds semantics to SLPs through abduction. Instead of learning logic models from non-probabilistic examples as done in ILP, the PILP approach applied in this paper is based on a general technique for introducing probability labels within a standard scientific experimental setting involving control and treated data. Our results demonstrate that the PILP approach provides a way of learning probabilistic logic models from probabilistic examples, and that PILP models learned from probabilistic examples lead to a significant decrease in error, accompanied by improved insight from the learned results, compared with PILP models learned from non-probabilistic examples.
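The "probability labels from control and treated data" idea can be sketched as follows, under invented data and an illustrative 20% change threshold (both assumptions, not from the paper): for each metabolite, the empirical fraction of treated samples showing a marked drop below the control baseline becomes the label of a probabilistic example.

```python
# Hedged sketch: deriving probability labels for PILP examples from a
# control/treated design. Metabolite names and thresholds are invented.
import numpy as np

control = {"citrate": np.array([1.0, 1.1, 0.9, 1.0]),
           "lactate": np.array([2.0, 2.1, 1.9, 2.0])}
treated = {"citrate": np.array([0.5, 0.6, 0.4, 1.0]),
           "lactate": np.array([2.0, 2.2, 1.8, 2.1])}

def probability_label(ctrl, trt, rel_change=0.2):
    """Empirical P(metabolite is down-regulated under treatment)."""
    baseline = ctrl.mean()
    return float(np.mean(trt < (1 - rel_change) * baseline))

for m in control:
    p = probability_label(control[m], treated[m])
    # emits a probabilistic example such as  0.75 :: inhibited(citrate).
    print(f"{p:.2f} :: inhibited({m}).")
```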

3.
Incorporating the quantity and variety of observations in atmospheric and oceanographic assimilation and prediction models has become an increasingly complex task. Data assimilation allows uneven spatial and temporal data distribution and redundancy to be addressed so that the models can ingest massive data sets. Traditional data assimilation methods include Kalman filters and variational approaches. This study introduces a family of algorithms, motivated by advances in machine learning, that provide an alternative approach to incorporating new observations into the analysis-forecast cycle. The application of kernel methods to processing the states of a quasi-geostrophic numerical model is intended to demonstrate the feasibility of the method as a proof of concept. The speed, efficiency, accuracy and scalability in recovering unperturbed state trajectories establish the viability of machine learning for data assimilation.
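As a rough illustration of the kernel-method idea (not the authors' quasi-geostrophic experiment), the sketch below uses kernel ridge regression on synthetic data to learn the map from noisy observations back to model states, standing in for the analysis step.

```python
# Hedged sketch: kernel regression as an analysis step that recovers
# model states from observations. All data here are synthetic toys.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(1)
states = rng.normal(size=(200, 8))                 # toy model states
H = rng.normal(size=(8, 4))                        # toy observation operator
obs = states @ H + rng.normal(0, 0.05, (200, 4))   # noisy observations

analysis = KernelRidge(kernel="rbf", alpha=1e-3, gamma=0.1)
analysis.fit(obs[:150], states[:150])              # train on past cycles

recovered = analysis.predict(obs[150:])            # analysis for new obs
rmse = np.sqrt(np.mean((recovered - states[150:]) ** 2))
print(f"state-recovery RMSE: {rmse:.3f}")
```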

4.
Attribute-value based representations, standard in today's data mining systems, have limited expressiveness. Inductive Logic Programming (ILP) provides an interesting alternative, particularly for learning from structured examples whose parts, each with its own attributes, are related to each other by means of first-order predicates. Several subsets of first-order logic (FOL) with different expressive power have been proposed in ILP. The challenge lies in the fact that the more expressive the subset of FOL the learner works with, the more critical the dimensionality of the learning task becomes. The Datalog language is expressive enough to represent realistic learning problems when data is given directly in a relational database, making it a suitable tool for data mining. Consequently, it is important to elaborate techniques that dynamically decrease the dimensionality of learning tasks expressed in Datalog, just as Feature Subset Selection (FSS) techniques do in attribute-value learning. The idea of re-using these techniques in ILP runs immediately into a problem, as ILP examples have variable size and do not share the same set of literals. We propose here the first paradigm that brings Feature Subset Selection to the level of ILP, in languages at least as expressive as Datalog. The main idea is to first perform a change of representation, which approximates the original relational problem by a multi-instance problem. The resulting representation is suitable for FSS techniques, which we adapted from attribute-value learning by taking into account some of the characteristics the data acquires through the change of representation. We present the simple FSS method proposed for the task, the requisite change of representation, and the entire method combining those two algorithms. The method acts as a filter, preprocessing the relational data prior to model building, and outputs relational examples with empirically relevant literals. We discuss experiments in which the method was successfully applied to two real-world domains.
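A minimal sketch of the change of representation, under invented data: each relational example becomes a multi-instance "bag" of fixed-length vectors, after which a simple variance filter stands in for the adapted FSS step (the paper's actual selection criterion may differ).

```python
# Hedged sketch: relational examples (customers with variable numbers of
# transaction facts) become multi-instance bags; a variance filter then
# mimics FSS on the new representation. Data and threshold are invented.
import numpy as np

customers = {
    "c1": [(12.0, 9, 1), (80.0, 20, 0)],            # (amount, hour, online)
    "c2": [(5.0, 11, 1), (7.0, 12, 1), (6.5, 10, 1)],
}

# Change of representation: one bag of instances per relational example.
bags = {cid: np.array(rows, dtype=float) for cid, rows in customers.items()}

# FSS on the multi-instance view: keep features whose pooled variance
# across all instances exceeds a threshold.
pooled = np.vstack(list(bags.values()))
keep = np.var(pooled, axis=0) > 0.5
bags = {cid: b[:, keep] for cid, b in bags.items()}
print("kept features:", keep, "\nc1 bag:\n", bags["c1"])
```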

5.
Relational learning can be described as the task of learning first-order logic rules from examples. It has enabled a number of new machine learning applications, e.g. graph mining and link analysis. Inductive Logic Programming (ILP) performs relational learning either directly, by manipulating first-order rules, or through propositionalization, which translates the relational task into an attribute-value learning task by representing subsets of relations as features. In this paper, we introduce a fast method and system for relational learning based on a novel propositionalization called Bottom Clause Propositionalization (BCP). Bottom clauses are boundaries in the hypothesis search space used by the ILP systems Progol and Aleph. Bottom clauses carry semantic meaning and can be mapped directly onto numerical vectors, simplifying the feature extraction process. We have integrated BCP with a well-known neural-symbolic system, C-IL2P, to perform learning from numerical vectors. C-IL2P uses background knowledge in the form of propositional logic programs to build a neural network. The integrated system, which we call CILP++, handles first-order logic knowledge and is available for download from SourceForge. We have evaluated CILP++ on seven ILP datasets, comparing results with Aleph and a well-known propositionalization method, RSD. The results show that CILP++ can achieve accuracy comparable to Aleph while being generally faster. BCP achieved a statistically significant improvement in accuracy in comparison with RSD when running with a neural network, but BCP and RSD perform similarly when running with C4.5. We have also extended CILP++ to include a statistical feature selection method, mRMR, with preliminary results indicating that a reduction of more than 90% of the features can be achieved with a small loss of accuracy.
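The mapping from bottom clauses to vectors can be sketched directly: the union of body literals across all bottom clauses defines the feature space, and each example becomes a binary vector. The clauses below are invented toys, not Progol/Aleph output.

```python
# Hedged sketch of Bottom Clause Propositionalization: each example's
# bottom clause body is mapped onto a binary feature vector.
bottom_clauses = {
    "e1": {"parent(a,b)", "parent(b,c)", "male(a)"},
    "e2": {"parent(a,b)", "female(b)"},
}

features = sorted(set().union(*bottom_clauses.values()))

def to_vector(body):
    """Map a bottom clause body onto a binary feature vector."""
    return [1 if lit in body else 0 for lit in features]

vectors = {e: to_vector(body) for e, body in bottom_clauses.items()}
print(features)
print(vectors)   # vectors like these feed the neural network in CILP++
```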

6.
Inductive Logic Programming (ILP) deals with the problem of finding a hypothesis covering positive examples and excluding negative examples, where both hypotheses and examples are expressed in first-order logic. In this paper we employ constraint satisfaction techniques to model and solve a problem known as template ILP consistency, which assumes that the structure of a hypothesis is known and the task is to find a unification of the contained variables. In particular, we present a constraint model with index variables, accompanied by a Boolean model to strengthen inference and hence improve efficiency. The efficiency of the models is demonstrated experimentally.
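A toy version of template consistency as constraint satisfaction, assuming the python-constraint package (any finite-domain solver would do): the template p(X) :- q(X, C) leaves one variable C open, and the solver must unify it with a constant so that all positives are covered and no negative is. The facts are invented.

```python
# Toy illustration of template ILP consistency as a CSP
# (pip install python-constraint). Not the paper's constraint model.
from constraint import Problem

q_facts = {("a", 1), ("b", 1), ("c", 2)}     # background: q/2
positives, negatives = {"a", "b"}, {"c"}

def covers(c, x):
    return (x, c) in q_facts                  # does body q(X, C) succeed?

problem = Problem()
problem.addVariable("C", [1, 2])
problem.addConstraint(
    lambda c: all(covers(c, x) for x in positives)
    and not any(covers(c, x) for x in negatives),
    ("C",),
)
print(problem.getSolutions())                 # -> [{'C': 1}]
```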

7.
Wang Hongli (王洪利). CAAI Transactions on Intelligent Systems (《智能系统学报》), 2013, 8(4): 367-371
To address the scarcity of system information in complex-system simulation, and exploiting the fact that the QSIM modelling method can make use of differential equations, this paper proposes a qualitative modelling method for complex-system simulation that combines GM(1,N) and QSIM. It first reviews the current state of related research, then presents the basic principles and main procedure of the combined GM(1,N)/QSIM simulation modelling method. Finally, the feasibility of the method is verified through a system simulation modelling example. The results show that the method makes full use of the limited information available about a system and effectively fuses quantitative and qualitative information into the simulation modelling of complex systems.
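As a hedged illustration of the grey-model side, the sketch below implements GM(1,1), the single-variable special case of GM(1,N), showing the accumulated generating operation (AGO) and least-squares step that the combined GM/QSIM method builds on; the data are invented.

```python
# Hedged sketch: GM(1,1), the base case of the grey models above.
import numpy as np

def gm11(x0, horizon=3):
    x1 = np.cumsum(x0)                                  # AGO sequence
    z1 = 0.5 * (x1[1:] + x1[:-1])                       # background values
    B = np.column_stack([-z1, np.ones_like(z1)])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]    # grey parameters
    k = np.arange(1, len(x0) + horizon)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a   # whitened solution
    return np.diff(np.concatenate([[x0[0]], x1_hat]))   # inverse AGO

x0 = np.array([2.87, 3.28, 3.34, 3.52, 3.69])
print(gm11(x0))    # fitted values plus a 3-step-ahead forecast
```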

8.
9.
The paper presents the LABAQM system, developed for the analysis of laboratory animal behaviours. It is based on qualitative modelling of animal motions. We deal with the cognitive phase of laboratory animal behaviour analysis as part of pharmacological experiments. The system relies on quantitative data from a tracking application and on incomplete domain background knowledge. The LABAQM system operates in two main phases: behaviour learning and behaviour analysis. Both phases work on symbol sequences obtained by transforming the quantitative data. The behaviour-learning phase includes a supervised learning procedure, an unsupervised learning procedure, and their combination. We show that qualitative models of behaviour can be represented by hidden Markov models. The fusion of supervised and unsupervised learning procedures produces more robust models of characteristic behaviours, which are used in the behaviour-analysis phase.
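A minimal sketch of the HMM-based analysis, assuming the hmmlearn package (the CategoricalHMM class; named MultinomialHMM in older releases) and invented symbol sequences: one model per characteristic behaviour, with classification by maximum likelihood. This is not the LABAQM code.

```python
# Hedged sketch: per-behaviour HMMs over qualitative motion symbols.
import numpy as np
from hmmlearn import hmm

# Symbol sequences per behaviour (integers 0..3 encode motion symbols).
train = {
    "grooming":  np.array([[0, 0, 1, 0, 2, 1, 0, 0, 1, 3]]).T,
    "exploring": np.array([[2, 3, 2, 3, 1, 2, 3, 0, 2, 3]]).T,
}

models = {}
for name, seq in train.items():
    m = hmm.CategoricalHMM(n_components=2, n_iter=50, random_state=0)
    m.fit(seq)
    models[name] = m

unknown = np.array([[2, 3, 3, 2, 3, 2]]).T
best = max(models, key=lambda n: models[n].score(unknown))
print("classified as:", best)
```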

10.
Research on Learning Methods Based on Inductive Logic Programming and Their Implementation
Inductive Logic Programming is a relatively new approach in machine learning that studies the construction of logic programs (new knowledge) from examples and background knowledge. This paper introduces the basic theory and methods of Inductive Logic Programming and describes the application of this learning method in expert systems.

11.
We present a new method and tool for activity modelling through qualitative sequential data analysis. In particular, we address the question of constructing a symbolic abstract representation of an activity from an activity trace. We use knowledge engineering techniques to help the analyst build an ontology of the activity, that is, a set of symbols and hierarchical semantics that supports the construction of activity models. The ontology construction is pragmatic, evolutionary and driven by the analyst in accordance with their modelling goals and research questions. Our tool helps the analyst define transformation rules to process the raw trace into abstract traces based on the ontology. The analyst visualizes the abstract traces and iteratively tests the ontology, the transformation rules and the visualization format to confirm the models of activity. With this tool and this method, we found innovative ways to represent a car-driving activity at different levels of abstraction from activity traces collected from an instrumented vehicle. As examples, we report two new strategies of lane changing on motorways that we found and modelled with this approach.
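The transformation-rule step can be sketched as first-match rewriting of raw trace samples into ontology symbols; the thresholds, symbol names and driving data below are illustrative assumptions, not the tool's actual rules.

```python
# Hedged sketch: rule-based abstraction of a raw activity trace.
raw_trace = [
    {"t": 0.0, "lane_offset": 0.1, "indicator": "off"},
    {"t": 0.5, "lane_offset": 0.9, "indicator": "left"},
    {"t": 1.0, "lane_offset": 1.8, "indicator": "off"},
]

rules = [  # (predicate over a raw sample, abstract ontology symbol)
    (lambda s: s["indicator"] == "left" and s["lane_offset"] > 0.5,
     "initiating_lane_change_left"),
    (lambda s: s["lane_offset"] > 1.5, "lane_change_completed"),
    (lambda s: True, "lane_keeping"),                  # default rule
]

def abstract(trace):
    """Apply the first matching rule to each sample (abstract trace)."""
    return [next(sym for pred, sym in rules if pred(s)) for s in trace]

print(abstract(raw_trace))
```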

12.
Scaling Up Inductive Logic Programming by Learning from Interpretations
When comparing inductive logic programming (ILP) and attribute-value learning techniques, there is a trade-off between expressive power and efficiency. Inductive logic programming techniques are typically more expressive but also less efficient. Therefore, the data sets handled by current inductive logic programming systems are small according to general standards within the data mining community. The main source of inefficiency lies in the assumption that several examples may be related to each other, so they cannot be handled independently. Within the learning-from-interpretations framework for inductive logic programming this assumption is unnecessary, which makes it possible to scale up existing ILP algorithms. In this paper we explain this learning setting in the context of relational databases. We relate the setting to propositional data mining and to the classical ILP setting, and show that learning from interpretations corresponds to learning from multiple relations and thus extends the expressiveness of propositional learning, while maintaining its efficiency to a large extent (which is not the case in the classical ILP setting). As a case study, we present two alternative implementations of the ILP system TILDE (Top-down Induction of Logical DEcision trees): TILDE_classic, which loads all data in main memory, and TILDE_LDS, which loads the examples one by one. We experimentally compare the implementations, showing that TILDE_LDS can handle large data sets (on the order of 100,000 examples or 100 MB) and indeed scales up linearly in the number of examples.
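A minimal sketch of why learning from interpretations scales: each example is a self-contained set of facts that a clause can be tested against independently, so examples can be streamed one at a time in the style of TILDE_LDS. The facts and coverage test below are toys.

```python
# Hedged sketch of the learning-from-interpretations setting: every
# example is tested independently, so only one needs to be in memory.
def stream_interpretations():
    """Yield (facts, label) examples one by one; in practice from disk."""
    yield {"triangle(t1)", "circle(c1)", "inside(t1,c1)"}, True
    yield {"circle(c1)", "circle(c2)"}, False

def covered(interpretation):
    """Toy coverage test: the clause body succeeds within this example."""
    return any(f.startswith("inside(") for f in interpretation)

correct = total = 0
for facts, label in stream_interpretations():   # one example in memory
    correct += (covered(facts) == label)
    total += 1
print(f"clause accuracy: {correct}/{total}")
```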

13.
Inductive logic programming (ILP) is a sub-field of machine learning that provides an excellent framework for multi-relational data mining applications. The advantages of ILP have been successfully demonstrated in complex and relevant industrial and scientific problems. However, to produce valuable models, ILP systems often require long running times and large amounts of memory. In this paper we address fundamental issues that have a direct impact on the efficiency of ILP systems. Namely, we discuss how improvements in the indexing mechanisms of an underlying logic programming system benefit ILP performance. Furthermore, we propose novel data structures to reduce memory requirements and we suggest a new lazy evaluation technique to search the hypothesis space more efficiently. These proposals have been implemented in the April ILP system and evaluated using several well-known data sets. The results show significant improvements in running time without compromising the accuracy of the models generated. Indeed, the combined techniques achieve speedups of several orders of magnitude on some data sets. Moreover, memory requirements are reduced in nearly half of the data sets.
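The indexing point can be illustrated generically (this is not April's implementation): retrieving candidate facts by functor and first argument replaces a linear scan during coverage tests, mirroring first-argument indexing in Prolog engines.

```python
# Hedged sketch of first-argument indexing for fast coverage tests.
from collections import defaultdict

facts = [("parent", "ann", "bob"), ("parent", "bob", "carl"),
         ("parent", "ann", "dora"), ("male", "bob", None)]

index = defaultdict(list)
for f in facts:                       # build a (functor, first-arg) index
    index[(f[0], f[1])].append(f)

def lookup(functor, first_arg):
    """Near-constant-time candidate retrieval instead of a linear scan."""
    return index[(functor, first_arg)]

print(lookup("parent", "ann"))        # -> both parent/2 facts about ann
```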

14.
Nearly two decades of research in the area of Inductive Logic Programming (ILP) have seen steady progress in clarifying its theoretical foundations and regular demonstrations of its applicability to complex problems in very diverse domains. These results are necessary, but not sufficient, for ILP to be adopted as a tool for data analysis in an era of very large machine-generated scientific and industrial datasets, accompanied by programs that provide ready access to complex relational information in machine-readable forms (ontologies, parsers, and so on). Besides the usual issues of ease of use, ILP is now confronted with questions of implementation. We are concerned here with two of these, namely: can an ILP system construct models efficiently when (a) dataset sizes are too large to fit in the memory of a single machine, and (b) search-space sizes become prohibitively large to explore using a single machine? In this paper, we examine the applicability to ILP of a popular distributed computing approach that provides a uniform way of performing data- and task-parallel computations in ILP. The MapReduce programming model allows, in principle, very large numbers of processors to be used without any special understanding of the underlying hardware or software involved. Specifically, we show how the MapReduce approach can be used to perform the coverage test that is at the heart of many ILP systems, and to perform the multiple searches required by the greedy set-covering algorithm used by some popular ILP systems. Our principal findings with synthetic and real-world datasets for both data and task parallelism are these: (a) ignoring overheads, the time to perform the computations concurrently increases with the size of the dataset for data parallelism and with the size of the search space for task parallelism; for data parallelism this increase is roughly in proportion to increases in dataset size; (b) if a MapReduce implementation is used as part of an ILP system, then benefits for data parallelism can only be expected above some minimal dataset size, and for task parallelism above some minimal search-space size; and (c) the MapReduce approach appears better suited to exploiting data parallelism in ILP.
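A minimal sketch of the data-parallel coverage test, with multiprocessing standing in for a MapReduce cluster and a toy coverage predicate: the map step counts covered examples per data partition, and the reduce step sums the partial counts.

```python
# Hedged sketch: MapReduce-style coverage counting, with multiprocessing
# standing in for a real cluster and a toy coverage predicate.
from multiprocessing import Pool

def covers(example):
    return example % 3 == 0            # stand-in for a clause coverage test

def map_partition(partition):
    return sum(1 for e in partition if covers(e))

if __name__ == "__main__":
    examples = list(range(1_000_000))
    n = 4                                             # number of workers
    partitions = [examples[i::n] for i in range(n)]   # data parallelism
    with Pool(n) as pool:
        partial_counts = pool.map(map_partition, partitions)
    print("coverage:", sum(partial_counts))           # reduce step
```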

15.
Inductive logic programming (ILP) induces concepts from a set of positive examples, a set of negative examples, and background knowledge. ILP has been applied to tasks such as natural language processing, finite element mesh design, network mining, robotics, and drug discovery. These data sets usually contain numerical and multivalued categorical attributes; however, only a few relational learning systems are capable of handling them in an efficient way. In this paper, we present an evolutionary approach, called Grouping and Discretization for Enriching the Background Knowledge (GDEBaK), to deal with numerical and multivalued categorical attributes in ILP. This method uses evolutionary operators to create and test numerical splits and subsets of categorical values in accordance with a fitness function. The best subintervals and subsets are added to the background knowledge before constructing candidate hypotheses. We implemented GDEBaK embedded in Aleph and compared it to lazy discretization in Aleph and discretization in the Top-down Induction of Logical Decision Trees (TILDE) system. The results obtained showed that our method improves accuracy and reduces the number of rules in most cases. Finally, we discuss these results and possible lines for future work.
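The evolutionary idea can be sketched with a tiny mutation-only loop over candidate split points, under invented data and a purity-based fitness; the paper's actual operators and fitness function may differ.

```python
# Hedged sketch: evolving split points for a numerical attribute.
import random

values = [(1.2, 0), (2.5, 0), (3.1, 1), (4.8, 1), (5.0, 1), (0.9, 0)]

def fitness(split):
    left = [c for v, c in values if v <= split]
    right = [c for v, c in values if v > split]
    if not left or not right:
        return 0.0
    purity = lambda side: max(side.count(0), side.count(1)) / len(side)
    return (len(left) * purity(left) + len(right) * purity(right)) / len(values)

random.seed(0)
pop = [random.uniform(0.9, 5.0) for _ in range(10)]
for _ in range(30):                                   # evolve the splits
    pop.sort(key=fitness, reverse=True)
    parents = pop[:5]
    pop = parents + [p + random.gauss(0, 0.3) for p in parents]  # mutate
best = max(pop, key=fitness)
print(f"best split ~ {best:.2f}")   # would be added to background knowledge
```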

16.
The growth of machine-generated relational databases, both in the sciences and in industry, is rapidly outpacing our ability to extract useful information from them by manual means. This has brought into focus machine learning techniques like Inductive Logic Programming (ILP) that are able to extract human-comprehensible models from complex relational data. The price to pay is that ILP techniques are not efficient: they can be seen as performing a form of discrete optimisation, which is known to be computationally hard, and their complexity is usually some super-linear function of the number of examples. While little can be done to alter the theoretical bounds on the worst-case complexity of ILP systems, some practical gains may follow from the use of multiple processors. In this paper we survey the state of the art on parallel ILP. We implement several parallel algorithms and study their performance using some standard benchmarks. The principal findings of interest are these: (1) of the techniques investigated, the one that simply constructs models in parallel on each processor using a subset of the data and then combines the models into a single one yields the best results; and (2) sequential (approximate) ILP algorithms based on randomized searches have lower execution times than (exact) parallel algorithms, without sacrificing the quality of the solutions found. This is an extended version of the paper entitled “Strategies to Parallelize ILP Systems”, published in the Proceedings of the 15th International Conference on Inductive Logic Programming (ILP 2005), vol. 3625 of LNAI, pp. 136–153, Springer-Verlag.
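A minimal sketch of the winning strategy, with a trivial stand-in learner: each worker induces rules from its own subset of the data, and the per-processor models are merged into one by union.

```python
# Hedged sketch: learn-on-subsets-then-combine parallel strategy.
from concurrent.futures import ProcessPoolExecutor

def learn_rules(subset):
    """Stand-in learner: one toy rule per class observed in the subset."""
    return {f"class({c}) :- features_of_{c}(X)" for _, c in subset}

if __name__ == "__main__":
    data = [(i, i % 3) for i in range(300)]
    chunks = [data[i::4] for i in range(4)]           # one subset per CPU
    with ProcessPoolExecutor(max_workers=4) as ex:
        rule_sets = list(ex.map(learn_rules, chunks))
    combined = set().union(*rule_sets)                # merge into one model
    print(sorted(combined))
```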

17.
Several applications of Inductive Logic Programming (ILP) are presented. These belong to various areas of engineering, including mechanical, environmental, software, and dynamical-systems engineering. The particular applications are finite element mesh design, biological classification of river water quality, data reification, inducing program invariants, learning qualitative models of dynamic systems, and learning control rules for dynamic systems. A number of other applications are briefly mentioned. Finally, a discussion of the advantages and disadvantages of ILP compared to other approaches to machine learning is given.

18.
Inductive Logic Programming (ILP) combines rule-based and statistical artificial intelligence methods by learning a hypothesis comprising a set of rules, given background knowledge and constraints for the search space. We focus on extending the XHAIL algorithm for ILP, which is based on Answer Set Programming, and we evaluate our extensions using the Natural Language Processing application of sentence chunking. With respect to processing natural language, ILP can cater for the constant change in how we use language on a daily basis. At the same time, ILP does not require the huge amounts of training examples that other statistical methods do, and it produces interpretable results, that is, a set of rules which can be analysed and tweaked if necessary. As contributions we extend XHAIL with (i) a pruning mechanism within the hypothesis generalisation algorithm which enables learning from larger datasets, (ii) better usage of modern solver technology using recently developed optimisation methods, and (iii) a time budget that permits the usage of suboptimal results. We evaluate these improvements on the task of sentence chunking using three datasets from a recent SemEval competition. Results show that our improvements allow learning on bigger datasets with results of similar quality to state-of-the-art systems on the same task. Moreover, we compare the hypotheses obtained on the different datasets to gain insights into the structure of each dataset.
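Two of the extensions can be shown in generic form, with toy data and scoring in place of XHAIL's ASP machinery: support-based pruning of candidate rules during generalisation, and a time budget that returns the best suboptimal result found so far.

```python
# Hedged sketch: (i) pruning low-support candidates during
# generalisation and (ii) an anytime time budget. All toys.
import time

candidates = [f"rule_{i}" for i in range(10_000)]
support = {r: (i * 37) % 100 for i, r in enumerate(candidates)}  # toy counts

def generalise(rules, min_support=30, budget_s=0.05):
    best, best_score = None, -1
    deadline = time.monotonic() + budget_s
    pruned = (r for r in rules if support[r] >= min_support)  # (i) pruning
    for r in pruned:
        if time.monotonic() > deadline:               # (ii) time budget
            break                                     # keep best-so-far
        score = support[r]                            # toy quality measure
        if score > best_score:
            best, best_score = r, score
    return best, best_score

print(generalise(candidates))
```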

19.
It is well known that many hard tasks considered in machine learning and data mining can be solved in a rather simple and robust way with an instance- and distance-based approach. In this work we present another difficult task: learning, from large numbers of complex performances by concert pianists, to play music expressively. We model the problem as a multi-level decomposition and prediction task. We show that this is a fundamentally relational learning problem and propose a new similarity measure for structured objects, which is built into a relational instance-based learning algorithm named DISTALL. Experiments with data derived from a substantial number of Mozart piano sonata recordings by a skilled concert pianist demonstrate that the approach is viable. We show that the instance-based learner operating on structured, relational data outperforms a propositional k-NN algorithm. In qualitative terms, some of the piano performances produced by DISTALL after learning from the human artist are of substantial musical quality; one even won a prize in an international ‘computer music performance’ contest. The experiments thus provide evidence of the capabilities of ILP in a highly complex domain such as music.
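A recursive similarity over structured objects in the spirit of DISTALL's measure can be sketched as below; this is an illustrative simplification with invented phrase structures, not the published definition.

```python
# Hedged sketch: recursive similarity for nested, structured objects.
def sim(a, b):
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return 1.0 / (1.0 + abs(a - b))               # numeric closeness
    if isinstance(a, str) and isinstance(b, str):
        return 1.0 if a == b else 0.0                 # symbol match
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        n = max(len(a), len(b))
        if n == 0:
            return 1.0
        # element-wise comparison; dividing by n penalises length mismatch
        return sum(sim(x, y) for x, y in zip(a, b)) / n
    return 0.0

phrase_a = ("phrase", [("note", 60, 0.5), ("note", 62, 0.5)])
phrase_b = ("phrase", [("note", 60, 0.4), ("note", 64, 0.5)])
print(f"similarity: {sim(phrase_a, phrase_b):.3f}")
```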

20.
To date, Inductive Logic Programming (ILP) systems have largely assumed that all data needed for learning are provided at the onset of model construction. Increasingly, in application areas like telecommunications, astronomy, text processing, financial markets and biology, machine-generated data are produced continuously and on a vast scale. We see at least four kinds of problems that this presents for ILP: (1) it may not be possible to store all of the data, even in secondary memory; (2) even if it were possible to store the data, it may be impractical to construct an acceptable model using partitioning techniques that repeatedly perform expensive coverage or subsumption tests on the data; (3) models constructed at some point may become less effective, or even invalid, as more data become available (exemplified by the “drift” problem when identifying concepts); and (4) the representation of the data instances may need to change as more data become available (a kind of “language drift” problem). In this paper, we investigate the adoption of a stream-based on-line learning approach to relational data. Specifically, we examine the representation of relational data in both an infinite-attribute setting and the usual fixed-attribute setting, and develop implementations that use ILP engines in combination with on-line model constructors. The behaviour of each program is investigated using a set of controlled experiments, and performance in practical settings is demonstrated by constructing complete theories for some of the largest biochemical datasets examined by ILP systems to date, including one with a million examples; to the best of our knowledge, this is the first time this has been empirically demonstrated with ILP on a real-world data set.
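The stream-based setting can be sketched with a toy online learner: examples arrive one at a time, the model updates incrementally, and a sliding-window error monitor triggers relearning when drift (problem 3 above) is suspected. The majority-class "model" is a stand-in for an ILP engine plus on-line model constructor.

```python
# Hedged sketch: on-line learning over a stream with drift detection.
from collections import deque, Counter

window = deque(maxlen=50)          # recent (example, label) pairs
errors = deque(maxlen=50)          # recent prediction errors
model = Counter()                  # toy model: running label counts

def predict():
    return model.most_common(1)[0][0] if model else 0

def stream():
    for t in range(600):
        yield t, 1 if t < 300 else 0   # the concept drifts at t = 300

for t, label in stream():
    errors.append(predict() != label)
    window.append((t, label))
    model[label] += 1                  # incremental on-line update
    if len(errors) == errors.maxlen and sum(errors) / len(errors) > 0.6:
        model = Counter(lbl for _, lbl in window)   # relearn on window
        errors.clear()
        print(f"drift suspected at t={t}; model relearned")
```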

