Similar Literature
2.
Inductive logic programming (ILP) is a sub‐field of machine learning that provides an excellent framework for multi‐relational data mining applications. The advantages of ILP have been successfully demonstrated in complex and relevant industrial and scientific problems. However, to produce valuable models, ILP systems often require long running times and large amounts of memory. In this paper we address fundamental issues that have a direct impact on the efficiency of ILP systems. Namely, we discuss how improvements in the indexing mechanisms of an underlying logic programming system benefit ILP performance. Furthermore, we propose novel data structures to reduce memory requirements, and we suggest a new lazy evaluation technique to search the hypothesis space more efficiently. These proposals have been implemented in the April ILP system and evaluated using several well‐known data sets. The results show significant improvements in running time without compromising the accuracy of the models generated. Indeed, the combined techniques achieve speedups of several orders of magnitude on some data sets. Moreover, memory requirements are reduced in nearly half of the data sets. Copyright © 2008 John Wiley & Sons, Ltd.

3.
Problems concerned with learning the relationships between molecular structure and activity have been important test-beds for Inductive Logic Programming (ILP) systems. In this paper we examine these applications and empirically evaluate the extent to which a first-order representation was required. We compared ILP theories with those constructed using standard linear regression and a decision-tree learner on a series of progressively more difficult problems. When a propositional encoding is feasible for the feature-based algorithms, we show that such algorithms are capable of matching the predictive accuracies of an ILP theory. However, as the complexity of the compounds considered increases, propositional encodings become intractable. In such cases, our results show that ILP programs can still continue to construct accurate, understandable theories. Based on this evidence, we propose future work to realise fully the potential of ILP in structure-activity problems.

4.
Scaling Up Inductive Logic Programming by Learning from Interpretations   Total citations: 4, self-citations: 0, citations by others: 4
When comparing inductive logic programming (ILP) and attribute-value learning techniques, there is a trade-off between expressive power and efficiency. Inductive logic programming techniques are typically more expressive but also less efficient. Therefore, the data sets handled by current inductive logic programming systems are small by the general standards of the data mining community. The main source of inefficiency lies in the assumption that several examples may be related to each other, so that they cannot be handled independently. Within the learning from interpretations framework for inductive logic programming this assumption is unnecessary, which makes it possible to scale up existing ILP algorithms. In this paper we explain this learning setting in the context of relational databases. We relate the setting to propositional data mining and to the classical ILP setting, and show that learning from interpretations corresponds to learning from multiple relations and thus extends the expressiveness of propositional learning, while maintaining its efficiency to a large extent (which is not the case in the classical ILP setting). As a case study, we present two alternative implementations of the ILP system TILDE (Top-down Induction of Logical DEcision trees): TILDEclassic, which loads all data in main memory, and TILDELDS, which loads the examples one by one. We experimentally compare the implementations, showing that TILDELDS can handle large data sets (on the order of 100,000 examples or 100 MB) and indeed scales up linearly in the number of examples.
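The independence that makes this setting scale can be illustrated with a minimal Python sketch, assuming each example is an interpretation stored as a set of ground facts and a hypothesis is a conjunctive query. The representation and the names `covers`, `count_covered` and `load_examples` are illustrative assumptions, not TILDE's interface. Because coverage is decided inside a single interpretation, examples can be consumed one at a time, in the spirit of TILDELDS.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

Atom = Tuple[str, ...]        # ("bond", "a1", "a2"): predicate name, then arguments
Interpretation = Set[Atom]    # all ground facts that describe ONE example


def is_var(term: str) -> bool:
    # Assumed Prolog-style convention: variables start with an uppercase letter.
    return term[:1].isupper()


def unify(literal: Atom, fact: Atom, subst: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend `subst` so that `literal` matches the ground `fact`, or return None."""
    if len(literal) != len(fact) or literal[0] != fact[0]:
        return None
    extended = dict(subst)
    for term, value in zip(literal[1:], fact[1:]):
        if is_var(term):
            # Bind an unbound variable, or check an existing binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def covers(query: List[Atom], example: Interpretation,
           subst: Optional[Dict[str, str]] = None) -> bool:
    """Backtracking search for a substitution making every literal true in `example`."""
    subst = {} if subst is None else subst
    if not query:
        return True
    first, rest = query[0], query[1:]
    for fact in example:
        extended = unify(first, fact, subst)
        if extended is not None and covers(rest, example, extended):
            return True
    return False


def count_covered(query: List[Atom], examples: Iterable[Interpretation]) -> int:
    """Consumes examples one at a time; no example ever needs to see another one."""
    return sum(1 for example in examples if covers(query, example))


def load_examples() -> Iterator[Interpretation]:
    # Hypothetical stand-in for reading one example at a time from disk or a database.
    yield {("atom", "a1", "c"), ("atom", "a2", "o"), ("bond", "a1", "a2")}
    yield {("atom", "b1", "c"), ("atom", "b2", "c"), ("bond", "b1", "b2")}


# "Some atom is bonded to an oxygen atom."
query = [("bond", "X", "Y"), ("atom", "Y", "o")]
print(count_covered(query, load_examples()))  # 1
```

Nothing in `covers` looks outside the current example, which is the independence assumption that lets `count_covered` make a single linear pass over a generator instead of holding every example in memory.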

5.
Rough Problem Settings for ILP Dealing With Imperfect Data   Total citations: 1, self-citations: 0, citations by others: 1
This paper applies rough set theory to Inductive Logic Programming (ILP, a relatively new method in machine learning) to deal with imperfect data occurring in large real-world applications. We investigate various kinds of imperfect data in ILP and propose rough problem settings to deal with incomplete background knowledge (where essential predicates/clauses are missing), indiscernible data (where some examples belong to both the positive and the negative training examples), missing classification (where some examples are unclassified) and an overly strong declarative bias (which causes the search for solutions to fail). The rough problem settings relax the strict requirements of the standard normal problem setting for ILP, so that rough but useful hypotheses can be induced from imperfect data. We give simple measures of learning quality for the rough problem settings. For other kinds of imperfect data (noisy data, overly sparse data, missing values, real-valued data, etc.), while referring to their traditional handling techniques, we also point out the possibility of new methods based on rough set theory.
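The indiscernible-data case can be sketched in Python under one plausible rough-set reading, which is an assumption here rather than the paper's exact definitions: examples labelled both positive and negative form a boundary region, a lower (cautious) setting removes them from the positives, and an upper (generous) setting removes them from the negatives. The names `rough_settings` and `consistent` and the hypothesis `h` are hypothetical.

```python
from typing import Callable, Dict, Hashable, Set, Tuple

Setting = Tuple[Set[Hashable], Set[Hashable]]   # (examples to cover, examples to avoid)


def rough_settings(pos: Set[Hashable], neg: Set[Hashable]) -> Dict[str, object]:
    """Split conflicting labels into a boundary region and two relaxed settings."""
    boundary = pos & neg                      # indiscernible: labelled both ways
    return {
        "boundary": boundary,
        "lower": (pos - boundary, neg),       # cautious: boundary examples must not be covered
        "upper": (pos, neg - boundary),       # generous: boundary examples must be covered
    }


def consistent(hypothesis: Callable[[Hashable], bool], setting: Setting) -> bool:
    """Normal ILP requirement: cover every positive and no negative of the setting."""
    positives, negatives = setting
    return all(hypothesis(e) for e in positives) and not any(hypothesis(e) for e in negatives)


def h(example: Hashable) -> bool:
    # A hypothetical induced hypothesis, given extensionally.
    return example in {"e1", "e2", "e3"}


pos, neg = {"e1", "e2", "e3"}, {"e3", "e4"}
settings = rough_settings(pos, neg)

print(settings["boundary"])                 # {'e3'}
print(consistent(h, (pos, neg)))            # False: no hypothesis can satisfy this setting
print(consistent(h, settings["upper"]))     # True
print(consistent(h, settings["lower"]))     # False: h covers the boundary example e3
```

With `e3` in both label sets, no hypothesis can satisfy the normal setting, while each relaxed setting still admits some hypothesis.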

6.
Liu Zhen, Zhang Zhizheng. Computer Science (计算机科学), 2015, 42(1): 220-226
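Action model learning enables an agent to adapt actively to changes in a dynamic environment, thereby improving its autonomy; it can also provide a preliminary model of a dynamic domain, which serves as a basis for later refinement and modification. By combining Inductive Logic Programming (ILP) and Answer Set Programming (ASP), we design an algorithm that learns action models described in the action language B. The algorithm can learn in dynamic domains of mixed scale, and its effectiveness is verified on classical planning instances.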

7.
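Research on Learning Methods Based on Inductive Logic Programming and Their Implementation   Total citations: 1, self-citations: 0, citations by others: 1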
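Inductive logic programming is a new method in machine learning that studies the construction of logic programs (new knowledge) from examples and background knowledge. This paper introduces the basic theory and methods of inductive logic programming and describes applications of this learning method in expert systems.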

8.
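Current inductive logic programming (ILP) systems require sufficient training data and cannot make use of unlabeled data. To address this shortcoming, this paper proposes relational tri-training (R-tri-training), an algorithm that learns first-order rules with the help of unlabeled data. The algorithm carries the idea of tri-training, a semi-supervised learning algorithm based on a propositional representation, over to ILP systems based on a first-order representation, and studies how information from unlabeled examples can assist classifier training within the ILP framework. R-tri-training first initializes three different ILP systems from the labeled data and the background knowledge, and then iteratively refines the three classifiers with unlabeled examples: if two classifiers agree on the label of an unlabeled example, then, under certain conditions, the example is given that label and passed to the third classifier as a new training example. Experimental results on standard datasets show that R-tri-training effectively uses unlabeled data to improve learning performance, and that it outperforms GILP (genetic inductive logic programming), NFOIL, KFOIL and ALEPH.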
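The labelling step described above can be sketched in Python as follows. The three classifiers are plain callables standing in for the ILP systems, the "certain conditions" are represented only by a hypothetical `accept` hook that defaults to always accepting, and retraining each ILP system on its enlarged training set is left out.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

Classifier = Callable[[object], Hashable]
Accept = Callable[[int, object, Hashable], bool]


def tri_training_round(
    classifiers: Sequence[Classifier],
    unlabeled: Sequence[object],
    accept: Accept = lambda i, x, label: True,   # hypothetical hook for the extra conditions
) -> List[List[Tuple[object, Hashable]]]:
    """Return, for each classifier i, the newly labelled examples it should be refined on."""
    if len(classifiers) != 3:
        raise ValueError("tri-training uses exactly three classifiers")
    new_examples: List[List[Tuple[object, Hashable]]] = [[], [], []]
    for x in unlabeled:
        labels = [c(x) for c in classifiers]
        for i in range(3):
            j, k = (m for m in range(3) if m != i)   # the two "teachers" of classifier i
            if labels[j] == labels[k] and accept(i, x, labels[j]):
                new_examples[i].append((x, labels[j]))
    return new_examples


# Toy stand-ins for three ILP systems trained on the labelled data.
classifiers = [lambda x: x > 0, lambda x: x > 1, lambda x: x % 2 == 0]
print(tri_training_round(classifiers, [-1, 1, 2]))
# [[(-1, False), (1, False), (2, True)], [(-1, False), (2, True)], [(-1, False), (2, True)]]
```

Note that the example `1` is handed only to the first classifier: the other two agree on it, even though the first classifier itself disagrees, which is how the teachers can correct a student.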

9.
Inductive Logic Programming (ILP) studies learning from examples within the framework provided by clausal logic. ILP has become a popular subject in the field of data mining due to its ability to discover patterns in relational domains. Several ILP-based concept discovery systems have been developed that employ various search strategies, heuristics and language pattern limitations. LINUS, GOLEM, CIGOL, MIS, FOIL, PROGOL, ALEPH and WARMR are well-known ILP-based systems. In this work, introductory information about ILP is first given; then the above-mentioned systems and an ILP-based concept discovery system called C2D are briefly described, and the fundamentals of their mechanisms are demonstrated on a running example. Finally, a set of experimental results on real-world problems is presented in order to evaluate and compare the performance of the above-mentioned systems.

10.
Inductive Logic Programming (ILP) combines rule-based and statistical artificial intelligence methods by learning a hypothesis comprising a set of rules, given background knowledge and constraints for the search space. We focus on extending the XHAIL algorithm for ILP, which is based on Answer Set Programming, and we evaluate our extensions using the Natural Language Processing application of sentence chunking. With respect to processing natural language, ILP can cater for the constant change in how we use language on a daily basis. At the same time, unlike other statistical methods, ILP does not require huge amounts of training examples, and it produces interpretable results, i.e., a set of rules that can be analysed and tweaked if necessary. As contributions we extend XHAIL with (i) a pruning mechanism within the hypothesis generalisation algorithm which enables learning from larger datasets, (ii) better usage of modern solver technology using recently developed optimisation methods, and (iii) a time budget that permits the usage of suboptimal results. We evaluate these improvements on the task of sentence chunking using three datasets from a recent SemEval competition. Results show that our improvements allow learning on bigger datasets, with results of similar quality to state-of-the-art systems on the same task. Moreover, we compare the hypotheses obtained on the datasets to gain insights into the structure of each dataset.

11.
Relational learning can be described as the task of learning first-order logic rules from examples. It has enabled a number of new machine learning applications, e.g. graph mining and link analysis. Inductive Logic Programming (ILP) performs relational learning either directly by manipulating first-order rules or through propositionalization, which translates the relational task into an attribute-value learning task by representing subsets of relations as features. In this paper, we introduce a fast method and system for relational learning based on a novel propositionalization called Bottom Clause Propositionalization (BCP). Bottom clauses are boundaries in the hypothesis search space used by the ILP systems Progol and Aleph. Bottom clauses carry semantic meaning and can be mapped directly onto numerical vectors, simplifying the feature extraction process. We have integrated BCP with a well-known neural-symbolic system, C-IL2P, to perform learning from numerical vectors. C-IL2P uses background knowledge in the form of propositional logic programs to build a neural network. The integrated system, which we call CILP++, handles first-order logic knowledge and is available for download from Sourceforge. We have evaluated CILP++ on seven ILP datasets, comparing results with Aleph and a well-known propositionalization method, RSD. The results show that CILP++ can achieve accuracy comparable to Aleph while generally being faster. BCP achieved a statistically significant improvement in accuracy in comparison with RSD when running with a neural network, but BCP and RSD perform similarly when running with C4.5. We have also extended CILP++ to include a statistical feature selection method, mRMR, with preliminary results indicating that a reduction of more than 90% of features can be achieved with a small loss of accuracy.
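A minimal Python sketch of the vectorisation idea, assuming the bottom clauses have already been built (for example by Progol or Aleph) and that their body literals are consistently variablized strings: every distinct body literal seen during training becomes one binary feature. The helper names and the toy clauses are hypothetical; this illustrates the mapping only, not the CILP++ code.

```python
from typing import Dict, List, Sequence, Tuple

# (head literal, body literals); literals are assumed to be variablized consistently.
BottomClause = Tuple[str, List[str]]


def build_feature_index(clauses: Sequence[BottomClause]) -> Dict[str, int]:
    """Give every distinct body literal seen in training its own feature position."""
    index: Dict[str, int] = {}
    for _head, body in clauses:
        for literal in body:
            if literal not in index:
                index[literal] = len(index)
    return index


def to_vector(clause: BottomClause, index: Dict[str, int]) -> List[int]:
    """Binary vector: 1 where the literal occurs in this example's bottom clause."""
    vector = [0] * len(index)
    for literal in clause[1]:
        if literal in index:              # literals never seen in training are ignored
            vector[index[literal]] = 1
    return vector


train = [
    ("active(A)", ["atom(A,B,c)", "bond(A,B,C)", "atom(A,C,o)"]),
    ("active(A)", ["atom(A,B,c)", "atom(A,C,n)"]),
]
index = build_feature_index(train)
print([to_vector(c, index) for c in train])   # [[1, 1, 1, 0], [1, 0, 0, 1]]
```

The resulting fixed-length vectors are what an attribute-value learner (a neural network or C4.5 in the comparison above) would consume; the class label would come from whether each example is positive or negative.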

12.
Structured machine learning: the next ten years   Total citations: 4, self-citations: 1, citations by others: 3
The field of inductive logic programming (ILP) has made steady progress, since the first ILP workshop in 1991, based on a balance of developments in theory, implementations and applications. More recently there has been an increased emphasis on Probabilistic ILP and the related fields of Statistical Relational Learning (SRL) and Structured Prediction. The goal of the current paper is to consider these emerging trends and chart out the strategic directions and open problems for the broader area of structured machine learning for the next 10 years.

13.

Inductive logic programming combines both machine learning and logic programming techniques. ILP uses first-order predicate logic restricted to Horn clauses as an underlying language. Thus, programs induced by an ILP system inherit the classical limitations of PROLOG programs. Constraint logic programming avoids some of the limitations of logic programming, and so ILP aims to induce programs that employ this paradigm. Current ILP systems that induce constrained logic programs extend systems based on the normal semantics of ILP. In this article we introduce IC-Log, a new system that induces constrained logic programs and relies on an extension of a nonmonotonic semantics-based system. We then present an application of IC-Log in the field of computer-aided publishing.

15.
Liu Zhou, Cheng Xuexian, Liu Yu. Microcomputer Development (微机发展), 2006, 16(11): 28-31
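Semantic Web data mining is data mining carried out in a Semantic Web environment, and it raises new questions for applied research on data mining techniques. Inductive logic programming is a research field formed at the intersection of machine learning and logic programming, and it provides powerful new technical support for knowledge engineering and other application areas of artificial intelligence. This paper analyzes the limitations of several commonly used data mining techniques when applied in a Semantic Web environment, proposes inductive logic programming (ILP) as a data mining technique suited to the Semantic Web, gives an algorithmic description of how to apply it, and verifies its feasibility with a concrete example.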

16.
Attribute-value based representations, standard in today's data mining systems, have limited expressiveness. Inductive Logic Programming provides an interesting alternative, particularly for learning from structured examples whose parts, each with its own attributes, are related to each other by means of first-order predicates. Several subsets of first-order logic (FOL) with different expressive power have been proposed in Inductive Logic Programming (ILP). The challenge lies in the fact that the more expressive the subset of FOL the learner works with, the more critical the dimensionality of the learning task. The Datalog language is expressive enough to represent realistic learning problems when data is given directly in a relational database, making it a suitable tool for data mining. Consequently, it is important to develop techniques that dynamically decrease the dimensionality of learning tasks expressed in Datalog, just as Feature Subset Selection (FSS) techniques do in attribute-value learning. The idea of re-using these techniques in ILP immediately runs into a problem, as ILP examples have variable size and do not share the same set of literals. We propose here the first paradigm that brings Feature Subset Selection to the level of ILP, in languages at least as expressive as Datalog. The main idea is to first perform a change of representation, which approximates the original relational problem by a multi-instance problem. The resulting representation is suitable for FSS techniques, which we adapted from attribute-value learning by taking into account some of the characteristics of the data due to the change of representation. We present the simple FSS proposed for the task, the requisite change of representation, and the entire method combining these two algorithms. The method acts as a filter that preprocesses the relational data prior to model building and outputs relational examples with empirically relevant literals. We discuss experiments in which the method was successfully applied to two real-world domains.

17.
Li Yanjuan, Guo Maozu. Computer Study (电脑学习), 2012, 2(3): 13-17, 22
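Inductive logic programming is a research field formed at the intersection of machine learning and logic programming. It overcomes two main limitations of traditional machine learning methods, namely the restricted knowledge representation and the restricted use of background knowledge, and has become a frontier research topic in machine learning. This paper first gives an overview of inductive logic programming systems in terms of their origins, definition, application areas and problem setting, then summarizes and analyzes the current state of research on inductive logic programming methods, and finally discusses further research directions in the field.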

18.
Rsim: simulating shared-memory multiprocessors with ILP processors   Total citations: 1, self-citations: 0, citations by others: 1
The early 1990s saw several announcements of commercial shared-memory systems using processors that aggressively exploited instruction-level parallelism (ILP), including the MIPS R10000, Hewlett-Packard PA8000, and Intel Pentium Pro. These processors could potentially reduce memory read stalls by overlapping read latency with other operations, possibly changing the nature of performance bottlenecks in the system. The authors' experience with Rsim demonstrates that modeling ILP features is important even in shared-memory multiprocessor systems. In particular, current simple processor-based approximations cannot model significant performance effects for applications exhibiting parallel read misses. Further, recent shared-memory designs, such as aggressive implementations of sequential consistency, use the aggressive ILP-enhancing features of modern processors that simple processor-based simulators do not model. As microprocessor systems become more complex, the availability of shared infrastructure source code is likely to become increasingly crucial. The authors plan to release a new Rsim version shortly that will include instruction caches, TLBs, multimedia extensions, simultaneous multithreading, Rabbit fast simulation mode, and ports to Linux platforms.

19.
To date, Inductive Logic Programming (ILP) systems have largely assumed that all data needed for learning have been provided at the onset of model construction. Increasingly, for application areas like telecommunications, astronomy, text processing, financial markets and biology, machine-generated data are being produced continuously and on a vast scale. We see at least four kinds of problems that this presents for ILP: (1) it may not be possible to store all of the data, even in secondary memory; (2) even if it were possible to store the data, it may be impractical to construct an acceptable model using partitioning techniques that repeatedly perform expensive coverage or subsumption tests on the data; (3) models constructed at some point may become less effective, or even invalid, as more data become available (exemplified by the “drift” problem when identifying concepts); and (4) the representation of the data instances may need to change as more data become available (a kind of “language drift” problem). In this paper, we investigate the adoption of a stream-based on-line learning approach to relational data. Specifically, we examine the representation of relational data in both an infinite-attribute setting and the usual fixed-attribute setting, and develop implementations that use ILP engines in combination with on-line model constructors. The behaviour of each program is investigated using a set of controlled experiments, and performance in practical settings is demonstrated by constructing complete theories for some of the largest biochemical datasets examined by ILP systems to date, including one with a million examples; to the best of our knowledge, this is the first time this has been empirically demonstrated with ILP on a real-world data set.

20.
The growth of machine-generated relational databases, both in the sciences and in industry, is rapidly outpacing our ability to extract useful information from them by manual means. This has brought into focus machine learning techniques like Inductive Logic Programming (ILP) that are able to extract human-comprehensible models for complex relational data. The price to pay is that ILP techniques are not efficient: they can be seen as performing a form of discrete optimisation, which is known to be computationally hard, and the complexity is usually some super-linear function of the number of examples. While little can be done to alter the theoretical bounds on the worst-case complexity of ILP systems, some practical gains may follow from the use of multiple processors. In this paper we survey the state of the art in parallel ILP. We implement several parallel algorithms and study their performance using some standard benchmarks. The principal findings of interest are these: (1) of the techniques investigated, the one that simply constructs models in parallel on each processor using a subset of the data and then combines the models into a single one yields the best results; and (2) sequential (approximate) ILP algorithms based on randomized searches have lower execution times than (exact) parallel algorithms, without sacrificing the quality of the solutions found. This is an extended version of the paper entitled Strategies to Parallelize ILP Systems, published in the Proceedings of the 15th International Conference on Inductive Logic Programming (ILP 2005), vol. 3625 of LNAI, pp. 136–153, Springer-Verlag.
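The strategy reported as best in finding (1) can be sketched in Python as follows: split the examples into one subset per worker, run a learner on each subset concurrently, and merge the resulting rule sets. The round-robin partitioning and the merge by set union are simplifying assumptions, not the paper's combination procedure, and `toy_learner` is a hypothetical stand-in for a real ILP engine.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Set, Tuple

Example = Tuple[str, str]                     # (label, feature): a toy representation
Learner = Callable[[Sequence[Example]], Set[str]]


def partition(data: Sequence[Example], k: int) -> List[Sequence[Example]]:
    """Round-robin split of the examples into k subsets, one per worker."""
    return [data[i::k] for i in range(k)]


def learn_in_parallel(data: Sequence[Example], learner: Learner, k: int) -> Set[str]:
    """Learn one model per subset concurrently, then merge the rule sets."""
    with ThreadPoolExecutor(max_workers=k) as pool:
        models = list(pool.map(learner, partition(data, k)))
    combined: Set[str] = set()
    for rules in models:
        combined |= rules                     # simplest possible merge: union of rules
    return combined


def toy_learner(examples: Sequence[Example]) -> Set[str]:
    """Hypothetical learner: one rule per feature seen on positives but never on negatives."""
    negative_features = {f for label, f in examples if label == "neg"}
    return {
        f"active(X) :- has(X, {f})"
        for label, f in examples
        if label == "pos" and f not in negative_features
    }


data = [("pos", "ring"), ("pos", "halogen"), ("neg", "chain"), ("neg", "methyl")]
print(sorted(learn_in_parallel(data, toy_learner, k=2)))
# ['active(X) :- has(X, halogen)', 'active(X) :- has(X, ring)']
```

Threads keep the sketch short; because of CPython's global interpreter lock, a CPU-bound learner written in Python would typically use `concurrent.futures.ProcessPoolExecutor` instead, with a module-level learner function so that it can be pickled. A real system would also need to check the merged rules against the full data, since a rule that is consistent on one subset may cover negatives held by another.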
