Related Articles
Found 20 related articles (search time: 31 ms)
1.
Relational rule learning algorithms are typically designed to construct classification and prediction rules. However, relational rule learning can also be adapted to subgroup discovery. This paper proposes a propositionalization approach to relational subgroup discovery, achieved by appropriately adapting rule learning and first-order feature construction. The proposed approach was successfully applied to standard ILP problems (East-West trains, King-Rook-King chess endgame, and mutagenicity prediction) and to two real-life problems (analysis of telephone calls and traffic accident analysis). Editors: Hendrik Blockeel, David Jensen and Stefan Kramer. An erratum to this article is available at .
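As a rough illustration of the propositionalization step the abstract refers to (the feature names, data layout, and toy train data below are illustrative assumptions, not the paper's actual representation), a first-order feature such as "the train has a closed car" can be evaluated against each relational example to produce a flat attribute-value table for a propositional rule learner:

```python
# Minimal sketch of propositionalization for the East-West trains task.
# Each train is a set of cars; each car is a dict of properties.
trains = {
    "east1": [{"shape": "rect", "roof": "closed", "wheels": 2}],
    "west1": [{"shape": "oval", "roof": "open", "wheels": 3},
              {"shape": "rect", "roof": "open", "wheels": 2}],
}

# First-order features: existentially quantified conditions over cars.
features = {
    "has_closed_car": lambda cars: any(c["roof"] == "closed" for c in cars),
    "has_3wheel_car": lambda cars: any(c["wheels"] == 3 for c in cars),
    "has_oval_car":   lambda cars: any(c["shape"] == "oval" for c in cars),
}

# Propositionalization: evaluate every feature on every example,
# yielding a boolean table that a propositional learner can consume.
table = {t: {f: fn(cars) for f, fn in features.items()}
         for t, cars in trains.items()}

for train, row in table.items():
    print(train, row)
```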

2.
We present a new margin-based approach to first-order rule learning. The approach addresses many of the prominent challenges in first-order rule learning, such as the computational complexity of optimization and capacity control. By optimizing the mean of the margin minus its variance, we obtain an algorithm that is linear in the number of examples, together with a handle for capacity control based on error bounds. A parameter of the optimization problem tunes how evenly the weights are spread among the rules. Moreover, the search strategy for including new rules can be adjusted flexibly to perform variants of propositionalization or relational learning. The implementation of the system includes plugins for logical queries, graphs, and mathematical terms. In extensive experiments we found that, at least on the most commonly used toxicological datasets, overfitting is hardly an issue. In another batch of experiments, a comparison with margin-based ILP approaches using kernels turns out in our favor. Finally, an experiment shows how many features propositionalization and relational learning approaches need to reach a given predictive performance. Editors: Stephen Muggleton, Ramon Otero, Simon Colton.
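A hedged reconstruction of the kind of objective the abstract sketches (the symbols and the trade-off parameter λ are our notation, not necessarily the paper's): with rules h_j, nonnegative rule weights w_j, and margins on m examples, the algorithm maximizes the mean margin penalized by its variance,

```latex
\max_{w \ge 0}\;\; \bar{\gamma} \;-\; \lambda \cdot \frac{1}{m}\sum_{i=1}^{m}\left(\gamma_i - \bar{\gamma}\right)^2,
\qquad \bar{\gamma} = \frac{1}{m}\sum_{i=1}^{m} \gamma_i,\qquad
\gamma_i = y_i \sum_j w_j\, h_j(x_i).
```

Both terms decompose over examples, which is consistent with the claimed linear dependence on the number of examples.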

3.
We describe an algorithm for constructing a set of tree-like conjunctive relational features by combining smaller conjunctive blocks. Unlike traditional level-wise approaches, which preserve the monotonicity of frequency, our block-wise approach preserves the monotonicity of feature reducibility and redundancy, properties that matter for propositionalization in the context of classification learning. With pruning based on these properties, the block-wise approach efficiently scales to features containing tens of first-order atoms, far beyond the reach of state-of-the-art propositionalization or inductive logic programming systems.
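A toy sketch of the block-wise idea (the block representation and the redundancy test below are deliberately simplified stand-ins for the paper's logical definitions): conjunctive blocks are combined pairwise, and a candidate is discarded as soon as it is redundant, relying on the monotonicity that makes redundancy, unlike frequency, safe to prune on as blocks grow:

```python
# Toy sketch of block-wise feature construction with redundancy pruning.
# Blocks are frozensets of first-order atoms (here plain strings).
from itertools import combinations

def redundant(feature, existing):
    # Simplified: redundant if an accepted feature has the same atoms.
    # (The real test is a logical reducibility/redundancy check.)
    return any(feature == f for f in existing)

def combine_blocks(blocks, max_size):
    accepted = list(blocks)
    frontier = list(blocks)
    while frontier:
        new = []
        for a, b in combinations(frontier, 2):
            cand = a | b                      # conjoin two blocks
            if len(cand) > max_size:
                continue
            # Monotonicity: if cand is redundant, so is every extension
            # of it, so the whole branch can be pruned here.
            if redundant(cand, accepted):
                continue
            accepted.append(cand)
            new.append(cand)
        frontier = new
    return accepted

blocks = [frozenset({"bond(X,Y)"}), frozenset({"atom(X,c)"}),
          frozenset({"atom(Y,o)"})]
print(combine_blocks(blocks, max_size=3))
```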

4.
Inductive logic programming (ILP) is an important branch of machine learning. Given a set of examples and relevant background knowledge, ILP studies how to construct logic programs consistent with them, where the logic programs consist of finite sets of first-order clauses. This paper describes ICCR, an algorithm that combines the strengths of several current ILP methods: it merges the top-down search strategy typified by FOIL with the bottom-up search strategy typified by GOLEM, and can invent new predicates and learn recursive logic programs as needed. Comparative experiments show that, given the same examples and background knowledge, ICCR learns target logic programs with higher accuracy than FOIL and GOLEM.

5.
Inductive Logic Programming (ILP) studies learning from examples within the framework of clausal logic. ILP has become a popular subject in the field of data mining due to its ability to discover patterns in relational domains. Several ILP-based concept discovery systems have been developed, employing various search strategies, heuristics, and language pattern limitations. LINUS, GOLEM, CIGOL, MIS, FOIL, PROGOL, ALEPH, and WARMR are well-known ILP-based systems. In this work, introductory information about ILP is given first; then the above-mentioned systems and an ILP-based concept discovery system called C2D are briefly described, and the fundamentals of their mechanisms are demonstrated on a running example. Finally, a set of experimental results on real-world problems is presented to evaluate and compare the performance of these systems.

6.

Inductive logic programming combines machine learning and logic programming techniques. ILP uses first-order predicate logic restricted to Horn clauses as its underlying language; thus, programs induced by an ILP system inherit the classical limitations of PROLOG programs. Constraint logic programming avoids some of the limitations of logic programming, so ILP aims to induce programs that employ this paradigm. Current ILP systems that induce constrained logic programs extend systems based on the normal semantics of ILP. In this article we introduce IC-Log, a new system that induces constrained logic programs and relies on an extension of a nonmonotonic-semantics-based system. We then present an application of IC-Log in the field of computer-aided publishing.

7.
This paper presents the Connectionist Inductive Learning and Logic Programming System (C-IL2P). C-IL2P is a new massively parallel computational model, based on a feedforward artificial neural network, that integrates inductive learning from examples and background knowledge with deductive learning from logic programming. Starting from background knowledge represented as a propositional logic program, a translation algorithm generates a neural network that can be trained with examples. The results obtained with this refined network can be explained by extracting a revised logic program from it. Moreover, the neural network computes the stable model of the logic program inserted into it as background knowledge, or learned from the examples, thus functioning as a parallel system for logic programming. We have successfully applied C-IL2P to two real-world problems of computational biology, specifically DNA sequence analyses. Comparisons with the results obtained by some of the main neural, symbolic, and hybrid inductive learning systems, using the same domain knowledge, show the effectiveness of C-IL2P.
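A minimal sketch of the kind of clause-to-network translation the abstract describes, using hard threshold units with unit weights as a simplification (C-IL2P's actual construction uses semi-linear neurons with derived weights and biases, which we do not reproduce here); clause bodies feed AND-like hidden units, heads are OR-like combinations, and iterating the network mimics the fixpoint computation:

```python
# Toy sketch of compiling a propositional logic program into a
# feedforward network of threshold units. Simplified relative to C-IL2P.
W = 1.0  # shared connection weight

def make_network(clauses):
    """clauses: list of (head, positive_body, negated_body)."""
    return [(head, pos, neg, W * (len(pos) - 0.5))
            for head, pos, neg in clauses]

def forward(net, state):
    out = {}
    for head, pos, neg, bias in net:
        act = (sum(W * state.get(a, 0) for a in pos)
               - sum(W * state.get(a, 0) for a in neg))
        fired = act > bias - 0.5 * W                 # AND over the body
        out[head] = out.get(head, False) or fired    # OR across clauses
    return out

# Background knowledge:  a :- b, c.   d :- a, not e.
program = [("a", ["b", "c"], []), ("d", ["a"], ["e"])]
net = make_network(program)

state = {"b": 1, "c": 1}
for _ in range(len(program)):            # iterate towards the fixpoint
    state.update({k: int(v) for k, v in forward(net, state).items()})
print(state)  # {'b': 1, 'c': 1, 'a': 1, 'd': 1}
```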

8.
Inductive logic programming (ILP) induces concepts from a set of positive examples, a set of negative examples, and background knowledge. ILP has been applied to tasks such as natural language processing, finite element mesh design, network mining, robotics, and drug discovery. These data sets usually contain numerical and multivalued categorical attributes; however, only a few relational learning systems are capable of handling them efficiently. In this paper, we present an evolutionary approach, called Grouping and Discretization for Enriching the Background Knowledge (GDEBaK), to deal with numerical and multivalued categorical attributes in ILP. The method uses evolutionary operators to create and test numerical splits and subsets of categorical values according to a fitness function. The best subintervals and subsets are added to the background knowledge before candidate hypotheses are constructed. We implemented GDEBaK embedded in Aleph and compared it to lazy discretization in Aleph and to discretization in the Top-down Induction of Logical Decision Trees (TILDE) system. The results show that our method improves accuracy and reduces the number of rules in most cases. Finally, we discuss these results and possible lines of future work.
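A very small sketch of the evolutionary flavor of the approach (the fitness function, operators, and data are illustrative assumptions; the real GDEBaK evolves sets of splits and categorical subsets and plugs them into Aleph's background knowledge): candidate numeric split points are mutated and selected by how well the induced intervals separate positive from negative examples:

```python
# Toy sketch: evolve one numeric split point by fitness-driven search.
import random

pos = [5.1, 6.3, 7.0, 6.8]   # attribute values of positive examples
neg = [1.2, 2.4, 3.1, 2.9]   # ... and of negative examples

def fitness(t):
    # How well does the test "value >= t" separate the classes?
    tp = sum(v >= t for v in pos)
    tn = sum(v < t for v in neg)
    return (tp + tn) / (len(pos) + len(neg))

random.seed(0)
population = [random.uniform(0, 8) for _ in range(10)]
for _ in range(30):                              # generations
    population.sort(key=fitness, reverse=True)
    parents = population[:5]                     # selection
    population = parents + [p + random.gauss(0, 0.5) for p in parents]
best = max(population, key=fitness)
print(f"split at {best:.2f}, fitness {fitness(best):.2f}")
```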

9.
Feature selection methods often improve the performance of attribute-value learning. We explore whether, in relational learning as well, examples in the form of clauses can be reduced in size to speed up learning without affecting the learned hypothesis. To this end, we introduce the notion of safe reduction: a safely reduced example cannot be distinguished from the original example under the given hypothesis language bias. We then consider the particular, rather permissive bias of bounded-treewidth clauses and show that under this hypothesis bias, examples of arbitrary treewidth can be reduced efficiently. We evaluate our approach on four data sets with the popular system Aleph and the state-of-the-art relational learner nFOIL. On all four data sets, our reduction makes learning faster in the case of nFOIL, achieving an order-of-magnitude speed-up on one of the data sets, and more accurate in the case of Aleph.

10.
Many domains in the field of Inductive Logic Programming (ILP) involve highly unbalanced data. A common way to measure performance in these domains is to use precision and recall instead of simple accuracy. The goal of our research is to find new approaches within ILP particularly suited to large, highly skewed domains. We propose Gleaner, a randomized search method that collects good clauses from a broad spectrum of points along the recall dimension of recall-precision curves and employs an "at least L of these K clauses" thresholding method to combine sets of selected clauses. Our research focuses on Multi-Slot Information Extraction (IE), a task that typically involves many more negative examples than positive ones. We cast this problem as a relational domain, using two large testbeds involving the extraction of important relations from the abstracts of biomedical journal articles. We compare Gleaner to ensembles of standard theories learned by Aleph, finding that Gleaner produces comparable test-set results in a fraction of the training time. Editor: Rui Camacho
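A small sketch of the "at least L of these K clauses" combination step (the clause representation and example encoding are assumptions; in Gleaner the clauses are learned first-order clauses gathered across recall levels): an example is classified positive when at least L of the K selected clauses fire on it:

```python
# Toy sketch of Gleaner-style "at least L of K clauses" thresholding.
def at_least_L_of_K(clauses, L):
    def classify(example):
        return sum(clause(example) for clause in clauses) >= L
    return classify

# Hypothetical clauses modeled as boolean tests over dict-shaped examples.
clauses = [
    lambda e: e["len"] > 100,
    lambda e: "protein" in e["words"],
    lambda e: e["caps"] >= 2,
]

classify = at_least_L_of_K(clauses, L=2)
print(classify({"len": 120, "words": {"protein"}, "caps": 0}))  # True
print(classify({"len": 50,  "words": set(),       "caps": 1}))  # False
```

Sweeping L from K down to 1 trades precision for recall, which is how a single clause set yields points along the recall-precision curve.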

11.
Attribute-value representations, standard in today's data mining systems, have limited expressiveness. Inductive Logic Programming (ILP) provides an interesting alternative, particularly for learning from structured examples whose parts, each with its own attributes, are related to each other by first-order predicates. Several subsets of first-order logic (FOL) with different expressive power have been proposed in ILP. The challenge lies in the fact that the more expressive the subset of FOL the learner works with, the more critical the dimensionality of the learning task becomes. The Datalog language is expressive enough to represent realistic learning problems when data is given directly in a relational database, making it a suitable tool for data mining. It is therefore important to develop techniques that dynamically decrease the dimensionality of learning tasks expressed in Datalog, just as Feature Subset Selection (FSS) techniques do in attribute-value learning. Reusing those techniques in ILP runs immediately into a problem, as ILP examples have variable size and do not share the same set of literals. We propose here the first paradigm that brings Feature Subset Selection to the level of ILP, in languages at least as expressive as Datalog. The main idea is to first perform a change of representation that approximates the original relational problem by a multi-instance problem. The resulting representation is suitable for FSS techniques, which we adapted from attribute-value learning by taking into account characteristics of the data introduced by the change of representation. We present the simple FSS method proposed for the task, the requisite change of representation, and the entire method combining these two algorithms. The method acts as a filter, preprocessing the relational data prior to model building and outputting relational examples with empirically relevant literals. We discuss experiments in which the method was successfully applied to two real-world domains.
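A compressed sketch of the two-stage idea (the relational data, the instance construction, and the relevance score below are all simplified assumptions, not the paper's actual transformation): each example's clause is approximated by a bag of instances, after which an ordinary filter-style feature scorer can be applied:

```python
# Toy sketch: approximate relational examples by multi-instance bags,
# then filter literal patterns by a simple class-relevance score.
from collections import Counter

# Each example: (label, set of ground literals).
examples = [
    (1, {"bond(a,b)", "atom(a,c)", "atom(b,o)"}),
    (1, {"bond(c,d)", "atom(c,c)", "atom(d,o)"}),
    (0, {"bond(e,f)", "atom(e,h)", "atom(f,h)"}),
]

def pattern(lit):
    # Abstract a ground literal into a shared "feature" pattern.
    name, args = lit.split("(")
    elem = args.rstrip(")").split(",")[-1]
    return f"{name}_{elem}" if name == "atom" else name

# Multi-instance view: each literal becomes one instance in its bag.
bags = [(y, [pattern(l) for l in lits]) for y, lits in examples]

def relevance(feat):
    # Filter score: |P(feature | pos) - P(feature | neg)|.
    def rate(label):
        hits = [feat in inst for y, inst in bags if y == label]
        return sum(hits) / max(len(hits), 1)
    return abs(rate(1) - rate(0))

feats = Counter(p for _, inst in bags for p in inst)
for f in feats:
    print(f, round(relevance(f), 2))   # "bond" scores 0: filtered out
```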

12.
A Survey of Inductive Logic Programming
Inductive logic programming is a research area formed at the intersection of machine learning and logic programming, and a frontier topic in machine learning. This paper first gives an overview of inductive logic programming systems from three perspectives: the problem background, the classification of system types, and the search for program clauses. It then reviews the development of inductive logic programming research in connection with related work in our laboratory, discusses several problems in the field that call for further study, and proposes new ideas for addressing them, before concluding with a summary intended to draw further attention to research in inductive logic programming.

13.
To address the limitations of current inductive logic programming (ILP) systems, which require sufficient training data and cannot exploit unlabeled data, this paper proposes relational tri-training (R-tri-training), an algorithm that learns first-order rules with the help of unlabeled data. It carries the idea of tri-training, a semi-supervised learning algorithm based on propositional representations, over to ILP systems based on first-order representations, studying how unlabeled examples can assist classifier training within the ILP framework. R-tri-training first initializes three different ILP systems from the labeled data and background knowledge, then iteratively refines the three classifiers with unlabeled examples: if two classifiers agree on the label of an unlabeled example, that example is, under certain conditions, given to the third classifier as a new training example. Experimental results on standard data sets show that R-tri-training effectively exploits unlabeled data to improve learning performance, and that it outperforms GILP (genetic inductive logic programming), nFOIL, kFOIL, and Aleph.
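As a rough illustration of the refinement loop just described (the learner interface `fit`/`predict` is a hypothetical stand-in for the three underlying ILP systems, and the acceptance condition is simplified relative to the actual algorithm):

```python
# Schematic sketch of a relational tri-training loop.
def tri_train(learners, labeled, unlabeled, rounds=5):
    # learners: three objects with .fit(examples) and .predict(x),
    # initialized differently from the labeled data.
    for lr in learners:
        lr.fit(labeled)
    for _ in range(rounds):
        changed = False
        for i, target in enumerate(learners):
            peers = [l for j, l in enumerate(learners) if j != i]
            newly = []
            for x in unlabeled:
                a, b = peers[0].predict(x), peers[1].predict(x)
                if a == b:                    # the two peers agree
                    newly.append((x, a))      # pseudo-labeled example
            if newly:                         # (acceptance test omitted)
                target.fit(labeled + newly)   # refine the third learner
                changed = True
        if not changed:
            break
    return learners
```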

14.
Inductive logic programming (ILP) is a sub-field of machine learning that provides an excellent framework for multi-relational data mining applications. The advantages of ILP have been successfully demonstrated on complex and relevant industrial and scientific problems. However, to produce valuable models, ILP systems often require long running times and large amounts of memory. In this paper we address fundamental issues that have a direct impact on the efficiency of ILP systems. Namely, we discuss how improvements in the indexing mechanisms of the underlying logic programming system benefit ILP performance. Furthermore, we propose novel data structures to reduce memory requirements and suggest a new lazy evaluation technique to search the hypothesis space more efficiently. These proposals have been implemented in the April ILP system and evaluated on several well-known data sets. The results show significant improvements in running time without compromising the accuracy of the models generated. Indeed, the combined techniques achieve speedups of several orders of magnitude on some data sets. Moreover, memory requirements are reduced on nearly half of the data sets.

15.
A Multistrategy Approach to Relational Knowledge Discovery in Databases
When learning from very large databases, reducing complexity is extremely important. Two extremes of making knowledge discovery in databases (KDD) feasible have been put forward. One extreme is to choose a very simple hypothesis language, thereby being capable of very fast learning on real-world databases. The opposite extreme is to select a small data set, thereby being able to learn very expressive (first-order logic) hypotheses. A multistrategy approach allows one to include most of these advantages and exclude most of the disadvantages: simpler learning algorithms detect hierarchies that are used to structure the hypothesis space for a more complex learning algorithm. The better structured the hypothesis space is, the better learning can prune away uninteresting or losing hypotheses, and the faster it becomes. We have combined inductive logic programming (ILP) directly with a relational database management system. The ILP algorithm is controlled in a model-driven way by the user and in a data-driven way by structures induced by three simple learning algorithms.

16.
17.
One of the obstacles to the wide use of first-order logic languages is the fact that relational inference is intractable in the worst case. This paper presents an any-time relational inference algorithm: it proceeds by stochastically sampling the inference search space, after this space has been judiciously restricted using strongly-typed, logic-like declarations. We present a relational learner, named STILL, that produces programs geared to stochastic inference, demonstrating the potential of this framework. STILL handles examples described as definite or constrained clauses, and again uses sampling-based heuristics to achieve any-time learning. Controlling both the construction and the exploitation of logic programs yields robust relational reasoning, where deductive biases are compensated for by inductive biases, and vice versa.
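A toy sketch of the stochastic, any-time flavor of such inference (the clause and example encodings, the sampler, and the budget handling are simplified assumptions): instead of exhaustively searching for a substitution that makes a clause cover an example, we sample candidate substitutions under a budget and answer with the best evidence found so far:

```python
# Toy sketch of any-time relational inference by sampling substitutions
# rather than exhaustive (worst-case intractable) search.
import random

def covers(clause, example, subst):
    # clause: atoms with variables "X","Y"; example: ground atoms.
    ground = {a.replace("X", subst["X"]).replace("Y", subst["Y"])
              for a in clause}
    return ground <= example

def sampled_coverage(clause, example, constants, budget=100):
    random.seed(0)
    for _ in range(budget):                  # any-time: stop at budget
        subst = {"X": random.choice(constants),
                 "Y": random.choice(constants)}
        if covers(clause, example, subst):
            return True                      # found a witness
    return False                             # "no", with bounded risk

clause = {"bond(X,Y)", "atom(X,c)"}
example = {"bond(a,b)", "atom(a,c)", "atom(b,o)"}
print(sampled_coverage(clause, example, constants=["a", "b"]))  # True
```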

18.
We propose a general framework to incorporate first-order logic (FOL) clauses, thought of as an abstract and partial representation of the environment, into kernel machines that learn within a semi-supervised scheme. We rely on a multi-task learning scheme in which each task is associated with a unary predicate defined on the feature space, while higher-level abstract representations consist of FOL clauses built from those predicates. We reuse the kernel machine mathematical apparatus to solve the problem as primal optimization of a function composed of the loss on the supervised examples, the regularization term, and a penalty term that enforces the real-valued constraints derived from the predicates. Unlike classic kernel machines, however, depending on the logic clauses, the overall function to be optimized is no longer convex. An important contribution is to show that while tackling the optimization with classic numerical schemes is likely to be hopeless, a stage-based learning scheme, in which we first learn from the supervised examples until convergence and then continue by enforcing the logic clauses, is a viable way to attack the problem. Promising experimental results are given on artificial learning tasks and on the automatic tagging of BibTeX entries, emphasizing the comparison with plain kernel machines.
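A hedged rendering of the overall objective the abstract describes (the notation is ours, not necessarily the paper's): for task functions f_1, ..., f_T realized by kernel machines, the supervised loss V and the regularizer are augmented with a penalty measuring how strongly the real-valued task outputs violate the FOL clauses, via some conversion φ_c of each clause c into a real-valued constraint,

```latex
\min_{f_1,\dots,f_T}\;
\sum_{t=1}^{T}\sum_{i \in \mathcal{L}_t} V\!\left(y_i, f_t(x_i)\right)
\;+\; \lambda_r \sum_{t=1}^{T} \lVert f_t \rVert_K^2
\;+\; \lambda_c \sum_{c \in \mathcal{C}} \sum_{x \in \mathcal{U}} \phi_c\!\left(f_1(x), \dots, f_T(x)\right)
```

The clause penalty can be evaluated on unlabeled points (hence the semi-supervised scheme) and is generally non-convex, which is what motivates the stage-based optimization.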

19.
Scaling Up Inductive Logic Programming by Learning from Interpretations
When comparing inductive logic programming (ILP) and attribute-value learning techniques, there is a trade-off between expressive power and efficiency: ILP techniques are typically more expressive but less efficient. As a result, the data sets handled by current ILP systems are small by the general standards of the data mining community. The main source of inefficiency is the assumption that several examples may be related to each other, so they cannot be handled independently. Within the learning-from-interpretations framework for ILP this assumption is unnecessary, which makes it possible to scale up existing ILP algorithms. In this paper we explain this learning setting in the context of relational databases. We relate the setting to propositional data mining and to the classical ILP setting, and show that learning from interpretations corresponds to learning from multiple relations, and thus extends the expressiveness of propositional learning while largely maintaining its efficiency (which is not the case in the classical ILP setting). As a case study, we present two alternative implementations of the ILP system TILDE (Top-down Induction of Logical DEcision trees): TILDE-classic, which loads all data into main memory, and TILDE-LDS, which loads the examples one by one. We compare the implementations experimentally, showing that TILDE-LDS can handle large data sets (on the order of 100,000 examples or 100 MB) and indeed scales linearly in the number of examples.
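A minimal sketch of the data access pattern that distinguishes the two implementations (the file format and the statistics gathered are illustrative assumptions): because examples are independent interpretations, a TILDE-LDS-style learner can stream them one at a time, keeping only aggregate counts in memory, so memory use does not grow with the number of examples:

```python
# Minimal sketch of the "load examples one by one" access pattern.
def stream_interpretations(path):
    # One independent interpretation per line of a (hypothetical) file:
    # a class label followed by the ground facts of that example.
    with open(path) as f:
        for line in f:
            label, *facts = line.split()
            yield label, set(facts)

def count_test(path, test_fact):
    # Sufficient statistics for one candidate split of a logical
    # decision tree: how often test_fact holds, per class label.
    counts = {}
    for label, facts in stream_interpretations(path):
        key = (label, test_fact in facts)
        counts[key] = counts.get(key, 0) + 1
    return counts
```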

20.