Similar Documents
20 similar documents found.
1.
Learning rules from incomplete training examples by rough sets
Machine learning can extract desired knowledge from existing training examples and ease the development bottleneck in building expert systems. Most learning approaches derive rules from complete data sets. If some attribute values are unknown in a data set, it is called incomplete. Learning from incomplete data sets is usually more difficult than learning from complete data sets. In the past, the rough-set theory was widely used in dealing with data classification problems. In this paper, we deal with the problem of producing a set of certain and possible rules from incomplete data sets based on rough sets. A new learning algorithm is proposed, which can simultaneously derive rules from incomplete data sets and estimate the missing values in the learning process. Unknown values are first assumed to be any possible values and are gradually refined according to the incomplete lower and upper approximations derived from the given training examples. The examples and the approximations then interact with each other to derive certain and possible rules and to estimate appropriate unknown values. The rules derived can then serve as knowledge concerning the incomplete data set.
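The certain/possible distinction above comes from the lower and upper approximations. A minimal sketch of how these can be computed for incomplete data under a tolerance relation (`None` stands for an unknown value; the toy table and function names are invented, and this is not the paper's refinement algorithm):

```python
# Toy sketch: lower/upper approximations under a tolerance relation.
# None marks an unknown attribute value and is treated as matching anything.
def tolerant(x, y):
    """Two objects are tolerant if all mutually known attributes agree."""
    return all(a is None or b is None or a == b for a, b in zip(x, y))

def approximations(objects, labels, target):
    lower, upper = set(), set()
    for i, x in enumerate(objects):
        # Tolerance class of x: indices of all objects tolerant with x.
        cls = {j for j, y in enumerate(objects) if tolerant(x, y)}
        if all(labels[j] == target for j in cls):
            lower.add(i)  # certainly in the target class
        if any(labels[j] == target for j in cls):
            upper.add(i)  # possibly in the target class
    return lower, upper

objects = [(1, 0), (1, None), (0, 1), (0, 1)]
labels = ["yes", "yes", "no", "yes"]
low, up = approximations(objects, labels, "yes")
```

Objects in the lower approximation yield certain rules for the target class; objects only in the upper approximation yield possible rules.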

2.
To date, Inductive Logic Programming (ILP) systems have largely assumed that all data needed for learning have been provided at the onset of model construction. Increasingly, for application areas like telecommunications, astronomy, text processing, financial markets and biology, machine-generated data are arriving continuously and on a vast scale. We see at least four kinds of problems that this presents for ILP: (1) it may not be possible to store all of the data, even in secondary memory; (2) even if it were possible to store the data, it may be impractical to construct an acceptable model using partitioning techniques that repeatedly perform expensive coverage or subsumption-tests on the data; (3) models constructed at some point may become less effective, or even invalid, as more data become available (exemplified by the “drift” problem when identifying concepts); and (4) the representation of the data instances may need to change as more data become available (a kind of “language drift” problem). In this paper, we investigate the adoption of a stream-based on-line learning approach to relational data. Specifically, we examine the representation of relational data in both an infinite-attribute setting, and in the usual fixed-attribute setting, and develop implementations that use ILP engines in combination with on-line model-constructors. The behaviour of each program is investigated using a set of controlled experiments, and performance in practical settings is demonstrated by constructing complete theories for some of the largest biochemical datasets examined by ILP systems to date, including one with a million examples; to the best of our knowledge, this is the first time this has been empirically demonstrated with ILP on a real-world data set.

3.
A Rough Set Method for Extracting Decision Rules from Incomplete Information Systems
Incompleteness of object information is the greatest obstacle to inductive learning from examples. Addressing incomplete information, this paper studies a rough-set method for extracting decision rules from incomplete information systems. Using a hierarchical decreasing-reduction algorithm, it analyzes and processes, through examples, information systems containing missing and imprecise data, extending the application domain of rough sets.

4.
Data mining, machine learning, and related fields all involve a data preprocessing step to eliminate errors, noise, inconsistencies, and missing values in the data. Among these tasks, filling in missing values is particularly challenging, because the quality of the imputation strongly affects the subsequent behavior of learning and mining algorithms. Existing filling algorithms, such as those based on rough sets or on nearest neighbors, can handle missing values to some extent. In contrast to these methods, this paper proposes an extended information-gain-based missing value imputation algorithm, which exploits the implicit relationships among the attributes of a data set to fill in missing data. Extensive experiments show that the proposed algorithm is effective.
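The idea of exploiting inter-attribute relationships for imputation can be sketched as follows (a simplified illustration, not the paper's algorithm: it picks the complete attribute with the highest information gain for the target column and fills each gap with the mode among rows agreeing on that attribute; all data and names are invented):

```python
from collections import Counter
from math import log2

def entropy(values):
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def info_gain(rows, predictor, target):
    # Information gain of `predictor` about `target`, over rows where
    # both values are known.
    pairs = [(r[predictor], r[target]) for r in rows
             if r[predictor] is not None and r[target] is not None]
    gain = entropy([t for _, t in pairs])
    for v in {p for p, _ in pairs}:
        sub = [t for p, t in pairs if p == v]
        gain -= len(sub) / len(pairs) * entropy(sub)
    return gain

def fill_column(rows, target, predictors):
    # Impute with the mode among rows agreeing on the best predictor.
    best = max(predictors, key=lambda a: info_gain(rows, a, target))
    for r in rows:
        if r[target] is None:
            peers = [s[target] for s in rows
                     if s[best] == r[best] and s[target] is not None]
            if peers:
                r[target] = Counter(peers).most_common(1)[0][0]

rows = [
    ["sunny", "hot", "no"],
    ["sunny", "hot", "no"],
    ["rain", "cool", "yes"],
    ["rain", None, "yes"],
]
fill_column(rows, target=1, predictors=[0, 2])
```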

5.
Data sets acquired in practice typically grow dynamically, and with the rapid development of data acquisition tools, new data often arrive group by group. For dynamic data sets containing missing data, this paper proposes a group-incremental rough feature selection algorithm based on rough set theory. A group-incremental formula for computing information entropy is first analyzed and proved; with information entropy as the measure of feature significance, a group-incremental feature selection algorithm is then designed that can efficiently handle dynamic data sets with missing data. Experimental results further demonstrate the feasibility and efficiency of the new algorithm.
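The core trick behind group-incremental entropy computation is that Shannon entropy depends on the data only through class counts, so a new group requires only a count update rather than a rescan of the whole data set (a sketch in my own notation, not the paper's exact formula):

```python
from collections import Counter
from math import log2

def entropy_from_counts(counts):
    # Shannon entropy computed from class counts alone.
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values() if c)

counts = Counter(["a", "a", "b"])      # initial data set
h0 = entropy_from_counts(counts)
counts.update(["b", "b", "c"])         # a whole group of new objects arrives
h1 = entropy_from_counts(counts)       # incremental: only counts were touched
h_full = entropy_from_counts(Counter(["a", "a", "b", "b", "b", "c"]))
```

`h1` equals `h_full`, the entropy recomputed from scratch over all six objects, while the incremental path never revisits the original three.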

6.
One problem which frequently surfaces when applying explanation-based learning (EBL) to imperfect theories is the multiple inconsistent explanation problem. The multiple inconsistent explanation problem occurs when a domain theory produces multiple explanations for a training instance, only some of which are correct. Domain theories which suffer from the multiple inconsistent explanation problem can occur in many different contexts, such as when some information is missing and must be assumed: since such assumptions can be incorrect, incorrect explanations can be constructed. This paper proposes an extension of explanation-based learning, called abductive explanation-based learning (A-EBL), which solves the multiple inconsistent explanation problem by using set covering techniques and negative examples to choose among the possible explanations of a training example. It is shown by formal analysis that A-EBL has convergence properties that are only logarithmically worse than EBL/TS, a formalization of a certain type of knowledge-level EBL; A-EBL is also proven to be computationally efficient, assuming that the domain theory is tractable. Finally, experimental results are reported on an application of A-EBL to learning correct rules for opening bids in the game of contract bridge, given examples and an imperfect domain theory.
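The set-covering step can be illustrated with a greedy sketch (a generic set-cover heuristic, not necessarily A-EBL's exact procedure; the candidate explanations and example IDs are invented):

```python
def choose_explanations(candidates, positives, negatives):
    """candidates: dict name -> set of examples that explanation covers."""
    chosen, uncovered = [], set(positives)
    while uncovered:
        # Only consistent candidates (covering no negative example) compete.
        scores = {name: len(cov & uncovered)
                  for name, cov in candidates.items()
                  if not cov & set(negatives)}
        if not scores or max(scores.values()) == 0:
            break  # no consistent candidate helps any further
        best = max(scores, key=scores.get)
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen

cands = {"e1": {1, 2}, "e2": {3, 4, 9}, "e3": {3}}
picked = choose_explanations(cands, positives={1, 2, 3}, negatives={9})
```

Here `e2` is rejected outright because it covers the negative example 9, and the greedy loop then covers all positives with `e1` and `e3`.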

7.
Most real-world data change dynamically, and as data grow, many features contain missing values. Feature selection for dynamically changing data sets with missing data is therefore an urgent problem. Based on rough set theory, this paper derives an update mechanism for the complementary information entropy of a data set with missing data as its dimensionality increases, and on that basis proposes a dimension-incremental feature selection algorithm for missing data. Experiments further verify the feasibility and efficiency of the algorithm.

8.
Attribute-value based representations, standard in today's data mining systems, have a limited expressiveness. Inductive Logic Programming provides an interesting alternative, particularly for learning from structured examples whose parts, each with its own attributes, are related to each other by means of first-order predicates. Several subsets of first-order logic (FOL) with different expressive power have been proposed in Inductive Logic Programming (ILP). The challenge lies in the fact that the more expressive the subset of FOL the learner works with, the more critical the dimensionality of the learning task. The Datalog language is expressive enough to represent realistic learning problems when data is given directly in a relational database, making it a suitable tool for data mining. Consequently, it is important to elaborate techniques that will dynamically decrease the dimensionality of learning tasks expressed in Datalog, just as Feature Subset Selection (FSS) techniques do it in attribute-value learning. The idea of re-using these techniques in ILP runs immediately into a problem as ILP examples have variable size and do not share the same set of literals. We propose here the first paradigm that brings Feature Subset Selection to the level of ILP, in languages at least as expressive as Datalog. The main idea is to first perform a change of representation, which approximates the original relational problem by a multi-instance problem. The representation obtained as the result is suitable for FSS techniques which we adapted from attribute-value learning by taking into account some of the characteristics of the data due to the change of representation. We present the simple FSS proposed for the task, the requisite change of representation, and the entire method combining those two algorithms. The method acts as a filter, preprocessing the relational data, prior to the model building, which outputs relational examples with empirically relevant literals. 
We discuss experiments in which the method was successfully applied to two real-world domains.

9.
A Rule Acquisition Algorithm Based on the Limited Asymmetric Similarity Relation Model
Applying rough set theory to incomplete information systems is one of the keys to moving the theory further toward practical use, yet classical rough set theory is helpless when faced with incomplete information systems. Building on an analysis of existing extended rough set models, this paper proposes a model based on the limited asymmetric similarity relation, extends the classical discernibility matrix, and defines a discernibility matrix under the limited asymmetric similarity relation. Using Boolean reasoning, rules are extracted directly from the incomplete information system without changing its original structure. Experimental results show that the decision rules obtained are concise and independent of missing values.

10.
Inductive logic programming (ILP) induces concepts from a set of positive examples, a set of negative examples, and background knowledge. ILP has been applied on tasks such as natural language processing, finite element mesh design, network mining, robotics, and drug discovery. These data sets usually contain numerical and multivalued categorical attributes; however, only a few relational learning systems are capable of handling them in an efficient way. In this paper, we present an evolutionary approach, called Grouping and Discretization for Enriching the Background Knowledge (GDEBaK), to deal with numerical and multivalued categorical attributes in ILP. This method uses evolutionary operators to create and test numerical splits and subsets of categorical values in accordance with a fitness function. The best subintervals and subsets are added to the background knowledge before constructing candidate hypotheses. We implemented GDEBaK embedded in Aleph and compared it to lazy discretization in Aleph and discretization in Top-down Induction of Logical Decision Trees (TILDE) systems. The results obtained showed that our method improves accuracy and reduces the number of rules in most cases. Finally, we discuss these results and possible lines for future work.

11.
Semantics-preserving dimensionality reduction refers to the problem of selecting those input features that are most predictive of a given outcome; a problem encountered in many areas such as machine learning, pattern recognition, and signal processing. This has found successful application in tasks that involve data sets containing huge numbers of features (in the order of tens of thousands), which would be impossible to process further. Recent examples include text processing and Web content classification. One of the many successful applications of rough set theory has been to this feature selection area. This paper reviews those techniques that preserve the underlying semantics of the data, using crisp and fuzzy rough set-based methodologies. Several approaches to feature selection based on rough set theory are experimentally compared. Additionally, a new area in feature selection, feature grouping, is highlighted and a rough set-based feature grouping technique is detailed.
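A common rough-set feature selection scheme of the kind reviewed here is a QuickReduct-style greedy search on the dependency degree. A minimal sketch on an invented decision table (illustrative only; the paper surveys several such techniques):

```python
from collections import defaultdict

def dependency(rows, labels, attrs):
    # Rough-set dependency degree gamma: fraction of objects whose
    # equivalence class (w.r.t. attrs) is decision-pure.
    blocks = defaultdict(list)
    for i, r in enumerate(rows):
        blocks[tuple(r[a] for a in attrs)].append(i)
    pos = sum(len(b) for b in blocks.values()
              if len({labels[i] for i in b}) == 1)
    return pos / len(rows)

def quick_reduct(rows, labels):
    # Greedily add the attribute that most raises gamma until the
    # reduct explains the decision as well as all attributes do.
    all_attrs = list(range(len(rows[0])))
    full = dependency(rows, labels, all_attrs)
    reduct, gamma = [], 0.0
    while gamma < full:
        best = max((a for a in all_attrs if a not in reduct),
                   key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
        gamma = dependency(rows, labels, reduct)
    return reduct

rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = ["n", "y", "n", "y"]
red = quick_reduct(rows, labels)
```

On this table attribute 1 alone determines the decision, so the reduct is `[1]` — the semantics-preserving part being that no information about the decision is lost by dropping the other attributes.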

12.
We address the problem of metric learning for multi-view data. Many metric learning algorithms have been proposed; most focus on the single-view setting, and only a few deal with multi-view data. In this paper, motivated by the co-training framework, we propose an algorithm-independent framework, named co-metric, to learn Mahalanobis metrics in multi-view settings. In its implementation, an off-the-shelf single-view metric learning algorithm is used to learn metrics in individual views from a few labeled examples. Then the most confidently-labeled examples chosen from the unlabeled set are used to guide the metric learning in the next loop. This procedure is repeated until some stop criteria are met. The framework can accommodate most existing metric learning algorithms, whether they use side information or example labels. In addition, it can naturally deal with semi-supervised circumstances under more than two views. Our comparative experiments demonstrate its competitiveness and effectiveness.
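The object learned in each view is a Mahalanobis metric d_M(x, y) = sqrt((x − y)^T M (x − y)) with M positive semi-definite. A minimal sketch of evaluating such a metric (the matrix M here is hand-picked for illustration, not produced by co-metric or any particular learner):

```python
from math import sqrt

def mahalanobis(x, y, M):
    # d_M(x, y) = sqrt((x - y)^T M (x - y))
    d = [xi - yi for xi, yi in zip(x, y)]
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return sqrt(sum(di * mi for di, mi in zip(d, Md)))

M = [[2.0, 0.0], [0.0, 0.5]]  # a hand-picked PSD matrix, for illustration
```

With M = I this reduces to Euclidean distance; a learned M instead reweights (and, via off-diagonal terms, correlates) the feature axes, which is what the co-training loop refines from confidently-labeled examples.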

13.
Inductive Logic Programming (ILP) is an important branch of machine learning. Given a set of examples and related background knowledge, ILP studies how to construct logic programs consistent with them, where the logic programs consist of finitely many first-order clauses. This paper describes ICCR, an algorithm that combines the strengths of several current ILP methods: it fuses the top-down search strategy represented by FOIL with the bottom-up search strategy represented by GOLEM, and it can invent new predicates and learn recursive logic programs as needed. Comparative experiments show that, given the same examples and background knowledge, ICCR learns target logic programs of higher accuracy than FOIL and GOLEM.

14.
To deal with imprecise and uncertain data in wireless sensor networks, a new approach that integrates information processing with rough set techniques is proposed, and a hierarchical intelligent information processing method based on a layered cluster structure is presented. An example and analysis of real-time forest fire monitoring with a wireless sensor network show that, in practical use, by performing intelligent data analysis such as knowledge reduction at three levels and mining practical decision rules, the method lets sensor nodes automatically acquire and transmit only an effective minimal data set, achieving a balance among intelligent information processing, energy consumption, and system performance.


16.
Rough set theory is an effective data mining tool, and covering rough set theory is an important part of it. This paper gives a method for updating a pair of covering approximation operators as data objects are added, and illustrates the effectiveness of the proposed update method with an example.

17.
By introducing the rough set theory into the support vector machine (SVM), a rough margin based SVM (RMSVM) is proposed to deal with the overfitting problem due to outliers. Similar to the classical SVM, the RMSVM searches for the separating hyper-plane that maximizes the rough margin, defined by the lower and upper margin. In this way, more data points are adaptively considered rather than the few extreme value points used in the classical SVM. In addition, different support vectors may have different effects on the learning of the separating hyper-plane depending on their positions in the rough margin. Points in the lower margin have more effects than those in the boundary of the rough margin. From experimental results on six benchmark datasets, the classification accuracy of this algorithm is improved without additional computational expense compared with the classical ν-SVM.

18.
Rough set theory is a mathematical tool for dealing with the uncertainty and vagueness of decision systems, and it has been applied successfully in many fields. It is used to identify reduct sets of the set of all attributes of a decision system. The reduct set serves as a preprocessing step for classifying the decision system, in order to bring out potential patterns, association rules, or knowledge through data mining techniques. Several researchers have contributed a variety of algorithms for computing reduct sets under different conditions, such as inconsistency, missing attribute values, and multiple decision attributes. This paper reviews techniques for dimensionality reduction in the rough set setting. Further, hybridizations of rough sets with fuzzy sets, neural networks, and metaheuristic algorithms are also reviewed. The performance of the algorithms is discussed in connection with classification.

19.
Classical rough set theory, based on the indiscernibility relation, cannot handle incomplete information systems. Addressing incomplete information, this paper studies a rough classification method based on incomplete information systems and, through examples, effectively analyzes and processes information systems containing missing and imprecise data.

20.
Rough Computation Based on the Tolerance Matrix
黄兵, 何新, 周献中. 《自动化学报》 (Acta Automatica Sinica), 2004, 30(3): 364-370
Classical rough set theory, based on equivalence relations, has made great progress, but in practice the equivalence requirement is too strict and can be relaxed to a tolerance relation. Rough computation methods have long been an important research topic within rough set theory. Building on tolerance relations, this paper introduces the concept of the tolerance matrix, establishes a one-to-one correspondence between tolerance relations and tolerance matrices, and characterizes a series of computations in rough analysis through matrix computation. Using the tolerance matrix, a heuristic attribute reduction algorithm for incomplete information systems is proposed, and its time complexity is analyzed. An example shows that the method is applicable and effective.
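The tolerance matrix itself is easy to illustrate (a toy sketch in my own notation, with `None` as an unknown value; this is not the paper's reduction algorithm): T[i][j] = 1 iff objects i and j agree on every attribute known to both.

```python
# Toy sketch: tolerance matrix of an incomplete information system.
# None marks an unknown value; T[i][j] = 1 iff objects i and j agree on
# every attribute that is known for both of them.
def tolerance_matrix(objects):
    n = len(objects)
    return [[1 if all(a is None or b is None or a == b
                      for a, b in zip(objects[i], objects[j])) else 0
             for j in range(n)] for i in range(n)]

objs = [(1, None), (1, 0), (None, 1)]
T = tolerance_matrix(objs)
```

The resulting matrix is reflexive and symmetric but, unlike the partition matrix of an equivalence relation, not necessarily transitive: here objects 0 and 1, and 0 and 2, are tolerant while 1 and 2 are not.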


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号