Similar literature
Found 20 similar documents; search took 46 ms.
1.
LEARNING IN RELATIONAL DATABASES: A ROUGH SET APPROACH   Cited by: 49 (self-citations: 0, citations by others: 49)
Knowledge discovery in databases, or data mining, is an important direction in the development of data and knowledge-based systems. Because of the huge amount of data stored in large numbers of existing databases, and because the amount of data generated in electronic forms is growing rapidly, it is necessary to develop efficient methods to extract knowledge from databases. An attribute-oriented rough set approach has been developed for knowledge discovery in databases. The method integrates the machine-learning paradigm, especially learning-from-examples techniques, with rough set techniques. An attribute-oriented concept tree ascension technique is first applied in generalization, which substantially reduces the computational complexity of database learning processes. Then the cause-effect relationship among the attributes in the database is analyzed using rough set techniques, and the unimportant or irrelevant attributes are eliminated. Thus concise and strong rules with little or no redundant information can be learned efficiently. Our study shows that attribute-oriented induction combined with rough set theory provides an efficient and effective mechanism for knowledge discovery in database systems.

2.
The development of efficient algorithms for learning from large relational databases is an important task in applicative machine learning. In this paper, we study knowledge discovery in relational databases and develop an attribute-oriented learning method which extracts generalization rules from relational databases. The method adopts the artificial intelligence "learning-from-examples" paradigm and applies in the learning process an attribute-oriented concept tree ascending technique which integrates database operations with the learning process and provides a simple and efficient way of learning from databases. The method learns both characteristic rules and classification rules of a learning concept: a characteristic rule characterizes the properties shared by all the facts of the class being learned, while a classification rule characterizes the properties that distinguish the class being learned from other classes. The learning result could be a conjunctive rule or a rule with a small number of disjuncts. Moreover, learning can be performed with databases containing noisy data and exceptional cases using database statistics. Our analysis of the algorithms shows that attribute-oriented induction substantially reduces the computational complexity of the database learning process.
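The concept tree ascension step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hierarchy `MAJOR_HIERARCHY`, the sample relation `students`, the `threshold` parameter, and the function names are all hypothetical, and the sketch lifts every value of one attribute by one level at a time before merging identical tuples.

```python
from collections import Counter

# Hypothetical concept hierarchy for one attribute: child -> parent.
MAJOR_HIERARCHY = {
    "physics": "science",
    "chemistry": "science",
    "history": "arts",
    "music": "arts",
    "science": "ANY",
    "arts": "ANY",
}

def ascend(value, hierarchy):
    """Replace a value by its parent concept; values without a parent stay put."""
    return hierarchy.get(value, value)

def generalize(rows, attr_index, hierarchy, threshold):
    """Climb the concept tree for one attribute until it has at most
    `threshold` distinct values, then merge identical tuples and count them."""
    rows = [tuple(r) for r in rows]
    while len({r[attr_index] for r in rows}) > threshold:
        lifted = [
            r[:attr_index] + (ascend(r[attr_index], hierarchy),) + r[attr_index + 1:]
            for r in rows
        ]
        if lifted == rows:  # hierarchy exhausted, nothing left to generalize
            break
        rows = lifted
    return Counter(rows)  # generalized tuple -> number of original tuples covered

# Hypothetical sample relation: (status, major).
students = [
    ("graduate", "physics"),
    ("graduate", "chemistry"),
    ("graduate", "history"),
    ("undergraduate", "music"),
]
generalized = generalize(students, attr_index=1, hierarchy=MAJOR_HIERARCHY, threshold=2)
```

The counts kept with each generalized tuple are one simple way to carry the database statistics that the abstract mentions for handling noise and exceptional cases.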

3.
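Attribute-oriented induction and conceptual clustering   Cited by: 2 (self-citations: 0, citations by others: 2)
Attribute-oriented induction is a recently proposed method that is widely used for knowledge discovery in databases. The paper points out the close connection between this method and a machine learning method, conceptual clustering, and describes how a conceptual clustering algorithm can be used to perform attribute-oriented induction.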

4.
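Attribute-oriented induction and conceptual clustering   Cited by: 3 (self-citations: 1, citations by others: 3)
伍小荣, 谢立宏. 《计算机工程》 (Computer Engineering), 2003, 29(5): 92-93, 123
Attribute-oriented induction is a recently proposed method that is widely used for knowledge discovery in databases. The article points out the close connection between this method and a machine learning method, conceptual clustering, and describes how a conceptual clustering algorithm can be used to perform attribute-oriented induction.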

5.
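The article combines two data mining techniques, decision tree induction and attribute-oriented induction, to build concept hierarchies and extract features from the detailed data in the construction-enterprise relational database of the "Kunming Construction Engineering Trading Center Information Management Computer Network System". The resulting decision trees and classification rules are easy to interpret and efficient. They give decision makers at project owners and construction regulators more intuitive and deeper information about construction enterprises, helping them select creditworthy enterprises that are capable of undertaking the tendered project. They also strengthen owners' ability to guard against construction risk and improve the management of construction enterprises.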

6.
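Spatial data mining is the extraction of implicit knowledge, spatial relationships, and other information stored in spatial databases. Spatial association rules are an important research area in spatial data mining, and using them to turn the data in a spatial database into knowledge is an effective approach. Building on an analysis of spatial association rules, the paper applies an association-rule-based progressive refinement mining algorithm to obtain the implicit knowledge in a spatial database and demonstrates the feasibility of the method with an example.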

7.
In this article we investigate an attribute-oriented induction approach for acquisition of abstract knowledge from data stored in a fuzzy database environment. We utilize a proximity-based fuzzy database schema as the medium carrying the original information, where lack of precise information about an entity can be reflected via multiple attribute values, and the classical equivalence relation is replaced with the broader fuzzy proximity relation. We analyze in detail the process of attribute-oriented induction by concept hierarchies, utilizing the original properties of fuzzy databases to support this established data mining technique. In our approach we take full advantage of the implicit knowledge about the similarity of original attribute values, included by default in the investigated fuzzy database schemas. © 2007 Wiley Periodicals, Inc. Int J Int Syst 22: 763–779, 2007.

8.
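Association-rule-based knowledge discovery in spatial data and its implementation   Cited by: 4 (self-citations: 0, citations by others: 4)
Spatial data mining is the extraction of implicit knowledge, spatial relationships, and other patterns stored in spatial databases. Spatial association rules are an important form of spatial data mining, and using them to turn the data in a spatial database into knowledge is an effective approach. Building on an analysis of spatial association rules, this paper applies an association-rule-based progressive refinement mining algorithm to obtain the knowledge in a spatial database and demonstrates the feasibility of the method with an example.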

9.
Many applications of knowledge discovery and data mining, such as rule discovery for semantic query optimization, database integration, and decision support, require the knowledge to be consistent with the data. However, databases usually change over time and make machine-discovered knowledge inconsistent. Useful knowledge should be robust against database changes so that it is unlikely to become inconsistent after database updates. This paper defines this notion of robustness in the context of relational databases and describes how robustness of first-order Horn-clause rules can be estimated. Experimental results show that our estimation approach can accurately identify robust rules. We also present a rule antecedent pruning algorithm that improves the robustness and applicability of machine-discovered rules to demonstrate the usefulness of robustness estimation.

10.
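In view of how attribute-oriented induction and rough set methods treat the granularity and continuity of knowledge, the two are combined: attribute-oriented induction is used to generalize the data, and the information gain of attributes is then used to find data dependencies among the generalized attributes, so that classification rules can be mined quickly from a data set. Applied to a classic simulation example, the method gives reasonable and reliable results.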
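As a rough illustration of the information gain measure mentioned in this abstract, the sketch below computes the standard entropy-based gain of one attribute with respect to a class attribute. The abstract does not give the authors' exact formulation, so this is the textbook definition; the sample `rows` and the attribute names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attr, target):
    """Reduction in entropy of `target` obtained by splitting `rows` on `attr`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder

# Hypothetical generalized relation: "region" determines "risk", "size" does not.
rows = [
    {"region": "north", "size": "large", "risk": "low"},
    {"region": "north", "size": "small", "risk": "low"},
    {"region": "south", "size": "large", "risk": "high"},
    {"region": "south", "size": "small", "risk": "high"},
]
gain_region = information_gain(rows, "region", "risk")
gain_size = information_gain(rows, "size", "risk")
```

In this toy relation the class depends entirely on `region`, so its gain equals the full class entropy, while `size` contributes nothing and could be dropped.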

11.
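Synonym-table-based mapping from heterogeneous databases to ontologies   Cited by: 1 (self-citations: 0, citations by others: 1)
An ontology is a description of information at the semantic and knowledge levels. Converting other existing forms of data into an ontology representation facilitates data sharing and semantic reasoning. Most data in today's computer systems is stored in relational databases, so research on building ontologies from relational databases is of great significance. This paper first analyzes the problem of mapping a single database to an ontology and extends the Ghawi rules to cover more types of data relationships. Then, for the problem of mapping databases that hold the same kind of content in heterogeneous structures, it extends the Ghawi rules further, gives a synonym-table-based rule set for mapping heterogeneous databases to ontologies, and implements the mapping.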

12.
Attribute-oriented induction (AOI) is a useful data mining method for extracting generalized knowledge from relational data and users' background knowledge. Concept hierarchies can be integrated with the AOI method to induce multi-level generalized knowledge. However, the existing AOI approaches are only capable of mining positive knowledge from databases; thus, rare but important negative generalized knowledge that is unknown, unexpected, or contradictory to what the user believes can be missed. In this study, we propose a global negative attribute-oriented induction (GNAOI) approach that can generate comprehensive and multiple-level negative generalized knowledge at the same time. Two pruning properties, the downward level closure property and the upward superset closure property, are employed to improve the efficiency of the algorithm, and a new interest measure, nim(cl), is exploited to measure the degree of the negative relation. Experimental results from a real-life dataset show that the proposed method is effective in finding global negative generalized knowledge.

13.
ESTIMATING DBLEARN'S POTENTIAL FOR KNOWLEDGE DISCOVERY IN DATABASES   Cited by: 1 (self-citations: 0, citations by others: 1)
We propose a procedure for estimating DBLEARN's potential for knowledge discovery, given a relational database and concept hierarchies. This procedure is most useful for evaluating alternative concept hierarchies for the same database. The DBLEARN knowledge discovery program uses an attribute-oriented inductive-inference method to discover potentially significant high-level relationships in a database. A concept forest, with at most one concept hierarchy for each attribute, defines the possible generalizations that DBLEARN can make for a database. The potential for discovery in a database is estimated by examining the complexity of the corresponding concept forest. Two heuristic measures are defined based on the number, depth, and height of the interior nodes. Higher values for these measures indicate more complex concept forests and arguably more potential for discovery. Experimental results using a variety of concept forests and four commercial databases show that in practice both measures permit quite reliable decisions to be made; thus, the simplest may be most appropriate.

14.

During the last decade, databases have been growing rapidly in size and number as a result of rapid advances in database capacity and management techniques. This expansive growth in data and databases has caused a pressing need for the development of more powerful techniques to convert the vast pool of data into valuable information. For the purposes of strategy and decision-making, many companies and researchers have recognized mining useful information and knowledge from large databases as a key research topic and as an opportunity for major revenues and improving competitiveness. In this paper, we will explore a new rule generation algorithm (based on rough sets theory) that can generate a minimal set of rule reducts, and a rule generation and rule induction program (RGRIP) which can efficiently induce decision rules from conflicting information systems. All the methods will also be illustrated with numerical examples.

15.
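印勇, 田逢春. 《计算机测量与控制》 (Computer Measurement & Control), 2002, 10(11): 759-761, 770
Rough set theory is used to analyze the cause-effect relationships among attributes in a relational database and to study a method for mining rules from relational databases. The simplification of condition attributes and the minimal-reduction strategy for rule extraction in this method are discussed in detail, and the corresponding algorithms are given. The work offers a new route to knowledge acquisition from databases.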

16.
A method for learning knowledge from a database is used to address the bottleneck of manual knowledge acquisition. An attempt is made to improve representation with the assistance of experts and from computer resident knowledge. The knowledge representation is described in the framework of a conceptual schema consisting of a semantic model and an event model. A concept classifies a domain into different subdomains. As a method of knowledge acquisition, inductive learning techniques are used for rule generation. The theory of rough sets is used in designing the learning algorithm. Examples of certain concepts are used to induce general specifications of the concepts called classification rules. The basic approach is to partition the information into equivalence classes and to derive conclusions based on equivalence relations. In a sense, what is involved is a data-reduction process, where the goal is to reduce a large database of information to a small number of rules describing the domain. This completely integrated approach includes user interface, semantics, constraints, representations of temporal events, induction, etc.
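The partition into equivalence classes that this abstract describes is the basic building block of rough set theory. The sketch below shows the standard lower and upper approximations of a concept; it is not the learning algorithm from the paper. The table `patients`, the concept `flu`, and the function names are hypothetical.

```python
from collections import defaultdict

def equivalence_classes(objects, attributes):
    """Group objects that are indiscernible on the chosen attributes."""
    classes = defaultdict(set)
    for name, values in objects.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(name)
    return list(classes.values())

def approximations(objects, attributes, concept):
    """Return (lower, upper) approximations of `concept`, a set of object names."""
    lower, upper = set(), set()
    for block in equivalence_classes(objects, attributes):
        if block <= concept:   # every member belongs to the concept: certain
            lower |= block
        if block & concept:    # at least one member belongs: possible
            upper |= block
    return lower, upper

# Hypothetical decision table; p1 and p2 are indiscernible.
patients = {
    "p1": {"fever": "yes", "cough": "yes"},
    "p2": {"fever": "yes", "cough": "yes"},
    "p3": {"fever": "no", "cough": "yes"},
    "p4": {"fever": "no", "cough": "no"},
}
flu = {"p1", "p3"}
lower, upper = approximations(patients, ["fever", "cough"], flu)
```

Objects in the lower approximation support certain rules, while those only in the upper approximation support possible rules, which is one way the large table can be reduced to a small rule set.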

17.
Resolving domain incompatibility among independently developed databases often involves uncertain information. DeMichiel (1989) showed that uncertain information can be generated by the mapping of conflicting attributes to a common domain, based on some domain knowledge. We show that uncertain information can also arise when the database integration process requires information not directly represented in the component databases, but can be obtained through some summary of data. We therefore propose an extended relational model based on Dempster-Shafer theory of evidence to incorporate such uncertain knowledge about the source databases. The extended relation uses evidence sets to represent uncertainty in information, which allow probabilities to be attached to subsets of possible domain values. We also develop a full set of extended relational operations over the extended relations. In particular, an extended union operation has been formalized to combine two extended relations using Dempster's rule of combination. The closure and boundedness properties of our proposed extended operations are formulated. We also illustrate the use of extended operations by some query examples.
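Dempster's rule of combination, which the extended union operation relies on, can be sketched for two evidence sets over the same attribute. This is the textbook rule only, not the paper's extended relational operators. The mass functions `m1` and `m2` and the domain values are hypothetical.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (frozenset of values -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    # Renormalize so the remaining masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# Hypothetical evidence about one attribute value from two source databases.
m1 = {frozenset({"chinese"}): 0.6, frozenset({"chinese", "thai"}): 0.4}
m2 = {frozenset({"thai"}): 0.5, frozenset({"chinese", "thai"}): 0.5}
merged = dempster_combine(m1, m2)
```

Here the conflicting mass is 0.6 × 0.5 = 0.3, so the surviving products 0.3, 0.2, and 0.2 are divided by 0.7, giving 3/7 for {chinese}, 2/7 for {thai}, and 2/7 for {chinese, thai}.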

18.

Traditional association-rule mining (ARM) considers only the frequency of items in a binary database, which provides insufficient knowledge for making efficient decisions and strategies. The mining of useful information from quantitative databases is not a trivial task compared to conventional algorithms in ARM. Fuzzy-set theory was invented to represent a more valuable form of knowledge for human reasoning, which can also be applied and utilized for quantitative databases. Many approaches have adopted fuzzy-set theory to transform the quantitative value into linguistic terms with its corresponding degree based on defined membership functions for the discovery of FFIs, also known as fuzzy frequent itemsets. Only linguistic terms with maximal scalar cardinality are considered in traditional fuzzy frequent itemset mining, but the uncertainty factor is not involved in past approaches. In this paper, an efficient fuzzy mining (EFM) algorithm is presented to quickly discover multiple FFIs from quantitative databases under type-2 fuzzy-set theory. A compressed fuzzy-list (CFL)-structure is developed to maintain complete information for rule generation. Two pruning techniques are developed for reducing the search space and speeding up the mining process. Several experiments are carried out to verify the efficiency and effectiveness of the designed approach in terms of runtime, the number of examined nodes, memory usage, and scalability under different minimum support thresholds and different linguistic terms used in the membership functions.


19.
The analysis of relationships in databases for rule derivation   Cited by: 2 (self-citations: 0, citations by others: 2)
Owing to the rapid growth in the sizes of databases, potentially useful information may be embedded in a large amount of data. Knowledge discovery is the search for semantic relationships which exist in large databases. One of the main problems for knowledge discovery is that the number of possible relationships can be very large, thus searching for interesting relationships and reducing the search complexity are important. The relationships can be represented as rules which can be used in efficient query processing. We present a technique to analyze relationships among attribute values and to derive a compact rule set. We also propose a mechanism and some heuristics to reduce the search complexity for the rule derivation process. An evaluation model is presented to evaluate the quality of the derived rules. Moreover, in the real world, databases may contain uncertain data. We also propose a technique to analyze the relationships among uncertain data and derive probabilistic rules.

20.
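After analyzing the state of the art and the shortcomings of methods for learning ontologies from relational databases, and considering that structured case bases contain richer semantic information and a large amount of reusable domain knowledge, the paper proposes a low-time-complexity algorithm that automatically maps a structured case base to an OWL ontology and describes the algorithm's workflow. Its advantage is that it not only captures the semantic information in the case base but also maps cases and rule knowledge items directly to the corresponding OWL individuals, thereby maximizing knowledge reuse. An application example confirms the effectiveness of the algorithm.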
