Similar Literature
20 similar records found.
1.
This paper describes a graphical user interface for DBLEARN, a database-oriented knowledge discovery system developed for extracting knowledge rules from relational databases. The interface, designed using a query-by-example approach, provides a graphical means of specifying knowledge-discovery tasks. The interface supplies a graphical browsing facility to help users perceive the nature of the target database structure. To guide users' task specification, a cooperative, menu-based guidance facility has been integrated into the interface. The interface also supplies a graphical interactive adjusting facility that helps users refine the task specification to improve the quality of the learned knowledge rules.

2.
Intelligent query answering by knowledge discovery techniques   Times cited: 3 (self-citations: 0, by others: 3)
Knowledge discovery facilitates querying database knowledge and intelligent query answering in database systems. We investigate the application of discovered knowledge, concept hierarchies, and knowledge discovery tools for intelligent query answering in database systems. A knowledge-rich data model is constructed to incorporate discovered knowledge and knowledge discovery tools. Queries are classified into data queries and knowledge queries. Both types of queries can be answered directly by simple retrieval, or intelligently by analyzing the intent of the query and providing generalized, neighborhood or associated information using stored or discovered knowledge. Techniques have been developed for intelligent query answering using discovered knowledge and/or knowledge discovery tools, which include generalization, data summarization, concept clustering, rule discovery, query rewriting, deduction, lazy evaluation, application of multiple-layered databases, etc. Our study shows that knowledge discovery substantially broadens the spectrum of intelligent query answering and may have deep implications for query answering in data- and knowledge-base systems.

3.
EDM: A general framework for Data Mining based on Evidence Theory   Times cited: 16 (self-citations: 0, by others: 16)
Data Mining or Knowledge Discovery in Databases [1, 15, 23] is currently one of the most exciting and challenging areas where database techniques are coupled with techniques from Artificial Intelligence and mathematical sub-disciplines to great potential advantage. It has been defined as the non-trivial extraction of implicit, previously unknown and potentially useful information from data. A lot of research effort is being directed towards building tools for discovering interesting patterns which are hidden below the surface in databases. However, most of the work being done in this field has been problem-specific, and no general framework has yet been proposed for Data Mining. In this paper we seek to remedy this by proposing EDM (Evidence-based Data Mining), a general framework for Data Mining based on Evidence Theory.

Having a general framework for Data Mining offers a number of advantages. It provides a common method for representing knowledge, which allows prior knowledge from the user, or knowledge discovered by another discovery process, to be incorporated into the discovery process. A common knowledge representation also supports the discovery of meta-knowledge from knowledge discovered by different Data Mining techniques. Furthermore, a general framework can provide facilities that are common to most discovery processes, e.g. incorporating domain knowledge and dealing with missing values.

The framework presented in this paper has the following additional advantages. The framework is inherently parallel, so algorithms developed within it will also be parallel and can therefore be expected to be efficient for large data sets. This is a necessity because most commercial data sets, relational or otherwise, are very large, and the need is compounded by the complexity of the algorithms. The parallelism within the framework also allows its use in parallel, distributed and heterogeneous databases. The framework is easily updated, and new discovery methods can be readily incorporated within it, making it ‘general’ in the functional sense in addition to the representational sense considered above. The framework provides an intuitive way of dealing with missing data during the discovery process using the concept of Ignorance borrowed from Evidence Theory.

The framework consists of a method for representing data and knowledge, and methods for data manipulation or knowledge discovery. We suggest an extension of the conventional definition of mass functions in Evidence Theory for use in Data Mining, as a means to represent evidence of the existence of rules in the database. The discovery process within EDM consists of a series of operations on the mass functions. Each operation is carried out by an EDM operator. We provide a classification for the EDM operators based on the discovery functions performed by them and discuss aspects of the induction, domain and combination operator classes.

The application of EDM to two separate Data Mining tasks is also addressed, highlighting the advantages of using a general framework for Data Mining in general and, in particular, using one that is based on Evidence Theory.
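How evidence is pooled in Evidence Theory can be illustrated with the standard Dempster rule of combination. This is a minimal sketch, not EDM itself: the paper extends the conventional definition of mass functions, and that extension and the EDM operator classes are not reproduced here. The frame of discernment (`HOLDS`/`FAILS`) and the mass values are hypothetical; mass assigned to the whole frame plays the role of Ignorance, for example from records with missing values.

```python
from itertools import product

# Hypothetical frame of discernment for one candidate rule.
HOLDS = frozenset({"holds"})
FAILS = frozenset({"fails"})
THETA = HOLDS | FAILS  # the whole frame: mass here expresses ignorance

def combine(m1, m2):
    """Dempster's rule: combine two mass functions given as {frozenset: mass}."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so the combined masses sum to 1 again.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# Evidence about the rule from two (hypothetical) partitions of a database.
m1 = {HOLDS: 0.6, THETA: 0.4}
m2 = {HOLDS: 0.5, FAILS: 0.2, THETA: 0.3}
m = combine(m1, m2)
```

Here 0.12 of the product mass is conflicting (HOLDS against FAILS), so after renormalisation `m[HOLDS]` is 0.68/0.88 (about 0.77) and the remaining ignorance `m[THETA]` shrinks to about 0.14. Because Dempster's rule is commutative and associative, evidence from separate partitions can be combined in any order, which is consistent with the parallelism the framework claims.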


4.
Summary discovery, one of the major components of knowledge discovery in databases, provides the user with comprehensive information for grasping the essence of a large amount of information in a database. In this paper, we propose an interactive top-down summary discovery process which utilizes fuzzy ISA hierarchies as domain knowledge. We define a generalized tuple as a representational form of a database summary including fuzzy concepts. Because fuzzy ISA hierarchies naturally express the fuzzy ISA relationships common in real domains, the discovery process produces more accurate database summaries. We also present an informativeness measure, based on Shannon's information theory, for distinguishing generalized tuples that deliver much information to users.

5.
Database summarization using fuzzy ISA hierarchies   Times cited: 3 (self-citations: 0, by others: 3)
Summary discovery, one of the major components of knowledge discovery in databases, provides the user with comprehensive information for grasping the essence of a large amount of information in a database. We propose an interactive top-down summary discovery process which utilizes fuzzy ISA hierarchies as domain knowledge. We define a generalized tuple as a representational form of a database summary including fuzzy concepts. Because fuzzy ISA hierarchies naturally express the fuzzy ISA relationships common in real domains, the discovery process produces more accurate database summaries. We also present an informativeness measure, based on C. Shannon's (1948) information theory, for distinguishing generalized tuples that deliver much information to users.

6.
This paper describes the nature of mathematical discovery (including concept definition and exploration, example generation, and theorem conjecture and proof), and considers how such an intelligent process can be simulated by a machine. Although the material is drawn primarily from graph theory, the results are immediately relevant to research in mathematical discovery and learning.

The thought experiment, a protocol paradigm for the empirical study of mathematical discovery, highlights behavioral objectives for machine simulation. This thought experiment provides an insightful account of the discovery process, motivates a framework for describing mathematical knowledge in terms of object classes, and is a rich source of advice on the design of a system to perform discovery in graph theory. The evaluation criteria for a discovery system, it is argued, must include both a set of behaviors to display (behavioral objectives) and a target set of facts to be discovered (factual knowledge).

Cues from the thought experiment are used to formulate two hierarchies of representational languages for graph theory. The first hierarchy is based on the superficial terminology and knowledge of the thought experiment. Generated by formal grammars with set-theoretic semantics, this eminently reasonable approach ultimately fails to meet the factual knowledge criteria. The second hierarchy uses declarative expressions, each of which has a semantic interpretation as a stylized, recursive algorithm that defines a class by generating it correctly and completely. A simple version of one such representation is validated by a successful, implemented system called Graph Theorist (GT) for mathematical research in graph theory. GT generates correct examples, defines and explores new graph theory properties, and conjectures and proves theorems.

Several themes run through this paper. The first is the dual goals, behavioral objectives and factual knowledge to be discovered, and the multiplicity of their demands on a discovery system. The second theme is the central role of object classes in knowledge representation. The third is the increased power and flexibility of a constructive (generator) definition over the more traditional predicate (tester) definition. The final theme is the importance of examples and recursion in mathematical knowledge. The results provide important guidance for further research in the simulation of mathematical discovery.

7.
A backtracking-free algorithm for rule-based attribute-oriented induction in databases   Times cited: 8 (self-citations: 0, by others: 8)
Zhou Shengbing, Zhang Bo, Cheng Dong. Journal of Software, 1999, 10(7): 673-678
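This paper proposes a backtracking-free algorithm for rule-based, attribute-oriented knowledge discovery. Background knowledge is treated as a special kind of logic program whose clauses are expanded into fully resolved clauses; then, according to the user's requirements, an appropriate level is defined and determined for each attribute. The multiple values of each attribute are generalized to values at that level in a single scan, so no backtracking is needed.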
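The single-scan generalization step can be sketched as ordinary attribute-oriented induction with precomputed ancestor chains. This is an assumption-laden illustration, not the paper's logic-program formulation: the hierarchies, attribute names, and rows are hypothetical, and the target level for each attribute is simply passed in rather than derived from user requirements. Unfolding each hierarchy once into ancestor chains is loosely analogous to expanding the background-knowledge clauses ahead of time, so the scan never has to search the hierarchy.

```python
from collections import Counter

# Hypothetical concept hierarchies: each value maps to its parent concept.
HIERARCHY = {
    "major": {"physics": "science", "chemistry": "science",
              "history": "arts", "music": "arts",
              "science": "ANY", "arts": "ANY"},
    "city": {"Vancouver": "BC", "Victoria": "BC", "Toronto": "Ontario",
             "BC": "Canada", "Ontario": "Canada", "Canada": "ANY"},
}

def ancestor_chains(parent):
    """Unfold an acyclic hierarchy once: value -> [value, parent, grandparent, ...]."""
    chains = {}
    for value in parent:
        chain = [value]
        while chain[-1] in parent:
            chain.append(parent[chain[-1]])
        chains[value] = chain
    return chains

def generalize(rows, attrs, levels):
    """Lift each attribute `levels[attr]` steps up its hierarchy in a single scan
    and merge identical generalized tuples, keeping a count for each."""
    chains = {a: ancestor_chains(HIERARCHY[a]) for a in attrs}
    counts = Counter()
    for row in rows:  # one pass over the data, no backtracking
        key = []
        for a in attrs:
            chain = chains[a].get(row[a], [row[a]])
            key.append(chain[min(levels[a], len(chain) - 1)])
        counts[tuple(key)] += 1
    return counts

ROWS = [
    {"major": "physics", "city": "Vancouver"},
    {"major": "chemistry", "city": "Victoria"},
    {"major": "music", "city": "Toronto"},
    {"major": "physics", "city": "Victoria"},
]
```

For example, `generalize(ROWS, ["major", "city"], {"major": 1, "city": 1})` merges the four rows into `("science", "BC")` with count 3 and `("arts", "Ontario")` with count 1.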

8.
High utility pattern (HUP) mining is one of the most important research issues in data mining. Although HUP mining extracts important knowledge from databases, it requires long calculations and multiple database scans, so it is often unsuitable for real-time data processing schemes such as data streams. Furthermore, many HUPs may be unimportant because of poor correlations among the items within them. Hence, the fast discovery of fewer but more important HUPs would be very useful in many practical domains. In this paper, we propose a novel framework that introduces a useful measure, called frequency affinity, among the items in a HUP, together with the concept of interesting HUPs with strong frequency affinity, for the fast discovery of more applicable knowledge. Moreover, we propose a new tree structure, utility tree based on frequency affinity (UTFA), and a novel algorithm, high utility interesting pattern mining (HUIPM), for single-pass mining of HUIPs from a database. Our approach mines fewer but more valuable HUPs, significantly reduces overall runtime compared with existing HUP mining algorithms, and is applicable to real-time data processing. Extensive performance analyses show that the proposed HUIPM algorithm is very efficient and scalable for interesting HUP mining with strong frequency affinity.
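The utility measure that HUP mining builds on can be shown with a brute-force sketch using the usual definitions: an item's utility in a transaction is its quantity times a unit profit, and a pattern's utility is summed over the transactions that contain all of its items. The profits and transactions below are hypothetical, and the paper's frequency-affinity measure, UTFA tree, and HUIPM algorithm are not reproduced, since the abstract does not define them.

```python
from itertools import combinations

# Hypothetical unit profits (external utility) and transactions
# mapping item -> purchased quantity (internal utility).
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 4}
TRANSACTIONS = [
    {"a": 1, "c": 10},
    {"b": 4, "c": 3, "d": 1},
    {"a": 2, "b": 1, "d": 2},
    {"a": 1, "d": 1},
]

def utility(itemset, transaction):
    """Utility of an itemset in one transaction (0 unless fully contained)."""
    if not all(i in transaction for i in itemset):
        return 0
    return sum(transaction[i] * PROFIT[i] for i in itemset)

def high_utility_patterns(min_util):
    """Enumerate every itemset; real HUP miners prune this search space."""
    items = sorted(PROFIT)
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            u = sum(utility(itemset, t) for t in TRANSACTIONS)
            if u >= min_util:
                result[itemset] = u
    return result
```

With `min_util = 20` the result is `{("a",): 20, ("a", "d"): 27, ("b", "d"): 22, ("a", "b", "d"): 20}`. Note that `b` alone has utility 10 yet `("b", "d")` qualifies: utility is not anti-monotone, so the Apriori-style pruning used for frequent itemsets does not carry over directly, which is part of why HUP mining is expensive.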

9.
A Multistrategy Approach to Relational Knowledge Discovery in Databases   Times cited: 1 (self-citations: 0, by others: 1)
When learning from very large databases, the reduction of complexity is extremely important. Two extremes of making knowledge discovery in databases (KDD) feasible have been put forward. One extreme is to choose a very simple hypothesis language, thereby being capable of very fast learning on real-world databases. The opposite extreme is to select a small data set, thereby being able to learn very expressive (first-order logic) hypotheses. A multistrategy approach allows one to include most of these advantages and exclude most of the disadvantages. Simpler learning algorithms detect hierarchies which are used to structure the hypothesis space for a more complex learning algorithm. The better structured the hypothesis space is, the better learning can prune away uninteresting or losing hypotheses and the faster it becomes. We have combined inductive logic programming (ILP) directly with a relational database management system. The ILP algorithm is controlled in a model-driven way by the user and in a data-driven way by structures that are induced by three simple learning algorithms.

10.
Learning in Relational Databases: A Rough Set Approach   Times cited: 49 (self-citations: 0, by others: 49)
Knowledge discovery in databases, or data mining, is an important direction in the development of data- and knowledge-based systems. Because of the huge amount of data stored in large numbers of existing databases, and because the amount of data generated in electronic form is growing rapidly, it is necessary to develop efficient methods to extract knowledge from databases. An attribute-oriented rough set approach has been developed for knowledge discovery in databases. The method integrates the machine-learning paradigm, especially learning-from-examples techniques, with rough set techniques. An attribute-oriented concept tree ascension technique is first applied in generalization, which substantially reduces the computational complexity of database learning processes. Then the cause-effect relationship among the attributes in the database is analyzed using rough set techniques, and the unimportant or irrelevant attributes are eliminated. Thus concise and strong rules with little or no redundant information can be learned efficiently. Our study shows that attribute-oriented induction combined with rough set theory provides an efficient and effective mechanism for knowledge discovery in database systems.
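The attribute-elimination step can be sketched with the classical rough-set dependency degree, gamma(C, D) = |POS_C(D)| / |U|, where POS_C(D) collects the objects whose condition-attribute class falls entirely inside one decision class. This is a minimal illustration that assumes the table has already been generalized; the student attributes and values are hypothetical, and the paper's exact significance measure and reduction order are not reproduced.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Equivalence classes (sets of row indices) of the indiscernibility relation on attrs."""
    classes = defaultdict(set)
    for idx, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(idx)
    return list(classes.values())

def dependency(rows, cond, dec):
    """gamma(cond, dec): share of rows whose cond-class lies inside one dec-class."""
    dec_classes = partition(rows, dec)
    positive = set()
    for block in partition(rows, cond):
        if any(block <= d for d in dec_classes):
            positive |= block
    return len(positive) / len(rows)

def dispensable(rows, cond, dec):
    """Condition attributes whose individual removal leaves gamma unchanged."""
    full = dependency(rows, cond, dec)
    return [a for a in cond
            if dependency(rows, [c for c in cond if c != a], dec) == full]

# Hypothetical generalized table: condition attributes -> decision attribute "award".
STUDENTS = [
    {"gpa": "high", "major": "science", "city": "BC", "award": "yes"},
    {"gpa": "high", "major": "arts", "city": "Ontario", "award": "yes"},
    {"gpa": "low", "major": "science", "city": "BC", "award": "no"},
    {"gpa": "low", "major": "arts", "city": "BC", "award": "no"},
    {"gpa": "medium", "major": "science", "city": "Ontario", "award": "yes"},
    {"gpa": "medium", "major": "arts", "city": "BC", "award": "no"},
]
COND, DEC = ["gpa", "major", "city"], ["award"]
```

On this table the three condition attributes determine `award` completely (gamma = 1.0), and dropping either `major` or `city` alone keeps gamma at 1.0, so each is individually dispensable. Dropping both, however, lowers gamma to 4/6, so attributes have to be eliminated jointly with care rather than all at once.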

11.
Large databases can be a source of useful knowledge. Yet this knowledge is implicit in the data. It must be mined and expressed in a concise, useful form of statistical patterns, equations, rules, conceptual hierarchies, and the like. Automation of knowledge discovery is important because databases are growing in size and number, and standard data analysis techniques are not designed for exploration of huge hypothesis spaces. We concentrate on discovery of regularities, defining a regularity by a pattern and the range in which that pattern holds. We argue that two types of patterns are particularly important: contingency tables and equations, and we present Forty-Niner (49er), a general-purpose database mining system which conducts large-scale search for those patterns in many subsets of data, conducting a more costly search for equations only when data indicate a functional relationship. 49er can refine the initial regularities to yield stronger and more general regularities and more useful concepts. 49er combines several searches, each contributing to a different aspect of a regularity. Correspondence between the components of search and the structure of regularities makes the system easy to understand, use, and expand. Finally, we discuss 49er's performance in four categories of tests: (1) open exploration of new databases; (2) reproduction of human findings (limited because databases which have been extensively explored are very rare); (3) hide-and-seek testing on artificially created data, to evaluate 49er on a large scale against known results; (4) exploration of randomly generated databases.
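A contingency-table regularity of the kind described above can be scored with Pearson's chi-square statistic and Cramér's V. This sketch assumes those two measures as the test of association; the abstract does not state 49er's exact evaluation criteria, thresholds, or the trigger for its equation search. The attribute names and rows are hypothetical.

```python
from collections import Counter

def chi_square(rows, x, y):
    """Pearson chi-square statistic for the x-by-y contingency table of `rows`."""
    n = len(rows)
    observed = Counter((r[x], r[y]) for r in rows)
    x_totals = Counter(r[x] for r in rows)
    y_totals = Counter(r[y] for r in rows)
    stat = 0.0
    for xv, xn in x_totals.items():
        for yv, yn in y_totals.items():
            expected = xn * yn / n  # cell count expected under independence
            stat += (observed[(xv, yv)] - expected) ** 2 / expected
    return stat

def cramers_v(rows, x, y):
    """Association strength in [0, 1]; 0 when either attribute is constant."""
    k = min(len({r[x] for r in rows}), len({r[y] for r in rows}))
    if k < 2:
        return 0.0
    return (chi_square(rows, x, y) / (len(rows) * (k - 1))) ** 0.5

# Hypothetical data: in DEPENDENT, x fully determines y; in INDEPENDENT it does not.
DEPENDENT = [{"x": "a", "y": "p"}, {"x": "a", "y": "p"},
             {"x": "b", "y": "q"}, {"x": "b", "y": "q"}]
INDEPENDENT = [{"x": "a", "y": "p"}, {"x": "a", "y": "q"},
               {"x": "b", "y": "p"}, {"x": "b", "y": "q"}]
```

For `DEPENDENT`, chi-square is 4.0 and Cramér's V is 1.0; for `INDEPENDENT` both are 0. A search in the spirit of 49er would evaluate such tables across many attribute pairs and data subsets.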

12.
Knowledge discovery from spatial databases using an exploratory inductive learning method   Times cited: 6 (self-citations: 0, by others: 6)
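Exploratory data analysis, attribute-oriented induction, and the rough set method are combined into EIL, a flexible, general-purpose exploratory inductive learning method that can discover several kinds of knowledge from spatial databases, including general knowledge, attribute dependencies, and classification knowledge. Several methods for generating concept hierarchies for spatial databases are also proposed and summarized for use in inductive learning. A mining experiment on province-level agricultural statistics for China demonstrates the feasibility and effectiveness of EIL.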

13.
Recent association-mining research has led to the development of techniques that allow the accommodation of concept hierarchies within the mining process. This extension results in the discovery of rules which associate not only groups of items but which are also influenced by the hierarchies within which an item may reside. Given this, there then arises a need for techniques whereby such hierarchical associations can be presented to the user. Current association rule visualisation techniques are limited, as they do not effectively incorporate or enable the visualisation of hierarchical semantics. This paper presents a review of current hierarchical and association visualisation techniques and introduces a novel technique for visualising hierarchical association rules.
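To show what a hierarchical association rule measures (rather than how the paper visualises it), here is a sketch in the style of generalized association mining: each transaction is extended with the ancestors of its items, after which rules may mix taxonomy levels. The taxonomy and baskets are hypothetical, and the visualisation technique the paper introduces is not reproduced.

```python
# Hypothetical item taxonomy: child -> parent.
PARENT = {"milk": "dairy", "cheese": "dairy", "bread": "bakery", "bagel": "bakery"}

def extend(transaction):
    """Add every ancestor of every item, so rules can mix hierarchy levels."""
    items = set(transaction)
    for item in transaction:
        node = item
        while node in PARENT:
            node = PARENT[node]
            items.add(node)
    return items

DB = [extend(t) for t in ({"milk", "bread"}, {"cheese", "bagel"}, {"milk"}, {"bread"})]

def count(itemset):
    """Number of extended transactions containing every item in `itemset`."""
    return sum(1 for t in DB if itemset <= t)

def support(itemset):
    return count(itemset) / len(DB)

def confidence(lhs, rhs):
    """Confidence of lhs -> rhs; assumes lhs occurs at least once."""
    return count(lhs | rhs) / count(lhs)
```

Here the leaf-level rule milk → bread has support 0.25, while its generalization dairy → bakery has support 0.5 and confidence 2/3; this cross-level information is what a hierarchy-aware visualisation has to convey.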

14.
Software component technology and knowledge discovery systems   Times cited: 3 (self-citations: 0, by others: 3)
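This paper introduces the concepts of component technology and knowledge discovery, and describes the characteristics of building knowledge discovery systems from components, much like stacking building blocks, on top of a software architecture. This approach achieves coarse-grained software reuse, greatly shortens the development cycle, lowers maintenance costs, and supports plug-and-play.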

15.
Concept-guided mining of association rules   Times cited: 4 (self-citations: 0, by others: 4)
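Association rules are an effective way to describe dependencies in data and an important topic in knowledge discovery research. Traditional association rule mining algorithms lack focus, are slow, produce results that are hard to understand, and generate huge numbers of rules that require extensive filtering to extract the useful ones. This paper proposes a method that incorporates concepts into the mining process to improve mining efficiency and focus, and presents CGARM, a concept-guided association rule mining algorithm, together with a method for interactively generating concepts in large databases. CGARM is an extension of classification-based mining algorithms. Experimental results show that the algorithm CGA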

16.
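Traditional representations of association rules cannot show the essential relationships between concepts, lack understanding at the conceptual level, and neglect the sharing of knowledge discovery results. A concept lattice, a data structure that vividly and concisely captures the generalization and specialization relationships between concepts, has distinctive advantages for visualizing association rules and discovering latent knowledge. This paper proposes an association rule visualization method based on concept lattices. Using concepts as the search unit, it finds in the lattice the paths of the association rules to be displayed, extending the associations between attributes to the conceptual level, and it gives corresponding strategies and algorithms for visualizing multi-pattern rules. Association rule analysis and visualization are carried out on book-borrowing records from a university library. Experimental results show that the visualization method performs well for knowledge discovery and sharing.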
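A concept lattice can be built from a small formal context by closing attribute sets, and rule confidence can then be read off concept extents. The sketch below uses a hypothetical borrowing context (readers by book category) and naive enumeration that is only practical for tiny contexts; it does not implement the paper's path-finding or multi-pattern visualization algorithms.

```python
from itertools import combinations

# Hypothetical formal context: reader -> set of borrowed book categories.
CONTEXT = {
    "r1": {"math", "physics"},
    "r2": {"math", "physics", "cs"},
    "r3": {"cs"},
    "r4": {"math", "cs"},
}
ATTRS = frozenset().union(*CONTEXT.values())

def extent(attrs):
    """Objects that have every attribute in `attrs`."""
    return frozenset(o for o, a in CONTEXT.items() if attrs <= a)

def intent(objs):
    """Attributes shared by every object in `objs` (all attributes if empty)."""
    shared = set(ATTRS)
    for o in objs:
        shared &= CONTEXT[o]
    return frozenset(shared)

def concepts():
    """Enumerate formal concepts (extent, intent) by closing every attribute subset."""
    found = set()
    for k in range(len(ATTRS) + 1):
        for combo in combinations(sorted(ATTRS), k):
            ext = extent(frozenset(combo))
            found.add((ext, intent(ext)))
    return found

def rule_confidence(lhs, rhs):
    """Confidence of lhs -> rhs, computed from the extents of the two attribute sets."""
    return len(extent(lhs | rhs)) / len(extent(lhs))
```

This context yields six concepts, for example ({r1, r2}, {math, physics}). Because {physics} and {math, physics} have the same extent, physics → math is an exact rule (confidence 1.0), while math → cs has confidence 2/3.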

17.
Automatic generation of concept hierarchies using WordNet   Times cited: 2 (self-citations: 1, by others: 1)
This paper examines and proposes the automatic generation of concept hierarchies using WordNet. Existing research has mostly explored the use of concept hierarchies but has not addressed the prohibitive cost of building large hierarchies manually. Several studies have examined the automatic generation of concept hierarchies for numerical data from a database; however, very little is known about the automatic generation of concept hierarchies for nominal data from a database, which is the subject of this paper. We propose the WordNet library method, which first disambiguates the senses of nominal data values, then constructs the concept hierarchy by overlapping the hypernyms of the remaining senses, and finally adjusts the resulting concept hierarchy to the preferences of users. The proposed method is tested on a faculty employment database of a university. Automatic generation of hierarchies turns out to save the effort of the experts or designers who build concept hierarchies, and builds the hierarchy more objectively than manual construction does.

18.
Many applications of knowledge discovery and data mining, such as rule discovery for semantic query optimization, database integration, and decision support, require the knowledge to be consistent with the data. However, databases usually change over time, which can make machine-discovered knowledge inconsistent. Useful knowledge should be robust against database changes so that it is unlikely to become inconsistent after database updates. This paper defines this notion of robustness in the context of relational databases and describes how the robustness of first-order Horn-clause rules can be estimated. Experimental results show that our estimation approach can accurately identify robust rules. To demonstrate the usefulness of robustness estimation, we also present a rule antecedent pruning algorithm that improves the robustness and applicability of machine-discovered rules.

19.
Research on knowledge discovery based on knowledge bases   Times cited: 1 (self-citations: 0, by others: 1)
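Traditional knowledge discovery is performed on databases, but the data stored in databases lack semantics, so the knowledge discovered is not comprehensive. This paper introduces ontology and mobile-agent technology into knowledge discovery: semantic information is first extracted from the data and stored in a knowledge base together with existing knowledge, and a new knowledge discovery system based on the knowledge base is then proposed, offering a new approach to knowledge discovery.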

20.
Consistency and Completeness in Rough Sets   Times cited: 4 (self-citations: 0, by others: 4)
Consistency and completeness are defined in the context of rough set theory and shown to be related to the lower approximation and upper approximation, respectively. A member of a composed set (union of elementary sets) that is consistent with respect to a concept surely belongs to the concept. An element that is not a member of a composed set that is complete with respect to a concept surely does not belong to the concept. A consistent rule and a complete rule are useful in addition to any other rules learnt to describe a concept. When an element satisfies the consistent rule, it surely belongs to the concept, and when it does not satisfy the complete rule, it surely does not belong to the concept. In other cases, the other learnt rules are used. The results in the finite universe are extended to the infinite universe, thus introducing a rough set model for the learning-from-examples paradigm. The results in this paper have application in knowledge discovery or learning from database environments that are inconsistent but at the same time demand accurate and definite knowledge. This study of consistency and completeness in rough sets also lays the foundation for related work at the intersection of rough set theory and inductive logic programming.
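The link between consistency/completeness and the two approximations can be checked on a tiny universe: the lower approximation gathers the elementary sets wholly inside the concept, and the upper approximation gathers those that touch it. The objects and attribute values below are hypothetical; this is the standard Pawlak construction for a finite universe, not the paper's extension to infinite universes.

```python
from collections import defaultdict

# Hypothetical universe: object -> attribute values.
OBJECTS = {
    "o1": {"color": "red", "size": "big"},
    "o2": {"color": "red", "size": "big"},
    "o3": {"color": "red", "size": "small"},
    "o4": {"color": "blue", "size": "small"},
    "o5": {"color": "blue", "size": "big"},
}
CONCEPT = {"o1", "o3"}  # o1 and o2 are indiscernible but disagree on the concept

def elementary_sets(objects, attrs):
    """Group objects that share identical values on `attrs`."""
    blocks = defaultdict(set)
    for name, desc in objects.items():
        blocks[tuple(desc[a] for a in attrs)].add(name)
    return list(blocks.values())

def approximations(objects, attrs, concept):
    """Return (lower, upper) approximations of `concept`."""
    lower, upper = set(), set()
    for block in elementary_sets(objects, attrs):
        if block <= concept:  # every member belongs: consistent w.r.t. the concept
            lower |= block
        if block & concept:   # some member belongs: needed for completeness
            upper |= block
    return lower, upper

LOWER, UPPER = approximations(OBJECTS, ["color", "size"], CONCEPT)
```

Here `o1` and `o2` share a description but only `o1` is in the concept, so the lower approximation is {o3} and the upper is {o1, o2, o3}. A consistent rule is therefore "color = red and size = small implies membership", and a complete rule is "color = red": any object that is not red (o4, o5) surely lies outside the concept, while o1 and o2 must be decided by other learnt rules.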
