Similar documents
1.
Abstract-driven pattern discovery in databases   Total citations: 6, self-citations: 0, citations by others: 6
The problem of discovering interesting patterns in large volumes of data is studied. Patterns can be expressed not only in terms of the database schema but also in user-defined terms, such as relational views and classification hierarchies. The user-defined terminology is stored in a data dictionary that maps it into the language of the database schema. A pattern is defined as a deductive rule expressed in user-defined terms that has a degree of uncertainty associated with it. Methods are presented for discovering interesting patterns based on abstracts, which are summaries of the data expressed in the language of the user.

2.
This article introduces the idea of using nonmonotonic inheritance networks for the storage and maintenance of knowledge discovered in data (revisable knowledge discovery in databases). While existing data mining strategies for knowledge discovery in databases typically involve initial structuring through the use of identification trees and the subsequent extraction of rules from these trees for use in rule-based expert systems, such strategies have difficulty in coping with additional information which may conflict with that already used for the automatic generation of rules. In the worst case, the entire automatic sequence may have to be repeated. If nonmonotonic inheritance networks are used instead of rules for storing knowledge discovered in databases, additional conflicting information can be inserted directly into such structures, thereby bypassing the need for recompilation. © 1996 John Wiley & Sons, Inc.

3.
An approach to knowledge discovery in complex molecular databases is described. The machine learning paradigm used is structured concept formation, in which objects described in terms of components and their interrelationships are clustered and organized in a knowledge base. Symbolic images are used to represent classes of structured objects. A discovered molecular knowledge base is successfully used in the interpretation of a high-resolution electron density map.

4.
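Association rules are an important research topic in the field of knowledge discovery in databases (KDD). Fuzzy association rules can express human knowledge in natural language and have attracted widespread attention from KDD researchers in recent years. However, most current methods for discovering fuzzy association rules still rely on the support and confidence measures commonly used in classical association rule discovery. In fact, fuzzy association rules admit different interpretations, and the choice of interpretation strongly affects the rule discovery method. Starting from a logical point of view, this paper defines fuzzy logic rules, support, degree of implication, and related concepts, and proposes a fuzzy logic rule discovery algorithm. The algorithm combines fuzzy logic concepts with the Apriori algorithm to discover fuzzy logic rules from a given quantitative database.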
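To make the measures named in this abstract concrete, here is a minimal sketch of fuzzy support, fuzzy confidence, and one possible degree of implication. The abstract does not give the paper's definitions, so everything here is an assumption: the min t-norm for conjunction, the Łukasiewicz implication `min(1, 1 - a + b)`, and the function names are illustrative choices only.

```python
from typing import Dict, List

# A record maps each fuzzy item (e.g. "age=young") to a membership degree in [0, 1].
Record = Dict[str, float]


def fuzzy_support(records: List[Record], items: List[str]) -> float:
    """Mean over records of the minimum membership across `items` (min t-norm, assumed)."""
    if not records:
        return 0.0
    return sum(min(r.get(i, 0.0) for i in items) for r in records) / len(records)


def fuzzy_confidence(records: List[Record], antecedent: List[str], consequent: List[str]) -> float:
    """Classical-style confidence: support(antecedent and consequent) / support(antecedent)."""
    base = fuzzy_support(records, antecedent)
    if base == 0.0:
        return 0.0
    return fuzzy_support(records, antecedent + consequent) / base


def implication_degree(records: List[Record], antecedent: List[str], consequent: List[str]) -> float:
    """Mean Lukasiewicz implication min(1, 1 - a + b); an assumed logical reading of the rule."""
    if not records:
        return 0.0
    total = 0.0
    for r in records:
        a = min(r.get(i, 0.0) for i in antecedent)
        b = min(r.get(i, 0.0) for i in consequent)
        total += min(1.0, 1.0 - a + b)
    return total / len(records)
```

The two readings can disagree on the same data: a record whose antecedent membership is zero contributes nothing to confidence but fully satisfies the logical implication, which is one way the choice of interpretation changes which rules are found.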

5.
Systems for knowledge discovery in databases   Total citations: 7, self-citations: 0, citations by others: 7
Knowledge-discovery systems face challenging problems from real-world databases, which tend to be dynamic, incomplete, redundant, noisy, sparse, and very large. These problems are addressed and some techniques for handling them are described. A model of an idealized knowledge-discovery system is presented as a reference for studying and designing new systems. This model is used in the comparison of three systems: CoverStory, EXPLORA, and the Knowledge Discovery Workbench. The deficiencies of existing systems relative to the model reveal several open problems for future research.

6.
Knowledge discovery in databases using lattices   Total citations: 3, self-citations: 0, citations by others: 3
The rapid pace at which data gathering, storage and distribution technologies are developing is outpacing our advances in techniques for helping humans to analyse, understand, and digest the vast amounts of resulting data. This has led to the birth of knowledge discovery in databases (KDD) and data mining, a process whose goal is to selectively extract knowledge from data. A range of techniques, including neural networks, rule-based systems, case-based reasoning, machine learning, and statistics, can be applied to the problem. We discuss the use of concept lattices to determine dependences in the data mining process. We first define concept lattices, after which we show how they represent knowledge and how they are formed from raw data. Finally, we show how the lattice-based technique addresses different processes in KDD, especially visualization and navigation of discovered knowledge.
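As an illustration of how a concept lattice can be formed from raw data, the sketch below enumerates the formal concepts (extent, intent pairs) of a small object-attribute table by brute force. It is a standard textbook construction under stated assumptions, not the procedure from the paper: the function name and the input layout (a dict from object to attribute set) are hypothetical, and the subset enumeration is exponential, so it only suits toy inputs.

```python
from itertools import combinations


def formal_concepts(context):
    """Return all formal concepts of `context` (dict: object -> set of attributes).

    Each concept is a pair (extent, intent) of frozensets: the intent is the set of
    attributes shared by every object in the extent, and the extent is the set of
    all objects that have every attribute in the intent.
    """
    objects = list(context)
    all_attrs = set().union(*context.values()) if context else set()
    concepts = set()
    for k in range(len(objects) + 1):
        for group in combinations(objects, k):
            # Attributes common to every object in the group (all attributes if empty).
            intent = set(all_attrs)
            for obj in group:
                intent &= context[obj]
            # Close the group: every object that carries the whole intent.
            extent = {obj for obj in objects if intent <= context[obj]}
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts
```

Ordering the resulting concepts by inclusion of their extents gives the lattice itself; dependences between attributes can then be read off from which intents always occur together.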

7.
Knowledge discovery in time series databases   Total citations: 13, self-citations: 0, citations by others: 13
Adding the dimension of time to databases produces time series databases (TSDB) and introduces new aspects and difficulties to data mining and knowledge discovery. In this correspondence, we introduce a general methodology for knowledge discovery in TSDB. The process of knowledge discovery in TSDB includes cleaning and filtering of time series data, identifying the most important predicting attributes, and extracting a set of association rules that can be used to predict the time series behavior in the future. Our method is based on signal processing techniques and the information-theoretic fuzzy approach to knowledge discovery. The computational theory of perception (CTP) is used to reduce the set of extracted rules by fuzzification and aggregation. We demonstrate our approach on two types of time series: stock-market data and weather data.

8.
The proliferation of large masses of data has created many new opportunities for those working in science, engineering and business. The field of data mining (DM) and knowledge discovery from databases (KDD) has emerged as a new discipline in engineering and computer science. In the modern sense of DM and KDD the focus tends to be on extracting information characterized as knowledge from data that can be very complex and in large quantities. Industrial engineering, with the diverse areas it comprises, presents unique opportunities for the application of DM and KDD, and for the development of new concepts and techniques in this field. Many industrial processes are now automated and computerized in order to ensure the quality of production and to minimize production costs. A computerized process records large masses of data during its functioning. This real-time data, which is recorded to ensure the ability to trace production steps, can also be used to optimize the process itself. A French truck manufacturer decided to exploit the data sets of measures recorded during the test of diesel engines manufactured on their production lines. The goal was to discover knowledge in the data of the test engine process in order to significantly reduce (by about 25%) the processing time. This paper presents the study of knowledge discovery utilizing the KDD method. All the steps of the method have been used and two additional steps have been needed. The study allowed us to develop two systems: the discovery application is implemented giving a real-time prediction model (with a real reduction of 28%) and the discovery support environment now allows those who are not experts in statistics to extract their own knowledge for other processes.

9.
Efficient discovery of interesting statements in databases   Total citations: 3, self-citations: 0, citations by others: 3
The Explora system supports Discovery in Databases by large scale search for interesting instances of statistical patterns. In this paper we describe how Explora assesses interestingness and achieves computational efficiency. These problems arise because of the variety of patterns and the immense combinatorial possibilities of generating instances when studying relations between variables in subsets of data. First, the user must be saved from getting overwhelmed with a deluge of findings. To restrict the search with respect to the analysis goals, the user can focus each discovery task performed during an interactive and iterative exploration process. Some basic organization principles of search can further limit the search effort. One principle is to organize search hierarchically and to evaluate first the statistical or information theoretic evidence of the general hypotheses. Then more special hypotheses can be eliminated from further search, if a more general hypothesis was already verified. But this approach alone has some drawbacks and even in moderately sized data does not prevent large sets of findings. Therefore, in a second evaluation phase, further aspects of interestingness are assessed. A refinement strategy selects the most interesting of the statistically significant statements. A second problem for discovery systems is efficiency. Each hypothesis evaluation requires many data accesses. We describe strategies that reduce data accesses and speed up computation.

10.
This paper describes a graphical user-interface for database-oriented knowledge discovery systems, DBLEARN, which has been developed for extracting knowledge rules from relational databases. The interface, designed using a query-by-example approach, provides a graphical means of specifying knowledge-discovery tasks. The interface supplies a graphical browsing facility to help users to perceive the nature of the target database structure. In order to guide users' task specification, a cooperative, menu-based guidance facility has been integrated into the interface. The interface also supplies a graphical interactive adjusting facility for helping users to refine the task specification to improve the quality of learned knowledge rules.

11.
The Web has profoundly reshaped our vision of information management and processing, highlighting the power of a collaborative model of information production and consumption. This new vision influences the Knowledge Discovery in Databases domain as well. In this paper we propose a service-oriented, semantic-supported approach to the development of a platform for sharing and reuse of resources (data processing and mining techniques), enabling the management of different implementations of the same technique and characterized by a community-centered attitude, with functionalities for both resource production and consumption, facilitating end-users with different skills as well as resource providers with different technical and domain specific capabilities. We first describe the semantic framework underlying the approach, then we demonstrate how this framework is exploited to give different functionalities to users through the presentation of the platform functionalities.

12.
Data-driven discovery of quantitative rules in relational databases   Total citations: 9, self-citations: 0, citations by others: 9
A quantitative rule is a rule associated with quantitative information which assesses the representativeness of the rule in the database. An efficient induction method is developed for learning quantitative rules in relational databases. With the assistance of knowledge about concept hierarchies, data relevance, and expected rule forms, attribute-oriented induction can be performed on the database, which integrates database operations with the learning process and provides a simple, efficient way of learning quantitative rules from large databases. The method involves the learning of both characteristic rules and classification rules. Quantitative information facilitates quantitative reasoning, incremental learning, and learning in the presence of noise. Moreover, learning qualitative rules can be treated as a special case of learning quantitative rules. It is shown that attribute-oriented induction provides an efficient and effective mechanism for learning various kinds of knowledge rules from relational databases.
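The core move of attribute-oriented induction, as described in this abstract, can be sketched in a few lines: replace attribute values by their parent concepts from a hierarchy, merge the tuples that become identical, and keep a count as the quantitative information. The sketch below is a simplified illustration under stated assumptions, not the published algorithm: it climbs only one hierarchy level, applies no generalization thresholds, and the function name and input layout are hypothetical.

```python
from collections import Counter


def attribute_oriented_induction(rows, hierarchies):
    """Generalize each row one level up and merge identical generalized tuples.

    rows: list of dicts (attribute -> value).
    hierarchies: dict attribute -> {value: parent concept}; values without an
    entry are kept unchanged.
    Returns a list of (generalized_row, count, fraction_of_rows), most frequent first.
    """
    counts = Counter()
    for row in rows:
        generalized = tuple(sorted(
            (attr, hierarchies.get(attr, {}).get(value, value))
            for attr, value in row.items()
        ))
        counts[generalized] += 1
    total = sum(counts.values())
    return [(dict(key), n, n / total) for key, n in counts.most_common()]
```

The fraction attached to each generalized tuple plays the role of the quantitative information: a tuple covering 75% of the rows is a far more representative characteristic than one covering 25%.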

13.
The search for similar subsequences is a core module for various analytical tasks in sequence databases. Typically, the similarity computations require users to set a length. However, there is no robust means by which to define the proper length for different application needs. In this study, we examine a new query that is capable of returning the longest-lasting highly correlated subsequences in a sequence database, which is particularly helpful to analyses without prior knowledge regarding the query length. A baseline, yet expensive, solution is to calculate the correlations for every possible subsequence length. To boost performance, we study a space-constrained index that provides a tight correlation bound for subsequences of similar lengths and offset by intraobject and interobject grouping techniques. To the best of our knowledge, this is the first index to support a normalized distance metric of arbitrary length subsequences. In addition, we study the use of a smart cache for disk-resident data (e.g., millions of sequence objects) and a graphics processing unit-based parallel processing technique for frequently updated data (e.g., nonindexable streaming sequences) to compute the longest-lasting highly correlated subsequences. Extensive experimental evaluation on both real and synthetic sequence datasets verifies the efficiency and effectiveness of our proposed methods.

14.

In this editorial we briefly discuss past research on discovering first-order-logic patterns in databases. After a short introduction on the fields of Knowledge Discovery in Databases (KDD) and Inductive Logic Programming (ILP), we highlight some important areas of current research in the intersection of these two areas. Our goal is to provide readers that are not experts in these areas with a minimal background that will help them put the four contributions in the remainder of this special issue into the proper context.

15.
Distributed databases allow us to integrate data from different sources which have not previously been combined. The Dempster–Shafer theory of evidence and evidential reasoning are particularly suited to the integration of distributed databases. Evidential functions are suited to represent evidence from different sources. Evidential reasoning is carried out by the well-known orthogonal sum. Previous work has defined linguistic summaries to discover knowledge by using fuzzy set theory and using evidence theory to define summaries. In this paper we study linguistic summaries and their applications to knowledge discovery in distributed databases. © 2000 John Wiley & Sons, Inc.
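The orthogonal sum mentioned in this abstract is Dempster's rule of combination. The following minimal sketch shows the standard rule for two mass functions over the same frame of discernment; the function name and the representation of focal elements as `frozenset` keys are illustrative assumptions, and nothing here is taken from the paper's own implementation.

```python
def orthogonal_sum(m1, m2):
    """Combine two mass functions with Dempster's rule.

    m1, m2: dicts mapping focal elements (frozensets of hypotheses) to masses
    that each sum to 1. Mass assigned to an empty intersection is treated as
    conflict and removed by renormalization.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the orthogonal sum is undefined")
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}
```

For example, if one source puts mass 0.5 on `{x}` and 0.5 on `{x, y}`, and another puts 0.5 on `{y}` and 0.5 on `{x, y}`, a quarter of the product mass is conflicting and the remaining three focal elements each end up with mass 1/3.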

16.
We describe the Knowledge Discovery Workbench, an interactive system for database exploration. We then illustrate KDW capabilities in data clustering, summarization, classification, and discovery of changes. We also examine extracting dependencies from data and using them to order the multitude of data patterns. © 1992 John Wiley & Sons, Inc.

17.
Foreign keys form one of the most fundamental constraints for relational databases. Since they are not always defined in existing databases, the discovery of foreign keys turns out to be an important and challenging task. The underlying problem is known to be the inclusion dependency (IND) inference problem. In this paper, data-mining algorithms are devised for IND inference in a given database. We propose a two-step approach. In the first step, unary INDs are discovered thanks to a new preprocessing stage which leads to a new algorithm and to an efficient implementation. In the second step, n-ary IND inference is achieved. This step fits in the framework of levelwise algorithms used in many data-mining algorithms. Since real-world databases can suffer from some data inconsistencies, approximate INDs, i.e. INDs which almost hold, are considered. We show how they can be safely integrated into our unary and n-ary discovery algorithms. An implementation of these algorithms has been achieved and tested against both synthetic and real-life databases. To our knowledge, no other algorithm exists to solve this data-mining problem.
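To illustrate what a unary inclusion dependency is, the sketch below checks every ordered pair of columns and reports those where the values of one column are (almost) contained in the other. This is a naive quadratic baseline, not the preprocessing-based algorithm the abstract refers to. The function name, the input layout, and the error measure (fraction of distinct left-hand values missing from the right-hand column) are all assumptions made for illustration.

```python
def unary_inds(tables, max_error=0.0):
    """Find unary inclusion dependencies lhs ⊆ rhs between columns.

    tables: dict table name -> dict column name -> list of values.
    max_error: tolerated fraction of distinct lhs values absent from rhs;
    0.0 gives exact INDs, larger values give approximate INDs.
    Returns a list of ((table, column), (table, column)) pairs.
    """
    # Distinct non-null values per column.
    columns = {
        (table, column): {v for v in values if v is not None}
        for table, cols in tables.items()
        for column, values in cols.items()
    }
    found = []
    for lhs, lhs_values in columns.items():
        if not lhs_values:
            continue  # an empty column is trivially included everywhere; skip it
        for rhs, rhs_values in columns.items():
            if lhs == rhs:
                continue
            missing = len(lhs_values - rhs_values) / len(lhs_values)
            if missing <= max_error:
                found.append((lhs, rhs))
    return found
```

An exact IND such as `orders.customer_id ⊆ customers.id` is a foreign-key candidate; raising `max_error` lets the check tolerate a few dangling values of the kind that inconsistent real-world data tends to contain.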

18.
A framework for knowledge discovery and evolution in databases   Total citations: 8, self-citations: 0, citations by others: 8
A concept for knowledge discovery and evolution in databases is described. The key issues include: using a database query to discover new rules; using not only positive examples (answer to a query), but also negative examples to discover new rules; and harmonizing existing rules with the new rules. A tool for characterizing the exceptions in databases and evolving knowledge as a database evolves is developed.

19.

Whereas topology optimization has achieved immense success, it involves an intrinsic difficulty. That is, optimized structures obtained by topology optimization strongly depend on the settings of the objective and constraint functions, i.e., the formulation. Nevertheless, the appropriate formulation is not usually obvious when considering structural design problems. Although trial-and-error to determine appropriate formulations is implicitly performed in several studies on topology optimization, it is important to explicitly support the process of trial-and-error. Therefore, in this study, we propose a new framework for topology optimization to determine appropriate formulations. The basic idea of this framework is incorporating knowledge discovery in databases (KDD) and topology optimization. Thus, we construct a database by collecting various and numerous material distributions that are obtained by solving various structural design problems with topology optimization, and find useful knowledge with respect to appropriate formulations from the database on the basis of KDD. An issue must be resolved when realizing the above idea, namely the material distribution in the design domain of a data record must be converted to conform to the design domain of the target design problem wherein an appropriate formulation should be determined. For this purpose, we also propose a material distribution-converting method termed design domain mapping (DDM). Several numerical examples are used to demonstrate that the proposed framework including DDM successfully and explicitly supports the process of trial-and-error to determine the appropriate formulation.

20.
With the increased acceptance of electronic health records, we can observe the increasing interest in the application of data mining approaches within this field. This study introduces a novel approach for exploring and comparing temporal trends within different in-patient subgroups, which is based on association rule mining using the Apriori algorithm and linear model-based recursive partitioning. The Nationwide Inpatient Sample (NIS), Healthcare Cost and Utilization Project (HCUP), Agency for Healthcare Research and Quality was used to evaluate the proposed approach. This study presents a novel approach where visual analytics on big data is used for trend discovery in the form of a regression tree with scatter plots in the leaves of the tree. The trend lines are used for directly comparing linear trends within a specified time frame. Our results demonstrate the existence of opposite trends in relation to age- and sex-based subgroups that would be impossible to discover using traditional trend-tracking techniques. Such an approach can be employed regarding decision support applications for policy makers when organizing campaigns or by hospital management for observing trends that cannot be directly discovered using traditional analytical techniques.
