Similar Documents
20 similar documents found
1.
Knowledge discovery in databases using lattices
The rapid pace at which data gathering, storage and distribution technologies are developing is outpacing our advances in techniques for helping humans to analyse, understand, and digest the vast amounts of resulting data. This has led to the birth of knowledge discovery in databases (KDD) and data mining, a process whose goal is to selectively extract knowledge from data. A range of techniques, including neural networks, rule-based systems, case-based reasoning, machine learning, and statistics, can be applied to the problem. We discuss the use of concept lattices to determine dependencies in the data mining process. We first define concept lattices, after which we show how they represent knowledge and how they are formed from raw data. Finally, we show how the lattice-based technique addresses different processes in KDD, especially visualization and navigation of discovered knowledge.
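A minimal sketch (not taken from the paper) of how formal concepts, the nodes of a concept lattice, can be derived from a binary object-attribute context; the toy context and helper names are hypothetical.

    from itertools import combinations

    # Toy object-attribute context (hypothetical data, not from the paper).
    context = {
        "frog":   {"amphibian", "can_swim"},
        "trout":  {"fish", "can_swim"},
        "canary": {"bird", "can_fly"},
    }

    def objects_with(attrs):
        """Objects possessing every attribute in attrs (the extent operator)."""
        return {o for o, a in context.items() if attrs <= a}

    def common_attrs(objs):
        """Attributes shared by every object in objs (the intent operator)."""
        if not objs:
            return set.union(*context.values())
        return set.intersection(*(context[o] for o in objs))

    # Enumerate formal concepts naively: a concept is an (extent, intent) pair
    # that is closed under the two derivation operators above.
    all_objects = list(context)
    concepts = set()
    for r in range(len(all_objects) + 1):
        for objs in combinations(all_objects, r):
            extent = objects_with(common_attrs(set(objs)))
            intent = common_attrs(extent)
            concepts.add((frozenset(extent), frozenset(intent)))

    for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
        print(set(extent) or "{}", "->", set(intent) or "{}")

Ordering the resulting concepts by extent inclusion yields the lattice used for visualization and navigation.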

2.
An approach to knowledge discovery in complex molecular databases is described. The machine learning paradigm used is structured concept formation, in which objects described in terms of components and their interrelationships are clustered and organized in a knowledge base. Symbolic images are used to represent classes of structured objects. A discovered molecular knowledge base is successfully used in the interpretation of a high-resolution electron density map.

3.
Distributed databases allow us to integrate data from different sources which have not previously been combined. In this article, we are concerned with the situation where the data sources are held in a distributed database. Integration of the data is then accomplished using the Dempster–Shafer representation of evidence. The weighted sum operator is developed and shown to provide an appropriate mechanism for the integration of such data. This representation is particularly suited to statistical samples which may include missing values and be held at different levels of aggregation. Missing values are incorporated into the representation to provide lower and upper probabilities for propositions of interest. The weighted sum operator facilitates combination of samples with different classification schemes. Such a capability is particularly useful for knowledge discovery when we are searching for rules within the concept hierarchy, defined in terms of probabilities or associations. By integrating information from different sources, we may thus be able to induce new rules or strengthen rules which have already been obtained. We develop a framework for describing such rules and show how we may then integrate rules at a high level without having to resort to the raw data, a useful facility for knowledge discovery where efficiency is of the essence. © 1997 John Wiley & Sons, Inc.
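The weighted sum operator itself is specific to the article and is not reproduced here; as a hedged illustration of the underlying representation, the sketch below computes lower (belief) and upper (plausibility) probabilities from a mass function in which missing values contribute mass to broader sets. The frame and mass values are invented.

    # Frame of discernment: possible classes of a sampled record (hypothetical).
    frame = frozenset({"low", "medium", "high"})

    # A mass function: records with missing values assign mass to non-singleton
    # sets rather than forcing a guess (values are made up for illustration).
    mass = {
        frozenset({"low"}): 0.5,
        frozenset({"medium", "high"}): 0.3,   # value missing: only "not low" known
        frame: 0.2,                           # completely unknown share of the sample
    }

    def belief(hypothesis):
        """Lower probability: mass committed to subsets of the hypothesis."""
        return sum(m for s, m in mass.items() if s <= hypothesis)

    def plausibility(hypothesis):
        """Upper probability: mass not contradicting the hypothesis."""
        return sum(m for s, m in mass.items() if s & hypothesis)

    h = frozenset({"medium", "high"})
    print("belief:", belief(h), "plausibility:", plausibility(h))
    # belief 0.3 <= true probability of h <= plausibility 0.5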

4.
Knowledge discovery in time series databases
Adding the dimension of time to databases produces time series databases (TSDB) and introduces new aspects and difficulties to data mining and knowledge discovery. In this correspondence, we introduce a general methodology for knowledge discovery in TSDB. The process of knowledge discovery in TSDB includes cleaning and filtering of time series data, identifying the most important predicting attributes, and extracting a set of association rules that can be used to predict the time series behavior in the future. Our method is based on signal processing techniques and the information-theoretic fuzzy approach to knowledge discovery. The computational theory of perception (CTP) is used to reduce the set of extracted rules by fuzzification and aggregation. We demonstrate our approach on two types of time series: stock-market data and weather data.
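A hedged sketch of two early steps of the kind the correspondence describes, cleaning a series with a simple filter and turning it into symbols before rule extraction; the window size, thresholds, and data are illustrative only and are not the authors' actual signal-processing or fuzzification procedures.

    # Illustrative pre-processing for time-series KDD (toy data).
    prices = [10.1, 10.3, 30.0, 10.2, 10.4, 10.6, 10.5, 10.9, 11.2, 11.0]

    def moving_average(series, window=3):
        """Simple smoothing filter to damp noise and isolated outliers."""
        half = window // 2
        out = []
        for i in range(len(series)):
            lo, hi = max(0, i - half), min(len(series), i + half + 1)
            out.append(sum(series[lo:hi]) / (hi - lo))
        return out

    def to_symbols(series):
        """Discretize successive changes into coarse symbols, a crude stand-in
        for the fuzzification/aggregation step."""
        symbols = []
        for prev, cur in zip(series, series[1:]):
            delta = cur - prev
            symbols.append("up" if delta > 0.05 else "down" if delta < -0.05 else "flat")
        return symbols

    smoothed = moving_average(prices)
    print(to_symbols(smoothed))   # e.g. ['up', 'down', ...]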

5.
The proliferation of large masses of data has created many new opportunities for those working in science, engineering and business. The field of data mining (DM) and knowledge discovery in databases (KDD) has emerged as a new discipline in engineering and computer science. In the modern sense of DM and KDD, the focus tends to be on extracting information characterized as knowledge from data that can be very complex and very large in quantity. Industrial engineering, with the diverse areas it comprises, presents unique opportunities for the application of DM and KDD, and for the development of new concepts and techniques in this field. Many industrial processes are now automated and computerized in order to ensure the quality of production and to minimize production costs. A computerized process records large masses of data during operation. This real-time data, recorded so that production steps can be traced, can also be used to optimize the process itself. A French truck manufacturer decided to exploit the datasets of measurements recorded during the testing of diesel engines manufactured on its production lines. The goal was to discover knowledge in the data of the engine test process in order to significantly reduce (by about 25%) the processing time. This paper presents the study of knowledge discovery using the KDD method. All the steps of the method were applied, and two additional steps proved necessary. The study allowed us to develop two systems: the discovery application, which implements a real-time prediction model (yielding an actual reduction of 28%), and the discovery support environment, which now allows those who are not experts in statistics to extract their own knowledge for other processes.

6.
We describe the Knowledge Discovery Workbench (KDW), an interactive system for database exploration. We then illustrate KDW capabilities in data clustering, summarization, classification, and discovery of changes. We also examine extracting dependencies from data and using them to order the multitude of data patterns. © 1992 John Wiley & Sons, Inc.

7.

Although topology optimization has achieved immense success, it involves an intrinsic difficulty: the optimized structures it produces strongly depend on the settings of the objective and constraint functions, i.e., the formulation. Nevertheless, the appropriate formulation is not usually obvious when considering structural design problems. Although trial-and-error to determine appropriate formulations is implicitly performed in several studies on topology optimization, it is important to explicitly support this trial-and-error process. Therefore, in this study, we propose a new framework for topology optimization to determine appropriate formulations. The basic idea of this framework is to incorporate knowledge discovery in databases (KDD) into topology optimization. Thus, we construct a database by collecting the numerous and varied material distributions obtained by solving various structural design problems with topology optimization, and we find useful knowledge with respect to appropriate formulations from the database on the basis of KDD. One issue must be resolved to realize this idea: the material distribution in the design domain of a data record must be converted to conform to the design domain of the target design problem for which an appropriate formulation is to be determined. For this purpose, we also propose a material-distribution-converting method termed design domain mapping (DDM). Several numerical examples demonstrate that the proposed framework, including DDM, successfully and explicitly supports the process of trial-and-error to determine the appropriate formulation.


8.
Deductive databases have the ability to deduce new facts from a set of existing facts by using a set of rules. They are also useful in the integration of artificial intelligence and databases. However, when recursive rules are involved, the number of deduced facts can become too large to be practically stored, viewed or analyzed. This seriously hinders the usefulness of deductive databases. In order to overcome this problem, we propose four methods to discover characteristic rules from a large number of deduction results without actually having to store all the deduction results. This paper presents the first step in the application of knowledge discovery techniques to deductive databases with large numbers of deduction results.
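A minimal sketch (not from the paper) of why recursive rules inflate the set of deduced facts: a single transitive rule over a chain of parent facts already produces quadratically many ancestor facts. The relation names and data are hypothetical.

    # Base facts: parent(child, parent) along a simple chain (hypothetical data).
    parent = {(f"p{i}", f"p{i+1}") for i in range(50)}

    # Recursive rule, evaluated bottom-up until a fixpoint is reached:
    #   ancestor(X, Y) :- parent(X, Y).
    #   ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    ancestor = set(parent)
    changed = True
    while changed:
        new = {(x, z) for (x, y) in parent for (y2, z) in ancestor if y == y2}
        changed = not new <= ancestor
        ancestor |= new

    print(len(parent), "base facts deduce", len(ancestor), "ancestor facts")
    # 50 base facts already deduce 1275 ancestor facts; characteristic rules aim
    # to summarize such results without materializing all of them.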

9.
10.
Abstract-driven pattern discovery in databases
The problem of discovering interesting patterns in large volumes of data is studied. Patterns can be expressed not only in terms of the database schema but also in user-defined terms, such as relational views and classification hierarchies. The user-defined terminology is stored in a data dictionary that maps it into the language of the database schema. A pattern is defined as a deductive rule expressed in user-defined terms that has a degree of uncertainty associated with it. Methods are presented for discovering interesting patterns based on abstracts, which are summaries of the data expressed in the language of the user.

11.
This article introduces the idea of using nonmonotonic inheritance networks for the storage and maintenance of knowledge discovered in data (revisable knowledge discovery in databases). Existing data mining strategies for knowledge discovery in databases typically involve initial structuring through the use of identification trees and the subsequent extraction of rules from these trees for use in rule-based expert systems; such strategies have difficulty coping with additional information which may conflict with that already used for the automatic generation of rules. In the worst case, the entire automatic sequence may have to be repeated. If nonmonotonic inheritance networks are used instead of rules for storing knowledge discovered in databases, additional conflicting information can be inserted directly into such structures, thereby bypassing the need for recompilation. © 1996 John Wiley & Sons, Inc.
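A small hedged sketch of the general idea, not the article's actual network formalism: defaults attached to classes in an inheritance hierarchy, with more specific classes overriding them, so a newly discovered exception is simply inserted rather than forcing the rule base to be regenerated. Class names and properties are invented.

    # Hypothetical inheritance network: (class -> parent) and per-class defaults.
    parents = {"penguin": "bird", "bird": "animal"}
    defaults = {
        "animal": {"alive": True},
        "bird": {"flies": True},
    }

    def infer(cls, prop):
        """Walk from the most specific class upward; the first default found wins,
        which is what makes later, conflicting knowledge easy to add."""
        while cls is not None:
            if prop in defaults.get(cls, {}):
                return defaults[cls][prop]
            cls = parents.get(cls)
        return None

    print(infer("penguin", "flies"))   # True, inherited from bird

    # Conflicting knowledge discovered later is inserted as a more specific
    # default; nothing already stored needs to be recompiled.
    defaults["penguin"] = {"flies": False}
    print(infer("penguin", "flies"))   # False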

12.
Allocating fragments in distributed databases
For a distributed database system to function efficiently, the fragments of the database need to be located judiciously at various sites across the relevant communications network. The problem of allocating these fragments to the most appropriate sites is a difficult one to solve, however, and most available approaches rely on heuristic techniques. Optimal approaches are usually based on mathematical programming; the formulations available for this problem are based on the linearization of nonlinear binary integer programs and have been observed to be ineffective except on very small problems. This paper presents new integer programming formulations for the nonredundant version of the fragment allocation problem. The formulation is extended to address problems which have both storage and processing capacity constraints; the approach is observed to be particularly effective in the presence of capacity restrictions. Extensive computational tests conducted over a variety of parameter values indicate that the reformulations are very effective even on relatively large problems, thereby reducing the need for heuristic approaches.
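The paper's integer-programming formulations are not reproduced here; the sketch below only states a tiny nonredundant allocation instance with storage capacities and solves it by exhaustive search, so the objective and constraint structure are easy to see. All costs and sizes are invented.

    from itertools import product

    # Hypothetical instance: 3 fragments, 2 sites, nonredundant allocation
    # (each fragment is stored at exactly one site).
    fragments = ["F1", "F2", "F3"]
    sites = ["S1", "S2"]
    size = {"F1": 4, "F2": 3, "F3": 5}        # storage demand per fragment
    capacity = {"S1": 8, "S2": 7}             # storage capacity per site
    # access_cost[(fragment, site)]: total communication cost if stored there.
    access_cost = {("F1", "S1"): 2, ("F1", "S2"): 6,
                   ("F2", "S1"): 5, ("F2", "S2"): 1,
                   ("F3", "S1"): 4, ("F3", "S2"): 3}

    best = None
    for assignment in product(sites, repeat=len(fragments)):
        placement = dict(zip(fragments, assignment))
        # Capacity constraint: total size placed at a site must fit.
        used = {s: 0 for s in sites}
        for f, s in placement.items():
            used[s] += size[f]
        if any(used[s] > capacity[s] for s in sites):
            continue
        cost = sum(access_cost[(f, s)] for f, s in placement.items())
        if best is None or cost < best[0]:
            best = (cost, placement)

    print("minimum cost:", best[0], "allocation:", best[1])

In this toy instance the cheapest unconstrained placement violates a site capacity, which is exactly the situation the capacity-constrained formulations address.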

13.
The problem of finding the optimal distribution of a database over a computer network to facilitate parallel searching for a set of database queries is analysed in this paper. The parallel searching of the multiple segments required by the queries lowers the response time considerably. Procedures are proposed for finding the optimal distributions in a network that maximally exploit the parallel search capability, with or without redundancy of segment types.

14.
Sequential pattern mining algorithms can often produce more accurate results if they work with specific constraints in addition to the support threshold. Many systems implement time-independent constraints by selecting qualified patterns. This selection cannot implement time-dependent constraints, because the support computation process must validate the time attributes of every data sequence during mining. Therefore, we propose a memory time-indexing approach, called METISP, to discover sequential patterns with time constraints including minimum-gap, maximum-gap, exact-gap, sliding-window, and duration constraints. METISP scans the database into memory and constructs time-index sets for effective processing. METISP uses index sets and a pattern-growth strategy to mine patterns without generating any candidates or sub-databases. The index sets narrow down the search space to the sets of designated in-memory data sequences, and speed up the counting of potential items within the indicated ranges. Our comprehensive experiments show that METISP is more efficient, even with low support thresholds and large databases, than the well-known GSP and DELISP algorithms. METISP scales up linearly with respect to database size.
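A hedged sketch of the kind of time-constraint check such mining must perform; it only verifies whether one timestamped occurrence of a pattern satisfies minimum-gap, maximum-gap, and duration constraints. The parameter names and values are illustrative and are not METISP's actual interface.

    def satisfies_constraints(times, min_gap=1, max_gap=30, max_duration=60):
        """Check one occurrence of a pattern: 'times' holds the timestamp at
        which each successive pattern element was matched."""
        for prev, cur in zip(times, times[1:]):
            gap = cur - prev
            if gap < min_gap or gap > max_gap:
                return False
        return (times[-1] - times[0]) <= max_duration

    # Elements of a pattern <a, b, c> matched at times 1, 5, 20 in a sequence:
    print(satisfies_constraints([1, 5, 20]))   # True: gaps 4 and 19, duration 19
    # Elements matched at times 1, 5, 90: the 85-unit gap violates max_gap.
    print(satisfies_constraints([1, 5, 90]))   # False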

15.
The Nielsen Opportunity Explorer™ product can be used by sales and trade marketing personnel within consumer packaged goods manufacturers to understand how their products are performing in the marketplace and to find opportunities to sell more product, more profitably, to the retailers. Opportunity Explorer uses data collected at point-of-sale terminals and by auditors of A. C. Nielsen. Opportunity Explorer uses a knowledge base of market research expertise to analyze large databases and generate interactive reports using knowledge discovery templates, converting a large space of data into concise, inter-linked information frames. Each information frame addresses specific business issues, and leads the user to seek related information by means of dynamically created hyperlinks.

16.
Association rules are an important research topic in the field of knowledge discovery in databases (KDD). Fuzzy association rules can express human knowledge in natural language and have recently attracted widespread attention from KDD researchers. However, most current methods for discovering fuzzy association rules still rely on the support and confidence measures commonly used in classical association rule discovery. In fact, fuzzy association rules admit different interpretations, and the choice of interpretation has a significant impact on the discovery method. From a logical point of view, this paper defines fuzzy logic rules, support degree, implication degree and related concepts, and proposes a fuzzy logic rule discovery algorithm that combines fuzzy logic concepts with the Apriori algorithm to discover fuzzy logic rules from a given quantitative database.
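A hedged sketch of the logical reading described above: quantitative attributes are fuzzified by membership functions, a rule's support is the average truth of its antecedent over the records, and its implication degree uses a fuzzy implication rather than classical confidence. The membership functions, the choice of implication operator, and the data are all illustrative, not the paper's definitions.

    # Hypothetical quantitative records: (age, income in thousands).
    records = [(25, 30), (42, 80), (55, 60), (33, 45)]

    def young(age):
        """Membership of 'age is young' (illustrative ramp)."""
        return max(0.0, min(1.0, (45 - age) / 20))

    def high_income(income):
        """Membership of 'income is high' (illustrative ramp)."""
        return max(0.0, min(1.0, (income - 40) / 40))

    def lukasiewicz_implication(a, b):
        """One possible fuzzy implication; other choices change the rule semantics."""
        return min(1.0, 1.0 - a + b)

    # Rule: IF age is young THEN income is high.
    antecedent = [young(a) for a, _ in records]
    consequent = [high_income(i) for _, i in records]

    support = sum(antecedent) / len(records)
    implication_degree = sum(
        lukasiewicz_implication(a, b) for a, b in zip(antecedent, consequent)
    ) / len(records)

    print("support:", round(support, 2), "implication degree:", round(implication_degree, 2))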

17.
Systems for knowledge discovery in databases
Knowledge-discovery systems face challenging problems from real-world databases, which tend to be dynamic, incomplete, redundant, noisy, sparse, and very large. These problems are addressed and some techniques for handling them are described. A model of an idealized knowledge-discovery system is presented as a reference for studying and designing new systems. This model is used in the comparison of three systems: CoverStory, EXPLORA, and the Knowledge Discovery Workbench. The deficiencies of existing systems relative to the model reveal several open problems for future research.

18.
The development and investigation of efficient methods for the parallel processing of very large databases using a columnar data representation on a computer cluster is discussed. An approach that combines the advantages of relational and column-oriented DBMSs is proposed. A new type of distributed column index, fragmented according to the domain-interval principle, is introduced. The column indexes are auxiliary structures that are permanently stored in the distributed main memory of the computer cluster. To match the elements of a column index to the tuples of the original relation, surrogate keys are used. Resource-hungry relational operations are performed on the corresponding column indexes rather than on the original relations of the database. As a result, a precomputation table is obtained. Using this table, the DBMS reconstructs the resulting relation. For the basic relational operations on column indexes, methods for their parallel decomposition that do not require massive data exchanges between the processor nodes are proposed. This approach improves the performance of OLAP-class queries by hundreds of times.
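A hedged sketch of the domain-interval idea as described: a column index is a list of (surrogate key, value) pairs, fragmented by splitting the value domain into intervals, each assigned to one cluster node. The interval boundaries and node count below are invented.

    import bisect

    # Column index for attribute 'price': (surrogate_key, value) pairs (toy data).
    column_index = [(0, 12.0), (1, 87.5), (2, 43.0), (3, 5.2), (4, 66.1), (5, 91.3)]

    # Domain-interval fragmentation: split the value domain [0, 100) into intervals,
    # one fragment per processor node (boundaries are illustrative).
    boundaries = [25.0, 50.0, 75.0]          # -> 4 intervals, hence 4 nodes

    def node_for(value):
        """Node holding the fragment whose interval contains this value."""
        return bisect.bisect_right(boundaries, value)

    fragments = {n: [] for n in range(len(boundaries) + 1)}
    for surrogate_key, value in column_index:
        fragments[node_for(value)].append((surrogate_key, value))

    for node, frag in fragments.items():
        print("node", node, "holds", frag)
    # A range predicate such as 40 <= price < 70 now touches only nodes 1 and 2,
    # and each node can scan its fragment in parallel.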

19.
A reduced cover set of the set of full-reducer semijoin programs for an acyclic query graph in a distributed database system is given. An algorithm is presented that determines the minimum-cost full-reducer program. The computational complexity of finding the optimal full reducer for a single relation is of the same order as that of finding the optimal full reducers for all relations. The optimization algorithm is able to handle query graphs where more than one attribute is common between the relations. A method for determining the optimal profitable semijoin program is presented. A low-cost algorithm which determines a near-optimal profitable semijoin program is outlined. This is done by converting a semijoin program into a partial-order graph. This graph also allows one to maximize the concurrent processing of the semijoins. It is shown that the minimum response time is given by the largest-cost path of the partial-order graph. This reducibility is used as a post-optimizer for the SDD-1 query optimization algorithm. It is shown that the least upper bound on the length of any profitable semijoin program is N(N-1) for a query graph of N nodes.
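A hedged sketch of the basic semijoin step from which such programs are built: a relation is reduced by keeping only the tuples whose join value appears at the other site, so less data has to be shipped. The relation contents and join attribute are invented.

    # Relation R held at site 1 and relation S held at site 2 (toy data).
    R = [("r1", 10), ("r2", 20), ("r3", 30), ("r4", 40)]   # (payload, join_key)
    S = [("s1", 20), ("s2", 40), ("s3", 50)]               # (payload, join_key)

    # Semijoin of R with S: site 2 ships only the projection of S on the join
    # attribute, and site 1 keeps the R tuples that can participate in the join.
    s_keys = {key for _, key in S}                 # small message: {20, 40, 50}
    reduced_R = [t for t in R if t[1] in s_keys]   # R reduced with respect to S

    print("tuples of R shipped for the final join:", reduced_R)
    # A full-reducer program chains such semijoins over an acyclic query graph so
    # that every relation is reduced before the join result is assembled.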

20.
The skyline-join operator, an important variant of the skyline operator, plays an important role in multi-criteria decision-making problems. However, as the scale of data increases, previous skyline-join query methods cannot be applied to new applications. Therefore, this paper makes the first attempt to propose a scalable method for processing skyline-join queries in distributed databases. First, a tailored distributed framework is presented to facilitate the computation of skyline-join queries. Second, the distributed skyline-join query algorithm (DSJQ) is designed to process skyline-join queries. DSJQ contains two phases. In the first phase, two filtering strategies are used to filter out unpromising tuples from the original tables. The remaining tuples are transmitted to the corresponding data nodes according to a partition function, which guarantees that tuples with the same join value are transferred to the same node. In the second phase, we design a scheduling plan based on rotations to calculate the final skyline-join result. The scheduling plan ensures that calculations are assigned equally to all the data nodes and that the calculations on each data node can be processed in parallel without creating a bottleneck node. Finally, the effectiveness of DSJQ is evaluated through a series of experiments.
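A hedged sketch of the kind of local filtering a first phase can apply: before any tuple is shipped, each node discards tuples dominated on the skyline attributes, since they cannot appear in the final result. The dominance convention (smaller is better) and the data are illustrative, not DSJQ's actual filtering strategies.

    def dominates(p, q):
        """p dominates q if p is no worse in every dimension and strictly better
        in at least one (here, smaller values are better)."""
        return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

    def local_skyline(tuples):
        """Keep only tuples not dominated by any other tuple at this node."""
        return [t for t in tuples
                if not any(dominates(other, t) for other in tuples if other is not t)]

    # Skyline attributes (price, distance) of tuples held at one data node (toy data).
    node_tuples = [(5, 9), (3, 7), (8, 2), (4, 8), (6, 6)]
    print(local_skyline(node_tuples))   # [(3, 7), (8, 2), (6, 6)] survive the filter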
