Found 20 similar documents; search time: 0 ms.
1.
Data mining is defined as the process of discovering significant and potentially useful patterns in large volumes of data. Discovering associations between items in a large database is one such data mining activity. In finding associations, support is used as an indicator of whether an association is interesting. In this paper, we discuss three alternative interest measures for associations: any-confidence, all-confidence, and bond. We prove that the important downward closure property applies to both all-confidence and bond. We show that downward closure does not hold for any-confidence. We also prove that, if associations have a minimum all-confidence or minimum bond, then those associations have a given lower bound on their minimum support, and the rules produced from those associations have a given lower bound on their minimum confidence as well. However, associations that have that minimum support (and likewise their rules that have minimum confidence) may not satisfy the minimum all-confidence or minimum bond constraint. We describe algorithms that efficiently find all associations with a minimum all-confidence or minimum bond and present some experimental results.
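The two downward-closed measures in this abstract have simple set-based definitions: for an itemset X, all-confidence is sup(X) divided by the largest single-item support within X, and bond is sup(X) divided by the number of transactions containing at least one item of X. A minimal Python sketch under that reading (function names are illustrative, not from the paper):

```python
def supporting(transactions, items):
    """Transactions (as sets) that contain every item in `items`."""
    return [t for t in transactions if items <= t]

def all_confidence(transactions, itemset):
    # sup(X) / max single-item support: equals the minimum
    # confidence over all rules that can be formed from X.
    sup = len(supporting(transactions, itemset))
    max_item = max(len(supporting(transactions, {i})) for i in itemset)
    return sup / max_item

def bond(transactions, itemset):
    # sup(X) / |transactions containing at least one item of X|
    sup = len(supporting(transactions, itemset))
    covered = sum(1 for t in transactions if itemset & t)
    return sup / covered

txns = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]
print(all_confidence(txns, {"a", "b"}))  # 2/3: sup(ab)=2, max item sup=3
print(bond(txns, {"a", "b"}))            # 0.5: 2 of 4 covering transactions
```

Both measures are at most the confidence of any rule formed from the itemset, which is why they give the lower bounds on support and confidence mentioned above.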
2.
Chris Giannella. Information Processing Letters, 2003, 85(3):153-158.
We consider the problem of defining a normalized approximation measure for multi-valued dependencies in relational database theory. An approximation measure is a function mapping relation instances to real numbers. The number to which an instance is mapped, intuitively, describes the strength of the dependency in that instance. A normalized approximation measure for functional dependencies has been proposed previously: the minimum number of tuples that need be removed for the functional dependency to hold, divided by the total number of tuples. This leads naturally to a normalized measure for multi-valued dependencies: the minimum number of tuples that need be removed for the multi-valued dependency to hold, divided by the total number of tuples. The measure for functional dependencies can be computed efficiently, in O(|r| log |r|) time, where |r| is the number of tuples in the relation instance. However, we show that an efficient algorithm for computing the analogous measure for multi-valued dependencies is not likely to exist: a polynomial-time algorithm for computing the measure would yield a polynomial-time algorithm for an NP-complete problem (proven by a reduction from the maximum edge biclique problem in graph theory). Hence, we argue that it is not a good measure. We propose an alternate measure based on the lossless join characterization of multi-valued dependencies. This measure is efficiently computable, in O(|r|²) time.
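The functional-dependency measure described above can be computed by grouping tuples on the left-hand-side attributes and, in each group, keeping the tuples that agree on the most common right-hand-side value; everything else counts as a removal. A small sketch under that reading; the hash-based grouping here gives expected linear time, while the O(|r| log |r|) bound in the abstract corresponds to a sort-based implementation:

```python
from collections import Counter, defaultdict

def fd_removal_fraction(rows, lhs, rhs):
    """Fraction of tuples to remove so that the FD lhs -> rhs holds.

    For each lhs-group, keep the tuples carrying the most common
    rhs value; the rest must be removed for the FD to hold.
    """
    groups = defaultdict(Counter)
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        groups[x][y] += 1
    removed = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return removed / len(rows)

r = [
    {"A": 1, "B": "x"},
    {"A": 1, "B": "x"},
    {"A": 1, "B": "y"},   # violates A -> B; removing it fixes the FD
    {"A": 2, "B": "z"},
]
print(fd_removal_fraction(r, ["A"], ["B"]))  # 0.25 (1 of 4 tuples)
```

The hardness result in the abstract is precisely that no analogous polynomial-time computation is likely for multi-valued dependencies.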
3.
XML has become the standard format for data representation and exchange. With the emergence of large numbers of XML documents, applying database technology to the management of XML data has attracted growing research interest. As a starting point for studying XML database technology, comparing XML databases with relational databases yields a deeper understanding of their similarities and differences, and lays the necessary groundwork for addressing the problems XML databases face, such as controlling data redundancy and concurrent access. The two kinds of databases are compared along several dimensions, including data model, query paths, integrity constraints, and normalization. Since the data model is the cornerstone of a database, the two data models are discussed in detail from nine aspects: construction mechanism, name uniqueness, null values, entity identification, relationships between entities, document order, regularity of data structure, recursion, and self-describing data.
4.
One attractive approach to object databases is to see them as potentially an evolutionary development from relational databases. This paper concentrates on substantiating the technical basis for this claim, and illustrates it in some detail with an upwards-compatible extension of ANSI SQL2 for conventional objects. This could serve as a foundation for the development of higher-level facilities for more complex objects.
5.
The Journal of Logic Programming, 1988, 5(1):33-60.
I discuss my experiences, some of the work that I have done, and related work that influenced me, concerning deductive databases, over the last 30 years. I divide this time period into three roughly equal parts: 1957–1968, 1969–1978, and 1979–present. For the first, I describe how my interest in deductive databases started in 1957, at a time when the field of databases did not even exist. I describe work in the beginning years, leading to the start of deductive databases about 1968 with the work of Cordell Green and Bertram Raphael. The second period saw a great deal of work in theorem proving as well as the introduction of logic programming. The existence and importance of deductive databases as a formal and viable discipline received its impetus at a workshop held in Toulouse, France, in 1977, which culminated in the book Logic and Data Bases. The relationship of deductive databases and logic programming was recognized at that time. During the third period we have seen formal theories of databases come about as an outgrowth of that work, and the recognition that artificial intelligence and deductive databases are closely related, at least through the so-called expert database systems. I expect that the relationships between techniques from formal logic, databases, logic programming, and artificial intelligence will continue to be explored, and the field of deductive databases will become a more prominent area of computer science in coming years.
6.
The class of non-Horn, function-free databases is investigated and several aspects of the problem of using theorem proving techniques for such databases are considered. This includes exploring the treatment of negative information and extending the existing method, suggested by Minker, to accept non-unit negative clauses. It is shown that the algorithms based on the existing methods for the treatment of negative information can be highly inefficient. An alternative approach is suggested and a simpler algorithm based on it is given. The problems associated with query answering in non-Horn databases are addressed and compared with those for the Horn case. It is shown that the query evaluation process can be computationally difficult in the general case. Conditions under which the process is simplified are discussed. The topic of non-Horn general laws is considered and some guidelines are suggested to divide such laws into derivation rules and integrity constraints. The effect of such a division on the query evaluation process is discussed.
7.
Multimedia databases (total citations: 2, self-citations: 0)
A. Desai Narasimhalu. Multimedia Systems, 1996, 4(5):226-249.
The rapidly growing interest in building multimedia tools and applications has created a need for the development of multimedia database management systems (MMDBMSs) as a tool for efficient organization, storage and retrieval of multimedia objects. We begin with a word about traditional database management systems (DBMSs). Then we present an overview of the MMDBMS research issues, challenges, methods, models, and architectures. We review the state of the art and research contributions from related disciplines. Finally, we consider possibilities and probabilities for MMDBMS research in the future.
8.
Effective timestamping in databases (total citations: 3, self-citations: 0)
Kristian Torp, Christian S. Jensen, Richard T. Snodgrass. The VLDB Journal, 2000, 8(3-4):267-288.
Many existing database applications place various timestamps on their data, rendering temporal values such as dates and times prevalent in database tables. During the past two decades, several dozen temporal data models have appeared, all with timestamps being integral components. The models have used timestamps for encoding two specific temporal aspects of database facts, namely transaction time, when the facts are current in the database, and valid time, when the facts are true in the modeled reality. However, with few exceptions, the assignment of timestamp values has been considered only in the context of individual modification statements. This paper takes the next logical step: it considers the use of timestamping for capturing transaction and valid time in the context of transactions. The paper initially identifies and analyzes several problems with straightforward timestamping, then proceeds to propose a variety of techniques aimed at solving these problems. Timestamping the results of a transaction with the commit time of the transaction is a promising approach. The paper studies how this timestamping may be done using a spectrum of techniques. While many database facts are valid until now, the current time, this value is absent from the existing temporal types. Techniques that address this problem using different substitute values are presented. Using a stratum architecture, the performance of the different proposed techniques is studied. Although querying and modifying time-varying data is accompanied by a number of subtle problems, we present a comprehensive approach that provides application programmers with simple, consistent, and efficient support for modifying bitemporal databases in the context of user transactions. Received: March 11, 1998 / Accepted: July 27, 1999.
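The commit-time idea above can be sketched as follows: a transaction buffers its writes with the transaction-time start left unresolved, and only at commit are all buffered rows stamped with the single commit time. This is a minimal illustration of the approach, not the paper's implementation; the class and field names (tt_start, tt_end) are assumptions:

```python
class Transaction:
    """Defers transaction-time stamping until commit, so every
    write in the transaction carries the same commit timestamp."""

    def __init__(self, table):
        self.table = table
        self.pending = []        # buffered writes, timestamps unresolved

    def insert(self, row):
        self.pending.append(dict(row))

    def commit(self, now):
        tt = now()               # one commit time shared by all writes
        for row in self.pending:
            row["tt_start"] = tt     # fact current from the commit time
            row["tt_end"] = None     # still current ("until changed")
            self.table.append(row)
        self.pending.clear()

log = []
tx = Transaction(log)
tx.insert({"name": "alice", "salary": 10})
tx.insert({"name": "bob", "salary": 12})
tx.commit(now=lambda: 42)
print(log[0]["tt_start"], log[1]["tt_start"])  # 42 42
```

Stamping at commit rather than per statement avoids a transaction's own writes appearing to become current at different times.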
12.
Real-time databases (total citations: 27, self-citations: 0)
Krithi Ramamritham. Distributed and Parallel Databases, 1993, 1(2):199-226.
Data in real-time databases has to be logically consistent as well as temporally consistent. The latter arises from the need to preserve the temporal validity of data items that reflect the state of the environment that is being controlled by the system. Some of the timing constraints on the transactions that process real-time data come from this need. These constraints, in turn, necessitate time-cognizant transaction processing so that transactions can be processed to meet their deadlines. This paper explores the issues in real-time database systems and presents an overview of the state of the art. After introducing the characteristics of data and transactions in real-time databases, we discuss issues that relate to the processing of time-constrained transactions. Specifically, we examine different approaches to resolving contention over data and processing resources. We also explore the problems of recovery, managing I/O, and handling overloads. Real-time databases have the potential to trade off the quality of the result of a query or a transaction for its timely processing. Quality can be measured in terms of the completeness, accuracy, currency, and consistency of the results. Several aspects of this trade-off are also considered.
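Temporal consistency of the kind described above is commonly formalized with absolute validity intervals: a data item remains temporally valid only while its age stays within the interval. A toy sketch of that check; the class name and fields are illustrative assumptions, not taken from the paper:

```python
class RTDataItem:
    """Sensor reading with an absolute validity interval (avi):
    the reading is temporally consistent while its age <= avi."""

    def __init__(self, value, timestamp, avi):
        self.value = value
        self.timestamp = timestamp   # when the reading was observed
        self.avi = avi               # seconds the reading stays valid

    def temporally_valid(self, now):
        return now - self.timestamp <= self.avi

temp = RTDataItem(value=21.5, timestamp=100.0, avi=5.0)
print(temp.temporally_valid(now=103.0))  # True: 3 s old, avi is 5 s
print(temp.temporally_valid(now=106.0))  # False: stale at 6 s old
```

Transaction deadlines in such systems often derive directly from these intervals: a controlling transaction must finish before the data it read goes stale.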
13.
Chin-Wan Chung. Journal of Systems Integration, 1995, 5(3):253-274.
Currently, relational databases are widely used, while object-oriented databases are emerging as a new generation of database technology. This paper presents a methodology to provide effective sharing of information in object-oriented databases and relational databases. The object-oriented data model is selected as a common data model to build an integrated view of the diverse databases. An object-oriented query language is used as a standard query language. A method is developed to transform a relational data definition to an equivalent object-oriented data definition and to integrate local data definitions. Two distributed query processing methods are derived: one for general queries and the other for a special class of restricted queries. Using the methods developed, it is possible to access distributed object-oriented databases and relational databases such that the locations and the structural differences of the databases are transparent to users.
14.
Joaquin Vanschoren, Hendrik Blockeel, Bernhard Pfahringer, Geoffrey Holmes. Machine Learning, 2012, 87(2):127-158.
Thousands of machine learning research papers contain extensive experimental comparisons. However, the details of those experiments are often lost after publication, making it impossible to reuse these experiments in further research, or reproduce them to verify the claims made. In this paper, we present a collaboration framework designed to easily share machine learning experiments with the community, and automatically organize them in public databases. This enables immediate reuse of experiments for subsequent, possibly much broader investigation and offers faster and more thorough analysis based on a large set of varied results. We describe how we designed such an experiment database, currently holding over 650,000 classification experiments, and demonstrate its use by answering a wide range of interesting research questions and by verifying a number of recent studies.
15.
Functional dependencies (FDs) and inclusion dependencies (INDs) convey most of data semantics in relational databases and are very useful in practice since they generalize keys and foreign keys. Nevertheless, FDs and INDs are often not available, obsolete or lost in real-life databases. Several algorithms have been proposed for mining these dependencies, but the output is always in the same format: a simple list of dependencies, hard to understand for the user. In this paper, we define informative Armstrong databases (IADBs) from databases as being small subsets of an existing database, satisfying exactly the same FDs and INDs. They are an extension of the classical notion of Armstrong databases, but more suitable for the understanding of dependencies, since tuples are real-world tuples. The main result of this paper is to bound the size of an IADB in the case of non-circular INDs. A constructive proof of this result is given, from which an algorithm has been devised. An implementation and experiments against a real-life database were performed; the obtained database contains only 0.6% of the initial database tuples. More importantly, such semantic sampling of databases appears to be a key feature for the understanding of existing databases at the logical level.
16.
This paper surveys research on enabling keyword search in relational databases. We present fundamental characteristics and discuss research dimensions, including data representation, ranking, efficient processing, query representation, and result presentation. Various approaches for developing the search system are described and compared within a common framework. We discuss the evolution of new research strategies to resolve the issues associated with probabilistic models, efficient top-k query processing, and schema analysis in relational databases.
17.
J. Chomicki, D. Goldin, G. Kuper, D. Toman. IEEE Transactions on Knowledge and Data Engineering, 2003, 15(6):1422-1436.
In this paper, we study constraint databases with variable independence conditions (vics). Such databases occur naturally in the context of temporal and spatiotemporal database applications. Using computational geometry techniques, we show that variable independence is decidable for linear constraint databases. We also present a set of rules for inferring vics in relational algebra expressions. Using vics, we define a subset of relational algebra that is closed under restricted aggregation.
18.
Controlled query evaluation (CQE) preserves confidentiality in information systems at runtime. A confidentiality policy specifies the information a certain user is not allowed to know. At each query, a censor checks whether the answer would enable the user to learn any classified information. In that case, the answer is distorted, either by lying or by refusal. We introduce a framework in which CQE can be analyzed with respect to possibly incomplete logic databases. For each distortion method, lying and refusal, a class of confidentiality-preserving mechanisms is presented. Furthermore, we specify a third approach that combines lying and refusal and compensates for the disadvantages of the respective uniform methods. The enforcement methods are compared to the existing methods for complete databases. This work was partially funded by the German Research Foundation (DFG) under Grant No. BI-311/12-1. An extended abstract was presented at the LICS'05 Affiliated Workshop on Foundations of Computer Security.
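The lying-versus-refusal distinction can be illustrated with a toy propositional censor. This sketch ignores incomplete databases and inference over the user's answer history, both of which the paper's framework handles; the function name and policy representation are illustrative assumptions only:

```python
def censor(db, secrets, query, mode="refusal"):
    """Toy CQE censor over a set of propositional facts.

    db: facts that are true; secrets: facts the user may not learn.
    Refusal returns a special token; lying returns a false answer.
    """
    if query in secrets:
        if mode == "refusal":
            # Refuse regardless of the true answer: refusing only
            # when the secret actually holds would itself leak it.
            return "refused"
        # Lying: uniformly deny classified facts.
        return False
    return query in db

db = {"salary_high(bob)", "dept(bob, hr)"}
secrets = {"salary_high(bob)"}
print(censor(db, secrets, "dept(bob, hr)"))                   # True
print(censor(db, secrets, "salary_high(bob)"))                # refused
print(censor(db, secrets, "salary_high(bob)", mode="lying"))  # False
```

Note the design choice in refusal mode: the censor must behave identically whether the secret is true or false, since a refusal correlated with the truth value would be as informative as an honest answer.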
19.
Jiawei Han. IEEE Transactions on Knowledge and Data Engineering, 1995, 7(2):261-273.
Many popularly studied recursions in deductive databases can be compiled into one or a set of highly regular chain generating paths, each of which consists of one or a set of connected predicates. Previous studies on chain-based query evaluation in deductive databases take a chain generating path as an inseparable unit in the evaluation. However, some recursions, especially many functional recursions whose compiled chain consists of infinitely evaluable function(s), should be evaluated by chain-split evaluation, which splits a chain generating path into two portions in the evaluation: an immediately evaluable portion and a delayed-evaluation portion. The necessity of chain-split evaluation is examined from the points of view of both efficiency and finite evaluation, and three chain-split evaluation techniques are developed: magic sets, buffered evaluation, and partial evaluation. Our study shows that chain-split evaluation is a primitive recursive query evaluation technique for different kinds of recursions, and it can be implemented efficiently in deductive databases by extensions to the existing recursive query evaluation methods.
20.
Abstract-driven pattern discovery in databases (total citations: 6, self-citations: 0)
The problem of discovering interesting patterns in large volumes of data is studied. Patterns can be expressed not only in terms of the database schema but also in user-defined terms, such as relational views and classification hierarchies. The user-defined terminology is stored in a data dictionary that maps it into the language of the database schema. A pattern is defined as a deductive rule expressed in user-defined terms that has a degree of uncertainty associated with it. Methods are presented for discovering interesting patterns based on abstracts, which are summaries of the data expressed in the language of the user.