Similar literature
1.
Functional dependencies (FDs) and inclusion dependencies (INDs) are the most fundamental integrity constraints that arise in practice in relational databases. We introduce null inclusion dependencies (NINDs) to cater for the situation when a database is incomplete and contains null values. We show that the implication problem for NINDs is the same as that for INDs. We then present a sound and complete axiom system for null functional dependencies (NFDs) and NINDs, and prove that the implication problem for NFDs and NINDs is decidable and EXPTIME-complete. By contrast, when no nulls are allowed, this implication problem is undecidable. This undecidability result has motivated several researchers to restrict their attention to FDs and noncircular INDs, in which case the implication problem was shown to be EXPTIME-complete. Our results imply that when considering nulls in relational database design we need not assume that NINDs are noncircular.

2.
We are interested in specifying functional dependencies (FDs) for data-centric XML documents (XML documents that are used mainly for data storage). FDs are a natural constraint. Specifying FDs for XML documents is more difficult because, unlike relational databases, XML documents do not have uniform structures. This paper introduces XML Template Functional Dependencies (XTFDs), which are able to specify FDs for XML documents. This paper also presents a necessary and sufficient condition for an XTFD to cause data redundancy in XML documents. Further, we propose the Attribute Rule and the Text String Rule as two procedures that can be repeatedly applied to remove redundancy caused by XTFDs. In addition, we prove that if an XML document has data redundancy with respect to an FD specified by using the tree tuple approach, it would have data redundancy with respect to an XTFD, and show by example that XTFDs can specify some FDs for XML documents that the tree tuple approach cannot.

3.
Functional dependencies (FDs) and inclusion dependencies (INDs) convey most of the data semantics in relational databases and are very useful in practice since they generalize keys and foreign keys. Nevertheless, FDs and INDs are often not available, obsolete or lost in real-life databases. Several algorithms have been proposed for mining these dependencies, but the output is always in the same format: a simple list of dependencies, hard for the user to understand. In this paper, we define informative Armstrong databases (IADBs) as small subsets of an existing database that satisfy exactly the same FDs and INDs. They are an extension of the classical notion of Armstrong databases, but more suitable for the understanding of dependencies, since tuples are real-world tuples. The main result of this paper is to bound the size of an IADB in the case of non-circular INDs. A constructive proof of this result is given, from which an algorithm has been devised. An implementation and experiments against a real-life database were performed; the obtained database contains only 0.6% of the initial database tuples. More importantly, such semantic sampling of databases appears to be a key feature for the understanding of existing databases at the logical level.

4.
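Conditional functional dependencies (CFDs) are a semantic extension of functional dependencies. They can be applied to data cleaning and are widely used in repairing database consistency. This paper discusses the semantic rules of CFDs, focuses on CFD-based detection of tuples that violate database consistency, and introduces a confidence evaluation mechanism to improve the detection rules. The improved detection method shows advantages in detection based on multiple functional dependencies, making the detection work more concise and the detection criteria more explicit.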
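As a rough illustration of CFD-based violation detection, the following Python sketch checks one CFD over an in-memory list of rows. It is not the method of the paper above: the function names, the `_` wildcard convention and, in particular, the confidence measure (the share of in-scope rows that take part in no violation) are assumptions made only for this example.

```python
from collections import defaultdict

WILDCARD = "_"  # assumed marker for "any value" in a pattern tuple


def matches(row, attrs, pattern):
    """True if the row agrees with every constant in the pattern over attrs."""
    return all(p == WILDCARD or row[a] == p for a, p in zip(attrs, pattern))


def cfd_violations(rows, lhs, rhs, lhs_pattern, rhs_pattern):
    """Return (sorted violating row indices, confidence) for one CFD."""
    # Rows that do not match the LHS pattern are not constrained by this CFD.
    scope = [i for i, r in enumerate(rows) if matches(r, lhs, lhs_pattern)]
    bad = set()
    groups = defaultdict(list)
    for i in scope:
        row = rows[i]
        # Single-tuple violation: a constant in the RHS pattern is contradicted.
        if not matches(row, rhs, rhs_pattern):
            bad.add(i)
        groups[tuple(row[a] for a in lhs)].append(i)
    # Pair violation: equal LHS values but different RHS values.
    for members in groups.values():
        rhs_values = {tuple(rows[i][a] for a in rhs) for i in members}
        if len(rhs_values) > 1:
            bad.update(members)
    # Hypothetical confidence: fraction of in-scope rows that are clean.
    confidence = 1.0 if not scope else 1 - len(bad) / len(scope)
    return sorted(bad), confidence


rows = [
    {"country": "UK", "zip": "EH4", "street": "Mayfield"},
    {"country": "UK", "zip": "EH4", "street": "Crichton"},
    {"country": "UK", "zip": "W1B", "street": "Regent"},
    {"country": "NL", "zip": "EH4", "street": "Kade"},
]
# CFD: for UK rows, (country, zip) determines street.
violations, confidence = cfd_violations(
    rows,
    lhs=["country", "zip"],
    rhs=["street"],
    lhs_pattern=["UK", WILDCARD],
    rhs_pattern=[WILDCARD],
)
```

Here the first two rows share a UK zip code but disagree on the street, so both are reported, while the NL row falls outside the pattern and is ignored.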

5.
Constraint relational databases use constraints to both model and query data. A constraint relation contains a finite set of generalized tuples. Each generalized tuple is represented by a conjunction of constraints on a given logical theory and, depending on the logical theory and the specific conjunction of constraints, it may represent an infinite set of relational tuples. Because of these characteristics, constraint databases are well suited to model multidimensional and structured data, like spatial and temporal data. The definition of an algebra for constraint relational databases is important in order to make constraint databases a practical technology. We extend the previously defined constraint algebra (called generalized relational algebra). First, we show that the relational model is not the only possible semantic reference model for constraint relational databases and we show how constraint relations can be interpreted under the nested relational model. Then, we introduce two distinct classes of constraint algebras, one based on the relational algebra, and one based on the nested relational algebra, and we present an algebra of the latter type. The algebra is proved equivalent to the generalized relational algebra when input relations are modified by introducing generalized tuple identifiers. However, from a user's point of view, it is more suitable. Thus, the difference between such algebras is similar to the difference between the relational algebra and the nested relational algebra, dealing with only one level of nesting. We also show how external functions can be added to the proposed algebra.

6.
Data refinement in a state-based language such as Z is defined using a relational model in terms of the behaviour of abstract programs. Downward and upward simulation conditions form a sound and jointly complete methodology to verify relational data refinements, which can be checked on an event-by-event basis rather than per trace. In models of concurrency, refinement is often defined in terms of sets of observations, which can include the events a system is prepared to accept or refuse, or depend on explicit properties of states and transitions. By embedding such concurrent semantics into a relational framework, eventwise verification methods for such refinement relations can be derived. In this paper, we continue our program of deriving simulation conditions for process algebraic refinement by defining further embeddings into our relational model: traces, completed traces, failure traces and extension. We then extend our framework to include various notions of automata-based refinement.

7.
The nested relational model allows relations that are not in first normal form. This paper gives an extension of Datalog rules for nested relations. In our approach, nested Datalog is a natural extension of Datalog introduced for the relational data model. A nested Datalog program has a hierarchical structure of rules and subprograms to manipulate relation values of nested relations. We introduce a new category of predicate symbols, the variable predicate symbols, to refer to tuples of subrelations. The notions of soundness, safety and consistency are defined to avoid undesirable nested Datalog programs. The evaluation of nested Datalog is given in terms of the nested relational algebra. Finally, we relate the expressive power of nonrecursive nested Datalog to the power of nested relational algebra and safe nested tuple relational calculus.

8.
In this paper, we describe the notion of a ranked relation, which incorporates the notion of rank, i.e. ordering among tuples or objects, into the relational data model. The ordering of tuples may be based on a single rank or on multiple ranks combined together. We show that such relations arise naturally in many applications, especially in applications that query outside sources and return ranked relations as answers to content-based queries. We introduce an algebra for querying ranked relations and give examples of its use for various applications. We then prove various properties of the algebra with special emphasis on the preservation of the coherence property, which shows when different rank columns are guaranteed to induce the same ordering among tuples. We show how these properties can be used to produce approximate early returns. Finally, we give experimental results based on Internet search engines for our early returns method and show that our method provides meaningful and fast answers to the user.

9.
As the information available to naïve users through autonomous data sources continues to increase, mediators become important to ensure that the wealth of information available is tapped effectively. A key challenge that these information mediators need to handle is the varying levels of incompleteness in the underlying databases in terms of missing attribute values. Existing approaches such as QPIAD aim to mine and use Approximate Functional Dependencies (AFDs) to predict and retrieve relevant incomplete tuples. These approaches make independence assumptions about missing values, which critically hobbles their performance when there are tuples containing missing values for multiple correlated attributes. In this paper, we present a principled probabilistic alternative that views an incomplete tuple as defining a distribution over the complete tuples that it stands for. We learn this distribution in terms of Bayesian networks. Our approach involves mining/“learning” Bayesian networks from a sample of the database, and using them to do both imputation (predict a missing value) and query rewriting (retrieve relevant results with incompleteness on the query-constrained attributes, when the data sources are autonomous). We present empirical studies to demonstrate that (i) at higher levels of incompleteness, when multiple attribute values are missing, Bayesian networks do provide a significantly higher classification accuracy and (ii) the relevant possible answers retrieved by the queries reformulated using Bayesian networks provide higher precision and recall than AFDs while keeping query processing costs manageable.

10.
This paper extends two well-known characterizations of languages of finite words to words on countable scattered linear orderings. We first extend a theorem of Schützenberger establishing that the star-free sets of finite words are exactly the languages recognized by finite aperiodic semigroups. This gives an algebraic characterization of star-free sets of words over countable scattered linear orderings. Contrary to the case of finite words, first-order definable languages are strictly included in the star-free languages when countable scattered linear orderings are considered. Second, we extend the variety theorem of Eilenberg for finite words: there is a one-to-one correspondence between varieties of languages of words on countable scattered linear orderings and pseudo-varieties of algebras. The star-free sets are an example of such a variety of languages.

11.
This paper describes a database model based on the original rough sets theory. Its rough relations permit the representation of a rough set of tuples not definable in terms of the elementary classes, except through use of lower and upper approximations. The rough relational database model also incorporates indiscernibility in the representation and in all the operators of the rough relational algebra. This indiscernibility is based strictly on equivalence classes which must be defined for every attribute domain. There are several obvious applications for which the rough relational database model can more accurately model an enterprise than does the standard relational model. These include systems involving ambiguous, imprecise, or uncertain data. Retrieval over mismatched domains caused by the merging of one or more applications can be facilitated by the use of indiscernibility, and naive system users can achieve greater recall with the rough relational database. In addition, applications inherently “rough” could be more easily implemented and maintained in the rough relational database.
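The lower and upper approximations mentioned above can be illustrated with a short Python sketch. It follows the standard rough-set definitions over a partition into equivalence classes; the function name, the colour domain and the way classes are supplied (a key function) are assumptions for this example and not part of the rough relational model described in the abstract.

```python
def approximations(universe, key, target):
    """Lower and upper approximations of target under the partition induced by key."""
    # Group the universe into equivalence (indiscernibility) classes.
    classes = {}
    for item in universe:
        classes.setdefault(key(item), set()).add(item)
    target = set(target)
    lower, upper = set(), set()
    for cls in classes.values():
        if cls <= target:
            lower |= cls  # every member of the class is certainly in the target
        if cls & target:
            upper |= cls  # at least one member is possibly in the target
    return lower, upper


# Hypothetical attribute domain: colour names that are indiscernible within a class.
colour_class = {
    "crimson": "red",
    "scarlet": "red",
    "navy": "blue",
    "azure": "blue",
    "lime": "green",
}
universe = set(colour_class)
lower, upper = approximations(universe, colour_class.get, {"crimson", "scarlet", "navy"})
```

The whole "red" class lies inside the target, so it forms the lower approximation; "navy" pulls the rest of the "blue" class into the upper approximation, which is what lets a query over indiscernible values return more results.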

12.
Modern applications increasingly require the storage of data beyond relational structure. The challenge of providing well-founded data models that can handle complex objects such as lists, sets, multisets, unions and references has not been met yet in a completely satisfactory way. The success of such data models will greatly depend on the existence of automated database design techniques that generalise achievements from relational databases. In this paper, we study the implication problem of functional dependencies (FDs) in the presence of records, sets, multisets and lists. Database schemata are defined as nested attributes, database instances as nested relations and FDs are defined in terms of subattributes of the database schema. The expressiveness of FDs deviates fundamentally from previous approaches in different data models including the nested relational data model and XML.

13.
Two kinds of fuzziness in attribute values of fuzzy relational databases can be distinguished: one is that attribute values are possibility distributions, and the other is that there are resemblance relations in attribute domains. Fuzzy relational databases containing these two kinds of fuzziness simultaneously are called extended possibility-based fuzzy relational databases. In this paper, we focus on such fuzzy relational databases. We classify two kinds of fuzzy data redundancies and define their removal. On this basis, we define fuzzy relational operations in relational algebra, which, as in conventional relational databases, are complete and sound. In particular, we investigate fuzzy querying strategies and give the form of fuzzy querying with SQL. © 2002 Wiley Periodicals, Inc.

14.
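Mou Chenqi. Journal of Computer Applications (《计算机应用》), 2012, 32(11): 2977-2980
The BMS algorithm in coding theory offers good decoding efficiency and error-correcting capability, and existing research usually concentrates on the case of graded term orderings. By analyzing the essential characteristics of lexicographic and graded term orderings and exploiting the elimination property of Gröbner bases, which are closely related to the BMS algorithm, a termination criterion for the BMS algorithm under lexicographic ordering is designed, and a concrete, easy-to-implement description of the algorithm based on this criterion is given. Experimental results show that the termination criterion is effective and agrees fully with the original theoretical termination criterion of the algorithm.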

15.
Paraconsistent information is information that is incomplete and/or inconsistent. A data model for handling paraconsistent information in relational databases has recently been developed. In this paper, we show that a DBMS based on paraconsistent relations must be capable of handling infinite relations. We also identify classes of infinite paraconsistent relations whose members can be effectively represented and manipulated. We show that the classes of REGULAR and, under different conditions, CONTEXT-SENSITIVE as well as PSPACE paraconsistent relations are such. We also show that the CONTEXT-FREE and R.E. classes do not have the desired properties, while P, NP, LOGSPACE and NLOGSPACE also probably do not. These results help identify the kinds of relational DBMS that can be constructed for handling incomplete and inconsistent information about tuples. We finally show that all operations for the aforementioned PSPACE and CONTEXT-SENSITIVE cases can be carried out efficiently in polynomial time.

16.
A common task of Web users is querying structured information from Web pages. To realize this scenario, we propose a novel query processor for systematically discovering instances of semantic relations in Web search results and joining these relation instances into complex result tuples with conjunctive queries. Our query processor transforms a structured user query into keyword queries that are submitted to a search engine, forwards search results to a relation extractor, and then combines relations into complex result tuples. The processor automatically learns discriminative and effective keywords for different types of semantic relations. Thereby, our query processor leverages the index of a search engine to query potentially billions of pages. Unfortunately, relation extractors may fail to return a relation for a result tuple. Moreover, user-defined data sources may not return at least k complete result tuples. Therefore, we propose an adaptive routing model based on information theory for retrieving missing attributes of incomplete result tuples. The model determines the most promising next incomplete tuple and attribute type for returning any-k complete result tuples at any point during the query execution process. We report a thorough experimental evaluation over multiple relation extractors. Our query processor returns complete result tuples while processing only very few Web pages.

17.
Justification for inclusion dependency normal form (cited 3 times in total: 0 self-citations, 3 citations by others)
Functional dependencies (FDs) and inclusion dependencies (INDs) are the most fundamental integrity constraints that arise in practice in relational databases. In this paper, we address the issue of normalization in the presence of FDs and INDs and, in particular, the semantic justification for an inclusion dependency normal form (IDNF), which combines the Boyce-Codd normal form with the restriction on the INDs that they be noncircular and key-based. We motivate and formalize three goals of database design in the presence of FDs and INDs: noninteraction between FDs and INDs, elimination of redundancy and update anomalies, and preservation of entity integrity. We show that (as for FDs), in the presence of INDs, being free of redundancy is equivalent to being free of update anomalies. Then, for each of these properties, we derive equivalent syntactic conditions on the database design. Individually, each of these syntactic conditions is weaker than IDNF, and the restriction that an FD is not embedded in the right-hand side of an IND is common to three of the conditions. However, we also show that, for these three goals of database design to be satisfied simultaneously, IDNF is both a necessary and a sufficient condition.

18.
A structure for a relational database system is described which involves a new structure called a linkage. A linkage is a set of interdependent relations. Together with relations, tuples, attributes and attribute-values, this provides a hierarchy of structures within which one may specify and generate test data which are not only valid with respect to attribute domains, but preserve dependencies. The discussion refers to an experimental system QIKSYS in which these ideas have been implemented and some of the features of this system are described.

19.
The answer to a top-k query is an ordered set of tuples, where the ordering is based on how closely each tuple matches the query. In the context of middleware systems, new algorithms to answer top-k queries have been recently proposed. Among these, the threshold algorithm (TA) is the most well-known instance due to its simplicity and memory requirements. TA is based on an early-termination condition and can evaluate top-k queries without examining all the tuples. This top-k query model is prevalent not only over middleware systems, but also over plain relational data. In this work, we analyze the challenges that must be addressed to adapt TA to a relational database system. We show that, depending on the available indices, many alternative TA strategies can be used to answer a given query. Choosing the best alternative requires a cost model that can be seamlessly integrated with that of current optimizers. In this work, we address these challenges and conduct an extensive experimental evaluation of the resulting techniques by characterizing which scenarios can take advantage of TA-like algorithms to answer top-k queries in relational database systems.
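The early-termination idea behind TA can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the relational adaptation studied in the paper above: it assumes every object has a score in every list, uses the sum as the monotone scoring function, and the function name and sample data are hypothetical.

```python
import heapq


def threshold_topk(lists, k):
    """Return (top-k [(object, total)] by summed score, sorted-access depth reached).

    lists: one dict per attribute, mapping object id -> score.
    Assumes every object appears in every dict.
    """
    # Sorted access: each list is read in descending score order.
    ordered = [
        sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        for scores in lists
    ]
    totals = {}
    depth = 0
    for depth, frontier in enumerate(zip(*ordered), start=1):
        for obj, _ in frontier:
            if obj not in totals:
                # Random access: fetch the object's score from every list.
                totals[obj] = sum(scores[obj] for scores in lists)
        # Threshold: best total any object not yet seen could still reach.
        threshold = sum(score for _, score in frontier)
        best = heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
        if len(best) == k and best[-1][1] >= threshold:
            return best, depth  # early termination
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1]), depth


price = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
rating = {"a": 0.7, "b": 0.9, "c": 0.2, "d": 0.4}
best, depth = threshold_topk([price, rating], k=2)
```

In this example the scan stops after two rounds of sorted access: the threshold has dropped to 1.5, both "a" and "b" already score at least that much, and "c" and "d" are never examined.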

20.
Assuming data domains are partially ordered, we define the partially ordered relational algebra (PORA) by allowing the ordering predicate ⊑ to be used in formulae of the selection operator σ. We apply Paredaens and Bancilhon's Theorem to examine the expressiveness of the PORA, and show that the PORA expresses exactly the set of all possible relations which are invariant under order-preserving automorphisms of databases. The extension is consistent with the two important extreme cases of unordered and linearly ordered domains. We also investigate the three hierarchies of: (1) computable queries, (2) query languages and (3) partially ordered domains, and show that there is a one-to-one correspondence between them.
