Similar literature
20 similar documents found
1.
When a multidatabase system contains textual database systems (i.e., information retrieval systems), queries against the global schema of the multidatabase system may contain a new type of join: joins between attributes of textual type. Three algorithms for processing such joins are presented and their I/O costs are analyzed in this paper. Since such joins often involve very large document collections, it is important to find efficient algorithms to process them. The three algorithms differ in whether the documents themselves or the inverted files on the documents are used to process the join. Our analysis and the simulation results indicate that the relative performance of these algorithms depends on the input document collections, system characteristics, and the input query. For each algorithm, the type of input document collections with which the algorithm is likely to perform well is identified. An integrated algorithm that automatically selects the best algorithm to use is also proposed.
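
As a rough illustration of the inverted-file variant mentioned in this abstract, the sketch below joins two small document collections through an inverted file built on one of them. It is not one of the paper's three algorithms: the join condition (a minimum number of shared terms), the whitespace tokenization, and the names `build_inverted_file`, `text_join`, and `min_shared_terms` are all assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_file(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index[term].add(doc_id)
    return index


def text_join(docs_r, docs_s, min_shared_terms=2):
    """Pair documents of R and S that share at least min_shared_terms terms.

    The join condition is a hypothetical stand-in for a real textual
    similarity measure; only the inverted file on S is consulted, so the
    documents of S are never scanned directly.
    """
    index_s = build_inverted_file(docs_s)
    result = []
    for r_id, text in docs_r.items():
        shared = defaultdict(int)
        for term in set(text.lower().split()):
            for s_id in index_s.get(term, ()):
                shared[s_id] += 1
        for s_id, count in sorted(shared.items()):
            if count >= min_shared_terms:
                result.append((r_id, s_id, count))
    return result


docs_r = {"r1": "query optimization in multidatabase systems",
          "r2": "image compression"}
docs_s = {"s1": "multidatabase query processing",
          "s2": "video compression codecs",
          "s3": "gardening"}
pairs = text_join(docs_r, docs_s)
```

Whether probing an inverted file beats scanning the documents themselves depends, as the abstract notes, on the collections, the system, and the query.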

2.
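Research on Query Decomposition Algorithms in Multidatabase Systems   Cited by: 1 (self-citations: 0, citations by others: 1)
A multidatabase system lets users access multiple heterogeneous, autonomous database systems at the same time through an integrated schema and a simple global query language. Decomposing global queries is an important problem in multidatabase systems. This paper presents a method for managing schema information in a multidatabase environment and, based on this schema information, proposes a query decomposition algorithm that is easy to implement. Because query decomposition in a multidatabase is closely tied to how schema integration is implemented, the paper also describes schema integration in multidatabase systems.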

3.
In a multidatabase system, the participating databases are autonomous. The schemas of these databases may differ in various ways even though they represent the same information. A global query issued against the global database needs to be translated to a proper form before it can be executed in a local database. Since data requested by a query (or a part of a query) is sometimes available at multiple sites, the site (database) that processes the query with the least cost is the desired query processing site. The authors study the effect of differences in schemas on the cost of query processing in a multidatabase environment. They first classify schema conflicts into different types. For each type of conflict, they show how much more or less complex a translated query can become in comparison with the global query originally issued by the user. Based on this observation, they propose an analytical method that considers the conflicts between local databases and finds the database(s) that render the least execution cost in processing a global query. This research introduces a new level of query optimization (termed schema-level optimization) in multidatabase environments. The results provide a new dimension of enhancement for the capability of a query optimizer in multidatabase systems.

4.
This research investigates an approach to query processing in a multidatabase system that uses an object-oriented model to capture the semantics of other data models. The object-oriented model is used to construct a global schema, defining an integrated view of the different schemas in the environment. The model is also used as a self-describing model to build a meta-database for storing information about the global schema. A unique aspect of this work is that the object-oriented model is used to describe the different data models of the multidatabase environment, thereby extending the meta-database with semantic information about the local schemas. With the global and local schemas all represented in an object-oriented form, structural mappings between the global schema and each local schema are then easily supported. An object algebra then provides a query language for expressing global queries, using the structural mappings to translate object algebra queries into SQL queries over local relational schemas. The advantage of using an object algebra is that the object-oriented database can be viewed as a blackboard for temporary storage of local data and for establishing relationships between different databases. The object algebra can be used to directly retrieve temporarily stored data from the object-oriented database or to transparently retrieve data from local sources using the translation process described in this paper.

5.
A multidatabase system (MDBS) integrates information from multiple autonomous local databases. Performing global query optimization to achieve efficient query processing in such a system is challenging due to the local autonomy of the data sources. Dynamic factors in the environment make the problem even more difficult. In this paper, we present two techniques, contention space partitioning and cost error controlling, to perform global query optimization in a dynamic MDBS. Both techniques generate an execution plan with multiple versions for a query in a dynamic MDBS, utilizing the multistate cost models built for the dynamic environment via our previous multistate query sampling method. The first technique partitions the contention space of a dynamic multidatabase environment into a given number of subspaces and chooses a good query execution plan version for each subspace, while the second technique selects a set of execution plan versions by using a given error tolerance to control query execution costs. Experiments demonstrate that the proposed techniques are quite promising for performing global query optimization in a dynamic MDBS. Compared with related work on dynamic query optimization, our approach has the advantage of avoiding the high overhead of modifying or re-generating an execution plan for a query based on dynamic runtime information. Research was supported by the US National Science Foundation under Grant # IIS-9811980 and The University of Michigan.

6.
New applications of information systems need to integrate a large number of heterogeneous databases over computer networks. Answering a query in these applications usually involves selecting relevant information sources and generating a query plan to combine the data automatically. As significant progress has been made in source selection and plan generation, the critical issue has been shifting to query optimization. This paper presents a semantic query optimization (SQO) approach to optimizing query plans of heterogeneous multidatabase systems. This approach provides global optimization for query plans as well as local optimization for subqueries that retrieve data from individual database sources. An important feature of our local optimization algorithm is that we prove necessary and sufficient conditions to eliminate an unnecessary join in a conjunctive query of arbitrary join topology. This feature allows our optimizer to utilize more expressive relational rules to provide a wider range of possible optimizations than previous work in SQO. The local optimization algorithm also features a new data structure called AND-OR implication graphs to facilitate the search for optimal queries. These features allow the global optimization to effectively use semantic knowledge to reduce the data transmission cost. We have implemented this approach in the PESTO (Plan Enhancement by SemanTic Optimization) query plan optimizer as a part of the SIMS information mediator. Experimental results demonstrate that PESTO can provide significant savings in query execution cost over query plan execution without optimization.

7.
Foreign functions have been considered in advanced database systems to support complex applications. We consider optimizing queries with foreign functions in a distributed environment. In traditional distributed query processing, selection operations are processed locally before joins as much as possible so that the size of the relations being transmitted and joined can be reduced. However, if selection predicates involve foreign functions, the cost of evaluating selections cannot be ignored. As a result, the execution order of selections and joins becomes significant, and the trade-off among the costs of data transmission, join processing, and selection predicate evaluation needs to be carefully considered in query optimization. A response time model is developed for estimating the cost of distributed query processing involving foreign functions. We explore the properties of the problem and find an optimal algorithm with polynomial complexity for a special case of it. However, finding the optimal execution plan for the general case is NP-hard. We propose an efficient heuristic algorithm for solving the problem, and the simulation results show its good quality. The research results can also be applied to advanced database systems and multidatabase systems, where the conversion functions defined for schema integration can be considered a type of foreign function.

8.
To meet users' growing needs for accessing pre-existing heterogeneous databases, multidatabase systems (MDBSs) that integrate multiple databases have recently attracted many researchers. A key feature of an MDBS is local autonomy. For a query retrieving data from multiple databases, global query optimization should be performed to achieve good system performance. There are a number of new challenges for global query optimization in an MDBS. A major one is that some local optimization information, such as local cost parameters, may not be available at the global level because of local autonomy. This creates difficulties for finding a good decomposition of a global query during query optimization. To tackle this challenge, a new query sampling method is proposed in this paper. The idea is to group component queries into homogeneous classes, draw a sample of queries from each class, and use observed costs of sample queries to derive a cost formula for each class by multiple regression. The derived formulas can be used to estimate the cost of a query during query optimization. The relevant issues, such as query classification rules, sampling procedures, and cost model development and validation, are explored in this paper. To verify the feasibility of the method, experiments were conducted on three commercial database management systems supported in an MDBS. Experimental results demonstrate that the proposed method is quite promising in estimating local cost parameters in an MDBS.
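
The regression step of the query sampling idea can be sketched as follows: for one query class, the observed costs of sample queries are fitted to a linear cost formula by ordinary least squares. This is only a minimal sketch under assumptions. The explanatory variables (operand size and result size), the sample figures, and the names `fit_cost_formula` and `estimate_cost` are hypothetical, and the paper's classification rules, sampling procedure, and validation are not reproduced.

```python
def fit_cost_formula(samples):
    """Fit cost = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    samples: list of (features, observed_cost) pairs for one query class.
    Returns the coefficients [b0, b1, ..., bk].
    """
    rows = [[1.0] + [float(v) for v in x] for x, _ in samples]
    costs = [float(c) for _, c in samples]
    k = len(rows[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * c for r, c in zip(rows, costs))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("sample queries do not determine the formula")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def estimate_cost(coefficients, features):
    """Apply a fitted cost formula to the features of a new query."""
    return coefficients[0] + sum(
        b * x for b, x in zip(coefficients[1:], features))


# Hypothetical sample: (operand size, result size) in thousands of tuples,
# paired with an observed cost in seconds.
samples = [((1, 1), 5.5), ((2, 1), 8.5), ((1, 4), 7.0), ((3, 9), 15.5)]
coefficients = fit_cost_formula(samples)
predicted = estimate_cost(coefficients, (2, 4))
```

Fitting one formula per class, rather than one for all queries, reflects the abstract's point that component queries are first grouped into homogeneous classes.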

9.
In a multidatabase system that consists of object databases, the same real-world entity can be stored as objects in different databases with incompatible object identifiers. An important issue in providing a well-structured global object schema and a more informative query result is how to identify and integrate the objects representing the same entities such that (a) object duplication in the query result can be avoided, (b) information for the entity can be gathered, and (c) the specialization of multiple classes can be built. In this paper, we extend our results on probabilistic query processing and joining relations on incompatible keys to solve the problem. Various data and schema conflicts, such as missing data, inconsistent data, and domain mismatch, which may exist in classes from different databases, are considered in the process of identification.
Recommended by: Amit Sheth

10.
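A Mainstream Database Integration System Based on the Distributed Object Web   Cited by: 2 (self-citations: 0, citations by others: 2)
Targeting the new generation of distributed-object Web architecture, this work combines global database technology, global database transaction management, object-based componentization of mainstream databases, and advanced CORBA object technology. With practical use as the goal, it studies and implements the following: the CORBA mechanism supporting heterogeneous distributed systems; the global database schema; the definition, syntactic and semantic analysis, and optimization of the global query language; the object-based representation of mainstream databases; transaction management; concurrency control; security management; integrity handling for global transactions; integration tools for mainstream databases; and management and maintenance strategies for the multidatabase system.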

11.
Advances in networking and database technology have made global information sharing a reality. Multidatabase systems (MDBSs) represent a promising approach to addressing the challenges of achieving interoperability among multiple pre-existing databases that are highly autonomous and possibly heterogeneous. The performance of an MDBS depends greatly on the effectiveness of multidatabase query optimization (MQO). However, the unavailability of and uncertainty in the statistics essential to query optimization have made MQO significantly more challenging than distributed query optimization. This research undertook to develop a fuzzy statistics-based MQO approach to addressing statistics estimation and uncertainty problems in an MDBS environment. We analyzed the statistics needed in an MDBS environment and classified them into three categories: point-based, distribution-function-based, and dependency-based. Fuzzy numbers were adopted to represent point-based statistics, and a fuzzy polynomial regression method was developed for estimating distribution-function-based statistics (i.e., attribute or join selectivity) from a set of subquery results. For dependency-based statistics, a fuzzy regression method was employed for estimating logical-parameter-based local cost functions. Furthermore, methods for ranking the fuzzy numbers that are fundamental to fuzzy-statistics-based MQO were also discussed. The proposed fuzzy statistics estimation methods were illustrated using examples to demonstrate their applicability in supporting MQO.

12.
This paper presents an approach to query decomposition in a multidatabase environment. The unique aspect of this approach is that it is based on performing transformations over an object algebra that can be used as the basis for a global query language. In the paper, we first present our multidatabase environment and semantic framework, where a global conceptual schema based on the Object Data Management Group standard encompasses the information from heterogeneous data sources that include relational databases as well as object-oriented databases and flat file sources. The meta-data about the global schema is enhanced with information about virtual classes as well as virtual relationships and inheritance hierarchies that exist between multiple sources. The AQUA object algebra is used as the formal foundation for manipulation of the query expression over the multidatabase. AQUA is enhanced with distribution operators for dealing with data distribution issues. During query decomposition we perform an extensive analysis of traversals for path expressions that involve virtual relationships and hierarchies for access to several heterogeneous sources. The distribution operators defined in algebraic terms enhance the global algebra expression with semantic information about the structure, distribution, and localization of the data sources relevant to the solution of the query. By using an object algebra as the basis for query processing, we are able to define algebraic transformations and exploit rewriting techniques during the decomposition phase. Our use of an object algebra also provides a formal and uniform representation for dealing with an object-oriented approach to multidatabase query processing. 
As part of our query processing discussion, we include an overview of a global object identification approach for relating semantically equivalent objects from diverse data sources, illustrating how knowledge about global object identity is used in the decomposition and assembly processes.

13.
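This paper describes how query optimization is implemented in panorama, a multidatabase system developed by the authors. It proposes a dynamic query optimization technique for multidatabase systems in a distributed object management architecture environment. During query optimization, a statistical decision mechanism based on a multiple linear regression model is used. Because query optimization in a multidatabase is also related to how schema integration is implemented, schema integration in multidatabase systems is described as well.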

14.
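In the complex data environment of the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC), several relational data sources record information about how data is organized and distributed. To give the data query system an exact keyword query capability, keyword queries are dynamically translated into SQL by analyzing the database schema graph, and a cross-database keyword query system is designed and implemented. To resolve the ambiguity that arises during dynamic translation, a schema graph analysis algorithm based on query entities and a dynamic join algorithm based on minimum-weight tree search are proposed. Experimental results show that the dynamic join algorithm correctly generates the joins between the database tables needed for a keyword query, giving the keyword query system high query efficiency and meeting users' needs for real-time, exact queries.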

15.
We consider adaptive index utilization as a fine-grained problem in autonomic databases in which the system decides dynamically whether or not to use an existing index in query processing. As a special case, we study this problem for structural joins, the core operator in XML query processing, in main memory. We find that index utilization is beneficial for structural joins only under certain join selectivities and distributions of matching elements. Therefore, we propose adaptive algorithms to decide whether to use an index probe or a data scan for each step of matching during the processing of a structural join operator. Our adaptive algorithms are based on the history, the look-ahead information, or both. We have developed a cost model to facilitate this adaptation and have conducted experiments with both synthetic and real-world data sets. Our results show that adaptively utilizing indexes in a structural join improves performance by taking advantage of both sequential scans and index probes.

16.
We consider adaptive index utilization as a fine-grained problem in autonomic databases in which the system decides dynamically whether or not to use an existing index in query processing. As a special case, we study this problem for structural joins, the core operator in XML query processing, in main memory. We find that index utilization is beneficial for structural joins only under certain join selectivities and distributions of matching elements. Therefore, we propose adaptive algorithms to decide whether to use an index probe or a data scan for each step of matching during the processing of a structural join operator. Our adaptive algorithms are based on the history, the look-ahead information, or both. We have developed a cost model to facilitate this adaptation and have conducted experiments with both synthetic and real-world data sets. Our results show that adaptively utilizing indexes in a structural join improves performance by taking advantage of both sequential scans and index probes.

17.
The existence of semantic conflicts between component databases severely impacts query processing in a multidatabase system. In this paper, we describe two types of semantic conflicts that have to be dealt with in the integration of databases modeling information about related sets of real-world entities. These are the entity identification problem and the attribute value conflict problem. While the two-way outerjoin operation has been commonly used for resolving the entity identification problem between two component relations, outerjoins using regular equality comparisons between component relation keys are shown to produce counter-intuitive entity identification results. We remedy this by defining a new key-equality comparator, in place of the regular equality comparator, for outerjoins. For the attribute value conflict problem, we define a Generalized Attribute Derivation (GAD) operation which allows user-defined attribute derivation functions to be used to compute new attributes from the component relations' attributes. With two-way outerjoin and GAD added to the set of relational operations, the traditional algebraic transformation framework for relational queries is no longer adequate for multidatabase query processing and optimization. As a result, we introduce the constrained query tree as the multidatabase query representation. We show that some knowledge about query predicates and attribute derivation functions can be used to simplify queries. Such knowledge is modeled as an outerjoin graph attached to every outerjoin operation in the query tree. Based on this, we further extend the traditional algebraic transformation framework to include two-way outerjoins and GAD operations. Our framework demonstrates that properties of selection/join predicates and attribute derivation functions can be used to provide interesting transformation alternatives. This framework also serves as a formal ground for developing optimization strategies for multidatabase queries.
Recommended by: Clement Yu

18.
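王进鹏, 张亚非, 苗壮. 《计算机科学》 (Computer Science), 2010, 37(12): 134-137
To achieve semantic integration of heterogeneous relational databases and address the problems of traditional integration techniques, this paper analyzes the Semantic Web and related technologies, studies query processing in ontology-based relational data integration systems, and proposes an ontology-based framework for integrating relational databases. A method for describing relational data with ontologies is designed, in which the ontology serves as the global schema of the integration and describes the semantics of the relational schemas. A query rewriting algorithm is also designed; it rewrites SPARQL queries over the global schema into queries against specific relational databases, thereby integrating heterogeneous relational databases. Experiments show that the algorithm scales well.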

19.
Data warehouses are very large databases usually designed using the star schema. Queries defined on data warehouses are generally complex because of the join operations involved. The performance of star schema queries in data warehouses is highly critical, and optimizing it is hard in general. Several query performance optimization methods exist, such as indexes and table partitioning. In this paper, we propose a new approach based on binary particle swarm optimization for solving the bitmap join index selection problem in data warehouses. This approach selects the optimal set of bitmap join indexes based on a mathematical cost model. Several experiments are performed to demonstrate the effectiveness of the proposed method on the bitmap join index selection problem. Further testing of the method is performed using a cost function specific to the database environment. Binary particle swarm optimization is found to be more effective than both the genetic algorithm and data-mining-based approaches.

20.
Heterogeneities exist in a multidatabase environment. For example, a real-world entity may be represented differently in relations of different databases. In particular, keys of these relations may be incompatible. In this paper, we consider processing entity join queries when data transmission cost dominates. An entity join operation 'integrates' tuples representing the same entities from different relations in which inconsistent data may exist. A natural way to process the entity join is to transmit both relations to one site, resolve the possible conflicts between corresponding attributes, and process the join, which is very costly. In this paper, an approach is proposed to correctly transform a global query into local subqueries that preprocess entity join queries at multiple sites in an attempt to lower the cost of data transmission. In addition, an extension of the traditional semijoin, named the extended semijoin, is proposed to further reduce the cost of data transmission for entity join query processing.
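
For context, the traditional semijoin that this abstract extends can be sketched as below. Only the distinct join keys of one relation are shipped to the other site, which returns just the tuples that can participate in the join. This is the classic technique, not the paper's extended semijoin: it assumes compatible keys and exact equality, and the names `semijoin_reduce` and `join_via_semijoin` and the sample relations are made up for the example.

```python
def semijoin_reduce(s_rows, r_keys, key):
    """At S's site: keep only the S tuples whose join key occurs in R."""
    return [row for row in s_rows if row[key] in r_keys]


def join_via_semijoin(r_rows, s_rows, key):
    """Join R and S at R's site, shipping a reduced S instead of all of S.

    Returns the joined tuples and the number of S tuples transmitted.
    """
    # Step 1: R's site ships only its distinct join keys to S's site.
    r_keys = {row[key] for row in r_rows}
    # Step 2: S's site filters S and ships the surviving tuples back.
    reduced_s = semijoin_reduce(s_rows, r_keys, key)
    # Step 3: R's site joins R with the reduced S.
    by_key = {}
    for row in reduced_s:
        by_key.setdefault(row[key], []).append(row)
    joined = [{**r, **s} for r in r_rows for s in by_key.get(r[key], [])]
    return joined, len(reduced_s)


r_rows = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
s_rows = [{"id": 2, "dept": "DB"}, {"id": 3, "dept": "OS"},
          {"id": 4, "dept": "AI"}]
joined, shipped = join_via_semijoin(r_rows, s_rows, "id")
```

In this toy case one S tuple is shipped instead of three. With incompatible keys, the equality test in `semijoin_reduce` is exactly the step that no longer works as written, which is the gap the extended semijoin is meant to address.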
