首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Transforming queries for efficient execution is particularly important in federated database systems since a more efficient execution plan can require many fewer data requests to be sent to the component databases. Also, it is important to do as much as possible of the selection and processing close to where the data are stored, making best use of facilities provided by the federation's component database management systems. In this paper we address the problem of processing complex queries including quantifiers, which have to be executed against different databases in an expanding heterogeneous federation. This is done by transforming queries within a mediator for global query improvement, and within wrappers to make the best use of the query processing capabilities of external databases. Our approach is based on pattern matching and query rewriting. We introduce a high level language for expressing rewrite rules declaratively, and demonstrate the use and flexibility of such rules in improving query performance for existentially quantified subqueries. Extensions to this language that allow generic rewrite rules to be expressed are also presented. The value of performing final transformations within a wrapper for a given remote database is shown in several examples that use AMOS II—an SQL3-like system.  相似文献   

2.
A key problem in providing ‘enterprise-wide’ information is the integration of databases that have been independently developed. An important requirement is to accommodate heterogeneity and maintain the autonomy of component databases. Myriad is a federated database prototype developed at the University of Minnesota, to provide a testbed for investigating alternatives in architecture and algorithms for database integration, query processing and optimization, and concurrency control and recovery. The system incorporates our group's research results in these areas. This paper describes our experiences in the design and implementation of Myriad, and in the project management. Special emphasis is given to discussing design alternatives and their impact on Myriad. This paper also presents the software engineering principles and the project management techniques we used in developing Myriad and the lessons we learned. We believe these lessons would be useful for practitioners who wish to develop a similar system. Handling heterogeneity and autonomy were prime objectives throughout the prototyping effort. We are convinced that a prototype federated database is an important infrastructural requirement for the overall goal of ‘enterprise-integration’, and believe Myriad to be a significant contribution towards this.  相似文献   

3.
Management of large quantities of complex data is essential in many advanced application areas. Object-oriented (OO) database management system have been developed to effectively model and process the complex domain knowledge. They have been shown to outperform some existing relational systems. The existing implementations of OO database management systems attempt to improve the efficiency of OO queries by explicitly capturing the relationships among objects. However, the execution of complex queries involving the retrieval of objects from many classes and relationships among them causes the existing system to operate inefficiently. In this paper, we present parallel algorithms for the processing of queries against a large OO database. The algorithms are based on a closed model of query processing pattern-based access instead of the conventional value-based access. During processing, the algorithms avoid the execution of time-consuming join operations by making use of the explicitly stored object associations. Generation of large quantities of temporary data is avoided by marking objects using their identifiers and by employing a two-phase query processing strategy. A query is processed by concurrent multiple waves, thereby improving parallelism avoiding the complexities introduced in their sequential implementation. The correctness and the performance of the parallel algorithms have been tested and analyzed by running parallel programs on a 32-node transputer based parallel machine designed and developed at the IBM Research Center at Yorktown Heights, New York. Benchmark queries of different semantic complexities are generated, and their performance is analyzed for various data and query parameters  相似文献   

4.
Advanced application domains such as computer-aided design, computer-aided software engineering, and office automation are characterized by their need to store, retrieve, and manage large quantities of data having complex structures. A number of object-oriented database management systems (OODBMS) are currently available that can effectively capture and process the complex data. The existing implementations of OODBMS outperform relational systems by maintaining and querying cross-references among related objects. However, the existing OODBMS still do not meet the efficiency requirements of advanced applications that require the execution of complex queries involving the retrieval of a large number of data objects and relationships among them. Parallel execution can significantly improve the performance of complex OO queries. In this paper, we analyze the performance of parallel OO query processing algorithms for various benchmark application domains. The application domains are characterized by specific mixes of queries of different semantic complexities. The performance of the application domains has been analyzed for various system and data parameters by running parallel programs on a 32-node transputer based parallel machine developed at the IBM Research Center at Yorktown Heights. The parallel processing algorithms, data routing techniques, and query management and control strategies have been implemented to obtain accurate estimation of controlling and processing overheads. However, generation of large complex databases for the study was impractical. Hence, the data used in the simulation have been parameterized. The parallel OO query processing algorithms analyzed in this study are based on a query graph approach rather than the traditional query tree approach. Using the query graph approach, a query is processed by simultaneously initiating the execution at several object classes, thereby, improving the parallelism. During processing, the algorithms avoid the execution of time-consuming join operations by making use of the object references among the objects. Further, the algorithms do not generate any temporary data, thereby, reducing disk accesses. This is accomplished by marking the selected objects and by employing a two-phase query processing strategy.  相似文献   

5.
Wrappers export the schema and data of existing heterogeneous databases and support queries on them. In the context of cooperative information systems, we present a flexible approach to specify the derivation of object-oriented (OO) export databases from local relational databases. Our export database derivation consists of a set of extent derivation structures (EDS) which defines the extent and deep extent of export classes. Having well-defined semantics, the EDS can be readily used in transforming wrapper queries to local queries. Based on the EDS, we developed a wrapper query evaluation strategy which handles OO queries on the export databases. The strategy is unique in that it considers the limited query processing capabilities of local database systems as well as the language constraints on the local query languages.  相似文献   

6.
In this research, we address the query clustering problem which involves determining globally optimal execution strategies for a set of queries. The need to process a set of queries together often arises in deductive database systems, scientific database systems, large bibliographic retrieval systems and several other database applications. We address the optimization problem from the perspective of overlaps in data requirements, and model the batched operations using a set-partitioning approach. In this model, we first consider the case of m queries each involving a two-way join operation. We develop a recursive methodology to determine all the processing strategies in this case. Next, we establish certain dominance properties among the strategies, and develop exact as well as heuristic algorithms for selecting an appropriate strategy. We extend this analysis to a clustering approach, and outline a framework for optimizing multiway joins. The results show that the proposed approach is viable and efficient, and can easily be incorporated into the query processing component of most database systems  相似文献   

7.
异构数据源集成系统需要集成查询能力差别较大的各种数据源。为充分利用数据源的查询能力和简化包装器设计,本文提出使用复合法进行基于能力的查询分解和优化。在复合法中,仅需简单说明数据的基本查询能力,各数据源对查询处理的特殊限制封装在包装器中。文中给出描述数据源基本查询能力的方法,并且在讨论OIM对象代数操作的概括查询和过滤子的基础上,探讨数据源包装器对基本支持查询的处理方法。  相似文献   

8.
本文在分析联邦数据库系统成员异构的特点后,论述了一种基于输入/输出模式和概念模式的转换方法,通过该转换处理可以有效地解决模式异构的问题,这种方法已在SU-FDBS原型中实现。  相似文献   

9.
A federated database is comprised of multiple interconnected database systems that primarily operate independently but cooperate to a certain extent. Global integrity constraints can be very useful in federated databases, but the lack of global queries, global transaction mechanisms, and global concurrency control renders traditional constraint management techniques inapplicable. This paper presents a threefold contribution to integrity constraint checking in federated databases: (1) The problem of constraint checking in a federated database environment is clearly formulated. (2) A family of protocols for constraint checking is presented. (3) The differences across protocols in the family are analyzed with respect to system requirements, properties guaranteed by the protocols, and processing and communication costs. Thus, our work yields a suite of options from which a protocol can be chosen to suit the system capabilities and integrity requirements of a particular federated database environment.  相似文献   

10.
随着语义网的快速发展,为了实现科学数据的共享,越来越多的科学数据被加工发布为关联数据,进而应用于关联查询和关联发现。针对大规模关联数据的管理,本文通过构建 RDF 数据库集群来存储海量数据,设计了基于 SPARQL 端点的联合查询系统来解决用户跨机器透明查询的问题,分析了存储策略和联合查询系统的查询处理相关技术。实际运行表明,本平台易于集成使用,可以实现大规模 RDF 数据的可扩展性存储和有效查询。  相似文献   

11.
联邦SPARQL查询是通过构建查询计划来指导查询执行,数据摘要索引文件捕获了RDF数据集的结构和语义信息,对查询计划生成过程中子查询基数评估至关重要。现有的数据摘要生成方法需要远程遍历每个数据源的完整数据,该过程成本消耗较高,且在大部分环境中联邦查询无法完成对大数据集的统计工作。为在减少数据摘要索引文件生成时间和内存开销的同时捕获尽可能真实的计数信息,考虑主语和谓语的分布偏差,提出利用样图生成原始图近似数据摘要的方法。使用对RDF图出度特征加权的采样方法获取原始图的典型样图,通过改进的映射函数将样图中的信息映射到原始图上,从而生成原始图的近似数据摘要。实验结果表明,该方法相比于基线方法至少节省了70%的数据摘要索引文件生成时间,并且仅采样0.5%的原始图生成的近似数据摘要即可在查询正确率上与基线方法保持高度一致。  相似文献   

12.
13.
Redundant processing is a key problem in the translation of initial queries posed over an ontology into SQL queries, through mappings, as it is performed by ontology-based data access systems. Examples of such processing are duplicate answers obtained during query evaluation, which must finally be discarded, or common expressions evaluated multiple times from different parts of the same complex query. Many optimizations that aim to minimize this problem have been proposed and implemented, mostly based on semantic query optimization techniques, by exploiting ontological axioms and constraints defined in the database schema. However, data operations that introduce redundant processing are still generated in many practical settings, and this is a factor that impacts query execution. In this work we propose a cost-based method for query translation, which starts from an initial result and uses information about redundant processing in order to come up with an equivalent, more efficient translation. The method operates in a number of steps, by relying on certain heuristics indicating that we obtain a more efficient query in each step. Through experimental evaluation using the Ontop system for ontology-based data access, we exhibit the benefits of our method.  相似文献   

14.
Metric databases are databases where a metric distance function is defined for pairs of database objects. In such databases, similarity queries in the form of range queries or k-nearest-neighbor queries are the most important query types. In traditional query processing, single queries are issued independently by different users. In many data mining applications, however, the database is typically explored by iteratively asking similarity queries for answers of previous similarity queries. We introduce a generic scheme for such data mining algorithms and we investigate two orthogonal approaches, reducing I/O cost as well as CPU cost, to speed-up the processing of multiple similarity queries. The proposed techniques apply to any type of similarity query and to an implementation based on an index or using a sequential scan. Parallelization yields an additional impressive speed-up. An extensive performance evaluation confirms the efficiency of our approach  相似文献   

15.
The rapidly increasing scale of data warehouses is challenging today’s data analytical technologies. A conventional data analytical platform processes data warehouse queries using a star schema — it normalizes the data into a fact table and a number of dimension tables, and during query processing it selectively joins the tables according to users’ demands. This model is space economical. However, it faces two problems when applied to big data. First, join is an expensive operation, which prohibits a parallel database or a MapReduce-based system from achieving efficiency and scalability simultaneously. Second, join operations have to be executed repeatedly, while numerous join results can actually be reused by different queries. In this paper, we propose a new query processing framework for data warehouses. It pushes the join operations partially to the pre-processing phase and partially to the postprocessing phase, so that data warehouse queries can be transformed into massive parallelized filter-aggregation operations on the fact table. In contrast to the conventional query processing models, our approach is efficient, scalable and stable despite of the large number of tables involved in the join. It is especially suitable for a large-scale parallel data warehouse. Our empirical evaluation on Hadoop shows that our framework exhibits linear scalability and outperforms some existing approaches by an order of magnitude.  相似文献   

16.
Query processing over object views of relational data   总被引:2,自引:0,他引:2  
This paper presents an approach to object view management for relational databases. Such a view mechanism makes it possible for users to transparently work with data in a relational database as if it was stored in an object-oriented (OO) database. A query against the object view is translated to one or several queries against the relational database. The results of these queries are then processed to form an answer to the initial query. The approach is not restricted to a ‘pure’ object view mechanism for the relational data, since the object view can also store its own data and methods. Therefore it must be possible to process queries that combine local data residing in the object view with data retrieved from the relational database. We discuss the key issues when object views of relational databases are developed, namely: how to map relational structures to sub-type/supertype hierarchies in the view, how to represent relational database access in OO query plans, how to provide the concept of object identity in the view, how to handle the fact that the extension of types in the view depends on the state of the relational database, and how to process and optimize queries against the object view. The results are based on experiences from a running prototype implementation. Edited by: M.T. ?zsu. Received April 12, 1995 / Accepted April 22, 1996  相似文献   

17.
Skyline query processing over uncertain data streams has attracted considerable attention in database community recently, due to its importance in helping users make intelligent decisions over complex data in many real applications. Although lots of recent efforts have been conducted to the skyline computation over data streams in a centralized environment typically with one processor, they cannot be well adapted to the skyline queries over complex uncertain streaming data, due to the computational complexity of the query and the limited processing capability. Furthermore, none of the existing studies on parallel skyline computation can effectively address the skyline query problem over uncertain data streams, as they are all developed to address the problem of parallel skyline queries over static certain data sets. In this paper, we formally define the parallel query problem over uncertain data streams with the sliding window streaming model. Particularly, for the first time, we propose an effective framework, named distributed parallel framework to address the problem based on the sliding window partitioning. Furthermore, we propose an efficient approach (parallel streaming skyline) to further optimize the parallel skyline computation with an optimized streaming item mapping strategy and the grid index. Extensive experiments with real deployment over synthetic and real data are conducted to demonstrate the effectiveness and efficiency of the proposed techniques.  相似文献   

18.
In this paper, we propose an intelligent distributed query processing method considering the characteristics of a distributed ontology environment. We suggest more general models of the distributed ontology query and the semantic mapping among distributed ontologies compared with the previous works. Our approach rewrites a distributed ontology query into multiple distributed ontology queries using the semantic mapping, and we can obtain the integrated answer through the execution of these queries. Furthermore, we propose a distributed ontology query processing algorithm with several query optimization techniques: pruning rules to remove unnecessary queries, a cost model considering site load balancing and caching, and a heuristic strategy for scheduling plans to be executed at a local site. Finally, experimental results show that our optimization techniques are effective to reduce the response time.  相似文献   

19.
Many large programs operate on collection types. Extensive libraries are available in many programming languages, such as the C++ Standard Template Library, which make programming with collections convenient. Extending programming languages to provide collection queries as first class constructs in the language would not only allow programmers to write queries explicitly in their programs but it would also allow compilers to leverage the wealth of experience available from the database domain to optimize such queries. This paper describes an approach to reduce the run time of programs involving explicit collection queries by performing run time query optimization that is effective for single runs of a program. In addition, it also leverages a cache to store previously computed results. The proposed approach relies on histograms built from the data at run time to estimate the selectivity of joins and predicates in order to construct query plans. Information from earlier executions of the same query during run time is leveraged during the construction of the query plans, even when the data has changed between these executions. An effective cache policy is also determined for caching the results of join (sub) queries. The cache is maintained incrementally, when the underlying collections change, and use of the cache space is optimized by a cache replacement policy. Our approach has been implemented within the Java Query Language (JQL) framework using AspectJ. Our approach demonstrated that its run time query optimization in integration with caching sub query result significantly improves the run time of programs with explicit queries over equivalent programs performing collection operations by iterating over those collections. This paper evaluates our approach using synthetic as well as real world Robocode programs by comparing it to JQL as a benchmark. Experimental results show that our approach performs better than the JQL approach with respect to the program run time.  相似文献   

20.
In migrating a legacy relational database system to the object-oriented (OO) platform, when database migration completes, application modules are to be migrated, where embedded relational database operations are mapped into their OO correspondents. In this paper we study mapping relational update operations to their OO equivalents, which include UPDATE1, INSERT and DELETE operations. Relational update operation translation from relational to OO faces the touchy problem of transformation from a value-based relationship model to a reference-based model and maintaining the relational integrity constraints. Moreover, with a relational database where inheritance is expressed as attribute value subset relationship, changing of some attribute values may lead to the change of the position of an object in the class inheritance hierarchy, which we call object migration. Considering all these aspects, algorithms are given mapping relational UPDATE, INSERT and DELETE operations to their OO correspondents. Our work emphasize in examining the differences in the representation of the source schema's semantics resulting from the translation process, as well as differences in the inherent semantics of the two models.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号