Similar literature
Found 20 similar documents; search took 15 ms
1.
Scaling access to heterogeneous data sources with DISCO   Total citations: 5, self-citations: 0, citations by others: 5
Accessing many data sources aggravates problems for users of heterogeneous distributed databases. Database administrators must deal with fragile mediators, that is, mediators with schemas and views that must be significantly changed to incorporate a new data source. When implementing translators of queries from mediators to data sources, database implementers must deal with data sources that do not support all the functionality required by mediators. Application programmers must deal with graceless failures for unavailable data sources: queries simply return failure and no further information when data sources are unavailable for query processing. The Distributed Information Search COmponent (Disco) addresses these problems. Data modeling techniques manage the connections to data sources, and sources can be added transparently to the users and applications. The interface between mediators and data sources flexibly handles different query languages and different data source functionality. Query rewriting and optimization techniques rewrite queries so they are efficiently evaluated by sources. Query processing and evaluation semantics are developed to process queries over unavailable data sources. In this article, we describe: 1) the distributed mediator architecture of Disco; 2) the data model and its modeling of data source connections; 3) the interface to underlying data sources and the query rewriting process; and 4) query processing semantics. We describe several advantages of our system.

2.
3.
In this work we present an architecture for XML-based mediator systems and a framework for helping systems developers in the construction of mediator-services for the integration of heterogeneous data sources. A unique feature of our architecture is its capability to manage users' (proprietary) software tools and algorithms, modelled as Extended Value Added Services (EVASs), and integrated in the data flow. The mediator offers a view of the system as a single data source where EVASs are readily available for enhancing query processing. A Web-based graphic interface has been developed to allow dynamic and flexible EVASs inter-connection, thus creating complex distributed bioinformatics machines. The feasibility and usefulness of our ideas have been validated by the development of a mediator system (Bio-Broker) and by a diverse set of applications aimed at combining gene expression data with genomic, sequence-based and structural information, so as to provide a general, transparent and powerful solution that integrates data analysis tools and algorithms. Copyright © 2006 John Wiley & Sons, Ltd.

4.
New applications of information systems need to integrate a large number of heterogeneous databases over computer networks. Answering a query in these applications usually involves selecting relevant information sources and generating a query plan to combine the data automatically. As significant progress has been made in source selection and plan generation, the critical issue has been shifting to query optimization. This paper presents a semantic query optimization (SQO) approach to optimizing query plans of heterogeneous multidatabase systems. This approach provides global optimization for query plans as well as local optimization for subqueries that retrieve data from individual database sources. An important feature of our local optimization algorithm is that we prove necessary and sufficient conditions to eliminate an unnecessary join in a conjunctive query of arbitrary join topology. This feature allows our optimizer to utilize more expressive relational rules to provide a wider range of possible optimizations than previous work in SQO. The local optimization algorithm also features a new data structure called AND-OR implication graphs to facilitate the search for optimal queries. These features allow the global optimization to effectively use semantic knowledge to reduce the data transmission cost. We have implemented this approach in the PESTO (Plan Enhancement by SemanTic Optimization) query plan optimizer as a part of the SIMS information mediator. Experimental results demonstrate that PESTO can provide significant savings in query execution cost over query plan execution without optimization.

5.
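A Schema-based data management framework is proposed. The framework uses a layered architecture and the global-as-view (GAV) integration method to semantically integrate distributed heterogeneous XML and RDF data sources. The composition of the layered architecture, the mapping process, and query processing are discussed. Experimental results demonstrate the feasibility of the framework.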

6.
Flexible distributed query processing capabilities are an important prerequisite for building scalable Internet applications, such as electronic Business-to-Business (B2B) market places. Architecting an electronic market place in a conventional data warehouse-like approach by integrating all the data from all participating enterprises in one centralized repository incurs severe problems: stale data, data security threats, administration overhead, inflexibility during query processing, etc. In this paper we present a new framework for dynamic distributed query processing based on so-called HyperQueries, which are essentially query evaluation sub-plans sitting behind hyperlinks. Our approach facilitates the pre-materialization of static data at the market place whereas the dynamic data remains at the data sources. In contrast to traditional data integration systems, our approach executes essential (dynamic) parts of the data-integrating views at the data sources. The other, more static parts of the data are integrated a priori at the central portal, e.g., the market place. The portal serves as an intermediary between clients and data providers, which execute their sub-queries referenced via hyperlinks. The hyperlinks are embedded as attribute values within data objects of the intermediary's database. Retrieving such a virtual object will execute the referenced HyperQuery in order to materialize the missing data. We illustrate the flexibility of this distributed query processing architecture in the context of B2B electronic market places with an example derived from the car manufacturing industry. Based on these HyperQueries, we propose a reference architecture for building scalable and dynamic electronic market places. All administrative tasks in such a distributed B2B market place are modeled as Web services and are initiated decentrally by the participants. Thus, sensitive data remains under the full control of the data providers.
We describe optimization and implementation issues to obtain an efficient and highly flexible data integration platform for electronic market places. All proposed techniques have been fully implemented in our QueryFlow prototype system, which served as the platform for our performance evaluation.
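A rough sketch may help picture how a hyperlink stored as an attribute value can stand in for data that is computed at the provider on retrieval. Everything here is an assumption made for illustration and is not taken from the QueryFlow prototype: the `hq://` link format, the in-process `PROVIDERS` registry that stands in for remote data providers, and the names `HyperLink` and `materialize`.

```python
from typing import Any, Callable, Dict

# Hypothetical registry standing in for remote data providers: each link
# maps to a sub-query that would run at the provider, not at the portal.
PROVIDERS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "hq://supplier-a/price": lambda params: 100.0 - 0.5 * params["quantity"],
}


class HyperLink:
    """Attribute value that refers to a sub-query instead of holding data."""

    def __init__(self, url: str) -> None:
        self.url = url

    def resolve(self, params: Dict[str, Any]) -> Any:
        # Execute the referenced sub-query with the caller's parameters.
        return PROVIDERS[self.url](params)


def materialize(obj: Dict[str, Any], params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of obj with every HyperLink replaced by its result."""
    return {
        key: value.resolve(params) if isinstance(value, HyperLink) else value
        for key, value in obj.items()
    }


if __name__ == "__main__":
    # Static data ("part") lives at the portal; the price stays dynamic.
    entry = {"part": "tire", "price": HyperLink("hq://supplier-a/price")}
    print(materialize(entry, {"quantity": 20}))
```

The static attribute is stored once at the portal, while the dynamic one is computed only when the object is retrieved, so the provider keeps control of its pricing logic.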

7.
8.
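Data sources differ in how they represent data and conflict structurally, which makes querying heterogeneous data sources in a distributed environment difficult. This paper proposes a multi-table query conversion algorithm for distributed environments. During query conversion, the algorithm decomposes the source query, then converts and reconstructs the target query on the target data source, solving the multi-table query conversion problem in data sharing. Experimental results demonstrate the effectiveness of the algorithm.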

9.
AMOS is a mediator system that supports passive (non-intrusive) integration of data from heterogeneous and autonomous data sources. It is based on a functional data model and a declarative functional query language, AMOSQL. Foreign data sources, e.g., relational databases, text files, or other types of data sources, can be wrapped with AMOS mediators, making them accessible through AMOSQL. AMOS mediators can communicate with each other through the multi-database constructs of AMOSQL that allow definition of functional queries and OO views accessing other AMOS servers. The integrated views can contain both functions and types derived from the data sources. Furthermore, local data associated with these view definitions may be stored in the mediator database. This paper describes AMOS' multi-database query facilities and their optimization techniques. Calculus-based function transformations are used to generate minimal query expressions before the query decomposition and cost-based algebraic optimization steps take place. Object identifier (OID) generation is used for correctly representing derived objects in the mediators. A selective OID generation mechanism avoids overhead by generating OIDs in the mediator only for those derived objects that are either needed during the processing of a query or have associated local data in the mediator database. The validity of the derived objects that are assigned OIDs and the completeness of queries to the views are guaranteed by system-generated predicates added to the queries.

10.
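A database grid system supporting multi-domain dynamic data integration   Total citations: 5, self-citations: 0, citations by others: 5
申德荣  于戈  聂铁铮  寇月, 《软件学报》 (Journal of Software), 2006, 17(11): 2302-2313
As public database resources grow richer, widely distributed users want on-demand, transparent access to these data resources. DS_Grid (database grid) is a database grid system that adopts SOA (service-oriented architecture) ideas and supports data sharing across multiple application domains. The system uses a P2P (peer-to-peer) multi-Chord (MultiChord) grid architecture to provide distributed storage of data resources, query processing, and dynamic data integration. Based on text similarity, data resources can be registered by domain, which enables fast resource discovery. Domain ontology knowledge and inference rules support semantics-based intelligent queries. A replica management mechanism for data resources, with multiple root nodes and multi-point maintenance, improves system reliability. A data integration strategy based on keyword filtering reduces communication cost. Distributed clustering is used to display summaries of large volumes of information. Experiments verify the feasibility and effectiveness of the key techniques adopted in DS_Grid.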

11.
12.
Providing integrated access to multiple, distributed, heterogeneous databases and other information sources has become one of the leading issues in database research and the industry. One of the most effective approaches is to extract and integrate information of interest from each source in advance and store it in a centralized repository (known as a data warehouse). When a query is posed, it is evaluated directly at the warehouse without accessing the original information sources. One of the techniques that this approach uses to improve the efficiency of query processing is materialized views. Essentially, materialized views are used for data warehouses, and various methods for relational databases have been developed. In this paper, we first discuss an object deputy approach to realize materialized object views for data warehouses which can also incorporate object-oriented databases. A framework has been developed using Smalltalk to prepare data for data warehousing, in which an object deputy model and database connecting tools have been implemented. The object deputy model can provide an easy-to-use way to resolve inconsistency and conflicts while preparing data for data warehousing, as evidenced by our empirical study.

13.
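董书暕  汪璟玢  陈远, 《计算机科学》 (Computer Science), 2016, 43(3): 220-224, 230
To address the limited-memory constraint of the HMSST (HashMapSelectivityStrategyTree) algorithm in a centralized environment, a new distributed SPARQL query optimization algorithm, HMSST+, is proposed. The algorithm introduces a Redis-based distributed storage scheme; by horizontally scaling storage nodes and scheduling in a distributed manner, it allows queries over massive RDF data to run in the memory of a distributed cluster. The query strategy was evaluated on the LUBM test dataset of 1000 universities. The results show that the proposed method scales better than the HMSST algorithm and achieves better query efficiency than existing distributed query schemes.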

14.
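Heterogeneous data source integration systems aim to give users a uniform access interface. Because the participating data sources are highly autonomous, differ in schema, and are updated frequently, and because each has its own particular restrictions on query capability, locating data sources and optimizing queries during query processing is difficult. Based on an analysis of the characteristics and functional requirements of heterogeneous integration systems, this paper proposes a KQML-based framework for describing data source capabilities, which ensures that each data source can publish its capabilities flexibly and dynamically. Formal specifications then characterize the structural and behavioral features of the data sources, laying the foundation for locating the data sources relevant to a query. They also help the global query processor optimize query plans, shrink the query search space, and improve query efficiency.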

15.
This paper presents an approach to query decomposition in a multidatabase environment. The unique aspect of this approach is that it is based on performing transformations over an object algebra that can be used as the basis for a global query language. In the paper, we first present our multidatabase environment and semantic framework, where a global conceptual schema based on the Object Data Management Group standard encompasses the information from heterogeneous data sources that include relational databases as well as object-oriented databases and flat file sources. The meta-data about the global schema is enhanced with information about virtual classes as well as virtual relationships and inheritance hierarchies that exist between multiple sources. The AQUA object algebra is used as the formal foundation for manipulation of the query expression over the multidatabase. AQUA is enhanced with distribution operators for dealing with data distribution issues. During query decomposition we perform an extensive analysis of traversals for path expressions that involve virtual relationships and hierarchies for access to several heterogeneous sources. The distribution operators defined in algebraic terms enhance the global algebra expression with semantic information about the structure, distribution, and localization of the data sources relevant to the solution of the query. By using an object algebra as the basis for query processing, we are able to define algebraic transformations and exploit rewriting techniques during the decomposition phase. Our use of an object algebra also provides a formal and uniform representation for dealing with an object-oriented approach to multidatabase query processing. 
As part of our query processing discussion, we include an overview of a global object identification approach for relating semantically equivalent objects from diverse data sources, illustrating how knowledge about global object identity is used in the decomposition and assembly processes.

16.
Boolean query mapping across heterogeneous information sources   Total citations: 5, self-citations: 0, citations by others: 5
Searching over heterogeneous information sources is difficult because of the nonuniform query languages. Our approach is to allow a user to compose Boolean queries in one rich front-end language. For each user query and target source, we transform the user query into a subsuming query that can be supported by the source but that may return extra documents. The results are then processed by a filter query to yield the correct final result. We introduce the architecture and associated algorithms for generating the supported subsuming queries and filters. We show that generated subsuming queries return a minimal number of documents; we also discuss how minimal cost filters can be obtained. We have implemented prototype versions of these algorithms and demonstrated them on heterogeneous Boolean systems.
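The subsuming-query-plus-filter idea can be sketched for the simplest case, a conjunction of terms. This is a simplified illustration and not the authors' algorithm: it handles only AND queries, makes no minimality claim, and the names `Term`, `contains`, `near`, `split_query`, and `run` are hypothetical. An unsupported proximity term is relaxed to the weaker keyword terms it implies, and the original term is re-applied afterwards as the filter.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class Term:
    op: str                           # operator name, e.g. "contains"
    test: Callable[[str], bool]       # predicate over a document's text
    weaker: Tuple["Term", ...] = ()   # terms implied by this one


def contains(word: str) -> Term:
    return Term("contains", lambda doc: word in doc.split())


def near(a: str, b: str, k: int) -> Term:
    """True when a and b occur within k word positions of each other."""
    def test(doc: str) -> bool:
        words = doc.split()
        pos_a = [i for i, w in enumerate(words) if w == a]
        pos_b = [i for i, w in enumerate(words) if w == b]
        return any(abs(i - j) <= k for i in pos_a for j in pos_b)
    return Term("near", test, weaker=(contains(a), contains(b)))


def split_query(query: List[Term], supported: Set[str]):
    """Split an AND query into a source-side query and a front-end filter."""
    subsuming: List[Term] = []
    filt: List[Term] = []
    for term in query:
        if term.op in supported:
            subsuming.append(term)
        else:
            # Relax to implied terms the source can run; filter later.
            subsuming.extend(w for w in term.weaker if w.op in supported)
            filt.append(term)
    return subsuming, filt


def run(query: List[Term], supported: Set[str], docs: List[str]) -> List[str]:
    subsuming, filt = split_query(query, supported)
    # Step 1: what the source would evaluate (may return extra documents).
    candidates = [d for d in docs if all(t.test(d) for t in subsuming)]
    # Step 2: the filter query removes the extras at the front end.
    return [d for d in candidates if all(t.test(d) for t in filt)]
```

For a source that supports only `contains`, the query `near("distributed", "query", 1)` is sent as `contains("distributed") AND contains("query")`, and the proximity test runs on the returned documents.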

17.
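To address the complex design, slow query speed, and poor timeliness of query functions in centralized systems, a general solution built on distributed data sources is proposed. It distributes large volumes of data and uses a reverse data access approach to improve query efficiency, enhance scalability, and reduce program complexity. An inheritable, component-assembly query scheme is implemented, which simplifies development and deployment and responds quickly to changing business requirements of many forms.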

18.
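Research on a hierarchical communication structure for multimedia conferencing systems and its algorithms   Total citations: 6, self-citations: 0, citations by others: 6
At present, multimedia conferencing systems use two main communication structures: centralized and fully distributed. In remote multimedia conferencing systems, neither structure easily meets the real-time requirements. To address their shortcomings, this paper proposes a new communication structure, the hierarchical communication structure, which effectively reduces latency in remote multimedia conferencing systems. For different media mixing techniques, we give methods and corresponding algorithms for constructing the optimal hierarchical communication structure. Experimental data show that, compared with the centralized and fully distributed structures, the hierarchical communication structure effectively reduces latency in remote multimedia conferencing systems and therefore better meets their real-time requirements.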

19.
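Among the many analyses and mining applications that use mobile phone call detail records (CDR) as a data source, similar-user queries play an important role as a basic research method. Traditional query algorithms are mostly centralized, but CDR data is inherently generated and stored in a distributed manner, which raises the problem of distributed similar-user queries. Using a real dataset, this paper analyzes the relationship between the local data that users store at each base station and the global data, proposes and implements a distributed query method for relatively similar users based on modeling local call data (Rsu-DQ), and designs experiments on real data to verify the accuracy and efficiency of the proposed method.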

20.
Searching information through the Internet often requires users to separately contact several digital libraries, use each library interface to author the query, analyze retrieval results and merge them with results returned by other libraries. Such a solution could be simplified by using a centralized server that acts as a gateway between the user and several distributed repositories: the centralized server receives the user query, forwards it to federated repositories (possibly translating the query into the specific format required by each repository), and fuses retrieved documents for presentation to the user. To accomplish these tasks efficiently, the centralized server should perform some major operations such as resource selection, query transformation and data fusion. In this paper we report on some aspects of MIND, a system for managing distributed, heterogeneous multimedia libraries (MIND, 2001, http://www.mind-project.org). In particular, this paper focuses on the issue of fusing results returned by different image repositories. The proposed approach is based on normalization of matching scores assigned to retrieved images by individual libraries. Experimental results on a prototype system show the potential of the proposed approach with respect to traditional solutions.
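Score normalization before fusion can be sketched as follows. The abstract does not say which normalization MIND uses, so this example assumes plain min-max scaling per library; the names `min_max` and `fuse` and the library and image identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one library's matching scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every result as equally good.
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def fuse(results: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Merge per-library result lists into one ranking by normalized score."""
    merged: List[Tuple[str, float]] = []
    for library, scores in results.items():
        if not scores:
            continue  # a library that returned nothing contributes nothing
        for image, score in min_max(scores).items():
            merged.append((f"{library}/{image}", score))
    # Highest normalized score first; ties keep their original order.
    return sorted(merged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    ranking = fuse({
        "libA": {"a1": 0.9, "a2": 0.5},                  # scores in [0, 1]
        "libB": {"b1": 300.0, "b2": 100.0, "b3": 200.0}, # raw scores
    })
    for name, score in ranking:
        print(name, score)
```

Without normalization, every result from `libB` would outrank every result from `libA` simply because the two libraries score on different scales.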
