Similar documents
Found 20 similar documents (search time: 453 ms)
1.
We present the design of ObjectGlobe, a distributed and open query processor for Internet data sources. Today, data is published on the Internet via Web servers which have, if at all, very localized query processing capabilities. The goal of the ObjectGlobe project is to establish an open marketplace in which data and query processing capabilities can be distributed and used by any kind of Internet application. Furthermore, ObjectGlobe integrates cycle providers (i.e., machines) which carry out query processing operators. The overall picture is to make it possible to execute a query with – in principle – unrelated query operators, cycle providers, and data sources. Such an infrastructure can serve as enabling technology for scalable e-commerce applications, e.g., B2B and B2C market places, to be able to integrate data and data processing operations of a large number of participants. One of the main challenges in the design of such an open system is to ensure privacy and security. We discuss the ObjectGlobe security requirements, show how basic components such as the optimizer and runtime system need to be extended, and present the results of performance experiments that assess the additional cost for secure distributed query processing. Another challenge is quality of service management so that users can constrain the costs and running times of their queries. Received: 30 October 2000 / Accepted: 14 March 2001 Published online: 7 June 2001

2.
Approximate query mapping: Accounting for translation closeness (total citations: 2; self-citations: 0; citations by others: 2)
In this paper we present a mechanism for approximately translating Boolean query constraints across heterogeneous information sources. Achieving the best translation is challenging because sources support different constraints for formulating queries, and often these constraints cannot be precisely translated. For instance, a query [score>8] might be “perfectly” translated as [rating>0.8] at some site, but can only be approximated as [grade=A] at another. Unlike other work, our general framework adopts a customizable “closeness” metric for the translation that combines both precision and recall. Our results show that for query translation we need to handle interdependencies among both query conjuncts as well as disjuncts. As the basis, we identify the essential requirements of a rule system for users to encode the mappings for atomic semantic units. Our algorithm then translates complex queries by rewriting them in terms of the semantic units. We show that, under practical assumptions, our algorithm generates the best approximate translations with respect to the closeness metric of choice. We also present a case study to show how our technique may be applied in practice. Received: 15 October 2000 / Accepted: 15 April 2001 Published online: 28 June 2001
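
As a rough illustration of a closeness metric that combines precision and recall, the sketch below scores a candidate translation by comparing the items it returns with the items the original query would return. The F-beta combination, the function name `closeness`, and the sample item IDs are assumptions made for illustration; the abstract does not state the form of the metric.

```python
def closeness(original_hits, translated_hits, beta=1.0):
    """Score a translated query against the original with an F-beta measure.

    Assumption: F-beta is only one possible way to combine precision and
    recall; the customizable metric in the paper may differ.
    """
    original_hits = set(original_hits)
    translated_hits = set(translated_hits)
    overlap = len(original_hits & translated_hits)
    if overlap == 0:
        return 0.0
    precision = overlap / len(translated_hits)  # share of returned items that are wanted
    recall = overlap / len(original_hits)       # share of wanted items that are returned
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical item IDs: what [score>8] selects versus two candidate translations.
wanted = {1, 2, 3, 4}
exact = {1, 2, 3, 4}          # e.g. [rating>0.8]
approx = {1, 2, 3, 4, 5, 6}   # e.g. [grade=A], which over-selects
```

Under these assumptions the exact translation scores higher than the over-selecting one, and raising `beta` shifts the weight toward recall.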

3.
Symbolic images are composed of a finite set of symbols that have a semantic meaning. Examples of symbolic images include maps (where the semantic meaning of the symbols is given in the legend), engineering drawings, and floor plans. Two approaches for supporting queries on symbolic-image databases that are based on image content are studied. The classification approach preprocesses all symbolic images and attaches a semantic classification and an associated certainty factor to each object that it finds in the image. The abstraction approach describes each object in the symbolic image by using a vector consisting of the values of some of its features (e.g., shape, genus, etc.). The approaches differ in the way in which responses to queries are computed. In the classification approach, images are retrieved on the basis of whether or not they contain objects that have the same classification as the objects in the query. On the other hand, in the abstraction approach, retrieval is on the basis of similarity of feature vector values of these objects. Methods of integrating these two approaches into a relational multimedia database management system so that symbolic images can be stored and retrieved based on their content are described. Schema definitions and indices that support query specifications involving spatial as well as contextual constraints are presented. Spatial constraints may be based on both locational information (e.g., distance) and relational information (e.g., north of). Different strategies for image retrieval for a number of typical queries using these approaches are described. Estimated costs are derived for these strategies. Results are reported of a comparative study of the two approaches in terms of image insertion time, storage space, retrieval accuracy, and retrieval time. Received June 12, 1998 / Accepted October 13, 1998

4.
Active rules for XML: A new paradigm for E-services (total citations: 1; self-citations: 0; citations by others: 1)
XML is rapidly becoming one of the most widely adopted technologies for information exchange and representation. As the use of XML becomes more widespread, we foresee the development of active XML rules, i.e., rules explicitly designed for the management of XML information. In particular, we argue that active rules for XML offer a natural paradigm for the rapid development of innovative e-services. In the paper, we show how active rules can be specified in the context of XSLT, a pattern-based language for publishing XML documents (promoted by the W3C) which is receiving strong commercial support, and Lorel, a query language for XML documents that is quite popular in the research world. We demonstrate, through simple examples of active rules for XSLT and Lorel, that active rules can be effective for the implementation of e-commerce services. We also discuss the various issues that need to be considered in adapting the notion of relational triggers to the XML context. Received: 30 October 2000 / Accepted: 19 December 2000 Published online: 27 April 2001

5.
In this paper, we re-examine the results of prior work on methods for computing ad hoc joins. We develop a detailed cost model for predicting join algorithm performance, and we use the model to develop cost formulas for the major ad hoc join methods found in the relational database literature. We show that various pieces of “common wisdom” about join algorithm performance fail to hold up when analyzed carefully, and we use our detailed cost model to derive optimal buffer allocation schemes for each of the join methods examined here. We show that optimizing their buffer allocations can lead to large performance improvements, e.g., as much as a 400% improvement in some cases. We also validate our cost model's predictions by measuring an actual implementation of each join algorithm considered. The results of this work should be directly useful to implementors of relational query optimizers and query processing systems. Edited by M. Adiba. Received May 1993 / Accepted April 1996

6.
Similarity queries on complex objects are usually translated into searches among their feature vectors. This paper studies indexing techniques for very high-dimensional (e.g., in hundreds) vectors that are sparse or quasi-sparse, i.e., vectors each having only a small number (e.g., ten) of non-zero or significant values. Based on the R-tree, the paper introduces the xS-tree that uses lossy compression of bounding regions to guarantee a reasonable minimum fan-out within the allocated storage space for each node. In addition, the paper studies the performance and scalability of the xS-tree via experiments. Received: 3 May 1999 / Accepted: 23 October 2000 Published online: 27 April 2001

7.
Summary. When designing distributed systems, one is faced with the problem of verifying a refinement between two specifications, given at different levels of abstraction. Suggested verification techniques in the literature include refinement mappings and various forms of simulation. We present a verification method, in which refinement between two systems is proven by constructing a transducer that inputs a computation of a concrete system and outputs a matching computation of the abstract system. The transducer uses a FIFO queue that holds segments of the concrete computation that have not been matched yet. This allows a finite delay between the occurrence of a concrete event and the determination of the corresponding abstract event. This delay often makes the use of prophecy variables or backward simulation unnecessary. An important generalization of the method is to prove refinement modulo some transformation on the observed sequences of events. The method is adapted by replacing the FIFO queue by a component that allows the appropriate transformation on sequences of events. A particular case is partial-order refinement, i.e., refinement that preserves only a subset of the orderings between events of a system. Examples are sequential consistency and serializability. The case of sequential consistency is illustrated on a proof of sequential consistency of a cache protocol.

8.
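Online bibliographic retrieval service based on Z39.50 (total citations: 7; self-citations: 0; citations by others: 7)
Yang Xiaojiang, Zhang Fuyan. Journal of Software (软件学报), 1999, 10(8): 824-828
Online bibliographic retrieval is an important service that libraries should offer readers in a networked environment. Online bibliographic retrieval services based on Telnet or the Web suffer from inconsistent search interfaces, and the Z39.50 protocol offers a solution to this problem. This paper describes an online bibliographic retrieval system based on Z39.50. The system supports Chinese-language retrieval and multiple MARC (machine-readable cataloguing) formats, including CNMARC, and is flexibly configurable and scalable. The paper also presents the main technical design of the system and briefly compares it with existing related systems.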

9.
Recent advances in computer technologies have made it feasible to provide multimedia services, such as news distribution and entertainment, via high-bandwidth networks. The storage and retrieval of large multimedia objects (e.g., video) becomes a major design issue of the multimedia information system. While most other works on multimedia storage servers assume an on-line disk storage system, we consider a two-tier storage architecture with a robotic tape library as the vast near-line storage and an on-line disk system as the front-line storage. Magnetic tapes are cheaper, more robust, and have a larger capacity; hence, they are more cost effective for large scale storage systems (e.g., video-on-demand (VOD) systems may store tens of thousands of videos). We study in detail the design issues of the tape subsystem and propose some novel tape-scheduling algorithms which give faster response and require less disk buffer space. We also study the disk-striping policy and the data layout on the tape cartridge in order to fully utilize the throughput of the robotic tape system and to minimize the on-line disk storage space.

10.
To efficiently support automated interoperability between ontology-based information systems in distributed environments, the semantic heterogeneity problem has to be dealt with. To do so, traditional approaches have acquired and employed explicit mappings between the corresponding ontologies. Usually these mappings can only be obtained from human domain experts. However, it is too expensive and time-consuming to collect all possible mapping results on distributed information systems. More seriously, as the number of systems in a large-scale peer-to-peer (P2P) network increases, the efficiency of the ontology mapping decreases exponentially. Therefore, in this paper, we propose a novel semantic P2P system, which is capable of (i) sharing and exchanging existing mappings among peers, and (ii) composing shared mappings to build a certain path between two systems. Given two arbitrary peers (i.e., source and destination), the proposed system can provide indirect ontology mappings to make them interoperable. In particular, we have focused on query-based communication for evaluating the proposed ontology mapping composition system. Once direct ontology mappings are collected from candidate peers, a given query can be (i) segmented into a set of sub-queries, and (ii) transformed to another query. With respect to the precision performance, our experimentation has shown an improvement of about 42.5% compared to the keyword-based query searching method.
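
The idea of composing shared mappings along a path between two peers can be sketched as follows. This is a minimal illustration, not the system described in the abstract: the dictionary representation of a mapping, the breadth-first search, and the names `compose_mappings`, `direct`, and the sample concepts are all assumptions.

```python
from collections import deque


def compose_mappings(direct, source, target):
    """Find a shortest chain of direct mappings and compose it.

    direct: {(peer_a, peer_b): {concept_in_a: concept_in_b}} (hypothetical format)
    Returns {concept_in_source: concept_in_target}, or None if no path exists.
    """
    queue = deque([(source, None)])  # None means "no mapping applied yet"
    visited = {source}
    while queue:
        peer, composed = queue.popleft()
        if peer == target:
            return composed if composed is not None else {}
        for (a, b), step in direct.items():
            if a != peer or b in visited:
                continue
            if composed is None:
                nxt = dict(step)
            else:
                # Keep only the concepts that survive this hop.
                nxt = {c: step[mid] for c, mid in composed.items() if mid in step}
            visited.add(b)
            queue.append((b, nxt))
    return None


# Hypothetical peers A, B, C with two direct mappings and no direct A-to-C mapping.
direct = {
    ("A", "B"): {"Author": "Writer", "Title": "Name"},
    ("B", "C"): {"Writer": "Creator"},
}
indirect = compose_mappings(direct, "A", "C")
```

In this sketch a concept drops out of the composed mapping as soon as one hop has no counterpart for it, which is why only `Author` reaches peer C.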

11.
We are interested in defining and querying views in a huge and highly heterogeneous XML repository (Web scale). In this context, view definitions are very large, involving lots of sources, and there is no apparent limitation to their size. This raises interesting problems that we address in the paper: (i) how to distribute views over several machines without having a negative impact on the query translation process; (ii) how to quickly select the relevant part of a view given a query; (iii) how to minimize the cost of communicating potentially large queries to the machines where they will be evaluated. The solution that we propose is based on a simple view definition language that allows for automatic generation of views. The language maps paths in the view abstract DTD to paths in the concrete source DTDs. It enables a distributed implementation of the view system that is scalable both in terms of data and load. In particular, the query translation algorithm is shown to have a good (linear) complexity. Received: November 1, 2001 / Accepted: March 2, 2002 Published online: September 25, 2002

12.
Answering queries using views: A survey (total citations: 25; self-citations: 0; citations by others: 25)
The problem of answering queries using views is to find efficient methods of answering a query using a set of previously defined materialized views over the database, rather than accessing the database relations. The problem has recently received significant attention because of its relevance to a wide variety of data management problems. In query optimization, finding a rewriting of a query using a set of materialized views can yield a more efficient query execution plan. To support the separation of the logical and physical views of data, a storage schema can be described using views over the logical schema. As a result, finding a query execution plan that accesses the storage amounts to solving the problem of answering queries using views. Finally, the problem arises in data integration systems, where data sources can be described as precomputed views over a mediated schema. This article surveys the state of the art on the problem of answering queries using views, and synthesizes the disparate works into a coherent framework. We describe the different applications of the problem, the algorithms proposed to solve it and the relevant theoretical results. Received: 1 August 1999 / Accepted: 23 March 2001 Published online: 6 September 2001

13.
Data integration with uncertainty (total citations: 1; self-citations: 0; citations by others: 1)
This paper reports our first set of results on managing uncertainty in data integration. We posit that data-integration systems need to handle uncertainty at three levels and do so in a principled fashion. First, the semantic mappings between the data sources and the mediated schema may be approximate because there may be too many of them to be created and maintained or because in some domains (e.g., bioinformatics) it is not clear what the mappings should be. Second, the data from the sources may be extracted using information extraction techniques and so may yield erroneous data. Third, queries to the system may be posed with keywords rather than in a structured form. As a first step to building such a system, we introduce the concept of probabilistic schema mappings and analyze their formal foundations. We show that there are two possible semantics for such mappings: by-table semantics assumes that there exists a correct mapping but we do not know what it is; by-tuple semantics assumes that the correct mapping may depend on the particular tuple in the source data. We present the query complexity and algorithms for answering queries in the presence of probabilistic schema mappings, and we describe an algorithm for efficiently computing the top-k answers to queries in such a setting. Finally, we consider using probabilistic mappings in the scenario of data exchange.
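
To make the by-table reading concrete, the sketch below answers a single-attribute projection over one source table when several candidate mappings each carry a probability. It assumes that the probability of an answer is the total probability of the mappings under which that answer is returned; that reading, the function name `by_table_answers`, and the sample phone data are illustrative assumptions rather than the paper's algorithm. By-tuple semantics is not covered.

```python
from collections import defaultdict


def by_table_answers(source_rows, mappings, target_attr):
    """Return {value: probability} for a projection on target_attr.

    mappings: list of (probability, {target_attr: source_attr}) pairs whose
    probabilities are assumed to sum to 1 (hypothetical format).
    """
    probs = defaultdict(float)
    for p, mapping in mappings:
        src_attr = mapping[target_attr]
        # Under by-table semantics one mapping applies to the whole table,
        # so each distinct value it yields gains that mapping's probability.
        for value in {row[src_attr] for row in source_rows}:
            probs[value] += p
    return dict(probs)


# Hypothetical source table and two candidate mappings for the attribute "phone".
rows = [
    {"home_phone": "555-0100", "office_phone": "555-0199"},
    {"home_phone": "555-0101", "office_phone": "555-0100"},
]
mappings = [
    (0.7, {"phone": "home_phone"}),
    (0.3, {"phone": "office_phone"}),
]
answers = by_table_answers(rows, mappings, "phone")
```

Here a value that appears under both candidate mappings accumulates both probabilities, while a value produced by only one mapping keeps that mapping's probability.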

14.
In many decision-making scenarios, decision makers require rapid feedback to their queries, which typically involve aggregates. The traditional blocking execution model can no longer meet the demands of these users. One promising approach in the literature, called online aggregation, evaluates an aggregation query progressively as follows: as soon as certain data have been evaluated, approximate answers are produced with their respective running confidence intervals; as more data are examined, the answers and their corresponding running confidence intervals are refined. In this paper, we extend this approach to handle nested queries with aggregates (i.e., at least one inner query block is an aggregate query) by providing users with (approximate) answers progressively as the inner aggregation query blocks are evaluated. We address the new issues posed by nested queries. In particular, the answer space begins with a superset of the final answers and is refined as the aggregates from the inner query blocks are refined. For the intermediary answers to be meaningful, they have to be interpreted with the aggregates from the inner queries. We also propose a multi-threaded model in evaluating such queries: each query block is assigned to a thread, and the threads can be evaluated concurrently and independently. The time slice across the threads is nondeterministic in the sense that the user controls the relative rate at which these subqueries are being evaluated. For enumerative nested queries, we propose a priority-based evaluation strategy to present answers that are certainly in the final answer space first, before presenting those whose validity may be affected as the inner query aggregates are refined. We implemented a prototype system using Java and evaluated our system. Results for nested queries with one level and multiple levels of nesting are reported. Our results show the effectiveness of the proposed mechanisms in providing progressive feedback that reduces the initial waiting time of users significantly without sacrificing the quality of the answers. Received April 25, 2000 / Accepted June 27, 2000
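
The basic online aggregation loop described above (not the nested-query extension) can be sketched as a running average that reports a confidence interval at regular checkpoints. The normal-approximation interval, the assumption that rows arrive in random order, and the names `online_average` and `report_every` are illustrative choices, not details taken from the paper.

```python
import math
import random


def online_average(values, report_every, z=1.96):
    """Yield (rows_seen, running_mean, half_width) as more rows are examined.

    Assumes rows arrive in random order, so a normal-approximation interval
    (z = 1.96 for roughly 95% confidence) is reasonable.
    """
    n, mean, m2 = 0, 0.0, 0.0
    for x in values:
        # Welford's update keeps the mean and squared deviations in one pass.
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % report_every == 0 and n > 1:
            std_err = math.sqrt(m2 / (n - 1) / n)
            yield n, mean, z * std_err


# Hypothetical data: 10,000 values reported every 1,000 rows.
rng = random.Random(0)
data = [rng.uniform(0, 100) for _ in range(10_000)]
estimates = list(online_average(data, report_every=1000))
```

Each yielded triple is an approximate answer a user could see before the scan finishes; the half-width shrinks as more rows are examined.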

15.
Advances in mobile devices and wireless telecommunication infrastructure already provide mobile users with access to online information sources and services. Compared to the PC world, however, mobile access is still quite restricted, especially with regard to the display of graphical representations, such as images, drawings, diagrams, maps and logos. Since graphical representations are increasingly used in the World Wide Web for the purpose of information presentation, the adaptation of graphics for tiny displays is a challenge that should not be neglected. The current contribution discusses several transformation approaches which might be employed to accomplish this adaptation task.

16.
The Human-Computer Interaction Laboratory (HCIL) of the University of Maryland and NASA have collaborated over three years to refine and apply user interface research concepts developed at HCIL in order to improve the usability of NASA data services. The research focused on dynamic query user interfaces, visualization, and overview + preview designs. An operational prototype, using query previews, was implemented with NASA’s Global Change Master Directory (GCMD), a directory service for earth science datasets. Users can see the histogram of the data distribution over several attributes and choose among attribute values. A result bar shows the cardinality of the result set, thereby preventing users from submitting queries that would have zero hits. Our experience confirmed the importance of metadata accuracy and completeness. The query preview interfaces make visible the problems or gaps in the metadata that are undetectable with classic form fill-in interfaces. This could be seen as a problem, but we think that it will have a long-term beneficial effect on the quality of the metadata as data providers will be compelled to produce more complete and accurate metadata. The adaptation of the research prototype to the NASA data required revised data structures and algorithms. Received: 12 December 1997 / Revised: June 1999
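
A query preview of the kind described above can be approximated by recomputing the result count and the per-attribute histograms each time the user changes a selection. The sketch below is a minimal in-memory illustration; the function name `preview`, the record layout, and the sample attributes are assumptions, not the GCMD prototype's data structures.

```python
from collections import Counter


def preview(records, selected):
    """Return (result_count, per-attribute histograms) for the current selection.

    selected: {attribute: set of allowed values}; an attribute that is absent
    from the dictionary is treated as unconstrained.
    """
    matches = [
        r for r in records
        if all(r[attr] in allowed for attr, allowed in selected.items())
    ]
    attributes = records[0].keys() if records else []
    histograms = {a: Counter(r[a] for r in matches) for a in attributes}
    return len(matches), histograms


# Hypothetical dataset metadata with two attributes.
datasets = [
    {"topic": "atmosphere", "decade": "1980s"},
    {"topic": "atmosphere", "decade": "1990s"},
    {"topic": "oceans", "decade": "1990s"},
]
count, hist = preview(datasets, {"topic": {"atmosphere"}})
```

An interface built on this would show `count` in the result bar and could disable the submit action whenever the count is zero.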

17.
Locating and accessing data repositories with WebSemantics (total citations: 1; self-citations: 0; citations by others: 1)
Many collections of scientific data in particular disciplines are available today on the World Wide Web. Most of these data sources are compliant with some standard for interoperable access. In addition, sources may support a common semantics, i.e., a shared meaning for the data types and their domains. However, sharing data among a global community of users is still difficult for the following reasons: (i) data providers need a mechanism for describing and publishing available sources of data; (ii) data administrators need a mechanism for discovering the location of published sources and obtaining metadata from these sources; and (iii) users need a mechanism for browsing and selecting sources. This paper describes a system, WebSemantics, that accomplishes the above tasks. We describe an architecture for the publication and discovery of scientific data sources, which is an extension of the World Wide Web architecture and protocols. We support catalogs containing metadata about data sources for some application domain. We define a language for discovering sources and querying their metadata. We then describe the WebSemantics prototype. Edited by H. Korth. Received: 15 July 1999 / Accepted: 13 September 2000 Published online: 16 April 2002

18.
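A heterogeneous data source integration system must integrate data sources whose query capabilities differ widely. To make full use of the sources' query capabilities and to simplify wrapper design, this paper proposes a composite method for capability-based query decomposition and optimization. In the composite method, only the basic query capabilities of the data need to be described briefly; each data source's special restrictions on query processing are encapsulated in its wrapper. The paper gives a method for describing the basic query capabilities of data sources and, building on a discussion of generalized queries and filters over the OIM object algebra operations, examines how data source wrappers process the basically supported queries.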

19.
Self-stabilizing depth-first token circulation in arbitrary rooted networks (total citations: 1; self-citations: 0; citations by others: 1)
Abstract. We present a deterministic distributed depth-first token passing protocol on a rooted network. This protocol uses neither the processor identifiers nor the size of the network, but assumes the existence of a distinguished processor, called the root of the network. The protocol is self-stabilizing, meaning that starting from an arbitrary state (in response to an arbitrary perturbation modifying the memory state), it is guaranteed to reach a state with no more than one token in the network. Our protocol implements a strictly fair token circulation scheme. The proposed protocol has extremely small state requirement – only states per processor, i.e., bits per processor, where is the degree of the network. The protocol can be used to implement a strictly fair distributed mutual exclusion in any rooted network. This protocol can also be used to construct a DFS spanning tree. Received: July 1998 / Accepted: April 2000

20.
We consider the task of assigning distinct labels to nodes of an unknown anonymous network in a distributed manner. A priori, nodes do not have any identities, except for one distinguished node, called the source, and do not know the topology or the size of the network. They execute identical algorithms, apart from the source which plays the role of a leader and starts the labeling process. Our goal is to assign short labels, as fast as possible. The quality of a labeling algorithm is measured by the range from which the algorithm picks the labels, or alternatively, the length of the assigned labels. Natural efficiency measures are the time, i.e., the number of rounds required for the label assignment, and the message and bit complexities of the label assignment protocol, i.e., the total number of messages (resp., bits) circulating in the network. We present label assignment algorithms whose time and message complexity are asymptotically optimal and which assign short labels. On the other hand, we establish inherent trade-offs between quality and efficiency for labeling algorithms. Received: July 2000 / Accepted: February 2001
