
1.

A United Framework for Large-Scale Resource Description Framework Stream Processing

Fang Hong, Zhao Bo, Zhang Xiao-Wang, Yang Xuan-Xing 《Journal of Computer Science and Technology》2019,34(4):762-774


2.

Round-Eye: A system for tracking nearest surrounders in moving object environments

Ken C.K. Lee, Josh Schiffman, Baihua Zheng, Wang-Chien Lee, Hong Va Leong 《Journal of Systems and Software》2007,80(12):2063-2076

This paper presents “Round-Eye”, a system for tracking nearest surrounding objects (or nearest surrounders) in moving object environments. This system provides a platform for surveillance applications. The core of this system is the continuous nearest surrounder (NS) query, which maintains views of the nearest objects at distinct angles from query points. This query differs from conventional spatial queries such as range queries and nearest neighbor queries because an NS query considers both the distance and the angular position of objects with respect to a query point at the same time. In our system framework, a centralized server is dedicated to (1) collecting location updates of both objects and queries, (2) determining which NS queries are invalidated by object/query location changes and the corresponding result changes, if any, and (3) refreshing the affected query answers. To enhance system performance in terms of processing time and network bandwidth consumption, we propose several techniques, namely safe regions, partial query reevaluation, and incremental query result updates. Through simulations, we evaluate our system with the proposed techniques over a wide range of settings.
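To make the NS query concrete, here is a minimal one-shot sketch (not Round-Eye's continuous algorithm; safe regions and incremental updates are not modeled). It assumes a hypothetical simplified model in which each object occupies a fixed angular interval `[start, end]` (degrees, no wrap-around past 360) around the query point at a constant distance; the function name `nearest_surrounders` is illustrative only.

```python
from typing import List, Tuple

# Hypothetical object model: (id, start_deg, end_deg, distance), start < end, no wrap-around.
Obj = Tuple[str, float, float, float]


def nearest_surrounders(objects: List[Obj]) -> List[Tuple[float, float, str]]:
    """For each angular segment around the query point, report the nearest covering object."""
    # Every interval endpoint is a place where the nearest object may change.
    cuts = sorted({a for _, s, e, _ in objects for a in (s, e)})
    result: List[Tuple[float, float, str]] = []
    for lo, hi in zip(cuts, cuts[1:]):
        covering = [(d, oid) for oid, s, e, d in objects if s <= lo and hi <= e]
        if not covering:
            continue  # nothing visible in this direction
        _, nearest = min(covering)
        # Merge with the previous segment when the same object stays nearest.
        if result and result[-1][2] == nearest and result[-1][1] == lo:
            result[-1] = (result[-1][0], hi, nearest)
        else:
            result.append((lo, hi, nearest))
    return result
```

For example, an object C that is closer but narrower than B "cuts" B's angular view into two pieces, which is exactly the angular aspect that plain k-NN queries ignore.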

3.

PosDB: An Architecture Overview

G. A. Chernishev, V. A. Galaktionov, V. D. Grigorev, E. S. Klyuchikov, K. K. Smirnov 《Programming and Computer Software》2018,44(1):62-74


4.

Optimization of Distributed Stream Data Loading and Query Techniques

Yi Jia (易佳), Xue Chen (薛晨), Wang Shupeng (王树鹏) 《Computer Science》2017,44(5):172-177

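Distributed stream querying is a real-time query computation method based on data streams that has attracted wide attention and developed rapidly in recent years. This paper surveys the research results of distributed stream processing frameworks for real-time relational queries, and analyzes and compares products for distributed data loading, distributed stream computing, and distributed stream querying. It proposes a distributed stream query model built on Spark Streaming and Apache Kafka that loads multiple file sources concurrently and uses a purpose-built in-memory file system for fast data loading, more than doubling loading speed compared with loading based on Apache Flume. On top of Spark Streaming, a distributed stream query interface based on Spark SQL is implemented, and a method that parses SQL statements with custom code is proposed to carry out distributed queries. Test results show that for complex query statements, the custom SQL parsing approach achieves clearly better query efficiency.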

5.

MaD‐WiSe: a distributed stream management system for wireless sensor networks

Giuseppe Amato, Stefano Chessa, Claudio Vairo 《Software》2010,40(5):431-451

Wireless sensor networks (WSNs) are composed of many sensors with limited memory, processing power, communication bandwidth, and energy, which cooperate in performing a given task. In the last few years, the database paradigm has emerged as a viable solution for managing data in such a context. In this paper we present the MaD‐WiSe system, a distributed query processing framework that moves query processing into the network. MaD‐WiSe reconsiders various aspects of database system design and reinterprets them according to WSN constraints and requirements. In particular, it considers the definition of a query language to formalize the queries, a stream model to manage data acquired by the sensors, a query algebra to define the operators that actually perform the query, and energy-efficient query optimization strategies. Copyright © 2010 John Wiley & Sons, Ltd.

6.

An intelligent query processing for distributed ontologies

Jihyun Lee, Jun-Ki Min 《Journal of Systems and Software》2010,83(1):85-95

In this paper, we propose an intelligent distributed query processing method that takes into account the characteristics of a distributed ontology environment. Compared with previous work, we suggest more general models of the distributed ontology query and of the semantic mapping among distributed ontologies. Our approach rewrites a distributed ontology query into multiple distributed ontology queries using the semantic mapping, and the integrated answer is obtained by executing these queries. Furthermore, we propose a distributed ontology query processing algorithm with several query optimization techniques: pruning rules to remove unnecessary queries, a cost model that considers site load balancing and caching, and a heuristic strategy for scheduling the plans to be executed at a local site. Finally, experimental results show that our optimization techniques are effective in reducing response time.

7.

Efficient Distributed Skyline Queries for Mobile Applications (total citations: 3; self-citations: 0; citations by others: 3)


Ying-Yuan Xiao 《Journal of Computer Science and Technology》2010,25(3):523-536

In this paper, we consider skyline queries in a mobile and distributed environment, where data objects are distributed across sites (database servers) interconnected through a high-speed wired network, and queries are issued by mobile units (laptops, cell phones, etc.) that access the data objects on the database servers through wireless channels. The inherent properties of the mobile computing environment, such as mobility, limited wireless bandwidth, and frequent disconnection, make skyline queries more complicated. We show how to efficiently perform distributed skyline queries in a mobile environment and propose a skyline query processing approach called efficient distributed skyline based on mobile computing (EDS-MC). In EDS-MC, a distributed skyline query is decomposed into five processing phases, and each phase is carefully designed to reduce network communication, network delay, and query response time. We conduct extensive experiments in a simulated mobile database system, and the results demonstrate the superiority of EDS-MC over other skyline query processing techniques for mobile computing.
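The five EDS-MC phases are not described in enough detail here to reproduce. The sketch below shows only the generic property most distributed skyline methods rely on: a point dominated at its own site cannot be in the global skyline, so each site can ship its local skyline instead of all its data. It assumes "smaller is better" in every dimension; the function names are illustrative, not from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse in every dimension and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Block-nested-loop skyline: keep the points no other point dominates."""
    result: List[Point] = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue
        # p survives, so drop any previously kept point that p dominates.
        result = [q for q in result if not dominates(p, q)]
        result.append(p)
    return result


def distributed_skyline(sites: List[List[Point]]) -> List[Point]:
    """Each site sends only its local skyline; the coordinator merges the candidates."""
    candidates = [p for site in sites for p in skyline(site)]
    return skyline(candidates)
```

The merged answer equals the centralized skyline, while only local skyline points cross the network.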

8.

Locating XML Documents in a Peer-to-Peer Network Using Distributed Hash Tables

Praveen R. Rao, Bongki Moon 《IEEE Transactions on Knowledge and Data Engineering》2009,21(12):1737-1752

One of the key challenges in a peer-to-peer (P2P) network is to efficiently locate relevant data sources across a large number of participating peers. With the increasing popularity of the extensible markup language (XML) as a standard for information interchange on the Internet, XML is commonly used as an underlying data model for P2P applications to deal with the heterogeneity of data and enhance the expressiveness of queries. In this paper, we address the problem of efficiently locating relevant XML documents in a P2P network, where a user poses queries in a language such as XPath. We have developed a new system called psiX that runs on top of an existing distributed hashing framework. Under the psiX system, each XML document is mapped into an algebraic signature that captures the structural summary of the document. An XML query pattern is also mapped into a signature. The query's signature is used to locate relevant document signatures. Our signature scheme supports holistic processing of query patterns without breaking them into multiple path queries and processing them individually. The participating peers in the network collectively maintain a collection of distributed hierarchical indexes for the document signatures. Value indexes are built to handle numeric and textual values in XML documents. These indexes are used to process queries with value predicates. Our experimental study on PlanetLab demonstrates that psiX provides an efficient location service in a P2P network for a wide variety of XML documents.

9.

A framework for ranking uncertain distributed database

《Data & Knowledge Engineering》2014

Distribution and uncertainty are among the most important design issues in today's database applications. Many ranking (top-k) query processing techniques have been introduced to address communication cost and the limitations of centralized processing. Many techniques have also been developed for modeling and managing uncertain databases. Although these techniques are efficient, they do not handle uncertain data that is distributed. This paper proposes a framework, based on ranking queries, that deals with both data distribution and uncertainty. Within the proposed framework, communication- and computation-efficient algorithms are investigated for retrieving the top-k tuples from distributed sites. The main objective of these algorithms is to reduce the number of communication rounds and the amount of data transmitted while still ranking efficiently. Experimental results show that both proposed techniques substantially reduce communication cost. Each technique is efficient in different situations: the first performs well when the number of sites is low, while the second performs better when the number of sites is high.

10.

Distributed processing of continuous sliding-window k-NN queries for data stream filtering

Krešimir Pripužić, Ivana Podnar Žarko, Karl Aberer 《World Wide Web》2011,14(5-6):465-494

A sliding-window k-NN query (k-NN/w query) continuously monitors incoming data stream objects within a sliding window to identify the k objects closest to a query. It enables effective filtering of data objects streaming in at high rates from potentially distributed sources, and offers a means to control the rate of object insertions into result streams. Therefore, k-NN/w processing systems may be regarded as a promising solution to the information overload problem in applications that require real-time processing of structured data, such as the Sensor Web. Existing k-NN/w processing systems are mainly centralized and cannot cope with multiple data streams whose sources are scattered over the Internet. In this paper, we propose a solution for distributed continuous k-NN/w processing of structured data from distributed streams. We define a k-NN/w processing model for this setting and design a distributed k-NN/w processing system on top of the Content-Addressable Network (CAN) overlay. An extensive evaluation using both real and synthetic data sets demonstrates the feasibility of the proposed solution: it balances the load among the peers, while the messaging overhead within the P2P network remains reasonable. Moreover, our results clearly show that the solution scales with an increasing number of queries and peers.
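A centralized, single-stream sketch of k-NN/w semantics may help fix the definition before the distributed CAN-based design. It assumes a count-based window (the last `window` objects) and 2-D points; the paper's window type and its distributed maintenance are not shown, and the class name is hypothetical. It naively recomputes the answer on every arrival, which a real system would avoid.

```python
import heapq
from collections import deque
from typing import Deque, List, Tuple

Point = Tuple[float, float]


class SlidingWindowKNN:
    """k-NN over the most recent `window` stream objects for one fixed query point."""

    def __init__(self, query: Point, k: int, window: int) -> None:
        self.query = query
        self.k = k
        # deque with maxlen silently evicts the oldest object when the window is full.
        self.buffer: Deque[Point] = deque(maxlen=window)

    def _dist2(self, p: Point) -> float:
        dx, dy = p[0] - self.query[0], p[1] - self.query[1]
        return dx * dx + dy * dy

    def insert(self, p: Point) -> List[Point]:
        """Add a new stream object and return the current k nearest, closest first."""
        self.buffer.append(p)
        return heapq.nsmallest(self.k, self.buffer, key=self._dist2)
```

Note that an object can leave the answer either because a closer object arrives or because it expires from the window, which is what makes k-NN/w useful for rate-controlled filtering.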

11.

DHTJoin: processing continuous join queries using DHT networks

Wenceslao Palma, Reza Akbarinia, Esther Pacitti, Patrick Valduriez 《Distributed and Parallel Databases》2009,26(2-3):291-317

Continuous query processing in data stream management systems (DSMS) has received considerable attention recently. Many applications share the same need for processing data streams in a continuous fashion. For most distributed streaming applications, the centralized processing of continuous queries over distributed data is simply not viable. This paper addresses the problem of computing approximate answers to continuous join queries over distributed data streams. We present a new method, called DHTJoin, which combines hash-based placement of tuples in a Distributed Hash Table (DHT) and dissemination of queries by exploiting the embedded trees in the underlying DHT, thereby incurring little overhead. DHTJoin also deals with join attribute value skew which may hurt load balancing and result completeness. We provide a performance evaluation of DHTJoin which shows that it can achieve significant performance gains in terms of network traffic.
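The hash-based placement idea can be illustrated in a few lines: if every tuple is routed by a hash of its join-attribute value, matching tuples from both streams meet at the same peer, which can then run a symmetric hash join locally. This is a minimal sketch under stated assumptions: `hash mod number_of_peers` stands in for a real DHT lookup (a real DHT uses consistent hashing so peers can join and leave), and the embedded-tree query dissemination, skew handling, and approximate-answer aspects of DHTJoin are not modeled. `Peer`, `peer_for`, and `route` are hypothetical names.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def peer_for(value: str, num_peers: int) -> int:
    """Stable hash of the join value; a stand-in for a DHT key lookup."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_peers


class Peer:
    """Symmetric hash join over the R and S tuples routed to this peer."""

    def __init__(self) -> None:
        self.r_tuples: Dict[str, List[tuple]] = defaultdict(list)
        self.s_tuples: Dict[str, List[tuple]] = defaultdict(list)

    def receive(self, side: str, key: str, tup: tuple) -> List[Tuple[tuple, tuple]]:
        # Store the tuple, then probe the opposite side's table for matches.
        if side == "R":
            self.r_tuples[key].append(tup)
            return [(tup, s) for s in self.s_tuples[key]]
        self.s_tuples[key].append(tup)
        return [(r, tup) for r in self.r_tuples[key]]


def route(peers: List[Peer], side: str, key: str, tup: tuple) -> List[Tuple[tuple, tuple]]:
    """Send a tuple to the peer responsible for its join value and return new join results."""
    return peers[peer_for(key, len(peers))].receive(side, key, tup)
```

Because placement depends only on the join value, no tuple ever needs to be broadcast; a heavily skewed value, however, sends all of its tuples to a single peer, which is the load-balancing problem the abstract mentions.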

12.

A Dichotomous Multi-Layer Grid Skyline Algorithm for Distributed Environments


Ding Riqiang (丁日强) 《Computer Engineering and Applications》2013,49(18):116-119

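Skyline computation plays an important role in fields such as data mining, multi-criteria decision making, and database visualization, and has received wide attention in recent years. Most previous research on skyline queries has focused on centralized datasets, i.e., centralized skyline queries, and has produced many results. In practice, however, the relevant data are usually spread across several different servers, so computing a skyline query in a distributed environment requires collecting large amounts of data from each server. Existing distributed skyline query methods have two main problems: first, skyline query processing is slow; second, much unnecessary overlapping data is transmitted between servers in the network. This paper proposes a dichotomous multi-layer grid method (DMLG) that efficiently processes skyline queries in distributed environments. The method uses a grid approach combined with the idea of bisection to minimize the transmission of unnecessary overlapping data. Experiments on different datasets show that the method outperforms existing methods.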
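The abstract does not specify DMLG's bisection or multi-layer structure, so the sketch below shows only a single-layer grid with cell-level dominance pruning, a hypothetical simplification of the general grid idea. It assumes "smaller is better" and half-open cells of equal `width`: if an occupied cell has a strictly smaller index in every dimension than cell `c`, every point in it dominates every point in `c`, so `c` can be discarded before any point-level comparison. In a distributed setting, servers could exchange occupied cell indices rather than raw points to decide what is worth sending.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Cell = Tuple[int, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Smaller is better: a dominates b if it is <= everywhere and < somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Block-nested-loop skyline, run on whatever survives grid pruning."""
    result: List[Point] = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue
        result = [q for q in result if not dominates(p, q)]
        result.append(p)
    return result


def cell_of(p: Point, width: float) -> Cell:
    """Index of the half-open cell [k*width, (k+1)*width) in each dimension."""
    return tuple(int(x // width) for x in p)


def grid_prune(points: List[Point], width: float) -> List[Point]:
    """Drop all points in cells that some occupied cell beats in every dimension."""
    cells: Dict[Cell, List[Point]] = {}
    for p in points:
        cells.setdefault(cell_of(p, width), []).append(p)
    occupied = list(cells)
    survivors: List[Point] = []
    for cell, members in cells.items():
        # Strictly smaller index in every dimension => every point there dominates this cell.
        beaten = any(all(o < c for o, c in zip(other, cell)) for other in occupied)
        if not beaten:
            survivors.extend(members)
    return survivors
```

Pruning never changes the answer: `skyline(grid_prune(points, w))` equals `skyline(points)`, but fewer points reach the pairwise comparison step.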

13.

Query processing and inverted indices in shared-nothing text document information retrieval systems (total citations: 1; self-citations: 0; citations by others: 1)

Anthony Tomasic, Hector Garcia-Molina 《The VLDB Journal: The International Journal on Very Large Data Bases》1993,2(3):243-275

The performance of distributed text document retrieval systems is strongly influenced by the organization of the inverted text. This article compares the performance impact on query processing of various physical organizations for inverted lists. We present a new probabilistic model of the database and queries. Simulation experiments determine those variables that most strongly influence response time and throughput. This leads to a set of design trade-offs over a wide range of hardware configurations and new parallel query processing strategies.

14.

Optimization Techniques for Aggregate Top-K Queries in Structured Networks

Li Nai (李柰), Wang Bin (王斌), Guan Jing (关晶), Wang Guoren (王国仁) 《Journal of Chinese Computer Systems》2007,28(11):2033-2037

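Top-k queries are attracting increasing attention in distributed environments, but most existing top-k algorithms are suitable only for centralized networks. This paper proposes a new method for top-k queries in distributed networks, the Histogram-Container algorithm (HC algorithm), which not only has low network latency and low bandwidth cost but can also run on distributed networks of any structure. Using a tree-topology network, this paper shows how local histograms and Bloom filter information are used to optimize queries, and how partial results are merged at intermediate nodes. Experimental evaluation and performance analysis show that the HC algorithm outperforms comparable methods in network bandwidth consumption and query response time.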
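The abstract names Bloom filters as one of the summaries HC ships between nodes but does not say exactly how they are combined with the histograms. As background, here is a minimal generic Bloom filter sketch (not the paper's code): a node can send this compact bit array instead of its full set of object IDs, and a parent can test membership with possible false positives but no false negatives. The sizes, the SHA-256-based double hashing, and the class name are illustrative assumptions.

```python
import hashlib
from typing import List


class BloomFilter:
    """Bit-array set summary: may report false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> List[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        d = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1  # odd step so positions vary
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

In a tree-structured top-k setting, a false positive only costs an extra score request to a child; it can never cause a true result to be missed, which is why the structure suits bandwidth-sensitive pruning.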

15.

Efficient range query processing in metric spaces over highly distributed data

Christos Doulkeridis, Akrivi Vlachou, Yannis Kotidis, Michalis Vazirgiannis 《Distributed and Parallel Databases》2009,26(2-3):155-180

Similarity search in P2P systems has attracted a lot of attention recently and several important applications, like distributed image search, can profit from the proposed distributed algorithms. In this paper, we address the challenging problem of efficient processing of range queries in metric spaces, where data is horizontally distributed across a super-peer network. Our approach relies on SIMPEER (Doulkeridis et al. in Proceedings of VLDB, pp. 986–997, 2007), a framework that dynamically clusters peer data, in order to build distributed routing information at super-peer level. SIMPEER allows the evaluation of exact range and nearest neighbor queries in a distributed manner that reduces communication cost, network latency, bandwidth consumption and computational overhead at each individual peer. In this paper, we extend SIMPEER by focusing on efficient range query processing and providing recall-based guarantees for the quality of the result retrieved so far. This is especially useful for range queries that lead to result sets of high cardinality and incur high processing costs, while the complete result set becomes overwhelming for the user. Our framework employs statistics for estimating an upper limit of the number of possible results for a range query and each super-peer may decide not to propagate further the query and reduce the scope of the search. We provide an experimental evaluation of our framework and show that our approach performs efficiently, even in the case of high degree of distribution.
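Cluster-based routing in metric spaces typically relies on the triangle inequality. The sketch below assumes, as a hypothetical simplification of SIMPEER's cluster descriptions, that each cluster is summarized by a center and a covering radius: for any member x, d(q, x) ≥ d(q, center) − radius, so a cluster can be skipped whenever d(q, center) − radius > r without losing any result. Super-peer hierarchies and the recall-based early termination described above are not modeled; names are illustrative.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.dist(a, b)  # any metric works, as long as it obeys the triangle inequality


@dataclass
class Cluster:
    center: Vector
    members: List[Vector] = field(default_factory=list)
    radius: float = 0.0  # distance from center to its farthest member

    def add(self, v: Vector) -> None:
        self.members.append(v)
        self.radius = max(self.radius, euclidean(self.center, v))


def range_query(clusters: List[Cluster], q: Vector, r: float) -> Tuple[List[Vector], int]:
    """Return members within distance r of q, plus how many clusters had to be scanned."""
    hits: List[Vector] = []
    visited = 0
    for c in clusters:
        # Lower bound on d(q, x) for every member x; if it exceeds r, skip the whole cluster.
        if euclidean(q, c.center) - c.radius > r:
            continue
        visited += 1
        hits.extend(v for v in c.members if euclidean(q, v) <= r)
    return hits, visited
```

The pruning is exact (no false dismissals); only clusters whose ball intersects the query ball are contacted.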

16.

A Query-Optimizer-Based Method for Distributed Spatial Query Optimization

Lin Jian (林键), Liu Renyi (刘仁义), Liu Nan (刘南), Zhang Feng (张丰) 《Computer Engineering and Applications》2012,48(22):161-165

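To achieve interoperability among distributed spatial databases, distributed queries must be optimized; here a distributed query is one in which a data processing statement accesses data at multiple nodes rather than only at the node that initiated the query. This paper proposes an architecture for a query optimizer and discusses the optimization of such queries in detail, focusing on complex spatial queries that involve spatial selections and joins. A case-study program for a typical spatial database was built, and the analysis shows that a query optimizer with filtering and refinement has a clear advantage in time and space efficiency, yielding results of practical reference value.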

17.

Query processing over incomplete autonomous databases: query rewriting using learned data dependencies

Garrett Wolf, Aravind Kalavagattu, Hemal Khatri, Raju Balakrishnan, Bhaumik Chokshi, Jianchun Fan, Yi Chen, Subbarao Kambhampati 《The VLDB Journal: The International Journal on Very Large Data Bases》2009,18(5):1167-1190

Incompleteness due to missing attribute values (aka “null values”) is very common in autonomous web databases, on which user access is usually supported through mediators. Traditional query processing techniques that focus on the strict soundness of answer tuples often ignore tuples with critical missing attributes, even if they wind up being relevant to a user query. Ideally, we would like the mediator to retrieve such possible answers and gauge their relevance by assessing their likelihood of being pertinent answers to the query. The autonomous nature of web databases poses several challenges in realizing this objective. Such challenges include the restricted access privileges imposed on the data, the limited support for query patterns, and the bounded pool of database and network resources in the web environment. We introduce a novel query rewriting and optimization framework, QPIAD, that tackles these challenges. Our technique involves reformulating the user query based on mined correlations among the database attributes. The reformulated queries are aimed at retrieving the relevant possible answers in addition to the certain answers. QPIAD is able to gauge the relevance of such queries, allowing tradeoffs in reducing the costs of database query processing and answer transmission. To support this framework, we develop methods for mining attribute correlations (in terms of Approximate Functional Dependencies), value distributions (in the form of Naïve Bayes Classifiers), and selectivity estimates. We present empirical studies demonstrating that our approach effectively retrieves relevant possible answers with high precision, high recall, and manageable cost.
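One ingredient QPIAD mines is approximate functional dependencies (AFDs). The sketch below is a generic illustration, not the paper's mining algorithm: it measures an AFD X → Y by the fraction of complete rows that can be kept so that X → Y holds exactly (equivalently 1 minus the g3 error), and it guesses a missing Y by majority vote among rows with the same X. The majority vote is a deliberately simple stand-in for the Naïve Bayes classifiers the abstract mentions; the attribute names in the test are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Optional, Sequence

Row = Dict[str, Optional[Hashable]]


def afd_confidence(rows: List[Row], lhs: Sequence[str], rhs: str) -> float:
    """Share of complete rows consistent with lhs -> rhs after keeping each group's majority."""
    groups: Dict[tuple, Counter] = defaultdict(Counter)
    total = 0
    for row in rows:
        key = tuple(row.get(a) for a in lhs)
        val = row.get(rhs)
        if val is None or any(k is None for k in key):
            continue  # rows with nulls say nothing about the dependency
        groups[key][val] += 1
        total += 1
    if total == 0:
        return 0.0
    kept = sum(c.most_common(1)[0][1] for c in groups.values())
    return kept / total


def predict_missing(rows: List[Row], lhs: Sequence[str], rhs: str, row: Row) -> Optional[Hashable]:
    """Guess a null rhs value as the most common rhs among complete rows sharing row's lhs."""
    key = tuple(row.get(a) for a in lhs)
    counts = Counter(
        r[rhs] for r in rows
        if r.get(rhs) is not None and tuple(r.get(a) for a in lhs) == key
    )
    return counts.most_common(1)[0][0] if counts else None
```

A high-confidence AFD such as model → body suggests how to rewrite a query on `body` into queries on `model` that also retrieve tuples whose `body` is null.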

18.

Para-G: Path pattern query processing on large graphs

Yiyuan Bai, Chaokun Wang, Xiang Ying 《World Wide Web》2017,20(3):515-541

There are plentiful and diverse applications of graph data management and mining techniques in real-world scientific research and business activities. As one of the most basic operations, uniform path pattern query processing on graph data faces three big challenges. In this paper, we address these challenges as follows. First, a new graph query language called G-Path is presented, which focuses on complex path pattern query processing on a very large graph. We also propose the design of a system called Para-G, which is based on a BSP-like model as well as the MapReduce model and can effectively handle distributed graph data operations and queries. Second, we present the implementation of Para-G on the de facto cloud platform, Hadoop. Based on the concept of a distributed path finite state automaton, the query processing of a G-Path statement in Para-G is described in detail. In addition, several optimization techniques are used to dramatically improve the performance of G-Path query execution. Finally, extensive experiments on several graph data sets show the usability of the G-Path query language and the effectiveness of Para-G.
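The automaton-based evaluation of a path pattern can be sketched on a single machine as a breadth-first search over (graph node, automaton state) pairs. G-Path's syntax and Para-G's distributed automaton are not given in the abstract, so this is a generic regular-path-query sketch, not Para-G's code: the pattern is assumed to be already compiled into an NFA over edge labels, and the graph is a plain adjacency dict. The example pattern `knows+ worksAt` and all names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[Tuple[str, str]]]  # node -> [(edge_label, target_node)]
NFA = Dict[Tuple[int, str], Set[int]]     # (state, edge_label) -> next states


def path_query(graph: Graph, nfa: NFA, start_state: int, accept: Set[int], source: str) -> Set[str]:
    """Nodes reachable from `source` along a path whose label sequence the NFA accepts."""
    seen = {(source, start_state)}
    queue = deque(seen)
    answers: Set[str] = set()
    while queue:
        node, state = queue.popleft()
        if state in accept:
            answers.add(node)
        for label, target in graph.get(node, []):
            for nxt in nfa.get((state, label), ()):
                # Each (node, state) pair is expanded at most once, so cycles terminate.
                if (target, nxt) not in seen:
                    seen.add((target, nxt))
                    queue.append((target, nxt))
    return answers
```

Tracking visited (node, state) pairs rather than nodes alone is what lets a node be revisited in a different automaton state, which a path pattern with repetition requires.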

19.

Form-based proxy caching for database-backed web sites: keywords and functions (total citations: 1; self-citations: 0; citations by others: 1)

Qiong Luo, Jeffrey F. Naughton, Wenwei Xue 《The VLDB Journal: The International Journal on Very Large Data Bases》2008,17(3):489-513

Web caching proxy servers are essential for improving web performance and scalability, and recent research has focused on making proxy caching work for database-backed web sites. In this paper, we explore a new proxy caching framework that exploits the query semantics of HTML forms. We identify two common classes of form-based queries from real-world database-backed web sites, namely, keyword-based queries and function-embedded queries. Using typical examples of these queries, we study two representative caching schemes within our framework: (i) traditional passive query caching, and (ii) active query caching, in which the proxy cache can service a request by evaluating a query over the contents of the cache. Results from our experimental implementation show that our form-based proxy is a general and flexible approach that efficiently enables active caching schemes for database-backed web sites. Furthermore, handling query containment at the proxy yields significant performance advantages over passive query caching, but extending the power of the active cache to do full semantic caching appears to be less generally effective.
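The difference between passive and active caching for keyword queries comes down to query containment. This sketch assumes a hypothetical conjunctive semantics (a row matches if its text contains every keyword), under which a cached query whose keyword set is a subset of the new query's keywords returns a superset of the needed rows, so the proxy can answer by filtering locally. Function-embedded queries and the paper's actual form handling are not modeled; the class name is illustrative.

```python
from typing import Dict, FrozenSet, List, Optional


class KeywordQueryCache:
    """Active proxy cache for conjunctive keyword queries (hypothetical matching rule)."""

    def __init__(self) -> None:
        self.entries: Dict[FrozenSet[str], List[str]] = {}

    @staticmethod
    def _matches(row: str, keywords: FrozenSet[str]) -> bool:
        return keywords <= set(row.lower().split())

    def store(self, keywords: FrozenSet[str], rows: List[str]) -> None:
        self.entries[keywords] = rows

    def answer(self, keywords: FrozenSet[str]) -> Optional[List[str]]:
        # Exact hit: all that passive caching can exploit.
        if keywords in self.entries:
            return self.entries[keywords]
        # Containment: fewer keywords => superset of rows, so filter the cached rows.
        for cached_kw, rows in self.entries.items():
            if cached_kw <= keywords:
                return [r for r in rows if self._matches(r, keywords)]
        return None  # miss: forward the request to the origin web site
```

A passive cache would forward the query "database distributed" to the origin server even though every answer row is already cached under "database"; the containment check is what turns that into a local hit.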

20.

Effective query aggregation for data services in sensor networks (total citations: 1; self-citations: 0; citations by others: 1)

Wei Thang Nam Jangwon Dong 《Computer Communications》2006,29(18):3733-3744

Many sensor network applications require efficient data services. While most existing work in this area focuses on data aggregation, little attention has been paid to query aggregation. For many applications, especially those with high query rates, query aggregation is very important. In this paper, we study a query aggregation-based approach to providing efficient data services. In particular: (1) we propose a multi-layer overlay-based framework consisting of a query manager and access points (nodes), where the former provides the query aggregation plan and the latter execute it; (2) we design an effective query aggregation algorithm that reduces the number of duplicate/overlapping queries and saves overall energy in the sensor network. We also design protocols to effectively deliver aggregated queries and query results in the sensor network. Our performance evaluations show that applying our query aggregation algorithm significantly reduces overall energy consumption and correspondingly prolongs the sensor network lifetime.
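The abstract does not detail the aggregation algorithm, so this is only a simplified illustration of removing duplicate/overlapping queries: it assumes each query asks for readings in a 1-D range `[lo, hi]` (for example, positions along a corridor). Overlapping or touching ranges are merged so each region is requested from the network once, and each original query's answer is then carved out of the merged results at the query manager. The function names and the reading format `(position, value)` are hypothetical.

```python
from typing import Any, List, Tuple

Range = Tuple[float, float]
Reading = Tuple[float, Any]  # (position, value)


def aggregate_ranges(queries: List[Range]) -> List[Range]:
    """Merge overlapping or touching [lo, hi] queries into a minimal set of ranges."""
    merged: List[Range] = []
    for lo, hi in sorted(queries):
        if merged and lo <= merged[-1][1]:
            # Overlaps the last merged range: extend it instead of issuing a new query.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def split_results(readings: List[Reading], queries: List[Range]) -> List[List[Reading]]:
    """Answer each original query from readings fetched for the merged ranges."""
    return [[r for r in readings if lo <= r[0] <= hi] for lo, hi in queries]
```

Here four incoming queries collapse into two network requests; the saving grows with the query rate, which matches the abstract's point that query aggregation matters most for high-query-rate applications.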