1.

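A Query Processing Model for Distributed Data Stream Systems Based on Structured P2P

Liu Yunsheng, Zhao Haiyi 《Application Research of Computers》 2007,24(12):74-76

Analyzes a distributed query processing model built on a structured overlay network. The model supports distributed storage of large volumes of data streams and parallel processing both across and within continuous queries. It can largely eliminate resource constraints (mainly memory), improves query performance and quality of service, and scales well.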

2.

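An Improved Query Operator Placement Algorithm for Distributed Data Streams

Chai Baojie 《Computer Engineering and Applications》 2008,44(8):183-186

In a distributed data stream management system, query operators must be placed on different processing nodes for execution, so how to place them is a core problem in distributed data stream management research. Peter et al. proposed an operator placement algorithm based on a latency space and spring relaxation, but it assumes that the rate of the data streams between operators is constant and ignores the correlation between stream rates and the query operators themselves. The paper therefore analyzes the relationship between different query operators and the rates of the streams they output, and improves the algorithm of Peter et al. accordingly. Experimental results show that the improved algorithm can be applied effectively in distributed data stream management systems.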
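The idea of rate-aware spring relaxation can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a 2-D latency space, treats each link as a spring whose constant is the stream rate, and models an operator's output rate as input rate times a selectivity. The names `relax_placement` and `output_rate` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
Link = Tuple[str, str, float]  # (node a, node b, stream rate on the link)


def output_rate(input_rate: float, selectivity: float) -> float:
    """Assumed rate model: an operator passes on a fixed fraction of its input."""
    return input_rate * selectivity


def relax_placement(
    positions: Dict[str, Point],
    pinned: Set[str],
    links: List[Link],
    iterations: int = 100,
) -> Dict[str, Point]:
    """Repeatedly move each unpinned operator to the rate-weighted centroid
    of its neighbours, which is the rest point of springs whose constants
    are the stream rates."""
    pos = dict(positions)
    for _ in range(iterations):
        for node in pos:
            if node in pinned:
                continue
            wx = wy = total = 0.0
            for a, b, rate in links:
                if node not in (a, b):
                    continue
                ox, oy = pos[b if a == node else a]
                wx += rate * ox
                wy += rate * oy
                total += rate
            if total > 0:
                pos[node] = (wx / total, wy / total)
    return pos


# A filter between a producer and a consumer; the filter keeps 25% of its input.
in_rate = 100.0
links = [
    ("producer", "filter", in_rate),
    ("filter", "consumer", output_rate(in_rate, 0.25)),
]
start = {"producer": (0.0, 0.0), "consumer": (10.0, 0.0), "filter": (5.0, 5.0)}
placed = relax_placement(start, pinned={"producer", "consumer"}, links=links)
```

Because the filter's output stream is four times lighter than its input, the relaxed position lands much closer to the producer than to the consumer; with a constant-rate assumption it would sit at the midpoint.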

3.

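Research on Distributed Data Stream Processing Technology for Big Data Analysis

《Software Engineer》 2019,(12):44-46

Because data streams are unstable, assigning stream queries to fixed nodes makes it hard for distributed data stream processing to use computing resources efficiently. To address this, the paper proposes a distributed data stream processing technique for big data analysis. The workflow consists of data collection, storage and querying of historical data, real-time processing with Storm, intelligent indexing, and construction of the data model. According to the experimental results, the proposed technique has a clear advantage over traditional techniques in data stream processing efficiency, which generally stays above 75%, and it substantially reduces processing time.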

4.

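A Top-K Query Processing Algorithm for Distributed Data Streams Based on Dynamic Adjustment Values

Liu Weiyi, Jin Yuanping 《Computer Applications and Software》 2009,26(1)

For a query over distributed data streams that returns the K objects with the largest values (a Top-K monitoring query), the most direct solution is to have a central node process all the streams, but this places a heavy load on the central node and the network. The paper proposes a query algorithm based on dynamic adjustment values: adjustment values are computed from the observed data and applied to the object data at the individual nodes, so that the Top-K monitoring query can be answered without sending every node's entire stream to the central node. This reduces the network bandwidth requirement and the load on the central node while keeping the query results fully accurate.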
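One well-known way to use per-node adjustment values is sketched below. The abstract does not say how the paper computes or applies them, so this is only an assumed scheme: each object's adjustments sum to zero across nodes, and a node stays silent as long as every Top-K object's adjusted local value is at least that of every other object. The function name and sample data are hypothetical.

```python
from typing import Dict, Set

Values = Dict[str, float]


def node_constraint_holds(values: Values, adjust: Values, topk: Set[str]) -> bool:
    """True if, at this node, every top-k object's adjusted value is at least
    as large as every non-top-k object's adjusted value."""
    names = set(values) | set(adjust)
    adjusted = {o: values.get(o, 0.0) + adjust.get(o, 0.0) for o in names}
    inside = [adjusted.get(o, 0.0) for o in topk]
    outside = [v for o, v in adjusted.items() if o not in topk]
    if not inside or not outside:
        return True
    return min(inside) >= max(outside)


# Global totals: A = 11, B = 7, so the top-1 set is {"A"}.
node_values = {"n1": {"A": 10.0, "B": 2.0}, "n2": {"A": 1.0, "B": 5.0}}
topk = {"A"}

# Without adjustments, n2 sees B above A locally and would have to report.
no_adjust_ok = node_constraint_holds(node_values["n2"], {}, topk)

# Shift 4 units of A's "slack" from n1 to n2 (the two values sum to zero).
adjustments = {"n1": {"A": -4.0}, "n2": {"A": 4.0}}
all_ok = all(
    node_constraint_holds(node_values[n], adjustments[n], topk) for n in node_values
)

# A later local update that breaks n2's constraint must be reported.
node_values["n2"]["B"] = 7.0
after_update_ok = node_constraint_holds(node_values["n2"], adjustments["n2"], topk)
```

When every node's constraint holds and the adjustments for each object sum to zero, adding the local inequalities shows that the global Top-K set is still correct, so no stream data needs to be sent until some node detects a violation.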

5.

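P2P Information Retrieval Based on Semantic Routing  Total citations: 5, self-citations: 1, citations by others: 5

Ye Chun, Ge Suihe, Xiong Qibang 《Computer Simulation》 2004,21(10):143-145

Efficient and stable P2P information retrieval mechanisms have become a research focus. Existing search methods use either broadcast or distributed hash tables (DHTs). DHT-based methods achieve good query performance but do not support approximate queries or range queries, while broadcast is inefficient. Introducing semantic routing addresses both problems. Semantic routing prunes broadcast search by selectively forwarding a query to the nodes that are able to answer it. The paper describes the semantic routing mechanism, proposes an architecture for a P2P retrieval system based on it, and reports a simulation with NeuroGrid; the experimental results show that semantic routing improves query efficiency.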

6.

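A Survey of Communication Efficiency in Distributed Data Stream Systems

Wang Shuang, Yang Guangming, Wang Guoren 《Microcomputer & Its Applications》 2007,(Z1)

In a distributed data stream environment, communication bandwidth is a bottleneck resource. Reducing the volume of stream data transmitted over the network while guaranteeing query accuracy is an important way to address this. By analyzing existing distributed data stream processing algorithms, the paper derives a general processing framework for reducing transmission volume. The framework has three parts: minimizing information transmission, using data stream synopses to represent the full information, and maintaining system stability through prediction.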
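The first and third parts of such a framework can be illustrated with a very small sketch. It is an assumption, not taken from the survey: the remote node uses the last value it sent as the coordinator's prediction and transmits a new reading only when it drifts from that prediction by more than a tolerance `epsilon`. The function name `suppress_updates` is hypothetical.

```python
from typing import Iterable, List


def suppress_updates(readings: Iterable[float], epsilon: float) -> List[float]:
    """Return the readings a node actually transmits.

    The coordinator is assumed to predict "same as the last value received",
    so a reading is sent only if it differs from the last sent value by more
    than epsilon. The coordinator's view is then never off by more than epsilon.
    """
    sent: List[float] = []
    for value in readings:
        if not sent or abs(value - sent[-1]) > epsilon:
            sent.append(value)
    return sent


stream = [10.0, 10.4, 11.2, 11.5, 9.9]
transmitted = suppress_updates(stream, epsilon=1.0)
```

Here only three of the five readings cross the network. A larger `epsilon` trades accuracy for bandwidth, and a better predictor (for example a linear trend) would suppress more updates on steadily changing streams.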

7.

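A Join Query Algorithm for Distributed Data Streams

Liu Xuejun, Qian Jiangbo 《Computer Engineering》 2006,32(21):41-43

Distributed processing is an inevitable trend in the development of data stream management systems. The paper studies join queries over distributed data streams and proposes the DM3Join algorithm, which has two parts. First, it decomposes concurrent join requests and merges identical join predicates to form distributed query operators. Second, data streams flow through distributed agents, which perform partial joins, and the query engine assembles the final result. DM3Join executes window joins using a structure similar to a routing table; because intermediate results can be shared, the algorithm scans the data only once. Analysis and experiments show that the join algorithm is efficient.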

8.

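Top-k Queries over Uncertain Data in P2P Environments

Sun Yongjiao, Yuan Ye, Wang Guoren 《Chinese Journal of Computers》 2011,34(11):2155-2164

Top-k queries in distributed environments have been studied extensively. Because of imprecise instruments, network delay, and similar causes, most distributed data are uncertain. The paper proposes an effective top-k query processing method for uncertain data horizontally distributed across a P2P network. It first uses a Quad-tree to build a distributed index over the uncertain data and proposes a spatial pruning algorithm based on that index. Then, based on the local top-k probabilities and the ...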

9.

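A B+ Tree Index Management Algorithm for P2P Environments  Total citations: 4, self-citations: 1, citations by others: 3

Ju Dapeng, Li Ming, Hu Jinfeng, Wang Dongsheng, Zheng Weimin, Ma Yongquan 《Journal of Computer Research and Development》 2005,42(8):1438-1444

Distributed data querying is an important component of Peer-to-Peer (P2P) wide-area storage systems, yet there has been no effective algorithm for querying continuous ordered data. The paper proposes the PB-link tree, an algorithm that builds a distributed index over continuous ordered data in a P2P environment. The PB-link tree offers high reliability, high throughput, low network overhead, and load balancing, and it is better suited to P2P environments than traditional distributed index algorithms. Theoretical derivation and experimental data show that the communication overhead of the PB-link tree algorithm is 20% of that of a traditional distributed index, and its query efficiency is 7 times as high. Even when 50% of the nodes in the system fail, it still guarantees correct results for 85% of queries, which demonstrates strong reliability.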

10.

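P2P Data Management  Total citations: 14, self-citations: 1, citations by others: 14

Yu Min, Li Zhanhuai, Zhang Longbo 《Journal of Software》 2006,17(8):1717-1730

P2P (peer-to-peer) technology is a key technology for restructuring distributed architectures in the future and has broad application prospects. Most problems in P2P systems can be reduced to data placement and retrieval, so P2P data management has become an active research topic in the database field. P2P data management currently has three main subfields (information retrieval, database querying, and continuous querying), each of which has produced many research results. After introducing the advantages of P2P technology, the paper states the goals of P2P data management research. It then surveys the state of research in the three subfields, focusing on index construction strategies, approaches to semantic heterogeneity, query semantics, query processing strategies, query types, and query optimization techniques for P2P database querying. Through comparison, it identifies the gap between the current state and the goals and proposes problems that need further study.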

11.

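Optimization of Distributed Stream Data Loading and Query Techniques

Yi Jia, Xue Chen, Wang Shupeng 《Computer Science》 2017,44(5):172-177

Distributed stream querying is a real-time query computation method based on data streams that has attracted wide attention and developed rapidly in recent years. The paper surveys the research results of distributed stream processing frameworks on real-time relational queries, and analyzes and compares products for distributed data loading, distributed stream computing frameworks, and distributed stream querying. It proposes a distributed stream query model built on Spark Streaming and Apache Kafka that loads multiple file sources concurrently and uses a purpose-designed in-memory file system for fast data loading, which is more than twice as fast as loading based on Apache Flume. On top of Spark Streaming, the authors implement a distributed stream query interface based on Spark SQL and propose a method of parsing SQL statements with custom code to carry out distributed queries. Test results show that, for complex query statements, queries using the custom SQL parsing are clearly more efficient.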

12.

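A Parallel Skyline Query Algorithm over Uncertain Data Streams

Wang Guangdong, Wang Yijie, Li Xiaoyong, Wang Yuan 《Journal of Frontiers of Computer Science and Technology》 2012,(12):1116-1125

Skyline query techniques over uncertain data streams are attracting growing attention from researchers. Traditional centralized stream processing algorithms struggle to meet the query demands of massive data, while the massive computing resources and effective storage management offered by cloud computing provide ample conditions for studying parallel Skyline queries. On this basis, the paper proposes a parallel Skyline query algorithm over uncertain data streams (parallel Skyline over uncertain data streams, PSUDS). By partitioning the sliding window in an interleaved fashion, the algorithm turns a centralized stream query into parallel processing, using parallel execution to overcome the limited processing performance of centralized algorithms. Extensive experiments show that the algorithm has good parallel scalability.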

13.

Sliding window top-k dominating query processing over distributed data streams

Daichi Amagata, Takahiro Hara, Shojiro Nishio 《Distributed and Parallel Databases》 2016,34(4):535-566

Preference query processing is important for a wide range of applications involving distributed databases, such as network monitoring, web-based systems, and market analysis. In such applications, data objects are generated frequently and massively, which presents an important and challenging problem of continuous query processing over distributed data stream environments. A top-k dominating query, which has been receiving much research attention recently, returns the k data objects that dominate the highest number of data objects in a given dataset, and due to its dominance-based ranking function, we can easily obtain superior data objects. An emerging requirement in distributed stream environments is an efficient technique for continuously monitoring top-k dominating data objects. Despite this, no study has addressed the problem. In this paper, therefore, we address the problem of continuous top-k dominating query processing over distributed data stream environments. We present two algorithms that monitor the exact top-k dominating data and efficiently eliminate data objects that cannot qualify for the result, which reduces both communication and computation costs. In addition to these algorithms, we present an approximate algorithm that further reduces both communication and computation costs. Extensive experiments on both synthetic and real data have demonstrated the efficiency and scalability of our algorithms.
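The definition of a top-k dominating query can be made concrete with a brute-force sketch over a single static dataset. It is not one of the paper's distributed algorithms, and it assumes that smaller values are better in every dimension; the function names and sample points are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def top_k_dominating(objects: Dict[str, Sequence[float]], k: int) -> List[Tuple[str, int]]:
    """Score each object by how many others it dominates and return the k best.
    Ties are broken by name so the output is deterministic."""
    scores = {
        name: sum(dominates(p, q) for other, q in objects.items() if other != name)
        for name, p in objects.items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


points = {
    "a": (1, 1),
    "b": (2, 2),
    "c": (3, 1),
    "d": (2, 3),
    "e": (4, 4),
}
best_two = top_k_dominating(points, k=2)
```

Object `a` dominates all four others and `b` dominates `d` and `e`, so they form the top-2 result. The quadratic scan is what a distributed monitoring algorithm tries to avoid repeating on every window update.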

14.

Distributed stream join query processing with semijoins

Tri Minh Tran, Byung Suk Lee 《Distributed and Parallel Databases》 2010,27(3):211-254

This paper addresses the distributed stream processing of window-based multi-way join queries considering the semijoin as a key join operator. In distributed stream processing, data streams arriving at remote sites need to be shipped to the processing site for query execution. This typically introduces high communication overhead. Our observation is that semijoin, effective in reducing communication overhead in distributed database query processing, can be also effective in distributed stream query processing. The challenge, however, lies in the streaming nature of the tuples, as it requires continuous and incremental processing of an unbounded sequence of tuples instead of one-time processing of a set of stored tuples. This paper describes our comprehensive work done to address the challenge. Specifically, we first propose a distributed stream join processing model that handles the issue of network delays introduced from the shipment of data streams, and allows for efficient batch processing. Then, based on the model, we propose join algorithms in a multi-way join case: first, one-way join algorithms for different combinations of join placement and join method and, then, multi-way join algorithms assuming linear join ordering. Regarding the join method, two distributed join methods are introduced: (1) simple join, in which full tuples are forwarded to the query processing site and (2) semijoin-based join, in which partial tuples are forwarded. A semijoin-based join can be executed with different possible semijoin strategies which incur different communication overheads. We present a complete set of join algorithms considering all possible semijoin strategies, and propose an optimization algorithm. The join algorithms are executed continuously in an incremental manner as tuples arrive, and never ship tuples redundantly. The optimization algorithm constructs an efficient multi-way join plan by using a greedy heuristic which adds to the plan one stream with the minimum join execution cost in each step. Through extensive experiments, we conduct comparative studies of the performance among the proposed one-way join algorithms and the efficiency of the generated plan between the optimization algorithm based on the greedy heuristic and the exhaustive search, respectively.
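The difference between a simple join and a semijoin-based join can be shown on one window batch. This is a minimal sketch of the classic semijoin idea, not the paper's algorithms: it assumes an equi-join on an integer key, one remote site and one processing site, and counts only how many full tuples cross the network. All names and data are hypothetical.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (join key, payload)
Joined = Tuple[int, str, str]


def simple_join(remote_rows: List[Row], local_rows: List[Row]) -> Tuple[List[Joined], int]:
    """Ship every full remote tuple to the processing site, then join there."""
    shipped_full = list(remote_rows)
    result = [(k, rp, lp) for k, rp in shipped_full for lk, lp in local_rows if k == lk]
    return result, len(shipped_full)


def semijoin_join(remote_rows: List[Row], local_rows: List[Row]) -> Tuple[List[Joined], int]:
    """Ship join keys first, then only the full tuples whose key has a match."""
    remote_keys = {k for k, _ in remote_rows}             # step 1: keys only, remote to processing site
    matching = remote_keys & {k for k, _ in local_rows}   # step 2: matching keys sent back
    shipped_full = [row for row in remote_rows if row[0] in matching]  # step 3: reduced shipment
    result = [(k, rp, lp) for k, rp in shipped_full for lk, lp in local_rows if k == lk]
    return result, len(shipped_full)


remote_batch = [(1, "r1"), (2, "r2"), (3, "r3"), (4, "r4")]
local_batch = [(2, "s2"), (4, "s4"), (5, "s5")]

simple_result, simple_cost = simple_join(remote_batch, local_batch)
semi_result, semi_cost = semijoin_join(remote_batch, local_batch)
```

Both functions produce the same join result, but the semijoin variant ships two full tuples instead of four, at the price of an extra round trip carrying keys. Whether that pays off depends on how selective the join is and how large the payloads are relative to the keys.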

15.

A United Framework for Large-Scale Resource Description Framework Stream Processing

Fang Hong, Zhao Bo, Zhang Xiao-Wang, Yang Xuan-Xing 《Journal of Computer Science and Technology》 2019,34(4):762-774

16.

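An Online Join Method for Skewed Data Streams

Wang Chunkai, Meng Xiaofeng 《Journal of Software》 2018,29(3):869-882

Distributed join processing in a parallel environment requires a partitioning strategy that reduces state migration and communication overhead. Compared with database management systems, online θ-join operations in distributed data stream management systems demand more computation and memory. A join model based on a complete bipartite graph can support joins over distributed data streams. Because each relation in the join is stored only on the processing units on one side of the bipartite graph, no data need be replicated, and because the processing units are independent of one another, the model is memory-efficient, elastic, and scalable. However, unstable stream rates and unbalanced attribute-value distributions mean that joins over skewed data streams tend to leave the cluster load unbalanced. For such joins, the model cannot allocate query nodes dynamically and needs manual tuning of the data-grouping parameters. It is even less efficient for join queries over the full history. To address these problems, the paper proposes a framework for managing joins over skewed data streams that uses a hybrid key-based and tuple-based partitioning scheme to handle skewed data on each side of the bipartite model. It also designs a strategy for dynamically reallocating query nodes and a state migration algorithm to support full-history join queries and adaptive resource management. Experiments on synthetic and real data show that the scheme handles joins over skewed data effectively and further improves the throughput of distributed data stream management systems, in particular reducing computing costs in cloud environments.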
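A hybrid of key-based and tuple-based partitioning on a bipartite join model might look like the sketch below. The abstract gives no details, so everything here is an assumption: an equi-join on integer keys, a fixed set of "hot" keys per side, round-robin placement for hot keys, and modulo hashing for the rest. The `route` function and its parameters are hypothetical.

```python
from itertools import count
from typing import Iterator, List, Set, Tuple


def route(
    key: int,
    own_hot: Set[int],
    other_hot: Set[int],
    own_units: int,
    other_units: int,
    round_robin: Iterator[int],
) -> Tuple[int, List[int]]:
    """Decide where a tuple is stored and where it must probe.

    Storage (own side): hot keys are spread tuple by tuple in round-robin
    order, other keys go to a unit chosen by the key.
    Probing (opposite side): if the key is hot over there, its tuples are
    spread across all units, so every unit is probed; otherwise only the
    single unit chosen by the key is probed.
    """
    if key in own_hot:
        store = next(round_robin) % own_units   # tuple-based partitioning
    else:
        store = key % own_units                 # key-based partitioning
    if key in other_hot:
        probe = list(range(other_units))
    else:
        probe = [key % other_units]
    return store, probe


# Relation R has 3 units, relation S has 2. Key 7 is hot in R, key 4 is hot in S.
rr = count()
first = route(7, own_hot={7}, other_hot={4}, own_units=3, other_units=2, round_robin=rr)
second = route(7, own_hot={7}, other_hot={4}, own_units=3, other_units=2, round_robin=rr)
cold_probe_all = route(4, own_hot={7}, other_hot={4}, own_units=3, other_units=2, round_robin=rr)
```

Two consecutive R tuples with the hot key 7 land on different R units, which spreads the skewed load, while still probing a single S unit. A tuple whose key is hot on the opposite side pays for that by probing every unit there. Changing the hot-key sets at run time would require migrating stored state, which this sketch does not attempt.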

17.

QoS-Aware Shared Component Composition for Distributed Stream Processing Systems

Repantis Thomas, Gu Xiaohui, Kalogeraki Vana 《Parallel and Distributed Systems, IEEE Transactions on》 2009,20(7):968-982

Many emerging online data analysis applications require applying continuous query operations such as correlation, aggregation, and filtering to data streams in real time. Distributed stream processing systems allow in-network stream processing to achieve better scalability and quality-of-service (QoS) provision. In this paper, we present Synergy, a novel distributed stream processing middleware that provides automatic sharing-aware component composition capability. Synergy enables efficient reuse of both result streams and processing components, while composing distributed stream processing applications with QoS demands. It provides a set of fully distributed algorithms to discover and evaluate the reusability of available result streams and processing components when instantiating new stream applications. Specifically, Synergy performs QoS impact projection to examine whether the shared processing can cause QoS violations on currently running applications. The QoS impact projection algorithm can handle different types of streams including both regular traffic and bursty traffic. If no existing processing components can be reused, Synergy dynamically deploys new components at strategic locations to satisfy new application requests. We have implemented a prototype of the Synergy middleware and evaluated its performance on both PlanetLab and simulation testbeds. The experimental results show that Synergy can achieve much better resource utilization and QoS provisioning than previously proposed schemes, by judiciously sharing streams and components during application composition.

18.

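Research on Adaptive Query Mechanisms in Data Stream Management Systems

Song Baoyan, Wu Shanshan, Yu Ge 《Computer Science》 2005,32(7):112-115

Introduces the current state of data stream technology, then discusses how adaptive querying has evolved in data management and what is distinctive about it in data stream management. On this basis, it proposes RealStream, a data stream management system that supports adaptive queries, and describes its adaptive query processing mechanism in detail.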