首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到19条相似文献,搜索用时 171 毫秒
1.
陈刚 《计算机时代》2013,(7):11-12,16
基于稳定网的数据流连接算法已有很多研究成果,但在实际应用中,还需要处理不同速率网络下的查询连接,这使得目前流行的基于稳定数据流且内存分配固定环境下的连接算法难以适用。介绍了一种在非对称数据率网络下的无阻塞排序归并连接算法SMA。SMA算法的连接运算分为两阶段:join during run creation和join during merge,第一阶段可用于网络无阻塞情况下通过内存刷新策略来生成头批连接结果,第二阶段用于数据源受阻时借助外存驻留数据继续生成查询连接结果,从而保证了连接结果产生的无阻塞性。试验证明,SMA对等值和空间连接效率很高。  相似文献   

2.
字符串相似连接操作具有广泛应用,因而将着重研究基于编辑距离的字符串相似连接.而现有的字符串相似连接算法大多为内存算法.实际应用中的数据集越来越大,有必要针对超大规模数据集研制字符串相似性连接外存算法.利用组合频率向量划分数据集,并提出了基于编辑距离的字符串相似性连接外存算法框架,证明了磁盘调度问题的难度并提出了不同的启发式磁盘调度方法.此外,还提出了基于该外存算法框架实现字符串相似性连接增量式计算的方法.实验结果表明,数据划分方法可以有效地过滤不相关的数据子集;磁盘调度算法能够有效减少磁盘IO次数;外存算法是高效的;增量式计算方法能够高效地处理数据更新.  相似文献   

3.
Top-K查询是一种被广泛应用的操作,它根据给定的评分函数在潜在的海量数据中返回k个分值最高的元组。传统的TA算法要求能够支持随机读,NRA算法虽然放宽了对随机读的限制,但是增长阶段需要在内存中维护大量的元组,运行时将占用大量的内存资源。提出的RRTA算法相比NRA算法对数据的存储进行了重新的规划,创建一个新的表将内存上的开销转换到较廉价的外存开销,只需顺序读取就可以进行有效的Top-K查询,同时将表进行了划分,在并行处理的情况下更能提高程序的效率,能够很好地运行在内存有限的环境中。  相似文献   

4.
支持实时内存数据库不间断服务的恢复技术   总被引:1,自引:0,他引:1       下载免费PDF全文
实时内存数据库的目标是尽量不使用I/O操作,但在实时内存数据库的恢复子系统中,有两类I/O——刷新内存中的数据到外存和写日志无法避免。该文提出一个双机系统,无须直接写数据到外存,当一个节点失效时,仍能提供不间断服务,且在恢复的过程,对整个系统的性能基本没有影响。  相似文献   

5.
基于内存池的空间数据调度算法   总被引:4,自引:0,他引:4       下载免费PDF全文
计算机处理海量空间数据时,内外存之间数据的频繁交互导致内存占用高、处理效率低。使用内存池方式调度空间数据可以提高计算机效率。在多种特定地图使用模式下,不同的内存池页面置换算法能有效降低操作过程中的内外存交互,提高空间数据调度效率。实验表明,该算法为内存容量有限的嵌入式设备上的GIS提出了高效处理空间数据的方案。  相似文献   

6.
《软件》2016,(3):89-93
针对网络空间海量态势目标数据的动态、实时绘制需求,本文在细节层次(LOD)控制技术的基础上提出了一种数据分片标记与内外存动态调度相结合的态势数据处理算法。首先在底图上建立多尺度显示模型,然后将网络态势数据节点进行分层、分块标记,最后在交互时基于用户当前视口进行动态加载调度,提高内存的使用效率。经验证,该算法可以减小内存开销,提高海量态势数据绘制的实时性,提供较好的用户体验。  相似文献   

7.
提出了一种新的实时数据仓库环境下的数据流更新算法——MESHJOIN*算法。算法的特性有:(1)关系R采用了分块和散列的组织形式,尽可能避免对当前连接无效元组的读取,减少连接操作所涉及元组的数量,从而提高连接算法的效率;(2)采用了多线程并发连接技术,并根据工程学原理,实现了连接操作和关系R读取操作的最佳调度,保证了连接算法效率的最大化;(3)根据当前系统的服务率和数据流元组的到达率之间的关系,合理调度实时元组和准实时元组的执行,保证了系统对实时元组的处理要求。实验结果表明,MESHJOIN*算法可以取得比MESHJOIN算法更好的性能。  相似文献   

8.
基于LOD控制与内外存调度的大型三维点云数据绘制   总被引:6,自引:1,他引:5  
通过结合基于视点的细节层次(level-of-detail,LOD)控制技术和内外存调度的数据控制策略,实现大型三维点云数据在一般配置PC机上的实时交互浏览.首先将输入点云分为大小相等的若干块。然后对每块数据分别建立误差控制下的多分辨率数据结构,并进行内外存分配.在交互绘制中,通过用户视点来确定当前的感兴趣区域,以控制模型表面的细节层次分布.该算法不但可以实现大型点云数据的实时交互绘制,而且可有效地提高一般点云数据绘制时的内存使用效率.  相似文献   

9.
王春凯  孟小峰 《软件学报》2018,29(3):869-882
并行环境下的分布式连接处理要求制定划分策略以减少状态迁移和通信开销。相对于数据库管理系统而言,分布式数据流管理系统中的在线θ连接操作需要更高的计算成本和内存资源。基于完全二部图的连接模型可支持分布式数据流的连接操作。因为连接操作的每个关系仅存放于二部图模型的一侧处理单元,无需复制数据,且处理单元相互独立,因此该模型具有内存高效、易伸缩和可扩展等特性。然而,由于数据流速的不稳定性和属性值分布的不均衡性,导致倾斜数据流的连接操作易出现集群负载不均衡的现象。针对倾斜数据流的连接操作,模型无法动态分配查询节点,并需要人工干预数据分组的参数设置。尤其是应对全部历史数据的连接查询,模型效率更低。基于上述问题,提出了管理倾斜数据流连接的框架,使用基于键值和元组混合的划分样式有效应对二部图模型的各侧倾斜数据。并设计了重新动态分配查询节点的策略和状态迁移算法,以支持全历史数据的连接查询和自适应的资源管理。针对合成数据和真实数据的实验表明,该方案可有效应对倾斜数据的连接操作并进一步提升分布式数据流管理系统的吞吐率,特别是降低云环境中的计算成本。  相似文献   

10.
本文提出了一种基于用户指导的多关系关联规则挖掘算法,借鉴有向图的概念动态的选择最优关键表,并利用元组ID传播的思想使多表间无需物理连接而能直接进行关联规则挖掘,同时引入用户指导的概念,提高了用户的满意程度及海量数据挖掘的效率和精确度,该算法能够直接支持关系数据库,且运行效率远远高于基于ILP技术的多关系关联规则挖掘算法,  相似文献   

11.
This paper addresses the distributed stream processing of window-based multi-way join queries considering the semijoin as a key join operator. In distributed stream processing, data streams arriving at remote sites need to be shipped to the processing site for query execution. This typically introduces high communication overhead. Our observation is that semijoin, effective in reducing communication overhead in distributed database query processing, can be also effective in distributed stream query processing. The challenge, however, lies in the streaming nature of the tuples, as it requires continuous and incremental processing of an unbounded sequence of tuples instead of one-time processing of a set of stored tuples. This paper describes our comprehensive work done to address the challenge. Specifically, we first propose a distributed stream join processing model that handles the issue of network delays introduced from the shipment of data streams, and allows for efficient batch processing. Then, based on the model, we propose join algorithms in a multi-way join case: first, one-way join algorithms for different combinations of join placement and join method and, then, multi-way join algorithms assuming linear join ordering. Regarding the join method, two distributed join methods are introduced: (1) simple join, in which full tuples are forwarded to the query processing site and (2) semijoin-based join, in which partial tuples are forwarded. A semijoin-based join can be executed with different possible semijoin strategies which incur different communication overheads. We present a complete set of join algorithms considering all possible semijoin strategies, and propose an optimization algorithm. The join algorithms are executed continuously in an incremental manner as tuples arrive, and never ship tuples redundantly. The optimization algorithm constructs an efficient multi-way join plan by using a greedy heuristic which adds to the plan one stream with the minimum join execution cost in each step. Through extensive experiments, we conduct comparative studies of the performance among the proposed one-way join algorithms and the efficiency of the generated plan between the optimization algorithm based on the greedy heuristic and the exhaustive search, respectively.  相似文献   

12.
分析了面向先进硬件平台上的数据库优化技术,提出了基于内存存储模型的多表连接查询处理优化技术,采用内存存储模型存储维表并对维表主键进行顺序化,从而使维表的主键与内存维表记录的内存偏移地址相一致,实现对维表记录的内存直接访问。通过列存储技术减少维表记录的访问宽度,进一步优化维表访问的cache性能。与基于SQL Server 2005的查询执行计划的连接算法、join index连接算法以及基于列存储模型的优化连接算法进行了实验比较和性能分析,结果表明:基于内存存储模型的多表连接算法在处理星型结构数据仓库多谓词、多连接的复杂查询时具有很好的性能,与join index相比不需要额外的空间开销,与列存储数据模型相比具有更好的兼容性和性能。  相似文献   

13.
We address the problem of load shedding for continuous multi-way join queries over multiple data streams. When the arrival rates of tuples from data streams exceed the system capacity, a load shedding algorithm drops some subset of input tuples to avoid system overloads. To decide which tuples to drop among the input tuples, most existing load shedding algorithms determine the priority of each input tuple based on the frequency or some historical statistics of its join attribute value, and then drop tuples with the lowest priority. However, those value-based algorithms cannot determine the priorities of tuples properly in environments where join attribute values are unique and each join attribute value occurs at most once in each data stream. In this paper, we propose a load shedding algorithm specifically designed for such environments. The proposed load shedding algorithm determines the priority of each tuple based on the order of streams in which its join attribute value appears, rather than its join attribute value itself. Consequently, the priorities of tuples can be determined effectively in environments where join attribute values are unique and do not repeat. The experimental results show that the proposed algorithm outperforms the existing algorithms in such environments in terms of effectiveness and efficiency.  相似文献   

14.
Meshing Streaming Updates with Persistent Data in an Active Data Warehouse   总被引:1,自引:0,他引:1  
Active data warehousing has emerged as an alternative to conventional warehousing practices in order to meet the high demand of applications for up-to-date information. In a nutshell, an active warehouse is refreshed online and thus achieves a higher consistency between the stored information and the latest data updates. The need for online warehouse refreshment introduces several challenges in the implementation of data warehouse transformations, with respect to their execution time and their overhead to the warehouse processes. In this paper, we focus on a frequently encountered operation in this context, namely, the join of a fast stream 5" of source updates with a disk-based relation R, under the constraint of limited memory. This operation lies at the core of several common transformations such as surrogate key assignment, duplicate detection, or identification of newly inserted tuples. We propose a specialized join algorithm, termed mesh join (MESHJOIN), which compensates for the difference in the access cost of the two join inputs by 1) relying entirely on fast sequential scans of R and 2) sharing the I/O cost of accessing R across multiple tuples of 5". We detail the MESHJOIN algorithm and develop a systematic cost model that enables the tuning of MESHJOIN for two objectives: maximizing throughput under a specific memory budget or minimizing memory consumption for a specific throughput. We present an experimental study that validates the performance of MESHJOIN on synthetic and real-life data. Our results verify the scalability of MESHJOIN to fast streams and large relations and demonstrate its numerous advantages over existing join algorithms.  相似文献   

15.
This paper presents a memory-constrained hash join algorithm (PaMeCo Join) designed to operate with main-memory column-store database systems. Whilst RAM has become more affordable and the popularity of main-memory database systems continues to grow, we recognize that RAM is a finite resource and that database systems rarely have an excess of memory available to them. Therefore, we design PaMeCo to operate within an arbitrary memory limitation by processing the input relations by parts, and by using a compact hash table that represents the contained tuples in a compact format. Coupled with a radix-clustering system that lowers memory latencies, we find that PaMeCo can offer competitive performance levels to other contemporary hash join algorithms in an unconstrained environment, while being up to three times faster than a high-performing hash join when memory constraints are applied.  相似文献   

16.
在列数据库中,连接操作依然是最核心和最耗时的操作,GPU强大的计算能力可为此提供新的优化手段。基于Fermi架构,提出了新的Hash Join算法和Sort merge Join算法,其基本思想是充分利用该架构新增的缓存结构来减少连接操作的cache缺失率。与CUDA stream技术相结合,新算法在输出结果较多时可以有效地隐藏主存与显存间数据传输带来的延迟,进一步提升其执行效率。实验结果证实了基于Fcrmi架构的Hash Join算法处理偏抖数据的高效性及Sort merge Join算法的稳定性,并且通过比较表明,这两种算法的性能全面优于基于多核CPU充分优化的Join算法,最大加速2.4倍,在外键分布高偏抖时新的Hash Join算法的执行速度甚至达到每秒217M元组。  相似文献   

17.
We propose a new algorithm, called Stripe-join, for performing a join given a join index. Stripe-join is inspired by an algorithm called ‘Jive-join’ developed by Li and Ross. Stripe-join makes a single sequential pass through each input relation, in addition to one pass through the join index and two passes through a set of temporary files that contain tuple identifiers but no input tuples. Stripe-join performs this efficiently even when the input relations are much larger than main memory, as long as the number of blocks in main memory is of the order of the square root of the number of blocks in the participating relations. Stripe-join is particularly efficient for self-joins. To our knowledge, Stripe-join is the first algorithm that, given a join index and a relation significantly larger than main memory, can perform a self-join with just a single pass over the input relation and without storing input tuples in intermediate files. Almost all the I/O is sequential, thus minimizing the impact of seek and rotational latency. The algorithm is resistant to data skew. It can also join multiple relations while still making only a single pass over each input relation. Using a detailed cost model, Stripe-join is analyzed and compared with competing algorithms. For large input relations, Stripe-join performs significantly better than Valduriez's algorithm and hash join algorithms. We demonstrate circumstances under which Stripe-join performs significantly better than Jive-join. Unlike Jive-join, Stripe-join makes no assumptions about the order of the join index.  相似文献   

18.
基于滑动窗口的数据流连接聚集查询降载策略   总被引:1,自引:1,他引:0       下载免费PDF全文
基于单个数据流的滑动窗口聚集查询降载技术和数据流连接技术,提出滑动窗口模型下的数据流连接聚集查询降载策略,给出判断系统是否过载的负载方程和使过载系统恢复到轻载状态的降载算法,使降载后的查询结果同时拥有较小的相对误差和最大的元组输出率。实验结果表明,该降载策略具有较好的可行性和适应性。  相似文献   

19.
提出一种新的传感器网络内的路径连接实现算法,在连接路径中,通过将有效元组的选择与实际连接一定程度分离,在信息产生节点附近实现元组选择,在查询节点附近实现元组的真正连接,减少了元组的重复传输,有效降低了能量损耗,特别在针对事件监测系统中,针对突发性的连接选择系数变化或较大的情况,有效避免大量连接结果过早产生和传输的大量能量损耗.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号