
1.

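A Load-Shedding Strategy for Join-Aggregation Queries over Data Streams Based on Sliding Windows (total citations: 1; self-citations: 1; citations by others: 0)

Kang Wei, Li Zhanhuai, Zhang Longbo 《计算机工程》 (Computer Engineering) 2009,35(22):50-52

Building on load-shedding techniques for sliding-window aggregate queries over a single data stream and on data stream join techniques, this paper proposes a load-shedding strategy for join-aggregation queries over data streams under the sliding-window model. It gives a load equation for deciding whether the system is overloaded and a load-shedding algorithm that returns an overloaded system to a lightly loaded state, so that query results after shedding have both a small relative error and the maximum tuple output rate. Experimental results show that the strategy is feasible and adaptable.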

2.

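An Adaptive Query Processing Mechanism for Data Streams (total citations: 1; self-citations: 0; citations by others: 1)

Song Baoyan, Zhang Lijie, Lu Yan, Yu Ge 《计算机科学》 (Computer Science) 2006,33(10):16-20

Targeting the characteristics of continuous queries over data streams, this paper proposes an adaptive query processing mechanism that can output as many result tuples as possible within a limited time and can also output a limited number of tuples in the shortest possible time. The mechanism relies on an output-rate-based cost model that takes the constantly changing stream rate, predicate selectivity, and operator processing time as the variables of the cost function and the output rate as its value. The cost model can therefore adapt to changes in the environment and in the data streams themselves, and can serve as the criterion for dynamically selecting query plans. Experiments show that the mechanism effectively increases the output rate and query throughput, reduces latency, and lowers memory usage across queries.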

3.

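An Adaptive Query Optimization and Scheduling Strategy over Data Streams

Song Baoyan, Lu Yan, Zhang Junning 《计算机研究与发展》 (Journal of Computer Research and Development) 2006,43(Z3)

This paper studies query processing mechanisms in data stream systems in depth and, considering both memory usage and the real-time requirements of queries, proposes an adaptive query optimization strategy called Slope. On the one hand, the strategy adjusts the query plan in a timely manner according to each operator's selectivity and the number of tuples it processes per unit time; on the other hand, it performs round-robin scheduling with unequal time slices according to the adjusted query plan. The corresponding algorithms for Slope are given and their performance is tested.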

4.

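A Rate-Based Preemptive Batch Processing Method in Data Stream Systems

Song Baoyan, Li Zhiqiang, Li Wei, Zhang Lijie, Yu Ge 《计算机工程》 (Computer Engineering) 2007,33(3):50-52

Targeting the characteristics of data streams, this paper proposes a rate-based preemptive batch processing method. A query plan is a sequence of operators. The paper divides a query plan into operation units and assigns each unit a priority that changes dynamically with system factors; operation units are then scheduled dynamically and preemptively according to these changing priorities, which improves the efficiency of continuous queries. Experiments show that the method improves overall system performance, reduces the average waiting time of tuples, and greatly increases the tuple output rate.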

5.

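An Adaptive Processing Algorithm for Ad Hoc Queries over Data Streams

Huang Hao, Yang Weidong 《计算机工程》 (Computer Engineering) 2013,(9)

Adaptive processing of Ad Hoc queries over data streams requires that existing query plans be updated and migrated online quickly, but existing methods need a large number of sliding-window state transformations to switch from the old query plan to the new one. To address this, an adaptive processing algorithm for Ad Hoc queries is proposed. Based on the summarized distribution characteristics of the data stream and a custom scoring model, the algorithm quickly computes the best incremental update to the existing query plan, so as to handle newly arriving Ad Hoc queries and reduce the time needed to switch between old and new query plans. Experiments on the highway data set provided by the data stream benchmark Linear Road show that, compared with the MS and PT methods, the algorithm completes the switch between old and new query plans faster.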

6.

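An Online Join Method for Skewed Data Streams

Wang Chunkai, Meng Xiaofeng 《软件学报》 (Journal of Software) 2018,29(3):869-882

Distributed join processing in a parallel environment requires a partitioning strategy that reduces state migration and communication overhead. Compared with database management systems, online θ-join operations in distributed data stream management systems require higher computational cost and more memory. A join model based on the complete bipartite graph can support join operations over distributed data streams. Because each relation in the join is stored only in the processing units on one side of the bipartite graph, no data needs to be replicated and the processing units are mutually independent, so the model is memory-efficient, elastic, and scalable. However, unstable stream rates and unbalanced attribute-value distributions make joins over skewed data streams prone to load imbalance across the cluster. For joins over skewed streams, the model cannot allocate query nodes dynamically and requires manual tuning of the data-grouping parameters. It is even less efficient for join queries over the full history of the data. To address these problems, a framework for managing joins over skewed data streams is proposed; it uses a hybrid key-based and tuple-based partitioning scheme to handle skewed data on each side of the bipartite graph model. A strategy for dynamically reallocating query nodes and a state migration algorithm are also designed to support join queries over full-history data and adaptive resource management. Experiments on synthetic and real data show that the approach handles joins over skewed data effectively and further improves the throughput of distributed data stream management systems, in particular reducing computational cost in cloud environments.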

7.

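HoliAdapt: An Adaptive Query Processing Strategy for Data Streams

Zhang Lijie, Li Zhiqiang, Song Baoyan 《计算机应用》 (Journal of Computer Applications) 2006,26(9):2028-2030

Targeting the characteristics of continuous query processing over data streams, and starting from selectivity and execution time while taking memory usage and output latency into account as adaptivity factors, we propose an adaptive query processing strategy called HoliAdapt. The strategy dynamically collects statistics based on the query window, continuously optimizes the query plan by mathematical methods, and schedules operators adaptively through a core scheduling method, effectively reducing latency and memory usage and improving query efficiency.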

8.

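Adaptive Mechanisms of Data Stream Processing Systems in High-Speed Networks

Chen Leisong 《计算机工程》 (Computer Engineering) 2007,33(22):155-157

Data in high-speed network environments exist in the form of data streams; arrivals may be bursty and the arrival rate varies over time, so real-time query processing over data streams must adapt to the characteristics of the streams and to fluctuating network conditions. This paper analyzes the adaptivity of scheduling strategies in data stream query plans and applies closed-loop control theory together with suitable scheduling algorithms to reduce the run-time storage requirements of the system while keeping output latency low, improving adaptivity and query accuracy.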

9.

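A Study of the Adaptivity of Scheduling Strategies in Continuous Queries over Data Streams

Chen Leisong 《计算机时代》 (Computer Era) 2006,(10):47-49

The data handled by many applications in high-speed network environments exist in the form of data streams, and arrivals may be bursty. The arrival rate varies over time, so real-time query processing over data streams must adapt to the characteristics of the streams and to fluctuating network conditions. This article analyzes the adaptivity of scheduling strategies in data stream query plans, reducing the run-time storage requirements of the system while keeping output latency low, and adapting to some extent to changes in the arrival rate of the streams.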

10.

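MultiFactor: An Adaptive Scheduling Strategy over Data Streams

Song Baoyan, Lu Yan, Zhang Junning, Zhang Lijie, Yu Ge 《小型微型计算机系统》 (Journal of Chinese Computer Systems) 2007,28(1):107-111

This paper studies query processing mechanisms in data stream systems in depth and, considering memory usage, system response time, and the real-time requirements of queries together, proposes a multi-factor dynamic scheduling strategy called MultiFactor. The strategy dynamically adjusts the scheduling order of operators according to the number of tuples each operator in a query consumes per unit time, and determines each operator's scheduling time according to the system deadline. The corresponding algorithms for MultiFactor are also given, and experiments demonstrate its performance advantages.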

11.

Incremental Evaluation of Sliding-Window Queries over Data Streams

Ghanem T.M. Hammad M.A. Mokbel M.F. Aref W.G. Elmagarmid A.K. 《Knowledge and Data Engineering, IEEE Transactions on》2007,19(1):57-72

Two research efforts have been conducted to realize sliding-window queries in data stream management systems, namely, query reevaluation and incremental evaluation. In the query reevaluation method, two consecutive windows are processed independently of each other. On the other hand, in the incremental evaluation method, the query answer for a window is obtained incrementally from the answer of the preceding window. In this paper, we focus on the incremental evaluation method. Two approaches have been adopted for the incremental evaluation of sliding-window queries, namely, the input-triggered approach and the negative tuples approach. In the input-triggered approach, only the newly inserted tuples flow in the query pipeline and tuple expiration is based on the timestamps of the newly inserted tuples. On the other hand, in the negative tuples approach, tuple expiration is separated from tuple insertion where a tuple flows in the pipeline for every inserted or expired tuple. The negative tuples approach avoids the unpredictable output delays that result from the input-triggered approach. However, negative tuples double the number of tuples through the query pipeline, thus reducing the pipeline bandwidth. Based on a detailed study of the incremental evaluation pipeline, we classify the incremental query operators into two classes according to whether an operator can avoid the processing of negative tuples or not. Based on this classification, we present several optimization techniques over the negative tuples approach that aim to reduce the overhead of processing negative tuples while avoiding the output delay of the query answer. A detailed experimental study, based on a prototype system implementation, shows the performance gains of the negative tuples approach over the input-triggered approach when accompanied by the proposed optimizations.
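The negative tuples approach described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class names `WindowOperator` and `IncrementalSum`, the time-based window, and the choice of SUM as the downstream aggregate are all assumptions made for illustration. The window operator emits a positive tuple for every insertion and a negative tuple for every expiration, and the aggregate updates its answer from the previous one instead of rescanning the window.

```python
from collections import deque
from typing import Deque, Iterator, Tuple

# A signed tuple: (+1, value) for an insertion, (-1, value) for an expiration.
Signed = Tuple[int, float]


class WindowOperator:
    """Hypothetical time-based sliding window that emits signed tuples."""

    def __init__(self, width: float) -> None:
        self.width = width
        self.buffer: Deque[Tuple[float, float]] = deque()  # (timestamp, value)

    def advance(self, now: float) -> Iterator[Signed]:
        # Expiration is driven by the clock, independently of new arrivals:
        # every tuple that has left the window produces one negative tuple.
        while self.buffer and self.buffer[0][0] <= now - self.width:
            _, value = self.buffer.popleft()
            yield (-1, value)

    def insert(self, ts: float, value: float) -> Iterator[Signed]:
        # Expire first so the window is consistent as of time ts,
        # then emit the positive tuple for the new arrival.
        yield from self.advance(ts)
        self.buffer.append((ts, value))
        yield (+1, value)


class IncrementalSum:
    """Downstream SUM that is updated incrementally from signed tuples."""

    def __init__(self) -> None:
        self.total = 0.0

    def consume(self, signed: Signed) -> float:
        sign, value = signed
        self.total += sign * value
        return self.total
```

With a window of width 10, inserting values 5 and 3 at times 0 and 4 gives a sum of 8; inserting 2 at time 12 first expires the tuple from time 0 and leaves 5; calling `advance(15)` with no new arrival expires the tuple from time 4 and leaves 2. The last step is the point of the approach: the answer changes without waiting for an input tuple.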

12.

Reliability of answers to queries in relational databases (total citations: 1; self-citations: 0; citations by others: 1)

Sadri F. 《Knowledge and Data Engineering, IEEE Transactions on》1991,3(2):245-251

The author studies the problem of determining the reliability of answers to queries in a relational database system, where the information in the database comes from various sources with varying degrees of reliability. An extended relational model is proposed in which each tuple in a relation is associated with an information source vector which identifies the information source(s) that contributed to that tuple. The author shows how relational algebra operations can be extended, and implemented using information source vectors, to calculate the vector corresponding to each tuple in the answer to a query, and hence, to identify information source(s) contributing to each tuple in the answer. This also enables the database system to calculate the reliability of each tuple in the answer to a query as a function of the reliability of information sources.
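A deliberately simplified sketch of the idea follows. It is not the paper's model: here a source vector is reduced to a plain set of source names, a join takes the union of the contributing sources, and reliability is computed as a product under the assumption that sources are independent and that every contributing source must be correct. The function names and the sample relations are hypothetical. Operations that merge duplicates (projection, union) would need several alternative vectors per tuple and are left out.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# A row is (values, contributing sources). The frozenset stands in for
# the paper's information source vector (a simplifying assumption).
Row = Tuple[tuple, FrozenSet[str]]


def select(rel: List[Row], pred: Callable[[tuple], bool]) -> List[Row]:
    # Selection keeps the source set of each surviving tuple unchanged.
    return [(vals, srcs) for vals, srcs in rel if pred(vals)]


def join(left: List[Row], right: List[Row], li: int, ri: int) -> List[Row]:
    # An answer tuple depends on the sources of both joined tuples,
    # so the source sets are combined by union.
    return [
        (lv + rv, ls | rs)
        for lv, ls in left
        for rv, rs in right
        if lv[li] == rv[ri]
    ]


def reliability(sources: FrozenSet[str], rel_of: Dict[str, float]) -> float:
    # Assumption: independent sources, all of which must be correct.
    p = 1.0
    for s in sources:
        p *= rel_of[s]
    return p


employees = [(("ann", "db"), frozenset({"A"}))]
departments = [(("db", "building 1"), frozenset({"B"}))]
answer = join(employees, departments, 1, 0)
```

Here the single answer tuple carries the sources `{"A", "B"}`, so with source reliabilities of 0.9 and 0.8 its reliability under the stated assumptions is their product.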

13.

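A Keyword Query Method Based on the Minimum Steiner Tree

Zhang Yu, Jin Shunfu, Liu Guohua, Yuan Ying, Li Lile 《小型微型计算机系统》 (Journal of Chinese Computer Systems) 2010,31(1)

In relational databases, keyword queries do not require users to learn a query language or the database schema, and they effectively broaden the scope of queries. Describing the relationships among tuples in a relational database with a tuple graph turns the keyword query problem into the problem of finding a minimum Steiner tree in the tuple graph. This paper proposes a similarity-based method for computing edge weights on the tuple graph, so that the edge weights reflect the similarity between tuples and keywords. Then, since finding a minimum Steiner tree is an NP-complete problem, it proposes an algorithm that runs Dijkstra's algorithm under a greedy strategy to obtain a good approximate solution. Finally, the algorithm is analyzed and validated through experiments.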
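The abstract does not spell out the greedy procedure, so the sketch below substitutes a well-known heuristic of the same flavor, the shortest-path heuristic: start from one terminal and repeatedly run Dijkstra's algorithm from the current tree to attach the closest terminal that is not yet connected. The graph representation, the function names, and the sample graph are assumptions; the paper's similarity-based edge weights are taken as given inputs.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Undirected weighted graph: graph[u][v] == graph[v][u], weights >= 0.
Graph = Dict[str, Dict[str, float]]


def nearest_terminal_path(
    graph: Graph, tree_nodes: Set[str], targets: Set[str]
) -> Tuple[float, List[str]]:
    """Multi-source Dijkstra from the tree to the closest target."""
    dist = {n: 0.0 for n in tree_nodes}
    prev: Dict[str, str] = {}
    heap = [(0.0, n) for n in tree_nodes]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u in targets:
            path = [u]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return d, path[::-1]  # runs from a tree node to the target
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    raise ValueError("terminals are not connected")


def greedy_steiner(
    graph: Graph, terminals: List[str]
) -> Tuple[Set[Tuple[str, str]], float]:
    """Approximate Steiner tree: attach the nearest terminal each round."""
    tree_nodes = {terminals[0]}
    remaining = set(terminals[1:])
    edges: Set[Tuple[str, str]] = set()
    total = 0.0
    while remaining:
        cost, path = nearest_terminal_path(graph, tree_nodes, remaining)
        edges.update(zip(path, path[1:]))
        total += cost
        tree_nodes.update(path)
        remaining -= set(path)
    return edges, total


# Hypothetical tuple graph: "c" is a non-keyword tuple linking a, b and d.
sample = {
    "a": {"c": 1.0, "b": 3.0},
    "b": {"c": 1.0, "a": 3.0},
    "c": {"a": 1.0, "b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
tree_edges, tree_cost = greedy_steiner(sample, ["a", "b", "d"])
```

In the sample graph the heuristic routes all three terminals through the intermediate node `c` rather than using the direct but heavier edge between `a` and `b`. The result is an approximation; the heuristic does not guarantee an optimal tree in general.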

14.

Probabilistic query answering over inconsistent databases

Sergio Greco Cristian Molinaro 《Annals of Mathematics and Artificial Intelligence》2012,64(2-3):185-207

This paper presents a framework for querying inconsistent databases in the presence of functional dependencies. Most of the works dealing with the problem of extracting reliable information from inconsistent databases are based on the notion of repair, a minimal set of tuple insertions and deletions which leads the database to a consistent state (called repaired database), and the notion of consistent query answer, a query answer that can be obtained from every repaired database. In this work, both the notion of repair and query answer differ from the original ones. In the presence of functional dependencies, tuple deletions are the only operations that are performed in order to restore the consistency of an inconsistent database. However, deleting a tuple to remove an integrity violation potentially eliminates useful information in that tuple. In order to cope with this problem, we adopt a notion of repair, based on tuple updates, which allows us to better preserve information in the source database. A drawback of the notion of consistent query answer is that it does not allow us to discriminate among non-consistent answers, namely answers which can be obtained from a non-empty proper subset of the repaired databases. To obtain more informative query answers, we propose the notion of probabilistic query answer, that is, query answers are tuples associated with probabilities. This new semantics of query answering over inconsistent databases allows us to give a measure of uncertainty to query answers. We show that the problem of computing probabilistic query answers is FP^#P-complete. We also propose a technique for computing probabilistic answers to arbitrary relational algebra queries.

15.

Adaptive scheduling for shared window joins over data streams

Jin Cheqing Zhou Aoying Jeffrey Xu Yu Joshua Zhexue Huang Cao Feng 《Frontiers of Computer Science in China》2007,1(4):468-477

Recently a few Continuous Query systems have been developed to cope with applications involving continuous data streams. At the same time, numerous algorithms have been proposed for better performance. A recent work on this subject defined scheduling strategies on shared window joins over data streams from multiple query expressions. In these strategies, the tuple with the highest priority is selected for processing from multiple candidates. However, the performance of these static strategies is strongly affected when data arrive in bursts, because the priority is determined only by static information, such as the query windows, arrival order, etc. In this paper, we propose a novel adaptive strategy where the priority of a tuple is integrated with real-time information. A thorough experimental evaluation has demonstrated that this new strategy can outperform the existing strategies.

16.

Improving performance by creating a native join-index for OLAP

Yansong Zhang Shan Wang Jiaheng Lu 《Frontiers of Computer Science in China》2011,5(2):236-249

The performance of online analytical processing (OLAP) is critical for meeting the increasing requirements of massive volume analytical applications. Typical techniques, such as in-memory processing, column-storage, and join indexes focus on high performance storage media, efficient storage models, and reduced query processing. While they effectively support OLAP applications, there is a vital limitation: main-memory database based OLAP (MMOLAP) cannot provide high performance for a large size data set. In this paper, we propose a novel memory dimension table model, in which the primary keys of the dimension table can be directly mapped to dimensional tuple addresses. To achieve higher performance of dimensional tuple access, we optimize our storage model for dimension tables based on OLAP query workload features. We present the directly dimensional tuple accessing (DDTA) based join (DDTA-JOIN), a technique to optimize query processing on the memory dimension table by direct dimensional tuple access. We also contribute by proposing an optimization of the predicate tree to shorten predicate operation length by pruning useless predicate processing. Our experimental results show that the DDTA-JOIN algorithm is superior to both simulated row-store main memory query processing and the open-source column-store main memory database MonetDB, thanks to the reduced join cost and simple yet efficient query processing.

17.

Optimization and evaluation of disjunctive queries (total citations: 2; self-citations: 0; citations by others: 2)

Claussen J. Kemper A. Moerkotte G. Peithner K. Steinbrunn M. 《Knowledge and Data Engineering, IEEE Transactions on》2000,12(2):238-260

It is striking that the optimization of disjunctive queries, i.e., those which contain at least one OR-connective in the query predicate, has been vastly neglected in the literature, as well as in commercial systems. In this paper, we propose a novel technique, called bypass processing, for evaluating such disjunctive queries. The bypass processing technique is based on new selection and join operators that produce two output streams: the TRUE-stream with tuples satisfying the selection (join) predicate and the FALSE-stream with tuples not satisfying the corresponding predicate. Splitting the tuple streams in this way enables us to “bypass” costly predicates whenever the “fate” of the corresponding tuple (stream) can be determined without evaluating this predicate. In the paper, we show how to systematically generate bypass evaluation plans utilizing a bottom-up building-block approach. We show that our evaluation technique allows us to incorporate the standard SQL semantics of null values. For this, we devise two different approaches: one is based on explicitly incorporating three-valued logic into the evaluation plans; the other one relies on two-valued logic by “moving” all negations to atomic conditions of the selection predicate. We describe how to extend an iterator-based query engine to support bypass evaluation with little extra overhead. This query engine was used to quantitatively evaluate the bypass evaluation plans against the traditional evaluation techniques utilizing a CNF- or DNF-based query predicate.
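The two-stream selection described in this abstract can be sketched for the simplest case, a predicate of the form `cheap OR expensive`. This is an illustration under stated assumptions, not the paper's operator set: `bypass_select` and `eval_disjunction` are hypothetical names, streams are materialized as lists, and null values and three-valued logic are ignored.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def bypass_select(
    tuples: Iterable[T], pred: Callable[[T], bool]
) -> Tuple[List[T], List[T]]:
    """Split the input into a TRUE-stream and a FALSE-stream."""
    true_stream: List[T] = []
    false_stream: List[T] = []
    for t in tuples:
        (true_stream if pred(t) else false_stream).append(t)
    return true_stream, false_stream


def eval_disjunction(
    tuples: Iterable[T],
    cheap: Callable[[T], bool],
    expensive: Callable[[T], bool],
) -> List[T]:
    """Evaluate `cheap OR expensive`, bypassing `expensive` when possible."""
    # Tuples on the TRUE-stream already qualify, so their fate is known
    # and they skip the costly predicate entirely.
    hits, undecided = bypass_select(tuples, cheap)
    # Only the FALSE-stream still needs the costly predicate.
    late_hits, _ = bypass_select(undecided, expensive)
    return hits + late_hits


calls = 0


def costly(x: int) -> bool:
    # Stand-in for an expensive predicate; counts its invocations.
    global calls
    calls += 1
    return x > 6


result = eval_disjunction(range(10), lambda x: x % 2 == 0, costly)
```

On the ten input values, the costly predicate runs only on the five odd numbers that the cheap predicate rejected. Note that the merged result is not in input order, because the two streams are concatenated.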

18.

Efficient histogram-based range query estimation for dirty data

Yan ZHANG Hongzhi WANG Long YANG Jianzhong LI 《Frontiers of Computer Science》2018,12(5):984-999

In recent years, data quality issues have attracted wide attention. Data quality problems are mainly caused by dirty data. Many methods for dirty data management have been proposed, one of which is the entity-based relational database, in which one tuple represents an entity. Traditional query optimizations are not suitable for the new entity-based model, so new query optimizations need to be developed. In this paper, we propose a new histogram-based query selectivity estimation strategy and focus on solving the overestimation that traditional methods lead to. We prove that our approaches are unbiased. The experimental results on both real and synthetic data sets show that our approaches can give good estimates with low error.

19.

An incremental clustering scheme for data de-duplication

Gianni Costa Giuseppe Manco Riccardo Ortale 《Data mining and knowledge discovery》2010,20(1):152-187

We propose an incremental technique for discovering duplicates in large databases of textual sequences, i.e., syntactically different tuples that refer to the same real-world entity. The problem is approached from a clustering perspective: given a set of tuples, the objective is to partition them into groups of duplicate tuples. Each newly arrived tuple is assigned to an appropriate cluster via nearest-neighbor classification. This is achieved by means of a suitable hash-based index that maps any tuple to a set of indexing keys and assigns tuples with high syntactic similarity to the same buckets. Hence, the neighbors of a query tuple can be efficiently identified by simply retrieving those tuples that appear in the same buckets associated with the query tuple itself, without completely scanning the original database. Two alternative schemes for computing indexing keys are discussed and compared. An extensive experimental evaluation on both synthetic and real data shows the effectiveness of our approach.
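The abstract does not say how indexing keys are computed, so the following sketch makes its own choices and should be read as an illustration only: keys are character 3-grams of the normalized text, similarity is the Jaccard coefficient over those keys, and the threshold of 0.5 is arbitrary. The class name `IncrementalDeduper` is hypothetical. What the sketch does share with the description is the overall flow: a new tuple is hashed into buckets, candidates are read only from those buckets, and the tuple joins the cluster of its nearest neighbor.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


def qgrams(text: str, q: int = 3) -> Set[str]:
    # Assumed key scheme: character q-grams of the lower-cased text
    # with whitespace collapsed.
    s = " ".join(text.lower().split())
    return {s[i:i + q] for i in range(max(1, len(s) - q + 1))}


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


class IncrementalDeduper:
    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self.buckets: Dict[str, Set[int]] = defaultdict(set)  # key -> ids
        self.keys: List[Set[str]] = []    # indexing keys per tuple id
        self.cluster_of: List[int] = []   # cluster id per tuple id
        self.next_cluster = 0

    def add(self, text: str) -> int:
        """Assign a new tuple to a cluster and return the cluster id."""
        keys = qgrams(text)
        # Candidate neighbors come only from the buckets the tuple hits;
        # the rest of the database is never scanned.
        candidates: Set[int] = set()
        for k in keys:
            candidates |= self.buckets.get(k, set())
        best: Optional[int] = None
        best_sim = 0.0
        for c in candidates:
            sim = jaccard(keys, self.keys[c])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= self.threshold:
            cluster = self.cluster_of[best]   # join nearest neighbor
        else:
            cluster = self.next_cluster       # open a new cluster
            self.next_cluster += 1
        tid = len(self.keys)
        self.keys.append(keys)
        self.cluster_of.append(cluster)
        for k in keys:
            self.buckets[k].add(tid)
        return cluster
```

For example, adding "John Smith, 12 Main Street" and then "john smith 12 main street" places both in the same cluster, while "Maria Rossi, Via Roma 5" opens a new one. A production index would use far fewer keys per tuple than full q-gram sets; that trade-off is what the key schemes compared in the paper are about.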