Similar Literature
Found 20 similar articles; search took 31 ms.
1.
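To address the severely limited communication resources of data source nodes, a histogram-based load shedding strategy is proposed for sliding-window join queries over multiple data streams. The algorithm takes into account the load on both the central processing node and the data source nodes, gives a formula for computing the shedding ratio, and uses clustering to build a central histogram and data source histograms, from which the shedding strategy is derived. The results show that the algorithm produces a maximal subset of the exact join result and is efficient for load shedding in multi-stream window joins.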

2.
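Load shedding strategy for join-aggregate queries over data streams based on sliding windows   Total citations: 1, self-citations: 1, citations by others: 0
Building on load shedding techniques for sliding-window aggregate queries over a single data stream and on data stream join techniques, a load shedding strategy is proposed for join-aggregate queries over data streams under the sliding window model. A load equation for judging whether the system is overloaded is given, together with a shedding algorithm that returns an overloaded system to a lightly loaded state, so that query results after shedding have both a small relative error and the maximum tuple output rate. Experimental results show that the strategy is feasible and adaptable.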
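
The abstract mentions a load equation for detecting overload but does not give it. As a rough illustration only, the sketch below assumes the common textbook form in which load is the sum of arrival rate times per-tuple cost over all inputs; the function name and all parameters are hypothetical and are not taken from the paper.

```python
def shedding_ratio(arrival_rates, per_tuple_costs, capacity):
    """Return the fraction of input to drop so that load <= capacity.

    Assumed load model (not from the paper): load = sum(rate_i * cost_i).
    """
    load = sum(r * c for r, c in zip(arrival_rates, per_tuple_costs))
    if load <= capacity:
        return 0.0  # not overloaded, nothing to shed
    # Dropping this fraction uniformly scales the load down to capacity.
    return 1.0 - capacity / load
```

For example, two streams with rates 100 and 300 tuples per second and costs of 1 and 2 units per tuple produce a load of 700 units; with a capacity of 350 units, half of the input would have to be shed under this model.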

3.
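Sliding-window aggregate queries are widely used in data stream management systems, and load shedding for these queries must be considered when stream arrival rates peak. This paper analyzes the characteristics of the subset model and the shortcomings of existing shedding strategies, states the constraints of the load shedding problem for sliding-window aggregate queries over data streams, and proposes a shedding algorithm, based on a strategy of dropping window updates, that guarantees subset results. Theoretical analysis and experimental results show that the algorithm is effective and practical for this problem.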

4.
Complex media fusion operations can be costly in terms of the time they need to process input objects. If data arrive at fusion nodes faster than the nodes can consume them, some input objects will not be processed. In this paper, we develop load shedding mechanisms which take into consideration both data quality and the expensive nature of media fusion operators. In particular, we present quality assessment models for objects and multistream fusion operators and highlight that such quality assessments may impose partial orders on objects. We highlight that the most effective load control approach for fusion operators involves shedding not individual input objects but combinations of objects. Yet, identifying suitable combinations of objects in real time will not be possible if efficient combination selection algorithms do not exist. We develop efficient combination selection schemes for scenarios with different quality assessment and target characteristics. We first develop efficient combination-based load shedding when the fusion operator has unambiguously monotone semantics. We then extend this to the more general ambiguously monotone case and present experimental results that show the performance gains using quality-aware combination-based load shedding strategies under the various fusion scenarios.

5.
Tuple dropping, though commonly used for load shedding in most data stream operations, is generally inadequate for multiway windowed stream joins. The join output rate can be unnecessarily reduced because tuple dropping fails to exploit the time correlations that are likely to exist among interrelated streams. In this paper, we introduce GrubJoin, an adaptive multiway windowed stream join that effectively performs time correlation-aware CPU load shedding. GrubJoin maximizes the output rate by achieving near-optimal window harvesting, which picks only the most profitable segments of individual windows for the join. Due mainly to the combinatorial explosion of possible multiway join sequences involving different window segments, GrubJoin faces unique challenges that do not exist for binary joins, such as determining the optimal window harvesting configuration in a time-efficient manner and learning the time correlations among the streams without introducing overhead. To tackle these challenges, we formalize window harvesting as an optimization problem, develop greedy heuristics to determine near-optimal window harvesting configurations, and use approximation techniques to capture the time correlations. Our experimental results show that GrubJoin is vastly superior to tuple dropping when time correlations exist and is equally effective when time correlations are nonexistent.

6.
Data stream management systems need to adaptively control their resources, since stream characteristics and query workload may vary over time. In this paper, we investigate an approach to adaptive resource management for continuous sliding-window queries that adjusts window sizes and time granularities to keep resource usage within bounds. These two novel techniques differ from standard load shedding approaches based on sampling, as they ensure exact query answers for given user-defined quality of service specifications, even under query reoptimization. In order to quantify the effects of both techniques on the various operations in a query plan, we develop an appropriate cost model for estimating operator resource allocation in terms of memory usage and processing costs. A thorough experimental study not only validates the accuracy of our cost model but also demonstrates the efficacy and scalability of the proposed techniques.

7.
Semantic approximation of data stream joins   Total citations: 1, self-citations: 0, citations by others: 1
We consider the problem of approximating sliding window joins over data streams in a data stream processing system with limited resources. In our model, we deal with resource constraints by shedding load in the form of dropping tuples from the data streams. We make two main contributions. First, we define the problem space by discussing architectural models for data stream join processing and surveying suitable measures for the quality of an approximation of a set-valued query result. Second, we examine in detail a large part of this problem space. More precisely, we consider the number of generated result tuples as the quality measure and we propose optimal offline and fast online algorithms for it. In a thorough experimental study with synthetic and real data, we show the efficacy of our solutions.
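
To make the quality measure above concrete (the number of result tuples generated), the following toy sketch counts the output of a two-stream sliding-window equijoin with and without tuple dropping. It is not the paper's offline or online algorithm; the count-based window, the `keep` predicate, and the event format are all assumptions made for illustration.

```python
from collections import deque

def window_join_count(events, window, keep=lambda stream_id, key: True):
    """Count result tuples of a two-stream sliding-window equijoin.

    events: iterable of (stream_id, key), stream_id in {0, 1}, in arrival order.
    window: number of most recent kept tuples retained per stream.
    keep:   hypothetical shedding predicate; a dropped tuple is neither
            probed against the other window nor stored in its own.
    """
    windows = (deque(maxlen=window), deque(maxlen=window))
    produced = 0
    for stream_id, key in events:
        if not keep(stream_id, key):
            continue
        # Probe the opposite window first, then store the tuple.
        produced += sum(1 for k in windows[1 - stream_id] if k == key)
        windows[stream_id].append(key)
    return produced
```

Comparing the count returned with a shedding predicate against the count with no shedding gives the fraction of the exact join result that survives, which is the quantity a shedding policy of this kind tries to maximize.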

8.
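Research on measurement-based admission control   Total citations: 17, self-citations: 1, citations by others: 16
Compared with traditional admission control algorithms, measurement-based admission control has several advantages: first, it does not need to know the traffic model of the application; second, it adapts dynamically to changes in network load and so improves the utilization of network resources. This paper analyzes the basic idea of measurement-based admission control and, on that basis, proposes and implements an adaptive admission control algorithm (Adaptive Measurement-Based Admission Control, AMBAC). The authors validate the algorithm experimentally and find that, at similar system resource utilization (or admission capacity), AMBAC achieves a lower average packet loss rate than traditional (fixed time window) MBAC.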
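
The basic idea of measurement-based admission control can be sketched as follows. This is a minimal fixed-window variant, i.e. the kind of baseline the abstract compares against, not AMBAC itself, whose adaptation rule the abstract does not describe. The class name, the peak-over-window load estimate, and the `target_utilization` parameter are assumptions.

```python
from collections import deque

class MeasurementBasedAdmission:
    """Fixed-window MBAC sketch: admit a flow if measured load plus the
    requested rate stays under a utilization target."""

    def __init__(self, capacity, target_utilization=0.9, window=10):
        self.capacity = capacity
        self.target = target_utilization
        self.samples = deque(maxlen=window)  # most recent load measurements

    def record(self, measured_rate):
        self.samples.append(measured_rate)

    def admit(self, requested_rate):
        # Conservative estimate: the peak measured load within the window.
        estimate = max(self.samples, default=0.0)
        return estimate + requested_rate <= self.target * self.capacity
```

No traffic model of the application is needed: the decision depends only on measured load and the rate the new flow asks for.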

9.
Sliding window-based frequent pattern mining over data streams   Total citations: 2, self-citations: 0, citations by others: 2
Finding frequent patterns in a continuous stream of transactions is critical for many applications such as retail market data analysis, network monitoring, web usage mining, and stock market prediction. Even though numerous frequent pattern mining algorithms have been developed over the past decade, new solutions for handling stream data are still required due to the continuous, unbounded, and ordered sequence of data elements generated at a rapid rate in a data stream. Therefore, extracting frequent patterns from more recent data can enhance the analysis of stream data. In this paper, we propose an efficient technique to discover the complete set of recent frequent patterns from a high-speed data stream over a sliding window. We develop a Compact Pattern Stream tree (CPS-tree) to capture the recent stream data content and efficiently remove the obsolete, old stream data content. We also introduce the concept of dynamic tree restructuring in our CPS-tree to produce a highly compact frequency-descending tree structure at runtime. The complete set of recent frequent patterns is obtained from the CPS-tree of the current window using an FP-growth mining technique. Extensive experimental analyses show that our CPS-tree is highly efficient in terms of memory and time complexity when finding recent frequent patterns from a high-speed data stream.

10.
In this paper, we study the incremental update of Frequent Closed Itemsets (FCIs) over a sliding window in a high-speed data stream. We propose the notion of semi-FCIs, which is to progressively increase the minimum support threshold for an itemset as it is retained longer in the window, thereby drastically reducing the number of itemsets that need to be maintained and processed. We explore the properties of semi-FCIs and observe that a majority of the subsets of a semi-FCI are not semi-FCIs and need not be updated. This finding allows us to devise an efficient algorithm, IncMine, that incrementally updates the set of semi-FCIs over a sliding window. We also develop an inverted index to facilitate the update process. Our empirical results show that IncMine achieves significantly higher throughput and consumes less memory than the state-of-the-art streaming algorithms for mining FCIs and FIs. IncMine also attains high accuracy of 100% precision and over 93% recall.

11.
Monitoring data streams is an efficient way to acquire their characteristics. However, the resources available for each data stream are limited, so how to process an unbounded data stream with limited resources remains an open and challenging problem. In this paper, we use wavelets and sliding windows to design a multi-resolution summarization data structure, the Multi-Resolution Summarization Tree (MRST), which can be updated incrementally as data arrive and supports point queries, range queries, and multi-point queries while preserving query precision. We evaluate our algorithm on both synthetic and real-world data. The experimental results indicate that MRST exceeds the current algorithm in query efficiency and adaptability, and that it is also simpler to implement than the alternatives.
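
The abstract does not say which wavelet MRST uses. Assuming the Haar wavelet, a common choice for stream summaries, the multi-resolution idea can be sketched as repeated pairwise averaging and differencing. The function below is an illustration of that decomposition only, not the MRST structure or its incremental update.

```python
def haar_decompose(values):
    """Full Haar decomposition of a list whose length is a power of two.

    Returns [overall average, coarsest detail, ..., finest details].
    """
    coeffs = []
    current = list(values)
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        averages = [(a + b) / 2 for a, b in pairs]
        details = [(a - b) / 2 for a, b in pairs]
        coeffs = details + coeffs  # coarser details go in front
        current = averages
    return current + coeffs
```

Keeping only the first few coefficients gives a coarse summary of the window, while keeping more of them restores finer resolution.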

12.
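Join operations play a key role in data stream systems. They are executed differently from joins in a traditional DBMS, and load shedding for stream joins also differs from traditional network load shedding; many join load shedding strategies have been developed. After introducing stream join operations, data streams, and the model of data stream systems, this paper formally states the system constraints and output quality goals of join load shedding. It proposes a classification of join load shedding strategies and analyzes several of the more important current strategies, pointing out their characteristics and scope of application. Finally, it summarizes the properties a good join load shedding strategy should have and the trends for future research.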

13.
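The sliding window is an effective technique for mining data from the most recent period. This paper proposes a sliding-window-based algorithm for mining frequent items in streaming data. The algorithm uses a linked-list queue strategy, which greatly simplifies it and improves mining efficiency. For a given threshold S, error ε, and window length n, the algorithm detects the stream items whose frequency within the window exceeds Sn, with error within εn. Its space complexity is O(1/ε), and the processing time per data item and the query time are both O(1). On this basis, the algorithm is also extended so that varying its parameters yields different frequent-item mining algorithms for streaming data, allowing a trade-off between time and space complexity. Extensive experiments show that the algorithm achieves better accuracy and better time and space efficiency than other similar algorithms.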
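
For reference, the problem statement above (report items whose frequency in the last n elements exceeds Sn) can be written as an exact baseline. Note that this sketch stores the whole window and therefore uses O(n) space; it is not the paper's O(1/ε) approximate algorithm, and the class and method names are hypothetical.

```python
from collections import Counter, deque

class WindowFrequentItems:
    """Exact frequent-item counting over the last n stream elements."""

    def __init__(self, n):
        self.n = n
        self.window = deque()
        self.counts = Counter()

    def add(self, item):
        self.window.append(item)
        self.counts[item] += 1
        if len(self.window) > self.n:
            # Expire the oldest element and forget items that reach zero.
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def frequent(self, s):
        """Return the items whose count in the window exceeds s * n."""
        threshold = s * self.n
        return {item for item, c in self.counts.items() if c > threshold}
```

An approximate algorithm of the kind described trades this exactness for bounded memory, accepting a count error of at most εn.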

14.
We address the problem of load shedding for continuous multi-way join queries over multiple data streams. When the arrival rates of tuples from data streams exceed the system capacity, a load shedding algorithm drops some subset of input tuples to avoid system overloads. To decide which tuples to drop among the input tuples, most existing load shedding algorithms determine the priority of each input tuple based on the frequency or some historical statistics of its join attribute value, and then drop tuples with the lowest priority. However, those value-based algorithms cannot determine the priorities of tuples properly in environments where join attribute values are unique and each join attribute value occurs at most once in each data stream. In this paper, we propose a load shedding algorithm specifically designed for such environments. The proposed load shedding algorithm determines the priority of each tuple based on the order of streams in which its join attribute value appears, rather than its join attribute value itself. Consequently, the priorities of tuples can be determined effectively in environments where join attribute values are unique and do not repeat. The experimental results show that the proposed algorithm outperforms the existing algorithms in such environments in terms of effectiveness and efficiency.

15.
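Automatic detection of wrongly written characters is an important research task in natural language processing, with significant value in applications such as search engines and automatic question answering. Although traditional methods are fairly accurate at detecting multi-character word errors in text, their accuracy on Chinese single-character word errors is low because of the particular nature of such errors. This paper proposes a Transformer-based method for detecting errors in Chinese single-character words. First, by making full use of Chinese character confusion sets and Web...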

16.
We present the design of a predictive load shedding scheme for a network monitoring platform that supports multiple and competing traffic queries. The proposed scheme can anticipate overload situations and minimize their impact on the accuracy of the traffic queries. The main novelty of our approach is that it considers queries as black boxes, with arbitrary (and highly variable) input traffic and processing cost. Our system only requires a high-level specification of the accuracy requirements of each query to guide the load shedding procedure and assures a fair allocation of computing resources to queries in a non-cooperative environment. We present an implementation of our load shedding scheme in an existing network monitoring system and evaluate it with a diverse set of traffic queries. Our results show that, with the load shedding mechanism in place, the monitoring system can preserve the accuracy of the queries within predefined error bounds even during extreme overload conditions.

17.
This paper proposes a sliding window approach whose length and time shift are dynamically adaptable in order to improve model confidence, speed, and segmentation accuracy in human action sequences. Activity recognition is the process of inferring an action class from a set of observations acquired by sensors. We address the temporal segmentation problem of body part trajectories in Cartesian space, in which features are generated using the Discrete Fast Fourier Transform (DFFT) and Power Spectrum (PS). We pose this as an entropy minimization problem. Using entropy from the classifier output as a feedback parameter, we continuously adjust the two key parameters of the sliding window approach to maximize the model confidence at every step. The proposed classifier is a Dynamic Bayesian Network (DBN) model where classes are estimated using Bayesian inference. We compare our approach with our previously developed fixed window method. Experiments show that our method accurately recognizes and segments activities, with improved model confidence and faster convergence times, exhibiting anticipatory capabilities. Our work demonstrates that entropy feedback mitigates variability problems, and our method is applicable in research areas where action segmentation and classification are used. Working demo source code is available from the authors on request for academic dissemination purposes.
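
The entropy feedback described above can be illustrated with the Shannon entropy of a classifier's posterior: a peaked posterior has low entropy and signals high confidence. The sketch below picks, from a set of candidate (length, shift) pairs, the one with the lowest posterior entropy. The exhaustive search and the `classify` callback are assumptions; the abstract does not state how the authors update the two parameters.

```python
import math

def posterior_entropy(probs):
    """Shannon entropy in bits of a discrete posterior distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def pick_window(candidates, classify):
    """Return the (length, shift) pair with the lowest posterior entropy.

    classify(length, shift) is a hypothetical callback that returns the
    class posterior for the window defined by those two parameters.
    """
    return min(candidates, key=lambda c: posterior_entropy(classify(*c)))
```

A uniform posterior over two classes has an entropy of 1 bit, while a posterior that puts all mass on one class has an entropy of 0, so minimizing entropy favors windows on which the classifier is most certain.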

18.
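In stream processing, data skew and other causes can easily leave compute nodes with unbalanced loads, reducing the system's processing capacity. Traditional load balancing methods such as operator assignment, operator migration, and load shedding have not been widely adopted in stream processing systems because of their relatively high performance cost. A new load balancing method is proposed that targets the characteristics of stream processing systems. In this method, the data of the computing units is divided into a number of partitions, and partitions can be dynamically assigned to and migrated among computing units; by dynamically adjusting each unit's partitions with little disturbance to the running system, the input streams and utilization of the units are balanced. On this basis, a load balancing algorithm and an online data migration technique for stream processing systems are designed and implemented. Experimental results show that the method significantly reduces average processing latency and increases system throughput.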
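
One way to picture partition-level balancing is a greedy loop that repeatedly moves a partition from the busiest computing unit to the idlest one. The abstract does not give the paper's algorithm, so the selection rule below (move the partition whose load is closest to half the gap) and all names are assumptions made for this sketch.

```python
def rebalance(assignment, loads, workers, max_moves=100):
    """Greedy partition migration sketch.

    assignment: dict partition -> worker; loads: dict partition -> load.
    Returns the new assignment and the list of (partition, src, dst) moves.
    """
    assignment = dict(assignment)
    moves = []
    for _ in range(max_moves):
        totals = {w: 0.0 for w in workers}
        for p, w in assignment.items():
            totals[w] += loads[p]
        busiest = max(workers, key=totals.get)
        idlest = min(workers, key=totals.get)
        gap = totals[busiest] - totals[idlest]
        # Only partitions with 0 < load < gap strictly reduce the imbalance.
        movable = [p for p, w in assignment.items()
                   if w == busiest and 0 < loads[p] < gap]
        if not movable:
            break
        best = min(movable, key=lambda q: abs(loads[q] - gap / 2))
        assignment[best] = idlest
        moves.append((best, busiest, idlest))
    return assignment, moves
```

Because each move shifts less than the current gap, the imbalance between the two units involved shrinks at every step, and the loop stops once no single migration helps.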

19.
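To cope with the strict real-time demands and uncertainty of robot soccer systems, a path planning method based on statistical prediction is proposed. The method accounts for the uncertainty in the speed and direction of obstacles and models obstacle motion statistically. While moving, the robot uses the environment information it obtains to set up a prediction window and an obstacle avoidance window within its field of machine vision. In the prediction window, the robot builds predicted regions for the obstacles from their information; in the avoidance window, it plans its path from its own position and the predicted obstacle regions by calling either the tangent method or the rolling window method. This is a local path planning method: the robot must keep updating the environment information as it moves in order to avoid obstacles.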

20.
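Registration methods based on mutual information offer a high degree of automation and high registration accuracy, and have recently become a focus of medical image processing. Such methods are in essence statistical computations over gray levels, so displaying the same image with different window width and window level settings inevitably affects the registration result. After analyzing how window width and level affect image quality and mutual-information-based registration, this paper carries out a series of medical image registration experiments. Based on the results, it recommends reasonable window width and level settings for mutual-information-based registration.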
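
The dependence of mutual information on window width and level can be seen in a small sketch. `apply_window` maps raw intensities to display gray levels using the usual linear window transform, and `mutual_information` computes the measure from the joint gray-level histogram of two equally sized pixel lists. Both functions, and the choice of 256 output levels, are illustrative assumptions and not the paper's implementation.

```python
import math
from collections import Counter

def apply_window(pixels, level, width, out_levels=256):
    """Map raw intensities to gray levels for a given window level and width."""
    lo = level - width / 2
    out = []
    for v in pixels:
        t = min(max((v - lo) / width, 0.0), 1.0)  # clamp to the window
        out.append(min(int(t * out_levels), out_levels - 1))
    return out

def mutual_information(a, b):
    """Mutual information in bits between two equally long gray-level lists."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # Sum of p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum((c / n) * math.log2(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())
```

A narrow window clamps many distinct intensities to the same gray level, which changes the joint histogram and therefore the mutual information that a registration method would optimize.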
