Similar literature
20 similar articles found (search time: 0 ms)
1.
Aurora: a new model and architecture for data stream management   Total citations: 43, self-citations: 0, citations by others: 43
This paper describes the basic processing model and architecture of Aurora, a new system to manage data streams for monitoring applications. Monitoring applications differ substantially from conventional business data processing. The fact that a software system must process and react to continual inputs from many sources (e.g., sensors) rather than from human operators requires one to rethink the fundamental architecture of a DBMS for this application area. In this paper, we present Aurora, a new DBMS currently under construction at Brandeis University, Brown University, and M.I.T. We first provide an overview of the basic Aurora model and architecture and then describe in detail a stream-oriented set of operators. Received: 12 September 2002, Accepted: 26 March 2003, Published online: 21 July 2003. Edited by Y. Ioannidis

2.
Innovation in the fields of wireless data communications, mobile devices and biosensor technology enables the development of new types of monitoring systems that provide people with assistance anywhere and at any time. In this paper we present an architecture for building systems of this kind, which monitor data streams generated by biological sensors attached to mobile users. We pay special attention to three aspects related to system efficiency: selection of the optimal granularity, that is, the size of the input data stream package that has to be acquired in order to start a new processing cycle; the possible use of compression techniques to store and send the acquired input data stream; and, finally, the performance of local analysis versus remote analysis. Moreover, we introduce two real systems to illustrate the suitability and applicability of our proposal: an anywhere, anytime monitoring system for heart arrhythmias and an apnea monitoring system.

3.
Monitoring aggregate queries in real time over distributed streaming environments is a great challenge, not only because of the huge data volume and high rate, but also because of limited network transmission bandwidth. Consequently, ensuring approximate results of acceptable quality with economical network consumption becomes one of the most important goals in such scenarios. In this paper, we study how to monitor aggregate queries continuously and efficiently over distributed environments by deploying numerous filters at remote sites, so that only a small part of the incoming data is transmitted to the query site and network resources are saved significantly. We also show how to adjust the parameters of a filter continuously when the incoming data distribution at the corresponding remote site changes. Analysis and extensive experimental results demonstrate that our approach outperforms the existing work.
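The filter idea in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a SUM query, a fixed filter half-width per site, and hypothetical names (`RemoteSite`, `QuerySite`, `simulate`, `width`). A remote site transmits a value only when it drifts more than `width` away from the last value it reported, so the query site's answer is off by at most `width` per site.

```python
class RemoteSite:
    """Remote site holding a filter around its last reported value (hypothetical)."""

    def __init__(self, width):
        self.width = width      # assumed fixed filter half-width
        self.reported = None    # last value sent to the query site

    def observe(self, value):
        """Return the value if it must be transmitted, otherwise None."""
        if self.reported is None or abs(value - self.reported) > self.width:
            self.reported = value
            return value
        return None             # still inside the filter: suppress the message


class QuerySite:
    """Query site that answers SUM from the last reported values."""

    def __init__(self, n_sites):
        self.last = [0.0] * n_sites

    def update(self, site_id, value):
        self.last[site_id] = value

    def approximate_sum(self):
        return sum(self.last)


def simulate(stream, n_sites, width):
    """Feed (site_id, value) pairs through the filters.

    Returns (number of messages sent, approximate sum at the query site).
    """
    sites = [RemoteSite(width) for _ in range(n_sites)]
    query = QuerySite(n_sites)
    messages = 0
    for site_id, value in stream:
        sent = sites[site_id].observe(value)
        if sent is not None:
            query.update(site_id, sent)
            messages += 1
    return messages, query.approximate_sum()
```

In this sketch the filter width is constant; the abstract's continuous adjustment of filter parameters as the data distribution changes is deliberately left out.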

4.
This paper presents resource management techniques for allocating communication and computational resources in a distributed stream processing platform. The platform is designed to exploit the synergy of two classes of network connections: dedicated and opportunistic. Previous studies we conducted have demonstrated the benefits of such bi-modal resource organization, which combines small pools of dedicated computers with a very large pool of opportunistic computing capacity from idle computers to serve high-throughput computing applications. This paper extends the idea of bi-modal resource organization to the management of communication resources. Since distributed stream processing applications demand large volumes of data transmission between processing sites at a consistent rate, adequate control over the network resources is important to ensure a steady flow of processing. The system model used in this paper is a platform where stream processing servers at distributed sites are interconnected with a combination of dedicated and opportunistic communication links. Two pertinent resource allocation problems are analyzed in detail and solved using decentralized algorithms. One is the mapping of the processing and communication tasks of the stream processing workload onto the processing and communication resources of the platform. The other is the dynamic re-allocation of the communication links in response to variations in the capacity of the opportunistic communication links. The overall optimization goal of the allocations is higher task throughput and better utilization of the expensive dedicated links without deviating much from the timely completion of the tasks. The algorithms are evaluated through extensive simulation with a model based on realistic observations. The results demonstrate that the algorithms are able to exploit the synergy of bi-modal communication links towards achieving the optimization goals.

5.
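Threshold decision-making based on physical parameters is an important function of the instrumentation and control (I&C) system of a nuclear power plant, but it has shortcomings: the thresholds are fixed and uniform, and they lack any correlation with time or operating conditions. This work improves and optimizes the existing threshold decision scheme. Based on analysis, and with safety as the precondition, improvements are made in several respects: comparison of multiple data processing algorithms, multi-level floating thresholds, and association with specific operating conditions. The improved threshold decision scheme helps to enhance the fault tolerance and fault diagnosis capabilities of the I&C system, reduces the number of unnecessary reactor shutdowns, and improves the economic performance of the plant.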

6.
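To meet the real-time data acquisition and processing requirements of experiments on China's HL-1M controlled nuclear fusion device (中国环流新一号), a dedicated distributed real-time data acquisition and processing system was developed. The system makes effective use of the network's fast transfer and data throughput capabilities and takes full advantage of distributed processing, greatly improving real-time response and data processing capacity. More than 20 MB of data are acquired per discharge and processed in real time, with the results displayed. Data curves can be completed within 30 seconds; after processing, 5-7 MB of data are stored in a database for offline analysis; and hard copies of the data curves for nearly 400 channels are completed in the interval between two discharges.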

7.
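Research on a distributed data stream clustering algorithm based on density and center points   Total citations: 1, self-citations: 0, citations by others: 1
This paper analyzes the basic framework of distributed data stream clustering algorithms. To address the poor performance of the CluStream algorithm on non-spherical clusters, it proposes DDCS-Clustering (Distributed Density and Centers Stream Clustering), a distributed data stream clustering algorithm based on density and center points. The algorithm uses density, center points and a decaying time window to cluster data streams in a distributed environment. Experimental results show that DDCS-Clustering achieves high clustering quality at low communication cost.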

8.
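With the continuing development of communication technology and hardware, and especially the widespread use of small wireless sensing devices, data collection and generation have become increasingly convenient and automated, and researchers now face the problem of how to manage and analyze large-scale dynamic data sets. Applications that produce data streams are already very common; examples include sensor networks, financial securities management, network monitoring, Web logs, and online analysis of communication data. These applications are characterized by environments equipped with multiple distributed computing nodes, which are often located close to the data sources. Analyzing and monitoring data in such environments often requires some understanding of the mining task, the data distribution, the data arrival rate, and the mining method. This paper surveys the current state of progress in distributed data stream mining and looks ahead to possible and promising research directions.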

9.
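With the rise of the information superhighway, distributed multimedia systems are receiving more and more attention, and they also pose new challenges for network communication mechanisms. This paper first introduces some important basic concepts and characteristics of the communication mechanisms involved in distributed multimedia systems, then describes how network communication mechanisms support distributed multimedia systems from three perspectives (service networks, network services, and communication protocols), and finally draws conclusions.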

10.
Next-generation real-time applications demand big-data infrastructures to process huge and continuous data volumes under complex computational constraints. This type of application raises new issues for current big-data processing infrastructures. The first issue to be considered is that most current infrastructures for big-data processing were defined for general-purpose applications. Thus, they set aside real-time performance, which is in some cases an implicit requirement. A second important limitation is the lack of clear computational models that could be supported by current big-data frameworks. In an effort to reduce this gap, this article contributes along several lines. First, it provides a set of improvements to a computational model called distributed stream processing in order to formalize it as a real-time infrastructure. Second, it proposes some extensions to Storm, one of the most popular stream processors. These extensions are designed to gain extra control over the resources used by the application in order to improve its predictability. Lastly, the article presents some empirical evidence on the performance that can be expected from this type of infrastructure.

11.
The data generation rate of a data stream is usually unpredictable, and some data elements of the stream cannot be processed in real time if the generation rate exceeds the capacity of the data stream processing algorithm. To handle this situation gracefully, a load shedding technique is recommended. This paper proposes a frequency-based load shedding technique over a data stream of tuples. In many data stream processing applications, such as mining frequent patterns, data elements with high frequency can be considered more significant than those with low frequency. Based on this observation, the proposed technique processes only the frequent elements of a data stream in real time, while the others are trimmed. Whether to shed load from the data stream is decided automatically by the data generation rate. Consequently, unnecessary load shedding operations are avoided in the proposed technique.
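The idea described in this abstract can be sketched as follows. This is an illustrative toy, not the paper's technique: the class name `FrequencyLoadShedder`, the per-tick capacity, the fixed `min_count` threshold, and the use of exact counts are all assumptions made for the example.

```python
from collections import Counter


class FrequencyLoadShedder:
    """Toy frequency-based load shedder (hypothetical; not the paper's algorithm)."""

    def __init__(self, capacity_per_tick, min_count):
        self.capacity_per_tick = capacity_per_tick  # assumed processing capacity per tick
        self.min_count = min_count                  # assumed frequency threshold
        self.counts = Counter()                     # exact counts; a real system would approximate

    def filter_tick(self, elements):
        """Return the elements of one tick that should be processed."""
        for element in elements:
            self.counts[element] += 1
        # Arrival rate within capacity: no shedding at all.
        if len(elements) <= self.capacity_per_tick:
            return list(elements)
        # Overloaded: keep only elements that have been frequent so far.
        return [e for e in elements if self.counts[e] >= self.min_count]
```

Shedding is triggered only when a tick delivers more elements than the assumed capacity, which mirrors the abstract's point that no load is shed unnecessarily. The surviving elements may still exceed the capacity in this simplified version.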

12.
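In recent years, data stream monitoring in distributed systems has been a very active research area. This paper studies how to achieve general and efficient distributed top-k monitoring, that is, continuously monitoring the k largest values across multiple distributed data streams according to a user-specified ranking function. In practice, the ranking function supplied by the user may be arbitrary, yet current distributed top-k monitoring techniques support only addition as the ranking function. The paper proposes GMR, a general distributed top-k monitoring algorithm that supports arbitrary continuous, strictly monotone aggregate functions. The communication cost of GMR is independent of k. The efficiency of GMR is verified on real-world and synthetic data. Experiments show that the network traffic of GMR is more than an order of magnitude lower than that of comparable methods.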

13.
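To address a series of problems in well-logging data processing systems, such as slow retrieval of interactive data, a dedicated distributed parallel model is proposed. The model uses distributed parallel techniques to speed up data retrieval and uses fixed-point synchronization to guarantee data consistency across multiple users. Performance and experimental analysis show that the model increases the capacity and processing speed of the system and provides services such as efficient maintenance of data consistency.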

14.
Clusters of mobile elements, such as vehicles and humans, are a common mobility pattern of interest for many applications. Their on-line detection from large position streams of mobile entities is a challenging task because it requires algorithms that are capable of continuously and efficiently processing the high volume of position updates in a timely manner. Currently, the majority of approaches for cluster detection operate in batch mode, where position updates are recorded during time periods of a certain length and then batch processed by an external routine, thus delaying the result of the cluster detection until the end of the time period. However, if the monitoring application requires results at a higher frequency than the one delivered by batch algorithms, then results might not reflect the current clustering state of the entities. To overcome this limitation, in this paper we propose DG2CEP, an algorithm that combines the well-known density-based clustering algorithm DBSCAN with the data stream processing paradigm Complex Event Processing (CEP) to achieve continuous, on-line detection of clusters. Our experiments with synthetic and real-world datasets indicate that DG2CEP is able to detect the formation and dispersion of clusters with small latency and higher similarity to DBSCAN's output than batch-based approaches.

15.
GridON is an application that converts high-resolution broadcast video into MPEG-2 format, thereby reducing the file size and resolution. The application uses the user controlled lightpaths (UCLP) software to create on-demand, end-to-end, high-bandwidth dedicated connections to access remote computers. The converted MPEG-2 files can be distributed much faster and further than the source files to these dispersed computers, for reassembly into the higher resolution format. This paper describes the demonstration that took place last September at the iGrid 2005 conference held in San Diego. As a proof of concept, we successfully demonstrated that a video transfer in a Grid network environment can be integrated with a user-controlled lightpath provisioning system.

16.
17.
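This paper discusses the cache mechanism used in a distributed virtual environment system under a hierarchical, multi-server/client data management model, and tests the data query and read capabilities of this system architecture. The test results show that the data management system delivers good data service performance.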

18.
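To meet the requirements of high-density data computation, a simple method for multi-chip interconnection of VLIW processor arrays is proposed. Using a distinctive microcode structure, it establishes a control model for configurable high-speed data channels that is suitable for building high-performance media processor arrays. The model effectively improves the flexibility needed for system expansion and provides a high-bandwidth memory interface and a high-performance bus control structure. It improves the continuity and flexibility of data access, avoids a large amount of unnecessary system interrupt and function-switching overhead at run time, and can significantly increase data processing bandwidth.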

19.
In a distributed stream processing system, streaming data are continuously disseminated from the sources to the distributed processing servers. To enhance the dissemination efficiency, these servers are typically organized into one or more dissemination trees. In this paper, we focus on the problem of constructing dissemination trees to minimize the average loss of fidelity of the system. We observe that existing heuristic-based approaches can only explore a limited solution space and hence may lead to sub-optimal solutions. In contrast, we propose an adaptive and cost-based approach. Our cost model takes into account both the processing cost and the communication cost. Furthermore, as a distributed stream processing system is vulnerable to inaccurate statistics, runtime fluctuations of data characteristics, server workloads, and network conditions, we have designed our scheme to be adaptive to these situations: an operational dissemination tree may be incrementally transformed into a more cost-effective one. Our adaptive strategy employs distributed decisions made by the distributed servers independently, based on localized statistics collected by each server at runtime. For a relatively static environment, we also propose two static tree construction algorithms relying on a priori system statistics. These static trees can also be used as initial trees in a dynamic environment. We apply our schemes to both single- and multi-object dissemination. Our extensive performance study shows that the adaptive mechanisms are effective in a dynamic context and the proposed static tree construction algorithms perform close to optimal in a static environment.

20.
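This paper introduces the current state of data stream technology and then discusses how adaptive querying has evolved in data management, with particular attention to its special characteristics in data stream management. Finally, on this basis, it proposes RealStream, a data stream management system that supports adaptive queries, and describes its adaptive query processing mechanism in detail.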
