1.
To make stream processing efficient, a query is usually evaluated over bounded portions of (potentially) unbounded streams, i.e., processing windows “slide” over the streams. Incremental evaluation strategies avoid inefficiently re-evaluating parts of a stream that have already been evaluated for a query: the query results are derived incrementally from the result set of the preceding processing state, without re-evaluating all input buffers. This method is highly efficient, but it requires maintaining processing state. That is not trivial and may cancel out the performance advantages of incremental evaluation. For RDF streams the problem is further aggravated by two factors. The structure of RDF graphs evolves over time in hard-to-predict ways, and implementations often take sub-optimal approaches, e.g., using relational technologies to store data and processing state, which incur significant performance penalties for graph-based query patterns. To address these performance problems, this paper proposes a set of novel operator-aware data structures coupled with incremental evaluation algorithms that outperform their counterparts in relational stream processing systems. This claim is demonstrated through extensive experiments on both simulated and real datasets.
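The following minimal sketch shows the general idea of incremental window evaluation. It is not the paper's operator-aware data structures. The window state (here, a per-predicate count over a time-based sliding window) is updated only by deltas: +1 for each arriving triple and -1 for each expiring one. The whole window is never re-scanned. The class name, the window semantics (now - width, now], and the in-order timestamp assumption are all illustrative choices.

```python
from collections import Counter, deque
from typing import Deque, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class IncrementalPredicateCount:
    """Hypothetical sliding-window aggregate maintained incrementally.

    Keeps a count of triples per predicate for the window (now - width, now].
    """

    def __init__(self, width: int) -> None:
        self.width = width  # window width in time units
        self.buffer: Deque[Tuple[int, Triple]] = deque()
        self.counts: Counter = Counter()

    def push(self, ts: int, triple: Triple) -> None:
        # Assumption: timestamps arrive in non-decreasing order.
        self.buffer.append((ts, triple))
        self.counts[triple[1]] += 1  # delta for the insertion
        self._expire(ts)

    def _expire(self, now: int) -> None:
        # Remove triples that slid out of the window; apply -1 deltas.
        while self.buffer and self.buffer[0][0] <= now - self.width:
            _, old = self.buffer.popleft()
            self.counts[old[1]] -= 1
            if self.counts[old[1]] == 0:
                del self.counts[old[1]]
```

For example, with `width=10`, pushing triples at times 0, 5 and 12 expires only the triple from time 0. The cost per update is proportional to the number of expiring triples, not to the window size.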
2.
Enterprise Communication Systems are designed to maximise the efficiency of communication and collaboration within the enterprise. As users become mobile, the Internet of Things (IoT) can play a crucial role in this process, but it is far from seamlessly integrated into modern online communications. In this paper, we present a semantic infrastructure for gathering, integrating and reasoning over heterogeneous, distributed and continuously changing data streams by means of semantic technologies and rule-based inference. Our solution exploits semantics to go beyond today’s ad-hoc integration and processing of heterogeneous static and streaming data sources. It provides flexible and efficient processing techniques that transform low-level data into high-level abstractions and actionable knowledge, bridging the gap between IoT and online Enterprise Communication Systems. We document the technologies used for acquisition and semantic enrichment of sensor data, continuous semantic query processing for integration and filtering, and stream reasoning for decision support.
Our main contributions are the following:
(i) we define and deploy a semantic processing pipeline for IoT-enabled Communication Systems. It builds on existing systems for semantic data acquisition, continuous query processing and stream reasoning, and we detail the implementation of each component of our framework;
(ii) we present a rich semantic information model for representing and linking IoT data, social data and personal data in the Enterprise Communication scenario, by reusing and extending existing standard semantic models;
(iii) we define and develop an expressive stream reasoning component as part of our framework, based on continuous query processing and non-monotonic reasoning for semantic streams;
(iv) we conduct experiments to comparatively evaluate the performance of our OpenIoT-based data acquisition and semantic annotation layer, and of our expressive reasoning layer, in the Enterprise Communication scenario.
3.
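How to effectively manage and exploit ever-growing volumes of RDF data is one of the challenges facing Web data management today. A common strategy for storing and using RDF data is to cluster large-scale RDF datasets so as to obtain an effective partitioning. Existing RDF clustering methods ignore the schema (pattern) features inherent in RDF triples. To address this, and based on an in-depth analysis of the form that RDF clustering results take, three different types of clustering patterns are defined and a pattern-based clustering method is proposed. The RDF dataset is re-described, and from this description clustering patterns suited to its characteristics are generated automatically. The data clustering task is then performed on this basis. Experimental results on different test sets verify the correctness and effectiveness of the proposed method.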
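The abstract does not spell out its three pattern types. As an illustration only, here is one simple, well-known way to exploit the schema features of triples: group subjects by the exact set of predicates they use (sometimes called characteristic sets). This is a swapped-in technique, not the paper's method. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def cluster_by_predicate_set(
    triples: Iterable[Triple],
) -> Dict[FrozenSet[str], List[str]]:
    """Hypothetical pattern-based clustering: one cluster per predicate set."""
    # 1. Collect the predicates each subject uses.
    preds: Dict[str, Set[str]] = defaultdict(set)
    for s, p, _ in triples:
        preds[s].add(p)
    # 2. Subjects sharing exactly the same predicate set form one cluster.
    clusters: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for s, ps in preds.items():
        clusters[frozenset(ps)].append(s)
    return dict(clusters)
```

Subjects `a` and `b` that both use `name` and `age` land in the same cluster, while a subject using only `title` forms its own. A real system would typically also merge near-identical predicate sets to avoid many tiny clusters.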
4.
5.
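Traditional reasoning over semantic data streams uses forward or backward chaining to produce deterministic answers. However, it is inefficient for complex transitive rule reasoning and cannot meet the timeliness requirements of real-time stream processing scenarios. This paper therefore proposes a knowledge representation method based on a joint embedding model and applies it to semantic data stream processing. Rules and fact triples are jointly embedded and used to train a deep learning model. In the reasoning phase, reasoning templates are built from the rules involved in the query. The deep learning model then predicts and classifies the triples generated by these templates, and the results are output as the answers to the query and reasoning task. Experiments show that, for complex rule reasoning, real-time semantic stream reasoning based on knowledge representation learning effectively reduces latency while maintaining good reasoning accuracy and hit rate.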
7.
In this paper we thoroughly cover the issue of blank nodes, which RDF defines as ‘existential variables’. We first introduce the theoretical precedent for existential blank nodes from first-order logic and from incomplete information in database theory. We then cover the different (and sometimes incompatible) treatment of blank nodes across the W3C stack of RDF-related standards. We present an empirical survey of the blank nodes in a large sample of RDF data published on the Web (the BTC-2012 dataset). We find that 25.7% of unique RDF terms are blank nodes, and that 44.9% of documents and 66.2% of domains use at least one blank node. Aside from one Linked Data domain whose RDF data contains many “blank node cycles”, the vast majority of blank nodes form tree structures over which simple entailment is efficient to compute. For the RDF merge of the full data, we show that 6.1% of blank nodes are redundant under simple entailment. The vast majority of non-lean cases are isomorphisms. These arise from multiple blank nodes with no discriminating information within an RDF document, or from documents duplicated at multiple Web locations. Simple entailment is NP-complete and leanness checking is coNP-complete. Nevertheless, in computing the latter result we demonstrate that, in practice, real-world RDF graphs are sufficiently “rich” in ground information for non-naive algorithms to avoid the problematic cases.
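To make the notion of leanness concrete, here is a deliberately naive sketch, not the paper's algorithm. A graph G is non-lean if some mapping μ of its blank nodes to terms yields μ(G) as a proper subset of G; the redundant triples can then be dropped without changing what G entails. Brute-forcing all mappings is exponential in the number of blank nodes, which is exactly why the paper relies on non-naive algorithms. The `_:label` convention for blank nodes and the function names are assumptions for illustration.

```python
from itertools import product
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def is_blank(term: str) -> bool:
    # Assumed convention: blank nodes are written "_:label".
    return term.startswith("_:")


def apply_map(g: Set[Triple], mu: Dict[str, str]) -> Set[Triple]:
    # Replace each blank node by its image; other terms stay fixed.
    return {(mu.get(s, s), mu.get(p, p), mu.get(o, o)) for s, p, o in g}


def is_lean(g: Set[Triple]) -> bool:
    """Naive leanness check: exponential in the number of blank nodes."""
    blanks = sorted({t for tr in g for t in tr if is_blank(t)})
    terms = sorted({t for tr in g for t in tr})
    # Any mapping with mu(G) a subset of G must map into G's own terms.
    for image in product(terms, repeat=len(blanks)):
        mu = dict(zip(blanks, image))
        if apply_map(g, mu) < g:  # proper subset => G is non-lean
            return False
    return True
```

For instance, {ex:a ex:p _:b1 . ex:a ex:p ex:c} is non-lean, because mapping `_:b1` to `ex:c` collapses it onto the ground triple. A blank node that also carries its own distinguishing triple, such as `_:b1 ex:q ex:c`, keeps the graph lean.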
8.
Advanced Engineering Informatics, 2014, 28(4): 370-380
Building-related data tends to be generated, used and retained in a domain-specific manner. The lack of interoperability between data domains in the architecture, engineering and construction (AEC) industry inhibits the cross-domain use of data at enterprise level. Semantic web technologies offer a possible solution to some of these interoperability issues. Traditional methods of information capture fail to take into account the wealth of soft information available throughout a building. Several sources of information are not included in performance assessment frameworks, including social media, occupant communication, mobile communication devices, occupancy patterns, human resource allocations and financial information. The paper suggests that improved data interoperability can help integrate these untapped silos of information into existing structured performance measurement frameworks, leading to greater awareness of stakeholder concerns and of building performance. An initial study is presented of how building-related data can be published following semantic web principles and integrated with other ‘soft-data’ sources in a cross-domain manner. The paper goes on to illustrate how data sources from outside the building operation domain can supplement existing sources. Future work will include the creation of a semantic-web-based performance framework platform for building performance optimisation.
9.
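This paper compares and analyzes the main stream data models and discusses Synopsis, a stream data processing model based on synopsis (summary) structures. Building on the Synopsis model, it introduces mobile agents and proposes MADSPM, a mobile-agent-based distributed multi-stream data processing model. Finally, it describes and analyzes several issues that require attention when mining association rules over stream data with the MADSPM model.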
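The abstract does not say which synopsis structures the Synopsis model or MADSPM use. As one classic example of the general idea, a fixed-size summary maintained over a stream of unknown length, here is reservoir sampling (Algorithm R). It is included only to illustrate what a synopsis is, not as the paper's structure.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Algorithm R: uniform random sample of k items in O(k) memory."""
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep item i with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample
```

Each element seen so far has the same probability of being in the sample, so downstream mining (for example, estimating itemset frequencies) can run on the reservoir instead of the full stream. The trade-off is that results are approximate.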
10.
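With the rapid development of computer and network technology and the growing variety of data acquisition methods, more and more fields need to process massive, high-velocity data in real time. Such requirements often exceed the capabilities of traditional data processing technologies, and the distributed stream processing paradigm emerged in response. This paper first reviews the background and technical evolution of distributed stream processing. It then compares distributed stream processing with other related big data processing technologies to delimit its scope. Next, it analyzes in depth the main issues that distributed stream processing must address: data model, system model, storage management, semantic guarantees, load control and fault tolerance. For each, it points out the strengths and weaknesses of existing solutions. Several representative distributed stream processing systems, including S4, Storm and Spark Streaming, are then introduced and systematically compared. Finally, the paper presents several typical applications of distributed stream processing in areas such as social media processing and discusses further research directions in the field.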
11.
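When information systems mine and manage knowledge, they must handle data in many forms, one of which is stream data. Stream data is large in volume and arrives at high speed, and the knowledge it contains is highly time-sensitive. Developing stream computing technologies that support real-time processing applications is therefore very important for knowledge management in information systems. Stream computing systems can be traced back to the 1990s and have developed considerably since then. However, today's diverse knowledge management requirements and new-generation hardware architectures bring new challenges and opportunities to stream computing systems. These have given rise to a series of research efforts in the field. This paper first introduces the basic requirements and development history of stream computing systems. It then analyzes the related technologies at four levels: programming interfaces, execution plans, resource scheduling and fault tolerance. Finally, it looks ahead to possible future research directions and trends in stream computing technology.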
12.
H. L. A. van der Spek, S. Groot, E. M. Bakker, H. A. G. Wijshoff. International Journal of Parallel Programming, 2008, 36(6): 592-623
Irregular access patterns are a major problem for today’s optimizing compilers. This paper presents a novel approach that allows transformations designed for regular loop structures to be applied to linked list data structures. This is achieved by linearizing access to a linked list, after which further data restructuring can be performed. Two subsequent optimization paths are considered, annihilation and sublimation, driven by the regular and irregular access patterns that occur in the applications. The resulting intermediate code is amenable to traditional compiler optimizations targeting regular loops. Sublimation involves a run-time step that takes the actual access pattern into account and thus generates optimized code specific to the data instance. Both approaches are applied to a sparse matrix multiplication algorithm and to an iterative solver, the preconditioned conjugate gradient method. The transformed code is evaluated with two major compilers for the x86 platform: GCC and the Intel C compiler.
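The core data-restructuring step, linearizing a linked list so that later loops become regular and index-based, can be illustrated with the minimal sketch below. It does not reproduce the paper's annihilation or sublimation passes. It is written in Python only to show the shape of the transformation: the actual benefit comes from a compiler (e.g. GCC or the Intel C compiler) optimizing the resulting regular loop in C. `Node`, `linearize` and `scaled_sum` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    value: float
    next: Optional["Node"] = None


def linearize(head: Optional[Node]) -> List[float]:
    # Irregular part: one pointer-chasing walk copies the payload
    # into contiguous storage.
    out: List[float] = []
    node = head
    while node is not None:
        out.append(node.value)
        node = node.next
    return out


def scaled_sum(values: List[float], alpha: float) -> float:
    # Regular part: an index-based loop of known trip count, the form
    # that loop optimizations such as unrolling or vectorization expect.
    total = 0.0
    for i in range(len(values)):
        total += alpha * values[i]
    return total
```

The one-off linearization cost pays off only when the linearized data is traversed repeatedly, as in the iterations of an iterative solver such as preconditioned conjugate gradient.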
13.
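High-resolution remote sensing satellites can acquire ever more data, and ground stations can receive it ever faster. As a result, the processing load on existing quick-look processing systems for remote sensing satellite data has grown, and their real-time requirements are increasingly difficult to meet. To address these problems, this paper proposes a new design method for a quick-look processing system for remote sensing satellite data based on the idea of stream computing. The characteristics of the data flow in quick-look processing are analyzed first. On that basis, the Storm framework is used to parallelize and optimize the existing system, and a task topology for remote sensing data stream processing is designed. The message-queue middleware Kafka is used to improve data exchange and data caching between processing units. Experiments show that the system performs well in terms of data throughput and reliability.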
14.
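The ability to perform complex reasoning over semantic data streams generated by sensors has recently become an important research area in the Semantic Web community. Most current RDF stream processing systems are built on SPARQL (the W3C standard RDF query language), but these engines are limited in capturing complex user requirements and handling complex reasoning tasks. To address this problem, this paper combines and extends Answer Set Programming (ASP) for the continuous processing of RDF streams. To verify the effectiveness of the approach, a smart-home ontology is first taken as the experimental subject. The features shared by sensor devices and the complex events they produce are analyzed to build an ontology base. Instance objects are then generated from the ontology base, and RDF data streams are produced through middleware. Next, ASP is extended to make full use of its expressiveness and reasoning power so as to reduce reasoning time. The paper also designs, among other things, a window-partitioning strategy for RDF streams, and the static knowledge base is loaded selectively according to user requests. Finally, experimental comparisons with Sparkwave and Laser demonstrate the method's performance advantages in latency and memory.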
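The abstract mentions a window-partitioning strategy for RDF streams without specifying it. The hypothetical sketch below shows how a stream could be cut into windows and handed to an ASP solver such as clingo. It assumes simple time-based tumbling windows and a `triple/3` fact encoding with quoted string constants; both are assumptions, not the paper's design.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, str, str, str]  # (timestamp, subject, predicate, object)


def tumbling_windows(events: Iterable[Event], width: int) -> Dict[int, List[Event]]:
    # Assumed strategy: window k holds events with k*width <= ts < (k+1)*width.
    windows: Dict[int, List[Event]] = defaultdict(list)
    for ev in events:
        windows[ev[0] // width].append(ev)
    return dict(windows)


def to_asp_facts(window: List[Event]) -> str:
    # Assumed encoding: one ground fact triple("s","p","o"). per triple,
    # escaping backslashes and double quotes inside string constants.
    def q(term: str) -> str:
        return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'

    return "\n".join(f"triple({q(s)},{q(p)},{q(o)})." for _, s, p, o in window)
```

Each window's facts would then be combined with the rule program, plus any selectively loaded static background knowledge, and solved independently. Tumbling windows keep every event in exactly one window. Sliding windows would instead trade duplicated work for smoother results.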
15.
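In stream data processing, data skew and other factors easily cause load imbalance among computing nodes, which reduces the system's processing capacity. Traditional load-balancing techniques such as operator placement, operator migration and load shedding have a relatively high performance cost. For that reason, they have not been widely adopted in stream processing systems. This paper proposes a new load-balancing method tailored to the characteristics of stream processing systems. The data of each computing unit is divided into several partitions, and these partitions can be dynamically allocated to and migrated between computing units. The partitions held by each unit are adjusted dynamically with little disruption to the running system. This balances the input streams and utilization of the computing units and so achieves load balance. On this basis, a load-balancing algorithm and an online data-migration technique for stream processing systems are designed and implemented. Experimental results show that the method significantly reduces the average data processing latency and increases system throughput.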
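The abstract does not give the balancing algorithm itself. The following is a hypothetical greedy sketch of partition-level rebalancing. It repeatedly moves one partition from the most loaded unit to the least loaded one, choosing the partition whose load is closest to half the gap between them, as long as that move narrows the gap. Per-partition load figures are assumed to be measured elsewhere (for example, as input rates). The online migration of partition state is not modeled.

```python
from typing import Dict, List, Tuple


def rebalance(
    assign: Dict[str, List[str]],
    load: Dict[str, float],
    max_moves: int = 10,
) -> List[Tuple[str, str, str]]:
    """Greedy partition migration; mutates `assign`, returns moves made."""
    moves: List[Tuple[str, str, str]] = []
    for _ in range(max_moves):
        unit_load = {u: sum(load[p] for p in ps) for u, ps in assign.items()}
        hot = max(unit_load, key=unit_load.get)
        cold = min(unit_load, key=unit_load.get)
        gap = unit_load[hot] - unit_load[cold]
        # Moving a partition with 0 < load < gap strictly narrows hot/cold gap.
        candidates = [p for p in assign[hot] if 0 < load[p] < gap]
        if not candidates:
            break
        # Prefer the move that brings the two units closest to equal load.
        p = min(candidates, key=lambda q: abs(gap / 2 - load[q]))
        assign[hot].remove(p)
        assign[cold].append(p)
        moves.append((p, hot, cold))
    return moves
```

With units `u1` holding partitions of load 5, 3 and 2 and `u2` holding one partition of load 1, a single move of the load-5 partition yields loads 5 and 6, after which no further move helps. Migrating whole partitions rather than operators keeps each move small, which matches the abstract's goal of disturbing the running system as little as possible.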
16.
RDF is the data interchange layer for the Semantic Web. In order to manage the increasing amount of RDF data, an RDF repository should provide not only the necessary scalability and efficiency, but also sufficient inference capabilities. Though existing RDF repositories have made progress towards these goals, there is still ample space for improving the overall performance. In this paper, we propose a native RDF repository, SystemⅡ, to pursue a better tradeoff among system scalability, query efficiency, and infer...
17.
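This paper introduces a data acquisition and processing system built on the Borland C++ Builder and LabWindows/CVI platforms. The system is used to acquire, analyze and process fault signals from rolling bearings. The data acquisition program is written in Borland C++ Builder, and the acquired data are passed to LabWindows/CVI, in which the data processing program is written. Data transfer between the two uses dynamic-link library (DLL) technology. A worked example demonstrates the advantages of combining the two languages.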
18.
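An Analysis of Key Technologies for Distributed Data Stream Processing in Big Data Environments (total citations: 1; self-citations: 0; citations by others: 1)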
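Data stream processing in big data environments has strict real-time requirements, and computation must be continuous and highly reliable. A distributed data stream processing system (DDSPS) can meet these requirements. In addition to the scalability and fault-tolerance advantages of distributed systems, it offers high real-time processing capability. This paper describes in detail the four subsystems that make up a big-data-oriented distributed data stream processing system and their key technologies, and it discusses and compares the alternative technical solutions for each subsystem. It also presents a case study of a data stream processing architecture for detecting distributed denial-of-service (DDoS) attacks. The work can serve as a technical reference for theoretical research on, and application development of, data stream processing in big data environments.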
19.
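Real-Time Processing of Space Science Satellite Data Based on Stream Computing (total citations: 1; self-citations: 0; citations by others: 1)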
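Real-time processing requirements for space science satellite observation data are becoming increasingly demanding. To meet them, this paper proposes a real-time processing method for space science satellite data based on a stream computing framework. First, the data streams are abstracted and analyzed according to the processing characteristics of space science satellite data. Next, the input and output data structures of each processing unit are redefined. Finally, a parallel data stream processing structure is designed on the Storm stream computing framework to meet the needs of large-scale parallel data processing and distributed computing. A space science satellite data processing system was developed with this method, and testing and analysis produced two main results. Under the same conditions, data processing time was half that of the original system. In addition, the data locality strategy achieved higher throughput than the round-robin strategy, improving tuple throughput by 29% on average. These results show that adopting a stream computing framework can substantially reduce data processing latency and improve the real-time performance of space science satellite data processing systems.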
20.