
RDF Stream Data Query Based on Apache Flink
Citation: ZHENG Tao, LIU Mengchi, FENG Jiamei. RDF Stream Data Query Based on Apache Flink[J]. Computer and Modernization, 2020(11): 47-55.
Authors: ZHENG Tao, LIU Mengchi, FENG Jiamei
Affiliation (all authors): School of Computer Science, South China Normal University, Guangzhou, Guangdong 510631, China
Funding: National Natural Science Foundation of China; Guangzhou Key Laboratory of Big Data and Intelligent Education
Received: 2020-12-03

Abstract: Mature RDF Stream Processing (RSP) systems currently lack parallel processing capability because of their centralized design, so none of them can achieve both high throughput and low latency when querying large volumes of incoming RDF stream data. To improve query performance, this paper studies the RSP query process and the stream computing architecture of Flink, designs four logical operators (source, filter, multi-way partitioned join, and project), and proposes a Multi-Stream Join (MSJ) algorithm that generates a logical query plan in the form of a parallelizable directed acyclic graph (DAG). The logical operators and the logical query plan are then implemented on top of the big data stream processing platform Apache Flink. Experiments use the real-world dataset SRBench and the synthetic dataset LUBMs. Compared with the most mature systems, C-SPARQL and CQELS, throughput increases by up to 10 times on a single machine and by up to 28 times on a 5-machine cluster, while latency is kept at the millisecond level. The approach therefore raises throughput and lowers latency when processing large volumes of RDF stream data.
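The abstract names four logical operators (source, filter, multi-way partitioned join, project) but does not show how they fit together. The Python sketch below is a hypothetical, single-process illustration of that operator pipeline over a toy triple list; it is not the authors' MSJ algorithm or their Flink code. The function names, the `?var` convention for variables, the CRC32-based partitioning, the restriction to joins on a single shared variable, the absence of windows, and the `ex:` data are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def source(triples: Iterable[Triple]) -> Iterator[Triple]:
    """Source operator (hypothetical): emit triples one at a time; a list stands in for a stream."""
    yield from triples


def match(pattern: Triple, triple: Triple) -> Optional[Binding]:
    """Bind '?var' terms of a triple pattern to a triple; return None on mismatch."""
    binding: Binding = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable used twice in one pattern must bind to the same value.
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding


def filter_op(stream: Iterable[Triple], pattern: Triple) -> List[Binding]:
    """Filter operator (hypothetical): keep triples matching the pattern, as variable bindings."""
    out = []
    for triple in stream:
        binding = match(pattern, triple)
        if binding is not None:
            out.append(binding)
    return out


def compatible(a: Binding, b: Binding) -> bool:
    """Two bindings can be merged if they agree on every variable they share."""
    return all(b[k] == v for k, v in a.items() if k in b)


def partitioned_join(inputs: Sequence[List[Binding]], key: str, partitions: int = 4) -> List[Binding]:
    """Multi-way join of several binding lists on one shared variable `key` (assumed simplification).

    Each binding is routed to a partition by a deterministic hash of its key value,
    so every partition holds all inputs for its keys and could be joined independently.
    """
    buckets = [[defaultdict(list) for _ in inputs] for _ in range(partitions)]
    for i, bindings in enumerate(inputs):
        for b in bindings:
            part = zlib.crc32(b[key].encode("utf-8")) % partitions
            buckets[part][i][b[key]].append(b)

    results: List[Binding] = []
    for part in buckets:  # each partition could be processed by a separate worker
        for value, first in part[0].items():
            partial = first
            for other in part[1:]:
                partial = [{**a, **b} for a in partial
                           for b in other.get(value, []) if compatible(a, b)]
            results.extend(partial)
    return results


def project(bindings: Iterable[Binding], variables: Sequence[str]) -> List[Tuple[str, ...]]:
    """Project operator (hypothetical): keep only the requested variables, in order."""
    return [tuple(b[v] for v in variables) for b in bindings]


# Toy query: SELECT ?s ?c WHERE { ?s ex:memberOf ?d . ?s ex:takesCourse ?c }
triples = list(source([
    ("ex:alice", "ex:memberOf", "ex:dept1"),
    ("ex:bob", "ex:memberOf", "ex:dept2"),
    ("ex:alice", "ex:takesCourse", "ex:db"),
]))
members = filter_op(triples, ("?s", "ex:memberOf", "?d"))
courses = filter_op(triples, ("?s", "ex:takesCourse", "?c"))
result = project(partitioned_join([members, courses], key="?s"), ["?s", "?c"])
print(result)  # [('ex:alice', 'ex:db')]
```

The per-partition loop is the part that could run in parallel. In a Flink job, routing bindings by the join variable would roughly correspond to keying each stream on that variable (for example with `keyBy`), but the abstract does not describe how the paper's operators are actually realized on Flink.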
Keywords: RDF stream; parallel processing; logical operators; multi-stream join; Apache Flink
This article is indexed in Wanfang Data and other databases.