System Test Framework of Stream Data for Stock Trading Analysis Scenario

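Cite this article:	SHI Ling-Yun, ZHENG Ying-Ying, TAN Li, XU Li-Jie, WANG Wei, WEI Jun. System Test Framework of Stream Data for Stock Trading Analysis Scenario[J]. Computer Systems & Applications, 2020, 29(4): 76-83.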

Authors:	SHI Ling-Yun, ZHENG Ying-Ying, TAN Li, XU Li-Jie, WANG Wei, WEI Jun

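Affiliations:	School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048; Institute of Software, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049; Institute of Software, Chinese Academy of Sciences, Beijing 100190; State Key Laboratory of Computer Science, Beijing 100190; Institute of Software, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049; State Key Laboratory of Computer Science, Beijing 100190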

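Funding:	Beijing Natural Science Foundation (4172013); Beijing Natural Science Foundation-Haidian Original Innovation Joint Fund (L182007); National Natural Science Foundation of China (61802377, 61702020) and its supporting project (PXM2018_014213_000033); National Key Research and Development Program of China (2016YFD0401104)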

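Abstract (translated from the Chinese):	A distributed cluster environment makes real-time data computation more complex, and the correctness of streaming big data processing systems is hard to guarantee. Existing big data benchmarking frameworks can measure the performance of such systems, but they commonly suffer from overly simple application scenarios and insufficient evaluation metrics. To address this challenge, this paper builds a streaming big data benchmarking framework for the stock trading scenario. The framework generates high-frequency stock trading data and tests system performance under high input rates in terms of latency, throughput, GC time, and CPU resource usage. Horizontal tests are also used to verify the scalability of the streaming system. Apache Spark Streaming serves as the system under test. The experimental results show that performance degrades under high input rates, with higher latency and longer GC time, and that the cause is the increase in the system's input rate and degree of parallelism.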
Keywords:	streaming big data processing system; performance; benchmark; Apache Spark Streaming
Received:	2019/9/4
Revised:	2019/9/23
System Test Framework of Stream Data for Stock Trading Analysis Scenario

SHI Ling-Yun, ZHENG Ying-Ying, TAN Li, XU Li-Jie, WANG Wei and WEI Jun. System Test Framework of Stream Data for Stock Trading Analysis Scenario[J]. Computer Systems & Applications, 2020, 29(4): 76-83.

Authors:	SHI Ling-Yun, ZHENG Ying-Ying, TAN Li, XU Li-Jie, WANG Wei and WEI Jun

Affiliation:	SHI Ling-Yun: School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China. ZHENG Ying-Ying: Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China. TAN Li: School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China. XU Li-Jie: Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; State Key Laboratory of Computer Science, Beijing 100190, China. WANG Wei: Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; State Key Laboratory of Computer Science, Beijing 100190, China. WEI Jun: Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; State Key Laboratory of Computer Science, Beijing 100190, China.

Abstract:	A distributed cluster environment makes real-time data computation more complex, and the correctness of streaming big data processing systems is difficult to guarantee. Existing big data benchmarking frameworks can test the performance of such systems, but they have shortcomings such as overly simple application scenarios and inadequate evaluation metrics. To address this challenge, this study constructs a streaming big data benchmarking framework for stock trading scenarios. It generates high-frequency stock trading data through a flow-based data generator and tests system performance in high-rate scenarios in terms of latency, throughput, GC time, CPU resources, and so on. The scalability of the streaming system is also verified by horizontal tests. Apache Spark Streaming is used as the system under test. The experimental results show that performance degrades in high-rate scenarios, with increased latency and GC time, because of the increase in the system's input rate and parallelism.

Keywords:	stream data processing system; performance; benchmark; Apache Spark Streaming
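
The abstract describes generating high-frequency trade data at a controlled input rate and reporting latency and throughput for the system under test. A minimal sketch of that idea follows. It is not the authors' generator: the `Trade` fields, the function names, the ticker symbols, and the price and volume ranges are all assumptions made for illustration, and GC time and CPU usage are not covered here.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class Trade:
    """One synthetic trade record (field names are hypothetical)."""
    symbol: str
    price: float
    volume: int
    event_time: float  # seconds since the start of the run


def generate_trades(rate_per_sec, duration_sec, symbols, seed=0):
    """Yield trades spaced evenly so the stream has a fixed input rate."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    interval = 1.0 / rate_per_sec
    total = int(rate_per_sec * duration_sec)
    for i in range(total):
        yield Trade(
            symbol=rng.choice(symbols),
            price=round(rng.uniform(10.0, 100.0), 2),  # assumed price range
            volume=rng.randint(1, 10) * 100,           # assumed lots of 100
            event_time=i * interval,
        )


def summarize(trades, processed_times):
    """Compute throughput (records/s) and latency statistics (seconds).

    processed_times[i] is when the system under test finished trades[i].
    """
    latencies = [done - t.event_time for t, done in zip(trades, processed_times)]
    span = max(processed_times) - min(t.event_time for t in trades)
    return {
        "throughput": len(trades) / span,
        "mean_latency": statistics.mean(latencies),
        "max_latency": max(latencies),
    }


if __name__ == "__main__":
    trades = list(generate_trades(1000, 2, ["AAA", "BBB"]))
    # Stand-in for a real system: every record finishes 0.5 s after its event.
    processed = [t.event_time + 0.5 for t in trades]
    print(summarize(trades, processed))
```

In an actual benchmark the completion times would come from the system under test, for example from timestamps written by its output sink, and raising `rate_per_sec` across runs would show how latency responds to a higher input rate.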
This article is indexed in Wanfang Data and other databases.