
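Survey of Key Techniques of HTAP Databases (HTAP数据库关键技术综述)
Cite this article: ZHANG Chao, LI Guo-Liang, FENG Jian-Hua, ZHANG Jin-Tao. Survey of Key Techniques of HTAP Databases [J]. Journal of Software (软件学报), 2023, 34(2): 761-785.
Authors: ZHANG Chao (张超)  LI Guo-Liang (李国良)  FENG Jian-Hua (冯建华)  ZHANG Jin-Tao (张金涛)
Affiliation: Department of Computer Science and Technology, Tsinghua University, Beijing 100084
Funding: National Natural Science Foundation of China (61925205, 62072261, 62232009)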
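Abstract (translated from the Chinese): Hybrid transactional analytical processing (HTAP) is a technique that handles transactional requests and analytical queries simultaneously on a one-stop architecture. HTAP not only eliminates the extract, transform, and load process from relational transactional databases to data warehouses, but also supports real-time analysis of the latest transactional data. However, to process OLTP and OLAP at the same time, an HTAP system must trade off system performance against the freshness of the data being analyzed, mainly because highly concurrent, low-latency OLTP and bandwidth-intensive, high-latency OLAP have different access patterns and interfere with each other. Today, mainstream HTAP databases mainly support hybrid transactional and analytical processing by keeping row and column stores side by side; because these databases target different business scenarios, their storage architectures and processing techniques differ. First, the survey comprehensively reviews HTAP databases, summarizes their main application scenarios, strengths, and weaknesses, and classifies and compares them by storage architecture. Existing surveys focus on HTAP databases built on a single storage format (row or column) and on Spark-based loosely coupled HTAP systems, whereas this survey focuses on real-time HTAP databases in which row and column stores coexist. In particular, it distills the key techniques of mainstream HTAP databases into four parts: data organization, data synchronization, query optimization, and resource scheduling. It also summarizes and analyzes HTAP database...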

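Keywords (translated from the Chinese): HTAP databases  row-column coexistence  data organization  query optimization  data synchronization  resource scheduling
Received: 2022/2/18
Revised: 2022/5/8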

Survey of Key Techniques of HTAP Databases
ZHANG Chao, LI Guo-Liang, FENG Jian-Hua, ZHANG Jin-Tao. Survey of Key Techniques of HTAP Databases [J]. Journal of Software, 2023, 34(2): 761-785.
Authors: ZHANG Chao  LI Guo-Liang  FENG Jian-Hua  ZHANG Jin-Tao
Affiliation:Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Abstract: Hybrid transactional analytical processing (HTAP) relies on a single system to process mixed workloads of transactions and analytical queries simultaneously. It not only eliminates the extract-transform-load (ETL) process, but also enables real-time analysis of the latest transactional data. Nevertheless, to process mixed OLTP and OLAP workloads, such systems must balance the trade-off between workload isolation and data freshness. This is mainly because highly concurrent, short-lived OLTP workloads and bandwidth-intensive, long-running OLAP workloads have different access patterns and interfere with each other. Most existing HTAP databases combine a row store and a column store to support HTAP. Because different HTAP applications have different requirements, HTAP databases adopt disparate storage strategies and processing techniques. This study comprehensively surveys HTAP databases. It introduces a taxonomy of state-of-the-art HTAP databases according to their storage strategies and architectures, and then summarizes and compares their pros and cons. Unlike previous surveys, which focus on single-format (row-only or column-only) HTAP databases and Spark-based loosely coupled HTAP systems, this study focuses on real-time HTAP databases with a row-column dual store. It also examines their key techniques in depth: data organization, data synchronization, query optimization, and resource scheduling. Existing HTAP benchmarks are introduced as well. Finally, research challenges and open problems for HTAP are discussed.
Keywords:HTAP databases  row and column  data organization  query optimization  data synchronization  resource scheduling