The Stratosphere platform for big data analytics
Authors:Alexander Alexandrov  Rico Bergmann  Stephan Ewen  Johann-Christoph Freytag  Fabian Hueske  Arvid Heise  Odej Kao  Marcus Leich  Ulf Leser  Volker Markl  Felix Naumann  Mathias Peters  Astrid Rheinländer  Matthias J. Sax  Sebastian Schelter  Mareike Höger  Kostas Tzoumas  Daniel Warneke
Affiliation:1. Technische Universität Berlin, Berlin, Germany
2. Humboldt-Universität zu Berlin, Berlin, Germany
3. Hasso Plattner Institute, Potsdam, Germany
4. International Computer Science Institute, Berkeley, CA, USA
Abstract:We present Stratosphere, an open-source software stack for parallel data analysis. Stratosphere brings together a unique set of features that allow the expressive, easy, and efficient programming of analytical applications at very large scale. Stratosphere’s features include “in situ” data processing, a declarative query language, treatment of user-defined functions as first-class citizens, automatic program parallelization and optimization, support for iterative programs, and a scalable and efficient execution engine. Stratosphere covers a variety of “Big Data” use cases, such as data warehousing, information extraction and integration, data cleansing, graph analysis, and statistical analysis applications. In this paper, we present the overall system architecture design decisions, introduce Stratosphere through example queries, and then dive into the internal workings of the system’s components that relate to extensibility, programming model, optimization, and query execution. We experimentally compare Stratosphere against popular open-source alternatives, and we conclude with a research outlook for the next years.
This article is indexed in SpringerLink and other databases.