Scaling Stream Processing Out and Up
Fast analysis of data is becoming increasingly important in many domains. To process data incrementally and in real time, many applications rely on stream processing systems. At the same time, new data sources are becoming available and affordable, which means scalable solutions are required. Current stream processing systems are therefore built to scale out to tens or hundreds of nodes. However, modern hardware architectures also provide massive scale-up potential. In this talk, we will give an overview of big data stream processing and contrast scale-out with scale-up approaches. We will give details on the use of big data streaming systems, using Apache Flink as an example. Apache Flink is an open source system for expressive, declarative, fast, and efficient data analysis on both batch and streaming data. Flink combines the scalability and programming flexibility of distributed MapReduce-like platforms with the efficiency, out-of-core execution, and query optimization capabilities found in parallel databases. Furthermore, we will introduce concepts of stream processing on modern hardware, which enable high-performance stream processing on a few powerful machines.
Tilmann Rabl is a visiting professor at the Database Systems and Information Management (DIMA) group. At DIMA he is research director and technical coordinator of the Berlin Big Data Center (BBDC). Tilmann received his PhD at the University of Passau in 2011. He spent 4 years at the University of Toronto as a postdoc in the Middleware Systems Research Group (MSRG). Tilmann has published more than 50 papers in international conferences and journals and has given numerous invited presentations. In his PhD thesis, Tilmann invented the Parallel Data Generation Framework (PDGF), for which he received the Transaction Processing Performance Council’s (TPC) Technical Contribution Award. In Toronto, he received a MITACS Award in 2013 and 2014 and an IBM CAS postdoctoral fellowship in 2013 and 2014. He is a professional affiliate of the TPC and co-founder and chair of the SPEC Research working group on big data. Tilmann is a member of the steering committee of the Workshop on Big Data Benchmarking (WBDB) series and a member of the board of directors of the BigData Top100 List. Tilmann is also CEO and co-founder of the startup bankmark, for which he acquired an EXIST award. bankmark has received the IKT Innovativ Award 2014 and the Weconomy Award 2015, among others.
Sebastian Breß received his PhD (Dr.-Ing.) from the University of Magdeburg, Germany, in 2015, under the supervision of Gunter Saake (University of Magdeburg) and Jens Teubner (TU Dortmund). He is the initiator and system architect of the open source database system CoGaDB and the Hawk Code Generator. Currently, Sebastian is a Senior Researcher at the German Research Center for Artificial Intelligence (DFKI) and a postdoc at Technische Universität Berlin, working with Prof. Dr. Volker Markl and Prof. Dr. Tilmann Rabl. Sebastian's research interests include data management on modern hardware, query compilation, stream processing, and optimizing data management systems for heterogeneous processors.