#7: Big Data Performance Analysis

On 3 February 2022 at 11:00 a.m., the seventh lecture of the Living Lab lecture series took place. In this talk, ScaDS.AI scientific researcher Jan Frenzel spoke about big data performance analysis. He introduced the audience to performance evaluation and performance investigation of big data frameworks. He also presented the benefits of using an established performance analysis tool, Vampir, as an alternative to the dashboards of Apache Spark and Apache Flink.

In recent years, the amount of data that needs to be processed has increased tremendously. Java-based frameworks such as Apache Hadoop, Apache Spark, and Apache Flink have been developed to simplify working with distributed data. They hide much of the complexity of distributed data processing, such as splitting data or moving it within the compute cluster, behind functional building blocks. However, because of this hidden complexity, analyzing the performance of applications written with these frameworks is particularly challenging. Performance can be limited by the application, the framework itself, or the framework’s configuration. Different approaches can be used to investigate these potential causes of low performance.

You can rewatch this lecture on YouTube.