Master/Diploma thesis: Tracing of Spark Communication with Score-P
Type of thesis: Diploma thesis (Diplomarbeit) / Location: Dresden / Status of thesis: Finished
In the Big Data domain, Java-based frameworks such as Apache Spark distribute an application’s workload across a cluster to process large-scale datasets. To run such applications quickly and resource-efficiently, it is important to find and optimize the program sections that limit execution speed. For performance analysis, the application code is augmented with measurement code that captures timestamps of method entries and exits (instrumentation). When the application runs, a program trace or profile is created automatically and can later be presented to the performance analyst.
Currently, the measurement captures method entries and exits as well as thread events, for example when a parent thread creates a child thread. Because communication between processes can be a valuable source of performance insights, the measurement also needs to collect information about messages sent between the processes of a Spark cluster. These messages are of different types, for example status messages exchanged between the Spark master and the workers, and data-transfer messages sent from one executor to another.
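To make the idea of instrumentation concrete, the following minimal Java sketch records an entry and an exit event, each with a timestamp and the current thread name, around an arbitrary piece of code. It is an illustration only: the class `MethodTracer` and all of its members are hypothetical names and are not part of Score-P or Spark. A real measurement system would insert such calls automatically and write the events to a trace file rather than keep them in memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical in-memory tracer for illustration; not a Score-P or Spark API. */
final class MethodTracer {
    enum Kind { ENTER, EXIT }

    /** One trace record: what happened, where, on which thread, and when. */
    static final class Event {
        final Kind kind;
        final String region;
        final String thread;
        final long timestampNanos;

        Event(Kind kind, String region, String thread, long timestampNanos) {
            this.kind = kind;
            this.region = region;
            this.thread = thread;
            this.timestampNanos = timestampNanos;
        }
    }

    // Synchronized list so that several threads can record events concurrently.
    private static final List<Event> EVENTS =
            Collections.synchronizedList(new ArrayList<>());

    private static void record(Kind kind, String region) {
        EVENTS.add(new Event(kind, region,
                Thread.currentThread().getName(), System.nanoTime()));
    }

    /** Runs the body and records an ENTER event before and an EXIT event after it. */
    static <T> T traced(String region, Supplier<T> body) {
        record(Kind.ENTER, region);
        try {
            return body.get();
        } finally {
            // The finally block ensures the EXIT event is recorded even if body throws.
            record(Kind.EXIT, region);
        }
    }

    /** Returns a snapshot copy of all events recorded so far. */
    static List<Event> events() {
        synchronized (EVENTS) {
            return new ArrayList<>(EVENTS);
        }
    }

    static void clear() {
        EVENTS.clear();
    }
}
```

A call such as `MethodTracer.traced("sum", () -> 1 + 2)` returns the result of the lambda and leaves two events in the list, from which the time spent in the region can be computed as the difference of the two timestamps.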
This master or diploma thesis should investigate a method for identifying code regions in the Spark framework that are related to communication. In a second step, the identified regions should be instrumented automatically so that, when executed, they record timestamps, message types, and message sizes of the communication. The runtime overhead of the message tracing should be evaluated. A measurement infrastructure based on Score-P (see: http://www.score-p.org), with instrumentation and filtering capabilities, is available.
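The kind of record such message tracing would produce can be sketched as follows. The Java example below captures a timestamp, a message type, and a message size for each sent message. All names (`MessageTrace`, `SendEvent`, `recordSend`) are hypothetical. The sketch also makes two simplifying assumptions: the message type is taken to be the simple class name of the message object, and the size is measured with standard Java serialization. Spark may serialize messages differently, so the sizes here only illustrate the principle.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical message trace for illustration; not a Score-P or Spark API. */
final class MessageTrace {

    /** One record per sent message: when, what type, and how many bytes. */
    static final class SendEvent {
        final long timestampNanos;
        final String messageType;
        final int sizeBytes;

        SendEvent(long timestampNanos, String messageType, int sizeBytes) {
            this.timestampNanos = timestampNanos;
            this.messageType = messageType;
            this.sizeBytes = sizeBytes;
        }
    }

    private final List<SendEvent> events = new ArrayList<>();

    /** Assumption: size is measured via standard Java serialization. */
    static int serializedSize(Serializable message) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(message);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    /** Records a send event; an instrumented send method would call this first. */
    synchronized SendEvent recordSend(Serializable message) {
        SendEvent event = new SendEvent(
                System.nanoTime(),
                message.getClass().getSimpleName(), // assumption: class name as type
                serializedSize(message));
        events.add(event);
        return event;
    }

    /** Returns a snapshot copy of all recorded send events. */
    synchronized List<SendEvent> events() {
        return new ArrayList<>(events);
    }
}
```

Serializing every message a second time just to measure it would itself add runtime overhead, which is one reason the overhead of message tracing needs to be evaluated. A real implementation would more likely read the size from the buffer that is sent anyway.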
Performance analysis/estimation of Big Data applications, Big Data frameworks on HPC