The Big Data Framework Thrill: An Investigation of Functionality and Performance

Type of thesis: Master's thesis / Location: Dresden / Status of thesis: Finished

The goal of this thesis is to investigate the functionality and performance of the Big Data processing framework Thrill. The framework is written in C++ and offers an MPI communication backend. We first examine the framework's features and then describe how to configure and execute a Thrill application on an HPC cluster. A word count benchmark and a use case with data from an HPC cooling system are implemented using Thrill. We then evaluate Thrill as an HPC application and compare its performance with Apache Spark. Thrill showed good HPC scalability characteristics and outperformed Apache Spark. Its faster execution time and native C++ implementation open new possibilities for faster data processing on an HPC cluster.

Contact persons

Dr. Tara Lazariv

Dr. Christoph Lehmann
