LLM Inference Acceleration Techniques: A Literature Review

Type of thesis: Bachelor's thesis / Location: Dresden / Status: Open

Description

Large Language Models (LLMs) based on Transformers are a rapidly evolving field. With the growing number of pretrained models, the need for efficient inference methods grows as well. However, inference still suffers from underutilized hardware, and token generation on small numbers of GPUs remains too slow for large-scale data processing or real-time applications. To overcome these issues, several inference acceleration methods and frameworks have emerged in recent years, yet the comparability of their performance remains an open question. The goal of this research project is therefore to review and evaluate the literature on inference acceleration and to produce a survey that is still missing in the LLM community.
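To make the topic concrete, the sketch below illustrates one acceleration technique such a survey would cover: KV caching, which reuses attention keys and values across decoding steps instead of recomputing them. It is a minimal example assuming the Hugging Face transformers library is installed; the model name "gpt2" and the 64-token budget are arbitrary choices for illustration, not part of the thesis topic.

```python
# Minimal sketch: compare generation latency with and without KV caching.
# Assumes `torch` and `transformers` are installed; "gpt2" is an arbitrary
# small model chosen purely for illustration.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

inputs = tokenizer("Inference acceleration matters because", return_tensors="pt")

def timed_generate(use_cache: bool) -> float:
    """Greedily generate 64 tokens and return the wall-clock time in seconds."""
    start = time.perf_counter()
    with torch.no_grad():
        model.generate(
            **inputs,
            max_new_tokens=64,
            use_cache=use_cache,                 # toggle KV caching
            pad_token_id=tokenizer.eos_token_id,  # gpt2 has no pad token
        )
    return time.perf_counter() - start

print(f"with KV cache:    {timed_generate(True):.2f}s")
print(f"without KV cache: {timed_generate(False):.2f}s")
```

Even on CPU, the cached run is noticeably faster, since without the cache every decoding step recomputes attention over the full prefix; surveying how far techniques like this (caching, batching, quantization, speculative decoding) close the gap is exactly the kind of comparison the thesis would systematize.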

The topic can be worked on as a Bachelor’s thesis or research project.

Requirements

  • General understanding of neural networks
  • Basics of natural language processing and language modeling
  • Basic understanding of the Transformer architecture

Contact

Lena Jurkschat

TU Dresden

GPT-X, Natural Language Processing
