LLM Inference Acceleration Techniques: A Literature Review

Type of thesis: Bachelor's thesis / Location: Dresden / Status: Open

Description

Large Language Models (LLMs) based on Transformers are a rapidly evolving field. With the growing number of pretrained models, the need for efficient inference methods grows as well. However, inference still suffers from underutilized hardware, and token generation on small numbers of GPUs remains too slow for large-scale data processing or real-time applications. To overcome these issues, several inference acceleration methods and frameworks have emerged in recent years, yet the comparability of their performance remains an open question. The goal of this research project is therefore to review and evaluate the literature on inference acceleration and to produce a survey that is still missing in the LLM community.
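To make the topic concrete, the sketch below illustrates one acceleration technique such a survey would cover: KV caching, which reuses attention keys and values across decoding steps instead of recomputing them. It is a minimal example assuming the Hugging Face transformers library is installed; the model name "gpt2" and the 64-token budget are arbitrary choices for illustration, not part of the thesis topic.

```python
# Minimal sketch: compare generation latency with and without KV caching.
# Assumes `torch` and `transformers` are installed; "gpt2" is an arbitrary
# small model chosen purely for illustration.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

inputs = tokenizer("Inference acceleration matters because", return_tensors="pt")

def timed_generate(use_cache: bool) -> float:
    """Greedily generate 64 tokens and return the wall-clock time in seconds."""
    start = time.perf_counter()
    with torch.no_grad():
        model.generate(
            **inputs,
            max_new_tokens=64,
            use_cache=use_cache,                 # toggle KV caching
            pad_token_id=tokenizer.eos_token_id,  # gpt2 has no pad token
        )
    return time.perf_counter() - start

print(f"with KV cache:    {timed_generate(True):.2f}s")
print(f"without KV cache: {timed_generate(False):.2f}s")
```

Even on CPU, the cached run is noticeably faster, since without the cache every decoding step recomputes attention over the full prefix; surveying how far techniques like this (caching, batching, quantization, speculative decoding) close the gap is exactly the kind of comparison the thesis would systematize.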

The topic can be worked on as a Bachelor’s thesis or research project.

Requirements

  • General understanding of neural networks
  • Basics of natural language processing and language modeling
  • Basic understanding of the Transformer architecture

Contact

Lena Jurkschat

TU Dresden

GPT-X, Natural Language Processing
