April 3, 2026
From March 29 to April 2, 2026, ScaDS.AI Dresden/Leipzig took part in the 48th European Conference on Information Retrieval (ECIR 2026) in Delft, the Netherlands.
Lena Jurkschat presented the paper “Prompt Compression in the Wild: Measuring Latency, Rate Adherence, and Quality for Faster LLM Inference”. The paper was co-authored by Cornelius Kummer, Prof. Michael Färber, and Prof. Sahar Vahdati.
With the wide adoption of language models for IR, and specifically RAG systems, the latency of the underlying LLM becomes a crucial bottleneck: the long contexts of retrieved passages lead to large prompts and therefore to increased compute. Prompt compression, which reduces the size of input prompts while aiming to preserve performance on downstream tasks, has established itself as a cost-effective and low-latency method for accelerating inference in large language models. However, its usefulness depends on whether the additional preprocessing time it introduces is offset by faster decoding.
The authors present the first systematic, large-scale study of this trade-off, with thousands of runs and 30,000 queries across several open-source LLMs and three GPU classes. Their evaluation separates compression overhead from decoding latency while tracking output quality and memory usage. LLMLingua achieves end-to-end speed-ups of up to 18% when prompt length, compression ratio, and hardware capacity are well matched, with response quality remaining statistically unchanged across summarization, code generation, and question answering tasks. Outside this operating window, however, the compression step dominates and cancels out the gains. They also show that effective compression can reduce memory usage enough to offload workloads from data center GPUs to commodity cards, with only a 0.3 s increase in latency. Their open-source profiler predicts the latency break-even point for each model–hardware setup, providing practical guidance on when prompt compression delivers real-world benefits.
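To make the idea of a latency break-even point more concrete, here is a minimal Python sketch. It is not the authors' profiler and does not reproduce their measurements. It assumes a deliberately simple linear cost model in which the LLM's time grows in proportion to prompt length and the compressor has a fixed start-up cost plus a per-token cost. The class name, function names, and all coefficients are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LatencyModel:
    """Toy linear latency model. All coefficients are hypothetical."""
    llm_s_per_token: float       # LLM time per prompt token, in seconds
    compress_fixed_s: float      # fixed compressor overhead, in seconds
    compress_s_per_token: float  # compressor time per prompt token, in seconds


def latency_without_compression(prompt_tokens: int, m: LatencyModel) -> float:
    """End-to-end time when the full prompt is sent to the LLM."""
    return m.llm_s_per_token * prompt_tokens


def latency_with_compression(prompt_tokens: int, keep_ratio: float,
                             m: LatencyModel) -> float:
    """End-to-end time when the prompt is compressed first.

    keep_ratio is the fraction of tokens that survive compression
    (0.5 means the compressed prompt has half the original length).
    """
    compress_time = m.compress_fixed_s + m.compress_s_per_token * prompt_tokens
    llm_time = m.llm_s_per_token * keep_ratio * prompt_tokens
    return compress_time + llm_time


def break_even_tokens(keep_ratio: float, m: LatencyModel) -> Optional[float]:
    """Prompt length above which compression is faster under this toy model.

    Returns None when compression never pays off, i.e. when the compressor's
    per-token cost is at least as large as the LLM time saved per token.
    """
    saved_per_token = m.llm_s_per_token * (1.0 - keep_ratio) - m.compress_s_per_token
    if saved_per_token <= 0:
        return None
    return m.compress_fixed_s / saved_per_token


# Hypothetical coefficients, not measured values.
example = LatencyModel(llm_s_per_token=0.001,
                       compress_fixed_s=0.2,
                       compress_s_per_token=0.0002)
print(break_even_tokens(keep_ratio=0.5, m=example))
print(break_even_tokens(keep_ratio=0.9, m=example))
```

Under these assumptions, short prompts or mild compression leave the compressor's overhead larger than the time saved, which mirrors the operating window described above. Real systems are not linear in prompt length, so a measured profile per model and GPU remains necessary.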
Find the proceedings of ECIR 2026 here.
At ECIR 2026, Prof. Michael Färber received the Senior Program Committee Member Award. Congratulations! He is Professor for Scalable Software Architectures for Data Analytics at ScaDS.AI Dresden/Leipzig.
ECIR 2026 took place from March 29 to April 2, 2026, in Delft, the Netherlands. As Europe’s premier forum for research in Information Retrieval (IR), it brought together researchers, practitioners, and industry experts. The conference offered a platform to discuss innovative work shaping the future of search, recommendation, and generative information access. Learn more about the conference on its official website.