The use of transformer-based language models in the life sciences and medicine has grown rapidly in the last two years. Because their vocabulary differs from the typical text alphabet, large language models pre-trained on genome, exome, or proteome data, such as DNABERT, require further investigation of their resource and energy consumption. During this project, the tasks will include:
- Investigate the GPU and energy usage of pre-trained omics transformer-based models and, in turn, better understand their similarities to and differences from text transformer models.
- Explore the differences between language models trained on different omics data types.
- Discuss possible improvements to their energy and GPU resource utilization.
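As a starting point for the first task, GPU energy use can be estimated by polling the device's power draw (for example via `nvidia-smi --query-gpu=power.draw` or NVML) at a fixed interval during training or inference and integrating the samples over time. The sketch below is illustrative only: the sampling interval and power values are hypothetical, and a real measurement pipeline would read live samples from the GPU.

```python
# Hypothetical sketch: estimating GPU energy use from sampled power draw.
# Power samples (watts) could come from e.g. `nvidia-smi --query-gpu=power.draw`
# polled at a fixed interval; the values below are illustrative, not measured.

def energy_joules(power_watts, interval_s):
    """Integrate power samples (W) over time via the trapezoidal rule -> joules."""
    if len(power_watts) < 2:
        return 0.0
    return sum(
        (a + b) / 2.0 * interval_s
        for a, b in zip(power_watts, power_watts[1:])
    )

# Example: five samples taken 1 s apart at a constant 250 W draw
samples = [250.0, 250.0, 250.0, 250.0, 250.0]
e = energy_joules(samples, interval_s=1.0)
print(f"{e:.1f} J ({e / 3.6e6:.6f} kWh)")  # 4 intervals x 250 W x 1 s = 1000 J
```

In practice, higher-level tools such as CodeCarbon automate this kind of sampling and also estimate the associated carbon footprint, which may be a convenient baseline for comparing omics and text models.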
The supervisors are open to applications for projects other than a master's thesis.
Vaswani et al., "Attention is all you need", Advances in Neural Information Processing Systems, 2017.
Zhang et al., "Applications of transformer-based language models in bioinformatics: a survey", Bioinform Adv, 2023.
Ji et al., "DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome", Bioinformatics, 2021.