Status: open / Type of Thesis: Master's thesis / Location: Dresden
The use of transformer-based [1] language models in the life sciences and medicine has grown rapidly over the last two years [2]. Because their vocabularies are built over nucleotide or amino-acid alphabets rather than natural-language text, large language models pre-trained on genome, exome, or proteome data, such as DNABERT [3], require further investigation with respect to their resource and energy consumption (the sketch below illustrates the vocabulary difference for DNA). During this project, the tasks will include:
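To make the vocabulary difference concrete, the following is a minimal sketch of overlapping k-mer tokenization in the style of DNABERT (the published model uses k between 3 and 6); the function names and the exact special-token list are illustrative assumptions, not DNABERT's released tokenizer.

# Minimal, illustrative sketch of overlapping k-mer tokenization
# (assumption: stride-1 k-mers and BERT-style special tokens, following
# DNABERT's description; this is not DNABERT's actual code).
from itertools import product

def kmer_tokenize(sequence: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def build_vocab(k: int = 6) -> dict[str, int]:
    """Enumerate all 4**k possible k-mers plus BERT-style special tokens."""
    specials = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return {token: idx for idx, token in enumerate(specials + kmers)}

if __name__ == "__main__":
    print(len(build_vocab(k=6)))        # 4101 = 4**6 + 5 special tokens
    print(kmer_tokenize("ATGCGTACGT"))  # ['ATGCGT', 'TGCGTA', 'GCGTAC', 'CGTACG', 'GTACGT']

For k = 6 this yields 4**6 + 5 = 4,101 vocabulary entries, far fewer than the roughly 30,000 subwords of a typical BERT vocabulary, while the stride-1 overlap produces nearly one token per input base; both properties are relevant when profiling the memory and energy behaviour of such models.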
[1] Vaswani et al., "Attention is all you need," Advances in Neural Information Processing Systems, 2017.
[2] Zhang et al., "Applications of transformer-based language models in bioinformatics: a survey," Bioinformatics Advances, 2023.
[3] Ji et al., "DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome," Bioinformatics, 2021.