The Helmholtz Centre for Environmental Research (UFZ) investigates the interactions between humans and the environment. One focus is the fate of chemicals that are released into the environment with unknown effects on humans and ecosystems. Outside the laboratory, however, chemical substances never occur in isolation but mix with a background of naturally occurring molecules. This natural organic matter is one of the most complex mixtures of chemical substances and is found almost everywhere on the planet.
Here at the UFZ, state-of-the-art analytical tools (e.g. ultra-high resolution mass spectrometry) are used to characterize naturally occurring molecules with high accuracy and precision. Tens of thousands of molecules can be detected in each sample, and their molecular formulas can be calculated from precise mass measurements and the known masses of atoms. As a result, the data sets produced by these instruments are large, structurally complex, and intrinsically connected by chemical rules and measurement parameters.
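To give a flavour of the formula calculation mentioned above, here is a minimal Python sketch. It is not the UFZ workflow: the function name `candidate_formulas`, the restriction to the elements C, H and O, the 1 ppm tolerance and the simple element-count limits are all assumptions made for illustration. Only the monoisotopic atomic masses are standard reference values.

```python
# Monoisotopic masses (in Da) of the most abundant isotopes of C, H and O.
MASS = {"C": 12.0, "H": 1.00782503223, "O": 15.99491461957}


def candidate_formulas(neutral_mass, ppm=1.0, max_c=50):
    """Return (C, H, O) counts whose mass matches neutral_mass within ppm.

    Hypothetical helper for illustration: brute-force search over carbon
    and oxygen counts, with the hydrogen count derived from the remainder.
    """
    tol = neutral_mass * ppm * 1e-6  # absolute tolerance in Da
    hits = []
    for c in range(1, max_c + 1):
        if c * MASS["C"] > neutral_mass + tol:
            break  # carbon alone is already too heavy
        for o in range(0, c + 3):  # assumed upper limit on oxygen per carbon
            rest = neutral_mass - c * MASS["C"] - o * MASS["O"]
            if rest < -tol:
                break  # carbon plus oxygen is already too heavy
            h = round(rest / MASS["H"])
            if h < 0 or h > 2 * c + 2:
                continue  # more hydrogen than a saturated molecule allows
            mass = c * MASS["C"] + h * MASS["H"] + o * MASS["O"]
            if abs(mass - neutral_mass) <= tol:
                hits.append((c, h, o))
    return hits


# A measured neutral mass close to that of glucose (C6H12O6).
for c, h, o in candidate_formulas(180.06339):
    print(f"C{c}H{h}O{o}")
```

Real samples involve more elements (e.g. N, S, P), ionized rather than neutral molecules, and further chemical plausibility rules, which is one reason the resulting data sets become so interconnected.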
In a joint project between ScaDS and the Department of Analytical Chemistry, we aim to devise a data management and evaluation pipeline that facilitates the handling of these expansive data sets.
We offer multiple topics for master's theses: building the underlying database architecture, most likely based on Big Data technologies (Apache Hadoop, Spark, Flink, ...), implementing data evaluation algorithms, and developing novel visualizations of mass spectrometry data.
We promise close supervision by members of the Big Data Center ScaDS. In some cases, we can offer student positions before or after the thesis to dive deeper into the topic.
We are looking for students with:
- motivation to work in an interdisciplinary context
- good knowledge of data management
- good programming skills in Java, Scala, or Python
- experience with Apache Hadoop/Spark/Flink is welcome, but not a requirement
- a background in chemistry is not required