March 27, 2025

Git-Bob: Transparent and comprehensible AI support for code development in data science

ScaDS.AI Dresden/Leipzig

AI-generated code is a great help for many researchers, but it also carries risks in terms of reliability, traceability and reproducibility. The git-bob platform is designed to provide data scientists with low-threshold support for collaboratively generating data analysis code together with their research team and AI.

Anyone who has ever had to write large amounts of code for a research project has probably felt it: the lure of AI-generated code. Just a quick prompt and within moments the Gen-AI model of your choice spits out a few paragraphs of computer code – often not without success. AI-generated code looks good and delivers results. However, often only an expert can immediately see if the results are correct and if the code is operating correctly.

Depending on how the prompt is phrased, AI models may struggle to fully understand the requirements of a project, leading to inappropriate or inefficient code suggestions. In some data analysis projects, AI has become so ingrained that it is no longer possible to tell exactly which parts of the code were written by humans and which parts were written by AI. This compromises the principles of good scientific practice.

Nevertheless, AI-assisted programming offers enormous potential, especially for data analysis, according to Dr. Robert Haase of ScaDS.AI Dresden/Leipzig. To make this potential more accessible to science, he has developed the git-bob platform – an online tool that allows research teams to program together with AI in a comprehensible and transparent way, in line with good scientific practice.

Coding for Data Analysis with the AI-Assistant git-bob

Data analysis in research projects often requires a significant amount of programming to adequately analyze a wide range of data sets. In this process, data scientists repeatedly write very similar code. In theory, having AI generate this code can save a lot of time. In practice, however, any time advantage is lost if faulty code skews the analysis and requires time-consuming debugging. To ensure that the code is of high quality and correctly interprets the data, Robert Haase based the platform’s code generation process on the way data scientists typically work together: in collaborative feedback loops.

Illustration of how the assistant operates. git-bob works with three different AI models and generates code and data visualizations in interaction with the researchers.
(Source: https://zenodo.org/records/14019030; CC-BY 4.0)

With git-bob, multiple researchers can collaborate on a project with different AI systems and write code. During this process, the assistant transparently documents which parts of the code were written by humans and which by AI. This way, people can help each other directly, provide feedback, and have the generated code reviewed by multiple sets of eyes. Researchers also no longer have to share tips for effective prompt strategies informally in passing; instead, they document them as part of the feedback loop.

Working with git-bob

Working with git-bob typically starts with researchers creating a GitHub issue describing their problem or request. A member of the relevant repository can then activate git-bob with commands such as “git-bob comment on this” or “git-bob try to program this”. The AI assistant then responds with suggested solutions: both code snippets and data visualizations. After the initial output, the user and git-bob refine the result through an iterative exchange. The platform can draw on a number of large language models, including Anthropic’s Claude, OpenAI’s ChatGPT, and Google’s Gemini. If desired, git-bob can also submit the final solution as a pull request for researchers to review and approve before implementation. Alternatively, it can comment on pull requests from humans and incorporate corrections into the existing code. In every case, a human must still approve the changes before they are incorporated into the code base.

Screenshot: Prompt interaction with git-bob. All changes and requests are openly visible to all team members and easily attributable to a human or an AI agent.
(Source: https://github.com/haesleinhuepf/git-bob-playground/issues/241)
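The trigger step described above can be illustrated with a minimal Python sketch. This is not git-bob's actual implementation: the function names, the action labels, and the membership check are hypothetical and chosen only for illustration. The only details taken from the article are the two example phrases and the rule that a repository member activates the assistant.

```python
from typing import Optional, Set

# Hypothetical mapping from trigger phrases to action labels.
# Only the two phrases are quoted in the article; the labels are assumptions.
TRIGGER_PREFIX = "git-bob"
KNOWN_ACTIONS = {
    "comment on this": "comment",
    "try to program this": "program",
}


def parse_trigger(comment_body: str) -> Optional[str]:
    """Return an action label if the comment contains a known trigger phrase."""
    # Normalize case and whitespace so "Git-Bob   comment on this" still matches.
    text = " ".join(comment_body.lower().split())
    for phrase, action in KNOWN_ACTIONS.items():
        if f"{TRIGGER_PREFIX} {phrase}" in text:
            return action
    return None


def handle_comment(author: str, body: str, repo_members: Set[str]) -> str:
    """Decide what to do with an issue comment (illustrative only)."""
    action = parse_trigger(body)
    if action is None:
        return "ignored: no trigger phrase"
    # Assumption: only repository members may activate the assistant.
    if author not in repo_members:
        return "ignored: author is not a repository member"
    return f"dispatch: {action}"


if __name__ == "__main__":
    members = {"alice"}
    print(handle_comment("alice", "git-bob try to program this", members))
    print(handle_comment("mallory", "git-bob comment on this", members))
```

In this sketch, every decision is returned as a plain string, so it could be logged next to the issue and stay visible to the whole team, in the spirit of the transparency goal described in the article.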

In addition, git-bob is designed to be as accessible as possible. The platform uses the already ubiquitous services GitHub and GitLab and thus works online without the need for downloads or installations. What’s more, the platform is open-source. This means that users can also install it on their own GitLab servers, set up their own language model servers and then use git-bob in a completely private environment.

At the moment, the git-bob user interface is still very much geared towards the familiar working environments of data scientists and software developers. As a next step, Robert Haase plans to make the platform usable for biological and geoscience research.

Try git-bob yourself

The paper introducing git-bob is published in Nature Computational Science and can be found at this link: https://rdcu.be/efkna

If you are interested in the platform and how it works, you can try it out here: https://github.com/haesleinhuepf/git-bob


Funded by the German Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung).

ScaDS.AI Dresden/Leipzig (Center for Scalable Data Analytics and Artificial Intelligence) is a center for Data Science, Artificial Intelligence and Big Data with locations in Dresden and Leipzig.

