
27 March 2025
AI-generated code is a great help for many researchers, but it also carries risks in terms of reliability, traceability and reproducibility. The git-bob platform is designed to give data scientists low-barrier support for generating data analysis code collaboratively with their research team and AI.
Anyone who has ever had to write large amounts of code for a research project has probably felt it: the lure of AI-generated code. Just a quick prompt, and within moments the Gen-AI model of your choice spits out a few paragraphs of computer code, often with some success. AI-generated code looks good and delivers results. However, often only an expert can immediately see whether the results are correct and whether the code is operating correctly.
Depending on how the prompt is phrased, AI models may struggle to fully understand the requirements of a project, leading to inappropriate or inefficient code suggestions. In some data analysis projects, AI has become so ingrained that it is no longer possible to tell exactly which parts of the code were written by humans and which parts were written by AI. This compromises the principles of good scientific practice.
Nevertheless, AI-assisted programming offers enormous potential, especially for data analysis, according to Dr. Robert Haase of ScaDS.AI Dresden/Leipzig. To make this potential more accessible to science, he has developed the git-bob platform – an online tool that allows research teams to program together with AI in a comprehensible and transparent way, in line with good scientific practice.
Data analysis in research projects often requires a significant amount of programming to adequately analyze a wide range of data sets. In this process, data scientists repeatedly program very similar code. In theory, having AI generate this code can save a lot of time. In practice, however, if faulty code skews the analysis and requires time-consuming debugging, any time advantage is lost. To ensure that the code is of high quality and correctly interprets the data, Robert Haase based the platform’s code generation process on the way data scientists typically work together – in collaborative feedback loops.
With git-bob, multiple researchers can collaborate on a project with different AI systems and write code. During this process, the assistant transparently documents which parts of the code were written by humans and which by AI. This way, people can help each other directly, provide feedback, and have the generated code reviewed by multiple sets of eyes. Also, researchers no longer have to share tips for effective prompt strategies informally in passing; instead, they document them as part of the feedback loop.
Working with git-bob typically starts with researchers creating a GitHub issue describing their problem or request. A member of the relevant repository can then activate git-bob with commands such as “git-bob comment on this” or “git-bob try to program this”. The AI assistant then responds with suggested solutions: both code snippets and data visualizations. After the initial output, the user and git-bob refine the result through an iterative exchange. The platform can draw on a number of large language models, including Anthropic’s Claude, OpenAI’s ChatGPT, and Google’s Gemini. If desired, git-bob can also submit the final solution as a pull request for researchers to review and approve before it is merged. Alternatively, it can comment on pull requests from humans and incorporate corrections into the existing code. In every case, a human must still approve the changes before they are incorporated into the code base.
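To make the trigger phrases quoted above more concrete, here is a minimal sketch of how a comment could be checked for them. It is an illustration only and not git-bob's actual implementation: the function name `detect_git_bob_command` and the action labels "comment" and "implement" are hypothetical, and only the two phrases quoted in this article are covered.

```python
from typing import Optional

# Trigger phrases quoted in the article, mapped to hypothetical action labels.
# The labels are assumptions made for this sketch, not git-bob's own names.
TRIGGERS = {
    "git-bob comment on this": "comment",
    "git-bob try to program this": "implement",
}


def detect_git_bob_command(comment: str) -> Optional[str]:
    """Return a hypothetical action label if the comment contains a trigger phrase."""
    # Lowercase the text and collapse runs of whitespace so that minor
    # formatting differences do not hide a trigger phrase.
    normalized = " ".join(comment.lower().split())
    for phrase, action in TRIGGERS.items():
        if phrase in normalized:
            return action
    return None


if __name__ == "__main__":
    print(detect_git_bob_command("Could you help? git-bob comment on this"))
    print(detect_git_bob_command("Thanks, looks good to me."))
```

In this sketch, a comment without a trigger phrase yields `None`, which mirrors the article's point that the assistant only acts when a repository member explicitly asks it to.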
In addition, git-bob is designed to be as accessible as possible. The platform uses the already ubiquitous services GitHub and GitLab and thus works online without the need for downloads or installations. What’s more, the platform is open-source. This means that users can also install it on their own GitLab servers, set up their own language model servers and then use git-bob in a completely private environment.
At the moment, the git-bob user interface is still very much geared towards the familiar working environments of data scientists and software developers. As a next step, Robert Haase plans to make the platform usable for bio- and georesearch.
The paper introducing git-bob was published in Nature Computational Science and can be found at this link: https://rdcu.be/efkna
If you are interested in the platform and how it works, you can try it out here: https://github.com/haesleinhuepf/git-bob