Title: Plattform Datafusion Generator (DaFne)
Project duration: 01.02.22-29.02.24 (BMBF)
Research Area: Generative AI, Smart Cities
The DaFne project aims to develop a platform dedicated to synthetic data generation. As AI methods gain prominence, their reliance on data-driven approaches makes data availability critically important. However, in numerous domains, data contains sensitive information that cannot be made public. Our mission is to provide researchers, industries, developers, and governmental bodies with high-quality synthetic data.
To achieve this goal, we integrate generative models and provide quality analysis of the synthetic data they produce. DaFne primarily targets the smart city use case and covers both tabular and unstructured data formats. The platform will be accessible free of charge and is designed with user-friendliness as a core principle.
Data generation is a complex challenge because data is diverse, ranging from structured tabular data to unstructured formats. Complicating matters further, data often comes from multiple sources, each with its own structure and level of quality. Thus, the DaFne platform must be versatile enough to accommodate this diversity.
In addition to facilitating the generation of data across different formats and sources, it is crucial to evaluate the quality of the generated data. To address this need, our platform incorporates evaluation mechanisms that provide users with quality metrics. These metrics are tailored to the specific type of data being generated, since the assessment criteria may differ between tabular and unstructured data. Moreover, because not all users have a deep statistical background, the quality metrics are designed to be easy to interpret.
Within the framework of smart cities, a lack of sufficient data is a challenge for urban designers. Our specific use case aims to replicate and analyze citizens’ mobility patterns within urban environments. Using reinforcement learning, we train an agent to optimize its behavior with the goal of maximizing overall happiness. Our study focuses on the HafenCity area in Hamburg, Germany, where we illustrate the efficacy of our approach.
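The reinforcement learning idea can be illustrated with a minimal sketch. It uses tabular Q-learning, which is an assumed stand-in and not necessarily the algorithm used in DaFne. The 5x5 grid, the start and destination cells, and the reward (a walking cost of -1 per step as a crude proxy for lost happiness) are all hypothetical.

```python
import random

# Hypothetical 5x5 grid standing in for a street network (not project data).
SIZE = 5
START, GOAL = (0, 0), (4, 4)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # moves as (row delta, column delta)


def step(state, action):
    """Apply a move, staying inside the grid; return (next_state, reward, done)."""
    row = min(max(state[0] + action[0], 0), SIZE - 1)
    col = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (row, col)
    if nxt == GOAL:
        return nxt, 0.0, True   # arriving costs nothing (assumed reward design)
    return nxt, -1.0, False     # assumed cost for every other step walked


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table with epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)
    n_actions = len(ACTIONS)
    q = {((r, c), a): 0.0
         for r in range(SIZE) for c in range(SIZE) for a in range(n_actions)}
    for _ in range(episodes):
        state = START
        for _ in range(200):  # cap the episode length
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)  # explore
            else:
                a = max(range(n_actions), key=lambda i: q[(state, i)])  # exploit
            nxt, reward, done = step(state, ACTIONS[a])
            best_next = 0.0 if done else max(q[(nxt, i)] for i in range(n_actions))
            # Standard Q-learning update toward the bootstrapped target.
            q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the highest-valued action from START to produce one synthetic path."""
    state, path = START, [START]
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[(state, i)])
        state, _, done = step(state, ACTIONS[a])
        path.append(state)
        if done:
            break
    return path


path = greedy_path(train())
print(path)
```

In a realistic setting the reward would have to encode whatever "happiness" means for pedestrians in the study area, and the grid would be replaced by the actual street layout. Both are outside the scope of this sketch.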
In another research direction, we address the maintenance of bridges, specifically focusing on predicting the need for reconstruction. Leveraging the rich dataset of the US National Bridge Inventory (NBI), which encompasses inspections of more than 80,000 bridges spanning 50 years, we aim to develop robust predictive models. Such comprehensive data is not readily available from the German government. To overcome this challenge, the insights gleaned from the available US data can be adapted to bridges in Germany or other countries.
Advanced AI techniques are used to generate synthetic data. Generative Adversarial Networks (GANs) are suitable for generating tabular data, Graph Neural Networks (GNNs) for recreating neighborhood patterns, and a reinforcement learning approach for creating synthetic pedestrian paths in the context of smart cities. Synthetic data is evaluated using various statistical methods (descriptive analysis, significance and goodness-of-fit tests, Kullback–Leibler divergence, etc.) and ML classification methods (random forest, support vector machine, and neural network, among others). Python and Python frameworks were used for generation and evaluation.
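As an illustration of one of the statistical checks named above, the following is a minimal sketch of a histogram-based Kullback–Leibler divergence between a real and a synthetic numeric column. The function name `kl_divergence`, the bin count, the smoothing constant, and the sample data are assumptions made for this example and do not describe the platform's actual implementation.

```python
import math
import random


def kl_divergence(real, synthetic, bins=10, eps=1e-9):
    """Estimate KL(real || synthetic) from histograms over a shared value range."""
    lo = min(min(real), min(synthetic))
    hi = max(max(real), max(synthetic))
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal

    def probabilities(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)  # top edge goes in last bin
            counts[idx] += 1
        total = len(values)
        # Additive smoothing keeps empty bins from causing log(0) or division by zero.
        return [(c + eps) / (total + eps * bins) for c in counts]

    p = probabilities(real)
    q = probabilities(synthetic)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical data: one "real" column and two candidate synthetic columns.
rng = random.Random(0)
real_col = [rng.gauss(0, 1) for _ in range(2000)]
good_synth = [rng.gauss(0, 1) for _ in range(2000)]
poor_synth = [rng.gauss(3, 1) for _ in range(2000)]

print("close match:", kl_divergence(real_col, good_synth))
print("shifted mean:", kl_divergence(real_col, poor_synth))
```

A value near zero suggests that the synthetic column follows the real distribution closely, while larger values indicate a mismatch. The estimate depends on the binning, so it is best used to compare candidates under the same settings.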
DaFne is a software platform. Users can generate data according to their needs, facilitating data-driven decision-making in research or industrial contexts. The tool’s capabilities can be helpful in urban planning, finance, medicine, and other scientific and non-scientific settings.
Center for Interdisciplinary Digital Sciences (CIDS)