AMPL

Title: Automatic Meta Data Profiling and Lineage for Integrating Heterogeneous Data Sources (AMPL)

Project duration: 01/2021 – 12/2023

Research Area: Data Quality

Efficiently managing and merging many heterogeneous, dynamic data sources has become a critical success factor for financial institutions. However, as data becomes more heterogeneous and dynamic, it is increasingly difficult to keep track of historically accumulated, exponentially growing data pools. This has already led to significant macroeconomic damage, including the global financial crisis of 2007 and 2008, the scale of which could have been contained with real-time transparency and thus a better overview of risk and metadata. Unfortunately, there is currently no solution for financial institutions that allows flexible integration of heterogeneous data sources while providing intuitive metadata preparation. AMPL aims to develop a new tool for structuring, analyzing, and exploring large volumes of heterogeneous, dynamic data sources. For this purpose, the tool computes comprehensive data profiles consisting of statistics, correlations, and complex provenance information (lineage).

Aims

By breaking down existing silos and combining innovative technologies with the requirements of market participants, AMPL makes it possible to completely rethink data and metadata management.

Technology

Machine-learning-assisted methods support schema mapping (schema matching, ontology matching) between data sources, and new methods enable the scalable and incremental computation of data profiles. These methods will be developed based on the project partners' preliminary work and on recent research results in graph analysis, SQL-based data integration, and incremental record linkage (entity resolution) on dynamic and heterogeneous data sources. The data profiles are then presented in a novel web-based visual front end that greatly simplifies data interaction and exploration.
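To illustrate what incremental computation of a data profile can mean in practice, the following minimal sketch keeps running statistics for a single numeric column and updates them one value at a time, so that newly arriving records do not require a full rescan. It is not the AMPL implementation: the class name ColumnProfile, its fields, and the use of Welford's online algorithm for mean and variance are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnProfile:
    """Running statistics for one numeric column (hypothetical structure)."""

    count: int = 0   # non-null values seen so far
    nulls: int = 0   # null values seen so far
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def update(self, value: Optional[float]) -> None:
        """Fold one new value into the profile (Welford's online update)."""
        if value is None:
            self.nulls += 1
            return
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)

    @property
    def variance(self) -> float:
        """Sample variance; 0.0 when fewer than two values were seen."""
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


# Values arrive one at a time, e.g. from a changing data source.
profile = ColumnProfile()
for v in [10.0, None, 12.0, 14.0]:
    profile.update(v)

print(profile.count, profile.nulls, profile.mean, profile.variance)
```

Because each update touches only a handful of counters, the cost per new record is constant regardless of how much data has already been profiled. Correlations and lineage information would need additional state that this sketch does not cover.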

Team

Lead

Prof. Dr. Erhard Rahm

Leipzig University

Department of Computer Science, Database Group, Chair of Databases

Matthias Täschner

Leipzig University

Team Member

Michal Miazga

Leipzig University

  • Daniel Abitz

Partner

funded by:
Funded by the German Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung).
Funded by the Free State of Saxony (Freistaat Sachsen).