ScaDS.AI Dresden/Leipzig invites you to its public colloquium session on Monday, September 29, 2025 at 3:00 pm CEST. The colloquium takes place in the seminar room “Zwenkauer See” at ScaDS.AI Dresden/Leipzig (details below) and in parallel online (link to Zoom session).
Hierarchical clustering offers a natural, interpretable view of data structure across multiple scales, yet existing methods falter on high-dimensional data without clear density gaps between clusters. We introduce t-NEB [*], a probabilistically grounded hierarchical algorithm that overcomes these limitations. t-NEB first over-clusters the data using a parametric density model and then builds a hierarchy by joining these micro-clusters based on a “raising water level” approach on the density landscape: micro-clusters separated by low-density regions are split first, while micro-clusters within dense regions are separated last. The resulting hierarchy is both natural and meaningful and can be inspected for exploratory data analysis.
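To make the “raising water level” idea concrete, here is a minimal sketch under strong simplifying assumptions; it is not the t-NEB implementation. It assumes an equal-weight isotropic Gaussian mixture instead of a Student’s t model, and it replaces the nudged elastic band path by the straight segment between two micro-cluster centers. The names `mixture_density`, `pass_density` and `merge_order` are hypothetical. Micro-clusters are joined from the densest connection to the sparsest one, so reading the merge list backwards gives the order in which a rising water level would split them.

```python
import numpy as np

def mixture_density(x, centers, sigma=1.0):
    """Density of an equal-weight isotropic Gaussian mixture (an assumption,
    standing in for the parametric density model) at points x of shape (n, d)."""
    x = np.atleast_2d(x)
    d = x.shape[1]
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    norm = (2 * np.pi * sigma**2) ** (d / 2)
    return np.exp(-sq_dist / (2 * sigma**2)).sum(axis=1) / (len(centers) * norm)

def pass_density(a, b, centers, sigma=1.0, steps=50):
    """Lowest density on the straight segment from a to b.
    Simplification: a straight line instead of an optimized NEB path."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return mixture_density((1 - t) * a + t * b, centers, sigma).min()

def merge_order(centers, sigma=1.0):
    """Join micro-clusters starting with the densest connection (union-find)."""
    k = len(centers)
    edges = sorted(
        ((pass_density(centers[i], centers[j], centers, sigma), i, j)
         for i in range(k) for j in range(i + 1, k)),
        reverse=True,
    )
    parent = list(range(k))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    merges = []
    for dens, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the two micro-clusters are not yet connected
            parent[rj] = ri
            merges.append((i, j, dens))
    return merges

# Two nearby micro-clusters and one far away (toy data, not from the talk).
centers = np.array([[0.0, 0.0], [1.5, 0.0], [8.0, 6.0]])
merges = merge_order(centers)
for i, j, dens in merges:
    print(f"join {i} and {j} at pass density {dens:.4g}")
```

In this toy setup the two nearby micro-clusters are joined first, and the distant one is attached last across a low-density region.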
A complementary challenge is the evaluation of clusterings, especially when algorithms like DBSCAN or HDBSCAN [**] assign points to noise. Existing internal validation indices ignore the quality of these noise assignments, despite their critical impact on downstream tasks. To fill this gap we propose DISCO (Density-based Internal Score for Clusterings with nOise), the first clustering validation index (CVI) that explicitly rewards correctly identified noise and penalizes mislabeled points. Built on a density-aware adaptation of the Silhouette Coefficient, DISCO handles arbitrary cluster shapes and noise labels. Its pointwise formulation enables both a single scalar quality score and fine-grained, explainable diagnostics.
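For orientation, the classic Silhouette Coefficient that DISCO adapts can be sketched pointwise as follows. This is the standard Euclidean silhouette, not DISCO itself: it has no density-aware distance and assigns no score to noise points, which is exactly the gap described above. The function name `pointwise_silhouette` and the noise label `-1` (the convention of DBSCAN-style implementations) are assumptions made for illustration.

```python
import numpy as np

NOISE = -1  # assumed noise label, as used by DBSCAN-style implementations

def pointwise_silhouette(X, labels):
    """Classic silhouette value s(i) = (b - a) / max(a, b) per point.
    Noise points receive NaN because the classic index does not score them."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = [c for c in np.unique(labels) if c != NOISE]
    scores = np.full(len(X), np.nan)
    for i in np.flatnonzero(labels != NOISE):
        own = labels == labels[i]
        others = [dist[i, labels == c].mean() for c in clusters if c != labels[i]]
        if own.sum() == 1 or not others:
            scores[i] = 0.0  # convention for singletons or a single cluster
            continue
        a = dist[i, own].sum() / (own.sum() - 1)  # mean distance within own cluster
        b = min(others)                           # mean distance to nearest other cluster
        scores[i] = (b - a) / max(a, b)
    return scores

# Two compact clusters and one point labelled as noise (toy data).
X = [[0, 0], [0, 1], [5, 0], [5, 1], [20, 20]]
labels = [0, 0, 1, 1, NOISE]
scores = pointwise_silhouette(X, labels)
print(scores)              # per-point diagnostics
print(np.nanmean(scores))  # single scalar score over non-noise points
```

The noise point contributes nothing to the scalar score here, whether or not labelling it as noise was a good decision.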
Together, t-NEB and DISCO address major challenges in the field of clustering by providing a principled hierarchical clustering algorithm and a robust way to evaluate the quality of a clustering even when noise labels are present. An open challenge remains the proper evaluation of the hierarchy produced by, e.g., t-NEB, especially for clusters of arbitrary shape without ground-truth labels.
[* t-NEB refers to Student’s t-distribution and nudged elastic band (NEB) optimization]
[** [H]DBSCAN stands for [Hierarchical] Density-Based Spatial Clustering of Applications with Noise]
Martin Ritzert works on both graph machine learning and data science questions surrounding clustering. His recent work aims at improving clustering in settings where clusters are allowed to have arbitrary shape.
His main work on graph machine learning includes expressivity results such as the equivalence between GNNs and the 1-dimensional Weisfeiler-Leman algorithm as well as foundational research in model development and experimental rigor for graph machine learning. He also worked on generalization bounds for boosting algorithms and is generally interested in the combination of algorithm research and techniques from (graph) machine learning.
After obtaining his PhD at RWTH Aachen in 2021, Martin Ritzert worked as a postdoc at Aarhus University (2021-22) before moving to Göttingen in 2022.
ScaDS.AI Dresden/Leipzig
Löhrs Carré, Humboldtstrasse 25, 04105 Leipzig
3rd floor, large seminar room (A 03.07 “Zwenkauer See”)