Learning to Understand Small Medical Data for Non-Infectious Diseases
Despite the surge of services that collect data from which diagnostics and treatment design can benefit, medical data remain inherently small: the number of patient records is tiny compared to the size of the feature space in which these patients are described. The static part of the feature space encompasses answers to questionnaires and medical assessments. The dynamic part encompasses recordings at the beginning and end of treatment and, for some treatments, also during the treatment itself. In this small data space, a straightforward medical question is "Which patient strata benefit more, or less, from treatment?" The formalization of this question is less straightforward.
The talk begins with the first steps of the CRISP-DM cycle, namely business understanding and data understanding. The focus is on the challenges of specifying the target variable and of dealing with systematic missingness of values. We then discuss methods for learning on such data and approaches for highlighting the variables that contribute to prediction. We close with the challenge of acquiring labels for supervised learning and with some advances from semi-supervised and active learning for label acquisition.
The results come from studies on tinnitus with depression as a comorbidity (clinical data) and on non-alcoholic fatty liver disease (population-based epidemiological data).
Myra Spiliopoulou is Professor of Business Information Systems at the Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany. Her main research area is the mining of dynamic complex data. Her publications cover mining complex streams, mining evolving objects, adapting models to drift, and building models that capture drift. She focuses on two application areas: (a) business, including opinion stream mining and adaptive recommenders, and (b) medical research, including epidemiological mining and learning from clinical studies. In medical research, she works on modeling and predicting the evolution of study participants with and without the target outcome. In this area, she is currently involved in the CHRODIS+ (2017-2020) EU Joint Action on “Implementing good practices for chronic diseases” and in the UNITI (2020-2022) EU Project on “Unification of treatments and Interventions for Tinnitus patients”.
Her research on topic monitoring, social network monitoring and the analysis of complex dynamic data has been published at renowned international conferences and in renowned journals. She regularly presents tutorials on different aspects of complex data mining and, more recently, on medical mining, including a tutorial on medical mining at KDD 2019. She serves as a (senior) reviewer for major conferences on data mining and knowledge discovery. In 2018, she was a PC Chair of the Applied Data Science Track of the ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD 2018), London, Aug. 2018. In 2016 and 2019, she served as a PC Chair of the IEEE Int. Symposium on Computer-Based Medical Systems (CBMS). In 2020, she serves as a Chair for Tutorials and Workshops at ECML PKDD 2020.
She is a member of the Presidium of the European Association for Data Science (EuADS). In Germany, she is a member of the jury for the best PhD award of the German Informatics Society. Since April 2016, she has served as Action Editor for the Springer journal Data Mining and Knowledge Discovery (DAMI). Since June 2020, she has served as Associate Editor for the journal Frontiers in Aging Neuroscience.
Back to the Summer School 2020 overview