ScaDS.AI - Center for Scalable Data Analytics and Artificial Intelligence

Language is often viewed as the pinnacle of (human) intelligence. How seamlessly machines can be integrated into society depends on their understanding and mastery of language. Our research therefore covers domain-specific large-scale language modeling, text manipulation algorithms, argumentation, and causal language. We study these topics in the context of conversational AI, where we connect knowledge extraction and knowledge graphs with goal-driven dialogs, and in the context of mining the scientific literature.

Modeling and manipulating language

Our research in natural language processing and information retrieval focuses on algorithms and models. The overarching challenge is advancing language understanding and manipulation.

Key tasks

modeling (language representation in a computer),
paraphrasing (conveying the message of a given text using different words),
summarization (conveying the key message of a given text with fewer words),
argumentation (persuading readers of a claim or a conclusion),
conversation,
and causal knowledge.

AI technologies, together with increasingly large language resources from web archives, fuel the generalization capabilities of these models.

Goals

Building domain-specific language models for writing assistance and problem solving, focusing on latent variables in language models.
Paraphrasing at paragraph level; we expect to gain insights from summarizing long texts.
Summarization research in new domains, such as social media.
Constrained paraphrasing and summarization; constraints include language simplicity, writing style, and domain-specific requirements.
Integrating computational argumentation and conversational technologies.
Causal knowledge acquisition from text for advanced AI reasoning.
Bias analytics in all of the above; focus on minority protection.

Figure: Overview of SUMMARY EXPLORER: (1) corpus selection, (2) model selection, and (3) quality aspect assessment.

Conversational AI and knowledge extraction

Our research on conversational AI brings together knowledge graphs, natural language understanding, and deep learning. Goals include:

Fast domain adaptation techniques in goal-driven dialogs for context-sensitive, coherent, and correct responses.
Code synthesis for data analytics, with a focus on foundational paradigm shifts (e.g., transformer-based encoding of tree structures).
Conversational search for exploring research, e.g., recent COVID-19 related research; building on our expertise in question answering.
Explainability approaches based on graph representations.

Mining the scientific literature

Motivated by several successful examples, we will pursue biomedical text mining.

This includes tailored information extraction and language modeling. We will also present facts and results in an argumentative framework for support and explanation purposes.

Emati: A recommender system for biomedical literature based on supervised learning.

Emati scans and classifies scientific literature according to a user profile, which is dynamically updated.
Emati was evaluated using a Bayes classifier and a BERT language model; the latter leads to dramatic improvements.
Using AI to extract facts from literature is vital for next-generation semantic search.
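
To make the idea of classifying literature against a dynamically updated user profile more concrete, here is a minimal sketch of a naive Bayes relevance classifier in plain Python. It is not Emati's implementation: the text above only states that a Bayes classifier was one of the evaluated models, so the class name ProfileClassifier, its methods, the bag-of-words features, and the example abstracts are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter


class ProfileClassifier:
    """Hypothetical multinomial naive Bayes over bag-of-words features.

    The "user profile" is simply the set of word counts collected from
    abstracts the user has marked as relevant or not relevant.
    """

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def update(self, text, relevant):
        """Add one labeled abstract to the profile (incremental update)."""
        self.word_counts[relevant].update(text.lower().split())
        self.doc_counts[relevant] += 1

    def score(self, text):
        """Return the log-odds that the text is relevant to the user.

        Positive values favor "relevant", negative values "not relevant".
        """
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_docs = sum(self.doc_counts.values())
        log_odds = 0.0
        for label, sign in ((True, 1.0), (False, -1.0)):
            # Laplace-smoothed class prior.
            log_p = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                if word not in vocab:
                    continue  # words never seen in the profile carry no evidence
                count = self.word_counts[label][word]
                # Laplace-smoothed word likelihood.
                log_p += math.log((count + 1) / (total_words + len(vocab)))
            log_odds += sign * log_p
        return log_odds


profile = ProfileClassifier()
profile.update("protein binding kinase inhibitor", relevant=True)
profile.update("kinase signaling pathway protein", relevant=True)
profile.update("galaxy cluster redshift survey", relevant=False)
profile.update("stellar redshift spectra", relevant=False)

# Candidate papers can then be ranked by their score.
candidates = ["kinase protein interaction", "redshift survey of a galaxy"]
ranked = sorted(candidates, key=profile.score, reverse=True)
```

Because update only adjusts counts, the profile can change with every new piece of user feedback without retraining from scratch. A BERT-based classifier, the second model named above, would replace the word counts with learned text representations and is not sketched here.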