Today, digitization is influencing all areas of life and producing vast amounts of data. Science and research increasingly contribute to this flood of data through data-intensive experiments, complex simulations, and interconnected sensor networks that store their data in digital archives. In addition, digitization, as a driver of data generation, is increasingly changing business processes and influencing many aspects of private life. Many everyday devices and objects are now connected in the so-called Internet of Things, and intelligent assistants are constant companions on the smartphone or in the living room.
Efficient and intelligent handling of very large, often distributed and heterogeneous data sets also increasingly determines economic and scientific competitiveness. Large volumes of data from social networks, multimedia collections, sensor networks, or scientific experiments, together with their analysis using innovative methods such as machine learning, open up many new opportunities to generate value. In many cases, science and industry face unprecedented challenges commonly referred to as Big Data.
The resulting resource demand often exceeds the capabilities of previously used data acquisition, integration, analysis, and visualization techniques. Only when processed efficiently and intelligently can data become a driving force for gaining knowledge through analysis.
Addressing data-driven challenges, which often vary from application to application, requires an intense and collaborative exchange between domain scientists and data analysis experts. This observation was the starting point for establishing a competence center for Big Data. Since October 2014, the project partners and a number of associated partners have been working together on various Big Data research and application development topics in the German competence center ScaDS Dresden/Leipzig, funded by the Federal Ministry of Education and Research (BMBF, FKZ 01IS14014B). After a successful four-year first phase, ScaDS Dresden/Leipzig was extended in October 2018 for a second phase of three years (BMBF, FKZ 01IS18026B) with the goal of further expansion and long-term continuation.
At the two university locations Dresden and Leipzig, the national competence center investigates many different aspects of processing large data sets in science and industry. New methods and solutions are being developed to examine large and complex data sets from different application areas. Six application areas are integrated within the competence center. They contribute their domain-specific requirements for processing large amounts of data, and they also drive computer science research on methods for data-intensive applications (see figure). These application areas include life sciences, medicine, materials and engineering, digital humanities, environment and earth sciences, chemistry, physics and business data.
Our research is based on the observation that Big Data solutions can only be developed by adapting the entire data lifecycle and by using modern data processing and computing architectures. First, data from different sources must be integrated with high quality. The data then needs to be augmented and enriched, and further knowledge needs to be extracted from it to gain new insights. Finally, visual analysis methods allow users to explore the data and engage the domain scientist in the analysis process.
A key factor in the success of ScaDS Dresden/Leipzig is the establishment of a Service Center that bundles interdisciplinary research activities in addition to the domain and computer science research. The Service Center provides a central point of contact for research and industry and coordinates methodological as well as applied research at both locations. In recent years, numerous collaborations have been established with a variety of other scientific institutions and companies. The Service Center supports a wide range of application domains and also teaches the necessary Big Data skills through training and university education. Existing and proven solutions, such as the efficient use of data integration techniques or the application of high-performance computing to large compute-intensive scenarios, can often be transferred between application domains.