Apache SystemML: Declarative Machine Learning for Low-Latency to Large-Scale Deployments
Declarative machine learning (ML) aims to simplify the development and use of large-scale ML algorithms. In SystemML, data scientists specify ML algorithms in a high-level language with R-like syntax, and the system automatically generates hybrid execution plans that combine single-node, in-memory operations with distributed operations on Spark. In the first part, we motivate declarative ML and provide an up-to-date overview of SystemML, its compiler and runtime, as well as APIs for different deployments, including low-latency scoring. In the second part, we discuss advanced features, including (1) deep learning support and SystemML's GPU backend, (2) compressed linear algebra, and (3) automatic operator fusion via code generation. Overall, SystemML provides a unified system for a variety of ML algorithms, dense and sparse data representations, as well as local and distributed operations.
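To give a flavor of the R-like syntax, the following is a minimal sketch of a DML script computing ordinary least squares via a direct solve; the parameter names ($X, $Y, $B) are illustrative placeholders, not taken from the source:

```
# Read feature matrix X and label vector y (paths passed as script arguments)
X = read($X)
y = read($Y)
# Normal equations: solve (X'X) beta = X'y
A = t(X) %*% X
b = t(X) %*% y
beta = solve(A, b)
write(beta, $B)
```

The script is purely declarative: the compiler decides, based on data and cluster characteristics, whether each operation runs as a single-node, in-memory operation or as a distributed Spark operation.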
Matthias Boehm is a Research Staff Member at IBM Research – Almaden, in San José, California, where he has been working since 2012 on optimization and runtime techniques for declarative, large-scale machine learning in SystemML. Since Apache SystemML's open-source release in 2015, he has also served as a PMC member. He received his Ph.D. from TU Dresden in 2011 with a dissertation on cost-based optimization of integration flows under the supervision of Wolfgang Lehner. His previous research also includes systems support for time series forecasting as well as in-memory indexing and query processing. Matthias is a recipient of the 2016 VLDB Best Paper Award and a 2016 SIGMOD Research Highlight Award.