Title: System and Compiler Design for Emerging CNM/CIM Architectures
Duration: 08/2022 – present
Research Area: Architectures / Scalability / Security
Machine learning and data analytics applications increasingly suffer from the high latency and energy consumption of conventional von Neumann architectures. Computing-near-memory (CNM) and computing-in-memory (CIM) represent a paradigm shift in computing efficiency. Unlike traditional systems, where processing units and main memory are loosely connected by buses, leading to energy-intensive data movement, CNM/CIM systems perform computations close to where the data resides. Over the past decade, research in this area has surged, driven by the escalating demands of modern applications, most visibly the growing data and processing volumes of ML-based solutions. Although CNM/CIM systems deliver unprecedented performance and energy efficiency in the AI domain, they remain accessible mainly to hardware experts, which hinders broader adoption.
Our goal is to enable portability of AI and Big Data applications across existing CNM/CIM systems and novel accelerator designs, prioritizing performance, accuracy, and energy efficiency.
Despite advances in emerging memory and integration technologies, current programming models are often technology-specific and low-level. Given how substantially these systems differ from conventional machines, new compiler abstractions and frameworks are crucial to fully exploit the potential of CIM by providing automatic device-aware and device-agnostic optimizations and facilitating widespread adoption.
We have developed reusable abstractions and showcased compilation flows tailored to CIM systems with memristive crossbar arrays (MCAs), content-addressable memories (CAMs), and CIM-logic modules, as well as to CNM systems such as UPMEM. Additional information can be found in the publications referenced below.
Our MLIR-based compiler abstracts the computational primitives of memory devices and their programming models, and optimizes data flow while accounting for hardware non-idealities. As input, the flow takes code written in high-level (domain-specific) languages such as C, PyTorch, and TensorFlow.
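For instance, a dot product entering the flow from C or PyTorch is first captured in a device-agnostic representation. The following is a minimal sketch using the upstream MLIR linalg dialect; it illustrates the idea and is not the exact abstraction used in our flow.

    // Device-agnostic MLIR: a dot product expressed with the upstream linalg dialect.
    func.func @dot(%a: tensor<128xf32>, %b: tensor<128xf32>) -> tensor<f32> {
      %zero = arith.constant 0.0 : f32
      %init = tensor.empty() : tensor<f32>
      %acc  = linalg.fill ins(%zero : f32) outs(%init : tensor<f32>) -> tensor<f32>
      // The reduction pattern below is what later passes recognize and retarget.
      %res  = linalg.dot ins(%a, %b : tensor<128xf32>, tensor<128xf32>)
                         outs(%acc : tensor<f32>) -> tensor<f32>
      return %res : tensor<f32>
    }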
The hierarchical flow enables device-agnostic and device-aware analyses and transformations, allowing us to map computational patterns, such as dot products and similarity searches, to the most suitable hardware target, such as MCAs and CAMs, as illustrated by the sketch that follows.
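To illustrate, a device-aware transformation could rewrite the dot-product pattern above onto a memristive crossbar. The cim dialect and its operations in the sketch below are hypothetical placeholders for a device-specific abstraction, not the actual dialects of our flow.

    // Hypothetical device-aware MLIR after mapping the dot product to an MCA.
    func.func @dot_on_mca(%a: tensor<128xf32>, %b: tensor<128xf32>) -> tensor<f32> {
      // Program one operand into the crossbar's conductance matrix (assumed op).
      %xbar = cim.write %b : tensor<128xf32> -> !cim.crossbar<128x1xf32>
      // Execute the multiply-accumulate in the analog array (assumed op); an
      // attribute could steer compensation for device non-idealities.
      %res = cim.mvm %xbar, %a {drift_compensation = true}
             : (!cim.crossbar<128x1xf32>, tensor<128xf32>) -> tensor<f32>
      return %res : tensor<f32>
    }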
The versatility of CNM/CIM systems has proven attractive for a wide range of applications, especially deep learning tasks such as inference and training, which benefit from orders-of-magnitude improvements in performance and energy efficiency. Nonetheless, advancing automation, especially for heterogeneous systems, requires further cross-layer collaboration and development to enable the mapping of more complex ML models.