Title: NetPU: Generic Runtime-Reconfigurable Quantized Hardware Accelerator Architecture for Neural Network Inference
Duration: 01/2023 – 12/2025
Research Area: Architectures / Scalability / Security
The growing size of trained network models in recent research has encouraged the design of quantized inference hardware accelerators at the edge. Previous work has widely explored two architectures: 1) the Processing Element Array (PEA) architecture, which offers generic inference for different networks at the cost of a complex runtime environment, and 2) the Heterogeneous Streaming Dataflow (HSD) architecture, which implements customized hardware accelerators for given trained models with simplified runtime control. We explore the design of NetPU, a hybrid architecture between PEA and HSD that supports runtime-reconfigurable, mixed-precision, quantized inference for generic networks. The architecture implements inference control in hardware, reducing the runtime environment to streaming data transmission. Moreover, based on a runtime-reconfigurable multi-precision multi-channel multiplier, NetPU improves the parallel computing performance of low-precision (<8-bit) quantized networks.
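As a rough illustration of the multi-precision multi-channel idea (a sketch under assumed names and widths, not the actual NetPU RTL), the following Verilog module computes either one 8x8 product or two independent packed 4x4 products per cycle, selected by a runtime mode bit:

```verilog
// Hypothetical runtime-reconfigurable multi-precision multiplier channel:
// one full-width unsigned multiply, or two half-width multiplies packed
// into the same operand width. Names and widths are illustrative.
module mp_mult #(
    parameter W = 8            // full operand width
) (
    input  wire           clk,
    input  wire           mode4,   // 0: one WxW product, 1: two (W/2)x(W/2) products
    input  wire [W-1:0]   a,       // packed operand A (two nibbles in 4-bit mode)
    input  wire [W-1:0]   b,       // packed operand B
    output reg  [2*W-1:0] p        // packed product(s)
);
    wire [W-1:0]   p_lo   = a[W/2-1:0] * b[W/2-1:0];  // low-channel 4x4
    wire [W-1:0]   p_hi   = a[W-1:W/2] * b[W-1:W/2];  // high-channel 4x4
    wire [2*W-1:0] p_full = a * b;                     // full 8x8

    always @(posedge clk)
        p <= mode4 ? {p_hi, p_lo} : p_full;            // select per configuration
endmodule
```

Doubling the number of products per multiplier in 4-bit mode is what lets low-precision layers trade operand width for channel-level parallelism.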
The NetPU architecture aims to support generic inference for different mixed-precision quantized network models and their emerging hybrid variants, including MLPs, CNNs, Transformers, hybrid ANN-SNNs, etc. Based on the reconfigurable neuron processing unit and the loop-structured network processing unit, the NetPU architecture can support networks of different kinds and sizes by streaming configuration data that resets the accelerator's function at runtime, without regenerating the hardware design.
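A minimal sketch of how such a configuration stream might be received, assuming a word-serial protocol and a hypothetical field layout (neither is taken from the NetPU design):

```verilog
// Minimal sketch, assuming one 32-bit configuration word per valid beat,
// shifted into a packed configuration register that fields such as layer
// size and operand precision are decoded from. Layout is an assumption.
module cfg_stream #(
    parameter WORDS = 4
) (
    input  wire                clk,
    input  wire                rst,
    input  wire                cfg_valid,       // one config word per beat
    input  wire [31:0]         cfg_data,
    output reg                 cfg_done,        // all WORDS received
    output reg  [32*WORDS-1:0] cfg_reg          // packed configuration
);
    reg [$clog2(WORDS+1)-1:0] cnt;

    always @(posedge clk) begin
        if (rst) begin
            cnt      <= 0;
            cfg_done <= 1'b0;
        end else if (cfg_valid && !cfg_done) begin
            cfg_reg  <= {cfg_reg[32*(WORDS-1)-1:0], cfg_data}; // shift in
            cnt      <= cnt + 1'b1;
            cfg_done <= (cnt == WORDS-1);
        end
    end

    // Example decode (hypothetical layout): precision and layer width
    wire [3:0]  precision  = cfg_reg[3:0];       // e.g. 1..8 bits
    wire [15:0] layer_size = cfg_reg[19:4];
endmodule
```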
Considering the limited hardware resources available for neural network accelerator design at the edge, the NetPU architecture must trade off inference latency, hardware resource consumption, and generic support for different network models. The central question behind this project is how to achieve high-throughput hardware designs by applying parallel, pipelined, and systolic-array-based techniques to mixed-precision, quantized, runtime-reconfigurable inference across generic multi-model networks.
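For reference, the systolic-array technique amounts to tiling many small processing elements that register their inputs at every hop so data flows rhythmically through the array; the sketch below shows a conventional weight-stationary element (widths and names are assumptions, not the NetPU implementation):

```verilog
// Illustrative weight-stationary systolic processing element: the weight
// is held locally, while activations flow right and partial sums flow
// down, each registered once per hop to sustain the pipeline.
module pe #(
    parameter AW = 8,   // activation width
    parameter WW = 8,   // weight width
    parameter PW = 24   // partial-sum width
) (
    input  wire                 clk,
    input  wire                 load_w,   // latch a new stationary weight
    input  wire signed [WW-1:0] w_in,
    input  wire signed [AW-1:0] a_in,     // activation from the left
    input  wire signed [PW-1:0] psum_in,  // partial sum from above
    output reg  signed [AW-1:0] a_out,    // forwarded to the right
    output reg  signed [PW-1:0] psum_out  // forwarded downward
);
    reg signed [WW-1:0] w;

    always @(posedge clk) begin
        if (load_w) w <= w_in;
        a_out    <= a_in;                 // forward activation
        psum_out <= psum_in + a_in * w;   // multiply-accumulate
    end
endmodule
```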
The NetPU architecture is implemented as a pure Verilog project and evaluated on the low-power Ultra96-V2 FPGA platform (Xilinx Zynq UltraScale+ MPSoC ZU3EG A484). Building on state-of-the-art research on binarized and quantized neural network modeling and accelerator design, we explore multi-precision operator, reusable loop structure, and generic non-linear activation module designs in this project.
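One common way to make an activation module generic is to compute cheap functions such as ReLU directly and approximate the rest with a runtime-writable lookup table; the following Verilog sketch follows that pattern (table size and port names are assumptions, not the NetPU module):

```verilog
// Sketch of a generic activation unit: ReLU is computed directly, while
// other non-linearities (sigmoid, tanh, ...) are approximated by a small
// lookup table that can be rewritten at runtime.
module act_unit #(
    parameter DW  = 8,
    parameter LUT = 256
) (
    input  wire                   clk,
    input  wire                   sel_lut,   // 0: ReLU, 1: table lookup
    input  wire                   lut_we,    // write a LUT entry
    input  wire [$clog2(LUT)-1:0] lut_addr,
    input  wire [DW-1:0]          lut_wdata,
    input  wire signed [DW-1:0]   x,
    output reg  signed [DW-1:0]   y
);
    reg [DW-1:0] table_mem [0:LUT-1];

    always @(posedge clk) begin
        if (lut_we) table_mem[lut_addr] <= lut_wdata;
        y <= sel_lut ? table_mem[x[DW-1:0]]           // table-driven non-linearity
                     : (x[DW-1] ? {DW{1'b0}} : x);    // ReLU: clamp negatives to 0
    end
endmodule
```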
NetPU aims to provide a generic reconfigurable accelerator architecture for different networks by simply streaming configuration data to reset the function and behavior of the hardware, without re-implementation for each model. Moreover, thanks to the built-in hardware controller, all inference and reconfiguration operations are scheduled by data streaming, so the NetPU architecture has the potential to be organized for cluster acceleration. We will also explore migrating the current FPGA instance to an ASIC design. Furthermore, we plan to test the NetPU architecture in potential application scenarios, such as bio-image processing.