Prof. Guido Montúfar

At the 10th International Summer School on AI and Big Data, Prof. Guido Montúfar will give a talk titled "Biases of gradient optimization in neural networks".

Talk: Biases of gradient optimization in neural networks

This lecture discusses some of the mechanisms that bias the optimization of overparametrized neural networks towards simpler solutions. In particular, we take a look at parameter optimization in ReLU networks and show that gradient descent training is biased towards smooth solution functions. The specific form of this bias depends on the network architecture and activation function, as well as on the particular form of the gradient optimization procedure and the initialization.

We then consider quantitative bounds on the difference, in function space, between the trajectory of a finite-width network trained on finitely many samples and the trajectory of an idealized kernel dynamics. These results show that the network is biased to learn the top eigenfunctions of a neural tangent kernel integral operator, not just on the training set but over the entire input space. The discussion is based on works with Hui Jin and Benjamin Bowman.
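
The preference for top eigenfunctions can be illustrated with a toy version of kernel gradient descent. The sketch below is not taken from the lecture or the cited works: the 2x2 matrix `K` is a hypothetical stand-in for a kernel Gram matrix (not an actual neural tangent kernel), and the step size and step count are arbitrary choices. It only shows that, under the update `r <- r - lr * K r` on the training residual, the error along the eigendirection with the larger eigenvalue shrinks faster.

```python
import math

# Hypothetical 2x2 kernel (Gram) matrix; a toy stand-in, not an actual NTK.
# Its eigenvalues are 3 (eigenvector (1, 1)/sqrt(2)) and 1 (eigenvector (1, -1)/sqrt(2)).
K = [[2.0, 1.0], [1.0, 2.0]]

s = 1.0 / math.sqrt(2.0)
top = (s, s)       # eigendirection with eigenvalue 3
bottom = (s, -s)   # eigendirection with eigenvalue 1


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def residual_after(steps, lr=0.1):
    """Run kernel gradient descent on the squared loss and return the residual.

    The residual (predictions minus targets on the training points) starts
    with unit weight on each eigendirection and evolves as r <- r - lr * K r.
    """
    r = [top[0] + bottom[0], top[1] + bottom[1]]
    for _ in range(steps):
        kr = matvec(K, r)
        r = [r_i - lr * kr_i for r_i, kr_i in zip(r, kr)]
    return r


r = residual_after(10)
top_err = abs(dot(r, top))        # decays like (1 - 3 * lr) ** steps
bottom_err = abs(dot(r, bottom))  # decays like (1 - 1 * lr) ** steps
print(top_err, bottom_err)
```

In this toy setting the remaining error along the top eigendirection is much smaller than along the bottom one after the same number of steps, which is the finite-dimensional analogue of learning the top eigenfunctions first.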

Prof. Guido Montúfar

Associate Professor, Department of Mathematics and Department of Statistics & Data Science

UCLA – University of California, Los Angeles


Prof. Guido Montúfar is an Associate Professor of Mathematics and Statistics & Data Science at UCLA. He also leads the Math Machine Learning Group at the Max Planck Institute for Mathematics in the Sciences. His research focuses on the mathematical foundations of machine learning, specifically deep learning theory. He studied mathematics and theoretical physics at TU Berlin and obtained the Dr. rer. nat. in Mathematics in 2012 as an IMPRS fellow in Leipzig. His work has been recognized with awards from the ERC, DFG, and NSF. Guido Montúfar is a 2022 Alfred P. Sloan Research Fellow.

Read more about the 10th International Summer School on AI and Big Data.