Artificial Intelligence

Artificial Neural Networks: advanced topics

Author: Filippo Gatti

This chapter is a follow-up to the basic introduction provided in the chapter Artificial Neural Networks: layer architectures, optimizers and automatic differentiation, which prepares the reader for the theoretical insights developed here. It was presented at the ALERT Geomaterials doctoral school held in Aussois in September 2023. The chapter first rephrases the machine learning problem within an information-theoretic paradigm, which highlights the deep entanglement between the two data science perspectives: the probabilistic and the deterministic. It then describes the fundamental theoretical result that paved the way for modern machine learning: the universal approximation theorem for a perceptron with one hidden layer. This section is followed by a continuum mechanics interpretation of convolutional neural networks, showing why convolutional layers are fundamental in image classification. Further insights into the optimization of a neural network are then provided, focusing on the convergence of first-order gradient descent methods. Finally, automatic differentiation is explained by analogy with tensor algebra, together with some advanced strategies for avoiding vanishing gradients in back-propagation algorithms. Some subsections are tagged as [RECAP], since they are meant to refresh the reader's knowledge of optimization and signal processing fundamentals. The chapter is largely inspired, among other sources, by Stéphane Mallat's Data Science lecture notes at the Collège de France, as well as by various lecture notes from CentraleSupélec's engineering curriculum.
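As a preview of the central result mentioned above, one common statement of the universal approximation theorem, following Cybenko (1989) for sigmoidal activations (the exact formulation adopted in the chapter may differ), reads: for any continuous function $f : [0,1]^d \to \mathbb{R}$ and any $\varepsilon > 0$, there exist $N \in \mathbb{N}$ and parameters $a_i, b_i \in \mathbb{R}$, $\mathbf{w}_i \in \mathbb{R}^d$ such that

$$\sup_{\mathbf{x} \in [0,1]^d} \left| f(\mathbf{x}) - \sum_{i=1}^{N} a_i \, \sigma\!\left(\mathbf{w}_i^\top \mathbf{x} + b_i\right) \right| < \varepsilon,$$

where $\sigma$ is any sigmoidal activation function, i.e. a continuous function with $\sigma(t) \to 1$ as $t \to +\infty$ and $\sigma(t) \to 0$ as $t \to -\infty$. The sum above is exactly the output of a 1-hidden-layer perceptron with $N$ hidden units.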