Artificial Neural Networks: layer architectures, optimizers and automatic differentiation
This chapter is meant to provide a basic yet solid understanding of artificial neural networks to a heterogeneous readership. In particular, it presents three major types of neural networks: feed-forward multi-layer perceptrons (MLP), convolutional neural networks (CNN) and recurrent neural networks (RNN). This material was presented at the 2023 ALERT Geomaterials doctoral school in Aussois, in September 2023. The chapter describes how closely each of these neural networks is tied to the specific data science task it aims to tackle, from regression to classification and from images to time series, with practical tutorials on real datasets and mechanics-inspired examples. Neural network optimization is described within its general statistical framework, focusing on the most popular algorithms tailored to efficiently “train” such neural metamodels through the so-called “back-propagation”. The chapter explains how back-propagation works in practice: by automating the chain rule of differentiation and by efficiently exploiting the computational graph built during prediction. The last sections of the chapter provide practical recipes for efficiently optimizing the learning algorithms: properly initializing the neural network parameters, adopting design strategies that avoid vanishing or exploding gradients, and pursuing deeper architectures to achieve data disentanglement, parsimony and generalizability. The chapter features several practical examples with corresponding code snippets, so that readers can practice the theoretical aspects presented in the main text. For expert readers, this chapter serves as a recap. We refer readers to the chapter “Artificial Neural Networks: advanced topics” for more technical and theoretical insights into the vast world of artificial neural networks.
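Back-propagation as summarized above is an instance of reverse-mode automatic differentiation over a computational graph. The following is a minimal, illustrative sketch restricted to scalar values. The `Node` class, the one-hidden-unit network in `loss_fn`, its parameter values and the learning rate are hypothetical choices made here, not taken from the chapter. During the forward pass, each operation records its parents and its local partial derivatives. `backward()` then walks the recorded graph in reverse topological order and accumulates gradients with the chain rule. A central finite-difference check and a few plain gradient-descent steps show that the resulting gradients are usable for training.

```python
import math


class Node:
    """Scalar node of a computational graph (hypothetical minimal class, not a library API)."""

    def __init__(self, value, parents=(), local_grads=()):
        self.value = value              # result of the forward computation
        self.parents = parents          # nodes this value was computed from
        self.local_grads = local_grads  # partial derivative of this node w.r.t. each parent
        self.grad = 0.0                 # d(output)/d(this node), filled in by backward()

    @staticmethod
    def _wrap(x):
        return x if isinstance(x, Node) else Node(float(x))

    def __add__(self, other):
        other = Node._wrap(other)
        return Node(self.value + other.value, (self, other), (1.0, 1.0))

    def __sub__(self, other):
        other = Node._wrap(other)
        return Node(self.value - other.value, (self, other), (1.0, -1.0))

    def __mul__(self, other):
        other = Node._wrap(other)
        return Node(self.value * other.value, (self, other), (other.value, self.value))

    def tanh(self):
        t = math.tanh(self.value)
        return Node(t, (self,), (1.0 - t * t,))

    def backward(self):
        """Reverse-mode sweep: apply the chain rule from the output back to the leaves."""
        order, seen = [], set()

        def visit(node):  # depth-first topological sort: parents end up before children
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):  # a node is reached only after all of its consumers
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += local * node.grad


def loss_fn(w1, b1, w2, b2, x=0.5, target=0.3):
    """Squared error of a one-hidden-unit network: out = w2 * tanh(w1 * x + b1) + b2."""
    hidden = (w1 * x + b1).tanh()
    out = w2 * hidden + b2
    err = out - target
    return err * err


def numerical_grad(values, i, eps=1e-6):
    """Central finite difference, used only to check the automatic gradients."""
    up, down = list(values), list(values)
    up[i] += eps
    down[i] -= eps

    def f(v):
        return loss_fn(*[Node(a) for a in v]).value

    return (f(up) - f(down)) / (2 * eps)


def train(values, lr=0.1, steps=200):
    """Plain gradient descent; a fresh graph is recorded at every forward pass."""
    for _ in range(steps):
        params = [Node(v) for v in values]
        loss_fn(*params).backward()
        values = [p.value - lr * p.grad for p in params]
    return values


init = [0.8, -0.1, 1.2, 0.05]  # w1, b1, w2, b2 (arbitrary illustrative values)
params = [Node(v) for v in init]
loss = loss_fn(*params)
loss.backward()
auto_grads = [p.grad for p in params]
num_grads = [numerical_grad(init, i) for i in range(len(init))]
trained = train(init)
final_loss = loss_fn(*[Node(v) for v in trained]).value
```

The automatic and finite-difference gradients should agree to within numerical precision, and the training loop should drive the loss towards zero on this single sample. Deep-learning frameworks apply the same bookkeeping to tensors and a much larger set of primitives. The scalar version only aims to make the reverse traversal of the graph visible.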
The chapter is largely inspired by, among others, Stéphane Mallat’s Data Science lecture notes at the Collège de France, as well as by various lecture notes from CentraleSupélec’s engineering curriculum.