Shape, Symmetries, and Structure: The Changing Role of Mathematics in Machine Learning Research
What is the Role of Mathematics in Modern Machine Learning? The past decade has witnessed a shift in how progress is made in machine learning. Research involving carefully designed and mathematically principled architectures result in only marginal improvements while compute-intensive and engineering-first efforts that scale to ever larger training sets and model parameter counts result in remarkable new capabilities unpredicted by existing theory. Mathematics and statistics, once the primary guides of machine learning research, now struggle to provide immediate insight into the latest breakthroughs. This is not the first time that empirical progress in machine learning has outpaced more theory-motivated approaches, yet the magnitude of recent advances has forced us to swallow the bitter pill of the “Bitter Lesson” yet again [1]. This shift has prompted speculation about mathematics’ diminished role in machine learning research moving forward. It is already evident that mathematics will have to share the stage with a broader range of perspectives (for instance, biology which has deep experience drawing conclusions about irreducibly complex systems or the social sciences as AI is integrated ever more deeply into society). The increasingly interdisciplinary nature of machine learning should be welcomed as a positive development by all researchers. However, we argue that mathematics remains as relevant as ever; its role is simply evolving. For example, whereas mathematics might once have primarily provided theoretical guarantees on model performance, it may soon be more commonly used for post-hoc explanations of empirical phenomena observed in model training and performance–a role analogous to one that it plays in physics. Similarly, while mathematical intuition might once have guided the design of handcrafted features or architectural details at a granular level, its use may shift to higher-level design choices such as matching architecture to underlying task structure or data symmetries. None of this is completely new. Mathematics has always served multiple purposes in machine learning. After all, the translation equivariant convolutional neural network, which exemplifies the idea of architecture matching data symmetries mentioned above is now over 40 years old. What’s changing are the kinds of problems where mathematics will have the greatest impact and the ways it will most commonly be applied. An intriguing consequence of the shift towards scale is that it has broadened the scope of the fields of mathematics applicable to machine learning. “Pure” mathematical domains such as topology, algebra, and geometry, are now joining the more traditionally applied fields of probability theory, analysis, and linear algebra. These pure fields have grown and developed over the last century to handle high levels of abstraction and complexity, helping mathematicians make discoveries about spaces, algebraic objects, and combinatorial processes that at first glance seem beyond human intuition. These capabilities promise to address many of the biggest challenges in modern deep learning. In this article we will explore several areas of current research that demonstrate the enduring ability of mathematics to guide the process of discovery and understanding in machine learning. Figure 1: Mathematics can illuminate the ways that ReLU-based neural networks shatter input space…
