NEW PREPRINT. We argue that using the metaphor of lottery tickets to explain the success of overparameterization is inaccurate, and we propose a new one: escape dimensions.

Prelude

Given that position preprints can no longer be submitted to arXiv, I am temporarily posting an abstract of our paper here.

If you are interested in the full paper, please contact me.

We wrote an opinion piece on how to explain the success of overparameterization through metaphors. We start from a commonly used one: lotteries and tickets. We argue that part of the community interprets this metaphor too literally, which leads to wrong intuitions about the mechanisms of optimization in deep neural networks.

Based on results from loss landscape theory, we propose a new mental picture: Escape Dimensions. In short, escape dimensions are the extra dimensions added to the loss landscape when we make our networks wider. These new dimensions serve as escape routes for gradient descent, allowing it to avoid getting trapped in bad, high-loss local minima.

We collect relevant theoretical and empirical results on loss landscapes under a new, intuitive lens, and we name this framework Escape Dimensions Theory.
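To make the picture concrete, here is a minimal NumPy sketch (a toy illustration, not an experiment from the paper). It trains a one-hidden-unit tanh network on a target it cannot represent, then embeds the resulting stuck solution into a width-2 network by adding a neuron with zero outgoing weight: the function, and hence the loss, is unchanged, but the widened landscape typically has a nonzero gradient along the added dimensions, which is exactly the escape route described above. The target function, hyperparameters, and the helper `loss_and_grads` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression target that a width-1 tanh network cannot fit exactly.
X = np.linspace(-2.0, 2.0, 64).reshape(-1, 1)
y = np.tanh(5.0 * X) - 0.7 * np.tanh(X)

def loss_and_grads(W, a, X, y):
    """MSE loss and gradients for f(x) = sum_k a_k * tanh(W_k * x)."""
    H = np.tanh(X @ W.T)              # (n, width) hidden activations
    resid = H @ a - y                 # (n, 1) prediction error
    loss = 0.5 * np.mean(resid ** 2)
    n = X.shape[0]
    grad_a = H.T @ resid / n                            # (width, 1)
    grad_W = ((resid * a.T * (1 - H ** 2)).T @ X) / n   # (width, 1)
    return loss, grad_W, grad_a

# Train the width-1 network with plain gradient descent until it stalls.
W = rng.normal(size=(1, 1))
a = rng.normal(size=(1, 1))
for _ in range(20000):
    loss, gW, ga = loss_and_grads(W, a, X, y)
    W -= 0.2 * gW
    a -= 0.2 * ga
grad_norm = np.sqrt((gW ** 2).sum() + (ga ** 2).sum())
print(f"width-1: loss {loss:.4f}, gradient norm {grad_norm:.2e}")

# Widen to width 2 by adding a neuron with zero outgoing weight: the
# network computes the same function, so the loss is identical, but the
# landscape now has two extra dimensions at this point.
W2 = np.vstack([W, rng.normal(size=(1, 1))])
a2 = np.vstack([a, np.zeros((1, 1))])
loss2, gW2, ga2 = loss_and_grads(W2, a2, X, y)
print(f"width-2: same loss {loss2:.4f}, "
      f"gradient along the new outgoing weight {ga2[1, 0]:+.4f}")
# A (typically) nonzero gradient along the added dimension is the
# "escape route": gradient descent in the wider network can keep
# decreasing the loss, while the narrow network was stuck.
```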

Abstract

The lottery ticket hypothesis is often used as a didactic analogy to explain the success of overparameterized neural networks: “larger networks succeed because they are more likely to contain a well-initialized subnetwork that can learn the task in isolation, much like buying more tickets increases the chances of winning a lottery.” This explanation is intuitive but misleading: it suggests that subnetworks can be treated in isolation from the rest of the network. Following this reasoning leads to interpreting learning in wide networks as a multi-start optimization process, where gradient descent simply conducts a parallel search over subnetworks. We argue that this view is flawed since, among other reasons, winning tickets can be made to fail by perturbing the rest of the network. We put forward a more accurate intuitive picture for the success of overparameterized networks based on the geometry of the loss landscape: increasing width expands the set of available dimensions for optimization, making it easier to escape bad local minima. Moreover, as width grows, bad minima become increasingly rare relative to good minima, leading to a higher likelihood of convergence to good solutions. As the field grows more mature, it is important to refine the analogies we use to explain foundational phenomena, such as the apparent redundancy of large networks, reconciling practitioners' intuitions with modern theoretical insights.

Citation

If you want to reference part of this, please cite it as

BibTeX:

@article{martinelli2026lottery,
  title={on the lottery metaphor and escape dimensions},
  author={Martinelli, Flavio and Brea, Johanni and Gerstner, Wulfram},
  year={2026},
  month={April},
  url={https://flavio-martinelli.github.io/blog/2026/lottery/}
}


