ReLU Playground: how complex are the dynamics of one neuron learning another one?

⚠️This page is under construction⚠️

But in the meantime you can play with the ReLUs by dragging them around with the mouse, typing in values, or moving points in the phase spaces. Press play to start learning!

Can you find which initialisations converge to the teacher? What do other solutions look like? By looking at the different phase spaces, can you get an intuitive understanding of the learning dynamics?

DETAILS:

  • The student network is parameterized by 4 variables: f(x) = a ReLU(wx + b) + c
  • We can transform this parameterization into a phenomenologically intuitive one, from 4 continuous variables to 3 continuous and 1 discrete: the kink of the ReLU is at k = -b/w, the slope of its active side is m = a|w|, and the direction of the ReLU is s = sign(w); c remains unchanged (a small sketch of this reparameterization appears after this list). The grey boxes show ReLU dynamics in different slices of the parameter space.
  • The teacher network is defined phenomenologically as t(x) = m ReLU(s(x - k)) + c.
  • The 1D input data is densely sampled from a uniform distribution with standard deviation 1. There are no data points outside the grey area of the "INPUT-OUTPUT SPACE". This creates some interesting and under-explored boundary effects.
  • Training is done through standard full-batch gradient descent (a minimal sketch of the setup follows this list).
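
A minimal sketch of the student/teacher parameterizations and the reparameterization described above, assuming NumPy; the function names here are illustrative, not part of the playground:

  import numpy as np

  def student(x, a, w, b, c):
      # Student network: f(x) = a * ReLU(w*x + b) + c
      return a * np.maximum(w * x + b, 0.0) + c

  def to_phenomenological(a, w, b, c):
      # Kink, slope, and direction of the ReLU; c is unchanged.
      k = -b / w          # kink location (assumes w != 0)
      m = a * abs(w)      # slope of the active side
      s = np.sign(w)      # direction the ReLU opens in (+1 or -1)
      return k, m, s, c

  def teacher(x, k, m, s, c):
      # Teacher network: t(x) = m * ReLU(s * (x - k)) + c
      return m * np.maximum(s * (x - k), 0.0) + c

For any x (and w != 0), student(x, a, w, b, c) equals teacher(x, *to_phenomenological(a, w, b, c)), which is why the two views describe the same network.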
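And a minimal sketch of the training setup, assuming the same NumPy conventions: 1D inputs drawn from a uniform distribution with standard deviation 1, and full-batch gradient descent on the mean squared error between student and teacher. The teacher values, initial student parameters, learning rate, and step count below are illustrative, not the playground's defaults:

  import numpy as np

  rng = np.random.default_rng(0)

  # Illustrative teacher in phenomenological form (k, m, s, c).
  k_t, m_t, s_t, c_t = 0.0, 1.0, 1.0, 0.0

  # Dense 1D inputs: uniform with std 1, i.e. on [-sqrt(3), sqrt(3)].
  x = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=2000)
  t = m_t * np.maximum(s_t * (x - k_t), 0.0) + c_t   # teacher targets

  # Student parameters (a, w, b, c); initial values are arbitrary here.
  a, w, b, c = 0.5, -1.0, 0.2, 0.0
  lr, steps = 0.05, 2000

  for _ in range(steps):
      h = w * x + b
      r = np.maximum(h, 0.0)
      f = a * r + c              # student prediction
      e = f - t                  # residual

      # Full-batch gradients of the mean squared error.
      grad_a = 2.0 * np.mean(e * r)
      grad_c = 2.0 * np.mean(e)
      grad_w = 2.0 * np.mean(e * a * (h > 0) * x)
      grad_b = 2.0 * np.mean(e * a * (h > 0))

      a -= lr * grad_a
      w -= lr * grad_w
      b -= lr * grad_b
      c -= lr * grad_c

  print("final loss:", np.mean((a * np.maximum(w * x + b, 0.0) + c - t) ** 2))

Whether this run recovers the teacher or settles into a different solution depends on the initialisation, which is exactly what the playground lets you explore interactively.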