Can we identify the weights of a neural network by probing its input-output mapping? At first glance, this problem seems to have many solutions because of permutation, overparameterisation and activation function symmetries. Yet, we show that the incoming weight vector of each neuron is identifiable up to sign or scaling, depending on the activation function. Our novel method ’Expand-and-Cluster’ can identify layer sizes and weights of a target network for all commonly used activation functions. Expand-and-Cluster consists of two phases: (i) to relax the non-convex optimisation problem, we train multiple overparameterised student networks to best imitate the target function; (ii) to reverse engineer the target network’s weights, we employ an ad-hoc clustering procedure that reveals the learnt weight vectors shared between students – these correspond to the target weight vectors. We demonstrate successful weights and size recovery of trained shallow and deep networks with less than 10% overhead in the layer size and describe an ’ease-of-identifiability’ axis by analysing 150 synthetic problems of variable difficulty.
Advancements in memristive devices have given rise to a new generation of specialized hardware for bio-inspired computing. However, the majority of these implementations only draw partial inspiration from the architecture and functionalities of the mammalian brain. Moreover, the use of memristive hardware is typically restricted to specific elements within the learning algorithm, leaving computationally expensive operations to be executed in software. Here, we demonstrate actor-critic temporal difference (TD) learning on analogue memristors, mirroring the principles of reward-based learning in a neural network architecture similar to the one found in biology. Within the learning algorithm, memristors are used as multi-purpose elements: They act as synaptic weights that are trained online, they calculate the weight updates directly in hardware, and they compute the actions for navigating through the environment. Thanks to these features, weight training can take place entirely in-memory, eliminating the need for data movement and enhancing processing speed. Also, our proposed learning scheme possesses self-correction capabilities that effectively counteract noise during the weight update process, which makes it a promising alternative to traditional error mitigation schemes. We test our framework on two classic navigation tasks - the T-maze and the Morris water-maze - using analogue memristors based on the valence change memory (VCM) effect. Our approach represents a first step towards fully in-memory, online, and error-resilient neuromorphic computing engines based on bio-inspired learning schemes.
MLPGradientFlow is a software package to solve numerically the gradient flow differential equation θ ̇ = −∇L(θ; D), where θ are the parameters of a multi-layer perceptron, D is some data set, and ∇L is the gradient of a loss function. We show numerically that adaptive first- or higher-order integration methods based on Runge-Kutta schemes have better accuracy and convergence speed than gradient descent with the Adam optimizer. However, we find Newton’s method and approximations like BFGS preferable to find fixed points (local and global minima of L) efficiently and accurately. For small networks and data sets, gradients are usually computed faster than in pytorch and Hessian are computed at least 5× faster. Additionally, the package features an integrator for a teacher-student setup with bias-free, two-layer networks trained with standard Gaussian input in the limit of infinite data. The code is accessible at https://github.com/jbrea/MLPGradientFlow.jl.
Recent advances in Voice Activity Detection (VAD) are driven by artificial and Recurrent Neural Networks (RNNs), however, using a VAD system in battery-operated devices requires further power efficiency. This can be achieved by neuromorphic hardware, which enables Spiking Neural Networks (SNNs) to perform inference at very low energy consumption. Spiking networks are characterized by their ability to process information efficiently, in a sparse cascade of binary events in time called spikes. However, a big performance gap separates artificial from spiking networks, mostly due to a lack of powerful SNN training algorithms. To overcome this problem we exploit an SNN model that can be recast into a recurrent network and trained with known deep learning techniques. We describe a training procedure that achieves low spiking activity and apply pruning algorithms to remove up to 85% of the network connections with no performance loss. The model competes with state-of-the-art performance at a fraction of the power consumption comparing to other methods.
Advances of deep learning for Artificial Neural Networks (ANNs) have led to significant improvements in the performance of digital signal processing systems implemented on digital chips. Although recent progress in low-power chips is remarkable, neuromorphic chips that run Spiking Neural Networks (SNNs) based applications offer an even lower power consumption, as a consequence of the ensuing sparse spikebased coding scheme. In this work, we develop a SNN-based Voice Activity Detection (VAD) system that belongs to the building blocks of any audio and speech processing system. We propose to use the bin encoding, a novel method to convert log mel filterbank bins of single-time frames into spike patterns. We integrate the proposed scheme in a bilayer spiking architecture which was evaluated on the QUT-NOISE-TIMIT corpus. Our approach shows that SNNs enable an ultra low-0power implementation of a VAD classifier that consumes only 3.8 μW, while achieving state-of-the-art performance.