I've been reading up on the Lottery Ticket Hypothesis, which is super interesting.
Basically, the observation is that these days we build vast neural networks with billions of parameters, but most of those parameters turn out not to be needed. That is, after training, you can throw away something like 95% of the weights (pruning) and the network will still work fine.
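For concreteness, here's a minimal sketch of what "pruning" usually means in this context: magnitude pruning, where the smallest weights are simply zeroed out. The helper function and the 95% figure here are just illustrative, not taken from the paper.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.95):
    """Zero out the smallest-magnitude entries, keeping only the top (1 - sparsity)."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

# Toy example: prune 95% of a random weight matrix.
w = np.random.randn(256, 256)
pruned, mask = magnitude_prune(w, sparsity=0.95)
print(f"kept {mask.mean():.1%} of the weights")
```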
The LTH paper asks: could we have started with a network just 5% of the size and trained it to comparable accuracy? If so, that would be a huge efficiency win for Deep Learning.
What's interesting is that you can do this, but (so far) only by training the full network, perhaps several times over, to discover which weights are needed, and then resetting the surviving weights to their original random initialization. The authors argue that training a neural network isn't so much creating a model as finding a lucky sub-network (a lottery ticket) already hidden in the randomly initialized network, a bit like a sculptor "finding" the bust hidden in a block of marble.
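Here's a rough sketch of that procedure (iterative magnitude pruning with rewinding) on a toy PyTorch model, just to make the loop structure concrete. The model, synthetic data, pruning rate, and number of rounds are all made up for illustration; the paper's real experiments obviously use proper architectures and datasets.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 20)               # toy inputs
y = (X.sum(dim=1) > 0).long()          # toy binary labels

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
init_state = copy.deepcopy(model.state_dict())   # the "lottery" initialization
masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}

def train(model, steps=200):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        for n, p in model.named_parameters():    # keep pruned weights frozen at zero
            if n in masks:
                p.grad *= masks[n]
        opt.step()
    return loss.item()

prune_rate = 0.5                  # drop 50% of the *surviving* weights each round
for round_ in range(4):           # 0.5**4 ≈ 6% of the weights remain at the end
    loss = train(model)
    with torch.no_grad():
        for n, p in model.named_parameters():
            if n not in masks:
                continue
            # prune the smallest-magnitude weights among those still alive
            threshold = p[masks[n].bool()].abs().quantile(prune_rate)
            masks[n] *= (p.abs() >= threshold).float()
        # rewind: reset the survivors to their ORIGINAL random initialization
        model.load_state_dict(init_state)
        for n, p in model.named_parameters():
            if n in masks:
                p *= masks[n]
    kept = sum(m.sum().item() for m in masks.values()) / sum(m.numel() for m in masks.values())
    print(f"round {round_}: loss={loss:.3f}, weights kept={kept:.1%}")

print(f"final loss of the sparse ticket: {train(model):.3f}")
```

The key move is the rewind step (`load_state_dict(init_state)`): the paper finds that if you re-randomize the surviving weights instead of resetting them to their original values, the sparse network loses its advantage, which is why the initialization itself is the "winning ticket".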
Initial LTH paper: http://arxiv.org/abs/1803.03635
Follow-up with major clarifications: http://arxiv.org/abs/1905.01067