Recurrent Neural Networks
TLDR
In the previous section on neural networks we saw how to classify a single example, e.g., a single image of a handwritten digit. In this class, we’ll alter that basic neural network so it can classify sequences, e.g., multiple digits in a row. Here are a few thought experiments to make this concept concrete.
- Imagine we want to classify handwritten phone numbers. We know that certain area codes are more popular than others, so, for example, if we see “61”, there’s a high probability that the next digit is “7” for Boston’s “617.”
- Imagine we want to predict tomorrow’s weather given some historical data. If we’ve had a few days of sun in April, maybe rain is more likely soon.
- Imagine we want to predict the next word in the sentence “I took a walk with my ____.” Clearly “sister” and “father” should be more probable than “refrigerator.”
To accomplish these tasks, we need to give our neural network some "memory" of what happened in the past. That’s exactly what recurrent neural networks do.
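Here is a minimal sketch of that “memory” idea in NumPy: a vanilla RNN cell that carries a hidden state from one time step to the next. The sizes, random weights, and the phone-number digits fed in are purely illustrative assumptions, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_size = 10, 16          # e.g., 10 possible digits, 16 hidden units
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the "memory")
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One time step: combine the current input with the previous hidden state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Run the cell over a short sequence, e.g. the digits 6, 1, 7, one-hot encoded.
h = np.zeros(hidden_size)                 # start with an empty memory
for digit in [6, 1, 7]:
    x = np.zeros(input_size)
    x[digit] = 1.0                        # one-hot encode the current digit
    h = rnn_step(x, h)                    # h now summarizes everything seen so far

print(h.shape)                            # (16,) -- a fixed-size summary of the whole sequence
```

The key point is that the same cell is applied at every step, and the hidden state `h` is the network’s memory of what it has seen so far; a classifier can then read its prediction off of `h`.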
In-class activities
Further reading
- MIT 6.S191: Recurrent Neural Networks, Transformers, and Attention, by Ava Amini. Watch until minute 48, whereupon Amini starts getting into transformers.
- The Unreasonable Effectiveness of Recurrent Neural Networks, by Andrej Karpathy
This latter article is pretty famous. What's amazing is that the RNNs Karpathy trains in this article are character-based: they output single letters instead of words. And, even so, they are able to learn to create unreasonably awesome output. You need to understand the material down to his "Fun with RNNs" section.
What's happening is that these RNNs spit out characters one at a time. But they remember what they output previously, so they can kinda make sensible outputs at each step. (As we will see later, though, they don't have a long memory. We'll soon see how a model called a Transformer is able to remember better than an RNN. The "T" in ChatGPT stands for Transformer.)
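The sketch below shows the shape of that character-by-character generation loop: sample a character, feed it back in along with the updated hidden state, repeat. The weights here are random (so the output is gibberish), and the function name `sample_text` is just a made-up illustration; a trained char-RNN like Karpathy's would use learned weights in the same loop.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = list("abcdefghijklmnopqrstuvwxyz ")
vocab_size, hidden_size = len(vocab), 32

W_xh = rng.normal(scale=0.1, size=(hidden_size, vocab_size))   # character -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (memory)
W_hy = rng.normal(scale=0.1, size=(vocab_size, hidden_size))   # hidden -> next-character scores

def sample_text(start_char="t", length=40):
    h = np.zeros(hidden_size)
    idx = vocab.index(start_char)
    out = [start_char]
    for _ in range(length):
        x = np.zeros(vocab_size)
        x[idx] = 1.0                            # one-hot encode the last character
        h = np.tanh(W_xh @ x + W_hh @ h)        # memory of everything generated so far
        logits = W_hy @ h
        probs = np.exp(logits) / np.exp(logits).sum()   # softmax over the vocabulary
        idx = rng.choice(vocab_size, p=probs)   # sample the next character
        out.append(vocab[idx])
    return "".join(out)

print(sample_text())   # gibberish with random weights; a trained RNN produces text-like output
```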
More advanced further reading
- Lipton, Zachary C., John Berkowitz, and Charles Elkan. "A critical review of recurrent neural networks for sequence learning." arXiv preprint arXiv:1506.00019 (2015). This is an oft-cited review of RNNs and their upsides and downsides.
- "Understanding LSTM Networks" by Christopher Olah. Long short-term memory RNNs (LSTMs) are a kind of RNN that solves two problems with vanilla RNNs: they have a longer term memory and they can forget irrelevant stuff easily.
- Understanding GRU Networks by Simeon Kostadinov. A Gated Recurrent Unit (GRU) is like an LSTM but simpler and easier to train (see the PyTorch sketch after this list).
- Illustrated guide to LSTMs and GRUs, by Michael Phi. The animation in this one is nice, but I think the link above is better written.
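As a quick sketch of the LSTM/GRU distinction the readings above go into, here is how the two layers look in PyTorch (assuming `torch` is installed; the sizes are arbitrary). Both are drop-in replacements for a vanilla RNN layer; the LSTM carries two pieces of state (a hidden state and a cell state) while the GRU carries only one, which is part of why GRUs are simpler and a bit cheaper to train.

```python
import torch
import torch.nn as nn

batch, seq_len, input_size, hidden_size = 4, 10, 8, 16
x = torch.randn(batch, seq_len, input_size)       # a toy batch of sequences

lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
gru = nn.GRU(input_size, hidden_size, batch_first=True)

lstm_out, (h_n, c_n) = lstm(x)    # LSTM returns a hidden state h_n plus a cell state c_n
gru_out, g_n = gru(x)             # GRU returns only a hidden state

print(lstm_out.shape, gru_out.shape)   # both: torch.Size([4, 10, 16])
```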