
Makemore: Implement a Bigram Character-level Language Model


Let’s look at episode #2 of Andrej Karpathy’s amazing tutorial series: “The spelled-out intro to language modeling: building makemore”.

It covers an intro to language modeling using a very bare-bones, from-scratch approach: a bigram character-level language model. It means: “given a single character, guess the next character”. For this session the NN is trained on a list of names to produce new, unique name-sounding words.
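
To get a feel for what “bigram” means in practice, here’s a minimal sketch (the toy word list and variable names are mine, not the lecture’s) that tallies how often each character follows another in a list of names:

```python
words = ["emma", "olivia", "ava"]  # hypothetical toy dataset; the lecture uses a large names file

counts = {}
for w in words:
    chars = ["."] + list(w) + ["."]  # "." marks the start and end of a name
    for ch1, ch2 in zip(chars, chars[1:]):
        counts[(ch1, ch2)] = counts.get((ch1, ch2), 0) + 1

# The most frequent pairs dominate what the model will generate.
print(sorted(counts.items(), key=lambda kv: -kv[1])[:5])
```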

The lecture goes from calculating the probabilities of each letter pair by hand, to automatically learning those probabilities as the weights of a very simple one-layer NN that produces the exact same results.
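
To make that “exact same results” claim concrete, here’s a small sketch (with made-up counts standing in for the real ones derived from the names dataset) showing that a one-layer net whose weights are the log-counts recovers the count-based probabilities exactly:

```python
import torch
import torch.nn.functional as F

# 26 letters plus the "." start/end token, as in the lecture.
vocab_size = 27
N = torch.randint(1, 10, (vocab_size, vocab_size)).float()  # stand-in for real bigram counts

# Count-based model: normalize each row into a probability distribution.
P = N / N.sum(1, keepdim=True)

# Equivalent one-layer NN: with weights W = log(N), feeding a one-hot row
# through a softmax recovers the same distribution.
W = N.log()
x = F.one_hot(torch.tensor([3]), num_classes=vocab_size).float()
logits = x @ W                    # one-hot input just selects row 3 of W
probs = F.softmax(logits, dim=1)  # exp + normalize

print(torch.allclose(probs[0], P[3]))  # True: same model, two framings
```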

The video is a treat from start to finish. To highlight one specific point, Andrej goes off on a tangent about the importance of understanding tensor broadcasting and how easy it is to shoot yourself in the foot otherwise.

The basic rules of broadcasting are as follows (see the sketch after the list):

  • align the dimensions to the right
  • the dimensions must be equal
  • or one of them must be 1
  • or one of them must not exist
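
Here’s a minimal PyTorch sketch (with made-up numbers) of the kind of pitfall Andrej demonstrates: normalizing a count matrix row by row, where forgetting keepdim=True silently divides by the wrong sums instead of raising an error.

```python
import torch

N = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]])

# Correct: sum over columns, keep the dim -> shape (3, 1).
# Aligned right, (3, 3) vs (3, 1): the 1 broadcasts across columns,
# so every row is divided by its own sum.
P_good = N / N.sum(1, keepdim=True)

# Subtle bug: without keepdim the sum has shape (3,). Aligned right it
# becomes (1, 3) and broadcasts down the rows instead, so each column
# is divided by the wrong sum. No error is raised.
P_bad = N / N.sum(1)

print(P_good.sum(1))  # tensor([1., 1., 1.]) -- rows are proper distributions
print(P_bad.sum(1))   # not all ones: silently wrong
```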

Here’s my take on the tutorial with additional notes. You can get the code on GitHub or below.