Title: Neural Networks Zero to Hero
Category: Tutorial
URL:
Authors: Andrej Karpathy
Published: 1 March 2022
Review: Dennis Kuriakose
Review Date: 25 July 2024
Summary:
Andrej provides a definitive, fundamentals-first course on neural networks across 5 sessions, along with the accompanying code and exercises.
Review & Notes:
The spelled-out intro to neural networks and backpropagation: building micrograd
This is the most step-by-step spelled-out explanation of backpropagation and training of neural networks. It only assumes basic knowledge of Python and a vague recollection of calculus from high school.
Chapters:
00:00:00 intro
00:00:25 micrograd overview
00:08:08 derivative of a simple function with one input
00:14:12 derivative of a function with multiple inputs
00:19:09 starting the core Value object of micrograd and its visualization
00:32:10 manual backpropagation example #1: simple expression
00:51:10 preview of a single optimization step
00:52:52 manual backpropagation example #2: a neuron
01:09:02 implementing the backward function for each operation
01:17:32 implementing the backward function for a whole expression graph
01:22:28 fixing a backprop bug when one node is used multiple times
01:27:05 breaking up a tanh, exercising with more operations
01:39:31 doing the same thing but in PyTorch: comparison
01:43:55 building out a neural net library (multi-layer perceptron) in micrograd
01:51:04 creating a tiny dataset, writing the loss function
01:57:56 collecting all of the parameters of the neural net
02:01:12 doing gradient descent optimization manually, training the network
02:14:03 summary of what we learned, how to go towards modern neural nets
02:16:46 walkthrough of the full code of micrograd on github
02:21:10 real stuff: diving into PyTorch, finding their backward pass for tanh
02:24:39 conclusion
02:25:20 outtakes :)
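
As a compact reference for the chapters above, here is a minimal sketch of a micrograd-style Value object with a backward pass. It follows the spirit of the video; the real implementation lives at github.com/karpathy/micrograd, and the concrete numbers in the example neuron are illustrative only.

import math

class Value:
    """Scalar value that records the operations producing it, for backprop."""
    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)
        self._op = _op

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), '+')
        def _backward():
            self.grad += out.grad    # d(out)/d(self) = 1; += handles reused nodes
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other), '*')
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,), 'tanh')
        def _backward():
            self.grad += (1 - t ** 2) * out.grad   # d tanh(x)/dx = 1 - tanh(x)^2
        out._backward = _backward
        return out

    def backward(self):
        # topologically sort the expression graph, then apply the chain rule in reverse
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# tiny example neuron: tanh(w1*x1 + w2*x2 + b)
x1, x2 = Value(2.0), Value(0.0)
w1, w2 = Value(-3.0), Value(1.0)
b = Value(6.8813735870195432)
o = (x1 * w1 + x2 * w2 + b).tanh()
o.backward()
print(x1.grad, w1.grad)   # gradients of the output w.r.t. input and weight
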
The spelled-out intro to language modeling: building makemore
We implement a bigram character-level language model, which we will further complexify in followup videos into a modern Transformer language model, like GPT. In this video, the focus is on (1) introducing torch.Tensor and its subtleties and use in efficiently evaluating neural networks and (2) the overall framework of language modeling that includes model training, sampling, and the evaluation of a loss (e.g. the negative log likelihood for classification).
Chapters:
00:00:00 intro
00:03:03 reading and exploring the dataset
00:06:24 exploring the bigrams in the dataset
00:09:24 counting bigrams in a python dictionary
00:12:45 counting bigrams in a 2D torch tensor ("training the model")
00:18:19 visualizing the bigram tensor
00:20:54 deleting spurious (S) and (E) tokens in favor of a single . token
00:24:02 sampling from the model
00:36:17 efficiency! vectorized normalization of the rows, tensor broadcasting
00:50:14 loss function (the negative log likelihood of the data under our model)
01:00:50 model smoothing with fake counts
01:02:57 PART 2: the neural network approach: intro
01:05:26 creating the bigram dataset for the neural net
01:10:01 feeding integers into neural nets? one-hot encodings
01:13:53 the "neural net": one linear layer of neurons implemented with matrix multiplication
01:18:46 transforming neural net outputs into probabilities: the softmax
01:26:17 summary, preview to next steps, reference to micrograd
01:35:49 vectorized loss
01:38:36 backward and update, in PyTorch
01:42:55 putting everything together
01:47:49 note 1: one-hot encoding really just selects a row of the next Linear layer's weight matrix
01:50:18 note 2: model smoothing as regularization loss
01:54:31 sampling from the neural net
01:56:16 conclusion
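
As a compact reference for the chapters above, a rough sketch of the counting-based bigram model with its negative log likelihood and sampling loop. It assumes a local names.txt with one name per line (as in the makemore repo); the details are illustrative rather than a copy of the video's code.

import torch

words = open('names.txt', 'r').read().splitlines()
chars = sorted(set(''.join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi['.'] = 0                       # single '.' token marks start and end
itos = {i: s for s, i in stoi.items()}

# "training": count bigram occurrences in a 27x27 tensor
N = torch.zeros((27, 27), dtype=torch.int32)
for w in words:
    chs = ['.'] + list(w) + ['.']
    for ch1, ch2 in zip(chs, chs[1:]):
        N[stoi[ch1], stoi[ch2]] += 1

# normalize rows into probabilities; +1 fake counts = model smoothing
P = (N + 1).float()
P /= P.sum(1, keepdim=True)         # broadcasting: (27,27) / (27,1)

# loss: average negative log likelihood of the data under the model
log_likelihood, n = 0.0, 0
for w in words:
    chs = ['.'] + list(w) + ['.']
    for ch1, ch2 in zip(chs, chs[1:]):
        log_likelihood += torch.log(P[stoi[ch1], stoi[ch2]]).item()
        n += 1
print(f'nll = {-log_likelihood / n:.4f}')

# sampling new names from the model
g = torch.Generator().manual_seed(2147483647)
for _ in range(5):
    out, ix = [], 0
    while True:
        ix = torch.multinomial(P[ix], num_samples=1, replacement=True, generator=g).item()
        if ix == 0:
            break
        out.append(itos[ix])
    print(''.join(out))
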