Title: Neural Networks Zero to Hero
Category: Tutorial
URL:
Authors: Andrej Karpathy
Published: 1 March 2022
Review: Dennis Kuriakose
Review Date: 25 July 2024
Summary
Andrej provides a definitive, fundamentals-first course on neural networks across 5 sessions, along with the accompanying code and exercises.
Review & Notes:
The spelled-out intro to neural networks and backpropagation: building micrograd
This is the most step-by-step spelled-out explanation of backpropagation and training of neural networks. It only assumes basic knowledge of Python and a vague recollection of calculus from high school.
Chapters:
00:00:00 intro
00:00:25 micrograd overview
00:08:08 derivative of a simple function with one input
00:14:12 derivative of a function with multiple inputs
00:19:09 starting the core Value object of micrograd and its visualization
00:32:10 manual backpropagation example #1: simple expression
00:51:10 preview of a single optimization step
00:52:52 manual backpropagation example #2: a neuron
01:09:02 implementing the backward function for each operation
01:17:32 implementing the backward function for a whole expression graph
01:22:28 fixing a backprop bug when one node is used multiple times
01:27:05 breaking up a tanh, exercising with more operations
01:39:31 doing the same thing but in PyTorch: comparison
01:43:55 building out a neural net library (multi-layer perceptron) in micrograd
01:51:04 creating a tiny dataset, writing the loss function
01:57:56 collecting all of the parameters of the neural net
02:01:12 doing gradient descent optimization manually, training the network
02:14:03 summary of what we learned, how to go towards modern neural nets
02:16:46 walkthrough of the full code of micrograd on github
02:21:10 real stuff: diving into PyTorch, finding their backward pass for tanh
02:24:39 conclusion
02:25:20 outtakes :)
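The core of this session is the Value object and its backward pass. Below is a compressed sketch in the spirit of what the video builds, not the full micrograd code on GitHub: a scalar that remembers which operation produced it, a local _backward closure for +, * and tanh, and a reverse topological sort so gradients accumulate correctly even when a node is reused (the bug fixed around 01:22:28). The example neuron at the bottom mirrors the one used in the lecture.

import math

class Value:
    """A scalar that remembers how it was computed (sketch in the spirit of micrograd)."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad        # d(out)/d(self) = 1
            other.grad += out.grad       # d(out)/d(other) = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1 - t ** 2) * out.grad   # local derivative of tanh
        out._backward = _backward
        return out

    def backward(self):
        # process nodes in reverse topological order so each node's grad is
        # fully accumulated before its own _backward runs
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

# tiny neuron: tanh(w1*x1 + w2*x2 + b), then backprop through it
x1, x2 = Value(2.0), Value(0.0)
w1, w2, b = Value(-3.0), Value(1.0), Value(6.8813735870195432)
out = (x1 * w1 + x2 * w2 + b).tanh()
out.backward()
print(out.data, x1.grad, w1.grad)

The += accumulation in each _backward closure is exactly what makes multiply-used nodes work, and it is the reason gradients must be zeroed between training steps later on.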
The spelled-out intro to language modeling: building makemore
We implement a bigram character-level language model, which we will further complexify in follow-up videos into a modern Transformer language model, like GPT. In this video, the focus is on (1) introducing torch.Tensor and its subtleties and use in efficiently evaluating neural networks and (2) the overall framework of language modeling that includes model training, sampling, and the evaluation of a loss (e.g. the negative log likelihood for classification).
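To make Part 1 concrete, here is a small sketch of the count-based pipeline the chapters below cover, with a handful of hard-coded names standing in for the video's full names dataset (the word list and random seed are illustrative): count bigrams into a 2D torch tensor, normalize rows via broadcasting, add fake counts for smoothing, measure the average negative log likelihood, and sample from the model.

import torch

# tiny stand-in for the names dataset used in the video
words = ["emma", "olivia", "ava", "isabella", "sophia"]

# character vocabulary; '.' marks both the start and the end of a word
chars = sorted(set("".join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi["."] = 0
itos = {i: s for s, i in stoi.items()}
V = len(stoi)

# count bigrams into a V x V tensor: N[i, j] = how often character j follows character i
N = torch.zeros((V, V), dtype=torch.int32)
for w in words:
    cs = ["."] + list(w) + ["."]
    for c1, c2 in zip(cs, cs[1:]):
        N[stoi[c1], stoi[c2]] += 1

# normalize rows into probabilities; the +1 is the "fake counts" smoothing
P = (N + 1).float()
P = P / P.sum(dim=1, keepdim=True)   # broadcasting: (V, V) / (V, 1)

# average negative log likelihood of the data under the model
log_likelihood, n = 0.0, 0
for w in words:
    cs = ["."] + list(w) + ["."]
    for c1, c2 in zip(cs, cs[1:]):
        log_likelihood += torch.log(P[stoi[c1], stoi[c2]])
        n += 1
print(f"nll = {(-log_likelihood / n).item():.4f}")

# sample a new "name" one character at a time until '.' is drawn
g = torch.Generator().manual_seed(2147483647)
ix, out = 0, []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, replacement=True, generator=g).item()
    if ix == 0:
        break
    out.append(itos[ix])
print("".join(out))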
Chapters:
00:00:00 intro
00:03:03 reading and exploring the dataset
00:06:24 exploring the bigrams in the dataset
00:09:24 counting bigrams in a python dictionary
00:12:45 counting bigrams in a 2D torch tensor ("training the model")
00:18:19 visualizing the bigram tensor
00:20:54 deleting spurious (S) and (E) tokens in favor of a single . token
00:24:02 sampling from the model
00:36:17 efficiency! vectorized normalization of the rows, tensor broadcasting
00:50:14 loss function (the negative log likelihood of the data under our model)
01:00:50 model smoothing with fake counts
01:02:57 PART 2: the neural network approach: intro
01:05:26 creating the bigram dataset for the neural net
01:10:01 feeding integers into neural nets? one-hot encodings
01:13:53 the "neural net": one linear layer of neurons implemented with matrix multiplication
01:18:46 transforming neural net outputs into probabilities: the softmax
01:26:17 summary, preview to next steps, reference to micrograd
01:35:49 vectorized loss
01:38:36 backward and update, in PyTorch
01:42:55 putting everything together
01:47:49 note 1: one-hot encoding really just selects a row of the next Linear layer's weight matrix
01:50:18 note 2: model smoothing as regularization loss
01:54:31 sampling from the neural net
01:56:16 conclusion
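For Part 2, a matching sketch of the neural-network approach the chapters above walk through, again on an illustrative tiny word list rather than the video's dataset: one-hot encode the current character, multiply by a single V x V weight matrix, turn the logits into probabilities with a spelled-out softmax, and minimize the average negative log likelihood (plus a small W**2 regularization term, the counterpart of count smoothing) by manual gradient descent. The learning rate and step count here are arbitrary choices for the sketch.

import torch
import torch.nn.functional as F

# same kind of tiny stand-in dataset and vocabulary as the count-based sketch
words = ["emma", "olivia", "ava", "isabella", "sophia"]
chars = sorted(set("".join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi["."] = 0
V = len(stoi)

# bigram training set: predict the next character from the current one
xs, ys = [], []
for w in words:
    cs = ["."] + list(w) + ["."]
    for c1, c2 in zip(cs, cs[1:]):
        xs.append(stoi[c1])
        ys.append(stoi[c2])
xs, ys = torch.tensor(xs), torch.tensor(ys)

# the "neural net": a single linear layer of V neurons, i.e. one V x V weight matrix
g = torch.Generator().manual_seed(2147483647)
W = torch.randn((V, V), generator=g, requires_grad=True)

for step in range(200):
    # forward: one-hot inputs effectively select rows of W; softmax turns logits into probabilities
    xenc = F.one_hot(xs, num_classes=V).float()
    logits = xenc @ W
    counts = logits.exp()
    probs = counts / counts.sum(dim=1, keepdim=True)   # softmax, spelled out
    # loss: average negative log likelihood plus a small regularization term
    loss = -probs[torch.arange(len(ys)), ys].log().mean() + 0.01 * (W ** 2).mean()
    # backward and update
    W.grad = None
    loss.backward()
    W.data += -10.0 * W.grad

print(f"final loss: {loss.item():.4f}")

Note 1 from the chapter list is visible here: because the input is one-hot, xenc @ W just picks out one row of W per example, which is why this model converges to the same answer as the count-based table.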