
Title: Neural Networks Zero to Hero

Category: Tutorial

URL: https://karpathy.ai/

Authors: Andrej Karpathy

Published: 1 March 2022

Review: Dennis Kuriakose

Review Date: 25 July 2024

Summary

Andrej provides a definitive course on the fundamentals of neural networks across five sessions, together with the accompanying code and exercises.

Review & Notes: 

The spelled-out intro to neural networks and backpropagation: building micrograd

This is the most step-by-step spelled-out explanation of backpropagation and training of neural networks. It only assumes basic knowledge of Python and a vague recollection of calculus from high school.
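
To make the Value object concrete before the chapter list, here is a minimal sketch in the spirit of micrograd (not the library itself, and covering only +, * and tanh): each Value remembers its children and a _backward closure, and backward() replays those closures in reverse topological order. The neuron at the end and its numbers are purely illustrative.

```python
import math

class Value:
    """A scalar that remembers how it was produced, so gradients can be
    propagated backwards through the expression graph."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None   # filled in by the op that creates this node
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad        # d(out)/d(self) = 1
            other.grad += out.grad       # d(out)/d(other) = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(out)/d(self) = other
            other.grad += self.data * out.grad   # d(out)/d(other) = self
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1 - t ** 2) * out.grad   # d(tanh x)/dx = 1 - tanh(x)^2
        out._backward = _backward
        return out

    def backward(self):
        # topological order guarantees a node's gradient is complete before it is used
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

# a single tanh neuron, out = tanh(w1*x1 + w2*x2 + b), with made-up numbers
x1, x2 = Value(2.0), Value(0.0)
w1, w2 = Value(-3.0), Value(1.0)
b = Value(6.88)
out = (x1 * w1 + x2 * w2 + b).tanh()
out.backward()
print(out.data, x1.grad, w1.grad)   # gradient of out with respect to each leaf
```

The += in each _backward is the detail behind the "node used multiple times" bug at 01:22:28: when a Value feeds several downstream nodes, the gradients flowing back from each use must accumulate rather than overwrite one another.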

Chapters:

00:00:00 intro

00:00:25 micrograd overview

00:08:08 derivative of a simple function with one input

00:14:12 derivative of a function with multiple inputs

00:19:09 starting the core Value object of micrograd and its visualization

00:32:10 manual backpropagation example #1: simple expression

00:51:10 preview of a single optimization step

00:52:52 manual backpropagation example #2: a neuron

01:09:02 implementing the backward function for each operation

01:17:32 implementing the backward function for a whole expression graph

01:22:28 fixing a backprop bug when one node is used multiple times

01:27:05 breaking up a tanh, exercising with more operations

01:39:31 doing the same thing but in PyTorch: comparison

01:43:55 building out a neural net library (multi-layer perceptron) in micrograd

01:51:04 creating a tiny dataset, writing the loss function

01:57:56 collecting all of the parameters of the neural net

02:01:12 doing gradient descent optimization manually, training the network (a rough PyTorch version of this training loop is sketched after this chapter list)

02:14:03 summary of what we learned, how to go towards modern neural nets

02:16:46 walkthrough of the full code of micrograd on github

02:21:10 real stuff: diving into PyTorch, finding their backward pass for tanh

02:24:39 conclusion

02:25:20 outtakes :)
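
The later chapters build an MLP library on top of Value, write a squared-error loss over a tiny dataset, and run gradient descent by hand before comparing with PyTorch. As a rough PyTorch counterpart of that training loop (the dataset, layer sizes, learning rate and step count here are illustrative assumptions, not transcribed from the video):

```python
import torch

# a tiny made-up dataset: 4 examples with 3 inputs each, and 4 targets
xs = torch.tensor([[2.0, 3.0, -1.0],
                   [3.0, -1.0, 0.5],
                   [0.5, 1.0, 1.0],
                   [1.0, 1.0, -1.0]])
ys = torch.tensor([1.0, -1.0, -1.0, 1.0])

# a small MLP: 3 inputs -> 4 -> 4 -> 1, with tanh after every layer
g = torch.Generator().manual_seed(42)
W1 = torch.randn((3, 4), generator=g, requires_grad=True)
b1 = torch.randn(4, generator=g, requires_grad=True)
W2 = torch.randn((4, 4), generator=g, requires_grad=True)
b2 = torch.randn(4, generator=g, requires_grad=True)
W3 = torch.randn((4, 1), generator=g, requires_grad=True)
b3 = torch.randn(1, generator=g, requires_grad=True)
params = [W1, b1, W2, b2, W3, b3]

for step in range(100):
    # forward pass
    h1 = torch.tanh(xs @ W1 + b1)
    h2 = torch.tanh(h1 @ W2 + b2)
    ypred = torch.tanh(h2 @ W3 + b3).squeeze()
    loss = ((ypred - ys) ** 2).mean()   # mean squared error

    # backward pass: zero the gradients, then backpropagate
    for p in params:
        p.grad = None
    loss.backward()

    # manual gradient descent update
    with torch.no_grad():
        for p in params:
            p -= 0.05 * p.grad

print(loss.item(), ypred.detach())
```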


The spelled-out intro to language modeling: building makemore

We implement a bigram character-level language model, which we will further complexify in followup videos into a modern Transformer language model, like GPT. In this video, the focus is on (1) introducing torch.Tensor and its subtleties and use in efficiently evaluating neural networks and (2) the overall framework of language modeling that includes model training, sampling, and the evaluation of a loss (e.g. the negative log likelihood for classification).
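
As a compact illustration of the counting half of the video: the word list, +1 smoothing count, seed and sampling loop below are illustrative stand-ins (the video works with a much larger names.txt file), but the shape of the computation matches the chapters that follow: count bigrams into a 2D tensor, normalize the rows with broadcasting, sample with torch.multinomial, and score the data with the average negative log likelihood.

```python
import torch

# a few example words; the video uses a much larger dataset of names
words = ["emma", "olivia", "ava", "isabella", "sophia", "mia"]

# character vocabulary, with '.' as the single start/end token
chars = sorted(set("".join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi["."] = 0
itos = {i: s for s, i in stoi.items()}

# "training": count how often each character follows each other character
N = torch.zeros((len(stoi), len(stoi)), dtype=torch.int32)
for w in words:
    cs = ["."] + list(w) + ["."]
    for c1, c2 in zip(cs, cs[1:]):
        N[stoi[c1], stoi[c2]] += 1

# vectorized row normalization via broadcasting (+1 is the smoothing fake count)
P = (N + 1).float()
P /= P.sum(dim=1, keepdim=True)

# sampling: start at '.', repeatedly draw the next character from its row
g = torch.Generator().manual_seed(2147483647)
ix, out = 0, []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, replacement=True, generator=g).item()
    if ix == 0:
        break
    out.append(itos[ix])
print("".join(out))

# evaluation: average negative log likelihood of the data under the model
log_likelihood, n = 0.0, 0
for w in words:
    cs = ["."] + list(w) + ["."]
    for c1, c2 in zip(cs, cs[1:]):
        log_likelihood += torch.log(P[stoi[c1], stoi[c2]]).item()
        n += 1
print(f"nll = {-log_likelihood / n:.4f}")
```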

Chapters:

00:00:00 intro

00:03:03 reading and exploring the dataset

00:06:24 exploring the bigrams in the dataset

00:09:24 counting bigrams in a python dictionary

00:12:45 counting bigrams in a 2D torch tensor ("training the model")

00:18:19 visualizing the bigram tensor

00:20:54 deleting spurious (S) and (E) tokens in favor of a single . token

00:24:02 sampling from the model

00:36:17 efficiency! vectorized normalization of the rows, tensor broadcasting

00:50:14 loss function (the negative log likelihood of the data under our model)

01:00:50 model smoothing with fake counts

01:02:57 PART 2: the neural network approach: intro

01:05:26 creating the bigram dataset for the neural net

01:10:01 feeding integers into neural nets? one-hot encodings

01:13:53 the "neural net": one linear layer of neurons implemented with matrix multiplication

01:18:46 transforming neural net outputs into probabilities: the softmax (a combined sketch of this neural-net approach follows the chapter list)

01:26:17 summary, preview to next steps, reference to micrograd

01:35:49 vectorized loss

01:38:36 backward and update, in PyTorch

01:42:55 putting everything together

01:47:49 note 1: one-hot encoding really just selects a row of the next Linear layer's weight matrix

01:50:18 note 2: model smoothing as regularization loss

01:54:31 sampling from the neural net

01:56:16 conclusion
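
Part 2 of the video swaps the count table for a single linear layer trained with gradient descent: one-hot encode the current character, multiply by a weight matrix to get logits, softmax into probabilities, and minimize the average negative log likelihood plus a small weight penalty that stands in for count smoothing. A combined sketch under the same illustrative assumptions as the counting example (tiny word list, arbitrary seed, learning rate and step count):

```python
import torch
import torch.nn.functional as F

# same illustrative setup as the counting sketch: tiny word list, '.' start/end token
words = ["emma", "olivia", "ava", "isabella", "sophia", "mia"]
chars = sorted(set("".join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi["."] = 0
vocab = len(stoi)

# bigram training set: each example maps a character index to the next character index
xs, ys = [], []
for w in words:
    cs = ["."] + list(w) + ["."]
    for c1, c2 in zip(cs, cs[1:]):
        xs.append(stoi[c1])
        ys.append(stoi[c2])
xs, ys = torch.tensor(xs), torch.tensor(ys)

# the "neural net": one vocab x vocab weight matrix (a single linear layer, no bias)
g = torch.Generator().manual_seed(2147483647)
W = torch.randn((vocab, vocab), generator=g, requires_grad=True)

for step in range(200):
    # forward: one-hot inputs @ W gives log-counts ("logits"), softmax gives probabilities
    xenc = F.one_hot(xs, num_classes=vocab).float()
    logits = xenc @ W
    counts = logits.exp()
    probs = counts / counts.sum(dim=1, keepdim=True)   # softmax, row by row
    # average negative log likelihood of the correct next characters,
    # plus a small W**2 term playing the role of the fake counts (regularization)
    loss = -probs[torch.arange(len(ys)), ys].log().mean() + 0.01 * (W ** 2).mean()

    # backward and update
    W.grad = None
    loss.backward()
    W.data += -10.0 * W.grad

print(loss.item())
```

Because the inputs are one-hot, xenc @ W simply selects a row of W (note 1 at 01:47:49), and the 0.01 * (W ** 2).mean() term is the regularization view of model smoothing (note 2 at 01:50:18).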
