Your network makes a prediction. The ground truth says otherwise. How wrong is it? That number is the loss. Pick the wrong loss function and your model optimizes for the wrong t…

Optimizers: SGD, Momentum, Adam, AdamW

Build

Python

Gradient descent tells you which direction to move. It says nothing about how far or how fast. SGD is a compass. Adam is GPS with traffic data.

Regularization: Dropout, Weight Decay, BatchNorm

Build

Python

Your model gets 99% on training data and 60% on test data. It memorized instead of learning. Regularization is the tax you impose on complexity to force generalization.

Weight Initialization & Training Stability

Build

Python

Initialize wrong and training never starts. Initialize right and 50 layers train as smoothly as 3.

Learning Rate Schedules & Warmup

Build

Python

The learning rate is the single most important hyperparameter. Not the architecture. Not the dataset size. Not the activation function. The learning rate. If you tune nothing el…

Build Your Own Mini Framework

Build

Python

You have built neurons, layers, networks, backprop, activations, loss functions, optimizers, regularization, initialization, and LR schedules. All as separate pieces. Now wire t…

Introduction to PyTorch

Build

Python

You built the engine from pistons and crankshafts. Now learn the one everyone actually drives.

Introduction to JAX

Build

Python

PyTorch mutates tensors. TensorFlow builds graphs. JAX compiles pure functions. That last one changes how you think about deep learning.

Debugging Neural Networks

Build

Python

Your network compiled. It ran. It produced a number. The number is wrong and nothing crashed. Welcome to the hardest kind of debugging -- the kind where there is no error message.