Deep Learning Core
Neural networks from first principles. No frameworks until you build one.
The Perceptron: Where It All Started
The perceptron is the atom of neural networks. Split it open and you find weights, a bias, and a decision.
Multi-Layer Networks & Forward Pass
One neuron draws a line. Stack them, and you can draw anything.
Backpropagation from Scratch
Backpropagation is the algorithm that makes learning possible. Without it, neural networks are just expensive random number generators.
Activation Functions: ReLU, Sigmoid, GELU & Why
Without nonlinearity, your 100-layer network is a fancy matrix multiply. Activations are the gates that let neural networks think in curves.
Loss Functions: MSE, Cross-Entropy, Contrastive
Your network makes a prediction. The ground truth says otherwise. How wrong is it? That number is the loss. Pick the wrong loss function and your model optimizes for the wrong t…
Optimizers: SGD, Momentum, Adam, AdamW
Gradient descent tells you which direction to move. It says nothing about how far or how fast. SGD is a compass. Adam is GPS with traffic data.
Regularization: Dropout, Weight Decay, BatchNorm
Your model gets 99% on training data and 60% on test data. It memorized instead of learning. Regularization is the tax you impose on complexity to force generalization.
Weight Initialization & Training Stability
Initialize wrong and training never starts. Initialize right and 50 layers train as smoothly as 3.
Learning Rate Schedules & Warmup
The learning rate is the single most important hyperparameter. Not the architecture. Not the dataset size. Not the activation function. The learning rate. If you tune nothing el…
Build Your Own Mini Framework
You have built neurons, layers, networks, backprop, activations, loss functions, optimizers, regularization, initialization, and LR schedules. All as separate pieces. Now wire t…
Introduction to PyTorch
You built the engine from pistons and crankshafts. Now learn the one everyone actually drives.
Introduction to JAX
PyTorch mutates tensors. TensorFlow builds graphs. JAX compiles pure functions. That last one changes how you think about deep learning.
Debugging Neural Networks
Your network compiled. It ran. It produced a number. The number is wrong and nothing crashed. Welcome to the hardest kind of debugging -- the kind where there is no error message.