Curated Resource Library

Deep Learning

Goodfellow, Bengio, Courville

Still the cleanest single-source treatment of the math and core architectures. Skim chapters 5–9 before any architect-level interview.

Dive into Deep Learning

Zhang, Lipton, Li, Smola

Book · Intro

Free, code-first companion to Goodfellow. Best for refreshing fundamentals through working PyTorch/TF examples.

Attention Is All You Need

Vaswani et al., 2017

Read it once, then re-read every two years. Most architecture interview missteps come from fuzzy memory of this one paper.

The Illustrated Transformer

Jay Alammar

Blog · Intro

The visual companion. Worth keeping bookmarked for explaining transformers to non-ML stakeholders.

Yes you should understand backprop

Andrej Karpathy

Blog · Intro

Short, sharp argument for why the abstraction leaks. Foundational reading for anyone designing training systems.
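The saturating sigmoid is one of the leaks the post walks through. Its local gradient s(1 - s) never exceeds 0.25 and collapses toward zero once the unit saturates. The following is a minimal sketch in plain Python that shows how the chain rule multiplies those factors. It assumes a toy stack of sigmoids with every weight fixed at 1 and no biases. That is a simplification for illustration, not code from the post.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_local_grad(x: float) -> float:
    # d/dx sigmoid(x) = s * (1 - s); the maximum, 0.25, is reached at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)


def grad_through_stack(x: float, depth: int) -> float:
    """Gradient of the output w.r.t. x for `depth` composed sigmoids.

    Simplifying assumption (illustration only): every weight is 1.0 and
    there are no biases, so the chain rule reduces to multiplying each
    layer's local gradient, evaluated at that layer's own input.
    """
    grad = 1.0
    h = x
    for _ in range(depth):
        grad *= sigmoid_local_grad(h)
        h = sigmoid(h)
    return grad


if __name__ == "__main__":
    print(f"local grad at x=0:  {sigmoid_local_grad(0.0):.4f}")  # 0.2500
    print(f"local grad at x=10: {sigmoid_local_grad(10.0):.2e}")  # saturated, near zero
    for depth in (1, 5, 10):
        # Each extra layer multiplies in another factor of at most 0.25.
        print(f"depth={depth:2d} grad={grad_through_stack(0.0, depth):.2e}")
```

Even in the best case (input at 0), ten stacked sigmoids shrink the gradient below 0.25^10 ≈ 1e-6. That is the kind of behavior the post argues you cannot debug if you treat backprop as a black box.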

LLM & Generative AI

Architecture, training paradigms, and the inference-time tricks that separate prototype from production.

GPT-3 paper — Language Models are Few-Shot Learners

Brown et al., 2020

The scaling and in-context learning story. Still the right reference for explaining what made LLMs different.

Scaling Laws for Neural Language Models

Kaplan et al., 2020

The math behind why scale works. You will be asked about scaling at architect level — own this.

Training Compute-Optimal Large Language Models (Chinchilla)

Hoffmann et al., 2022

The correction to Kaplan. Data vs. parameter trade-off is now table stakes in any training-cost conversation.

LLaMA / LLaMA 2 / LLaMA 3 papers

Meta AI

Production-grade open weights. The training and data recipes are the most-cited reference architecture in industry.

Constitutional AI

Bai et al., Anthropic, 2022

Read alongside RLHF papers. Defines a vocabulary you'll need in any safety-conscious org.

Direct Preference Optimization (DPO)

Rafailov et al., 2023

The pragmatic alternative to PPO-based RLHF. Almost every preference-tuning conversation now starts here.

FlashAttention / FlashAttention-2

Dao et al.

The single optimization that reshaped training and inference economics. Architect-must-know.
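FlashAttention's tiling rests on an online softmax. Keys and values are visited block by block, and a running max plus a running normalizer let earlier partial sums be rescaled. The exact result comes out without the full score row ever being held at once. The following is a minimal single-query sketch in pure Python under toy assumptions: lists as vectors, and no batching or heads. It illustrates only the rescaling arithmetic. It does not model the GPU kernel, the SRAM tiling, or the backward pass the papers are actually about.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def scores_for(query: Vector, keys: Sequence[Vector]) -> List[float]:
    # Scaled dot-product scores: q . k / sqrt(d).
    d = len(query)
    return [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]


def attention_naive(query: Vector, keys: Sequence[Vector], values: Sequence[Vector]) -> List[float]:
    # Reference: materialize every score, softmax them, weight the values.
    scores = scores_for(query, keys)
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def attention_online(query: Vector, keys: Sequence[Vector], values: Sequence[Vector],
                     block_size: int) -> List[float]:
    # Visit keys/values one block at a time, keeping only running statistics.
    dim = len(values[0])
    running_max = -math.inf  # largest score seen so far
    running_sum = 0.0        # sum of exp(score - running_max)
    acc = [0.0] * dim        # sum of exp(score - running_max) * value
    for start in range(0, len(keys), block_size):
        block_scores = scores_for(query, keys[start:start + block_size])
        block_values = values[start:start + block_size]
        new_max = max(running_max, max(block_scores))
        # Rescale earlier partial sums to the new max; exp(-inf) == 0.0 on the first block.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(block_scores, block_values):
            p = math.exp(s - new_max)
            running_sum += p
            for j in range(dim):
                acc[j] += p * v[j]
        running_max = new_max
    return [a / running_sum for a in acc]


if __name__ == "__main__":
    q = [1.0, 0.5]
    ks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.3, -0.2]]
    vs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
    print(attention_naive(q, ks, vs))
    print(attention_online(q, ks, vs, block_size=2))  # same values up to rounding
```

Memory for the running state is O(d) per query regardless of sequence length. That property is what lets the real kernels keep a tile in fast on-chip memory instead of writing the full attention matrix to HBM.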

vLLM and PagedAttention

Kwon et al., 2023

Why modern LLM serving looks the way it does. Read with the vLLM source open.

GPT in 60 lines of NumPy

Jay Mody

The simplest possible reference implementation. Cuts through library noise.

RAG & Vector Search

Retrieval architectures, embeddings, ANN trade-offs, and the eval methodology that prevents shipping vibes.

Dense Passage Retrieval (DPR)

Karpukhin et al., 2020

The foundational dense retrieval paper. Read before any architecture conversation about retrieval choices.

ColBERT and ColBERTv2

Khattab, Zaharia

Late-interaction retrieval. Important context for understanding the dense vs. lexical vs. hybrid landscape.

Retrieval-Augmented Generation for Large Language Models: A Survey

Gao et al., 2023

Paper · Reference

The map of the territory — chunking, retrieval, reranking, eval. A useful index, not bedtime reading.

Ragas — RAG evaluation framework

Tool · Intermediate

Opinionated metrics (faithfulness, answer relevance, context precision/recall) that map well to what interviewers expect you to discuss.

Pinecone Learning Center / Weaviate Docs

Practical, vendor-flavored but solid intros to ANN, hybrid search, and reranking patterns.

HNSW — Hierarchical Navigable Small World

Malkov, Yashunin

Underpins most production ANN. Worth knowing well enough to discuss recall/latency trade-offs in interviews.

MLOps & Production AI

Deployment, monitoring, feature platforms, governance — the gap between trained model and useful model.

Designing Machine Learning Systems

Chip Huyen

Single best industry-oriented book on ML systems design. Read end-to-end.

Machine Learning Engineering

Andriy Burkov

Pragmatic, opinionated, and dense. Pairs well with Chip Huyen's book.

Rules of Machine Learning

Martin Zinkevich, Google

43 hard-won rules. Worth re-reading every promotion cycle.

Hidden Technical Debt in Machine Learning Systems

Sculley et al., Google

Paper · Intro

The seminal paper. Still applicable in 2026. Cite it when defending governance budget.

Feature Stores for ML

Tecton / Feast docs

Doc · Intermediate

Open-source reference architecture for online/offline feature parity. Read before any feature-platform interview.

Monitoring ML Models in Production

Evidently AI / Arize blogs

Practical patterns for drift detection, monitoring, and alerting that don't generate noise.

Distributed Training & GPU Infra

How large models actually get trained — parallelism strategies, collective comms, and the hardware reality underneath.

ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

Rajbhandari et al., Microsoft

Foundational ZeRO stage paper. Read before any large-model training conversation.
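The paper's accounting for mixed-precision Adam charges 16 bytes of model state per parameter:

- 2 bytes for fp16 weights
- 2 bytes for fp16 gradients
- 12 bytes (K = 12) for the fp32 master weights, momentum, and variance

Each ZeRO stage partitions one more of these across the data-parallel ranks. Stage 1 partitions the optimizer states, stage 2 adds the gradients, and stage 3 adds the parameters. The following is a back-of-the-envelope sketch in Python of that per-GPU arithmetic. Activations, communication buffers, and fragmentation are deliberately left out.

```python
BYTES_FP16_PARAMS = 2  # fp16 weights, per parameter
BYTES_FP16_GRADS = 2   # fp16 gradients, per parameter
K_OPTIMIZER = 12       # fp32 master weights + Adam momentum + variance, per parameter


def zero_model_state_gb(num_params: float, dp_degree: int, stage: int) -> float:
    """Per-GPU model-state memory (GB) for ZeRO stage 0 (plain DP) through 3.

    Counts only weights, gradients, and optimizer states; activations,
    communication buffers, and fragmentation are not modeled.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    params = BYTES_FP16_PARAMS * num_params
    grads = BYTES_FP16_GRADS * num_params
    optim = K_OPTIMIZER * num_params
    if stage >= 1:
        optim /= dp_degree   # stage 1: partition optimizer states
    if stage >= 2:
        grads /= dp_degree   # stage 2: also partition gradients
    if stage >= 3:
        params /= dp_degree  # stage 3: also partition parameters
    return (params + grads + optim) / 1e9


if __name__ == "__main__":
    # 7.5B parameters with 64-way data parallelism.
    for stage in range(4):
        print(f"stage {stage}: {zero_model_state_gb(7.5e9, 64, stage):.1f} GB")
```

With 7.5B parameters and 64-way data parallelism, this prints 120.0, 31.4, 16.6, and 1.9 GB. That is the configuration the paper uses for its headline memory comparison. The jump from stage 2 to stage 3 is why stage 3 is the one that makes models larger than a single GPU's memory trainable with data parallelism alone.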

Megatron-LM

Shoeybi et al., NVIDIA

Tensor parallelism done right. The mental model for intra-node parallelism.
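The Megatron-LM paper's MLP recipe splits the first weight matrix by columns and the second by rows. The elementwise GeLU therefore runs locally on each shard, and a single sum recombines the output; in a real deployment that sum is an all-reduce. The following is a minimal sketch in pure Python that simulates the ranks inside one process. It uses lists as matrices and a plain loop standing in for the all-reduce. It is an illustration of the split, not Megatron's code or API.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gelu(x: Matrix) -> Matrix:
    # Exact GeLU via the error function, applied elementwise.
    return [[0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in row] for row in x]


def mlp_serial(x: Matrix, a: Matrix, b: Matrix) -> Matrix:
    # Single-device reference: GeLU(X A) B.
    return matmul(gelu(matmul(x, a)), b)


def mlp_tensor_parallel(x: Matrix, a: Matrix, b: Matrix, world_size: int) -> Matrix:
    hidden = len(a[0])
    if hidden % world_size:
        raise ValueError("hidden size must be divisible by world_size")
    shard = hidden // world_size
    partials = []
    for rank in range(world_size):
        cols = slice(rank * shard, (rank + 1) * shard)
        a_shard = [row[cols] for row in a]  # column slice of A held by this rank
        b_shard = b[cols]                   # matching row slice of B
        # Each rank computes its slice of the hidden layer with no communication.
        partials.append(matmul(gelu(matmul(x, a_shard)), b_shard))
    # Stand-in for the all-reduce: elementwise sum of every rank's partial output.
    return [[sum(p[i][j] for p in partials) for j in range(len(partials[0][0]))]
            for i in range(len(partials[0]))]


if __name__ == "__main__":
    x = [[1.0, -2.0]]
    a = [[0.1, 0.2, -0.3, 0.4], [0.5, -0.6, 0.7, 0.8]]
    b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]]
    print(mlp_serial(x, a, b))
    print(mlp_tensor_parallel(x, a, b, world_size=2))  # same values up to rounding
```

Splitting A by rows instead would force a sum before the GeLU, because GeLU(u + v) is not GeLU(u) + GeLU(v). The column-then-row pairing is what keeps communication to one reduction per MLP in the forward pass.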

GPipe & PipeDream

Pipeline parallelism foundations. Combined with TP and DP, these are the three primitives you need to compose.

Efficient Large-Scale Language Model Training (Megatron-Turing)

The 3D parallelism cookbook. Reference architecture for combining DP + TP + PP.

PyTorch FSDP Tutorial

Doc · Intermediate

Practical primer. FSDP is the de facto default for open-source large-model training.

NCCL: Collective Communication Library

Doc · Reference

Knowing the collective ops (all-reduce, all-gather, reduce-scatter) and their topology behavior is non-optional at architect level.
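The following is a minimal single-process simulation in Python that pins down the semantics of the three collectives. Here `buffers[r]` stands for rank r's local tensor. It is not the NCCL API, and it ignores topology and the ring or tree algorithms NCCL actually runs. It only shows what each rank holds afterwards. It also shows that an all-reduce equals a reduce-scatter followed by an all-gather. That decomposition is what ring all-reduce uses, and it is why sharded approaches such as ZeRO and FSDP can use the two halves separately.

```python
from typing import List

Buffers = List[List[float]]  # buffers[r] is rank r's local tensor


def _elementwise_sum(buffers: Buffers) -> List[float]:
    return [sum(vals) for vals in zip(*buffers)]


def all_reduce(buffers: Buffers) -> Buffers:
    # Every rank ends up with the full elementwise sum.
    total = _elementwise_sum(buffers)
    return [list(total) for _ in buffers]


def reduce_scatter(buffers: Buffers) -> Buffers:
    # Rank r ends up with only the r-th contiguous chunk of the sum.
    world = len(buffers)
    if len(buffers[0]) % world:
        raise ValueError("tensor length must be divisible by the number of ranks")
    chunk = len(buffers[0]) // world
    total = _elementwise_sum(buffers)
    return [total[r * chunk:(r + 1) * chunk] for r in range(world)]


def all_gather(shards: Buffers) -> Buffers:
    # Every rank ends up with the concatenation of all ranks' shards.
    full = [x for shard in shards for x in shard]
    return [list(full) for _ in shards]


if __name__ == "__main__":
    local = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]  # two ranks
    print(reduce_scatter(local))              # [[11.0, 22.0], [33.0, 44.0]]
    print(all_gather(reduce_scatter(local)))  # identical to all_reduce(local)
    print(all_reduce(local))
```

After a reduce-scatter, each rank holds only 1/N of the summed data. Sharded training exploits this by updating just its own shard of gradients and optimizer state before any all-gather.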

Agents, Tool Use & Reasoning

The most architecturally chaotic area in AI right now. Read the originals, not the threads.

ReAct: Synergizing Reasoning and Acting

Yao et al., 2022

The pattern most modern agents are built on. Read it directly rather than via blog summaries.

Chain-of-Thought Prompting Elicits Reasoning

Wei et al., 2022

Foundational for the entire 'thinking' family of techniques.

Toolformer

Schick et al., Meta, 2023

Self-supervised tool use. Still the cleanest framing of the problem.

Anthropic — Building Effective Agents

Pragmatic, opinionated guide on when not to reach for an agent framework. Required reading before designing one.

OpenAI — Function calling and structured outputs guide

Reference for designing reliable tool schemas. Reads as a spec for how to think about tool reliability in general.

Responsible AI, Safety & Governance

Frameworks, regulation, red-teaming, and the architectural patterns that make governance load-bearing.

NIST AI Risk Management Framework (AI RMF)

Doc · Reference

The map every US-facing org uses. Know the four functions cold.

EU AI Act — official text + plain-English summaries

Doc · Reference

Read at least the risk-tier classification and the obligations for high-risk systems. Architect-relevant in any EU-touching product.

Model Cards for Model Reporting

Mitchell et al., 2018

Paper · Intro

The original. Treat model cards as load-bearing artifacts, not paperwork.

Universal and Transferable Adversarial Attacks on Aligned LMs

Zou et al., 2023

Required reading before claiming your prompt-injection defenses are sufficient.

OWASP Top 10 for LLM Applications

Maps familiar AppSec thinking onto LLM-specific risks. Useful for cross-org security conversations.

Concrete Problems in AI Safety

Amodei et al., 2016

Paper · Intro

Older but still the best brief introduction to the safety problem taxonomy.

Technical Leadership & Architecture

What separates Staff from Architect: communication, influence, and durable judgment under uncertainty.

Staff Engineer: Leadership Beyond the Management Track

Will Larson

The canonical reference. Worth one full read and an annual skim.

The Staff Engineer's Path

Tanya Reilly

Pairs well with Larson. Stronger on the day-to-day mechanics of influence and trust.

An Elegant Puzzle

Will Larson

Org design for engineers. Useful even if you never become a manager.

Designing Data-Intensive Applications

Martin Kleppmann

Book · Advanced

Not AI-specific, but the foundation under every AI system. Re-read every few years.

lethain.com — Will Larson's archive

The single best long-running blog on the senior IC craft. Subscribe.

Architecture Decision Records (ADRs)

Michael Nygard

Blog · Intro

The simple template that makes you a better technical writer overnight.

Company Engineering Blogs to Follow

Where production AI patterns actually surface. Subscribe to a few, not all.

Anthropic Research & Blog

Safety, interpretability, agents, model architecture. Slow but high signal-to-noise.

OpenAI Research & Blog

Frontier capability work. Read selectively — the production patterns are scattered across model and product posts.

Google Research Blog

Foundational systems and ML research. Long-running quality.

Meta AI Research

Especially good for open-weight model releases, infra, and ranking systems.

Netflix Tech Blog

Personalization, A/B testing, and ML infra at streaming scale.

Uber Engineering

Marketplace ML, real-time systems, H3 — the practical end of applied ML.

Pinterest Engineering

Quietly excellent on recsys and embedding-based retrieval at scale.

DoorDash Engineering