Curated Resource Library
57 hand-picked papers, books, blogs, and docs across 9 domains. Opinionated — every entry has a one-line reason it's here. No affiliate links, no SEO bait.
ML & Deep Learning Foundations
The mental models you cannot skip. Re-read the foundations every couple of years — the field moves but these don't.
Still the cleanest single-source treatment of the math and core architectures. Skim chapters 5–9 before any architect-level interview.
Free, code-first companion to Goodfellow. Best for refreshing fundamentals through working PyTorch/TF examples.
Read it once, then re-read every two years. Many architecture interview missteps trace back to a fuzzy memory of this one paper.
The visual companion. Worth keeping bookmarked for explaining transformers to non-ML stakeholders.
Short, sharp argument for why the abstraction leaks. Foundational reading for anyone designing training systems.
LLM & Generative AI
Architecture, training paradigms, and the inference-time tricks that separate prototype from production.
The scaling and in-context learning story. Still the right reference for explaining what made LLMs different.
The math behind why scale works. You will be asked about scaling at architect level — own this.
The correction to Kaplan. Data vs. parameter trade-off is now table stakes in any training-cost conversation.
Production-grade open weights. The training and data recipes are the most-cited reference architecture in industry.
Read alongside RLHF papers. Defines a vocabulary you'll need in any safety-conscious org.
The pragmatic alternative to PPO-based RLHF. Almost every preference-tuning conversation now starts here.
The single optimization that reshaped training and inference economics. A must-know for architects.
Why modern LLM serving looks the way it does. Read with the vLLM source open.
The simplest possible reference implementation. Cuts through library noise.
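The data vs. parameter trade-off mentioned in this section is often summarised with two rules of thumb. Neither is stated in this list, so both are assumptions here: training compute is roughly 6 FLOPs per parameter per token, and a compute-optimal run uses roughly 20 tokens per parameter. The sketch below only does that arithmetic; the function names are hypothetical and the constants are approximations, not fitted values.

```python
# Rule-of-thumb constants. Both are assumptions, not values taken from this list.
FLOPS_PER_PARAM_TOKEN = 6   # approx. forward + backward FLOPs per parameter per token
TOKENS_PER_PARAM = 20       # commonly quoted compute-optimal tokens-per-parameter ratio


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute: C ~ 6 * N * D."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


def compute_optimal_split(flops_budget: float) -> tuple[float, float]:
    """Split a FLOP budget into (parameters, tokens) under the assumed ratio.

    With C = 6 * N * D and D = 20 * N, we get N = sqrt(C / 120).
    """
    n_params = (flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM)) ** 0.5
    return n_params, TOKENS_PER_PARAM * n_params


if __name__ == "__main__":
    n, d = compute_optimal_split(1e23)
    print(f"{n:.3g} parameters, {d:.3g} tokens")
```

Under these assumptions, a 70B-parameter model trained on 1.4T tokens costs about 6 × 70e9 × 1.4e12 ≈ 5.9e23 FLOPs, which is the kind of back-of-envelope figure a training-cost conversation usually starts from.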
RAG & Vector Search
Retrieval architectures, embeddings, ANN trade-offs, and the eval methodology that keeps you from shipping on vibes.
The foundational dense retrieval paper. Read before any architecture conversation about retrieval choices.
Late-interaction retrieval. Important context for understanding the dense vs. lexical vs. hybrid landscape.
The map of the territory — chunking, retrieval, reranking, eval. A useful index, not bedtime reading.
Opinionated metrics (faithfulness, answer relevance, context precision/recall) that map well to what interviewers expect you to discuss.
Practical, vendor-flavored but solid intros to ANN, hybrid search, and reranking patterns.
Underpins most production ANN. Worth knowing well enough to discuss recall/latency trade-offs in interviews.
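To make the recall side of the recall/latency trade-off concrete, here is a minimal sketch of recall@k for an approximate index, measured against brute-force exact neighbours. It assumes Euclidean distance and small in-memory vectors; `exact_top_k` and `recall_at_k` are illustrative names, not the API of any ANN library.

```python
import math


def exact_top_k(query, vectors, k):
    """Brute-force nearest neighbours by Euclidean distance (the ground truth)."""
    order = sorted(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    return order[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k ids that the approximate index also returned."""
    exact = set(exact_ids)
    return len(exact.intersection(approx_ids)) / len(exact)


if __name__ == "__main__":
    vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    query = (0.1, 0.0)
    truth = exact_top_k(query, vectors, k=2)
    # Pretend an approximate index returned ids 0 and 2 for the same query.
    print(recall_at_k([0, 2], truth))
```

In practice you would sweep the index's search-time parameters and plot this recall against query latency; the curve, not a single point, is what the trade-off discussion is about.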
MLOps & Production AI
Deployment, monitoring, feature platforms, governance — the gap between trained model and useful model.
Single best industry-oriented book on ML systems design. Read end-to-end.
Pragmatic, opinionated, and dense. Pairs well with Chip Huyen's book.
43 hard-won rules. Worth re-reading every promotion cycle.
The seminal paper. Still applicable in 2026. Cite it when defending governance budget.
Open-source reference architecture for online/offline feature parity. Read before any feature-platform interview.
Practical patterns for drift detection, monitoring, and alerting that don't generate noise.
Distributed Training & GPU Infra
How large models actually get trained — parallelism strategies, collective comms, and the hardware reality underneath.
The foundational paper on the ZeRO stages. Read before any large-model training conversation.
Tensor parallelism done right. The mental model for intra-node parallelism.
Pipeline parallelism (PP) foundations. Together with tensor parallelism (TP) and data parallelism (DP), it is one of the three primitives you need to compose.
The 3D parallelism cookbook. Reference architecture for combining DP + TP + PP.
Practical primer. FSDP is the de facto default for open-source large-model training.
Knowing the collective ops (all-reduce, all-gather, reduce-scatter) and their topology behavior is non-optional at architect level.
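The three collectives named above can be sketched as a single-process simulation, where each inner list stands in for one rank's buffer. This shows only their semantics, including the fact that an all-reduce gives the same result as a reduce-scatter followed by an all-gather. It assumes the buffer length divides evenly by the number of ranks, and it does not model topology, bandwidth, or any real communication library.

```python
def reduce_scatter(rank_buffers):
    """Each rank ends up with the element-wise sum of one shard (shard i on rank i)."""
    world = len(rank_buffers)
    chunk = len(rank_buffers[0]) // world  # assumes length is divisible by world size
    out = []
    for r in range(world):
        lo, hi = r * chunk, (r + 1) * chunk
        out.append([sum(buf[i] for buf in rank_buffers) for i in range(lo, hi)])
    return out


def all_gather(rank_shards):
    """Every rank ends up with the concatenation of all ranks' shards."""
    full = [x for shard in rank_shards for x in shard]
    return [list(full) for _ in rank_shards]


def all_reduce(rank_buffers):
    """Every rank ends up with the element-wise sum of all ranks' buffers."""
    total = [sum(vals) for vals in zip(*rank_buffers)]
    return [list(total) for _ in rank_buffers]


if __name__ == "__main__":
    bufs = [[1, 2, 3, 4], [10, 20, 30, 40]]  # two simulated ranks
    print(reduce_scatter(bufs))
    print(all_gather(reduce_scatter(bufs)))
    print(all_reduce(bufs))
```

The decomposition matters because sharded training schemes can use the reduce-scatter and all-gather halves separately, so each rank holds only its shard between the two steps.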
Agents, Tool Use & Reasoning
The most architecturally chaotic area in AI right now. Read the originals, not the threads.
The pattern most modern agents are built on. Read it directly rather than via blog summaries.
Foundational for the entire 'thinking' family of techniques.
Self-supervised tool use. Still the cleanest framing of the problem.
Pragmatic, opinionated guide on when not to reach for an agent framework. Required reading before designing one.
Reference for designing reliable tool schemas. Reads as a spec for how to think about tool reliability in general.
Responsible AI, Safety & Governance
Frameworks, regulation, red-teaming, and the architectural patterns that make governance load-bearing.
The map every US-facing org uses. Know the four functions cold.
Read at least the risk-tier classification and the obligations for high-risk systems. Architect-relevant in any EU-touching product.
The original. Treat model cards as load-bearing artifacts, not paperwork.
Required reading before claiming your prompt-injection defenses are sufficient.
Maps familiar AppSec thinking onto LLM-specific risks. Useful for cross-org security conversations.
Older but still the best brief introduction to the safety problem taxonomy.
Technical Leadership & Architecture
What separates Staff from Architect: communication, influence, and durable judgment under uncertainty.
The canonical reference. Worth one full read and an annual skim.
Pairs well with Larson. Stronger on the day-to-day mechanics of influence and trust.
Org design for engineers. Useful even if you never become a manager.
Not AI-specific, but the foundation under every AI system. Re-read every few years.
The single best long-running blog on the senior IC craft. Subscribe.
The simple template that makes you a better technical writer overnight.
Company Engineering Blogs to Follow
Where production AI patterns actually surface. Subscribe to a few, not all.
Safety, interpretability, agents, model architecture. Slow but high signal-to-noise.
Frontier capability work. Read selectively — the production patterns are scattered across model and product posts.
Foundational systems and ML research. Long-running quality.
Especially good for open-weight model releases, infra, and ranking systems.
Personalization, A/B testing, and ML infra at streaming scale.
Marketplace ML, real-time systems, H3 — the practical end of applied ML.
Quietly excellent on recsys and embedding-based retrieval at scale.
Marketplace and operations ML — often the most concrete production case studies on the web.
A note on this list
This is not a survey. It's the shortlist we'd hand to a senior engineer asking "what should I actually read?" If a famous paper or book is missing, it's because something else on the list covers it better for the architect path. Suggestions for additions are welcome.