Module 1 · Lesson 3 · Foundations · 30 min

The Trade-off Vocabulary: Naming What You're Choosing Between

Staff candidates name the dimensions of their trade-offs out loud. Senior candidates describe options. This lesson builds the precise vocabulary so you stop saying 'it depends' and start saying 'it depends on whether [X], and here is the rule.' TRACK is the five dimensions every AI system design trade-off lives on.

Watch any Senior-tier system design interview and you will hear the phrase 'it depends' at least six times. Each instance is a moment where the candidate had the technical knowledge to name the dependency but did not have the reflex. 'It depends' without the dependency named is the dead-giveaway hedge that ends a discussion instead of advancing it. The interviewer cannot ask a follow-up to 'it depends'; they have to ask the same question again, more specifically, which costs them attention they would have spent grading you on something else.

TRACK is the five-dimension vocabulary that converts 'it depends' into a sentence the interviewer can act on. Throughput, Recovery, Accuracy, Cost, Knowledge. Any AI system trade-off you face — small model vs large, cache vs no-cache, streaming vs batch, sync vs async — moves some subset of these five dimensions in specific directions. Naming which ones move which way is the Staff signal. The rest of the conversation flows from that naming.

Framework

The TRACK Dimensions

When an interviewer asks 'what's the trade-off?' the failing answer is 'it depends.' The passing answer is 'it depends on whether X, and here's the rule.' The Staff answer names the specific dimensions the trade-off lives on, and TRACK is the vocabulary that lets you do it on reflex. Five letters, five dimensions; every recsys, LLM, training, or serving trade-off you face in the room is reducible to TRACK.

1
T — Throughput
Volume. QPS, tokens/sec, training examples/sec, requests/replica. The dimension that always lives behind any 'scale' conversation. When a candidate says 'we'd need to be faster,' the Staff translation is 'we're trading T against [other dimension] — let me name the other dimension.'
2
R — Recovery
Blast radius. RTO, RPO, what fraction of users see the failure, how long until they don't, how reversible the bad outcome is. Recovery is the dimension most candidates forget exists. 'We'd add caching' might improve T but degrade R if the cache is a new failure path with worse recovery characteristics. Naming R is a Staff signal because it shows you have lived through incidents.
3
A — Accuracy
Quality. Recall, precision, calibration, faithfulness, fairness — whatever the system's quality metric is. The most-discussed dimension because it's the model's primary output, and often the dimension that exists implicitly when other dimensions are being explicitly traded. 'Smaller model' is T-up, C-down, A-down — a three-dimension trade you should name as such.
4
C — Cost
Money. Per request, per training run, per GPU-hour, per incident. Cost is the dimension that turns trade-off conversations into product conversations. Most engineering candidates underweight C; most Staff conversations end up dominated by it because that is the dimension product cares about.
5
K — Knowledge
Team capacity. What the team knows today, what they'd need to learn, what operational practices they have or don't. The most-missed dimension by candidates, the most-respected by interviewers. 'The right architecture is X if the team has experience with stream processing; otherwise it's Y' is a Staff move because it names a constraint that doesn't appear in the technical content.

When to use

Run TRACK any time you say or hear 'trade-off' in an interview. Run it before saying 'it depends.' Run it when an interviewer pushes back on your architecture — 'why not the bigger model?' is a trade question, and the answer should name which TRACK dimensions move which way.

Worked example

Interviewer: 'Why not just use GPT-4 for everything?' Senior answer: 'It would be too expensive and slow.' Staff answer: 'GPT-4 wins on A — best accuracy — but loses on T (1/10th the QPS per dollar), C (10× cost per request), and K (we'd be locked into one vendor's roadmap). For our cost budget and existing infra, a smaller-model-with-fallback gives us 0.95× the A at 0.15× the C and within our T budget. The right call is the smaller model unless A is the single binding constraint, which it isn't here because we've negotiated a willingness-to-trade ratio with product.' Same content, TRACK-named.

Calibration ladder

The interviewer asks: 'Should we use a vector database or just embeddings in a regular DB column?'

A trade-off probe. The interviewer wants to see whether you can decompose the question or whether you reach for the default answer.

L4 · Mid

Vector database. They're designed for this — faster similarity search.

Missed: Reached for the default. Treated the question as a technical comparison without naming the trade-off dimensions.

L5 · Senior

Vector database for production scale. At small scale a regular DB with a vector column works, but once you have more than a million vectors you need real ANN indexing — HNSW or IVF-PQ — and that's what vector DBs provide.

Missed: Knew the scale-driven heuristic but didn't decompose. Will give a defensible answer in 80% of cases and a wrong answer in the 20% where K or R is binding.

L6 · Staff

Depends on TRACK. T — at sub-million vectors a regular DB column with brute-force search hits the budget; above that, vector DB wins. A — vector DBs add ANN-approximation error, which is usually fine but worth knowing about. C — vector DBs are extra infrastructure; the regular-DB approach reuses existing capacity. K — does the team operate a vector DB today? If no, the cost of learning that operational profile is real and isn't on the technical comparison. R — extra service is a new failure path; consider whether the regular-DB path can be a degraded fallback.

Missed: Strong TRACK decomposition. Missing the meta-move — questioning the 'or' framing and elevating K as the often-binding dimension.

L7 · Principal

Same TRACK answer, with two additions. (1) The decision is rarely vector-DB-or-not. It's almost always a portfolio: regular DB for the metadata join and filtering, vector DB for the similarity lookup, with the result reconciled. The 'or' framing in the question is itself worth unpacking. (2) The K dimension is usually the binding constraint and the one candidates underweight. A team that has never operated a vector DB will struggle with ANN index rebuilds, parameter tuning, and memory-vs-recall trade-offs that are not visible at architecture time. If the team's existing skill is Postgres-heavy, the right answer might be 'extend Postgres with pgvector and pay the small A cost' even when the technical comparison favors a dedicated vector DB. The pattern: K often beats T, A, and C in mature engineering decisions, and naming K is the move that demonstrates the difference between architecting and shipping.

What scored L7

Questioned the framing first (it's a portfolio, not a binary), then named K as the dimension most candidates underweight. The pattern — K is what separates 'right architecture in vacuum' from 'right architecture for this team' — is portable to every greenfield-vs-existing-stack question, every build-vs-buy question, every adopt-new-technology question. Naming K is the L7 signal.

Dimension	Smaller fast model	Mid-size model	Large frontier model
T — Throughput (QPS/$)	Best — high QPS/$.	Moderate QPS/$.	Worst — low QPS/$.
R — Recovery (failure modes)	Simple to operate; well-understood failure modes.	Standard recovery profile; most production stacks support it.	Vendor dependency = worse blast radius if API degrades.
A — Accuracy	Lowest. Often 'good enough' for narrow tasks.	Often close to frontier on practical tasks.	Best on hard tasks; marginal on easy ones.
C — Cost per request	Lowest cost per request.	Moderate cost per request.	10×+ cost per request vs smaller models.
K — Team operational skill	Requires distillation/finetuning skill if you're going below frontier.	Standard ops skill; widely deployed.	Simplest to operate — vendor handles ops.
Choose when	When A bar is achievable with a narrow task, T budget is tight, and team has distillation skill.	Default for most production workloads. Best balance unless one dimension is binding.	When A is the single binding constraint, the task is hard, and the cost budget supports it. Otherwise consider cascading: large model only on uncertain or hard queries.

Verdict

There is no universal answer. The right choice is the one that fits the binding TRACK dimension. Cascading routing — small for most queries, large for uncertain — is often the right Staff answer because it lets you collect the T and C benefits of small while preserving A on the hard cases.

Pattern recognition

When you see

The interviewer asks any question of the form 'why not just use [X]?' or 'why not [the obvious choice]?'

→

Think

Run TRACK out loud. List which dimensions [X] wins on and which it loses on, then state which dimension is binding for this design and why.

'Why not just X' is a probe for whether you can defend the decision you made against the obvious alternative. Most candidates respond with a single dimension ('it's too slow' or 'it's too expensive'), which the interviewer can disagree with. TRACK gives you the full surface: name what X wins on, name what it loses on, then commit to the binding dimension. This converts a defensive answer into a structured one.

Drill · 10 minutes

Practice this. Time yourself.

You have 10 minutes. For each of these three trade-off questions, write a TRACK-decomposed answer in 3-5 sentences. Name which dimensions move which way, which is binding, and what you'd commit to. Then write what's missing if you skipped any TRACK dimension. (a) 'Should we use a streaming pipeline or batch features?' (b) 'Should we cache LLM responses?' (c) 'Should we self-host the model or use an API?'

Self-assessment rubric

Dimension	Weak	Passing	Strong	Staff bar
Dimension naming	Named only T and C. Missed R, A, or K.	Named all five TRACK dimensions for at least 2 of 3 questions.	Named all five for all three questions; correctly identified the binding dimension in each.	Named all five plus the often-missed K dimension in each, and explicitly noted that K is binding in at least one of the three (typically 'self-host vs API').
Commitment + trade-off	Said 'it depends' for any of the three.	Committed to a position for each.	Committed and named the binding dimension.	Committed, named the binding dimension, named the willingness-to-trade ratio that would change the answer (e.g., 'if A budget tightens by 10%, the answer flips to X').
Portfolio framing	Treated each as a binary.	Recognized at least one as a portfolio rather than binary (e.g., 'cache the easy queries, route the hard to live inference').	Framed all three as portfolios where appropriate.	Explicitly named that 'or' framings often hide portfolios, and used TRACK to identify which dimensions improve in the portfolio framing.

Reveal model solution

(a) Streaming vs batch features. TRACK: T — streaming higher per-event throughput, batch higher per-period efficiency. R — streaming has more failure modes (Kafka lag, Flink restarts); batch has fewer but bigger ones (pipeline failure means stale features for a day). A — streaming improves feature freshness, which improves model accuracy on time-sensitive signals; batch is fine for stable features. C — streaming roughly 5-10× cost. K — streaming requires Kafka/Flink ops skill the team may not have. Binding: K and A together. Commit: streaming only if (i) signal staleness materially hurts accuracy AND (ii) team has stream-processing ops experience. Otherwise batch with thin online store for stable attributes. The willingness-to-trade ratio: if accuracy gain from freshness is < 0.5% on primary metric, batch wins. (b) Cache LLM responses. TRACK: T — cache hit improves latency and throughput dramatically. R — cache adds a failure path; stale entries can cause silent quality regressions. A — semantic cache hits can serve wrong answers; exact-match cache is safer. C — major cost reduction on repeated queries. K — cache invalidation is a hard problem; the team needs maturity. Binding: A and C balance. Commit: cache exact-match queries (high A, low risk); semantic cache only with confidence thresholds and freshness budgets. This is a portfolio decision: cache aggressive on easy queries, never cache personalized or time-sensitive ones. (c) Self-host vs API. TRACK: T — self-host can win at high steady-state QPS; API wins at bursty/low QPS. R — self-host = team owns ops including 3am pages; API = vendor owns ops but you own the dependency risk. A — depends on which model is available which way; A is usually comparable for general-purpose models. C — self-host wins at high utilization; API wins at low. K — this is the binding dimension. Self-hosting a 70B model requires GPU ops, autoscaling, model serving expertise the team may not have. Commit: API unless steady-state QPS justifies the K investment, which is usually >1000 sustained QPS for a frontier-class model. K is the often-binding dimension that the technical comparison hides.

Common failures

✗Listed all five dimensions but did not name which was binding. TRACK without a binding dimension is enumeration, not commitment.
✗Missed K on the self-host vs API question. K is usually the dominant dimension for that question, and missing it is the canonical Senior failure.
✗Treated the cache question as binary instead of as a portfolio. Most cache questions in production are 'which queries' not 'cache or not.'
✗Did not name the willingness-to-trade ratio for any of the three. The trade ratio is what converts TRACK from a description into a decision.

Artifact · reference card

The TRACK Wallet Card

TRACK dimensions

T — Throughput: QPS, tokens/sec, training examples/sec. The 'scale' dimension.
R — Recovery: RTO, RPO, blast radius, reversibility. The most-forgotten dimension.
A — Accuracy: Quality metric — recall, precision, calibration, faithfulness.
C — Cost: Per-request, per-training-run, per-incident dollars.
K — Knowledge: What the team knows; the often-binding hidden dimension.

Reflex sentences

Replace 'it depends': 'It depends on whether [TRACK dimension]; if A then X, if B then Y.'
Defending against 'why not X': 'X wins on [T or A]; loses on [R, C, or K]. Binding for us is [Z].'
Naming the trade ratio: 'I'd commit to X. If [TRACK dimension] tightens by Y%, the answer flips to Z.'
Portfolio framing: 'It's rarely a binary — use X for the easy case, Y for the hard case.'

Post-mortem · anonymized

Setup

L6 candidate at a top AI lab, fifth round. Strong technical content throughout the loop, especially on LLM serving and RAG. Interviewer was a Staff engineer on the platform team who pushed hard on trade-off questions.

What happened

Every time the interviewer asked a 'why not X?' question — 'why not GPT-4 for everything?', 'why not just self-host?', 'why not cache?' — the candidate gave a credible single-dimension answer ('too expensive', 'too much ops', 'cache invalidation is hard'). Each answer was technically correct. None of them named more than one TRACK dimension. After six such exchanges, the interviewer's confidence in the candidate's depth dropped — not because the answers were wrong, but because the candidate appeared unable to hold multiple trade-off dimensions in their head simultaneously.

The moment

Minute 38, after the sixth single-dimension trade answer, the interviewer asked the closing question: 'Tell me about a design decision you got wrong, in retrospect.' The candidate told a credible story about underestimating cost. The interviewer's note read 'consistently treats trade-offs as single-dimension; suggests difficulty operating in real Staff-level decisions where multiple dimensions move at once.' The candidate scored L5 on a loop where the technical content alone would have justified L6.

What they should have said

Once per probe, run TRACK out loud. 'GPT-4 wins on A; loses on T and C; K is comparable for us because we already use it. Binding for us is C. Commit: smaller-model-with-fallback.' Each 'why not X' question becomes a 30-second TRACK pass that demonstrates exactly what was missing from the actual answers: the ability to hold multiple dimensions simultaneously. The technical knowledge was already there.

Lesson

Senior candidates know the trade-offs exist. Staff candidates name the dimensions and commit to the binding one. The vocabulary is what makes the difference visible in the room. Practice TRACK as a reflex, not as a checklist — the goal is to be unable to give a single-dimension trade-off answer even under pressure.