The Trade-off Vocabulary: Naming What You're Choosing Between
Staff candidates name the dimensions of their trade-offs out loud. Senior candidates describe options. This lesson builds the precise vocabulary so you stop saying 'it depends' and start saying 'it depends on whether [X], and here is the rule.' TRACK is the five dimensions every AI system design trade-off lives on.
Watch any Senior-tier system design interview and you will hear the phrase 'it depends' at least six times. Each instance is a moment where the candidate had the technical knowledge to name the dependency but did not have the reflex. 'It depends' without the dependency named is the dead-giveaway hedge that ends a discussion instead of advancing it. The interviewer cannot ask a follow-up to 'it depends'; they have to ask the same question again, more specifically, which costs them attention they would have spent grading you on something else.
TRACK is the five-dimension vocabulary that converts 'it depends' into a sentence the interviewer can act on. Throughput, Recovery, Accuracy, Cost, Knowledge. Any AI system trade-off you face — small model vs large, cache vs no-cache, streaming vs batch, sync vs async — moves some subset of these five dimensions in specific directions. Naming which ones move which way is the Staff signal. The rest of the conversation flows from that naming.
The TRACK Dimensions
When an interviewer asks 'what's the trade-off?' the failing answer is 'it depends.' The passing answer is 'it depends on whether X, and here's the rule.' The Staff answer names the specific dimensions the trade-off lives on, and TRACK is the vocabulary that lets you do it on reflex. Five letters, five dimensions; every recsys, LLM, training, or serving trade-off you face in the room is reducible to TRACK.
- 1T — ThroughputVolume. QPS, tokens/sec, training examples/sec, requests/replica. The dimension that always lives behind any 'scale' conversation. When a candidate says 'we'd need to be faster,' the Staff translation is 'we're trading T against [other dimension] — let me name the other dimension.'
- 2R — RecoveryBlast radius. RTO, RPO, what fraction of users see the failure, how long until they don't, how reversible the bad outcome is. Recovery is the dimension most candidates forget exists. 'We'd add caching' might improve T but degrade R if the cache is a new failure path with worse recovery characteristics. Naming R is a Staff signal because it shows you have lived through incidents.
- 3A — AccuracyQuality. Recall, precision, calibration, faithfulness, fairness — whatever the system's quality metric is. The most-discussed dimension because it's the model's primary output, and often the dimension that exists implicitly when other dimensions are being explicitly traded. 'Smaller model' is T-up, C-down, A-down — a three-dimension trade you should name as such.
- 4C — CostMoney. Per request, per training run, per GPU-hour, per incident. Cost is the dimension that turns trade-off conversations into product conversations. Most engineering candidates underweight C; most Staff conversations end up dominated by it because that is the dimension product cares about.
- 5K — KnowledgeTeam capacity. What the team knows today, what they'd need to learn, what operational practices they have or don't. The most-missed dimension by candidates, the most-respected by interviewers. 'The right architecture is X if the team has experience with stream processing; otherwise it's Y' is a Staff move because it names a constraint that doesn't appear in the technical content.
Run TRACK any time you say or hear 'trade-off' in an interview. Run it before saying 'it depends.' Run it when an interviewer pushes back on your architecture — 'why not the bigger model?' is a trade question, and the answer should name which TRACK dimensions move which way.
Interviewer: 'Why not just use GPT-4 for everything?' Senior answer: 'It would be too expensive and slow.' Staff answer: 'GPT-4 wins on A — best accuracy — but loses on T (1/10th the QPS per dollar), C (10× cost per request), and K (we'd be locked into one vendor's roadmap). For our cost budget and existing infra, a smaller-model-with-fallback gives us 0.95× the A at 0.15× the C and within our T budget. The right call is the smaller model unless A is the single binding constraint, which it isn't here because we've negotiated a willingness-to-trade ratio with product.' Same content, TRACK-named.
The interviewer asks: 'Should we use a vector database or just embeddings in a regular DB column?'
A trade-off probe. The interviewer wants to see whether you can decompose the question or whether you reach for the default answer.
Vector database. They're designed for this — faster similarity search.
Vector database for production scale. At small scale a regular DB with a vector column works, but once you have more than a million vectors you need real ANN indexing — HNSW or IVF-PQ — and that's what vector DBs provide.
Depends on TRACK. T — at sub-million vectors a regular DB column with brute-force search hits the budget; above that, vector DB wins. A — vector DBs add ANN-approximation error, which is usually fine but worth knowing about. C — vector DBs are extra infrastructure; the regular-DB approach reuses existing capacity. K — does the team operate a vector DB today? If no, the cost of learning that operational profile is real and isn't on the technical comparison. R — extra service is a new failure path; consider whether the regular-DB path can be a degraded fallback.
Same TRACK answer, with two additions. (1) The decision is rarely vector-DB-or-not. It's almost always a portfolio: regular DB for the metadata join and filtering, vector DB for the similarity lookup, with the result reconciled. The 'or' framing in the question is itself worth unpacking. (2) The K dimension is usually the binding constraint and the one candidates underweight. A team that has never operated a vector DB will struggle with ANN index rebuilds, parameter tuning, and memory-vs-recall trade-offs that are not visible at architecture time. If the team's existing skill is Postgres-heavy, the right answer might be 'extend Postgres with pgvector and pay the small A cost' even when the technical comparison favors a dedicated vector DB. The pattern: K often beats T, A, and C in mature engineering decisions, and naming K is the move that demonstrates the difference between architecting and shipping.
Questioned the framing first (it's a portfolio, not a binary), then named K as the dimension most candidates underweight. The pattern — K is what separates 'right architecture in vacuum' from 'right architecture for this team' — is portable to every greenfield-vs-existing-stack question, every build-vs-buy question, every adopt-new-technology question. Naming K is the L7 signal.
| Dimension | Smaller fast model | Mid-size model | Large frontier model |
|---|---|---|---|
| T — Throughput (QPS/$) | Best — high QPS/$. | Moderate QPS/$. | Worst — low QPS/$. |
| R — Recovery (failure modes) | Simple to operate; well-understood failure modes. | Standard recovery profile; most production stacks support it. | Vendor dependency = worse blast radius if API degrades. |
| A — Accuracy | Lowest. Often 'good enough' for narrow tasks. | Often close to frontier on practical tasks. | Best on hard tasks; marginal on easy ones. |
| C — Cost per request | Lowest cost per request. | Moderate cost per request. | 10×+ cost per request vs smaller models. |
| K — Team operational skill | Requires distillation/finetuning skill if you're going below frontier. | Standard ops skill; widely deployed. | Simplest to operate — vendor handles ops. |
| Choose when | When A bar is achievable with a narrow task, T budget is tight, and team has distillation skill. | Default for most production workloads. Best balance unless one dimension is binding. | When A is the single binding constraint, the task is hard, and the cost budget supports it. Otherwise consider cascading: large model only on uncertain or hard queries. |
There is no universal answer. The right choice is the one that fits the binding TRACK dimension. Cascading routing — small for most queries, large for uncertain — is often the right Staff answer because it lets you collect the T and C benefits of small while preserving A on the hard cases.
The interviewer asks any question of the form 'why not just use [X]?' or 'why not [the obvious choice]?'
Run TRACK out loud. List which dimensions [X] wins on and which it loses on, then state which dimension is binding for this design and why.
Practice this. Time yourself.
You have 10 minutes. For each of these three trade-off questions, write a TRACK-decomposed answer in 3-5 sentences. Name which dimensions move which way, which is binding, and what you'd commit to. Then write what's missing if you skipped any TRACK dimension. (a) 'Should we use a streaming pipeline or batch features?' (b) 'Should we cache LLM responses?' (c) 'Should we self-host the model or use an API?'
Self-assessment rubric
| Dimension | Weak | Passing | Strong | Staff bar |
|---|---|---|---|---|
| Dimension naming | Named only T and C. Missed R, A, or K. | Named all five TRACK dimensions for at least 2 of 3 questions. | Named all five for all three questions; correctly identified the binding dimension in each. | Named all five plus the often-missed K dimension in each, and explicitly noted that K is binding in at least one of the three (typically 'self-host vs API'). |
| Commitment + trade-off | Said 'it depends' for any of the three. | Committed to a position for each. | Committed and named the binding dimension. | Committed, named the binding dimension, named the willingness-to-trade ratio that would change the answer (e.g., 'if A budget tightens by 10%, the answer flips to X'). |
| Portfolio framing | Treated each as a binary. | Recognized at least one as a portfolio rather than binary (e.g., 'cache the easy queries, route the hard to live inference'). | Framed all three as portfolios where appropriate. | Explicitly named that 'or' framings often hide portfolios, and used TRACK to identify which dimensions improve in the portfolio framing. |
Reveal model solution
Common failures
- ✗Listed all five dimensions but did not name which was binding. TRACK without a binding dimension is enumeration, not commitment.
- ✗Missed K on the self-host vs API question. K is usually the dominant dimension for that question, and missing it is the canonical Senior failure.
- ✗Treated the cache question as binary instead of as a portfolio. Most cache questions in production are 'which queries' not 'cache or not.'
- ✗Did not name the willingness-to-trade ratio for any of the three. The trade ratio is what converts TRACK from a description into a decision.
The TRACK Wallet Card
TRACK dimensions
- T — Throughput
- QPS, tokens/sec, training examples/sec. The 'scale' dimension.
- R — Recovery
- RTO, RPO, blast radius, reversibility. The most-forgotten dimension.
- A — Accuracy
- Quality metric — recall, precision, calibration, faithfulness.
- C — Cost
- Per-request, per-training-run, per-incident dollars.
- K — Knowledge
- What the team knows; the often-binding hidden dimension.
Reflex sentences
- Replace 'it depends'
- 'It depends on whether [TRACK dimension]; if A then X, if B then Y.'
- Defending against 'why not X'
- 'X wins on [T or A]; loses on [R, C, or K]. Binding for us is [Z].'
- Naming the trade ratio
- 'I'd commit to X. If [TRACK dimension] tightens by Y%, the answer flips to Z.'
- Portfolio framing
- 'It's rarely a binary — use X for the easy case, Y for the hard case.'
L6 candidate at a top AI lab, fifth round. Strong technical content throughout the loop, especially on LLM serving and RAG. Interviewer was a Staff engineer on the platform team who pushed hard on trade-off questions.
Every time the interviewer asked a 'why not X?' question — 'why not GPT-4 for everything?', 'why not just self-host?', 'why not cache?' — the candidate gave a credible single-dimension answer ('too expensive', 'too much ops', 'cache invalidation is hard'). Each answer was technically correct. None of them named more than one TRACK dimension. After six such exchanges, the interviewer's confidence in the candidate's depth dropped — not because the answers were wrong, but because the candidate appeared unable to hold multiple trade-off dimensions in their head simultaneously.
Minute 38, after the sixth single-dimension trade answer, the interviewer asked the closing question: 'Tell me about a design decision you got wrong, in retrospect.' The candidate told a credible story about underestimating cost. The interviewer's note read 'consistently treats trade-offs as single-dimension; suggests difficulty operating in real Staff-level decisions where multiple dimensions move at once.' The candidate scored L5 on a loop where the technical content alone would have justified L6.
Once per probe, run TRACK out loud. 'GPT-4 wins on A; loses on T and C; K is comparable for us because we already use it. Binding for us is C. Commit: smaller-model-with-fallback.' Each 'why not X' question becomes a 30-second TRACK pass that demonstrates exactly what was missing from the actual answers: the ability to hold multiple dimensions simultaneously. The technical knowledge was already there.
Senior candidates know the trade-offs exist. Staff candidates name the dimensions and commit to the binding one. The vocabulary is what makes the difference visible in the room. Practice TRACK as a reflex, not as a checklist — the goal is to be unable to give a single-dimension trade-off answer even under pressure.