The Interviewer's Rubric: What L6/L7 Actually Scores
The official rubric is generic. The unspoken rubric is what gets you hired. This lesson names the 7 specific signals interviewers grade in any AI design round, and the moves that signal each one. Every later lesson in this course is one or two of these seven signals expanded into a technique.
The phrase 'demonstrates strong technical depth' appears on every major company's official interview rubric. It does not tell you what to do. After the loop, the debrief is conducted in different language entirely — 'committed to a position,' 'named the trade-off,' 'tied the failure mode back to her own design,' 'punted on the objective.' The official rubric is the surface. The seven signals in this lesson are what the debrief is actually about. Practicing them individually is more productive than practicing 'depth' as one thing.
None of these signals are technical. They are behaviors a senior engineer demonstrates while doing technical work. You already have the technical content; the loop is not testing whether you know what an embedding is. It is testing whether you can convert that knowledge into the moves a Staff engineer makes by reflex. The rest of the course layers technique upon technique onto these seven signals.
The 7-Signal Rubric
'Demonstrates strong technical depth' is not what interviewers are actually scoring. They are scoring seven specific behaviors that compose into 'depth' but that you can practice individually and signal deliberately. This lesson names those seven signals, and the rest of the course is built against them.
- Signal 1: Commitment under uncertainty. Do you commit to a position with appropriate hedges, or do you enumerate options without choosing? 'I'd go with X, knowing it costs us Y' scores higher than 'we could do X, Y, or Z, depending.' The interviewer is hiring someone who can decide, not someone who can survey.
- Signal 2: Trade-off explicitness. Do you name the dimensions of your trade-offs? 'It's a latency-vs-throughput trade-off' beats 'it depends.' The vocabulary of trade-offs is the vocabulary of engineering maturity. Lesson 1.3 builds it out.
- Signal 3: Diagnostic before fix. When the interviewer hands you a problem, do you diagnose first or jump to a solution? The diagnostic move ('what's the actual bottleneck?', 'what's the objective?') is what separates designing the right system from solving the wrong one quickly.
- Signal 4: Operational reality. Do you ask about the team that will own this (deploy frequency, on-call burden, current skills, existing infra)? Or do you design greenfield and ignore that the team has to live with it? Operational questions signal that you have led teams, not just shipped code.
- Signal 5: Closing the loop. When you make a decision at minute 4, do you reference it at minute 35? Tying a late decision back to an early commitment is the highest-craft move the interview rewards. It demonstrates that you held the design coherent across 45 minutes, a Staff signal that no technical content can substitute for.
- Signal 6: Failure modes downstream of your own design. When asked 'what's the biggest failure mode?', do you name a generic failure (latency spike, OOM) or a failure that your specific design choices make more likely? The latter is what distinguishes someone who has operated their own systems from someone who has only built them.
- Signal 7: Objective integrity. Do you hold the objective (commit to a primary metric, propose a trade-off ratio, revise it under pushback), or do you punt to 'the team decides'? Holding the objective is the L6/L7 watershed. Punting reads as 'I am ready to execute on someone else's objective,' which is Senior.
Read this rubric before any system design loop. Use it as the lens that converts technical knowledge into the moves the interviewer is actually scoring.
Consider the question 'how would you reduce latency?' The Senior answer ('quantization, batching, more GPUs') hits zero signals. The Staff answer ('I'd want TTFT (time to first token) and inter-token latency measured separately first; they are different phases with different fixes. I'm committing to that diagnostic before I propose anything') hits Signals 1, 2, and 3, and primes Signal 5. Same technical content, very different score.
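To make the diagnostic in the Staff answer concrete, here is a minimal sketch of separating the two numbers. It assumes you can record one timestamp when the request is sent and one per streamed token; the function name `split_latency` and the input shape are hypothetical, not taken from any particular serving stack.

```python
from statistics import mean
from typing import Dict, List


def split_latency(request_sent: float, token_times: List[float]) -> Dict[str, float]:
    """Split one streamed response into TTFT and the mean inter-token gap.

    request_sent: timestamp in seconds when the request left the client.
    token_times: arrival timestamps in seconds for each token, in order.
    """
    if not token_times:
        raise ValueError("no tokens received")
    # Time to first token: everything that happens before output starts.
    ttft = token_times[0] - request_sent
    # Inter-token latency: gaps between consecutive tokens once output has started.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    inter_token = mean(gaps) if gaps else 0.0
    return {"ttft_s": ttft, "inter_token_s": inter_token}


# Illustrative timestamps only, not measurements from a real system.
result = split_latency(0.0, [0.5, 0.75, 1.0, 1.25])
print(result)
```

Reporting the two numbers separately is what lets you say which phase regressed before naming a fix, which is the point of the diagnostic move.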
The interviewer asks: 'Walk me through what you'd do if the system started failing at p99 in production at 3am.' Your first sentence is graded.
This is an operational-reality probe disguised as a technical question. The interviewer is checking Signal 4 (operational reality) and Signal 6 (failure modes downstream of your own design).
I'd look at the logs and dashboards to figure out what's failing, then start triaging.
I'd start with the on-call dashboard — check latency, error rate, and resource utilization across the request path. Look for recent deploys that correlate with the regression. If it's a known failure mode, follow the runbook; if not, page in a teammate.
Three things in parallel. One: check whether there was a deploy in the last 24 hours. Most p99 regressions in this kind of system are caused by recent pushes, not external shifts, so rollback is the default. Two: open the per-layer latency dashboard and see which layer's p99 moved. The system is decomposed by phase for exactly this moment. Three: check whether the upstream caller is sending a different shape of request (long prompts, a new traffic class) that the system hasn't seen before. After 90 seconds I should know whether to roll back, mitigate at re-rank, or escalate. The runbook reflects that decision tree.
Same triage with two operational additions. (1) I'd own the on-call for this system. Not as a stretch goal — explicitly, written down. The team that designs the system has to be the team paged on it; otherwise the design ossifies because the people who would change it don't feel the cost. (2) The 3am page itself is data: which failure mode triggered the page, and was the dashboard the on-call needed actually populated? Every page that didn't lead to a 60-second diagnosis is a missing piece of observability — I'd track those as defects against the observability roadmap. The pattern: on-call is not an external cost imposed on the design team; it is the closed-loop signal that makes the design improve. Treating it that way is what separates a system you built from a system you own.
Named on-call ownership as a design decision, not a staffing decision. Connected the 3am page to the observability roadmap as a closed-loop improvement. Demonstrated Signal 4 (operational reality) and Signal 6 (treating 'observability didn't catch this' as a failure of your own design choices) in the same answer. 'The page is data; the observability gap is the defect' is a portable move you can use on every system design question.
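The three-check triage in the answers above (recent deploy, per-layer p99, request shape) can be written down as a small decision function, which is roughly what 'the runbook reflects that decision tree' implies. This is a sketch under stated assumptions: the type `TriageInputs`, the function `triage`, the 50 ms threshold, the layer names, and the priority order of the checks are all hypothetical choices made for illustration. The answer itself runs the checks in parallel.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class TriageInputs:
    hours_since_last_deploy: Optional[float]  # None if no recent deploy is known
    p99_delta_ms_by_layer: Dict[str, float]   # p99 change per layer vs. baseline
    request_shape_changed: bool               # e.g. longer prompts, new traffic class


def triage(inputs: TriageInputs,
           deploy_window_h: float = 24.0,
           layer_threshold_ms: float = 50.0) -> Tuple[str, str]:
    """Return (action, reason); action is 'rollback', 'mitigate', or 'escalate'."""
    # Check 1: a deploy inside the window makes rollback the default.
    recent = inputs.hours_since_last_deploy
    if recent is not None and recent <= deploy_window_h:
        return ("rollback", f"deploy {recent:.0f}h ago")

    # Check 2: find the layer whose p99 moved the most (threshold is an assumption).
    if inputs.p99_delta_ms_by_layer:
        layer, delta = max(inputs.p99_delta_ms_by_layer.items(), key=lambda kv: kv[1])
        if delta >= layer_threshold_ms:
            return ("mitigate", f"p99 moved at layer '{layer}'")

    # Check 3: an unfamiliar request shape with no obvious layer gets escalated.
    if inputs.request_shape_changed:
        return ("escalate", "request shape changed upstream")
    return ("escalate", "no known cause")


# Illustrative inputs only.
after_deploy = triage(TriageInputs(3.0, {"retrieval": 5.0, "rerank": 120.0}, False))
no_deploy = triage(TriageInputs(None, {"retrieval": 5.0, "rerank": 120.0}, False))
new_traffic = triage(TriageInputs(None, {"retrieval": 5.0}, True))
print(after_deploy, no_deploy, new_traffic)
```

In an interview the code is not the point. Being able to state the branches and their defaults in one breath is what reads as Signal 4.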
The interviewer asks an open-ended question with no obvious 'right' answer ('how would you start', 'what would you build first', 'where would you spend a saved millisecond').
These are commitment probes (Signal 1). The wrong response is enumeration; the right response is a single committed position with the trade-off named.
The seven signals as they appear in the post-interview debrief.
What they score
- 'Committed to a position' (Signal 1). Did the candidate say 'I'd go with X,' or did they offer a survey of options?
- 'Named the trade-off' (Signal 2). Did they say what they were choosing between, in two-or-three-word dimensions, or did they say 'it depends'?
- 'Diagnosed before fixing' (Signal 3). When given a problem, did they ask the diagnostic question, or did they propose a fix and then explain it?
- 'Asked about operational reality' (Signal 4). Did they ask about on-call, deploys, team capacity, or did they design greenfield in a vacuum?
- 'Closed the loop' (Signal 5). Did they reference a decision from minute 4 in an answer at minute 35?
- 'Owned the failure mode' (Signal 6). Did they name a failure that their own design choices made more likely, or a generic failure?
- 'Held the objective' (Signal 7). Did they commit to a primary metric and a trade-off ratio, or did they defer to the team?
Why it's not on the rubric
These bullets are not on the rubric document because they are behaviors, not knowledge. The rubric is written to be calibratable across interviewers; the debrief is conducted in the real language people use about their colleagues. The signals are how you get talked about after the loop — and that conversation is what determines the offer.
How to signal it
- Practice committing. Replace 'we could do X, Y, or Z' with 'I'd commit to Y; the trade-off is Z.'
- Build trade-off vocabulary (Lesson 1.3, TRACK). Make 'it depends' into 'it depends on whether [specific variable]; if A then X, if B then Y.'
- Lead with diagnosis. The first sentence in response to any 'how would you fix' question should be a diagnostic question, not a fix.
- Ask one operational question per interview: 'how often does this team currently deploy?' or 'what's the current on-call burden?' The question itself is the signal.
- At minute 30+, deliberately reference a decision you made at minute 5. Even saying 'this connects to the willingness-to-trade ratio we established earlier' is enough.
- When asked about failure modes, name one your own design caused. The phrase 'this system's daily retraining makes silent drift more likely' is worth more than 'GPU OOM.'
- When asked about the objective, commit. The phrase 'I'd commit to X as primary with Y as guardrails and a Z ratio' beats every form of 'the team decides.'
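One way to see what 'X as primary with Y as guardrails and a Z ratio' commits you to is to write it as an acceptance rule. The sketch below is one possible reading, not a definition from this course: the `Objective` type, the `accept_change` function, the metric names, and every number are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Objective:
    primary: str                  # metric to maximize
    guardrails: Dict[str, float]  # metric name -> maximum allowed value
    traded: str                   # metric we agree to spend (lower is better)
    min_gain_per_unit: float      # trade ratio: primary gain required per unit spent


def accept_change(obj: Objective,
                  baseline: Dict[str, float],
                  candidate: Dict[str, float]) -> bool:
    """Accept a change only if guardrails hold and the trade clears the ratio."""
    # Guardrails are hard limits; no primary gain buys them back.
    for name, limit in obj.guardrails.items():
        if candidate[name] > limit:
            return False
    gain = candidate[obj.primary] - baseline[obj.primary]
    spend = candidate[obj.traded] - baseline[obj.traded]
    if spend <= 0:
        return gain >= 0  # nothing spent, so any non-negative gain is acceptable
    return gain / spend >= obj.min_gain_per_unit


# Illustrative objective: accuracy is primary, p99 is a guardrail, cost is traded.
objective = Objective("accuracy_pct", {"p99_ms": 800.0}, "cost_usd_per_1k", 0.5)
baseline = {"accuracy_pct": 80.0, "p99_ms": 600.0, "cost_usd_per_1k": 2.0}
good = {"accuracy_pct": 82.0, "p99_ms": 700.0, "cost_usd_per_1k": 4.0}
weak = {"accuracy_pct": 80.5, "p99_ms": 700.0, "cost_usd_per_1k": 4.0}
slow = {"accuracy_pct": 85.0, "p99_ms": 900.0, "cost_usd_per_1k": 4.0}
print(accept_change(objective, baseline, good),
      accept_change(objective, baseline, weak),
      accept_change(objective, baseline, slow))
```

Stating the rule this explicitly is also what makes it revisable under pushback: you change one number and say why, instead of retreating to 'the team decides.'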
Practice this. Time yourself.
You have 7 minutes. The interviewer just asked: 'You've designed this RAG system. What's the most important failure mode?' Write three answers to this question — one each scoring at L4, L6, and L7 against the 7-Signal Rubric. For each, name which signals it hits and which it misses. Time yourself. The goal is to internalize the difference between Senior and Staff answers on the same factual content.
Self-assessment rubric
| Dimension | Weak | Passing | Strong | Staff bar |
|---|---|---|---|---|
| L4 answer authenticity | L4 answer is a strawman ('we'd crash'). | L4 names a generic failure mode (latency, OOM). | L4 reads like a real Senior-tier answer — credible but generic. | L4 captures the specific failure mode an inexperienced candidate would actually propose under interview pressure (often 'the vector database goes down'). |
| L6 answer signal hit | L6 is just a more detailed L4. | L6 hits Signals 1 and 2 (commits, names trade-off). | L6 hits Signals 1, 2, and 3 (adds a diagnostic before the fix). | L6 hits Signals 1, 2, 3, and 4: it also names operational reality, meaning what the team has to know to debug this. |
| L7 answer signal hit + meta-pattern | L7 is L6 with more words. | L7 hits Signal 6 (failure mode downstream of own design). | L7 hits Signals 5 and 6 — connects back to an earlier design decision. | L7 hits Signals 5, 6, 7 — references the earlier objective commitment, names the failure that the design's own choices made more likely, and connects it to a portable pattern the reader can use elsewhere. |
Common failures
- Wrote three different failure modes instead of three different framings of the same failure mode. The drill is about how the same content sounds at different levels, not about cataloging failures.
- L7 was just longer than L6. Length is not the signal; signal-hit count is.
- Didn't reference any earlier-decision commitment in L7. Signal 5 (closing the loop) requires an earlier decision to close on, so write the L7 answer as if it's coming at minute 35 of a 45-minute interview where prior commitments exist.
- Used generic 'best practices' language. The rubric grades specificity. 'Silent drift' beats 'monitoring.' 'Nightly index rebuild' beats 'staleness.'
The 7-Signal Wallet Card
The seven signals (memorize the names)
- 1. Commitment: Did you commit to a position with the trade-off named?
- 2. Trade-off: Did you say what dimension you were choosing between?
- 3. Diagnostic: Did you diagnose before proposing a fix?
- 4. Operational: Did you ask about on-call, deploys, team capacity?
- 5. Close the loop: Did you reference an earlier decision in a later answer?
- 6. Own the failure: Did you name a failure downstream of your own design choices?
- 7. Hold the objective: Did you commit to a primary metric and trade ratio?
Reflex sentences (memorize these)
- Commitment: 'I'd commit to X; the trade-off is Y.'
- Trade-off: 'It depends on whether [specific]; if A then X, if B then Y.'
- Diagnostic: 'Before I propose a fix, I want [specific signal] first.'
- Operational: 'How often does this team currently deploy?'
- Loop-closing: 'This connects back to [earlier decision]. Here's how.'
- Own failure: 'The biggest failure mode is downstream of our choice to [X].'
- Objective: 'I'd commit to X as primary; willingness-to-trade ratio is Y.'
Composite from interviewer debriefs across three companies. Candidates with strong technical content and 5+ years of relevant experience, scoring at L5 or low-L6, repeatedly missed promotion to L6/L7 despite excellent system design fundamentals.
In every case the failure was the same: the candidate could answer technical questions correctly but defaulted to enumeration over commitment, to surveys over decisions, and to clarifying questions in place of held positions. They asked good questions but did not act on the answers as commitments. They referenced techniques but did not name trade-offs. They proposed fixes but did not lead with diagnosis. The interviewer left with the impression of a strong engineer who would execute well on someone else's design — not someone who would set the design themselves.
Each interview had a moment where the candidate could have crossed from Senior to Staff with a single sentence. 'I'd commit to X.' 'The trade-off is Y.' 'Before I propose a fix, what's the actual signal?' 'This connects back to the choice we made at minute 4.' In every case, the candidate had the technical knowledge to say the sentence. They had not internalized the move as reflex, and under interview pressure they defaulted to enumeration.
Practice the seven reflex sentences from the wallet card until they come without thought. The technical content is already there; the moves are the gap. Most candidates can close that gap in two weeks of deliberate mock interviews focused on signal-hit count rather than technical correctness.
Senior engineers know things. Staff engineers commit to things. The seven signals are the practical manifestation of that distinction in an interview room. Practice them as moves, not as principles.