Module 1 · Lesson 1 · Foundations · 28 min

The Interviewer's Rubric: What L6/L7 Actually Scores

The official rubric is generic. The unspoken rubric is what gets you hired. This lesson names the 7 specific signals interviewers grade in any AI design round, and the moves that signal each one. Every later lesson in this course is one or two of these seven signals expanded into a technique.

The phrase 'demonstrates strong technical depth' appears on every major company's official interview rubric. It does not tell you what to do. After the loop, the debrief is conducted in different language entirely — 'committed to a position,' 'named the trade-off,' 'tied the failure mode back to her own design,' 'punted on the objective.' The official rubric is the surface. The seven signals in this lesson are what the debrief is actually about. Practicing them individually is more productive than practicing 'depth' as one thing.

None of these signals are technical. They are behaviors a senior engineer demonstrates while doing technical work. You already have the technical content; the loop is not testing whether you know what an embedding is. It is testing whether you can convert that knowledge into the moves a Staff engineer makes by reflex. The rest of the course builds one technique after another against these seven signals.

Framework

The 7-Signal Rubric

The official rubric at every major company says things like 'demonstrates strong technical depth.' That sentence is not what interviewers are scoring. They are scoring seven specific behaviors that compose into 'depth' but that you can practice individually and signal deliberately. This lesson names those seven signals, and the rest of the course is built against them.

  1. Signal 1: Commitment under uncertainty
     Do you commit to a position with appropriate hedges, or do you enumerate options without choosing? 'I'd go with X, knowing it costs us Y' scores higher than 'we could do X, Y, or Z, depending.' The interviewer is hiring someone who can decide, not someone who can survey.
  2. Signal 2: Trade-off explicitness
     Do you name the dimensions of your trade-offs? 'It's a latency-vs-throughput trade-off' beats 'it depends.' The vocabulary of trade-offs is the vocabulary of engineering maturity. Lesson 1.3 builds it out.
  3. Signal 3: Diagnostic before fix
     When the interviewer hands you a problem, do you diagnose first or jump to a solution? The diagnostic move ('what's the actual bottleneck?', 'what's the objective?') is what separates designing the right system from solving the wrong one quickly.
  4. Signal 4: Operational reality
     Do you ask about the team that will own this: deploy frequency, on-call burden, current skills, existing infra? Or do you design greenfield and ignore that the team has to live with it? Operational questions signal that you have led teams, not just shipped code.
  5. Signal 5: Closing the loop
     When you make a decision at minute 4, do you reference it at minute 35? Tying a late decision back to an early commitment is the highest-craft move the interview rewards. It demonstrates that you held the design coherent across 45 minutes, a Staff signal that no technical content can substitute for.
  6. Signal 6: Failure modes downstream of your own design
     When asked 'what's the biggest failure mode?', do you name a generic failure (latency spike, OOM) or a failure that your specific design choices make more likely? The latter is what distinguishes someone who has operated their own systems from someone who has only built them.
  7. Signal 7: Objective integrity
     Do you hold the objective (commit to a primary metric, propose a trade-off ratio, revise it under pushback), or do you punt to 'the team decides'? Holding the objective is the L6/L7 watershed. Punting reads as 'I am ready to execute on someone else's objective,' which is Senior.

When to use

Read this rubric before any system design loop. Use it as the lens that converts technical knowledge into the moves the interviewer is actually scoring. Every other lesson in this course is one or two of these seven signals expanded into a technique.

Worked example

Consider the question 'how would you reduce latency?' The Senior answer ('quantization, batching, more GPUs') hits zero signals. The Staff answer ('I'd measure TTFT and inter-token latency separately first, since they are different phases with different fixes; I'm committing to that diagnostic before I propose anything') hits Signals 1, 2, and 3, and primes Signal 5. Same technical content, very different score.
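To make the Staff answer's first move concrete: time to first token (TTFT) covers queueing and prompt prefill, while inter-token latency reflects the per-token decode step. A regression in one therefore usually calls for a different fix than a regression in the other. Below is a minimal sketch of the split. It assumes you have the request start time and the arrival timestamp of each streamed token, both on the same clock. The function name `split_latency` is hypothetical.

```python
# Illustrative sketch; the function name is hypothetical, not from any library.
from statistics import mean


def split_latency(request_start: float, token_times: list[float]) -> tuple[float, float]:
    """Split one streamed response into (TTFT, mean inter-token latency), in seconds.

    Assumes token_times holds the arrival time of each streamed token, in order,
    measured on the same clock as request_start.
    """
    if not token_times:
        raise ValueError("no tokens received; TTFT is undefined")
    # Time to first token: covers queueing and prompt prefill.
    ttft = token_times[0] - request_start
    # Gaps between consecutive tokens: the per-token decode phase.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    inter_token = mean(gaps) if gaps else 0.0
    return ttft, inter_token
```

For example, `split_latency(0.0, [0.8, 0.85, 0.9, 0.95])` attributes 0.8 s to the first token and roughly 0.05 s to each later token. That result points the diagnostic at the prefill side rather than at decode.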

Calibration ladder

The interviewer asks: 'Walk me through what you'd do if the system started failing at p99 in production at 3am.' Your first sentence is graded.

An operational-reality probe disguised as a technical question. The interviewer is checking Signal 4 (operational reality) and Signal 6 (failure modes you can name).

L4 · Mid

I'd look at the logs and dashboards to figure out what's failing, then start triaging.

Missed: Treated the question as 'what do you do during triage' rather than 'how do you think about operating this system.' Will be a strong incident-response engineer, not a Staff designer.
L5 · Senior

I'd start with the on-call dashboard — check latency, error rate, and resource utilization across the request path. Look for recent deploys that correlate with the regression. If it's a known failure mode, follow the runbook; if not, page in a teammate.

Missed: Knew the textbook triage steps. Missed the chance to demonstrate ownership and the design-feedback loop that on-call provides.
L6 · Staff

Three things in parallel. One: check whether there was a deploy in the last 24 hours; most p99 regressions in this kind of system are caused by recent pushes, not external shifts, so rollback is the default. Two: open the per-layer latency dashboard and see which layer's p99 moved. The system is decomposed by phase for exactly this moment. Three: check whether the upstream caller is sending a shape of request the system hasn't seen before, such as long prompts or a new traffic class. After 90 seconds I should know whether to roll back, mitigate at re-rank, or escalate. The runbook reflects that decision tree.

Missed: Strong triage answer. Missing the meta-move — naming that the design team should own the on-call and that pages are signal for the observability roadmap.
L7 · Principal

Same triage with two operational additions. (1) I'd own the on-call for this system. Not as a stretch goal — explicitly, written down. The team that designs the system has to be the team paged on it; otherwise the design ossifies because the people who would change it don't feel the cost. (2) The 3am page itself is data: which failure mode triggered the page, and was the dashboard the on-call needed actually populated? Every page that didn't lead to a 60-second diagnosis is a missing piece of observability — I'd track those as defects against the observability roadmap. The pattern: on-call is not an external cost imposed on the design team; it is the closed-loop signal that makes the design improve. Treating it that way is what separates a system you built from a system you own.

What scored L7

Named on-call ownership as a design decision, not a staffing decision. Connected the 3am page to the observability roadmap as a closed-loop improvement. Demonstrated Signal 4 (operational reality) and Signal 6 (treating 'observability didn't catch this' as a failure of your own design choices) in the same answer. The pattern 'the page is data; the observability gap is the defect' is a portable move you can use on every system design question.
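The second move in the L6 answer ('see which layer's p99 moved') amounts to comparing each layer's p99 before and after the regression. Below is a minimal sketch. It assumes per-layer latency samples in milliseconds for a healthy window and a regressed window, and it uses the nearest-rank percentile definition. The function names and any layer names you pass in are hypothetical.

```python
# Illustrative sketch; function names are hypothetical, layer names are up to the caller.
import math


def p99(samples: list[float]) -> float:
    """Nearest-rank 99th percentile of a non-empty list of latency samples."""
    ordered = sorted(samples)
    rank = math.ceil(0.99 * len(ordered))  # 1-based rank of the p99 sample
    return ordered[rank - 1]


def layer_that_moved(before: dict[str, list[float]],
                     after: dict[str, list[float]]) -> tuple[str, float]:
    """Return the layer whose p99 rose the most between two windows, and by how much.

    Both dicts map a layer name to its latency samples; every layer in `before`
    must also appear in `after`.
    """
    deltas = {layer: p99(after[layer]) - p99(before[layer]) for layer in before}
    worst = max(deltas, key=lambda layer: deltas[layer])
    return worst, deltas[worst]
```

The ranking is deliberately crude. Ordering layers by absolute p99 increase is enough to decide where to look first, but it does not explain why that layer moved.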

Pattern recognition
When you see

The interviewer asks an open-ended question with no obvious 'right' answer ('how would you start', 'what would you build first', 'where would you spend a saved millisecond').

Think

These are commitment probes (Signal 1). The wrong response is enumeration; the right response is a single committed position with the trade-off named.

Open-ended questions are the rubric's primary instrument for measuring commitment. Candidates who enumerate options look thorough but score poorly because the interviewer cannot grade 'I'd consider X, Y, or Z.' Candidates who commit ('I'd go with Y because of the trade-off Z') can be graded — and the grading is based on the quality of the trade-off named, not the choice itself. You almost never lose points for committing to a defensible position with an explicit trade-off. You almost always lose points for refusing to choose.
Unspoken rubric

The seven signals as they appear in the post-interview debrief.

What they score
  • 'Committed to a position' (Signal 1). Did the candidate say 'I'd go with X,' or did they offer a survey of options?
  • 'Named the trade-off' (Signal 2). Did they say what they were choosing between, in two-or-three-word dimensions, or did they say 'it depends'?
  • 'Diagnosed before fixing' (Signal 3). When given a problem, did they ask the diagnostic question, or did they propose a fix and then explain it?
  • 'Asked about operational reality' (Signal 4). Did they ask about on-call, deploys, team capacity, or did they design greenfield in a vacuum?
  • 'Closed the loop' (Signal 5). Did they reference a decision from minute 4 in an answer at minute 35?
  • 'Owned the failure mode' (Signal 6). Did they name a failure that their own design choices made more likely, or a generic failure?
  • 'Held the objective' (Signal 7). Did they commit to a primary metric and a trade-off ratio, or did they defer to the team?

Why it's not on the rubric

These bullets are not on the rubric document because they are behaviors, not knowledge. The rubric is written to be calibratable across interviewers; the debrief is conducted in the real language people use about their colleagues. The signals are how you get talked about after the loop — and that conversation is what determines the offer.

How to signal it
  • Practice committing. Replace 'we could do X, Y, or Z' with 'I'd commit to Y; the trade-off is Z.'
  • Build trade-off vocabulary (Lesson 1.3, TRACK). Turn 'it depends' into 'it depends on whether [specific variable]; if A then X, if B then Y.'
  • Lead with diagnosis. The first sentence in response to any 'how would you fix' question should be a diagnostic question, not a fix.
  • Ask one operational question per interview: 'how often does this team currently deploy?' or 'what's the current on-call burden?' The question itself is the signal.
  • At minute 30+, deliberately reference a decision you made at minute 5. Even saying 'this connects to the willingness-to-trade ratio we established earlier' is enough.
  • When asked about failure modes, name one that your own design makes more likely. The phrase 'this system's daily retraining makes silent drift more likely' is worth more than 'GPU OOM.'
  • When asked about the objective, commit. The phrase 'I'd commit to X as primary with Y as guardrails and a Z ratio' beats every form of 'the team decides.'
Drill · 7 minutes

Practice this. Time yourself.

You have 7 minutes. The interviewer just asked: 'You've designed this RAG system. What's the most important failure mode?' Write three answers to this question, one each scoring at L4, L6, and L7 against the 7-Signal Rubric. For each, name which signals it hits and which it misses. The goal is to internalize how the same factual content scores at different levels.

Self-assessment rubric

L4 answer authenticity
  • Weak: The L4 answer is a strawman ('we'd crash').
  • Passing: L4 names a generic failure mode (latency, OOM).
  • Strong: L4 reads like a real Mid-level answer, credible but generic.
  • Staff bar: L4 captures the specific failure mode an inexperienced candidate would actually propose under interview pressure (often 'the vector database goes down').

L6 answer signal hit
  • Weak: L6 is just a more detailed L4.
  • Passing: L6 hits Signals 1 and 2 (commits, names trade-off).
  • Strong: L6 hits Signals 1, 2, and 3, and proposes a diagnostic before a fix.
  • Staff bar: L6 hits Signals 1, 2, 3, and 4, and names the operational reality: what the team has to know to debug this.

L7 answer signal hit + meta-pattern
  • Weak: L7 is L6 with more words.
  • Passing: L7 hits Signal 6 (failure mode downstream of own design).
  • Strong: L7 hits Signals 5 and 6, connecting back to an earlier design decision.
  • Staff bar: L7 hits Signals 5, 6, and 7. It references the earlier objective commitment, names the failure that the design's own choices made more likely, and connects it to a portable pattern the reader can use elsewhere.

Model solution
L4: "The vector database could go down. I'd add caching and retries to mitigate that." Hits zero signals: a generic infrastructure failure with a textbook mitigation. The interviewer learns nothing about the candidate's design judgment.

L6: "The most likely failure mode is silent retrieval-quality drift. Two diagnostics first: is the retrieval system failing (low recall@k on a labeled set), or is the generation failing (low faithfulness on a labeled answer set)? Without that decomposition I'd be guessing. If retrieval, the fix is index health monitoring plus shadow evaluation of new index builds. If generation, the fix is a faithfulness eval as a release gate. The operational reality is that the team has to commit to maintaining both label sets, which is a continuing investment, not a one-time setup." Hits Signals 1 (commits), 2 (names the dimension: retrieval quality vs generation quality), 3 (diagnostic first), and 4 (operational reality of label-set maintenance).

L7: "The single most likely failure is silent retrieval-quality drift, and it's downstream of a choice we made at minute 5: we decided to rebuild the vector index nightly to support fast content turnover. That choice is what makes drift more likely than it would be in a system that rebuilt weekly with stronger validation gates. The willingness-to-trade ratio we set up earlier (accept some retrieval-quality variance for content freshness) tells us how aggressively to monitor: if drift exceeds the ratio's tolerance, we roll back the nightly cadence. So the failure mode and the mitigation are both consequences of the same earlier decision. The portable pattern: any system that ships fast on a daily cadence is buying drift risk; the mitigation is observability calibrated to the willingness-to-trade ratio negotiated upstream." Hits Signals 5 (closes the loop on the minute-5 cadence decision and the willingness-to-trade ratio), 6 (failure mode is downstream of own design choice), and 7 (references the objective commitment), and reveals the portable pattern about daily-cadence systems buying drift risk.
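The L6 answer's first diagnostic (recall@k on a labeled set) and the L7 answer's rollback rule can be sketched together. The sketch rests on three assumptions:

  • The labeled set maps each query to the IDs of its relevant documents.
  • Each index build returns a ranked list of document IDs per query.
  • The willingness-to-trade ratio has been translated into a maximum tolerated drop in recall@k.

That translation and the function names are illustrative assumptions, and the generation-side faithfulness eval is not shown.

```python
# Illustrative sketch; function names are hypothetical, and expressing the
# willingness-to-trade ratio as an absolute recall@k drop is an assumption.


def recall_at_k(ranked: dict[str, list[str]],
                relevant: dict[str, set[str]],
                k: int) -> float:
    """Mean recall@k over a labeled query set.

    ranked:   query -> document IDs in the order the retriever returned them
    relevant: query -> labeled relevant document IDs (unlabeled queries are skipped)
    """
    scores = []
    for query, gold in relevant.items():
        if not gold:
            continue
        top_k = set(ranked.get(query, [])[:k])
        scores.append(len(top_k & gold) / len(gold))
    return sum(scores) / len(scores) if scores else 0.0


def should_roll_back(baseline: float, candidate: float, max_drop: float) -> bool:
    """True if the candidate index lost more recall@k than the agreed tolerance allows."""
    return baseline - candidate > max_drop
```

Compare the nightly build's recall@k against the last accepted build with `should_roll_back(baseline, candidate, max_drop)`. Used that way, the ratio negotiated upstream becomes a concrete release gate for the nightly cadence.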

Common failures

  • Wrote three different failure modes instead of three different framings of the same failure mode. The drill is about how the same content sounds at different levels, not about cataloging failures.
  • L7 was just longer than L6. Length is not the signal; signal-hit count is.
  • Didn't reference any earlier-decision commitment in L7. Signal 5 (closing the loop) requires an earlier decision to close on — write the L7 answer as if it's coming at minute 35 of a 45-minute interview where prior commitments exist.
  • Used generic 'best practices' language. The rubric grades specificity. 'Silent drift' beats 'monitoring.' 'Nightly index rebuild' beats 'staleness.'
Artifact · reference card

The 7-Signal Wallet Card

The seven signals (memorize the names)

1. Commitment
Did you commit to a position with the trade-off named?
2. Trade-off
Did you say what dimension you were choosing between?
3. Diagnostic
Did you diagnose before proposing a fix?
4. Operational
Did you ask about on-call, deploys, team capacity?
5. Close the loop
Did you reference an earlier decision in a later answer?
6. Own the failure
Did you name a failure downstream of your own design choices?
7. Hold the objective
Did you commit to a primary metric and trade ratio?

Reflex sentences (memorize these)

Commitment
'I'd commit to X; the trade-off is Y.'
Trade-off
'It depends on whether [specific]; if A then X, if B then Y.'
Diagnostic
'Before I propose a fix, I want [specific signal] first.'
Operational
'How often does this team currently deploy?'
Loop-closing
'This connects back to [earlier decision] — here's how.'
Own failure
'The biggest failure mode is downstream of our choice to [X].'
Objective
'I'd commit to X as primary; willingness-to-trade ratio is Y.'
Post-mortem · anonymized
Setup

Composite from interviewer debriefs across three companies. Candidates with strong technical content and 5+ years of relevant experience repeatedly came out of the loop leveled at L5 or low-L6 rather than the L6/L7 they were targeting, despite excellent system design fundamentals.

What happened

In every case the failure was the same: the candidate could answer technical questions correctly but defaulted to enumeration over commitment, to surveys over decisions, and to clarifying questions in place of held positions. They asked good questions but did not act on the answers as commitments. They referenced techniques but did not name trade-offs. They proposed fixes but did not lead with diagnosis. The interviewer left with the impression of a strong engineer who would execute well on someone else's design — not someone who would set the design themselves.

The moment

Each interview had a moment where the candidate could have crossed from Senior to Staff with a single sentence. 'I'd commit to X.' 'The trade-off is Y.' 'Before I propose a fix, what's the actual signal?' 'This connects back to the choice we made at minute 4.' In every case, the candidate had the technical knowledge to say the sentence. They had not internalized the move as reflex, and under interview pressure they defaulted to enumeration.

What they should have said

Practice the seven reflex sentences from the wallet card until they come without thought. The technical content is already there; the moves are the gap. Most candidates can close that gap in two weeks of deliberate mock interviews focused on signal-hit count rather than technical correctness.

Lesson

Senior engineers know things. Staff engineers commit to things. The seven signals are the practical manifestation of that distinction in an interview room. Practice them as moves, not as principles.