15
22 lessons

Autonomous Systems

Long-horizon agents, self-improvement, and the 2026 safety stack.

01

From Chatbots to Long-Horizon Agents (METR)

Learn
Python

In 2023 a chatbot answered a question in one turn. In 2026 a frontier model routinely runs minutes to hours on a single task. METR's Time Horizon 1.1 benchmark (January 2026) pu…

02

STaR, V-STaR, Quiet-STaR: Self-Taught Reasoning

Learn
Python

The smallest possible self-improvement loop sits inside the rationale. A model generates a chain of thought, keeps the ones that land on correct answers, and fine-tunes on those…
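The generate-then-filter step described in this teaser can be sketched as below. This is a minimal illustration, not the STaR authors' code: `toy_generate` is a hypothetical stand-in for a model call, the problem set is made up, and the fine-tuning step itself is left out.

```python
from typing import Callable

# (rationale, final_answer) pairs produced by a sampler.
Sample = tuple[str, str]


def filter_rationales(
    problems: list[tuple[str, str]],
    generate: Callable[[str, int], list[Sample]],
    k: int = 4,
) -> list[dict[str, str]]:
    """Keep only the sampled rationales whose final answer matches the gold answer."""
    kept = []
    for question, gold in problems:
        for rationale, answer in generate(question, k):
            if answer.strip() == gold.strip():
                kept.append(
                    {"question": question, "rationale": rationale, "answer": answer}
                )
    return kept


def toy_generate(question: str, k: int) -> list[Sample]:
    """Hypothetical sampler: adds two integers, wrong by one on odd sample indices."""
    a, b = (int(x) for x in question.split("+"))
    samples = []
    for i in range(k):
        guess = a + b + (i % 2)  # deliberately off by one for i = 1, 3, ...
        samples.append((f"{a} plus {b} gives {guess}", str(guess)))
    return samples


problems = [("2+3", "5"), ("10+7", "17")]
dataset = filter_rationales(problems, toy_generate, k=4)
# `dataset` is what a fine-tuning step would consume next.
```

In a real loop the surviving rationales would be fed to a fine-tuning job and the process repeated with the updated model.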

03

AlphaEvolve: Evolutionary Coding Agents

Learn
Python

Pair a frontier coding model with an evolutionary loop and a machine-checkable evaluator. Let the loop run long enough. It discovers a 4x4 complex-matrix multiplication procedur…

04

Darwin Gödel Machine: Self-Modifying Agents

Learn
Python

Schmidhuber's 2003 Gödel Machine required a formal proof that any self-modification was beneficial before accepting it. That proof is impossible in practice. Darwin Gödel Machin…

05

AI Scientist v2: Workshop-Level Research

Learn
Python

Sakana's AI Scientist v2 (Yamada et al., arXiv:2504.08066) runs the full research loop: hypothesis, code, experiments, figures, writeup, submission. It is the first system to ha…

06

Automated Alignment Research (Anthropic AAR)

Learn
Python

Anthropic ran parallel teams of Claude Opus 4.6 Autonomous Alignment Researchers in independent sandboxes, coordinating via a shared forum whose logs live outside any sandbox (s…

07

Recursive Self-Improvement: Capability vs Alignment

Learn
Python

Recursive self-improvement (RSI) is no longer speculation. The ICLR 2026 RSI Workshop in Rio (April 23-27) framed it as an engineering problem with concrete tooling. Demis Hassa…

08

Bounded Self-Improvement Designs

Learn
Python

Research has converged on four primitives for bounding a self-improvement loop. Formal invariants that must hold across every edit. Alignment anchors that cannot be modified. Mu…

09

Autonomous Coding Agent Landscape (SWE-bench, CodeAct)

Learn
Python

SWE-bench Verified went from 4% to 80.9% in under three years. The same Claude Sonnet 4.5 scored 43.2% on SWE-agent v1 and 59.8% on Cline autonomous: the scaffolding around the mod…

10

Claude Code Permission Modes and Auto Mode

Learn
Python

Claude Code exposes seven permission modes. "plan" asks before every action, "default" asks only for risky ones, "acceptEdits" auto-approves file writes but still confirms shell…

11

Browser Agents and Indirect Prompt Injection

Learn
Python

ChatGPT agent (July 2025) merged Operator and deep research into one browser/terminal agent and set BrowseComp SOTA at 68.9%. OpenAI shut Operator down August 31, 2025 — consoli…

12

Durable Execution for Long-Running Agents

Learn
Python

Production long-horizon agents do not run in `while True`. Every LLM call becomes an activity with checkpoint, retry, and replay. Temporal's OpenAI Agents SDK integration went G…

13

Action Budgets, Iteration Caps, Cost Governors

Learn
Python

A mid-sized e-commerce agent's monthly LLM cost jumped from $1,200 to $4,800 after its team enabled the "order-tracking" skill. That is not a pricing bug. That is an agent that …

14

Kill Switches, Circuit Breakers, Canary Tokens

Learn
Python

A kill switch is a boolean held outside the agent's edit surface — a Redis key, a feature flag, a signed config — that disables the agent entirely. A circuit breaker is finer-gr…
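A rough sketch of the two controls, under stated assumptions: a plain `dict` stands in for the external store (Redis, a feature flag service, or a signed config), the key name `agent:enabled` is hypothetical, and the breaker uses the common "trip after N consecutive failures" rule, which the teaser does not spell out.

```python
from typing import Callable


class KillSwitch:
    """Reads an on/off flag from a store the agent itself cannot write to."""

    def __init__(self, store: dict[str, str], key: str = "agent:enabled"):
        self._store = store
        self._key = key  # hypothetical key name

    def enabled(self) -> bool:
        # Fail closed: a missing key counts as "disabled".
        return self._store.get(self._key, "0") == "1"


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures (assumed policy)."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def open(self) -> bool:
        return self.failures >= self.max_failures

    def record(self, ok: bool) -> None:
        # A success resets the count; a failure increments it.
        self.failures = 0 if ok else self.failures + 1


def run_step(switch: KillSwitch, breaker: CircuitBreaker, tool: Callable[[], None]) -> str:
    """Check the global switch first, then the per-tool breaker, then call the tool."""
    if not switch.enabled():
        return "killed"
    if breaker.open:
        return "tripped"
    try:
        tool()
    except Exception:
        breaker.record(False)
        return "failed"
    breaker.record(True)
    return "ok"


def flaky() -> None:
    raise RuntimeError("tool down")


store = {"agent:enabled": "1"}
switch = KillSwitch(store)
breaker = CircuitBreaker(max_failures=2)

results = [run_step(switch, breaker, flaky) for _ in range(3)]
store["agent:enabled"] = "0"  # an operator flips the flag from outside the agent
final = run_step(switch, breaker, flaky)
```

The switch is checked before the breaker so that a global disable always wins, whatever state the individual breakers are in.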

15

HITL: Propose-Then-Commit

Learn
Python

The 2026 consensus on HITL is specific. It is not "the agent asks, the user clicks Approve." It is propose-then-commit: the proposed action is persisted to a durable store with …

16

Checkpoints and Rollback

Learn
Python

Every graph-state transition persists. When a worker crashes, its lease expires and another worker picks up at the latest checkpoint. Cloudflare Durable Objects hold state acros…

17

Constitutional AI and Rule Overrides

Learn
Python

Anthropic's January 22, 2026 Claude Constitution runs 79 pages and is CC0. It moves from rule-based to reason-based alignment and establishes a four-tier priority hierarchy: (1)…

18

Llama Guard and Input/Output Classification

Learn
Python

Llama Guard 3 (Meta, Llama-3.1-8B base, fine-tuned for content safety) classifies both LLM inputs and outputs against an MLCommons 13-hazard taxonomy across 8 languages. A 1B-IN…

19

Anthropic Responsible Scaling Policy v3.0

Learn
Python

RSP v3.0 went into effect February 24, 2026, replacing the 2023 policy. Two-tier mitigation: what Anthropic will do unilaterally vs what is framed as an industry-wide recommenda…

20

OpenAI Preparedness Framework and DeepMind FSF

Learn
Python

OpenAI Preparedness Framework v2 (April 2025) introduces Research Categories — Long-range Autonomy, Sandbagging, Autonomous Replication and Adaptation, Undermining Safeguards — …

21

METR Time Horizons and External Evaluation

Learn
Python

METR (ex-ARC Evals) has been an independent 501(c)(3) since December 2023. Their Time Horizon 1.1 benchmark (January 2026) fits a logistic curve to task-success probability vs log(exp…

22

CAIS, CAISI, and Societal-Scale Risk

Learn
Python

The Center for AI Safety (CAIS, San Francisco, founded 2022 by Hendrycks and Zhang) publishes the four-risk framework — malicious use, AI races, organizational risks, rogue AIs …