
Video Encoding & Ingestion Pipeline

Turn raw uploaded video into device-ready renditions reliably and cheaply at scale.

Scale to anchor on

Hundreds of thousands of new videos per day (UGC) or hundreds per week (studio), petabyte-scale ingest, and turnaround ranging from minutes to days depending on tier.

Requirements

Functional

  • Ingest uploaded video, validate it, and transcode it into the target encoding ladders (sets of resolution/bitrate renditions).
  • Generate thumbnails, captions, and audio tracks.
  • Persist outputs to durable storage.
  • Trigger downstream systems (CDN propagation, search index updates).
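An encoding ladder is a list of renditions at decreasing resolution and bitrate. The sketch below assumes a hypothetical `DEFAULT_LADDER` whose bitrates are placeholders, not recommendations, and trims the ladder to the source resolution so that, for example, a 720p upload is never upscaled to 1080p.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rung:
    """One rendition in an encoding ladder."""
    name: str
    height: int      # output height in pixels; width follows the source aspect ratio
    video_kbps: int  # target average video bitrate in kbit/s


# Hypothetical ladder; bitrates are placeholders, not tuned recommendations.
DEFAULT_LADDER = (
    Rung("1080p", 1080, 5000),
    Rung("720p", 720, 3000),
    Rung("480p", 480, 1200),
    Rung("360p", 360, 700),
    Rung("240p", 240, 400),
)


def ladder_for_source(source_height: int, ladder=DEFAULT_LADDER) -> list[Rung]:
    """Return the rungs at or below the source height, so nothing is upscaled.

    If the source is smaller than every rung, emit a single rendition at the
    source height using the lowest rung's bitrate.
    """
    rungs = [r for r in ladder if r.height <= source_height]
    if rungs:
        return rungs
    lowest = min(ladder, key=lambda r: r.height)
    return [Rung(f"{source_height}p", source_height, lowest.video_kbps)]
```

Per-title or content-aware ladder tuning would replace the static table; this only illustrates the no-upscale rule.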

Non-functional

  • Resumable on worker failure.
  • Idempotent steps so retries don't corrupt outputs.
  • Cost-efficient: compute dominates encoding cost.

High-level architecture

Uploads land in object storage. A workflow engine orchestrates validation, transcoding, thumbnail extraction, and indexing. Workers consume tasks from queues, persist outputs to storage, and emit events on completion. Encode workers run on spot/preemptible compute to cut cost.
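The orchestration order can be expressed as a dependency graph. This sketch uses Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) to group a hypothetical `ENCODE_DAG` into waves of steps that may run in parallel. It covers ordering only; a real workflow engine would also persist each step's state durably.

```python
from graphlib import TopologicalSorter

# Hypothetical encoding DAG: each step maps to the set of steps it depends on.
ENCODE_DAG = {
    "validate": set(),
    "transcode": {"validate"},
    "thumbnails": {"validate"},
    "captions": {"validate"},
    "index": {"transcode", "thumbnails", "captions"},
    "notify": {"index"},
}


def parallel_waves(dag: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into waves; all steps within a wave can run concurrently."""
    ts = TopologicalSorter(dag)
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # sorted only for deterministic output
        waves.append(ready)
        # A real engine marks each step done only when its worker reports success.
        ts.done(*ready)
    return waves
```

Here `parallel_waves(ENCODE_DAG)` yields validation first, then transcoding, thumbnails, and captions side by side, then indexing, then notification.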

Components

Upload service
Multipart upload to object storage; emits ingest event.
Workflow engine
Durable, resumable orchestration of the encoding DAG.
Worker pool
Stateless encoders running ffmpeg or similar.
Output store
Object storage for renditions, thumbnails, captions.
Notifier
Emits completion events to CDN, search, and product surfaces.
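A worker in the pool typically shells out to an encoder. The hypothetical `ffmpeg_rendition_cmd` helper below builds the argument list for one H.264/AAC rendition using standard ffmpeg options (`-vf scale`, `-c:v libx264`, `-b:v`, `-maxrate`, `-bufsize`, `-c:a aac`, `-b:a`); the bitrate and buffer sizing are illustrative assumptions, not tuned settings.

```python
def ffmpeg_rendition_cmd(src: str, dst: str, height: int, video_kbps: int,
                         audio_kbps: int = 128) -> list[str]:
    """Build the ffmpeg argv for one H.264/AAC rendition (values illustrative)."""
    return [
        "ffmpeg", "-y",                     # overwrite any dst left by a failed attempt
        "-i", src,
        "-vf", f"scale=-2:{height}",        # keep aspect ratio; -2 forces an even width
        "-c:v", "libx264",
        "-b:v", f"{video_kbps}k",
        "-maxrate", f"{video_kbps}k",       # cap bitrate peaks at the target rate
        "-bufsize", f"{2 * video_kbps}k",   # rate-control buffer of ~2 s at that rate
        "-c:a", "aac",
        "-b:a", f"{audio_kbps}k",
        dst,
    ]
```

A worker would run it with `subprocess.run(cmd, check=True)`, writing to a scratch path and uploading to the output store only after ffmpeg exits successfully, which keeps the worker itself stateless.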

Key decisions

Durable workflow over ad-hoc retry logic.
Encoding pipelines have many steps and many failure modes; a workflow engine handles retries, timeouts, and resumption in one place instead of in every worker.
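The core benefit of a durable workflow is that step completion is recorded outside the worker, so a restart resumes instead of starting over. This minimal sketch assumes a plain `dict` standing in for the engine's durable store; `run_workflow` is a hypothetical name, not any engine's API.

```python
from typing import Callable


def run_workflow(job_id: str,
                 steps: list[tuple[str, Callable[[], str]]],
                 store: dict[str, str]) -> dict[str, str]:
    """Run steps in order, recording each result under job_id/step in `store`.

    `store` is a stand-in for a durable database (an assumption of this sketch).
    Steps that already have a recorded result are skipped, so calling this
    again after a crash resumes from the first unfinished step.
    """
    results = {}
    for name, fn in steps:
        key = f"{job_id}/{name}"
        if key not in store:
            # If the process dies after fn() but before this write lands,
            # the step runs again on resume, so steps must be idempotent.
            store[key] = fn()
        results[name] = store[key]
    return results
```

If `transcode` is preempted on the first run, the second call skips `validate` and re-runs only `transcode`.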
Idempotent step outputs.
Retries are routine; non-idempotent steps cause corruption and duplicated work.
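An idempotent step derives its output location from its inputs, so a retry either skips or rewrites identical bytes rather than producing a duplicate. This sketch uses a local filesystem for clarity; `source_sha256`, `output_key`, and `write_once` are hypothetical names. Object stores generally lack atomic rename, so there the same effect comes from writing to the deterministic key directly or, where the store supports it, a conditional write.

```python
import hashlib
import json
import os
import tempfile
from typing import Callable


def output_key(source_sha256: str, step: str, params: dict) -> str:
    """Deterministic output name: same source + step + params gives the same key."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(f"{source_sha256}|{step}|{canonical}".encode()).hexdigest()
    return f"{step}/{digest[:16]}"


def write_once(root: str, key: str, produce: Callable[[], bytes]) -> str:
    """Publish produce() at root/key unless an earlier attempt already did.

    Bytes go to a temp file first and are moved into place with os.replace,
    which is atomic within one filesystem, so readers never see a partial file.
    """
    final = os.path.join(root, key)
    if os.path.exists(final):
        return final  # a previous attempt finished; the retry is a no-op
    os.makedirs(os.path.dirname(final), exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(final))
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(produce())
        os.replace(tmp, final)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)  # discard the partial write
        raise
    return final
```

Canonicalizing `params` with sorted keys matters: `{"height": 720, "kbps": 3000}` and `{"kbps": 3000, "height": 720}` must map to the same output.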
Spot/preemptible for encode workers.
Encoding is batch work, so preemption is cheap as long as steps are resumable.
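Resumability under preemption usually comes from splitting the source into segments that are encoded independently. The sketch below assumes fixed-length chunks (a simplification: real splitters cut on keyframe or GOP boundaries) and a hypothetical set of completed chunk indices recorded durably elsewhere.

```python
def plan_chunks(duration_s: float, chunk_s: float = 60.0) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) segments of at most chunk_s.

    Fixed-length cuts are a simplification; real splitters align to keyframes.
    """
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


def pending_chunks(num_chunks: int, completed: set[int]) -> list[int]:
    """Chunk indices that still need encoding, e.g. after a spot preemption."""
    return [i for i in range(num_chunks) if i not in completed]
```

With 60-second chunks, a 4-hour source becomes 240 chunks; if preemption hits after 192 of them (80%) are done, only the remaining 48 need encoding rather than the whole file.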
Storage classes per tier.
Hot renditions stay on fast storage; rarely-watched titles move to cheaper tiers.
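A tiering policy can be a small, auditable function. The thresholds and tier labels below are hypothetical; real values depend on the provider's storage classes and any retrieval or minimum-duration charges on the cheaper tiers.

```python
from datetime import date


def storage_tier(views_last_30d: int, last_watched: date, today: date) -> str:
    """Choose a storage class label for a title's renditions.

    Thresholds and tier names are hypothetical placeholders.
    """
    if views_last_30d >= 1000:
        return "hot"    # popular: keep on fast, low-latency storage
    if (today - last_watched).days <= 90:
        return "warm"   # watched recently but not popular
    return "cold"       # idle: cheapest class, slower or costlier to read back
```

Running this periodically over the catalog and migrating only titles whose tier changed keeps the policy cheap to apply.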

Pitfalls

  • Tracking state in workers: a crash loses all in-progress work.
  • Non-idempotent encode steps cause silent corruption on retry.
  • No back-pressure, so the queue grows without bound.
  • Storing originals in the wrong storage tier and accumulating unnecessary cost.
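Back-pressure means a full queue pushes back on producers instead of growing. A bounded in-process `queue.Queue` shows the idea; in production the bound would live in the message broker or workflow engine, and the `IngestQueue` wrapper here is hypothetical.

```python
import queue


class IngestQueue:
    """Bounded job queue: producers are refused instead of the queue growing forever."""

    def __init__(self, max_pending: int) -> None:
        self._q: queue.Queue = queue.Queue(maxsize=max_pending)

    def submit(self, job_id: str, timeout_s: float = 0.0) -> bool:
        """Try to enqueue; return False so the caller can delay, retry, or shed load."""
        try:
            if timeout_s > 0:
                self._q.put(job_id, timeout=timeout_s)  # wait up to timeout_s for space
            else:
                self._q.put_nowait(job_id)              # fail immediately when full
            return True
        except queue.Full:
            return False

    def take(self) -> str:
        """Block until a job is available (called by workers)."""
        return self._q.get()
```

A `False` from `submit` is the signal to propagate upstream, for example by having the upload service ask clients to retry later.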

Follow-up questions

  • How do you handle a 4-hour encode that fails 80% of the way through?
  • What's the durability story for the master file?
  • How do you prioritize a popular new release ahead of background work?
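One answer to the prioritization question is a priority queue in front of the worker pool. This sketch uses `heapq` with hypothetical priority classes (`premiere`, `new_upload`, `backfill`); a sequence counter keeps FIFO order within a class.

```python
import heapq
import itertools
from typing import Optional

# Hypothetical priority classes: a lower number is dispatched first.
PRIORITY = {"premiere": 0, "new_upload": 1, "backfill": 2}


class EncodeScheduler:
    """Dispatch encode jobs by priority class, FIFO within a class."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._seq = itertools.count()  # tie-breaker: earlier submissions go first

    def submit(self, job_id: str, kind: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), job_id))

    def next_job(self) -> Optional[str]:
        """Pop the most urgent job, or None if nothing is queued."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

Strict priority can starve `backfill` work indefinitely; aging priorities over time or reserving a share of workers for background jobs are common mitigations.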
