Marketplace Search & Ranking

Search and rank listings across a two-sided marketplace with personalization and business constraints.

Scale to anchor on

Hundreds of millions of listings, sub-200 ms p99 search latency, billions of queries/day.

Requirements

Functional

  • Full-text and filter-based search.
  • Personalized ranking based on user history and intent.
  • Diversity, freshness, and policy constraints.
  • Trust and safety: hide flagged or fraudulent listings.

Non-functional

  • Low latency: p99 under 200 ms for search, per the scale target above.
  • Index update latency < 1 minute for newly listed inventory.
  • Resilience to abusive query patterns.
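One common way to meet the last requirement is per-client rate limiting in front of the search tier. The sketch below is a minimal token bucket and is an assumption of this example, not something the design prescribes. The names `TokenBucket`, `rate`, and `capacity` are hypothetical, and the clock is passed in explicitly so the behavior is easy to test. A real deployment would likely use `time.monotonic()` and keep one bucket per client or API key.

```python
class TokenBucket:
    """Hypothetical per-client limiter: refills `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so short bursts are allowed
        self.last = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller would reject or degrade the query


bucket = TokenBucket(rate=1.0, capacity=2.0, now=0.0)
burst = [bucket.allow(0.0) for _ in range(3)]  # two allowed, third rejected
after_refill = bucket.allow(1.0)               # one token refilled after 1 s
```

Expensive queries (deep pagination, wide geo radius) could be charged a higher `cost` than cheap ones, which limits abusive patterns without penalizing ordinary traffic.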

High-level architecture

An inverted index (Elasticsearch / Lucene) handles lexical recall and filters. A dense retriever produces semantic candidates. Fusion merges the two candidate lists, and a learned ranker produces the final ordering. Trust filtering and business rules are applied after ranking, in a final re-ranking stage.
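The stage order described above can be sketched as a single function that wires the stages together. This is an illustrative skeleton only: `search`, its parameters, and the stage callables (`lexical`, `dense`, `fuse`, `rank`, `is_hidden`) are hypothetical names, and each would be backed by a real service in practice.

```python
from typing import Callable, List, Sequence


def search(
    query: str,
    user_id: str,
    *,
    lexical: Callable[[str], List[str]],
    dense: Callable[[str, str], List[str]],
    fuse: Callable[[Sequence[List[str]]], List[str]],
    rank: Callable[[str, str, List[str]], List[str]],
    is_hidden: Callable[[str], bool],
    page_size: int = 20,
) -> List[str]:
    """Hypothetical orchestration: recall -> fusion -> ranking -> trust filter."""
    lexical_hits = lexical(query)         # inverted index: keywords and filters
    dense_hits = dense(query, user_id)    # ANN: semantic / personalized candidates
    candidates = fuse([lexical_hits, dense_hits])
    ordered = rank(query, user_id, candidates)
    # Trust filtering runs last and only removes items; it never rescores.
    visible = [doc for doc in ordered if not is_hidden(doc)]
    return visible[:page_size]
```

Passing the stages in as callables keeps recall, ranking, and policy independently replaceable, which is the separation the architecture relies on.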

Components

Indexing pipeline
Streams new and updated listings into the search index.
Lexical index
Inverted index for keywords, filters, geo.
Vector index
Approximate nearest-neighbor (ANN) index for semantic and personalized retrieval.
Fusion + ranking
Reciprocal rank fusion (RRF) or weighted score fusion; a learned ranker sets the final order.
Trust & safety filter
Removes flagged listings post-ranking.
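To make the fusion component concrete, here is a minimal sketch of reciprocal rank fusion. Each listing scores the sum of 1 / (k + rank) over every list it appears in. The function name `rrf_fuse` and the sample IDs are hypothetical; k = 60 is a commonly used default rather than a value this design specifies.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(ranked_lists: Sequence[List[str]], k: int = 60, top_n: int = 10) -> List[str]:
    """Fuse ranked ID lists by summing 1 / (k + rank), with rank starting at 1."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, listing_id in enumerate(ranking, start=1):
            scores[listing_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


lexical_hits = ["a", "b", "c"]  # e.g. from the inverted index
dense_hits = ["c", "a", "d"]    # e.g. from the vector index
fused = rrf_fuse([lexical_hits, dense_hits])
# fused == ["a", "c", "b", "d"]: items found by both retrievers rise to the top
```

RRF uses only rank positions, so it needs no score calibration between the lexical and dense retrievers. Weighted fusion would instead require normalizing the two score scales first.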

Key decisions

Hybrid retrieval.
Pure lexical misses semantics; pure dense misses exact-match codes and rare entities.
Async index updates.
Synchronous index writes tie listing-write availability to search-cluster health; an asynchronous stream decouples the two.
Trust & safety (T&S) as a final filter, not in the ranking score.
Flagged content should be hidden, not down-ranked; keeping the filter separate from the score keeps the policy clean.
Personalization via dense retrieval.
Personalizing at the retrieval stage keeps the ranker simpler and reuses precomputed user embeddings.
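The "final filter" decision can be illustrated with a small sketch in which flagged listings are removed after ranking and the scores of the remaining items are left untouched. `ScoredListing`, `visible_page`, and `blocked_ids` are hypothetical names; the sketch assumes the set of blocked IDs is available at query time.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set


@dataclass(frozen=True)
class ScoredListing:
    listing_id: str
    score: float  # produced by the ranker; never modified by the filter


def visible_page(
    ranked: Sequence[ScoredListing], blocked_ids: Set[str], page_size: int
) -> List[ScoredListing]:
    """Hide blocked listings, then cut the page; order and scores are preserved."""
    visible = [item for item in ranked if item.listing_id not in blocked_ids]
    return visible[:page_size]


ranked = [ScoredListing("x", 0.9), ScoredListing("y", 0.8), ScoredListing("z", 0.7)]
page = visible_page(ranked, blocked_ids={"x"}, page_size=2)
```

Because filtering shrinks the list, the ranker would typically score more candidates than one page needs so the page can still be filled. Under the assumption that the blocked set is consulted on every query, a takedown takes effect as soon as the ID is added, without waiting for a reindex.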

Pitfalls

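  • Synchronous indexing: listing writes slow down or fail whenever the search cluster has issues.
  • Single-stage ranking: recall (candidate retrieval) and precision (final ordering) are not separated.
  • Trust signals leaking into ranking scores without clear policy.
  • Not handling cold-start sellers, whose new listings have no engagement history for the ranker to use.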
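The first pitfall is usually avoided by putting a queue between the listing write path and the index. The sketch below is a simplified in-process version under stated assumptions: `AsyncIndexer`, `InMemoryIndex`, `IndexUnavailable`, and `bulk_upsert` are hypothetical names, and the `deque` stands in for what would be a durable log or message queue in a real system.

```python
from collections import deque


class IndexUnavailable(Exception):
    """Raised by the (hypothetical) index client when the cluster is down."""


class InMemoryIndex:
    """Stand-in for a real search index client; for illustration only."""

    def __init__(self):
        self.docs = {}
        self.available = True

    def bulk_upsert(self, listings):
        if not self.available:
            raise IndexUnavailable()
        for listing in listings:
            self.docs[listing["id"]] = listing


class AsyncIndexer:
    def __init__(self, index, batch_size=500):
        self.index = index
        self.batch_size = batch_size
        self.pending = deque()  # would be a durable log in a real system

    def on_listing_write(self, listing):
        # Write path: enqueue only, never call the index directly.
        self.pending.append(listing)

    def drain_once(self):
        """Apply one batch; on failure keep events queued for a later retry."""
        n = min(self.batch_size, len(self.pending))
        if n == 0:
            return 0
        batch = [self.pending[i] for i in range(n)]
        try:
            self.index.bulk_upsert(batch)
        except IndexUnavailable:
            return 0  # nothing is dropped; the same batch is retried later
        for _ in range(n):
            self.pending.popleft()
        return n
```

With this shape, an index outage delays freshness but does not block sellers from creating listings. The queue depth then becomes the signal to watch against the index update latency target.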

Follow-up questions

  • How do you handle 1M new listings per hour?
  • How do you measure search quality?
  • How do you handle a regulatory takedown request quickly?
