Marketplace Search & Ranking
Search and rank listings across a two-sided marketplace with personalization and business constraints.
Scale to anchor on
Hundreds of millions of listings, sub-200 ms p99 search latency, billions of queries/day.
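Converting the daily query volume into queries per second gives a rough size for the serving tier. The 2 billion queries/day figure and the 3x peak-to-average ratio below are illustrative assumptions, not numbers from this design.

```python
# Back-of-envelope traffic sizing for the scale targets above.
# ASSUMPTIONS (illustrative only): 2B queries/day, peak traffic = 3x daily mean.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(queries_per_day: float) -> float:
    """Average queries per second implied by a daily query volume."""
    return queries_per_day / SECONDS_PER_DAY


def peak_qps(queries_per_day: float, peak_to_avg: float) -> float:
    """Peak QPS, assuming the busiest period runs at peak_to_avg times the mean."""
    return average_qps(queries_per_day) * peak_to_avg


if __name__ == "__main__":
    daily = 2e9  # hypothetical value for "billions of queries/day"
    print(f"average: {average_qps(daily):,.0f} QPS")
    print(f"peak (3x): {peak_qps(daily, 3.0):,.0f} QPS")
```

Under those assumptions the fleet must sustain roughly 23k QPS on average and about 69k QPS at peak, each request inside the 200 ms p99 budget.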
Requirements
Functional
- Full-text and filter-based search.
- Personalized ranking based on user history and intent.
- Diversity, freshness, and policy constraints.
- Trust and safety: hide flagged or fraudulent listings.
Non-functional
- Low latency: sub-200 ms p99 end-to-end search latency.
- Index update latency < 1 minute for newly listed inventory.
- Resilience to abusive query patterns (e.g., scrapers and bot-driven query floods).
High-level architecture
An inverted index (Elasticsearch / Lucene) handles lexical recall and filters. A dense retriever produces semantic candidates. Fusion merges both candidate sets; a learned ranker determines the final ordering. Trust signals and business rules apply in a re-ranking stage.
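As one example of a business rule in the re-ranking stage, the diversity requirement can be enforced with a greedy pass that caps how many slots a single seller may occupy. This is a minimal sketch assuming a per-seller cap is the chosen policy; `Listing` and `cap_per_seller` are hypothetical names, and other systems use similarity-based penalties such as maximal marginal relevance (MMR) instead.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Listing:
    listing_id: str
    seller_id: str
    score: float  # ranker score; higher is better


def cap_per_seller(ranked: List[Listing], max_per_seller: int, k: int) -> List[Listing]:
    """Greedy diversity pass over an already-ranked list (hypothetical helper).

    Fills k slots in score order while keeping at most max_per_seller
    listings from any one seller. Listings over a seller's cap are deferred
    rather than dropped, and backfill the page only if it would be short.
    """
    counts: Dict[str, int] = {}
    page: List[Listing] = []
    deferred: List[Listing] = []
    for item in ranked:
        if len(page) == k:
            break
        seen = counts.get(item.seller_id, 0)
        if seen < max_per_seller:
            page.append(item)
            counts[item.seller_id] = seen + 1
        else:
            deferred.append(item)
    # Backfill with capped-out listings, still in their original order.
    for item in deferred:
        if len(page) == k:
            break
        page.append(item)
    return page
```

Deferring instead of dropping keeps the page full for niche queries where one seller dominates the inventory.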
Components
Indexing pipeline
Streams new and updated listings into the lexical and vector indexes.
Lexical index
Inverted index for keyword matching, structured filters, and geo queries.
Vector index
Approximate nearest neighbor (ANN) search for semantic and personalized retrieval.
Fusion + ranking
Reciprocal rank fusion (RRF) or weighted score fusion; a learned ranker for the final ordering.
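RRF scores each document as the sum of 1 / (k + rank) over every retriever that returned it, so it needs only ranks, not comparable scores. A minimal sketch follows; the function name is hypothetical, and k = 60 is the constant commonly used for RRF, treated here as a tunable assumption.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of listing IDs with Reciprocal Rank Fusion.

    Each list adds 1 / (k + rank) for every ID it contains (rank is 1-based);
    an ID missing from a list gets nothing from that list.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, fusing a lexical list `["a", "b", "c"]` with a dense list `["c", "a", "d"]` puts `a` and `c` (found by both retrievers) ahead of `b` and `d`. Because only ranks are used, there is no need to calibrate BM25 scores against cosine similarities, which is the main thing a weighted fusion has to get right.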
Trust & safety filter
Removes flagged listings post-ranking.
Key decisions
Hybrid retrieval.
Pure lexical retrieval misses semantic matches (synonyms, paraphrases); pure dense retrieval misses exact-match codes and rare entities.
Async index updates.
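Synchronous index writes couple listing-write availability to search-cluster availability: if indexing stalls, so do writes. Async updates decouple the two at the cost of bounded staleness, which is why the index update target is < 1 minute rather than immediate.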
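A minimal sketch of the async pattern, assuming micro-batched bulk writes: the write path only enqueues, and a background worker flushes batches bounded by size and by time. `AsyncIndexer` and the `bulk_write` callback are hypothetical; an in-process `queue.Queue` stands in for what would more likely be a durable log (for example Kafka) in production.

```python
import queue
import time
from typing import Callable, List, Optional


class AsyncIndexer:
    """Decouples listing writes from the search index (illustrative sketch).

    enqueue() never touches the search cluster. run_once() flushes a batch
    when max_batch updates are collected or max_wait_s has passed since the
    first update of the batch was dequeued, bounding how long a dequeued
    update waits before being written.
    """

    def __init__(self, bulk_write: Callable[[List[dict]], None],
                 max_batch: int = 500, max_wait_s: float = 1.0) -> None:
        self._q: "queue.Queue[dict]" = queue.Queue()
        self._bulk_write = bulk_write  # hypothetical index client callback
        self._max_batch = max_batch
        self._max_wait_s = max_wait_s

    def enqueue(self, update: dict) -> None:
        """Write path: put onto an unbounded buffer; does not block on search."""
        self._q.put(update)

    def run_once(self) -> int:
        """Collect one batch, write it, and return how many updates were flushed."""
        batch: List[dict] = []
        deadline: Optional[float] = None
        while len(batch) < self._max_batch:
            if deadline is None:
                timeout = self._max_wait_s
            else:
                timeout = deadline - time.monotonic()
                if timeout <= 0:
                    break
            try:
                item = self._q.get(timeout=timeout)
            except queue.Empty:
                break
            if deadline is None:
                deadline = time.monotonic() + self._max_wait_s
            batch.append(item)
        if batch:
            self._bulk_write(batch)
        return len(batch)

    def run_forever(self) -> None:
        """Worker loop, meant to run on a background thread or process."""
        while True:
            self.run_once()
```

If the search cluster is slow, only the worker falls behind; listing writes keep succeeding, and the queue depth becomes the signal that freshness is at risk.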
T&S as a final filter, not in ranking score.
Flagged content should be hidden, not down-ranked. Keeping the filter separate from the ranking score keeps policy enforcement explicit and independent of ranker retraining.
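A sketch of trust and safety as a hard post-ranking filter, assuming the set of flagged listing IDs is available at query time; `filter_flagged` is a hypothetical name. Because filtering shrinks the result list, the caller is assumed to over-fetch from the ranker (the 2x factor mentioned in the comment is an assumption, not a tuned value).

```python
from typing import Iterable, List, Set


def filter_flagged(ranked_ids: Iterable[str], flagged: Set[str], k: int) -> List[str]:
    """Drop flagged listings from a ranked list and return up to k results.

    Flagged IDs are removed outright, never down-weighted, so no ranking
    score can resurface them. Callers should rank more than k candidates
    (e.g. 2k, an assumed factor) so the page can still be filled.
    """
    page: List[str] = []
    for listing_id in ranked_ids:
        if listing_id in flagged:
            continue  # hidden, not down-ranked
        page.append(listing_id)
        if len(page) == k:
            break
    return page
```

Since the filter reads the flag set on every query, a newly flagged listing disappears on the next request without waiting for reindexing or a ranker update, assuming the flag set itself is updated promptly.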
Personalization via dense retrieval.
Personalization at the index level keeps ranking simpler and reuses precomputed user embeddings.
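One way to personalize at retrieval time is to blend the query embedding with a precomputed user embedding before the vector search. The sketch below assumes a linear blend with weight `alpha` (hypothetical) and uses an exact brute-force cosine scan purely to show the scoring; a real vector index would use an ANN structure such as HNSW instead of scanning every listing. All function names are illustrative.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def personalized_query_vector(query_emb: Sequence[float], user_emb: Sequence[float],
                              alpha: float = 0.3) -> List[float]:
    """Blend query and user embeddings; alpha is the assumed user weight (0 = none)."""
    q, u = _normalize(query_emb), _normalize(user_emb)
    return _normalize([(1 - alpha) * qi + alpha * ui for qi, ui in zip(q, u)])


def top_k_cosine(query_vec: Sequence[float], index: Dict[str, Sequence[float]],
                 k: int) -> List[Tuple[str, float]]:
    """Exact cosine top-k over all listings (stand-in for an ANN lookup)."""
    q = _normalize(query_vec)
    scored = ((lid, sum(a * b for a, b in zip(q, _normalize(vec))))
              for lid, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda t: t[1])
```

With a toy index where `bike = [1, 0]`, `helmet = [0.7, 0.7]`, and `sofa = [0, 1]`, a "bike" query returns `bike` first when `alpha = 0`, but a user embedding of `[0, 1]` with `alpha = 0.3` tilts the top result to `helmet`. The user embedding is computed offline, so the only per-query cost is the blend.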
Pitfalls
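- Synchronous indexing: write throughput drops whenever the search cluster has issues.
- Single-stage ranking: no separation of recall (cheap candidate generation) and precision (expensive ranking).
- Trust signals leaking into ranking scores without clear policy.
- Not handling cold-start sellers: new sellers with no engagement history get little or no exposure.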
Follow-up questions
- How do you handle 1M new listings per hour?
- How do you measure search quality?
- How do you handle a regulatory takedown request quickly?