Scaling Vector Search: Lessons from HNSW Performance Optimization

How we optimized Hierarchical Navigable Small World (HNSW) graphs to achieve sub-10ms query latencies on millions of 1536-dimensional embeddings.

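Vector search engines form the backbone of modern retrieval-augmented generation (RAG) pipelines. Growing an index without ballooning latency, however, requires a solid understanding of two HNSW graph-construction hyperparameters: M (the maximum number of outgoing links per node) and efConstruction (the size of the dynamic candidate list used while the graph is built). Raising either one generally improves recall, at the cost of more memory for M and longer build times for efConstruction.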
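To make the memory side of that trade-off concrete, here is a rough back-of-the-envelope sketch. It is not our production accounting: the function name `estimate_hnsw_memory_bytes` is hypothetical, and it assumes float32 vectors, 4-byte neighbor ids, and up to 2*M links per node on the bottom layer (the usual convention in the original HNSW design). Upper layers and allocator overhead are ignored.

```python
def estimate_hnsw_memory_bytes(num_vectors: int, dim: int, m: int,
                               bytes_per_component: int = 4,
                               bytes_per_link: int = 4) -> dict:
    """Rough lower-bound estimate of HNSW index memory in bytes.

    Assumptions (illustrative, not measured):
      - vectors are stored uncompressed, bytes_per_component each
      - layer 0 reserves up to 2 * m neighbor ids per node
      - upper layers and bookkeeping overhead are ignored
    """
    vector_bytes = num_vectors * dim * bytes_per_component
    link_bytes = num_vectors * 2 * m * bytes_per_link
    return {
        "vectors": vector_bytes,
        "links": link_bytes,
        "total": vector_bytes + link_bytes,
    }


if __name__ == "__main__":
    estimate = estimate_hnsw_memory_bytes(num_vectors=1_000_000, dim=1536, m=16)
    for name, size in estimate.items():
        print(f"{name}: {size / 1e9:.3f} GB")
```

Under these assumptions, one million 1536-dimensional float32 vectors take about 6.1 GB, while the layer-0 links at M = 16 add only about 0.13 GB. At this dimensionality the raw vectors dominate, which is why compressing them is the first place to look for savings.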

In this post, we explore cache-locality optimizations, vector quantization techniques (such as scalar quantization), and memory layouts that reduced our memory footprint by 40% while keeping recall above 98%.
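As a preview of the quantization idea, the following is a minimal pure-Python sketch of scalar quantization: each dimension is mapped linearly from its observed [min, max] range onto an 8-bit code. The 8-bit width, the per-dimension ranges, and the function names (`fit_scalar_quantizer`, `quantize`, `dequantize`) are assumptions chosen for illustration, not a description of our implementation.

```python
def fit_scalar_quantizer(vectors):
    """Learn per-dimension offsets and step sizes from sample vectors."""
    dim = len(vectors[0])
    lows = [min(v[d] for v in vectors) for d in range(dim)]
    highs = [max(v[d] for v in vectors) for d in range(dim)]
    # One step covers 1/255 of the observed range; fall back to 1.0
    # when a dimension is constant to avoid dividing by zero.
    scales = [(hi - lo) / 255.0 if hi > lo else 1.0
              for lo, hi in zip(lows, highs)]
    return lows, scales


def quantize(vector, lows, scales):
    """Encode a float vector as one unsigned byte per dimension."""
    codes = []
    for x, lo, step in zip(vector, lows, scales):
        code = round((x - lo) / step)
        codes.append(min(255, max(0, code)))  # clamp out-of-range values
    return bytes(codes)


def dequantize(codes, lows, scales):
    """Reconstruct an approximate float vector from its byte codes."""
    return [lo + code * step for code, lo, step in zip(codes, lows, scales)]


if __name__ == "__main__":
    sample = [[0.0, -1.0], [1.0, 1.0], [0.25, 0.3]]
    lows, scales = fit_scalar_quantizer(sample)
    codes = quantize(sample[2], lows, scales)
    print(list(codes), dequantize(codes, lows, scales))
```

For values inside the fitted range, the reconstruction error per dimension is at most half a step, and each component shrinks from 4 bytes (float32) to 1 byte. How much of that saving survives in a full index depends on what else is stored alongside the codes.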