Why Your B-tree Index Is Useless for Semantic Search — And What HNSW Does Instead

For software engineers who know what an index is and what an embedding is.

Start Here: A Question You Think You Already Know the Answer To

You have a chunks table with 50,000 rows. Each row has a text column and a vector column — a list of 1536 floating point numbers representing the semantic meaning of that text.

A user query arrives. You embed it into another 1536-dimensional vector. Now you want the most semantically similar chunks.

Your instinct: Just put an index on the vector column.

Before you do that — stop. Ask yourself: what property of a B-tree index makes it fast?

A B-tree works because values have a natural, universal sort order. “Mumbai” always comes after “Kolkata” regardless of what you’re searching for. That stable ordering lets the database eliminate half the search space at every level — giving you O(log n) instead of O(n).

Now here is the uncomfortable question:

Can you sort 1536-dimensional vectors the same way?

Why B-tree Fails: Similarity Has No Universal Order

Cosine similarity — the measure of closeness between two vectors — is not a property of a single vector. It is a relationship between two vectors.

The similarity ranking of your 50,000 chunks changes completely with every new query. A chunk about “Typhoid symptoms” ranks highest for one query and near the bottom for another.

There is no universal ordering. No “this chunk always comes before that chunk.” B-tree’s entire superpower — pre-sorted, query-independent structure — simply does not exist here.

B-tree is dead for vector search. So what’s left?

The Honest Fallback: Brute Force

Without a pre-sorted structure, the only option is to compare the query vector against every single chunk vector — compute cosine similarity 50,000 times and return the top results.

Let’s calculate the real cost:

50,000 chunks
× 1,536 dimensions (floating point operations per comparison)
= ~77M operations per query

Scale that to 1,000 queries per second and you’re doing 77 billion floating-point operations every second - just for retrieval, before the LLM has even started thinking.

This is the full table scan of the vector world — correct, exact, and completely unacceptable at scale.

Something has to give. But what?

The Key Insight: You Don’t Need the Perfect Answer

Here is where semantic search differs fundamentally from relational queries.

In a relational database, being wrong is catastrophic. Returning the wrong user_id breaks everything. Exactness is non-negotiable.

In semantic search, ask yourself: if exact search returns the chunk ranked #1 with similarity 0.921, and an approximate method returns the chunk ranked #2 with similarity 0.918 — does the student’s answer actually get worse?

Almost never. Because:

The LLM synthesises answers from multiple chunks anyway
Chunks about the same topic cluster closely in vector space — their similarity scores are near-identical
“Close enough” is genuinely good enough

This trade-off has a name: Approximate Nearest Neighbour (ANN) search.

Sacrifice occasional exactness. Gain dramatic speed. For semantic search, this is a completely valid trade.

The “how often do we find the true nearest neighbour” is measured by recall. A recall of 0.95 means that, on average, 95% of the true nearest neighbours end up in the results you return. For a top-10 retrieval, that’s 9.5 of the right chunks appearing in your 10-chunk output. For RAG pipelines, this is more than sufficient.

Now the real engineering question: how do you build an index that avoids comparing against all 50,000 chunks?

The Graph Idea: Navigate Like a Network

Imagine you move to a new city and want to find people with similar interests. You have two strategies:

Strategy A: Interview all 10 million residents one by one
Strategy B: Find one person with similar interests, ask them who they know, and follow the network

Strategy B is faster for one simple reason: similarity clusters. People with similar interests know other people with similar interests. You navigate through a connected graph toward your target rather than scanning everyone.

The same logic applies to vectors. Semantically similar chunks live near each other in vector space. Build a proximity graph — each chunk node connects to its most similar neighbouring chunks — and you can navigate toward your query vector by following edges.

This is the Navigable Small World (NSW) idea. Even with only local connections, a well-connected graph lets you reach any node from any other node in a surprisingly small number of hops.

But this flat graph has a serious weakness.

The Flat Graph Problem: You Dropped In at the Wrong Neighbourhood

With a flat proximity graph, every node connects only to its nearest neighbours. If your query is about “Typhoid symptoms” but you start your traversal at a node about “antibiotic resistance” — you are making dozens of small local hops to cross a large semantic distance.

The starting point is essentially random, and local connections alone cannot navigate large distances efficiently.

How do you solve a navigation problem that requires both long-range and short-range movement?

Think about how you navigate geographically. You don’t open a street-level map of the entire country when driving from Mumbai to a specific street in Delhi. You:

First zoom out — find the right city
Zoom in — find the right neighbourhood
Zoom in further — find the exact street

Each level uses a different scale of connections. Highways at the top. Local roads at the bottom.

Apply the same hierarchy to your proximity graph.

Hierarchical Layers: The H in HNSW

Build multiple layers of graphs stacked on top of each other:

Layer 3 (top):  ~50 nodes    — long-range connections, coarse navigation
Layer 2:        ~500 nodes   — medium-range connections
Layer 1:        ~5,000 nodes — shorter-range connections
Layer 0 (base): 50,000 nodes — all chunks, fine-grained local search

Search always starts at the top layer, navigates to the best entry point, then drops into the next layer — using the landing point from above as the starting point below. Each layer narrows the search region. By the time you reach Layer 0, you are already in the right semantic neighbourhood and only need local hops to find the nearest chunk.

This is Hierarchical Navigable Small World — HNSW.

One question remains: which nodes get promoted to upper layers?

Not the “most important” ones. Not the most central ones. Content-based selection is dangerous — it could leave entire topics unrepresented in upper layers, crippling recall for those topics.

The answer is elegantly simple: random selection via a probability decay function.

Every node  → enters Layer 0
~10% of nodes → randomly promoted to Layer 1
~1% of nodes  → randomly promoted to Layer 2
~0.1% of nodes → randomly promoted to Layer 3

The values mentioned above are a teaching simplification. The exact decay is controlled by a parameter (mL) tied to M, but the shape is what matters: exponentially fewer nodes as you climb. Layer 0 has everything. The top layer usually has a handful.

Randomness guarantees unbiased coverage. No topic gets systematically excluded. No human judgement needed about what is “important.”

How Search Actually Works

When a query arrives:

Start at Layer 3 — compare against the handful of nodes there, find the closest one
Drop to Layer 2 — use that node as the entry point, greedily hop to closer neighbours
Drop to Layer 1 — same process, narrowing into the right semantic region
Drop to Layer 0 — now do a thorough local search among nearby chunks
Return top-k results

At no point do you compare against all 50,000 chunks. You navigate toward the answer through a hierarchy of connected graphs.

The Three Trade-offs You’re Actually Making

HNSW is not free. Here is the honest cost:

	Brute Force	HNSW
Query speed	Slow (O(n×d))	Fast (O(log n))
Exactness	Yes	No (approximate)
Build cost	None	High — graph construction at ingest time
Memory cost	Low	Higher — stores graph connections alongside vectors
Works for semantic search	Yes, slowly	Yes, efficiently

The Three Knobs

HNSW exposes three parameters that let you tune the recall-speed trade-off:

M — Number of connections per node (graph density)

Higher M → denser graph → better recall → more memory and slower build

ef_construction — How many candidate neighbours are evaluated when building the graph

Higher ef_construction → better quality graph → slower index build

ef_search — How many candidates are evaluated at query time

Higher ef_search → better recall → slower queries
Lower ef_search → faster queries → occasionally misses the true nearest neighbour

Practical Guidance: Which Knob to Turn First

If your recall drops below your acceptable threshold after evaluation:

Step 1 — Increase ef_search first. Zero rebuild cost. Takes effect immediately. Trades a few milliseconds of query latency for better recall. Try doubling it and re-evaluate.

Step 2 — Only if that’s not enough, increase M and ef_construction. This requires a full index rebuild. The embeddings themselves are reused - you’re only reconstructing the graph - but for large tables it still takes real CPU and IO time and locks the index during rebuild. Exhaust query-time tuning first.

Always exhaust query-time tuning before paying the build-time cost.

The Full Picture in One Glance

Why not B-tree?
→ No universal sort order for vectors
  (similarity is relative to each query)

Why not brute force?
→ ~77M operations per query, unacceptable at scale

Why ANN?
→ Semantic search doesn't need exactness
  "Good enough" chunk → same answer quality
  Trade recall for speed

Why graph-based (NSW)?
→ Navigate similarity space like a network
  Follow connections toward query region

Why hierarchical (H)?
→ Flat graph = too many hops across large distances
  Multiple layers = coarse navigation first, fine search last
  Upper layers selected randomly → unbiased topic coverage

The three knobs:
M              → graph density
ef_construction → build-time thoroughness
ef_search      → query-time thoroughness (tune this first)

The One-Line Summary

HNSW trades a small, controlled amount of exactness — which semantic search doesn’t need anyway — for search that is orders of magnitude faster than scanning every vector, by navigating a hierarchical graph from coarse to fine.

Your B-tree knew where to go because values had order. HNSW knows where to go because similar things connect to similar things — and some of those connections skip far ahead.

If you’re implementing this in pgvector, the default HNSW parameters are m=16 and ef_construction=64. Start with ef_search=40 and tune upward if your recall evaluation shows gaps.

Start Here: A Question You Think You Already Know the Answer To#

Why B-tree Fails: Similarity Has No Universal Order#

The Honest Fallback: Brute Force#

The Key Insight: You Don’t Need the Perfect Answer#

The Graph Idea: Navigate Like a Network#

The Flat Graph Problem: You Dropped In at the Wrong Neighbourhood#

Hierarchical Layers: The H in HNSW#

How Search Actually Works#

The Three Trade-offs You’re Actually Making#

The Three Knobs#

Practical Guidance: Which Knob to Turn First#

The Full Picture in One Glance#

The One-Line Summary#

Comments