Stop trusting the vector hype
I remember the first time I replaced a standard keyword search with vector embeddings. It felt like magic. I threw a messy, vague sentence at the database, and it actually returned relevant documents. I felt like a genius. Then, about three days later, a user searched for a specific error code (“ERR_CONNECTION_REFUSED”) and my fancy AI search engine returned a blog post about “emotional disconnection in remote work.” That was embarrassing.
The problem with document querying using AI isn’t that the technology is bad. It’s that we’ve been sold a simplified lie: that you can just dump text into an embedding model, store the vectors, and call it a day. If you’re building this for production in 2026, you know that pure vector search has a massive blind spot. It loves “vibes” but hates specifics.
So, I stopped trying to force vectors to do everything. Instead, I moved to an adaptive hybrid approach. It’s messier, sure, but it actually works. Here is how I build document query systems now, combining the precision of SQL with the semantic understanding of AI.
The “Adaptive” Part
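The “adaptive” piece happens before the query ever touches the database: an LLM expands the user’s raw query with synonyms and related entities, and when the query is ambiguous it either asks a clarifying question or leans on the user’s history (more on that in the gotchas below). That expanded text is what feeds the keyword side of the hybrid search. Here is a minimal sketch of that expansion step, assuming an OpenAI-style chat client; the model name, system prompt, and expand_query helper are illustrative, not a fixed part of the approach.

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

EXPANSION_PROMPT = (
    "Rewrite the user's search query for a keyword search engine. "
    "Add synonyms, related entities, and spelled-out abbreviations. "
    'Return JSON: {"expanded": "..."}.'
)

def expand_query(raw_query: str, model: str = "gpt-4o-mini") -> str:
    """Return an expanded query string; fall back to the raw query on any failure."""
    try:
        resp = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": EXPANSION_PROMPT},
                {"role": "user", "content": raw_query},
            ],
            response_format={"type": "json_object"},
        )
        return json.loads(resp.choices[0].message.content)["expanded"]
    except Exception:
        return raw_query  # never let the expansion step take search down with it

# expand_query("wifi dropped") might come back as something like
# "wifi dropped wireless network disconnect internet connection lost"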
Setting Up the Foundation
I use PostgreSQL with pgvector. It’s standard, it’s reliable, and I don’t have to manage a separate vector database (which is a nightmare for keeping data in sync).
Here is the schema I use. Note that I’m keeping both the raw text for lexical search (BM25/tsvector) and the embedding for semantic search.
-- We need the vector extension, obviously
CREATE EXTENSION IF NOT EXISTS vector;
-- The main documents table
CREATE TABLE document_store (
doc_id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
-- Metadata for filtering (critical for performance)
category_id INT,
created_at TIMESTAMPTZ DEFAULT NOW(),
-- The semantic brain: 1536 dimensions is standard for many models
embedding vector(1536),
-- The keyword anchor: standard Postgres full-text search
search_vector tsvector GENERATED ALWAYS AS (
setweight(to_tsvector('english', title), 'A') ||
setweight(to_tsvector('english', content), 'B')
) STORED
);
-- Indexing is where you win or lose on speed
-- HNSW index for the vectors (fast approximate nearest neighbor)
CREATE INDEX ON document_store USING hnsw (embedding vector_cosine_ops);
-- GIN index for the keywords
CREATE INDEX ON document_store USING GIN (search_vector);
I can’t stress this enough: use GENERATED ALWAYS AS ... STORED for your tsvector. I used to update these manually in application code, and eventually, I forgot to update one, leading to ghost data. Let the database handle the synchronization.
The Query: Where the Magic Happens
Most tutorials stop at a single nearest-neighbor lookup:
SELECT * FROM items ORDER BY embedding <-> query_embedding LIMIT 5;
That’s fine for a toy app. But for a real document query system, we need to combine the “expanded” semantic query with the precise keyword requirements.
Here is a CTE (Common Table Expression) approach I use to blend the scores. It normalizes both signals to a 0-1 range (cosine similarity for the vectors, ts_rank_cd with its built-in normalization flag for the keywords), then combines them with an “alpha” weight you can tune.
WITH semantic_search AS (
SELECT
doc_id,
1 - (embedding <=> $1) AS vector_score -- Convert distance to similarity
FROM document_store
WHERE 1 - (embedding <=> $1) > 0.7 -- Cutoff for noise
ORDER BY embedding <=> $1
LIMIT 100
),
keyword_search AS (
SELECT
doc_id,
ts_rank_cd(search_vector, plainto_tsquery('english', $2), 32) AS text_score -- flag 32 maps the rank into 0..1
FROM document_store
WHERE search_vector @@ plainto_tsquery('english', $2)
ORDER BY text_score DESC
LIMIT 100
)
SELECT
COALESCE(s.doc_id, k.doc_id) AS doc_id,
d.title,
d.content,
-- Weighted Hybrid Score
-- Adjust the multipliers (0.7 vs 0.3) depending on whether you trust vectors or keywords more
(COALESCE(s.vector_score, 0) * 0.7) +
(COALESCE(k.text_score, 0) * 0.3) AS final_score
FROM semantic_search s
FULL OUTER JOIN keyword_search k ON s.doc_id = k.doc_id
JOIN document_store d ON COALESCE(s.doc_id, k.doc_id) = d.doc_id
ORDER BY final_score DESC
LIMIT 10;
You see what’s happening there?
If the vector search finds a document because it’s semantically related (e.g., “internet issues” matching “wifi dropped”), it gets a score. If the keyword search finds it because of an exact match, it gets a score. If both find it, the document rockets to the top of the list.
The $2 parameter isn’t just the user’s raw query. It’s the expanded query containing synonyms and related entities that we generated before hitting the DB.
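For completeness, here is how I’d wire both parameters in from application code; a sketch assuming psycopg 3 and the pgvector-python adapter, with the $1/$2 placeholders switched to psycopg’s named parameters and the content column dropped to keep it short. The connection string and the random stand-in embedding are illustrative.

import numpy as np
import psycopg
from pgvector.psycopg import register_vector

HYBRID_SQL = """
WITH semantic_search AS (
    SELECT doc_id, 1 - (embedding <=> %(emb)s) AS vector_score
    FROM document_store
    WHERE 1 - (embedding <=> %(emb)s) > 0.7
    ORDER BY embedding <=> %(emb)s
    LIMIT 100
),
keyword_search AS (
    SELECT doc_id,
           ts_rank_cd(search_vector, plainto_tsquery('english', %(q)s), 32) AS text_score
    FROM document_store
    WHERE search_vector @@ plainto_tsquery('english', %(q)s)
    ORDER BY text_score DESC
    LIMIT 100
)
SELECT COALESCE(s.doc_id, k.doc_id) AS doc_id,
       d.title,
       (COALESCE(s.vector_score, 0) * 0.7) + (COALESCE(k.text_score, 0) * 0.3) AS final_score
FROM semantic_search s
FULL OUTER JOIN keyword_search k ON s.doc_id = k.doc_id
JOIN document_store d ON COALESCE(s.doc_id, k.doc_id) = d.doc_id
ORDER BY final_score DESC
LIMIT 10;
"""

def hybrid_search(conn, query_embedding, expanded_query):
    with conn.cursor() as cur:
        cur.execute(HYBRID_SQL, {"emb": query_embedding, "q": expanded_query})
        return cur.fetchall()

conn = psycopg.connect("dbname=docs")  # illustrative connection string
register_vector(conn)  # lets psycopg pass numpy arrays as the vector type
query_embedding = np.random.rand(1536).astype(np.float32)  # stand-in for a real embedding call
for doc_id, title, score in hybrid_search(conn, query_embedding, "wifi dropped wireless disconnect"):
    print(doc_id, title, round(score, 3))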
Transactional Safety
Another thing that drives me up the wall is seeing AI implementations that ignore ACID compliance. When you ingest a document, you usually chunk it, embed the chunks, and store them. If your embedding API fails halfway through a batch of 50 documents, you end up with a corrupted state where the document exists in your main table but has no search vectors. Always wrap your ingestion logic in a transaction.
BEGIN;
-- 1. Insert the raw document metadata
INSERT INTO document_store (title, content, category_id)
VALUES ('System Architecture v2', 'Full content here...', 5)
RETURNING doc_id;
-- (Application layer pauses here to generate embedding from the content)
-- 2. Update with the vector
-- If the embedding API fails, we ROLLBACK and nothing is saved.
UPDATE document_store
SET embedding = '[0.012, -0.234, ...]'
WHERE doc_id = 12345; -- 12345 stands in for the doc_id returned by the INSERT above
COMMIT;
It sounds basic, but I’ve debugged production systems where “missing” search results were just failed API calls that left half-inserted records.
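Here is the same flow from the application layer; a sketch assuming psycopg 3, with embed_text() as a stand-in for whatever embedding API you call. If that call throws, the transaction context manager rolls everything back and no half-inserted row survives.

import numpy as np
import psycopg
from pgvector.psycopg import register_vector

def embed_text(text: str) -> np.ndarray:
    # Stand-in: replace with a real call to your embedding provider.
    return np.zeros(1536, dtype=np.float32)

def ingest_document(conn, title: str, content: str, category_id: int) -> int:
    with conn.transaction():  # BEGIN; commits on success, rolls back on any exception
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO document_store (title, content, category_id) "
                "VALUES (%s, %s, %s) RETURNING doc_id;",
                (title, content, category_id),
            )
            doc_id = cur.fetchone()[0]
            # The embedding call happens inside the transaction on purpose:
            # if it fails, the row above never becomes visible.
            embedding = embed_text(f"{title}\n\n{content}")
            cur.execute(
                "UPDATE document_store SET embedding = %s WHERE doc_id = %s;",
                (embedding, doc_id),
            )
    return doc_id

conn = psycopg.connect("dbname=docs")  # illustrative connection string
register_vector(conn)
ingest_document(conn, "System Architecture v2", "Full content here...", 5)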
The “Gotchas” I Learned the Hard Way
1. The “Recent” Bias
Vector models are static. They don’t know that “the current president” changed or that a software version was updated yesterday. If you don’t filter by date or boost newer documents in your SQL query, your AI will confidently serve outdated specs. I usually add a decay function to the score based on created_at, as in the sketch below.
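One way to fold recency in is to multiply the hybrid score by an exponential decay on created_at. Here is a sketch that generates a replacement final_score expression for the query above; the decayed_score_sql helper and the 30-day half-life are illustrative.

def decayed_score_sql(half_life_days: float = 30.0) -> str:
    """Return a final_score expression that halves a document's hybrid score
    every half_life_days since created_at."""
    return f"""
        (
          (COALESCE(s.vector_score, 0) * 0.7) +
          (COALESCE(k.text_score, 0) * 0.3)
        ) * EXP(
          -LN(2) * EXTRACT(EPOCH FROM (NOW() - d.created_at)) / (86400.0 * {half_life_days})
        ) AS final_score
    """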
2. Short Queries are Dangerous
If a user searches for “bank”, the vector distance between “bank” (financial) and “bank” (river) might be surprisingly close depending on the model’s training data. This is where Adaptive Query Expansion saves you. The LLM sees “bank” and either asks the user “Did you mean the financial institution or the riverbank?” or looks at the user’s history to contextually expand the query to “financial bank account deposit”.
3. Re-ranking is expensive but necessary
The SQL query above gets you the top 10 candidates. But sometimes, I pull the top 50 and run them through a Cross-Encoder (a reranking model) to get the final order. It adds about 200ms of latency, but the jump in quality is massive.
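Here is a sketch of that rerank step, assuming the sentence-transformers CrossEncoder API; the model name is one common public checkpoint rather than a recommendation, and the candidates are assumed to be dicts that carry the document content.

from sentence_transformers import CrossEncoder

# Model name is illustrative; any cross-encoder trained for passage ranking works.
_reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[dict], top_k: int = 10) -> list[dict]:
    """candidates: rows from the hybrid query as dicts with a 'content' field."""
    pairs = [(query, c["content"]) for c in candidates]
    scores = _reranker.predict(pairs)  # one relevance score per (query, passage) pair
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [candidate for candidate, _ in ranked[:top_k]]

# Pull ~50 rows from the hybrid SQL query instead of 10, then:
# final_results = rerank(expanded_query, candidate_rows, top_k=10)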
