Why Your AI Search Sucks (And How to Fix It)

Stop trusting the vector hype

I remember the first time I replaced a standard keyword search with vector embeddings. It felt like magic. I threw a messy, vague sentence at the database, and it actually returned relevant documents. I felt like a genius. Then, about three days later, a user searched for a specific error code—“ERR_CONNECTION_REFUSED”—and my fancy AI search engine returned a blog post about “emotional disconnection in remote work.” That was embarrassing.

The problem with document querying using AI isn’t that the technology is bad. It’s that we’ve been sold a simplified lie: that you can just dump text into an embedding model, store the vectors, and call it a day. If you’re building this for production in 2026, you know that pure vector search has a massive blind spot. It loves “vibes” but hates specifics.

So, I stopped trying to force vectors to do everything. Instead, I moved to an adaptive hybrid approach. It’s messier, sure, but it actually works. Here is how I build document query systems now, combining the precision of SQL with the semantic understanding of AI.

The “Adaptive” Part

Most RAG (Retrieval-Augmented Generation) pipelines are dumb. They take a user query, turn it into a vector, and find the nearest neighbors. But users are terrible at querying. They write things like “help” or “broken.” An adaptive system inserts a brain between the user and the database. Before we even touch the SQL, we use a small, fast LLM to analyze the query. We ask: *Does this query need expansion?* If the user types “printer paper jam,” we expand that to include “tray 2 blockage,” “roller feed issue,” and “clear path.” If the user types “Error 404,” we don’t expand it. We treat it as a strict keyword match. This decision logic changes how we query the database. We aren’t just doing a cosine similarity search anymore. We are doing a weighted hybrid search.
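
To make that concrete, here is a rough sketch of the routing step in Python. It is not the exact code from my pipeline: llm_complete() stands in for whatever small, fast model you call, and the regex for “this looks like an exact identifier” is deliberately simplistic.
# Sketch of the adaptive routing step, run before any SQL is issued.
# llm_complete() is a placeholder for your small/fast LLM client; the regex
# heuristic below is illustrative, not exhaustive.
import re

# Things that look like exact identifiers: ERR_ codes, HTTP status codes, hex values
EXACT_TOKEN = re.compile(r"(ERR_[A-Z_]+|\b[45]\d{2}\b|0x[0-9A-Fa-f]+)")

def analyze_query(raw_query: str, llm_complete) -> dict:
    if EXACT_TOKEN.search(raw_query):
        # Strict keyword territory: do not expand, lean on the lexical side
        return {"query": raw_query, "keyword_weight": 0.8, "vector_weight": 0.2}

    # Vague, human phrasing: ask the LLM for synonyms and related terms,
    # then search with the enriched string
    prompt = (
        "List 3-5 short synonyms or related phrases for this support query, "
        f"comma-separated, no explanation: {raw_query}"
    )
    expansions = llm_complete(prompt)
    return {
        "query": f"{raw_query}, {expansions}",
        "keyword_weight": 0.3,
        "vector_weight": 0.7,
    }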

Setting Up the Foundation

I use PostgreSQL with pgvector. It’s standard, it’s reliable, and I don’t have to manage a separate vector database (which is a nightmare for keeping data in sync). Here is the schema I use. Note that I keep both the raw text with a generated tsvector for lexical full-text search and an embedding for semantic search.
-- We need the vector extension, obviously
CREATE EXTENSION IF NOT EXISTS vector;

-- The main documents table
CREATE TABLE document_store (
    doc_id BIGSERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    
    -- Metadata for filtering (critical for performance)
    category_id INT,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    
    -- The semantic brain: 1536 dimensions is standard for many models
    embedding vector(1536),
    
    -- The keyword anchor: standard Postgres full-text search
    search_vector tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('english', title), 'A') ||
        setweight(to_tsvector('english', content), 'B')
    ) STORED
);

-- Indexing is where you win or lose on speed
-- HNSW index for the vectors (fast approximate nearest neighbor)
CREATE INDEX ON document_store USING hnsw (embedding vector_cosine_ops);

-- GIN index for the keywords
CREATE INDEX ON document_store USING GIN (search_vector);
I can’t stress this enough: use GENERATED ALWAYS AS ... STORED for your tsvector. I used to update these manually in application code, and eventually, I forgot to update one, leading to ghost data. Let the database handle the synchronization.

The Query: Where the Magic Happens

This is the part most tutorials skip. They show you SELECT * FROM items ORDER BY embedding <-> query_embedding LIMIT 5. That’s fine for a toy app. But for a real document query system, we need to combine the “expanded” semantic query with the precise keyword requirements. Here is a CTE (Common Table Expression) approach I use to blend the scores. It normalizes the rank from the vector search and the rank from the text search, then combines them based on an “alpha” weight we can tune.
WITH semantic_search AS (
    SELECT 
        doc_id, 
        1 - (embedding <=> $1) AS vector_score -- Convert distance to similarity
    FROM document_store
    WHERE 1 - (embedding <=> $1) > 0.7 -- Cutoff for noise
    ORDER BY embedding <=> $1
    LIMIT 100
),
keyword_search AS (
    SELECT 
        doc_id, 
        -- The 32 flag normalizes the rank to 0..1 (rank / (rank + 1)),
        -- so it can be blended with the cosine similarity below
        ts_rank_cd(search_vector, plainto_tsquery('english', $2), 32) AS text_score
    FROM document_store
    WHERE search_vector @@ plainto_tsquery('english', $2)
    ORDER BY text_score DESC
    LIMIT 100
)
SELECT 
    COALESCE(s.doc_id, k.doc_id) AS doc_id,
    d.title,
    d.content,
    -- Weighted Hybrid Score
    -- Adjust the multipliers (0.7 vs 0.3) based on whether you trust vectors or keywords more
    (COALESCE(s.vector_score, 0) * 0.7) + 
    (COALESCE(k.text_score, 0) * 0.3) AS final_score
FROM semantic_search s
FULL OUTER JOIN keyword_search k ON s.doc_id = k.doc_id
JOIN document_store d ON COALESCE(s.doc_id, k.doc_id) = d.doc_id
ORDER BY final_score DESC
LIMIT 10;
You see what’s happening there? If the vector search finds a document because it’s semantically related (e.g., “internet issues” matching “wifi dropped”), it gets a score. If the keyword search finds it because of an exact match, it gets a score. If both find it, the document rockets to the top of the list. The $2 parameter isn’t just the user’s raw query. It’s the expanded query containing synonyms and related entities that we generated before hitting the DB.
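
For completeness, this is roughly how I bind those two parameters from application code. A minimal sketch, assuming psycopg 3 plus the pgvector Python package: because each parameter appears more than once in the CTE, $1 and $2 become the named placeholders %(q_vec)s and %(q_text)s in the SQL string you actually execute.
# Minimal sketch of executing the hybrid query, assuming psycopg 3 and the
# pgvector Python package. hybrid_sql is the CTE above with $1 -> %(q_vec)s
# and $2 -> %(q_text)s.
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

def hybrid_search(conn, hybrid_sql: str, expanded_query: str, query_embedding):
    register_vector(conn)  # usually done once, right after connecting
    with conn.cursor() as cur:
        cur.execute(hybrid_sql, {
            "q_vec": np.array(query_embedding),  # the semantic side ($1)
            "q_text": expanded_query,            # the expanded keyword string ($2)
        })
        return cur.fetchall()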

Transactional Safety

Another thing that drives me up the wall is seeing AI implementations that ignore ACID compliance. When you ingest a document, you usually chunk it, embed the chunks, and store them. If your embedding API fails halfway through a batch of 50 documents, you end up with a corrupted state where the document exists in your main table but has no search vectors. Always wrap your ingestion logic in a transaction.
BEGIN;

-- 1. Insert the raw document metadata
INSERT INTO document_store (title, content, category_id)
VALUES ('System Architecture v2', 'Full content here...', 5)
RETURNING doc_id;

-- (Application layer pauses here to generate embedding from the content)

-- 2. Update with the vector, using the doc_id returned in step 1
-- If the embedding API fails, we ROLLBACK and nothing is saved.
UPDATE document_store 
SET embedding = '[0.012, -0.234, ...]' 
WHERE doc_id = 12345; -- the value RETURNING gave us above

COMMIT;
It sounds basic, but I’ve debugged production systems where “missing” search results were just failed API calls that left half-inserted records.
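
If it helps, here is the shape of that ingestion flow from application code; again a sketch assuming psycopg 3 and the pgvector Python package, where the hypothetical embed() wrapper around your embedding API is exactly the call that can fail mid-batch.
# Sketch of transactional ingestion, assuming psycopg 3 and pgvector-python.
# embed() is a hypothetical wrapper around your embedding API.
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

def ingest_document(conn, title: str, content: str, category_id: int, embed) -> int:
    register_vector(conn)  # usually done once, right after connecting
    with conn.transaction():  # BEGIN ... COMMIT, or automatic ROLLBACK on error
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO document_store (title, content, category_id) "
                "VALUES (%s, %s, %s) RETURNING doc_id",
                (title, content, category_id),
            )
            doc_id = cur.fetchone()[0]
            # If this raises, the INSERT above is rolled back with it,
            # so no half-ingested record is left behind
            embedding = np.array(embed(content))
            cur.execute(
                "UPDATE document_store SET embedding = %s WHERE doc_id = %s",
                (embedding, doc_id),
            )
    return doc_id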

The “Gotchas” I Learned the Hard Way

1. The “Recent” Bias. Vector models are static. They don’t know that “the current president” changed or that a software version was updated yesterday. If you don’t filter by date or boost newer documents in your SQL query, your AI will confidently serve outdated specs. I usually add a decay function to the score based on created_at.

2. Short Queries are Dangerous. If a user searches for “bank”, the vector distance between “bank” (financial) and “bank” (river) might be surprisingly close depending on the model’s training data. This is where Adaptive Query Expansion saves you. The LLM sees “bank” and asks the user “Did you mean financial institution or river side?” or it looks at the user’s history to contextually expand the query to “financial bank account deposit”.

3. Re-ranking is expensive but necessary. The SQL query above gets you the top 10 candidates. But sometimes, I pull the top 50 and run them through a Cross-Encoder (a reranking model) to get the final order (there’s a sketch of that step below). It adds about 200ms of latency, but the jump in quality is massive.
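
The reranking pass looks roughly like this; a sketch assuming the sentence-transformers library, where the model name is just an example and candidates is whatever rows the hybrid SQL query returned.
# Sketch of the re-ranking pass, assuming the sentence-transformers library.
# The model name is an example; swap in whichever cross-encoder you prefer.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(user_query: str, candidates: list[dict], top_k: int = 10) -> list[dict]:
    # Score every (query, document) pair jointly -- slower than a bi-encoder,
    # but much better at judging actual relevance
    pairs = [(user_query, c["content"]) for c in candidates]
    scores = reranker.predict(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]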

Why this matters

We are past the “wow” phase of AI search. Users don’t care that you used a transformer model. They care that they found the document. By mixing the old school (SQL, exact keywords, transactions) with the new school (embeddings, adaptive expansion), you build something that doesn’t just look cool in a demo—it actually survives contact with real users. And honestly? It’s much more satisfying to build a system that understands *context* rather than just matching words. Just don’t forget to index your columns. Please.
