Advanced RAG on Azure: Hybrid Search & Re-ranking Implementation

The “Hello World” of RAG (Retrieval-Augmented Generation) is simple: chunk text, embed it, and use cosine similarity. But in production, this approach often fails to retrieve specific entities (like product SKUs) or to capture precise user intent.

At StackMindset, we implement Hybrid Search with Semantic Re-ranking to improve retrieval accuracy: in our internal tests, it raised recall on technical queries from 65% (vector only) to 89%. Here is the technical blueprint.

1. Why Hybrid Search?

Vector search (Dense Retrieval) is great for concepts (“shoes for running”) but terrible at exact matches (“Nike Pegasus 39”). Keyword search (BM25) is the opposite.

Hybrid Search combines both:

Keyword Search (BM25): Matches exact terms.
Vector Search (Cosine): Matches semantic meaning.
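Reciprocal Rank Fusion (RRF): Merges the two ranked lists by rank position, so the BM25 and vector scores, which live on different scales, never need to be normalized against each other.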
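To make the fusion step concrete, here is a minimal, dependency-free sketch of RRF, assuming the standard formula score(d) = Σ 1 / (k + rank_i(d)) summed over every list the document appears in. Azure AI Search performs this fusion server-side; `reciprocal_rank_fusion` is a hypothetical helper for illustration only, and k = 60 is the value used in the original RRF paper, assumed here rather than taken from Azure's configuration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with rank starting at 1. k = 60 is an assumed constant (from the RRF paper).
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists: keyword search ranks the exact SKU first, vector search ranks it last
bm25_results = ["sku-39", "doc-a", "doc-b"]
vector_results = ["doc-a", "doc-c", "sku-39"]
fused = reciprocal_rank_fusion([bm25_results, vector_results])
print([doc_id for doc_id, _ in fused])  # ['doc-a', 'sku-39', 'doc-c', 'doc-b']
```

Because only rank positions matter, a document that appears in both lists (like `sku-39`) outranks documents found by only one retriever, even when one list places it last.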

2. Setting Up Azure AI Search for Hybrid

You need an index that supports both vector and keyword fields.

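import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    SearchField,
    SearchFieldDataType,
    SimpleField,
    SearchableField,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    SemanticConfiguration,
    SemanticPrioritizedFields,
    SemanticField,
    SemanticSearch,
)

# Connect to the service (environment variable names are placeholders; use your own)
client = SearchIndexClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_ADMIN_KEY"]),
)

# Define the Index
fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    # Keyword searchable (BM25 full-text)
    SearchableField(name="content", type=SearchFieldDataType.String, analyzer_name="en.microsoft"),
    # Vector searchable; dimensions must match your embedding model (e.g. 1536 for text-embedding-ada-002)
    SearchField(
        name="content_vector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=1536,
        vector_search_profile_name="my-vector-profile"
    ),
    SimpleField(name="metadata", type=SearchFieldDataType.String)
]

# HNSW approximate nearest-neighbor configuration, referenced by the vector field's profile
vector_search = VectorSearch(
    profiles=[VectorSearchProfile(name="my-vector-profile", algorithm_configuration_name="my-hnsw")],
    algorithms=[HnswAlgorithmConfiguration(name="my-hnsw")]
)

# Semantic configuration that the re-ranker uses at query time (section 3)
semantic_search = SemanticSearch(
    configurations=[
        SemanticConfiguration(
            name="my-semantic-config",
            prioritized_fields=SemanticPrioritizedFields(content_fields=[SemanticField(field_name="content")]),
        )
    ]
)

index = SearchIndex(name="rag-index", fields=fields, vector_search=vector_search, semantic_search=semantic_search)
client.create_index(index)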

3. The Secret Sauce: Semantic Re-ranking

Even with Hybrid Search, the top 5 results are not always the best context for the LLM.

Semantic Re-ranking uses a cross-encoder model adapted from Microsoft Bing’s ranking models to re-score the top 50 results of the hybrid query by how well each one answers the query.
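Azure runs the semantic ranker for you, so there is nothing to implement here; the sketch below only illustrates the shape of the step: take the first 50 fused candidates, score each one jointly with the query, and keep the best few. `rerank` and `toy_score` are hypothetical names, and `toy_score` is a simple keyword-overlap stand-in, not a cross-encoder.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 50,
    keep: int = 5,
) -> list[tuple[str, float]]:
    """Re-score the first `top_n` candidates against the query and keep the best `keep`."""
    # A cross-encoder reads (query, passage) together, unlike a bi-encoder that embeds them separately
    scored = [(passage, score_fn(query, passage)) for passage in candidates[:top_n]]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:keep]


def toy_score(query: str, passage: str) -> float:
    """Hypothetical stand-in scorer: fraction of query terms found in the passage."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    text = passage.lower()
    return sum(term in text for term in terms) / len(terms)


candidates = ["Trail shoes for hiking", "Nike Pegasus 39 review for running", "Running socks"]
top = rerank("nike pegasus running", candidates, toy_score, keep=2)
print([passage for passage, _ in top])  # ['Nike Pegasus 39 review for running', 'Running socks']
```

Swapping `toy_score` for a real cross-encoder changes the quality of the scores, not the structure of the step.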

Implementation Logic

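import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

# Query client for the index created above (environment variable names are placeholders)
search_client = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    index_name="rag-index",
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_QUERY_KEY"]),
)

def search_documents(query: str, vector: list[float]) -> list[dict]:
    # `vector` must be the query embedding from the same model used to fill content_vector
    results = search_client.search(
        # Hybrid: search_text drives BM25, vector_queries drives vector search; Azure fuses both with RRF
        search_text=query,
        vector_queries=[VectorizedQuery(vector=vector, k_nearest_neighbors=50, fields="content_vector")],
        # Semantic Re-ranking: re-score the fused results with the semantic configuration
        query_type="semantic",
        semantic_configuration_name="my-semantic-config",
        top=5,
        select=["content", "metadata"],
    )
    return list(results)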
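Each result from a semantic query is a dict that also carries a `@search.reranker_score` key, on a scale from 0 (irrelevant) to 4 (highly relevant). A minimal sketch of dropping weak matches before they reach the LLM; `filter_by_reranker_score` is a hypothetical helper, the 2.0 threshold is an illustrative assumption rather than a tuned value, and the sample dicts are made-up stand-ins shaped like SDK results.

```python
def filter_by_reranker_score(results: list[dict], min_score: float = 2.0) -> list[dict]:
    """Keep only results whose semantic reranker score meets the (assumed) threshold."""
    kept = []
    for doc in results:
        score = doc.get("@search.reranker_score")
        # The score can be None, e.g. when semantic ranking was not applied to the query
        if score is not None and score >= min_score:
            kept.append(doc)
    return kept


# Made-up stand-in results, not real output
sample = [
    {"content": "Pegasus 39 spec sheet", "@search.reranker_score": 3.1},
    {"content": "Store opening hours", "@search.reranker_score": 0.7},
    {"content": "No semantic score", "@search.reranker_score": None},
]
print([doc["content"] for doc in filter_by_reranker_score(sample)])  # ['Pegasus 39 spec sheet']
```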

4. Chunking Strategy Matters

Don’t just split every 500 characters: fixed-size cuts slice through sentences and paragraphs, and context is lost.

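Recursive Character Splitter (LangChain): Tries progressively finer separators (paragraph breaks, then line breaks, then spaces), so related text such as paragraphs and sentences stays together where possible.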
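LangChain’s `RecursiveCharacterTextSplitter` implements this idea with more options (chunk overlap, custom length functions). The dependency-free sketch below only mimics the core recursion: split on the coarsest separator, greedily merge pieces up to the size limit, and recurse with a finer separator only for pieces that are still too large. `recursive_split` is a hypothetical name; it omits overlap and measures length in characters, not tokens.

```python
def recursive_split(
    text: str, chunk_size: int, separators: tuple[str, ...] = ("\n\n", "\n", " ")
) -> list[str]:
    """Split text into chunks of at most chunk_size characters, preferring coarse separators."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separator left: fall back to a hard cut
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging neighbors while they fit
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too big: recurse with the next, finer separator
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks


doc = "First paragraph about running shoes.\n\nSecond paragraph about the Nike Pegasus 39.\n\nThird."
print(recursive_split(doc, chunk_size=60))
```

With a 60-character limit, the first paragraph stays whole, and the short third paragraph is merged with the second rather than split mid-sentence.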

Parent-Child Indexing:

Child Chunks: Small (200 tokens) for precise retrieval.
Parent Document: The full section (1000 tokens) returned to the LLM for context.

Retrieval matches against the small, precise child chunk, while the LLM receives the full parent section as context.
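A minimal sketch of the parent-child bookkeeping, assuming each child chunk stores its parent’s ID (in Azure AI Search this would typically be an extra field on the child document, such as a hypothetical `parent_id`). The search itself is simulated by hand-picking matched children; `ChildChunk`, `build_parent_child`, and `expand_to_parents` are hypothetical names, and whitespace-separated words stand in for tokens.

```python
from dataclasses import dataclass


@dataclass
class ChildChunk:
    chunk_id: str
    parent_id: str
    text: str


def build_parent_child(parents: dict[str, str], child_size: int) -> list[ChildChunk]:
    """Split each parent section into small child chunks that remember their parent's ID."""
    children: list[ChildChunk] = []
    for parent_id, section in parents.items():
        words = section.split()  # whitespace words as a rough stand-in for tokens
        for start in range(0, len(words), child_size):
            children.append(
                ChildChunk(
                    chunk_id=f"{parent_id}-c{start // child_size}",
                    parent_id=parent_id,
                    text=" ".join(words[start:start + child_size]),
                )
            )
    return children


def expand_to_parents(matched: list[ChildChunk], parents: dict[str, str]) -> list[str]:
    """Map retrieved child chunks back to unique parent sections, keeping match order."""
    seen: set[str] = set()
    context: list[str] = []
    for child in matched:
        if child.parent_id not in seen:
            seen.add(child.parent_id)
            context.append(parents[child.parent_id])
    return context


parents = {"sec-1": "one two three four five", "sec-2": "six seven eight"}
children = build_parent_child(parents, child_size=2)
# Pretend the index matched these children, best first
context = expand_to_parents([children[3], children[1], children[0]], parents)
print(context)  # ['six seven eight', 'one two three four five']
```

Deduplicating by parent keeps two matching children from the same section from sending that section to the LLM twice.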

5. Performance Benchmarks

In our internal tests:

Vector Only: 65% recall on technical queries.
Hybrid (Vector + BM25): 78% recall.
Hybrid + Semantic Re-ranking: 89% recall.
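The figures above do not state the cutoff k, the query set, or how relevance was judged. One common definition is recall@k: the fraction of a query’s relevant documents that appear in the top k results, averaged over all queries. A minimal sketch under that assumption, for running this kind of comparison on your own labeled queries; `recall_at_k` and `mean_recall_at_k` are hypothetical helpers, and the sample IDs are toy data.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant document IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        raise ValueError("recall is undefined when a query has no relevant documents")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average recall@k over (retrieved IDs, relevant IDs) pairs, one pair per query."""
    return sum(recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)


# Toy queries: compare retrieval configurations by running each over the same runs list
runs = [(["a", "b", "c"], {"a", "c", "d"}), (["a", "b", "c"], {"a", "c"})]
print(mean_recall_at_k(runs, k=2))
```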

Conclusion

Stop relying on naive vector search. If you are building a RAG application for enterprise data (contracts, technical manuals, financial reports), Hybrid Search with Re-ranking is the minimum viable architecture for production.