Advanced RAG on Azure: Hybrid Search & Re-ranking Implementation
Mon Feb 09 2026
The “Hello World” of RAG (Retrieval-Augmented Generation) is simple: chunk text, embed it, and retrieve by cosine similarity. In production, though, this often fails to retrieve specific entities (like product SKUs) or to capture precise user intent.
At StackMindset, we implement Hybrid Search with Semantic Re-ranking to improve retrieval accuracy by up to 30%. Here is the technical blueprint.
1. Why Hybrid Search?
Vector search (Dense Retrieval) is great for concepts (“shoes for running”) but terrible at exact matches (“Nike Pegasus 39”). Keyword search (BM25) is the opposite: precise on exact terms, but blind to paraphrases and synonyms.
Hybrid Search combines both:
- Keyword Search (BM25): Matches exact terms.
- Vector Search (Cosine): Matches semantic meaning.
- Reciprocal Rank Fusion (RRF): Merges the two ranked lists by summing reciprocal ranks, so documents that rank well in both lists rise to the top.
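To make the fusion step concrete, here is a minimal sketch of RRF in plain Python. Azure AI Search performs this merge on the service side, so you would not normally write it yourself; the function name, the sample document IDs, and the constant k=60 (the value commonly used in the RRF literature) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used constant.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the keyword leg and the vector leg.
bm25_hits = ["sku-pegasus-39", "doc-running-guide", "doc-returns"]
vector_hits = ["doc-running-guide", "doc-trail-shoes", "sku-pegasus-39"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Documents found by both legs accumulate two contributions, which is why they outrank documents that appear in only one list.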
2. Setting Up Azure AI Search for Hybrid
You need an index that supports both vector and keyword fields.
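import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    SearchField,
    SearchFieldDataType,
    SimpleField,
    SearchableField,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    SemanticConfiguration,
    SemanticField,
    SemanticPrioritizedFields,
    SemanticSearch,
)

# The index client was not defined in the original snippet; the environment
# variable names below are placeholders for your own endpoint and admin key.
client = SearchIndexClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_ADMIN_KEY"]),
)

# Define the Index
fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    # Keyword Searchable
    SearchableField(name="content", type=SearchFieldDataType.String, analyzer_name="en.microsoft"),
    # Vector Searchable
    SearchField(
        name="content_vector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=1536,
        vector_search_profile_name="my-vector-profile",
    ),
    SimpleField(name="metadata", type=SearchFieldDataType.String),
]

vector_search = VectorSearch(
    profiles=[VectorSearchProfile(name="my-vector-profile", algorithm_configuration_name="my-hnsw")],
    algorithms=[HnswAlgorithmConfiguration(name="my-hnsw")],
)

# Semantic configuration referenced by the query in section 3
semantic_search = SemanticSearch(
    configurations=[
        SemanticConfiguration(
            name="my-semantic-config",
            prioritized_fields=SemanticPrioritizedFields(content_fields=[SemanticField(field_name="content")]),
        )
    ]
)

index = SearchIndex(name="rag-index", fields=fields, vector_search=vector_search, semantic_search=semantic_search)
client.create_index(index)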
3. The Secret Sauce: Semantic Re-ranking
Even with Hybrid search, the top 5 results might not be the best answer for the LLM.
Semantic Re-ranking uses a cross-encoder model (adapted from Microsoft Bing’s ranking models) to re-score up to the top 50 results by how well they answer the query.
Implementation Logic
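import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

# The search client was not defined in the original snippet; the environment
# variable names are placeholders for your own endpoint and query key.
search_client = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    index_name="rag-index",
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_QUERY_KEY"]),
)

def search_documents(query: str, vector: list[float]) -> list[dict]:
    results = search_client.search(
        search_text=query,  # BM25 keyword leg
        vector_queries=[VectorizedQuery(vector=vector, k_nearest_neighbors=50, fields="content_vector")],  # vector leg
        top=5,
        select=["content", "metadata"],
        # Enable Hybrid + Semantic Re-ranking
        query_type="semantic",
        semantic_configuration_name="my-semantic-config",
    )
    return list(results)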
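Semantic queries return a re-ranker score on each result under the "@search.reranker_score" key, which you can use to drop weak matches before they reach the LLM. The helper below is a sketch under assumptions: the function name and the 2.0 cutoff are hypothetical, and the right threshold depends on your data. It works on any iterable of dict-like results, so it is shown here with hand-written sample hits rather than a live service call.

```python
def filter_by_reranker_score(results, min_score=2.0):
    """Keep only results whose semantic re-ranker score meets the cutoff.

    min_score=2.0 is an assumed threshold for illustration; tune it on
    your own queries. Results without a re-ranker score are dropped.
    """
    kept = []
    for doc in results:
        score = doc.get("@search.reranker_score")
        if score is not None and score >= min_score:
            kept.append(doc)
    return kept


# Hand-written sample hits standing in for search results.
sample_hits = [
    {"content": "Pegasus 39 sizing chart", "@search.reranker_score": 3.1},
    {"content": "General returns policy", "@search.reranker_score": 1.2},
    {"content": "Result without a re-ranker score"},
]

strong_hits = filter_by_reranker_score(sample_hits)
print([hit["content"] for hit in strong_hits])
```

Filtering on the re-ranker score rather than always passing five results lets the application return "no good answer found" instead of grounding the LLM on marginal passages.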
4. Chunking Strategy Matters
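Don’t just split every 500 characters. Fixed-size cuts break sentences and paragraphs in the middle, and context is lost.
Recursive Character Splitter (LangChain): Keeps related text together by splitting on the largest separators first (paragraph breaks, then line breaks, then spaces) and only falling back to smaller units when a piece is still too long.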
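The idea behind recursive splitting can be sketched in a few lines of plain Python. This is not LangChain's implementation: the function name, the separator list, and the character-based length limit are assumptions chosen to illustrate the technique, and separators that fall between two chunks are simply dropped.

```python
def recursive_split(text, max_len=500, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most max_len characters.

    Tries the coarsest separator first and only recurses to finer
    separators for pieces that are still too long.
    """
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators:
        # Nothing left to split on: fall back to a hard cut.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_len:
            current = candidate  # keep merging neighbours while they fit
            continue
        if current.strip():
            chunks.append(current)
        current = ""
        if len(piece) <= max_len:
            current = piece
        else:
            chunks.extend(recursive_split(piece, max_len, finer))
    if current.strip():
        chunks.append(current)
    return chunks


doc = "Para one sentence A. Para one sentence B.\n\nPara two is short.\n\nPara three."
chunks = recursive_split(doc, max_len=45)
for chunk in chunks:
    print(repr(chunk))
```

In this example the first paragraph stays whole and the two short paragraphs are merged into one chunk, rather than being cut at an arbitrary character offset.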
Parent-Child Indexing:
- Child Chunks: Small (200 tokens) for precise retrieval.
- Parent Document: The full section (1000 tokens) returned to the LLM for context.
This keeps the match precise while still giving the LLM enough surrounding context to answer.
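A minimal sketch of the parent-child pattern follows. It assumes whitespace-separated words as a rough stand-in for tokens, and the function names, ID scheme, and record layout are hypothetical. In a real index, each child record would also carry its embedding, and the parent ID would be stored as a retrievable field.

```python
def build_parent_child_index(sections, child_size=200):
    """Split each parent section into small child chunks.

    Returns (parents, children): parents maps parent_id -> full section
    text; children is a list of records that point back to their parent.
    Words are used as an approximation of tokens (an assumption).
    """
    parents, children = {}, []
    for p_idx, section in enumerate(sections):
        parent_id = f"parent-{p_idx}"
        parents[parent_id] = section
        words = section.split()
        for c_idx, start in enumerate(range(0, len(words), child_size)):
            children.append({
                "id": f"{parent_id}-child-{c_idx}",
                "parent_id": parent_id,
                "text": " ".join(words[start:start + child_size]),
            })
    return parents, children


def expand_to_parents(child_hits, parents):
    """Replace retrieved child chunks with their parent sections.

    Each parent appears once, in the order of its best-ranked child.
    """
    seen, context = set(), []
    for hit in child_hits:
        parent_id = hit["parent_id"]
        if parent_id not in seen:
            seen.add(parent_id)
            context.append(parents[parent_id])
    return context


sections = ["alpha " * 450, "beta " * 100]
parents, children = build_parent_child_index(sections)

# Pretend the search returned these child chunks, best match first.
hits = [children[2], children[0], children[3]]
context = expand_to_parents(hits, parents)
print(len(children), len(context))
```

De-duplicating by parent matters because several children of the same section often match one query, and sending the same section to the LLM twice wastes context window.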
5. Performance Benchmarks
In our internal tests:
- Vector Only: 65% recall on technical queries.
- Hybrid (Vector + BM25): 78% recall.
- Hybrid + Semantic Re-ranking: 89% recall.
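To run a comparison like this on your own data, you need a labelled set of queries with known relevant documents. The sketch below computes recall@k per query and averages it. The metric definition, the cutoff k, and the sample data are assumptions for illustration and are not a description of how the figures above were measured.

```python
def recall_at_k(retrieved_ids, relevant_ids, k=5):
    """Fraction of the relevant documents that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs, k=5):
    """Average recall@k over (retrieved_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)


# Hypothetical evaluation set: two queries with hand-labelled relevant IDs.
runs = [
    (["a", "b", "c"], {"a", "d"}),  # finds 1 of 2 relevant docs
    (["x", "y", "z"], {"y"}),       # finds 1 of 1 relevant docs
]
print(mean_recall_at_k(runs, k=3))
```

Running the same labelled queries through vector-only, hybrid, and hybrid plus re-ranking configurations gives a like-for-like comparison of the three setups.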
Conclusion
Stop relying on naive vector search. If you are building a RAG application for enterprise data (contracts, technical manuals, financial reports), Hybrid Search with Re-ranking is the minimum viable architecture for production.