Glossary term
Memory and Retrieval
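A retrieval model that processes a query and a document together as a single input and outputs a relevance score for the pair; typically used to rerank candidates returned by a faster first-stage retriever.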
Sentence Transformers' cross-encoder models are used as rerankers in RAG pipelines: after dense retrieval returns the top 100 candidates, a cross-encoder scores each query-document pair and the pipeline keeps the 5 highest-scoring documents.
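The retrieve-then-rerank step described above can be sketched as follows. This is a minimal illustration, not a production implementation: `rerank` and `overlap_scorer` are hypothetical names, and the token-overlap scorer is only a dependency-free stand-in for a real cross-encoder. With Sentence Transformers installed, the `predict` method of a `CrossEncoder` instance accepts a list of (query, document) pairs and could be passed in place of the toy scorer.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def rerank(
    query: str,
    candidates: Sequence[str],
    score_pairs: Callable[[List[Pair]], Sequence[float]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair jointly and keep the top_k."""
    pairs = [(query, doc) for doc in candidates]
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def overlap_scorer(pairs: List[Pair]) -> List[float]:
    """Toy stand-in for a cross-encoder (assumption): fraction of query tokens found in the document."""
    scores = []
    for query, doc in pairs:
        query_tokens = set(query.lower().split())
        doc_tokens = set(doc.lower().split())
        scores.append(len(query_tokens & doc_tokens) / max(len(query_tokens), 1))
    return scores


# Candidates as they might come back from a first-stage dense retriever.
candidates = [
    "Bi-encoders embed queries and documents separately.",
    "A cross-encoder scores a query and document together.",
    "Bananas are rich in potassium.",
]
top = rerank("how does a cross-encoder score a query", candidates, overlap_scorer, top_k=2)
for doc, score in top:
    print(f"{score:.2f}  {doc}")
```

Passing the scorer as a function keeps the reranking logic independent of any particular model, so the first-stage candidate count (for example 100) and the final cut-off (for example 5) can be tuned separately.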
Cohere Rerank uses a cross-encoder that jointly encodes the query and each retrieved document, with reported NDCG@10 improvements of 15-30% over bi-encoder retrieval alone in enterprise search deployments.
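For readers unfamiliar with the metric, NDCG@10 can be computed roughly as below. This sketch makes two assumptions: it uses the linear-gain form of DCG (another common variant uses 2^rel - 1 as the gain), and it derives the ideal ordering from the same result list rather than from all judged documents. The function names and the relevance labels are illustrative only.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG normalized by the DCG of the best possible ordering of the same results."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels (1 = relevant, 0 = not), listed in ranked order.
before_rerank = [0, 1, 0, 1, 0]
after_rerank = [1, 1, 0, 0, 0]

print(f"NDCG@10 before reranking: {ndcg_at_k(before_rerank):.3f}")
print(f"NDCG@10 after reranking:  {ndcg_at_k(after_rerank):.3f}")
```

Moving relevant documents toward the top of the list raises the score, which is the effect a reranker is meant to produce.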
The ms-marco-MiniLM cross-encoder (Hugging Face) is used in production search pipelines at medium scale: the 22M-parameter model scores a query-document pair in about 5 ms, enabling real-time reranking of 50 candidates (about 250 ms if pairs are scored one at a time, typically less with batching).