Glossary term
Memory and Retrieval
Classic keyword-based ranking function that scores documents for relevance using term frequency, inverse document frequency, and document-length normalization.
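The scoring described above can be sketched in a few lines of Python. This is a minimal illustration of the classic Okapi BM25 formula, not the implementation of any particular search engine. The function name `bm25_scores`, the whitespace tokenization, and the toy documents are assumptions made for the example; `k1=1.2` and `b=0.75` are commonly used defaults, and the IDF uses a smoothed form that stays non-negative.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Hypothetical helper for illustration: `docs` is a list of token lists.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)  # term frequency within this document
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed inverse document frequency: rarer terms weigh more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            # Term frequency saturates as it grows, controlled by k1.
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (assumed data, tokenized by whitespace).
docs = [
    "the cat sat on the mat".split(),
    "dogs chase cats in the park".split(),
    "bm25 ranks documents by keyword relevance".split(),
]
scores = bm25_scores("keyword relevance".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Documents that contain none of the query terms score zero, which is why BM25 alone cannot match synonyms or paraphrases.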
Elasticsearch uses BM25 as its default relevance scoring function. Wikipedia's internal search engine uses BM25 to serve 60 million monthly visitors with sub-100ms full-text search across 55 million articles.
BM25 serves as the sparse-retrieval component in hybrid search systems. Weaviate and Qdrant combine it with dense vector retrieval to handle both exact keyword matches and semantic queries.
OpenAI's file search tool in the Assistants API combines BM25 with vector search for hybrid retrieval, so documents with exact keyword matches are scored alongside semantically similar documents.
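One common way to merge a BM25 ranking with a dense-vector ranking is reciprocal rank fusion (RRF). The sketch below is illustrative only and does not claim to be the fusion method used by any product named above. The function name `reciprocal_rank_fusion`, the document IDs, and the constant `k=60` (a conventional choice) are assumptions for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)


# Assumed example rankings: one from BM25 (sparse), one from vector search (dense).
sparse = ["doc_a", "doc_b", "doc_c"]
dense = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([sparse, dense])
```

RRF works on ranks rather than raw scores, which avoids having to put BM25 scores and vector similarities on a common scale.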