NanoRuMTEB
Overview
NanoRuMTEB is a compact Russian retrieval group based on ruMTEB retrieval tasks. It contains three Russian-language subtasks: MIRACL-style Wikipedia passage retrieval, RIA news headline-to-article retrieval, and RuBQ question-to-Wikipedia paragraph retrieval. The group is small, but it covers two important Russian retrieval modes: short question answering over Wikipedia and headline/article matching in news.
The group contains 600 queries, 30,000 task-local documents, and 1,113 positive qrel rows. All tasks are in Russian, and each task's candidate pool contains 10,000 documents. The group is useful for checking whether a retriever handles Russian morphology and inflection, named entities, and native-language query phrasing, rather than performing well only on English or translated queries.
What This Group Measures
The group measures native Russian lexical and semantic retrieval. miracl_ru uses short Russian questions and Russian Wikipedia passages. ria_news maps Russian news headlines to the corresponding news articles. ru_bq retrieves Wikipedia paragraphs that support answers to Russian open-domain questions.
The main evaluation contrast is between a lexically strong headline-to-article task and QA-style passage retrieval tasks. News headlines and articles often share event terms and named entities. Wikipedia QA tasks require matching short questions to answer-bearing paragraphs, where the answer passage may use different inflected forms or broader context.
Task Families
- Wikipedia passage retrieval: miracl_ru retrieves Russian Wikipedia passages for short information needs.
- News retrieval: ria_news retrieves Russian news articles from headline strings.
- Open-domain QA retrieval: ru_bq retrieves Russian Wikipedia paragraphs that support answers to questions.
Dataset Shape
Each task has 200 queries and 10,000 candidate documents. ria_news is single-positive, while miracl_ru and ru_bq are multi-positive, averaging 2.90 and 1.67 positives per query, respectively. The group therefore combines one exact headline-to-article retrieval task with two evidence retrieval tasks where several passages may be relevant.
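The per-task averages and the group totals follow directly from the counts in the task summary. The sketch below reproduces that arithmetic; the `TASKS` dictionary and the function names are illustrative and not part of any released tooling.

```python
# Counts copied from the task summary; the dictionary layout is illustrative only.
TASKS = {
    "miracl_ru": {"queries": 200, "docs": 10_000, "positives": 579},
    "ria_news": {"queries": 200, "docs": 10_000, "positives": 200},
    "ru_bq": {"queries": 200, "docs": 10_000, "positives": 334},
}


def positives_per_query(stats):
    """Average number of positive qrel rows per query for one task."""
    return stats["positives"] / stats["queries"]


def group_totals(tasks):
    """Sum queries, task-local documents, and positives across all tasks."""
    return {
        key: sum(stats[key] for stats in tasks.values())
        for key in ("queries", "docs", "positives")
    }


if __name__ == "__main__":
    for name, stats in TASKS.items():
        print(f"{name}: {positives_per_query(stats):.3f} positives/query")
    print(group_totals(TASKS))
```

A value of exactly 1.0 marks a single-positive task, and anything above it marks a task where a query can have several relevant passages.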
Queries are short Russian questions or headlines. Documents are longer Wikipedia passages or news articles. The task-local corpus sizes are equal, so differences in retrieval behavior mostly reflect source family and relevance relation rather than corpus size.
Retrieval Behavior
BM25 Profile
BM25 is not the best profile for any task, but it is strong on ria_news. The news split reaches 0.9135 nDCG@10, because headlines and articles often share names, places, dates, and event vocabulary. BM25 is also usable on ru_bq, reaching 0.6979 nDCG@10, where some question terms and entities appear in the answer passage.
BM25 is weakest on miracl_ru at 0.5154 nDCG@10. Short Russian questions do not always repeat the exact wording of a relevant Wikipedia passage, and morphology or paraphrase can reduce sparse overlap. BM25 is therefore a strong Russian lexical baseline, but not the leading retrieval profile for this group.
Dense Profile
Dense retrieval with harrier-oss-270m is the best profile for all three tasks. It reaches 0.9478 nDCG@10 on ria_news, 0.8739 on ru_bq, and 0.7938 on miracl_ru. The improvements on the two QA-style tasks show that embedding similarity helps connect Russian questions to answer-bearing passages beyond exact token overlap.
Dense also slightly improves the already strong news task, suggesting that headline-to-article retrieval benefits from both event-term overlap and broader semantic matching. At group level, dense retrieval is the clearest default profile for NanoRuMTEB.
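For readers less familiar with the metric behind the BM25 and dense figures, the following is a minimal sketch of nDCG@10 with binary relevance. It assumes that every positive qrel row has gain 1 and that a ranking contains no duplicate document IDs. The evaluation code that produced the reported scores is not shown in this document, so this illustrates the metric rather than reproducing that pipeline.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k gains (rank 1 has discount log2(2))."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids, relevant_ids, k=10):
    """nDCG@k with binary relevance: gain 1 for a relevant document, 0 otherwise."""
    gains = [1.0 if doc_id in relevant_ids else 0.0 for doc_id in ranked_doc_ids]
    # The ideal ranking places all relevant documents first, capped at k.
    ideal_gains = [1.0] * min(len(relevant_ids), k)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(gains, k) / ideal_dcg


if __name__ == "__main__":
    # Hypothetical document IDs, for illustration only.
    print(ndcg_at_k(["a", "x", "b"], {"a", "b"}))
```

On a single-positive task such as ria_news, this reduces to a function of the rank of the one correct article. On the multi-positive tasks, the score also depends on how many of the relevant passages reach the top 10.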
Reranking Hybrid Profile
The reranking hybrid profile is not the best nDCG@10 profile for any task, but it has the best recall@100 on all three tasks. It reaches 0.9948 recall@100 on miracl_ru, 0.9900 on ria_news, and 0.9790 on ru_bq. This means hybrid retrieval is a strong candidate-generation strategy even when dense retrieval orders the top 10 better.
The top-10 pattern suggests that adding sparse evidence can slightly dilute dense ranking for Russian QA tasks, while still recovering complementary candidate passages. For reranking pipelines, hybrid candidates are useful; for first-stage top-10 scoring in this Nano slice, dense retrieval leads.
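The recall advantage of combining sparse and dense candidates can be illustrated with a small sketch. This document does not specify how the reranking hybrid profile merges its candidate lists, so reciprocal rank fusion is used here purely as a common stand-in. The constant `k=60`, the function names, and the toy document IDs are all assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists; each document scores the sum of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def recall_at_k(ranked_doc_ids, relevant_ids, k=100):
    """Fraction of relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_doc_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


if __name__ == "__main__":
    # Toy candidate lists with hypothetical IDs: each retriever finds one positive.
    sparse = ["d1", "d2", "d3"]
    dense = ["d4", "d1", "d5"]
    relevant = {"d3", "d5"}
    fused = reciprocal_rank_fusion([sparse, dense])
    print(recall_at_k(sparse, relevant, k=5), recall_at_k(fused, relevant, k=5))
```

In the toy example each retriever alone recovers one of the two positives, while the fused list recovers both. The same example shows why fusion can hurt top-10 ordering: a document ranked first by the dense retriever (`d4`) drops below one that both lists agree on (`d1`).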
Task Summary
| Task | Family | Language | Queries | Docs | Positives | Positives/query | BM25 nDCG@10 | Dense nDCG@10 | Reranking hybrid nDCG@10 | Best profile |
| miracl_ru | Wikipedia passage retrieval | ru | 200 | 10,000 | 579 | 2.90 | 0.5154 | 0.7938 | 0.6646 | Dense |
| ria_news | News retrieval | ru | 200 | 10,000 | 200 | 1.00 | 0.9135 | 0.9478 | 0.9272 | Dense |
| ru_bq | Open-domain QA retrieval | ru | 200 | 10,000 | 334 | 1.67 | 0.6979 | 0.8739 | 0.7767 | Dense |
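The table can be summarized per profile with a few lines of arithmetic. The sketch below copies the nDCG@10 values from the table and computes an unweighted mean over the three tasks, the best profile per task, and the dense gain over BM25. Treating the group score as an unweighted task mean is an assumption, and the variable and function names are illustrative.

```python
# nDCG@10 values copied from the task summary table.
NDCG_AT_10 = {
    "miracl_ru": {"bm25": 0.5154, "dense": 0.7938, "hybrid": 0.6646},
    "ria_news": {"bm25": 0.9135, "dense": 0.9478, "hybrid": 0.9272},
    "ru_bq": {"bm25": 0.6979, "dense": 0.8739, "hybrid": 0.7767},
}


def macro_average(scores):
    """Unweighted mean over tasks for each profile (an assumed aggregation)."""
    profiles = next(iter(scores.values())).keys()
    return {
        profile: sum(task[profile] for task in scores.values()) / len(scores)
        for profile in profiles
    }


def best_profile(scores):
    """Profile with the highest nDCG@10 for each task."""
    return {name: max(task, key=task.get) for name, task in scores.items()}


def dense_gain_over_bm25(scores):
    """Absolute nDCG@10 difference between the dense and BM25 profiles."""
    return {name: task["dense"] - task["bm25"] for name, task in scores.items()}


if __name__ == "__main__":
    print(macro_average(NDCG_AT_10))
    print(best_profile(NDCG_AT_10))
    print(dense_gain_over_bm25(NDCG_AT_10))
```

The per-task gains make the contrast in the table explicit: the dense margin over BM25 is small on ria_news and much larger on the two QA-style tasks.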
Interpretation Notes for Model Researchers
NanoRuMTEB is a compact but clean Russian diagnostic. Dense retrieval dominates all three tasks, especially the QA-style Wikipedia tasks. BM25 remains strong on news retrieval and is still useful for entity-heavy Russian queries, but it does not capture all paraphrase and answer-evidence relations.
Because there are only three tasks, group averages are easy to inspect. A model that improves ria_news may be improving event and headline matching, while a model that improves miracl_ru or ru_bq is more likely improving Russian question-to-passage retrieval. Hybrid recall should be considered if the system uses a reranker after first-stage retrieval.
Training and Leakage Notes
Useful training data includes Russian MIRACL or Wikipedia question-passage pairs, RuBQ-style QA retrieval data, Russian news headline-to-article pairs, and hard negatives mined from same-topic Russian passages. Native Russian training data is preferable to English-only data translated after the fact.
Leakage control should exclude NanoRuMTEB evaluation queries, qrels, and positive documents. Source dev/test overlap should be audited before using ruMTEB, MIRACL, RIA news, or RuBQ-derived data for training.
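A minimal sketch of the exclusion step is shown below. It assumes training data is a list of (query, document) text pairs and removes any pair whose query or document matches an evaluation query or positive document after case folding and whitespace normalization. The function names and the pair layout are hypothetical. Exact matching of this kind will not catch near-duplicates or paraphrases, which need a separate audit.

```python
def normalize(text):
    """Case-fold and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.casefold().split())


def filter_training_pairs(train_pairs, eval_queries, eval_positive_docs):
    """Drop training pairs that reuse an evaluation query or positive document."""
    blocked_queries = {normalize(query) for query in eval_queries}
    blocked_docs = {normalize(doc) for doc in eval_positive_docs}
    kept = []
    for query, doc in train_pairs:
        if normalize(query) in blocked_queries:
            continue  # query text overlaps the evaluation set
        if normalize(doc) in blocked_docs:
            continue  # document is a positive in the evaluation set
        kept.append((query, doc))
    return kept


if __name__ == "__main__":
    # Toy data for illustration; not taken from any of the source datasets.
    train = [
        ("Кто написал «Войну и мир»?", "Лев Толстой написал роман."),
        ("Столица Франции?", "Париж является столицей Франции."),
        ("Новый вопрос", "Новый документ"),
    ]
    eval_queries = ["кто написал  «войну и мир»?"]
    eval_docs = ["Париж является столицей Франции."]
    print(filter_training_pairs(train, eval_queries, eval_docs))
```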
Source Reference Table
| Source | Year | Type | URL |
| The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design | 2025 | benchmark paper | https://aclanthology.org/2025.naacl-long.12/ |
| MIRACL | | project page | http://miracl.ai/ |
| mteb/RiaNewsRetrieval_test_top_250_only_w_correct-v2 | | dataset card | https://huggingface.co/datasets/mteb/RiaNewsRetrieval_test_top_250_only_w_correct-v2 |
| ai-forever/rubq-retrieval | | dataset card | https://huggingface.co/datasets/ai-forever/rubq-retrieval |
| mteb/MIRACLRetrievalHardNegatives | | dataset card | https://huggingface.co/datasets/mteb/MIRACLRetrievalHardNegatives |
Metadata Summary
| Field | Value |
| Task pages | 3 |
| Queries | 600 |
| Split-local documents | 30,000 |
| Positive qrels | 1,113 |
| Languages | ru |
| Categories | natural_language |
| Positives / query avg | 1.85 |
Task Metadata Summary
| Task | Backing dataset | Lang | Category | Queries | Docs | Positives | BM25 nDCG@10 | Dense nDCG@10 | Reranking hybrid nDCG@10 | Best profile |
| miracl_ru | NanoRuMTEB | ru | natural_language | 200 | 10,000 | 579 | 0.5154 | 0.7938 | 0.6646 | Dense |
| ria_news | NanoRuMTEB | ru | natural_language | 200 | 10,000 | 200 | 0.9135 | 0.9478 | 0.9272 | Dense |
| ru_bq | NanoRuMTEB | ru | natural_language | 200 | 10,000 | 334 | 0.6979 | 0.8739 | 0.7767 | Dense |