HAKARI-Bench

NanoRuMTEB / miracl_ru

Overview

miracl_ru is a Russian factual passage retrieval task from NanoRuMTEB. The queries are short Russian natural-language questions, and the documents are Russian Wikipedia passages. Many queries have multiple relevant passages, often about the same entity or topic. The task measures native Russian ad hoc retrieval rather than translation or English-centered search. Dense retrieval is the strongest top-rank profile (nDCG@10 of 0.7938), reranking_hybrid has the best recall@100 (0.9948), and BM25 remains a strong lexical baseline (nDCG@10 of 0.5154) because entity names and titles often overlap directly between question and passage.

Details

What the Original Data Measures

MIRACL is a multilingual retrieval benchmark covering 18 languages, including Russian. It uses Wikipedia passages, native-language queries, and relevance judgments created for multilingual information retrieval.

ruMTEB includes MIRACL retrieval as a Russian benchmark task. The Nano version uses the MIRACL Russian hard-negative setting, where candidates are drawn from retrieval pools and the evaluation focuses on ranking answer-bearing passages.

Observed Data Profile

The Nano split contains 200 queries, 10,000 documents, and 579 positive qrel rows. Queries average 45.37 characters, while documents average 517.26 characters. Positives per query average 2.90, with a minimum of 1, a median of 2, and a maximum of 10. There are 136 multi-positive queries, 68.0% of the split.
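The positives-per-query figures above can be recomputed from the qrel rows. The following is a minimal sketch, assuming the qrels are available as (query_id, doc_id) pairs with one row per positive; the function name and the toy data are illustrative and not part of the benchmark tooling.

```python
from collections import Counter
from statistics import mean, median


def positives_per_query_stats(qrels):
    """Summarize positives per query from (query_id, doc_id) positive pairs.

    Queries with no positive row never appear in qrels, so they are not counted.
    """
    counts = Counter(query_id for query_id, _ in qrels)
    values = list(counts.values())
    multi = sum(1 for v in values if v > 1)
    return {
        "queries": len(values),
        "positive_rows": sum(values),
        "avg": mean(values),
        "min": min(values),
        "median": median(values),
        "max": max(values),
        "multi_positive": multi,
        "multi_positive_share": multi / len(values),
    }


# Toy qrels (hypothetical ids), not the real split.
toy_qrels = [
    ("q1", "d1"), ("q1", "d2"),
    ("q2", "d3"),
    ("q3", "d4"), ("q3", "d5"), ("q3", "d6"),
]
stats = positives_per_query_stats(toy_qrels)
```

On the real split, the same computation should reproduce the reported values: 579 rows over 200 queries gives an average of about 2.90, and 136 of 200 queries is 68.0%.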

Example queries ask whether "Agents of S.H.I.E.L.D." is a drama series, whether China is a socialist state, whether the Cipher Bureau worked on breaking Enigma, how long Tolstoy took to write "War and Peace", and how many days Euromaidan lasted in Ukraine.

BM25 Evaluation Profile

The BM25 candidate subset uses top-500 candidates and reaches nDCG@10 of 0.5154, hit@10 of 0.8250, and recall@100 of 0.9326. BM25 is a strong baseline because many Russian questions contain entity names, titles, or distinctive factual terms that appear in relevant Wikipedia passages.

Its limitation is passage selection. BM25 may retrieve the correct article but not the answer-bearing passage, or overrank a prominent entity passage that shares surface terms but does not answer the question.

Dense Evaluation Profile

The dense candidate subset from harrier_oss_v1_270m uses top-500 candidates and reaches nDCG@10 of 0.7938, hit@10 of 0.9550, and recall@100 of 0.9585. Dense retrieval is the strongest profile for early ranking.

This suggests that embedding similarity captures Russian question-passage semantics beyond direct word overlap. It helps with paraphrase, inflectional variation, and cases where the answer is expressed in a later explanatory passage.

Reranking Hybrid Evaluation Profile

The reranking_hybrid subset uses top-100 candidates, with 1 row receiving the optional rank-101 safeguard. It reaches nDCG@10 of 0.6646, hit@10 of 0.9050, and recall@100 of 0.9948. Hybrid retrieval has the best recall@100 but lower top-rank quality than dense retrieval.

The pattern shows that sparse and dense signals are complementary for candidate coverage. BM25 contributes exact entity and title matching, while dense retrieval is better at ordering answer-bearing passages near the top.
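The source does not state how reranking_hybrid combines the BM25 and dense rankings. As one common possibility, the sketch below uses reciprocal rank fusion (RRF) to merge two ranked lists into a single candidate pool. The function name, the constant k=60, and the toy ids are assumptions made for illustration; this is not a description of the benchmark's actual pipeline.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=100):
    """Fuse several ranked lists of doc ids with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    k=60 is a conventional default, assumed here rather than taken from the source.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by doc id so the output is deterministic.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused[:top_n]


# Toy rankings (hypothetical ids): each retriever finds a document the other misses.
bm25_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d4", "d1"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

The fused pool contains the union of both lists, which is the coverage effect that raises recall@100. Documents ranked by both retrievers (d1, d3) rise above those found by only one.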

Metric Interpretation for Model Researchers

Because many queries have multiple positives, nDCG@10 measures whether several relevant passages are ranked early, hit@10 measures whether at least one positive appears in the first ten, and recall@100 measures how much relevant material is available for reranking.
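The three metrics can be written down for a single query as follows. This is a minimal sketch assuming binary relevance (a passage is either a positive or not) and the standard log2 discount for nDCG; the function names are illustrative, and reported benchmark scores are averages of these per-query values over all queries.

```python
import math


def hit_at_k(ranked, positives, k=10):
    """1.0 if at least one positive appears in the top k, else 0.0."""
    return 1.0 if any(doc_id in positives for doc_id in ranked[:k]) else 0.0


def recall_at_k(ranked, positives, k=100):
    """Fraction of this query's positives that appear in the top k."""
    if not positives:
        return 0.0
    return len(set(ranked[:k]) & set(positives)) / len(positives)


def ndcg_at_k(ranked, positives, k=10):
    """Binary-relevance nDCG: discounted gain divided by the ideal discounted gain."""
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, doc_id in enumerate(ranked[:k])
        if doc_id in positives
    )
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(positives), k)))
    return dcg / ideal if ideal else 0.0


# Toy query (hypothetical ids) with two positives ranked 2nd and 4th.
ranked = ["d3", "d1", "d9", "d2"]
positives = {"d1", "d2"}
hit = hit_at_k(ranked, positives)
recall = recall_at_k(ranked, positives)
ndcg = ndcg_at_k(ranked, positives)
```

In the toy query, hit@10 and recall@100 are both 1.0 while nDCG@10 is about 0.65, which shows how a ranking can find every positive and still lose nDCG by placing them below a non-relevant passage.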

For miracl_ru, dense nDCG@10 is the main first-stage quality signal. Hybrid recall@100 is valuable when a reranker can distinguish answer-bearing passages from related passages.

Query and Relevance Type Tendencies

Queries are short Russian fact, definition, and yes/no questions. Relevant documents are Russian Wikipedia passages, often with named entities, dates, titles, and explanatory facts.

Relevance is answer-bearing passage relevance. A passage from the correct article is not necessarily relevant if it does not contain the fact needed by the question.

Representative Failure Modes

Common failures include retrieving the right entity but wrong passage, overmatching quoted titles, missing inflected or paraphrased Russian wording, and confusing closely related people, works, or events. BM25 is sensitive to exact terms; dense retrieval can still overrank topically related passages without the answer.

Training Data That May Help

Useful training data includes non-overlapping MIRACL Russian train pairs, Russian Wikipedia question-passage retrieval, native Russian factual QA retrieval, and Russian-language subsets of multilingual retrieval data with overlap removed. Evaluation queries, positive passages, and qrels should be excluded from any training set.
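A simple way to enforce the exclusion is to drop any training pair whose query or passage matches evaluation material. The sketch below assumes training data as (query, passage) text pairs and uses exact matching after lowercasing and whitespace normalization; the function names are hypothetical, and near-duplicate detection would need a stronger method than this.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences still match."""
    return " ".join(text.lower().split())


def remove_overlap(train_pairs, eval_queries, eval_positive_passages):
    """Drop training pairs whose query or passage also appears in the evaluation split."""
    blocked_queries = {normalize(q) for q in eval_queries}
    blocked_passages = {normalize(p) for p in eval_positive_passages}
    return [
        (query, passage)
        for query, passage in train_pairs
        if normalize(query) not in blocked_queries
        and normalize(passage) not in blocked_passages
    ]


# Toy data (hypothetical strings), not taken from the benchmark.
train_pairs = [
    ("Китай  социалистическое государство?", "passage A"),
    ("Кто написал «Войну и мир»?", "passage B"),
    ("Другой вопрос", "Passage  C"),
]
eval_queries = ["Китай социалистическое государство?"]
eval_positive_passages = ["passage c"]
clean_pairs = remove_overlap(train_pairs, eval_queries, eval_positive_passages)
```

Only the second pair survives: the first is blocked by its query and the third by its passage.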

Model Improvement Notes

Models should handle Russian morphology, entity aliases, title variants, and passage-level answer grounding. Hard negatives should come from the same article or nearby entity pages. Dense retrieval is the best direct ranker, while hybrid retrieval is useful for high-recall reranking pools.
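Same-article hard negatives can be collected by grouping passages by article title. The sketch below assumes each passage is a dict with the keys 'id' and 'title'; these field names are hypothetical and may not match the dataset's actual schema, and the cap per query is an arbitrary illustrative choice.

```python
def same_article_hard_negatives(passages, positive_ids, max_per_query=5):
    """Return ids of non-positive passages that share an article title with a positive.

    passages: list of dicts with hypothetical keys 'id' and 'title'.
    positive_ids: set of passage ids judged relevant for one query.
    """
    positive_titles = {p["title"] for p in passages if p["id"] in positive_ids}
    negatives = [
        p["id"]
        for p in passages
        if p["title"] in positive_titles and p["id"] not in positive_ids
    ]
    return negatives[:max_per_query]


# Toy corpus (hypothetical ids and titles).
passages = [
    {"id": "p1", "title": "Бюро шифров"},
    {"id": "p2", "title": "Бюро шифров"},
    {"id": "p3", "title": "Бюро шифров"},
    {"id": "p4", "title": "Китай"},
]
hard_negatives = same_article_hard_negatives(passages, positive_ids={"p1"})
```

Such negatives target the "right entity, wrong passage" failure mode directly, since they share the title and much of the vocabulary with the positive but may lack the answer. Unjudged same-article passages can still be relevant, so a filtering step may be needed before training.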

Example Data

| Query | Positive document |
| --- | --- |
| «Агенты "Щ.И.Т."» - это драматический сериал? [45 chars] | Агенты «Щ.И.Т.» «Аге́нты „Щ.И.Т.“» () — американский супергеройский телесериал, созданный Джоссом Уидоном и основанный на одноимённом комиксе компании Marvel о вымышленной организации по борьбе с преступностью, является частью кинематографической вселенной Marvel. История начинается с того, что агент Фил Колсон (Кларк Грегг), который выжил после событий фильма «Мстители», работает в «Щ.И.Т.» вместе с новой командой. [420 chars] |
| Китай социалистическое государство? [35 chars] | Китай Официально, Китайская Народная Республика — унитарная республика, социалистическое государство демократической диктатуры народа. Основным законом государства является конституция, принятая в 1982 году. Высший орган государственной власти — однопалатное Всекитайское собрание народных представителей (ВСНП), состоящее из 2979 депутатов, избираемых региональными собраниями народных представителей сроком на 5 лет. Сессии ВСНП созываются на ежегодной основе. Между сессиями полномочия ВСНП осуществляет Постоянный комитет Всекитайского собрания народных представителей. [574 chars] |
| Занималось Бюро шифров взломом шифров немецкой Энигмы? [54 chars] | Бюро шифров Главным ведомством Бюро и отделением, ответственным за криптоанализ немецких систем шифрования, стало BS4, позже основной задачей отделения стал взлом немецкой шифровальной машины «Энигма». Начальником немецкого отделения (BS4) и заместителем начальника Бюро шифров стал капитан , который в 1946 году получил звание подполковника. Одними из ключевых специалистов, работавших над расшифровкой немецких систем, в частности над взломом «Энигмы», были Мариан Реевский, Ежи Ружицкий и Генрих Зыгальский — молодые выпускники курса в Познани, которых Максимильян Ценжкий нанял в Бюро в сентября 1932 года. Обработкой и чтением расшифрованных сообщений активно занимался . [677 chars] |

Source Reference Table

| Title | Year | Type | URL |
| --- | --- | --- | --- |
| MIRACL: A Multilingual Retrieval Dataset Covering 18 Diverse Languages | 2023 | arXiv paper | https://arxiv.org/abs/2210.09984 |
| The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design | 2025 | arXiv paper | https://arxiv.org/abs/2408.12503 |
| MIRACL project page | 2023 | project page | http://miracl.ai/ |
| mteb/MIRACLRetrievalHardNegatives | 2025 | dataset card | https://huggingface.co/datasets/mteb/MIRACLRetrievalHardNegatives |

Dataset Information

| Field | Value |
| --- | --- |
| Nano set | NanoRuMTEB |
| Backing dataset | NanoRuMTEB |
| Task / split | miracl_ru |
| Hugging Face dataset | hakari-bench/NanoRuMTEB |
| Language | ru |
| Category | natural_language |
| Queries | 200 |
| Documents | 10,000 |
| Positive qrels | 579 |
| Positives / query avg | 2.90 |
| Positives / query min | 1 |
| Positives / query median | 2.00 |
| Positives / query max | 10 |
| Multi-positive queries | 136 (68.00%) |
| Query length avg chars | 45.37 |
| Document length avg chars | 517.26 |

Candidate Subsets

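| Profile | Config | nDCG@10 | Hit@10 | Recall@100 | Candidates |
| --- | --- | --- | --- | --- | --- |
| BM25 | bm25 | 0.5154 | 0.8250 | 0.9326 | top-500 |
| Dense | harrier_oss_v1_270m | 0.7938 | 0.9550 | 0.9585 | top-500 |
| Reranking hybrid | reranking_hybrid | 0.6646 | 0.9050 | 0.9948 | top-100 |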

Training and Leakage Metadata