HAKARI-Bench

NanoMTEB-Dutch

Overview

NanoMTEB-Dutch is a compact Dutch retrieval group covering translated BEIR-NL tasks, native Dutch MTEB tasks, cross-lingual Belebele retrieval, legal and medical retrieval, scientific evidence retrieval, news and tender retrieval, web FAQ retrieval, and Dutch Wikipedia retrieval. Ten of the twenty-seven tasks are Dutch CQADupStack duplicate-question splits, but the group is broader than duplicate QA.

The group should be read as both a Dutch-language benchmark and a translation robustness benchmark. Some tasks are native Dutch resources, such as legal, news, tender, FAQ, or bibliography retrieval. Others carry BEIR-style relevance judgments into Dutch through translation. BM25 shows how much of a task can be solved with exact Dutch terms and named entities; dense retrieval tests paraphrase, translation, and cross-lingual matching; reranking_hybrid helps when sparse and dense candidates recover different positives.

What This Group Measures

NanoMTEB-Dutch draws on the Dutch retrieval coverage assembled for MTEB-NL and BEIR-NL. It includes translated BEIR-style datasets, cross-lingual Belebele directions, Dutch legal resources, Dutch news, public procurement, a Flemish academic bibliography, FAQ retrieval, and Wikipedia retrieval.

The group measures Dutch retrieval across different relevance relations. A relevant document may be a duplicate question, a statute article, a FAQ answer, a news article, a procurement notice, a scientific paper, a medical abstract, or a Wikipedia passage. A model must preserve both Dutch lexical details and the original task relation after translation or cross-lingual mapping.

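Task Families

The Family column of the task summary table groups the 27 tasks as follows:

Duplicate question (11): the ten cqadupstack_* splits (android, english, gis, mathematica, physics, programmers, stats, tex, webmasters, wordpress) and quora.
Cross-lingual and Dutch QA (3): belebele_eng_latn_nld_latn and belebele_nld_latn_eng_latn (cross-lingual), belebele_nld_latn_nld_latn (Dutch only).
Legal (2): b_bsardnl (legal retrieval) and legal_qanl (legal QA retrieval).
Scientific and biomedical (3): sci_fact_nl (scientific evidence), scidocs_nl (related scientific documents), nfcorpus_nl (biomedical retrieval).
Open-domain QA and evidence (3): nq (open-domain QA), fever (evidence retrieval), wikipedia_multilingual_nl (Wikipedia QA retrieval).
News, tender, bibliography, and FAQ (4): dutch_news_articles, open_tender, vabb, web_faq_nld.
Argument retrieval (1): argu_ana_nl.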

Dataset Shape

NanoMTEB-Dutch contains 27 task pages, 5,299 queries, 227,987 split-local documents, and 13,018 positive qrel rows. Twenty-three tasks have 200 queries; argu_ana_nl, nfcorpus_nl, and open_tender have 199, and legal_qanl has 102. The group mixes single-positive and multi-positive tasks: NFCorpus, SCIDOCS, Quora, LegalQA-NL, bBSARD, FEVER, NQ, and SciFact have more positive qrel rows than queries and require multi-positive interpretation.
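The dataset totals can be recomputed from the per-task query and positive counts in the task metadata table. The sketch below is an illustrative check, not part of any benchmark tooling. Positive-row counts alone cannot prove that every query has exactly one positive, but when a task has more positive rows than queries, at least one query must have two or more positives (pigeonhole).

```python
# Per-task (queries, positive qrel rows), copied from the task metadata table.
TASKS = {
    "argu_ana_nl": (199, 199),
    "b_bsardnl": (200, 923),
    "belebele_eng_latn_nld_latn": (200, 200),
    "belebele_nld_latn_eng_latn": (200, 200),
    "belebele_nld_latn_nld_latn": (200, 200),
    **{f"cqadupstack_{split}": (200, 200) for split in (
        "android", "english", "gis", "mathematica", "physics",
        "programmers", "stats", "tex", "webmasters", "wordpress")},
    "dutch_news_articles": (200, 200),
    "fever": (200, 233),
    "legal_qanl": (102, 157),
    "nfcorpus_nl": (199, 5880),
    "nq": (200, 242),
    "open_tender": (199, 199),
    "quora": (200, 573),
    "sci_fact_nl": (200, 226),
    "scidocs_nl": (200, 986),
    "vabb": (200, 200),
    "web_faq_nld": (200, 200),
    "wikipedia_multilingual_nl": (200, 200),
}


def summarize(tasks):
    """Return total queries, total positives, positives per query, and multi-positive tasks."""
    total_queries = sum(q for q, _ in tasks.values())
    total_positives = sum(p for _, p in tasks.values())
    # p > q guarantees at least one query with 2+ positives; p == q does not
    # prove one positive per query, so those tasks are simply not flagged.
    multi_positive = sorted(name for name, (q, p) in tasks.items() if p > q)
    return (total_queries, total_positives,
            round(total_positives / total_queries, 2), multi_positive)


queries, positives, avg, multi = summarize(TASKS)
```

On these counts the check reproduces 5,299 queries, 13,018 positives, and 2.46 positives per query, and flags eight tasks, including sci_fact_nl with 226 positive rows for 200 queries.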

Text lengths and formats are diverse. Argument queries are long, FAQ and web queries are short, legal tasks contain formal statute language, CQADupStack contains technical forum text, and scientific tasks contain paper or abstract language. This makes task-family breakdown more useful than one aggregate Dutch score.
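A minimal sketch of a family-level breakdown, using the dense nDCG@10 column and the Family labels from the task summary table. Grouping by family is one reasonable choice for illustration; the page does not prescribe an aggregation.

```python
from collections import defaultdict
from statistics import mean

# (task, family, dense nDCG@10), copied from the task summary table.
DENSE = [
    ("argu_ana_nl", "argument retrieval", 0.3723),
    ("b_bsardnl", "legal retrieval", 0.2749),
    ("belebele_eng_latn_nld_latn", "cross-lingual QA", 0.8918),
    ("belebele_nld_latn_eng_latn", "cross-lingual QA", 0.9306),
    ("belebele_nld_latn_nld_latn", "Dutch QA", 0.8899),
    ("cqadupstack_android", "duplicate question", 0.3862),
    ("cqadupstack_english", "duplicate question", 0.3587),
    ("cqadupstack_gis", "duplicate question", 0.3202),
    ("cqadupstack_mathematica", "duplicate question", 0.1992),
    ("cqadupstack_physics", "duplicate question", 0.4020),
    ("cqadupstack_programmers", "duplicate question", 0.3906),
    ("cqadupstack_stats", "duplicate question", 0.3224),
    ("cqadupstack_tex", "duplicate question", 0.2611),
    ("cqadupstack_webmasters", "duplicate question", 0.2947),
    ("cqadupstack_wordpress", "duplicate question", 0.3057),
    ("dutch_news_articles", "news retrieval", 0.8996),
    ("fever", "evidence retrieval", 0.9207),
    ("legal_qanl", "legal QA retrieval", 0.8050),
    ("nfcorpus_nl", "biomedical retrieval", 0.2590),
    ("nq", "open-domain QA", 0.6335),
    ("open_tender", "tender retrieval", 0.6044),
    ("quora", "duplicate question", 0.9289),
    ("sci_fact_nl", "scientific evidence", 0.6758),
    ("scidocs_nl", "related scientific documents", 0.2264),
    ("vabb", "bibliography retrieval", 0.7804),
    ("web_faq_nld", "FAQ retrieval", 0.8776),
    ("wikipedia_multilingual_nl", "Wikipedia QA retrieval", 0.8948),
]


def family_breakdown(rows):
    """Return per-family means, the plain task mean, and the mean of family means."""
    by_family = defaultdict(list)
    for _, family, score in rows:
        by_family[family].append(score)
    family_means = {family: mean(scores) for family, scores in by_family.items()}
    task_mean = mean(score for _, _, score in rows)
    macro_mean = mean(family_means.values())
    return family_means, task_mean, macro_mean


family_means, task_mean, macro_mean = family_breakdown(DENSE)
```

Because 11 of the 27 tasks share the duplicate-question family, the plain task mean weights that family far more heavily than the family macro mean does.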

Retrieval Behavior

BM25 Profile

BM25 is strongest on tasks with direct named entities, titles, article terms, or Dutch surface overlap. FEVER, Dutch news, Wikipedia, Quora, LegalQA-NL, WebFAQ, OpenTender, and VABB all show substantial sparse signal. BM25 is also useful on same-language Belebele where question and passage share Dutch answer context.

BM25 is weaker on cross-lingual Belebele, bBSARD, SCIDOCS, and many CQADupStack technical duplicate tasks. These require language alignment, paraphrase, legal concept mapping, scientific relatedness, or duplicate intent beyond exact word overlap.
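To make the lexical-overlap point concrete, here is a minimal Okapi BM25 scorer over lowercased whitespace tokens. The values k1=1.5 and b=0.75, the Lucene-style IDF, the lack of Dutch stemming or decompounding, and the toy documents are all assumptions for illustration; this page does not describe the benchmark's BM25 configuration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 with Lucene-style IDF over lowercased whitespace tokens (assumed setup)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # no exact token overlap, so no contribution at all
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            norm = freq + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus: one Dutch tender-like passage, one unrelated passage.
docs = [
    "De gemeente Amsterdam publiceert een aanbesteding voor nieuwe fietspaden",
    "Het weer in Utrecht is vandaag zonnig",
]
dutch_query = bm25_scores("aanbesteding amsterdam", docs)
english_query = bm25_scores("municipal tender for bike paths", docs)
mixed_query = bm25_scores("amsterdam tender", docs)
```

The Dutch query scores the tender passage above zero, while the English paraphrase scores zero everywhere. A named entity such as "amsterdam" still matches across languages, which may explain part of the sparse signal that remains on cross-lingual tasks.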

Dense Profile

Dense retrieval is the best profile for many Dutch tasks, especially cross-lingual Belebele, Quora, WebFAQ, Wikipedia, VABB, NQ, SciFact, and several CQADupStack splits. It helps when the Dutch query and document express the same intent with different words, or when translated benchmark artifacts weaken literal matching.

Dense retrieval still needs exact anchors. Legal articles, procurement notices, scientific terms, and technical forum posts can depend on specific names, codes, product terms, or statute phrases. Dense gains should be checked against candidate recall for those anchors.
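One way to check dense gains against exact anchors is to split candidate recall by whether a query contains an anchor. The regex below, which looks for Dutch statute references such as "artikel 7", is a hypothetical anchor pattern; real anchors (codes, product names, statute phrases) would need task-specific patterns, and all ids and texts here are toy data.

```python
import re

# Hypothetical anchor pattern: Dutch statute references such as "artikel 7:2".
ANCHOR = re.compile(r"\b(?:artikel|art\.)\s*\d+", re.IGNORECASE)


def recall_at_k(ranked_ids, positive_ids, k):
    """Fraction of a query's positives found in the top-k candidates."""
    top = set(ranked_ids[:k])
    return len(top & set(positive_ids)) / len(positive_ids)


def split_recall(run, qrels, queries, k):
    """Mean recall@k for anchored vs. other queries (None if a split is empty)."""
    groups = {"anchored": [], "other": []}
    for qid, positives in qrels.items():
        key = "anchored" if ANCHOR.search(queries[qid]) else "other"
        groups[key].append(recall_at_k(run.get(qid, []), positives, k))
    return {name: (sum(vals) / len(vals) if vals else None)
            for name, vals in groups.items()}


queries = {"q1": "Wat zegt artikel 7:2 BW over koop?",
           "q2": "Mag mijn huisbaas de huur verhogen?"}
qrels = {"q1": ["d1"], "q2": ["d2", "d3"]}
run = {"q1": ["d9", "d8", "d1"], "q2": ["d2", "d7"]}
result = split_recall(run, qrels, queries, k=2)
```

Here the anchored query misses its statute article in the top 2 while the paraphrase-style query recovers half its positives, which is the kind of gap an aggregate nDCG@10 can hide.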

Reranking Hybrid Profile

reranking_hybrid is strongest on same-language Belebele, LegalQA-NL, and six CQADupStack splits (gis, mathematica, stats, tex, webmasters, wordpress). It is most useful where exact Dutch terms and dense paraphrase matching are both needed. On FAQ, Quora, and Wikipedia-style tasks, hybrid remains competitive even where dense has the best nDCG@10, trailing it by 0.011 on Wikipedia, 0.033 on WebFAQ, and 0.052 on Quora.
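This page does not describe how reranking_hybrid merges sparse and dense candidates or which reranker it applies. As an assumption, the sketch below shows reciprocal rank fusion (RRF), a common generic way to union two candidate lists before a reranking step; k=60 is a conventional default, not a documented setting of this benchmark, and the reranker itself is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=None):
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused if top_n is None else fused[:top_n]


# Toy candidate lists: BM25 alone finds d2, dense alone finds d4, both find d1.
bm25_run = ["d1", "d2", "d3"]
dense_run = ["d4", "d1", "d5"]
candidates = reciprocal_rank_fusion([bm25_run, dense_run])
```

The fused list keeps d2 and d4, each recovered by only one retriever, which is the situation where a hybrid candidate pool gives a reranker more positives to work with.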

For reranker experiments, multi-positive tasks such as NFCorpus, SCIDOCS, Quora, and legal retrieval should be read with recall and candidate diversity in mind, not only first-hit success.
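On multi-positive tasks, a first-hit metric such as reciprocal rank and a coverage metric such as recall@10 can order two runs differently. The two toy runs below are hypothetical.

```python
def reciprocal_rank(ranked_ids, positives):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in positives:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids, positives, k):
    """Fraction of all positives that appear in the top k."""
    return len(set(ranked_ids[:k]) & set(positives)) / len(positives)


positives = {"a", "b", "c", "d"}
# Run A: perfect first hit, then only non-relevant documents.
run_a = ["a"] + [f"x{i}" for i in range(9)]
# Run B: misses rank 1 but retrieves every positive in the top 10.
run_b = ["x0", "a", "b", "c", "d"] + [f"x{i}" for i in range(1, 6)]
```

Run A wins on reciprocal rank (1.0 vs 0.5) while Run B wins on recall@10 (1.0 vs 0.25), so a reranker tuned only for first-hit success can look better while covering fewer positives.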

Task Summary

| Task | Family | Queries | Docs | BM25 nDCG@10 | Dense nDCG@10 | Reranking hybrid nDCG@10 | Best profile |
|---|---|---:|---:|---:|---:|---:|---|
| argu_ana_nl | argument retrieval | 199 | 8,624 | 0.2970 | 0.3723 | 0.3529 | Dense |
| b_bsardnl | legal retrieval | 200 | 10,000 | 0.1249 | 0.2749 | 0.2234 | Dense |
| belebele_eng_latn_nld_latn | cross-lingual QA | 200 | 488 | 0.4738 | 0.8918 | 0.6283 | Dense |
| belebele_nld_latn_eng_latn | cross-lingual QA | 200 | 488 | 0.3288 | 0.9306 | 0.4456 | Dense |
| belebele_nld_latn_nld_latn | Dutch QA | 200 | 488 | 0.8364 | 0.8899 | 0.8999 | Reranking hybrid |
| cqadupstack_android | duplicate question | 200 | 10,000 | 0.2944 | 0.3862 | 0.3836 | Dense |
| cqadupstack_english | duplicate question | 200 | 10,000 | 0.2769 | 0.3587 | 0.3248 | Dense |
| cqadupstack_gis | duplicate question | 200 | 10,000 | 0.2790 | 0.3202 | 0.3272 | Reranking hybrid |
| cqadupstack_mathematica | duplicate question | 200 | 10,000 | 0.1826 | 0.1992 | 0.2181 | Reranking hybrid |
| cqadupstack_physics | duplicate question | 200 | 10,000 | 0.3269 | 0.4020 | 0.3756 | Dense |
| cqadupstack_programmers | duplicate question | 200 | 10,000 | 0.2991 | 0.3906 | 0.3638 | Dense |
| cqadupstack_stats | duplicate question | 200 | 10,000 | 0.2827 | 0.3224 | 0.3337 | Reranking hybrid |
| cqadupstack_tex | duplicate question | 200 | 10,000 | 0.2106 | 0.2611 | 0.2926 | Reranking hybrid |
| cqadupstack_webmasters | duplicate question | 200 | 10,000 | 0.2307 | 0.2947 | 0.2968 | Reranking hybrid |
| cqadupstack_wordpress | duplicate question | 200 | 10,000 | 0.2608 | 0.3057 | 0.3371 | Reranking hybrid |
| dutch_news_articles | news retrieval | 200 | 10,000 | 0.8868 | 0.8996 | 0.8954 | Dense |
| fever | evidence retrieval | 200 | 10,000 | 0.9221 | 0.9207 | 0.9215 | BM25 |
| legal_qanl | legal QA retrieval | 102 | 10,000 | 0.8143 | 0.8050 | 0.8455 | Reranking hybrid |
| nfcorpus_nl | biomedical retrieval | 199 | 3,593 | 0.2683 | 0.2590 | 0.2656 | BM25 |
| nq | open-domain QA | 200 | 10,000 | 0.4505 | 0.6335 | 0.5473 | Dense |
| open_tender | tender retrieval | 199 | 10,000 | 0.6712 | 0.6044 | 0.6556 | BM25 |
| quora | duplicate question | 200 | 10,000 | 0.8391 | 0.9289 | 0.8772 | Dense |
| sci_fact_nl | scientific evidence | 200 | 5,183 | 0.6160 | 0.6758 | 0.6709 | Dense |
| scidocs_nl | related scientific documents | 200 | 10,000 | 0.1335 | 0.2264 | 0.1835 | Dense |
| vabb | bibliography retrieval | 200 | 9,123 | 0.6952 | 0.7804 | 0.7540 | Dense |
| web_faq_nld | FAQ retrieval | 200 | 10,000 | 0.7698 | 0.8776 | 0.8442 | Dense |
| wikipedia_multilingual_nl | Wikipedia QA retrieval | 200 | 10,000 | 0.8444 | 0.8948 | 0.8840 | Dense |
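All scores in this table are nDCG@10. The page does not state the gain function, so the sketch below assumes binary relevance (gain 1 for each positive qrel) with the standard log2(rank + 1) discount; graded qrels would need a different gain.

```python
import math


def ndcg_at_k(ranked_ids, positives, k=10):
    """Binary-relevance nDCG@k with a log2(rank + 1) discount (assumed gain)."""
    positives = set(positives)
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc_id in enumerate(ranked_ids[:k], start=1)
              if doc_id in positives)
    # The ideal ranking places min(#positives, k) relevant documents first.
    ideal_hits = min(len(positives), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


single_hit_at_3 = ndcg_at_k(["x", "y", "a"], ["a"])
two_of_two = ndcg_at_k(["a", "x", "b"], ["a", "b"])
```

For a single-positive task, a hit at rank 3 already halves the score. On a heavily multi-positive task such as nfcorpus_nl (5,880 positives over 199 queries), many queries can have more than ten positives, so the ideal ranking fills all ten slots and a run must place positives throughout the top 10 to score well.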

Interpretation Notes for Model Researchers

NanoMTEB-Dutch should be interpreted by source type. Translated BEIR tasks stress whether the original relevance relation survives in Dutch. Native Dutch legal, FAQ, news, tender, and bibliography tasks stress domain vocabulary and local data conventions. Cross-lingual Belebele should be read separately from same-language Dutch retrieval.

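The BM25/dense profile is a useful diagnostic. BM25-led tasks point to direct Dutch surface anchors. Dense-led tasks point to paraphrase, translation, or semantic intent matching. Hybrid-led tasks suggest that candidate generation needs both exact terms and semantic alignment. Some leads are narrow: on FEVER, BM25 (0.9221) is ahead of reranking_hybrid (0.9215) and dense (0.9207) by less than 0.002, so the best-profile label there is close to a tie.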
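A small diagnostic along these lines reports, for each task, the leading profile, its margin over the runner-up, and the dense minus BM25 gap. Only a handful of rows from the task summary table are included here; the function itself is an illustrative sketch.

```python
# (BM25, dense, reranking_hybrid) nDCG@10 for a few rows of the task summary table.
SCORES = {
    "fever": (0.9221, 0.9207, 0.9215),
    "open_tender": (0.6712, 0.6044, 0.6556),
    "belebele_nld_latn_eng_latn": (0.3288, 0.9306, 0.4456),
    "quora": (0.8391, 0.9289, 0.8772),
    "legal_qanl": (0.8143, 0.8050, 0.8455),
    "cqadupstack_tex": (0.2106, 0.2611, 0.2926),
}
PROFILES = ("BM25", "Dense", "Reranking hybrid")


def diagnose(scores):
    """Return (best profile, margin over runner-up, dense minus BM25) for one task."""
    ranked = sorted(zip(scores, PROFILES), reverse=True)
    (best_score, best), (second_score, _) = ranked[0], ranked[1]
    margin = round(best_score - second_score, 4)
    dense_gap = round(scores[1] - scores[0], 4)  # > 0 means dense beats BM25
    return best, margin, dense_gap


report = {task: diagnose(s) for task, s in SCORES.items()}
```

A large dense gap (0.6018 on the Dutch-to-English Belebele direction) marks a task where lexical overlap carries little, while a margin of a few thousandths marks a label that should not be over-read.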

Training and Leakage Notes

Useful training data includes Dutch search logs, legal question-article pairs, FAQ retrieval data, translated BEIR-NL training data, duplicate-question pairs, scientific and biomedical retrieval data, and cross-lingual QA pairs. Hard negatives should be drawn from the same legal article family, the same technical forum, the same scientific topic, or the same answer-bearing article as the positive.
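A minimal sketch of same-group hard-negative mining from a training-set retrieval run. The group_of mapping (for example a forum name or statute book per document) is a hypothetical input; the page does not specify how groups are defined.

```python
def mine_hard_negatives(ranked_ids, positives, group_of, n=5, same_group=True):
    """Pick up to n top-ranked non-positive documents as hard negatives.

    group_of: hypothetical dict mapping doc id -> group label (forum, statute book, topic).
    With same_group=True, only candidates sharing a group with some positive are kept;
    documents without a group label are then skipped.
    """
    positives = set(positives)
    pos_groups = {group_of[p] for p in positives if p in group_of}
    negatives = []
    for doc_id in ranked_ids:
        if doc_id in positives:
            continue
        if same_group and group_of.get(doc_id) not in pos_groups:
            continue
        negatives.append(doc_id)
        if len(negatives) == n:
            break
    return negatives


ranked = ["p1", "d1", "d2", "d3", "d4"]
group_of = {"p1": "android", "d1": "gis", "d2": "android",
            "d3": "android", "d4": "android"}
same_forum = mine_hard_negatives(ranked, ["p1"], group_of, n=2)
any_forum = mine_hard_negatives(ranked, ["p1"], group_of, n=2, same_group=False)
```

Top-ranked non-positives can be unjudged relevant documents, especially in duplicate-question and multi-positive data, so some filtering or manual spot checks of mined negatives is usually worthwhile.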

Exclude NanoMTEB-Dutch evaluation queries, positives, qrels, duplicate clusters, translated test examples, statute articles, and source rows from training. For cross-lingual tasks, do not use direct translations of evaluation queries as synthetic training seeds.
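A minimal lexical leakage filter, assuming training data arrives as (query, positive) text pairs. It drops a pair when either side matches an evaluation text after normalization (lowercasing, stripping diacritics such as the ë in België, removing punctuation) or exceeds a token-Jaccard threshold; the 0.8 threshold is an arbitrary assumption. A lexical check cannot catch translations, so screening cross-lingual seeds would additionally need a cross-lingual similarity check, which is not shown.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip diacritics and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return " ".join(re.findall(r"\w+", text))


def jaccard(a, b):
    """Token-set Jaccard similarity of two normalized strings."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def filter_training_pairs(train_pairs, eval_texts, threshold=0.8):
    """Drop (query, positive) pairs whose query or positive matches any eval text."""
    eval_norm = [normalize(t) for t in eval_texts]
    eval_set = set(eval_norm)
    kept = []
    for query, positive in train_pairs:
        leaked = any(
            text in eval_set or any(jaccard(text, e) >= threshold for e in eval_norm)
            for text in (normalize(query), normalize(positive))
        )
        if not leaked:
            kept.append((query, positive))
    return kept


eval_texts = ["Wat is de hoofdstad van België?"]
train_pairs = [
    ("wat is de hoofdstad van belgie", "Brussel is de hoofdstad."),
    ("Hoe laat sluit de bibliotheek?", "Om 17:00."),
]
kept = filter_training_pairs(train_pairs, eval_texts)
```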

Source Reference Table

| Source | Year | Type | URL |
|---|---|---|---|
| MTEB: Massive Text Embedding Benchmark | 2022 | paper | https://arxiv.org/abs/2210.07316 |
| BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models | 2021 | paper | https://arxiv.org/abs/2104.08663 |
| Belebele: a Parallel Reading Comprehension Dataset in 122 Language Variants | 2023 | paper | https://arxiv.org/abs/2308.16884 |

Metadata Summary

| Field | Value |
|---|---|
| Task pages | 27 |
| Queries | 5,299 |
| Split-local documents | 227,987 |
| Positive qrels | 13,018 |
| Languages | multilingual, nl |
| Categories | natural_language |
| Positives / query avg | 2.46 |

Task Metadata Summary

| Task | Backing dataset | Lang | Category | Queries | Docs | Positives | BM25 nDCG@10 | Dense nDCG@10 | Reranking hybrid nDCG@10 | Best profile |
|---|---|---|---|---:|---:|---:|---:|---:|---:|---|
| argu_ana_nl | NanoMTEB-Dutch | nl | natural_language | 199 | 8,624 | 199 | 0.2970 | 0.3723 | 0.3529 | Dense |
| b_bsardnl | NanoMTEB-Dutch | nl | natural_language | 200 | 10,000 | 923 | 0.1249 | 0.2749 | 0.2234 | Dense |
| belebele_eng_latn_nld_latn | NanoMTEB-Dutch | multilingual | natural_language | 200 | 488 | 200 | 0.4738 | 0.8918 | 0.6283 | Dense |
| belebele_nld_latn_eng_latn | NanoMTEB-Dutch | multilingual | natural_language | 200 | 488 | 200 | 0.3288 | 0.9306 | 0.4456 | Dense |
| belebele_nld_latn_nld_latn | NanoMTEB-Dutch | nl | natural_language | 200 | 488 | 200 | 0.8364 | 0.8899 | 0.8999 | Reranking hybrid |
| cqadupstack_android | NanoMTEB-Dutch | nl | natural_language | 200 | 10,000 | 200 | 0.2944 | 0.3862 | 0.3836 | Dense |
| cqadupstack_english | NanoMTEB-Dutch | nl | natural_language | 200 | 10,000 | 200 | 0.2769 | 0.3587 | 0.3248 | Dense |
| cqadupstack_gis | NanoMTEB-Dutch | nl | natural_language | 200 | 10,000 | 200 | 0.2790 | 0.3202 | 0.3272 | Reranking hybrid |
| cqadupstack_mathematica | NanoMTEB-Dutch | nl | natural_language | 200 | 10,000 | 200 | 0.1826 | 0.1992 | 0.2181 | Reranking hybrid |
| cqadupstack_physics | NanoMTEB-Dutch | nl | natural_language | 200 | 10,000 | 200 | 0.3269 | 0.4020 | 0.3756 | Dense |
| cqadupstack_programmers | NanoMTEB-Dutch | nl | natural_language | 200 | 10,000 | 200 | 0.2991 | 0.3906 | 0.3638 | Dense |
| cqadupstack_stats | NanoMTEB-Dutch | nl | natural_language | 200 | 10,000 | 200 | 0.2827 | 0.3224 | 0.3337 | Reranking hybrid |
| cqadupstack_tex | NanoMTEB-Dutch | nl | natural_language | 200 | 10,000 | 200 | 0.2106 | 0.2611 | 0.2926 | Reranking hybrid |
| cqadupstack_webmasters | NanoMTEB-Dutch | nl | natural_language | 200 | 10,000 | 200 | 0.2307 | 0.2947 | 0.2968 | Reranking hybrid |
| cqadupstack_wordpress | NanoMTEB-Dutch | nl | natural_language | 200 | 10,000 | 200 | 0.2608 | 0.3057 | 0.3371 | Reranking hybrid |
| dutch_news_articles | NanoMTEB-Dutch | nl | natural_language | 200 | 10,000 | 200 | 0.8868 | 0.8996 | 0.8954 | Dense |
| fever | NanoMTEB-Dutch | nl | natural_language | 200 | 10,000 | 233 | 0.9221 | 0.9207 | 0.9215 | BM25 |
| legal_qanl | NanoMTEB-Dutch | nl | natural_language | 102 | 10,000 | 157 | 0.8143 | 0.8050 | 0.8455 | Reranking hybrid |
| nfcorpus_nl | NanoMTEB-Dutch | multilingual | natural_language | 199 | 3,593 | 5,880 | 0.2683 | 0.2590 | 0.2656 | BM25 |
| nq | NanoMTEB-Dutch | nl | natural_language | 200 | 10,000 | 242 | 0.4505 | 0.6335 | 0.5473 | Dense |
| open_tender | NanoMTEB-Dutch | nl | natural_language | 199 | 10,000 | 199 | 0.6712 | 0.6044 | 0.6556 | BM25 |
| quora | NanoMTEB-Dutch | nl | natural_language | 200 | 10,000 | 573 | 0.8391 | 0.9289 | 0.8772 | Dense |
| sci_fact_nl | NanoMTEB-Dutch | nl | natural_language | 200 | 5,183 | 226 | 0.6160 | 0.6758 | 0.6709 | Dense |
| scidocs_nl | NanoMTEB-Dutch | nl | natural_language | 200 | 10,000 | 986 | 0.1335 | 0.2264 | 0.1835 | Dense |
| vabb | NanoMTEB-Dutch | nl | natural_language | 200 | 9,123 | 200 | 0.6952 | 0.7804 | 0.7540 | Dense |
| web_faq_nld | NanoMTEB-Dutch | nl | natural_language | 200 | 10,000 | 200 | 0.7698 | 0.8776 | 0.8442 | Dense |
| wikipedia_multilingual_nl | NanoMTEB-Dutch | nl | natural_language | 200 | 10,000 | 200 | 0.8444 | 0.8948 | 0.8840 | Dense |