HAKARI-Bench

NanoMTEB-French

Overview

NanoMTEB-French is a compact French retrieval group drawn from MTEB-French and related MTEB tasks. It combines educational resource retrieval, Belgian statutory article retrieval, French Wikipedia QA passage retrieval, French Mintaka answer retrieval, Syntec collective-agreement retrieval, and French-English product question answering. The group tests monolingual French retrieval and cross-lingual product QA in one small suite.

The source tasks differ more than their French surface suggests. Alloprof and BSARD map user-facing questions to long educational or legal documents. FQuAD retrieves answer-bearing passages. Mintaka retrieves short entity-like answers. Syntec retrieves labor-agreement clauses. xPQA tests product answerability across French and English directions. BM25 exposes exact wording and named entities; dense retrieval tests French paraphrase and cross-lingual matching; reranking_hybrid shows where both signals are useful.

What This Group Measures

NanoMTEB-French follows MTEB-French retrieval coverage and includes xPQA directions that involve French. The group measures whether a model can retrieve the correct French or French-English answer source across education, law, Wikipedia QA, entity answers, labor agreements, and product QA.

The shared challenge is not only French language handling. Each task has its own relevance definition: an educational lesson should answer a student question, a statute should satisfy a legal need, a product snippet should answer compatibility or specification questions, and a Mintaka answer may be a very short label.

Task Families

Dataset Shape

NanoMTEB-French contains 8 task pages, 1,500 queries, 19,397 split-local documents, and 2,212 positive qrel rows. Five of the eight tasks are single-positive, with exactly one positive per query. The three xPQA tasks are multi-positive, averaging between 2.1 and 2.3 positive snippets per query.
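The group totals can be re-derived from the per-task counts listed in the Task Summary table. The sketch below is only a consistency check; the dictionary layout and function names are illustrative assumptions, not part of any HAKARI-Bench tooling.

```python
# Per-task counts copied from the Task Summary table: (queries, docs, positives).
TASKS = {
    "alloprof": (200, 2556, 200),
    "bsard": (200, 10000, 200),
    "fquad": (200, 269, 200),
    "mintaka_fr": (200, 1714, 200),
    "syntec": (100, 90, 100),
    "xpqa_eng_fra": (200, 1674, 451),
    "xpqa_fra_eng": (200, 1547, 437),
    "xpqa_fra_fra": (200, 1547, 424),
}


def group_totals(tasks):
    """Sum queries, documents, and positive qrel rows across all tasks."""
    queries = sum(q for q, _, _ in tasks.values())
    docs = sum(d for _, d, _ in tasks.values())
    positives = sum(p for _, _, p in tasks.values())
    return queries, docs, positives


def positives_per_query(tasks):
    """Average number of positive qrel rows per query, for each task."""
    return {name: p / q for name, (q, _, p) in tasks.items()}


if __name__ == "__main__":
    q, d, p = group_totals(TASKS)
    print(f"queries={q} docs={d} positives={p} avg={p / q:.2f}")
    for name, ratio in positives_per_query(TASKS).items():
        print(f"{name}: {ratio:.2f} positives per query")
```

The per-task ratios make the single-positive versus multi-positive split explicit: only the three xPQA tasks exceed 1.0.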

Document formats vary widely. Alloprof lessons are long educational documents, BSARD statutes are formal legal articles, FQuAD passages are compact evidence paragraphs, Mintaka targets are short answer strings, and xPQA documents are short product answer snippets. The group should therefore be read by target type rather than as one generic French retrieval benchmark.

Retrieval Behavior

BM25 Profile

BM25 is strongest on fquad, syntec, and monolingual xpqa_fra_fra. These tasks often preserve names, article terms, product specifications, or agreement language. FQuAD in particular has strong lexical overlap between question and Wikipedia evidence.

BM25 is weakest on cross-lingual xPQA and BSARD. French-to-English and English-to-French product QA offer little lexical overlap beyond product names, numbers, and units. BSARD is difficult because lay legal questions often use different wording from the statutory articles that answer them.
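The lexical-bridge effect can be illustrated with a small Okapi BM25 scorer. This is a minimal sketch, not the BM25 configuration used to produce the reported scores: the tokenizer, the `k1` and `b` values, and the toy product ("PowerMax 5000") are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters (accented letters are kept)."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(tokenize(query))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy data: one French query, three candidate snippets.
QUERY = "Quelle est la capacité de la batterie du PowerMax 5000 ?"
DOCS = [
    "La batterie du PowerMax 5000 a une capacité de 5000 mAh.",  # French answer
    "The PowerMax 5000 battery has a capacity of 5000 mAh.",     # English answer
    "This case fits most 6 inch phones.",                        # unrelated
]

if __name__ == "__main__":
    for doc, score in zip(DOCS, bm25_scores(QUERY, DOCS)):
        print(f"{score:.3f}  {doc}")
```

In this toy setup the English answer matches the French query only through the product name and the number, so it scores well below the French answer even though both carry the same information.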

Dense Profile

Dense retrieval is the best profile on five of the eight NanoMTEB-French tasks: Mintaka, Syntec, and all three xPQA directions. It also improves on BM25 for Alloprof and BSARD, because it matches semantic answerability and cross-lingual product information beyond exact terms.

Dense retrieval is not universally best: FQuAD remains BM25-led because answer-bearing passages share strong lexical cues with questions. This contrast makes the group useful for separating French semantic matching from exact evidence-word matching.

Reranking Hybrid Profile

reranking_hybrid is best on Alloprof and BSARD in the current metadata, and competitive on FQuAD, Syntec, and monolingual xPQA. These tasks benefit from both exact French terms and semantic matching. In cross-lingual xPQA, dense is much stronger than hybrid, suggesting that sparse retrieval contributes less when the language bridge is weak.

For reranker experiments, xPQA should be read as a multi-positive answer ranking task. Several snippets can answer the same product question, so candidate coverage matters.
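To see why candidate coverage matters under a multi-positive metric, the sketch below computes nDCG@10 for a single query. It assumes binary relevance and the common log2 discount; the exact evaluator behind the reported scores is not specified here, so treat this as an illustration of the metric family rather than a reproduction.

```python
import math


def ndcg_at_k(ranked_ids, positives, k=10):
    """Binary-relevance nDCG@k for one query.

    ranked_ids: document ids in ranked order, best first.
    positives:  set of document ids judged relevant for the query.
    """
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in positives
    )
    # Ideal ranking places every positive (up to k) at the top.
    ideal_hits = min(len(positives), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    two_positives = {"a", "b"}
    print(ndcg_at_k(["a", "x", "y"], two_positives))  # one positive missing
    print(ndcg_at_k(["a", "b", "x"], two_positives))  # both positives found
```

With two positives, a ranking that places one of them first but never surfaces the other scores about 0.61, whereas the same ranking would score 1.0 on a single-positive task. A reranker cannot recover a positive that the first-stage candidate list omitted.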

Task Summary

| Task | Retrieval focus | Lang | Queries | Docs | Positives | BM25 nDCG@10 | Dense nDCG@10 | Reranking hybrid nDCG@10 | Best profile |
|---|---|---|---|---|---|---|---|---|---|
| alloprof | student question to lesson resource | fr | 200 | 2,556 | 200 | 0.3447 | 0.5139 | 0.5214 | Reranking hybrid |
| bsard | lay legal question to statute | fr | 200 | 10,000 | 200 | 0.1943 | 0.3023 | 0.3048 | Reranking hybrid |
| fquad | French QA question to evidence passage | fr | 200 | 269 | 200 | 0.8899 | 0.8102 | 0.8666 | BM25 |
| mintaka_fr | complex question to answer label | multilingual | 200 | 1,714 | 200 | 0.2995 | 0.3676 | 0.3400 | Dense |
| syntec | labor question to agreement clause | fr | 100 | 90 | 100 | 0.7180 | 0.8660 | 0.8463 | Dense |
| xpqa_eng_fra | English product question to French answer | multilingual | 200 | 1,674 | 451 | 0.1061 | 0.3639 | 0.1775 | Dense |
| xpqa_fra_eng | French product question to English answer | multilingual | 200 | 1,547 | 437 | 0.2918 | 0.6479 | 0.3724 | Dense |
| xpqa_fra_fra | French product question to French answer | fr | 200 | 1,547 | 424 | 0.5644 | 0.6400 | 0.6208 | Dense |

Interpretation Notes for Model Researchers

NanoMTEB-French should be read by retrieval relation. FQuAD and Syntec have strong exact-language cues. BSARD and Alloprof require mapping user questions to formal or explanatory documents. xPQA tests product answerability and cross-lingual matching. Mintaka tests short answer labels, where document text may be too short for ordinary passage retrieval assumptions.

Dense-led cross-lingual xPQA rows are especially informative for multilingual embedding models. BM25-led FQuAD is a reminder that exact French entity and answer wording remains valuable.
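The "dense-led" and "BM25-led" labels can be recomputed directly from the nDCG@10 columns of the Task Summary table. The following sketch does that and also reports the dense-minus-BM25 margin per task; the data structure and function names are illustrative assumptions.

```python
# nDCG@10 values copied from the Task Summary table.
SCORES = {
    "alloprof": {"bm25": 0.3447, "dense": 0.5139, "reranking_hybrid": 0.5214},
    "bsard": {"bm25": 0.1943, "dense": 0.3023, "reranking_hybrid": 0.3048},
    "fquad": {"bm25": 0.8899, "dense": 0.8102, "reranking_hybrid": 0.8666},
    "mintaka_fr": {"bm25": 0.2995, "dense": 0.3676, "reranking_hybrid": 0.3400},
    "syntec": {"bm25": 0.7180, "dense": 0.8660, "reranking_hybrid": 0.8463},
    "xpqa_eng_fra": {"bm25": 0.1061, "dense": 0.3639, "reranking_hybrid": 0.1775},
    "xpqa_fra_eng": {"bm25": 0.2918, "dense": 0.6479, "reranking_hybrid": 0.3724},
    "xpqa_fra_fra": {"bm25": 0.5644, "dense": 0.6400, "reranking_hybrid": 0.6208},
}


def best_profile(task_scores):
    """Return the profile name with the highest nDCG@10 for one task."""
    return max(task_scores, key=task_scores.get)


def dense_minus_bm25(all_scores):
    """Margin of dense over BM25 per task; negative means BM25 leads."""
    return {
        task: round(s["dense"] - s["bm25"], 4) for task, s in all_scores.items()
    }


if __name__ == "__main__":
    margins = dense_minus_bm25(SCORES)
    for task, task_scores in SCORES.items():
        print(f"{task:13s} best={best_profile(task_scores):17s} "
              f"dense-bm25={margins[task]:+.4f}")
```

The margin is largest on the two cross-lingual xPQA rows and negative only on FQuAD, which matches the reading given above.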

Training and Leakage Notes

Useful training data includes French educational QA, statute retrieval, labor agreement QA, French Wikipedia QA, Mintaka-style entity QA, and product QA ranking in French and English. For xPQA, preserve multiple valid answer snippets per question.
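Preserving multiple valid snippets per question mostly comes down to grouping qrel rows by query before building training examples, rather than keeping only the first positive. The sketch below assumes qrels arrive as `(query_id, doc_id, relevance)` tuples; that row format and the ids in the example are hypothetical.

```python
from collections import defaultdict


def group_positives(qrels):
    """Map each query id to all of its positive doc ids, in first-seen order.

    qrels: iterable of (query_id, doc_id, relevance) tuples (assumed format).
    Rows with relevance <= 0 are ignored; duplicate rows are collapsed.
    """
    grouped = defaultdict(list)
    for query_id, doc_id, relevance in qrels:
        if relevance > 0 and doc_id not in grouped[query_id]:
            grouped[query_id].append(doc_id)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical rows in the style of a multi-positive product QA task.
    rows = [
        ("q1", "d1", 1),
        ("q1", "d2", 1),
        ("q2", "d3", 1),
        ("q2", "d4", 0),
    ]
    print(group_positives(rows))
```

Each query then carries its full positive set, so a contrastive or listwise training loop can avoid treating a second valid snippet as a negative.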

Exclude NanoMTEB-French evaluation queries, positives, qrels, answer strings, statutes, lesson resources, agreement clauses, and product snippets from training data. For cross-lingual examples, do not use direct translations of evaluation queries as synthetic seeds.
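A simple first-pass filter for the exclusion rule is to drop any training example whose query or positive text matches an evaluation text after light normalization. This is a minimal sketch: the `query` and `positive` field names are assumptions, and exact matching of this kind will not catch paraphrases or translations, which need a separate check.

```python
import unicodedata


def normalize(text):
    """Lowercase, strip accents, and collapse whitespace for loose matching."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.split())


def filter_training_examples(train_examples, eval_texts):
    """Keep only examples whose query and positive are absent from eval_texts.

    train_examples: list of dicts with "query" and "positive" keys (assumed).
    eval_texts:     evaluation queries and documents that must not leak.
    """
    blocked = {normalize(t) for t in eval_texts}
    return [
        ex
        for ex in train_examples
        if normalize(ex["query"]) not in blocked
        and normalize(ex["positive"]) not in blocked
    ]


if __name__ == "__main__":
    # Hypothetical texts, not taken from the benchmark.
    eval_texts = ["Qu'est-ce qu'un préavis ?"]
    train = [
        {"query": "qu'est-ce qu'un PREAVIS  ?", "positive": "Le préavis est un délai."},
        {"query": "Comment calculer une aire ?", "positive": "L'aire se calcule ainsi."},
    ]
    print(filter_training_examples(train, eval_texts))
```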

Source Reference Table

| Source | Year | Type | URL |
|---|---|---|---|
| MTEB: Massive Text Embedding Benchmark | 2022 | paper | https://arxiv.org/abs/2210.07316 |
| FQuAD: French Question Answering Dataset | 2020 | paper | https://aclanthology.org/2020.findings-emnlp.107/ |
| Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering | 2022 | paper | https://aclanthology.org/2022.coling-1.138/ |

Metadata Summary

| Field | Value |
|---|---|
| Task pages | 8 |
| Queries | 1,500 |
| Split-local documents | 19,397 |
| Positive qrels | 2,212 |
| Languages | fr, multilingual |
| Categories | natural_language |
| Positives / query avg | 1.47 |

Task Metadata Summary

| Task | Backing dataset | Lang | Category | Queries | Docs | Positives | BM25 nDCG@10 | Dense nDCG@10 | Reranking hybrid nDCG@10 | Best profile |
|---|---|---|---|---|---|---|---|---|---|---|
| alloprof | NanoMTEB-French | fr | natural_language | 200 | 2,556 | 200 | 0.3447 | 0.5139 | 0.5214 | Reranking hybrid |
| bsard | NanoMTEB-French | fr | natural_language | 200 | 10,000 | 200 | 0.1943 | 0.3023 | 0.3048 | Reranking hybrid |
| fquad | NanoMTEB-French | fr | natural_language | 200 | 269 | 200 | 0.8899 | 0.8102 | 0.8666 | BM25 |
| mintaka_fr | NanoMTEB-French | multilingual | natural_language | 200 | 1,714 | 200 | 0.2995 | 0.3676 | 0.3400 | Dense |
| syntec | NanoMTEB-French | fr | natural_language | 100 | 90 | 100 | 0.7180 | 0.8660 | 0.8463 | Dense |
| xpqa_eng_fra | NanoMTEB-French | multilingual | natural_language | 200 | 1,674 | 451 | 0.1061 | 0.3639 | 0.1775 | Dense |
| xpqa_fra_eng | NanoMTEB-French | multilingual | natural_language | 200 | 1,547 | 437 | 0.2918 | 0.6479 | 0.3724 | Dense |
| xpqa_fra_fra | NanoMTEB-French | fr | natural_language | 200 | 1,547 | 424 | 0.5644 | 0.6400 | 0.6208 | Dense |