NanoMTEB-French
Overview
NanoMTEB-French is a compact French retrieval group drawn from MTEB-French and related MTEB tasks. It combines educational resource retrieval, Belgian statutory article retrieval, French Wikipedia QA passage retrieval, French Mintaka answer retrieval, Syntec collective-agreement retrieval, and French-English product question answering. The group tests monolingual French retrieval and cross-lingual product QA in one small suite.
The source tasks differ more than their French surface suggests. Alloprof and BSARD map user-facing questions to long educational or legal documents. FQuAD retrieves answer-bearing passages. Mintaka retrieves short entity-like answers. Syntec retrieves labor-agreement clauses. xPQA tests product answerability across French and English directions. BM25 exposes exact wording and named entities; dense retrieval tests French paraphrase and cross-lingual matching; reranking_hybrid shows where both signals are useful.
What This Group Measures
NanoMTEB-French follows MTEB-French retrieval coverage and includes xPQA directions that involve French. The group measures whether a model can retrieve the correct French or French-English answer source across education, law, Wikipedia QA, entity answers, labor agreements, and product QA.
The shared challenge is not only French language handling. Each task has its own relevance definition: an educational lesson should answer a student question, a statute should satisfy a legal need, a product snippet should answer compatibility or specification questions, and a Mintaka answer may be a very short label.
Task Families
- Educational retrieval: alloprof retrieves French educational resources for student questions.
- Legal and workplace retrieval: bsard and syntec retrieve Belgian statutes or collective-agreement clauses.
- French QA retrieval: fquad retrieves French Wikipedia evidence passages.
- Answer-label retrieval: mintaka_fr retrieves short canonical answer strings.
- Product QA retrieval: xpqa_eng_fra, xpqa_fra_eng, and xpqa_fra_fra retrieve product answer snippets across French-English directions.
Dataset Shape
NanoMTEB-French contains 8 task pages, 1,500 queries, 19,397 split-local documents, and 2,212 positive qrel rows. Five tasks are single-positive: their positive counts equal their query counts. The three xPQA tasks are multi-positive, averaging 2.1 to 2.3 positive snippets per query.
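The group totals above can be recomputed from the per-task counts. The sketch below copies the query, document, and positive counts from the Task Summary table; the names `COUNTS`, `group_totals`, and `positives_per_query` are illustrative and not part of any NanoMTEB tooling.

```python
# Per-task counts copied from the Task Summary table: (queries, docs, positives).
COUNTS = {
    "alloprof": (200, 2556, 200),
    "bsard": (200, 10000, 200),
    "fquad": (200, 269, 200),
    "mintaka_fr": (200, 1714, 200),
    "syntec": (100, 90, 100),
    "xpqa_eng_fra": (200, 1674, 451),
    "xpqa_fra_eng": (200, 1547, 437),
    "xpqa_fra_fra": (200, 1547, 424),
}


def group_totals(counts):
    """Sum queries, split-local documents, and positive qrel rows over all tasks."""
    queries = sum(q for q, _, _ in counts.values())
    docs = sum(d for _, d, _ in counts.values())
    positives = sum(p for _, _, p in counts.values())
    return queries, docs, positives


def positives_per_query(counts):
    """Average number of positive qrel rows per query, per task."""
    return {task: p / q for task, (q, _, p) in counts.items()}


totals = group_totals(COUNTS)
ratios = positives_per_query(COUNTS)
# Tasks whose average exceeds one positive per query are multi-positive.
multi_positive = sorted(task for task, r in ratios.items() if r > 1.0)

print(totals)           # (1500, 19397, 2212)
print(multi_positive)   # ['xpqa_eng_fra', 'xpqa_fra_eng', 'xpqa_fra_fra']
```

Dividing the total positives by the total queries gives the group-level average of about 1.47 positives per query.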
Document formats vary widely. Alloprof lessons are long educational documents, BSARD statutes are formal legal articles, FQuAD passages are compact evidence paragraphs, Mintaka targets are short answer strings, and xPQA documents are short product answer snippets. The group should therefore be read by target type rather than as one generic French retrieval benchmark.
Retrieval Behavior
BM25 Profile
BM25 is strongest on fquad, syntec, and monolingual xpqa_fra_fra. These tasks often preserve names, article terms, product specifications, or agreement language. FQuAD in particular has strong lexical overlap between question and Wikipedia evidence.
BM25 is weakest on cross-lingual xPQA and BSARD. French-to-English or English-to-French product QA offers little lexical overlap beyond product names, numbers, and units. BSARD is difficult because lay legal questions often use different wording from statutory articles.
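The weak lexical bridge in cross-lingual product QA can be illustrated with a crude token-overlap measure. This is not BM25, only a rough proxy for the exact-term signal that BM25 relies on. The product name and the three strings below are invented for illustration and do not come from xPQA.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of word tokens (accents preserved)."""
    return set(re.findall(r"\w+", text.lower()))


def query_coverage(query, doc):
    """Fraction of distinct query tokens that also occur in the document."""
    q = tokens(query)
    return len(q & tokens(doc)) / len(q) if q else 0.0


# Invented examples: one French question, one French and one English answer.
query_fr = "Quelle est la capacité de la batterie du X200 ?"
doc_fr = "La capacité de la batterie du X200 est de 5000 mAh."
doc_en = "The X200 battery capacity is 5000 mAh."

print(query_coverage(query_fr, doc_fr))  # 0.875
print(query_coverage(query_fr, doc_en))  # 0.125
```

In the English answer, only the product name survives as a shared token. This matches the pattern described above, where names, numbers, and units are the main overlap across languages.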
Dense Profile
Dense retrieval is the best profile on five of the eight NanoMTEB-French tasks: Mintaka, Syntec, and all three xPQA directions. It also beats BM25 on Alloprof and BSARD, where reranking_hybrid narrowly edges it out. The gains come from matching semantic answerability and cross-lingual product information beyond exact terms.
Dense retrieval is not universally best: FQuAD remains BM25-led because answer-bearing passages share strong lexical cues with questions. This contrast makes the group useful for separating French semantic matching from exact evidence-word matching.
Reranking Hybrid Profile
reranking_hybrid is best on Alloprof and BSARD in the current metadata, by narrow margins over dense, and competitive on FQuAD, Syntec, and monolingual xPQA. These tasks benefit from both exact French terms and semantic matching. In cross-lingual xPQA, dense is much stronger than hybrid (0.3639 vs 0.1775 on xpqa_eng_fra, 0.6479 vs 0.3724 on xpqa_fra_eng), suggesting that sparse retrieval contributes less when the language bridge is weak.
For reranker experiments, xPQA should be read as a multi-positive answer ranking task. Several snippets can answer the same product question, so candidate coverage matters.
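To see why candidate coverage matters for multi-positive queries, here is a minimal nDCG@10 sketch. It assumes binary relevance and unique document IDs in the ranking; the exact metric implementation behind the reported scores is not stated in this document, and `ndcg_at_k` is an illustrative name.

```python
import math


def ndcg_at_k(ranked_ids, positive_ids, k=10):
    """Binary-relevance nDCG@k; assumes ranked_ids contains no duplicates."""
    positives = set(positive_ids)
    # Discounted gain for each positive found in the top k (rank is 0-based).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in positives
    )
    # Ideal ranking places every positive (up to k of them) at the top.
    ideal_hits = min(len(positives), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# A query with two valid answer snippets, "a" and "b".
both_found = ndcg_at_k(["a", "b", "x"], {"a", "b"})
one_found = ndcg_at_k(["a", "x", "y"], {"a", "b"})
print(both_found)  # 1.0
print(one_found)
```

With two positives, ranking only one of them first yields roughly 0.61 rather than 1.0. A candidate set that misses the second snippet therefore caps the score before any reranker runs.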
Task Summary
| Task | Retrieval focus | Lang | Queries | Docs | Positives | BM25 nDCG@10 | Dense nDCG@10 | Reranking hybrid nDCG@10 | Best profile |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| alloprof | student question to lesson resource | fr | 200 | 2,556 | 200 | 0.3447 | 0.5139 | 0.5214 | Reranking hybrid |
| bsard | lay legal question to statute | fr | 200 | 10,000 | 200 | 0.1943 | 0.3023 | 0.3048 | Reranking hybrid |
| fquad | French QA question to evidence passage | fr | 200 | 269 | 200 | 0.8899 | 0.8102 | 0.8666 | BM25 |
| mintaka_fr | complex question to answer label | multilingual | 200 | 1,714 | 200 | 0.2995 | 0.3676 | 0.3400 | Dense |
| syntec | labor question to agreement clause | fr | 100 | 90 | 100 | 0.7180 | 0.8660 | 0.8463 | Dense |
| xpqa_eng_fra | English product question to French answer | multilingual | 200 | 1,674 | 451 | 0.1061 | 0.3639 | 0.1775 | Dense |
| xpqa_fra_eng | French product question to English answer | multilingual | 200 | 1,547 | 437 | 0.2918 | 0.6479 | 0.3724 | Dense |
| xpqa_fra_fra | French product question to French answer | fr | 200 | 1,547 | 424 | 0.5644 | 0.6400 | 0.6208 | Dense |
Interpretation Notes for Model Researchers
NanoMTEB-French should be read by retrieval relation. FQuAD and Syntec have strong exact-language cues. BSARD and Alloprof require mapping user questions to formal or explanatory documents. xPQA tests product answerability and cross-lingual matching. Mintaka tests short answer labels, where document text may be too short for ordinary passage retrieval assumptions.
Dense-led cross-lingual xPQA rows are especially informative for multilingual embedding models. BM25-led FQuAD is a reminder that exact French entity and answer wording remains valuable.
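The dense-led and BM25-led labels can be recomputed from the nDCG@10 columns of the Task Summary table. The sketch below copies those scores; `SCORES`, `best_profile`, and `dense_gain_over_bm25` are illustrative names.

```python
# nDCG@10 per task and profile, copied from the Task Summary table.
SCORES = {
    "alloprof": {"bm25": 0.3447, "dense": 0.5139, "reranking_hybrid": 0.5214},
    "bsard": {"bm25": 0.1943, "dense": 0.3023, "reranking_hybrid": 0.3048},
    "fquad": {"bm25": 0.8899, "dense": 0.8102, "reranking_hybrid": 0.8666},
    "mintaka_fr": {"bm25": 0.2995, "dense": 0.3676, "reranking_hybrid": 0.3400},
    "syntec": {"bm25": 0.7180, "dense": 0.8660, "reranking_hybrid": 0.8463},
    "xpqa_eng_fra": {"bm25": 0.1061, "dense": 0.3639, "reranking_hybrid": 0.1775},
    "xpqa_fra_eng": {"bm25": 0.2918, "dense": 0.6479, "reranking_hybrid": 0.3724},
    "xpqa_fra_fra": {"bm25": 0.5644, "dense": 0.6400, "reranking_hybrid": 0.6208},
}


def best_profile(scores):
    """Return the highest-scoring profile name for each task."""
    return {task: max(p, key=p.get) for task, p in scores.items()}


def dense_gain_over_bm25(scores):
    """Dense minus BM25 nDCG@10 per task, rounded to four decimals."""
    return {task: round(p["dense"] - p["bm25"], 4) for task, p in scores.items()}


best = best_profile(SCORES)
gains = dense_gain_over_bm25(SCORES)

# Tasks sorted by how much dense retrieval adds over BM25, largest first.
for task in sorted(gains, key=gains.get, reverse=True):
    print(f"{task:14s} best={best[task]:17s} dense-bm25={gains[task]:+.4f}")
```

The two cross-lingual xPQA rows come first in this ordering and FQuAD is the only task with a negative gain, matching the reading given above.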
Training and Leakage Notes
Useful training data includes French educational QA, statute retrieval, labor agreement QA, French Wikipedia QA, Mintaka-style entity QA, and product QA ranking in French and English. For xPQA, preserve multiple valid answer snippets per question.
Exclude NanoMTEB-French evaluation queries, positives, qrels, answer strings, statutes, lesson resources, agreement clauses, and product snippets. Cross-lingual training examples should not use direct translations of evaluation queries as synthetic seeds.
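A first-pass exclusion filter might look like the sketch below. It assumes training examples are dictionaries with hypothetical `query` and `positive` fields, and it only catches exact matches after normalization. Paraphrases and translated evaluation queries would need a separate check, such as embedding similarity against the evaluation set.

```python
import re
import unicodedata


def normalize(text):
    """Strip accents, lowercase, and collapse punctuation and whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.findall(r"\w+", no_accents.lower()))


def filter_training_examples(train_examples, eval_texts):
    """Drop training examples whose query or positive matches an evaluation text.

    `train_examples` is a list of dicts with hypothetical "query" and
    "positive" keys; `eval_texts` holds evaluation queries and documents.
    """
    blocked = {normalize(t) for t in eval_texts}
    return [
        ex
        for ex in train_examples
        if normalize(ex["query"]) not in blocked
        and normalize(ex["positive"]) not in blocked
    ]
```

Normalizing both sides keeps trivially reformatted copies of an evaluation query, such as ones with different casing or missing accents, from slipping through.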
Source Reference Table
| Source | Year | Type | URL |
| --- | --- | --- | --- |
| MTEB: Massive Text Embedding Benchmark | 2022 | paper | https://arxiv.org/abs/2210.07316 |
| FQuAD: French Question Answering Dataset | 2020 | paper | https://aclanthology.org/2020.findings-emnlp.107/ |
| Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering | 2022 | paper | https://aclanthology.org/2022.coling-1.138/ |
Metadata Summary
| Field | Value |
| --- | --- |
| Task pages | 8 |
| Queries | 1,500 |
| Split-local documents | 19,397 |
| Positive qrels | 2,212 |
| Languages | fr, multilingual |
| Categories | natural_language |
| Positives / query avg | 1.47 |
Task Metadata Summary
| Task | Backing dataset | Lang | Category | Queries | Docs | Positives | BM25 nDCG@10 | Dense nDCG@10 | Reranking hybrid nDCG@10 | Best profile |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| alloprof | NanoMTEB-French | fr | natural_language | 200 | 2,556 | 200 | 0.3447 | 0.5139 | 0.5214 | Reranking hybrid |
| bsard | NanoMTEB-French | fr | natural_language | 200 | 10,000 | 200 | 0.1943 | 0.3023 | 0.3048 | Reranking hybrid |
| fquad | NanoMTEB-French | fr | natural_language | 200 | 269 | 200 | 0.8899 | 0.8102 | 0.8666 | BM25 |
| mintaka_fr | NanoMTEB-French | multilingual | natural_language | 200 | 1,714 | 200 | 0.2995 | 0.3676 | 0.3400 | Dense |
| syntec | NanoMTEB-French | fr | natural_language | 100 | 90 | 100 | 0.7180 | 0.8660 | 0.8463 | Dense |
| xpqa_eng_fra | NanoMTEB-French | multilingual | natural_language | 200 | 1,674 | 451 | 0.1061 | 0.3639 | 0.1775 | Dense |
| xpqa_fra_eng | NanoMTEB-French | multilingual | natural_language | 200 | 1,547 | 437 | 0.2918 | 0.6479 | 0.3724 | Dense |
| xpqa_fra_fra | NanoMTEB-French | fr | natural_language | 200 | 1,547 | 424 | 0.5644 | 0.6400 | 0.6208 | Dense |