HAKARI-Bench

NanoMTEB-French / bsard

Overview

bsard is the Belgian Statutory Article Retrieval Dataset in French. Queries are lay legal questions, and documents are Belgian statutory articles. The Nano split contains 200 queries, 10,000 documents, and 200 positive qrel rows, with exactly one positive article per query. It evaluates whether a retrieval model can map ordinary legal problems to the statutory article that grounds the answer.

This is a hard legal retrieval task because citizen language and statutory language often diverge. BM25 is weak, dense retrieval with harrier_oss_v1_270m is substantially stronger, and reranking_hybrid has a marginally higher nDCG@10 than dense but lower hit@10 and recall@100. The task is useful for testing French legal semantic retrieval, especially the ability to connect plain-language questions about debt, tenancy, inheritance, legal aid, and procedure to formal Belgian law articles.

Details

What the Original Data Measures

The paper "A Statutory Article Retrieval Dataset in French" introduces BSARD as a French legal retrieval dataset in which jurists labeled each question with the Belgian statutory articles relevant to it. The paper emphasizes the mismatch between non-expert legal questions and formal statutory text, as well as the hierarchical structure of legal codes.

In this Nano version, the query is a lay legal question and the relevant document is a statute article. The task is not general legal FAQ retrieval; it requires article-level statutory grounding.

Observed Data Profile

Queries average 144.97 characters and often include both a natural question and category-like legal context. Documents average 793.01 characters and are formal statutory text with article sections, legal clauses, and enumerated conditions. The split has one positive article per query.

Examples cover annual campsite caravan rental in Brussels, modifying a will, court costs after contesting a social-security decision, repairs that a landlord does not perform, and understanding a water bill in Wallonia. The positive article need not share any wording with the lay question.

BM25 Evaluation Profile

BM25 reaches nDCG@10 = 0.1943, hit@10 = 0.3350, and recall@100 = 0.5500 over top-500 candidate lists. This weak sparse profile reflects the lay-to-law vocabulary gap. A query may mention a practical problem, while the statute uses formal legal categories and article language.

BM25 succeeds when the query contains exact terms from the code, such as a law name, legal procedure, or distinctive phrase. It fails when the user describes the situation in ordinary language or when many articles share similar legal terms.
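The vocabulary gap can be illustrated with a small, self-contained BM25 scorer. This is a sketch using the Okapi BM25 formula with the common non-negative idf variant and illustrative parameters k1 = 1.2 and b = 0.75; those settings, the tokenizer, and the toy articles and queries are invented for illustration and are not taken from bsard or from the benchmark's own BM25 configuration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into runs of Latin letters, including French accented ones."""
    return re.findall(r"[a-zà-öø-ÿ]+", text.lower())

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in docs against query with Okapi BM25."""
    doc_tokens = [tokenize(d) for d in docs]
    n_docs = len(doc_tokens)
    avgdl = sum(len(toks) for toks in doc_tokens) / n_docs
    df = Counter()  # document frequency of each term
    for toks in doc_tokens:
        df.update(set(toks))
    query_terms = tokenize(query)
    scores = []
    for toks in doc_tokens:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue  # no exact token match means no contribution
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores

# Invented toy articles and queries, not taken from bsard.
docs = [
    "Le bailleur est tenu d'effectuer les réparations nécessaires au logement.",
    "Le testament par acte public est celui qui est reçu par un notaire.",
]
lay_query = "Mon proprio refuse de réparer la fuite chez moi"
legal_query = "Le bailleur doit-il faire les réparations ?"
print(bm25_scores(lay_query, docs))    # [0.0, 0.0]
print(bm25_scores(legal_query, docs))  # the tenancy article scores highest
```

The lay query scores zero against both articles because "proprio" and "réparer" never match "bailleur" or "réparations" at the token level, while the query phrased in statutory terms ranks the tenancy article first.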

Dense Evaluation Profile

Dense retrieval with harrier_oss_v1_270m reaches nDCG@10 = 0.3023, hit@10 = 0.4550, and recall@100 = 0.7250. Dense retrieval is much stronger than BM25 because it better connects lay legal intent with formal statutory concepts. Of the three candidate sources, it has the highest hit@10 and recall@100.

Dense retrieval still leaves many failures because legal articles are precise. Several provisions can concern the same procedure or right while differing in jurisdiction, condition, or exception. A dense model must distinguish statutory scope, not only legal topic.

Reranking Hybrid Evaluation Profile

The reranking_hybrid candidate column reaches nDCG@10 = 0.3048, hit@10 = 0.4350, and recall@100 = 0.6750, with 100 to 101 candidates per query and 65 rank-101 safeguard rows. Its nDCG@10 is the highest, though only marginally above dense (0.3048 vs 0.3023), while its hit@10 and recall@100 are lower than dense. This suggests that hybrid search can rank some positives slightly higher when sparse legal terms help, but at the same cutoff of 100 its pool contains fewer positives than the dense ranking (recall@100 of 0.6750 vs 0.7250).
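One way to read the 65 rank-101 safeguard rows is as a check against recall@100. With 200 queries and exactly one positive per query, the number of queries whose positive falls outside a profile's top 100 is 200 × (1 - recall@100). The hybrid count coming out to exactly 65 is consistent with, though not proof of, a safeguard row being added for each query whose positive the hybrid top 100 misses; that reading is an assumption, not something this card states.

```latex
\begin{aligned}
\text{BM25:}\quad & 200 \times (1 - 0.5500) = 90 \\
\text{Dense:}\quad & 200 \times (1 - 0.7250) = 55 \\
\text{Reranking hybrid:}\quad & 200 \times (1 - 0.6750) = 65
\end{aligned}
```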

For reranking, the hybrid candidate set is useful but should be monitored for coverage. Dense retrieval appears to supply broader access to the correct law articles, while hybrid order can help when exact code terms are present.
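Coverage monitoring can be as simple as counting, per candidate source, how many queries have their positive inside the pool, and comparing that with the union of sources. The sketch below assumes candidate pools and positives are already loaded into plain dictionaries keyed by query ID; that layout and the toy IDs are hypothetical, not the benchmark's file format.

```python
def coverage(pools, positives):
    """Fraction of queries whose single positive appears in that query's candidate pool."""
    hits = sum(1 for qid, pos in positives.items() if pos in pools.get(qid, ()))
    return hits / len(positives)

def union_pools(a, b):
    """Merge two per-query candidate pools into one set per query."""
    return {qid: set(a.get(qid, ())) | set(b.get(qid, ())) for qid in set(a) | set(b)}

# Hypothetical toy data: one positive per query, as in this split.
positives = {"q1": "d1", "q2": "d2"}
dense = {"q1": ["d1", "d9"], "q2": ["d7"]}
hybrid = {"q1": ["d4"], "q2": ["d2"]}
print(coverage(dense, positives))                       # 0.5
print(coverage(hybrid, positives))                      # 0.5
print(coverage(union_pools(dense, hybrid), positives))  # 1.0
```

In this toy case each source misses a different positive and the union recovers both; running the same check on the real pools is one way to decide whether merging dense and hybrid candidates before reranking is worth trying.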

Metric Interpretation for Model Researchers

This is a single-positive task, so nDCG@10 measures the rank of the one target statutory article. Hit@10 measures practical legal-search visibility, and recall@100 measures whether a reranker can access the correct provision.
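Because each query has exactly one positive, the ideal DCG@10 is 1/log2(2) = 1, so nDCG@10 reduces to 1/log2(rank + 1) when the positive sits at rank 10 or better and 0 otherwise. The sketch below assumes binary relevance (gain 1 for the positive) and 1-based ranks; it illustrates the metric definitions and is not the benchmark's evaluation code.

```python
import math

def single_positive_metrics(rank):
    """Metrics for one query; rank is the 1-based rank of its positive, or None if absent."""
    in_top10 = rank is not None and rank <= 10
    in_top100 = rank is not None and rank <= 100
    return {
        # IDCG@10 is 1 for a single positive, so nDCG@10 equals DCG@10.
        "ndcg@10": 1.0 / math.log2(rank + 1) if in_top10 else 0.0,
        "hit@10": 1.0 if in_top10 else 0.0,
        "recall@100": 1.0 if in_top100 else 0.0,
    }

def mean_metrics(ranks):
    """Average the per-query metrics over a list of ranks."""
    per_query = [single_positive_metrics(r) for r in ranks]
    return {name: sum(m[name] for m in per_query) / len(per_query) for name in per_query[0]}

print(single_positive_metrics(3))  # positive at rank 3: nDCG@10 = 1/log2(4) = 0.5
print(mean_metrics([1, 3, 11, None]))
```

A positive at rank 11 contributes nothing to nDCG@10 or hit@10 but still counts for recall@100, which is why recall@100 is the relevant number when judging what a reranker can reach.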

The central observation is that sparse legal terms alone are not enough. Dense semantic matching is critical, but final ranking still requires legal scope discrimination.

Query and Relevance Type Tendencies

Queries are French lay legal questions about tenancy, debt, social-security procedure, legal aid, family matters, wills, and administrative benefits. Relevant documents are formal Belgian statutory articles.

Relevance means statutory basis: a document on the same legal topic does not count unless it is the article needed to address the question.

Representative Failure Modes

BM25 can fail when the query uses everyday language instead of statute terminology. Dense retrieval can fail when it retrieves the right legal topic but the wrong article or jurisdiction. Hybrid retrieval can over-rank articles that share legal terms but do not answer the specific question.

Hard negatives should come from the same Belgian code, legal topic, or procedure, because those are the realistic confusions.
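A common way to obtain such negatives is to take the highest-ranked non-positive articles from a first-stage retriever, optionally restricted to the positive's code. In the sketch below, `code_of` is a hypothetical mapping from article ID to code name; this card does not say the Nano documents carry such a field, so it would have to be derived separately. The article IDs are toy values.

```python
def mine_hard_negatives(ranked_ids, positive_id, k, code_of=None):
    """Return up to k top-ranked non-positive IDs, optionally from the positive's code only."""
    negatives = []
    for doc_id in ranked_ids:
        if doc_id == positive_id:
            continue  # never use the labeled positive as a negative
        if code_of is not None and code_of.get(doc_id) != code_of.get(positive_id):
            continue  # skip articles from a different code
        negatives.append(doc_id)
        if len(negatives) == k:
            break
    return negatives

# Hypothetical first-stage ranking and code labels.
ranking = ["a3", "a1", "a7", "a2", "a9"]
code_of = {"a1": "Code civil", "a2": "Code civil", "a3": "Code judiciaire",
           "a7": "Code civil", "a9": "Code civil"}
print(mine_hard_negatives(ranking, "a1", 2))           # ['a3', 'a7']
print(mine_hard_negatives(ranking, "a1", 2, code_of))  # ['a7', 'a2']
```

Because this split lists only one positive per query, some mined negatives may be relevant but unlabeled, so spot checks before training are advisable.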

Training Data That May Help

Useful training data includes non-overlapping BSARD train examples, French legal question-to-statute retrieval pairs, legal FAQ to code article mappings, and hard negatives from the same Belgian code or legal topic. Training should exclude BSARD test questions, Nano queries, qrels, and positive Belgian law articles likely to overlap with this evaluation.
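A first-pass leakage filter can drop training pairs whose query or positive article exactly matches an evaluation query or positive after light normalization. This sketch uses only Unicode NFKC normalization, lowercasing, and whitespace collapsing, so it catches verbatim duplicates but not paraphrases or near-duplicates, which would need fuzzier matching. The function names and toy strings are illustrative.

```python
import unicodedata

def normalize(text):
    """NFKC-normalize, lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    return " ".join(text.split())

def filter_leakage(train_pairs, eval_queries, eval_positive_texts):
    """Drop (query, positive_text) pairs that exactly match eval data after normalization."""
    blocked_queries = {normalize(q) for q in eval_queries}
    blocked_docs = {normalize(d) for d in eval_positive_texts}
    return [
        (query, positive)
        for query, positive in train_pairs
        if normalize(query) not in blocked_queries
        and normalize(positive) not in blocked_docs
    ]

# Toy evaluation data and candidate training pairs.
eval_queries = ["Puis-je modifier mon testament ?"]
eval_docs = ["Le testament par acte public est celui qui est reçu par un notaire."]
train = [
    ("  puis-je MODIFIER   mon testament ? ", "Article A"),
    ("Qui paie les réparations ?", "le testament par acte public est celui qui est reçu par un notaire."),
    ("Qui paie les réparations ?", "Article B"),
]
print(filter_leakage(train, eval_queries, eval_docs))  # only the "Article B" pair survives
```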

Synthetic data can pair formal French Belgian-style statutory articles with lay French legal questions about debt, rental law, family law, legal aid, procedure, and benefits. Each positive article should provide the statutory basis needed to address the lay question.

Model Improvement Notes

Improving this task requires legal semantic alignment and article-level precision. Dense encoders should be trained on lay-question-to-statute pairs with same-code hard negatives. Rerankers should compare the user's situation against legal conditions, exceptions, and jurisdiction.
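This card does not prescribe a training objective. One widely used choice for this setup is an InfoNCE-style contrastive loss, in which the lay question's similarity to its positive article is contrasted with its similarities to same-code hard negatives. The sketch below computes that loss for a single query from precomputed similarity scores; the temperature of 0.05 and the similarity values are arbitrary illustrative choices.

```python
import math

def info_nce_loss(sim_pos, sim_negs, temperature=0.05):
    """Cross-entropy of picking the positive among [positive] + negatives."""
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    top = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_sum_exp - logits[0]

# Hypothetical cosine similarities: positive article vs three same-code negatives.
print(info_nce_loss(0.82, [0.75, 0.70, 0.40]))
```

Same-code negatives tend to have similarities close to the positive's, so they keep the loss high until the encoder separates articles by scope rather than by topic alone.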

The task is a strong diagnostic for French legal retrieval because it exposes both vocabulary mismatch and statutory-scope errors.

Example Data

Query [162 chars]: Je loue une caravane dans un camping à l'année. Quelles règles s'appliquent à mon bail à Bruxelles ? Bail de résidence principale (Bruxelles), Champ d'application
(English: I rent a caravan at a campsite year-round. Which rules apply to my lease in Brussels? Main-residence lease (Brussels), Scope)
Positive document [1,000 / 1,082 chars]: Principes Le présent chapitre s'applique aux baux portant sur le logement que le preneur, avec l'accord exprès ou tacite du bailleur, affecte dès l'entrée en jouissance à sa résidence principale. Est réputée non écrite la clause interdisant l'affectation du bien à la résidence principale du preneur lorsqu'elle n'est pas appuyée par une justification expresse et sérieuse, relative notamment à la destination naturelle du bien loué, et n'est pas accompagnée de l'indication de la résidence principale du preneur au cours du bail. Le présent chapitre s'applique également si l'affectation à la résidence principale se fait en cours de bail avec l'accord écrit du bailleur. Dans ce cas, le bail prend cours à la date de cet accord. Le présent chapitre s'applique à la sous-location conclue conformément à l'article 230, dans les limites prévues à ce même article. Sauf disposition contraire, le présent chapitre n'est pas applicable lorsque le contrat par lequel le logement est accordé au preneur est l'a...

Query [101 chars]: J’ai fait un testament. Puis-je le modifier ? Démarches avant décès, Donation et testament, Testament
(English: I made a will. Can I change it? Steps before death, Gift and will, Will)
Positive document [186 chars]: Le testament par acte public est celui qui est reçu par un notaire. Le testament par acte public est celui qui est reçu par un notaire, en présence de deux témoins, ou par deux notaires.

Query [99 chars]: Dois-je payer les frais de justice si je conteste une décision d’un organisme de sécurité sociale ?
(English: Do I have to pay court costs if I challenge a decision by a social-security body?)
Positive document [1,000 / 2,341 chars]: L'indemnité de procédure est une intervention forfaitaire dans les frais et honoraires d'avocat de la partie ayant obtenu gain de cause. Après avoir pris l'avis de l'Ordre des barreaux francophones et germanophone et de l'Orde van Vlaamse Balies, le Roi établit par arrêté délibéré en Conseil des ministres, les montants de base, minima et maxima de l'indemnité de procédure, en fonction notamment de la nature de l'affaire et de l'importance du litige. (A la demande d'une des parties, éventuellement formulée sur interpellation par le juge, celui-ci peut, par décision spécialement motivée,) soit réduire l'indemnité soit l'augmenter, sans pour autant dépasser les montants maxima et minima prévus par le Roi. Dans son appréciation, le juge tient compte : - de la capacité financière de la partie succombante, pour diminuer le montant de l'indemnité; - de la complexité de l'affaire; - des indemnités contractuelles convenues pour la partie qui obtient gain de cause; - du caractère manifestement dérais...

Source Reference Table

Title | Year | Type | URL
A Statutory Article Retrieval Dataset in French | 2022 | arXiv paper | https://arxiv.org/abs/2108.11792
MTEB-French: Resources for French Sentence Embedding Evaluation and Analysis | 2024 | arXiv paper | https://arxiv.org/abs/2405.20468
mteb/BSARDRetrieval |  | dataset card | https://huggingface.co/datasets/mteb/BSARDRetrieval

Dataset Information

Field | Value
Nano set | NanoMTEB-French
Backing dataset | NanoMTEB-French
Task / split | bsard
Hugging Face dataset | hakari-bench/NanoMTEB-French
Language | fr
Category | natural_language
Queries | 200
Documents | 10,000
Positive qrels | 200
Positives / query, avg | 1.00
Positives / query, min | 1
Positives / query, median | 1.00
Positives / query, max | 1
Multi-positive queries | 0 (0.00%)
Query length, avg (chars) | 144.97
Document length, avg (chars) | 793.01
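The positives-per-query rows in this table can be recomputed from a list of positive qrel pairs. The sketch assumes qrels are available in memory as (query_id, doc_id) tuples, a hypothetical layout; queries with no positive at all would not appear in these counts.

```python
from collections import Counter
from statistics import mean, median

def qrel_stats(qrels):
    """Summarize positives per query from (query_id, doc_id) positive pairs."""
    per_query = Counter(qid for qid, _ in qrels)
    counts = list(per_query.values())
    return {
        "queries_with_positives": len(counts),
        "positive_qrels": sum(counts),
        "avg": mean(counts),
        "min": min(counts),
        "median": median(counts),
        "max": max(counts),
        "multi_positive_queries": sum(1 for c in counts if c > 1),
    }

# Toy qrels: q2 has two positives, unlike any query in this split.
print(qrel_stats([("q1", "d1"), ("q2", "d2"), ("q2", "d5")]))
```

For a single-positive split like this one, the expected result is avg, min, median, and max all equal to 1 and zero multi-positive queries.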

Candidate Subsets

Profile | Config | nDCG@10 | Hit@10 | Recall@100 | Candidates
BM25 | bm25 | 0.1943 | 0.3350 | 0.5500 | top-500
Dense | harrier_oss_v1_270m | 0.3023 | 0.4550 | 0.7250 | top-500
Reranking hybrid | reranking_hybrid | 0.3048 | 0.4350 | 0.6750 | top-100

Training and Leakage Metadata