HAKARI-Bench

NanoMuPLeR / sl

Overview

NanoMuPLeR / sl is the Slovenian split of MuPLeR-retrieval, a multilingual legal retrieval benchmark based on European Union legal passages. Queries are synthetic Slovenian legal questions, and documents are Slovenian DGT-Acquis passages. Each query has exactly one relevant passage. The split is a useful diagnostic because BM25 and dense retrieval perform similarly and neither dominates, while the hybrid candidate pool improves both early precision and candidate coverage. It tests whether models can combine exact legal terminology with semantic matching under Slovenian morphology and the style of translated EU legislation.

Details

What the Original Data Measures

MuPLeR-retrieval evaluates same-language legal passage retrieval across European languages. The source dataset card describes 10,000 DGT-Acquis passages and 200 synthetic parallel queries for each language. DGT-Acquis is one of the European Union's multilingual parallel legal corpora and is documented in the 2014 overview of the EU's highly multilingual parallel corpora listed in the Source Reference Table.

For Slovenian, a model must identify the single passage that answers a specific legal, regulatory, market, or policy question, rather than merely retrieving a passage from the same broad EU topic.

Observed Data Profile

The Nano split contains 200 queries, 10,000 documents, and 200 positive qrel rows. Every query has exactly one positive. Queries average 136.35 characters, while documents average 607.82 characters.
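The profile figures above can be recomputed from any local copy of the split. The sketch below assumes the split has already been loaded into two Python dicts (id to text) and a list of (query_id, doc_id) positive pairs; that in-memory layout and the `profile` helper are hypothetical, not the schema of hakari-bench/NanoMuPLeR.

```python
from collections import Counter
from statistics import mean


def profile(queries, documents, qrels):
    """Summarize a retrieval split.

    queries, documents: dicts mapping id -> text (assumed layout).
    qrels: list of (query_id, doc_id) positive pairs.
    """
    per_query = Counter(qid for qid, _ in qrels)
    # Count positives for every query, including queries with zero positives.
    counts = [per_query.get(qid, 0) for qid in queries]
    return {
        "queries": len(queries),
        "documents": len(documents),
        "positive_qrels": len(qrels),
        "positives_per_query_min": min(counts),
        "positives_per_query_max": max(counts),
        "multi_positive_queries": sum(1 for c in counts if c > 1),
        "query_len_avg": round(mean(len(t) for t in queries.values()), 2),
        "doc_len_avg": round(mean(len(t) for t in documents.values()), 2),
    }
```

On the real split, the counts should match the Dataset Information table: 200 queries, 10,000 documents, 200 positive qrels, one positive per query at minimum and maximum, and no multi-positive queries.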

Example queries cover the market-share threshold for designating dominant telecommunications operators under the 1998 regulatory framework, survey support for limiting small euro coin denominations, the share of beverage packaging in total packaging weight, national confectionery supply controlled by a franchisor, and a 2004 merger found neither to create dominance nor to foreclose online music markets. The content mixes competition policy, market structure, packaging systems, and regulatory interpretation.

BM25 Evaluation Profile

The BM25 candidate subset uses top-500 candidates and reaches nDCG@10 of 0.7455, hit@10 of 0.8350, and recall@100 of 0.9000. BM25 is moderately strong because Slovenian EU legal passages contain exact anchors such as dates, percentages, market terms, and named regulatory frameworks.

However, the sparse profile is not sufficient for the full task. The queries often describe legal consequences or market findings in paraphrased form, and many near negatives share the same market or policy vocabulary without answering the exact question.
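To make both points concrete, here is a minimal Okapi BM25 sketch. The card does not state the tokenizer or parameters behind the bm25 profile, so the regex tokenizer and the common defaults k1 = 1.5 and b = 0.75 are assumptions. With no stemming or lemmatization, a numeric anchor such as "25" matches exactly, while inflected Slovenian forms such as "tržni" and "tržnim" do not.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased runs of word characters. Python's \w is Unicode-aware, so
    # Slovenian letters (č, š, ž) stay inside tokens; "%" and punctuation are dropped.
    # No stemming or lemmatization is applied.
    return re.findall(r"\w+", text.lower())


class BM25:
    """Minimal Okapi BM25 over an in-memory {doc_id: text} dict.

    k1 and b are common defaults, assumed here; they are not the benchmark's settings.
    """

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.doc_ids = list(docs)
        self.tfs = [Counter(tokenize(docs[d])) for d in self.doc_ids]
        self.lens = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lens) / len(self.lens)
        n = len(self.doc_ids)
        df = Counter(term for tf in self.tfs for term in tf)
        # Lucene-style idf: the +1 inside the log keeps weights non-negative.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, idx):
        tf, dl = self.tfs[idx], self.lens[idx]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue  # exact token match only: "tržni" does not match "tržnim"
            denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / denom
        return total

    def search(self, query, k=500):
        # Rank all documents by score (ties broken by doc id) and keep the top k.
        scored = sorted(
            ((self.score(query, i), d) for i, d in enumerate(self.doc_ids)),
            key=lambda pair: (-pair[0], pair[1]),
        )
        return [d for _, d in scored[:k]]
```

For a document such as "podjetja s 25 % tržnim deležem", the query "25 % tržni delež" scores only on the token "25". A Slovenian stemmer or lemmatizer inside `tokenize` would be the natural place to recover some of the inflectional matches this version misses.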

Dense Evaluation Profile

The dense candidate subset from harrier_oss_v1_270m uses top-500 candidates and reaches nDCG@10 of 0.7428, hit@10 of 0.8250, and recall@100 of 0.9250. Dense retrieval is slightly below BM25 on nDCG@10 (0.7428 vs 0.7455) and hit@10 (0.8250 vs 0.8350) but stronger on recall@100 (0.9250 vs 0.9000, i.e. 185 vs 180 of 200 queries). Embedding similarity therefore finds additional positives at deeper ranks without improving early ordering.

The split is therefore a balanced diagnostic. Dense retrieval helps with paraphrase and conceptual matching, while sparse retrieval keeps an advantage for exact regulatory names, numeric shares, and market-specific terms.
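Aggregate metrics alone do not show whether the two retrievers recover different positives. A per-query breakdown does. The sketch below assumes that, for each query, you have the 1-based rank of its single positive in the BM25 and dense candidate lists (None when it falls outside the list); these rank dictionaries and the `complementarity` helper are hypothetical, not files or tools shipped with the benchmark.

```python
def complementarity(bm25_ranks, dense_ranks, k=100):
    """Count queries whose positive is within the top k of each retriever.

    bm25_ranks, dense_ranks: dict query_id -> 1-based rank of the positive,
    or None if the positive was not retrieved.
    """

    def within(rank):
        return rank is not None and rank <= k

    out = {"both": 0, "sparse_only": 0, "dense_only": 0, "neither": 0}
    for qid, sparse_rank in bm25_ranks.items():
        s = within(sparse_rank)
        d = within(dense_ranks.get(qid))
        if s and d:
            out["both"] += 1
        elif s:
            out["sparse_only"] += 1
        elif d:
            out["dense_only"] += 1
        else:
            out["neither"] += 1
    return out
```

Non-zero `sparse_only` and `dense_only` counts at k = 100 are the per-query signature of the complementary behavior described above.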

Reranking Hybrid Evaluation Profile

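The reranking_hybrid subset uses top-100 candidates, with five rows receiving the optional rank-101 safeguard. It reaches nDCG@10 of 0.7983, hit@10 of 0.8950, and recall@100 of 0.9750 (195 of 200 queries). Hybrid retrieval is clearly the strongest profile on all three metrics, even though its pool is one fifth the depth of the BM25 and dense pools.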

The result shows that BM25 and dense retrieval contribute complementary evidence. The hybrid pool gives a reranker access to more positives and better top-ten placement than either standalone method, which is important for Slovenian legal retrieval where both exact terms and semantic relations matter.
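The card does not describe how the reranking_hybrid pool is built, and the rank-101 safeguard is not modeled here. As one common way to merge sparse and dense lists, the sketch below uses reciprocal rank fusion (RRF) with the conventional constant k = 60. Treat it as an assumed technique for illustration, not the benchmark's pooling method; `rrf_pool` is a hypothetical name.

```python
from collections import defaultdict


def rrf_pool(ranked_lists, depth=100, k=60):
    """Fuse ranked doc-id lists with reciprocal rank fusion; keep the top `depth`.

    Each document's fused score is the sum of 1 / (k + rank) over the lists
    that contain it, so documents ranked by several retrievers rise.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused[:depth]
```

For example, fusing a BM25 list ["d1", "d2", "d3"] with a dense list ["d3", "d4", "d1"] puts d1 and d3, which both retrievers found, ahead of d2 and d4. That is the property a hybrid pool relies on to raise early precision.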

Metric Interpretation for Model Researchers

With one positive per query, nDCG@10 measures how early the correct passage appears, hit@10 measures first-page success, and recall@100 measures candidate availability for reranking. For Slovenian MuPLeR, BM25 and dense retrieval are comparable but incomplete, while hybrid retrieval is the target candidate-generation profile.

Researchers should treat this split as a test of complementary sparse-dense behavior rather than a case where one method clearly wins.
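Because each query has exactly one positive, the three metrics reduce to simple functions of that positive's rank. The ideal DCG@10 is 1, so nDCG@10 is 1 / log2(rank + 1) when rank ≤ 10 and 0 otherwise, while hit@10 and recall@100 are 0/1 indicators. The sketch below assumes binary relevance and a 1-based rank; the helper names are illustrative.

```python
import math


def single_positive_metrics(rank):
    """Metrics for one query with exactly one relevant document.

    rank: 1-based rank of the positive, or None if it was not retrieved.
    With a single positive, ideal DCG@10 = 1, so nDCG@10 = 1 / log2(rank + 1).
    """

    def within(k):
        return rank is not None and rank <= k

    return {
        "ndcg@10": 1.0 / math.log2(rank + 1) if within(10) else 0.0,
        "hit@10": 1.0 if within(10) else 0.0,
        "recall@100": 1.0 if within(100) else 0.0,
    }


def average(ranks):
    # Mean of each metric over queries, as reported in the Candidate Subsets table.
    per_query = [single_positive_metrics(r) for r in ranks]
    return {m: sum(q[m] for q in per_query) / len(per_query) for m in per_query[0]}
```

Since hit@10 and recall@100 are per-query indicators, the table values convert directly to query counts over the 200 queries. For example, the hybrid hit@10 of 0.8950 is 179 queries and its recall@100 of 0.9750 is 195 queries, against 167 and 180 for BM25.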

Query and Relevance Type Tendencies

Queries are formal Slovenian questions about market dominance, surveys, packaging categories, national supply shares, and merger effects. Relevant documents are translated EU legal, regulatory, or administrative passages with compact factual claims and formal terminology.

Relevance is exact. A passage from the same market, sector, or policy debate is a hard negative if it does not contain the requested legal finding or quantitative condition.

Representative Failure Modes

Failures include matching a telecommunications passage with the wrong dominance criterion, retrieving packaging text without the requested beverage share, confusing national supply structure with broader market descriptions, and selecting merger discussion that lacks the specific non-dominance or foreclosure conclusion. Sparse systems miss paraphrase; dense systems can over-rank adjacent market discussions.

Training Data That May Help

Useful training data includes non-overlapping Slovenian EUR-Lex and DGT-Acquis retrieval pairs, Slovenian legal QA, multilingual legal bitext, and hard negatives from similar EU market or policy passages. Evaluation queries and exact positives should be excluded.
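As a first pass, evaluation queries and positives can be excluded with a normalized exact-match filter. This is a minimal sketch assuming training data as (query, passage) string pairs; `normalize` and `decontaminate` are hypothetical names. Exact matching will not catch paraphrases. Because MuPLeR queries are parallel across languages, translated versions of these evaluation queries in other MuPLeR splits would also need separate handling, for example by query alignment if the splits share ids.

```python
import unicodedata


def normalize(text):
    # NFC-normalize, lowercase, and collapse whitespace so trivial variants match.
    text = unicodedata.normalize("NFC", text).lower()
    return " ".join(text.split())


def decontaminate(train_pairs, eval_queries, eval_positives):
    """Drop training (query, passage) pairs that reuse an evaluation query or positive."""
    banned_queries = {normalize(q) for q in eval_queries}
    banned_passages = {normalize(p) for p in eval_positives}
    return [
        (q, p)
        for q, p in train_pairs
        if normalize(q) not in banned_queries and normalize(p) not in banned_passages
    ]
```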

Model Improvement Notes

Models should handle Slovenian legal morphology, exact numeric expressions, market terminology, and semantic paraphrase together. Hard negatives should share the same regulatory or market domain but differ in the requested condition. The hybrid result suggests that reranking over combined sparse-dense candidates is the most informative evaluation setup.
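One simple way to obtain same-domain hard negatives is to take the top-ranked non-positive passages from a retriever's candidate list. The sketch assumes a ranked list of doc ids and a set of positive ids per query; `mine_hard_negatives` and its `skip_top` option are hypothetical. Mining from both BM25 and dense lists, or from a fused list, should yield both lexical and semantic near-misses, but whether each negative actually differs in the requested condition still has to be checked, since an unjudged passage near the top may also answer the query.

```python
def mine_hard_negatives(ranked, positives, n=5, skip_top=0):
    """Pick hard negatives for one query from a retriever's ranked doc ids.

    ranked: doc ids in descending score order (e.g. a BM25 or dense top-k list).
    positives: set of doc ids judged relevant for this query.
    skip_top: optionally skip the first few non-positives, which are the most
    likely to be unjudged relevant passages.
    """
    negatives = [d for d in ranked if d not in positives]
    return negatives[skip_top:skip_top + n]
```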

Example Data

Query | Positive document
Kateri regulativni okvir je organom omogočal opredeliti podjetja kot prevladujoča pri 25% tržnem deležu, upoštevajoč dostop končnih uporabnikov in finance? [155 chars] (English: Which regulatory framework allowed authorities to designate undertakings as dominant at a 25% market share, taking into account end-user access and finances?) | V skladu z regulativnim okvirom iz leta 1998 so bila področja trga telekomunikacijskega sektorja, za katera je veljala ureditev ex ante, določena v ustreznih direktivah, vendar ti trgi niso bili opredeljeni v skladu z načeli konkurenčnega prava. Na teh področjih, opredeljenih v skladu z regulativnim okvirom iz leta 1998, so imeli nacionalni regulativni organi pooblastila, da podjetja, ki imajo 25 % tržni delež, opredelijo kot podjetja s pomembno tržno močjo, pri čemer je mogoče odstopanje od tega praga ob upoštevanju sposobnosti podjetja, da vpliva na trg, njegovega prometa glede na velikost trga, njegovega nadzora nad sredstvi dostopa do končnih uporabnikov, njegovega dostopa do finančnih virov in njegovih izkušenj pri zagotavljanju proizvodov in storitev na trgu. [775 chars]
Katere države so v raziskavi zabeležile približno štiri petine podpore za omejitev manjših apoenov? [99 chars] (English: Which countries recorded roughly four-fifths support in the survey for limiting smaller denominations?) | Bankovci in kovanci. Glede zadovoljstva s sedanjimi apoeni bankovcev in kovancev, je raziskava pokazala, da pri bankovcih spremembe niso potrebne, precejšen odstotek anketirancev (od 80 % na Finskem in v Nemčiji do 33–35 % na Irskem in v Italiji) pa zagovarja zmanjšanje števila eurokovancev, zlasti ukinitev kovancev za 1 in 2 centa, kar bi bilo udobneje in bi poenostavilo plačila. Po drugi strani se večina boji, da bi odstranitev majhnih apoenov eura lahko povzročila rast cen: ta bojazen je zelo razširjena tudi v državah, kjer bi večina želela ukinitev manjših kovancev. [576 chars]
Kateri segment embalaže za pijačo v razpravah v EU predstavlja približno petino skupne embalaže po teži? [104 chars] (English: Which segment of beverage packaging in EU discussions accounts for roughly one-fifth of total packaging by weight?) | Nacionalni sistemi za ponovno uporabo embalaže upoštevajo več vrst embalaže. Nekateri od teh sistemov delujejo zelo dobro, zlasti tisti za prevozno embalažo, kakršne so gajbe in palete, pa tudi za embalaže za pijačo v gostinstvu. Na drugih področjih pa je mogoče potreben poseg javnih organov za spodbuditev sistemov ponovne uporabe, ne glede na njihovo dejansko poslovno upravičenost. Pri tem se večji del razprave v Evropski uniji osredotoča na potrošniško embalažo za pijačo (ki znaša okoli 20 % skupne embalaže po teži). [524 chars]

Source Reference Table

Title | Year | Type | URL
MuPLeR: Multilingual Parallel Legal Retrieval | - | dataset card | https://huggingface.co/datasets/mteb/MuPLeR-retrieval
An overview of the European Union's highly multilingual parallel corpora | 2014 | source paper | https://link.springer.com/article/10.1007/s10579-014-9277-0
DGT-Acquis | - | source corpus | https://joint-research-centre.ec.europa.eu/language-technology-resources/dgt-acquis_en

Dataset Information

Field | Value
Nano set | NanoMuPLeR
Backing dataset | NanoMuPLeR
Task / split | sl
Hugging Face dataset | hakari-bench/NanoMuPLeR
Language | sl
Category | natural_language
Queries | 200
Documents | 10,000
Positive qrels | 200
Positives / query (avg) | 1.00
Positives / query (min) | 1
Positives / query (median) | 1.00
Positives / query (max) | 1
Multi-positive queries | 0 (0.00%)
Query length, avg (chars) | 136.35
Document length, avg (chars) | 607.82

Candidate Subsets

Profile | Config | nDCG@10 | Hit@10 | Recall@100 | Candidates
BM25 | bm25 | 0.7455 | 0.8350 | 0.9000 | top-500
Dense | harrier_oss_v1_270m | 0.7428 | 0.8250 | 0.9250 | top-500
Reranking hybrid | reranking_hybrid | 0.7983 | 0.8950 | 0.9750 | top-100