HAKARI-Bench

NanoMuPLeR / pt

Overview

NanoMuPLeR / pt is the Portuguese split of MuPLeR-retrieval, a multilingual legal retrieval benchmark built from European Union legal passages. Queries are synthetic Portuguese legal questions, and documents are Portuguese DGT-Acquis passages. Each query has exactly one relevant passage, so the task measures whether a retriever can identify the precise legal condition, actor, threshold, model, or administrative rule that answers the question. The split is useful for comparing retrieval families: dense retrieval outperforms BM25, and hybrid retrieval improves further by combining semantic and lexical evidence.

Details

What the Original Data Measures

MuPLeR-retrieval measures multilingual legal passage retrieval over European Union text derived from DGT-Acquis. The source dataset card describes 10,000 passages and 200 synthetic parallel queries for each language. DGT-Acquis is one of the EU's multilingual legal corpus resources and is documented in the 2014 overview of the European Union's highly multilingual parallel corpora listed in the Source Reference Table.

For Portuguese, retrieval is same-language and single-positive. A model must rank the passage that grounds a legal question above other EU passages that may share the same institution, policy area, product, or legal vocabulary.

Observed Data Profile

The Nano split contains 200 queries, 10,000 documents, and 200 positive qrel rows. Each query has one positive. Queries average 135.46 characters, while documents average 702.90 characters.
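Profile figures like these can be recomputed directly from the qrels and the text collections. The sketch below is illustrative only: it assumes a hypothetical in-memory format (qrels as (query_id, doc_id) pairs, queries and documents as id-to-text dicts) rather than the actual schema of the hakari-bench/NanoMuPLeR dataset, and it runs on toy data.

```python
from collections import Counter
from statistics import mean, median


def qrel_profile(qrels, queries, documents):
    """Summarize positives per query and average text lengths.

    Hypothetical input shapes: qrels is a list of (query_id, doc_id) positive
    pairs; queries and documents map ids to raw text.
    """
    per_query = Counter(qid for qid, _ in qrels)
    # Queries with no positive count as 0 so min/median are not inflated.
    counts = [per_query.get(qid, 0) for qid in queries]
    return {
        "queries": len(queries),
        "documents": len(documents),
        "positive_qrels": len(qrels),
        "positives_per_query_avg": mean(counts),
        "positives_per_query_min": min(counts),
        "positives_per_query_median": median(counts),
        "positives_per_query_max": max(counts),
        "multi_positive_queries": sum(1 for c in counts if c > 1),
        "query_len_avg_chars": mean(len(t) for t in queries.values()),
        "doc_len_avg_chars": mean(len(t) for t in documents.values()),
    }


# Toy data, not taken from the benchmark.
queries = {"q1": "abc", "q2": "abcde"}
documents = {"d1": "x" * 10, "d2": "y" * 20, "d3": "z" * 30}
qrels = [("q1", "d1"), ("q2", "d3")]
profile = qrel_profile(qrels, queries, documents)
```

On the real split, a single-positive profile would show min, median, and max positives per query all equal to 1 and zero multi-positive queries.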

Examples include the proportionality of regional state aid, wording about small and medium-sized enterprises in accession documents, criticism of the 1983 Baxter model for interchange fees, cosmetic revisions in marketing-authorization files, and retroactive state compensation without objective prior criteria. The split mixes competition law, state aid, marketing authorization, and administrative interpretation.

BM25 Evaluation Profile

The BM25 candidate subset uses top-500 candidates and reaches nDCG@10 of 0.8222, hit@10 of 0.8950, and recall@100 of 0.9750. BM25 is strong because many Portuguese questions retain legal terms, percentages, dates, institutional names, and specialized phrases from the positive passage.

However, BM25 is not the strongest standalone profile. The questions often compress or paraphrase long legal reasoning, so exact term overlap can retrieve a nearby provision without placing the answer passage first.
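Why exact legal terms, years, and institutional names help BM25 is easiest to see in the scoring function itself. The sketch below is a plain Okapi BM25 with a Lucene-style non-negative idf. The tokenizer (lowercased `\w+` runs) and the k1/b values are assumptions chosen for illustration; the benchmark's own BM25 configuration is not documented here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase word-character runs (Unicode-aware in Python 3).
    return re.findall(r"\w+", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Rank document ids for one query with Okapi BM25 (illustrative parameters)."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # only exact lexical matches contribute
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(toks) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores[doc_id] = score
    # Highest score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "d1": "auxílios regionais proporcionais",
    "d2": "modelo Baxter de 1983",
    "d3": "pequenas empresas isenção",
}
ranking = bm25_rank("Qual modelo de 1983 é criticado?", docs)
```

Here only "d2" shares tokens ("modelo", "de", "1983") with the query, so it ranks first. A paraphrased query with no shared tokens would give every passage a score of zero, which is the weakness the paragraph above describes.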

Dense Evaluation Profile

The dense candidate subset from harrier_oss_v1_270m uses top-500 candidates and reaches nDCG@10 of 0.8552, hit@10 of 0.9300, and recall@100 of 0.9750. Dense retrieval improves top-rank quality and hit@10 (0.9300 versus 0.8950, a net seven more of the 200 queries with the positive in the top ten) while matching BM25's recall@100. This suggests that Portuguese MuPLeR contains a meaningful share of queries where embedding similarity captures the legal relation better than term frequency alone.

Dense retrieval is especially helpful for argumentative and explanatory passages, such as state-aid proportionality or competition-policy models, where the query asks for a legal rationale rather than a surface phrase.
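Dense ranking reduces to nearest-neighbor search over embeddings. The sketch below assumes query and passage vectors have already been produced by some encoder (for example harrier_oss_v1_270m); how that model is loaded and called is deliberately not shown, and the 2-dimensional vectors are toy values.

```python
import numpy as np


def dense_rank(query_vec, doc_matrix, doc_ids, top_k=500):
    """Rank document ids by cosine similarity to a precomputed query embedding."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_matrix, dtype=float)
    # L2-normalize so the dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    sims = d @ q
    # Sort by descending similarity; a stable sort keeps ties in input order.
    order = np.argsort(-sims, kind="stable")[:top_k]
    return [doc_ids[i] for i in order]


doc_ids = ["d1", "d2", "d3"]
doc_matrix = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
ranked = dense_rank([0.0, 2.0], doc_matrix, doc_ids, top_k=2)
```

Because scores depend on vector direction rather than shared tokens, a paraphrased question can still land near its answer passage. The same property lets topically close passages crowd the top ranks.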

Reranking Hybrid Evaluation Profile

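The reranking_hybrid subset uses top-100 candidates, with two query rows receiving the optional rank-101 safeguard. It reaches nDCG@10 of 0.8895, hit@10 of 0.9650, and recall@100 of 0.9900, the strongest profile for the split. A recall@100 of 0.9900 means 198 of the 200 positives fall within the top 100, leaving two queries whose positive lies outside it, the same count as the safeguarded rows.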

The hybrid result shows that dense retrieval supplies strong semantic ranking while BM25 still contributes exact legal anchors and complementary coverage. A reranker should benefit from the combined pool, especially for questions involving both precise numbers and paraphrased legal consequences.
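The exact fusion rule behind reranking_hybrid is not documented on this page. One common way to merge a BM25 list and a dense list into a single candidate pool is reciprocal rank fusion (RRF). The sketch below uses RRF as an illustrative stand-in, not as the benchmark's method. It uses k=60, the value proposed in the original RRF paper, and a depth of 100 to mirror the top-100 pool.

```python
def reciprocal_rank_fusion(rankings, k=60, depth=100):
    """Fuse several best-first lists of doc ids with reciprocal rank fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); documents in several lists accumulate.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused[:depth]


bm25_list = ["d1", "d2", "d3"]
dense_list = ["d2", "d4", "d1"]
fused = reciprocal_rank_fusion([bm25_list, dense_list])
```

"d2" ranks first because both retrievers place it high. "d3" and "d4" each enter the pool from only one list, which is how fusion adds complementary coverage.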

Metric Interpretation for Model Researchers

With one positive per query, nDCG@10 reflects how early the correct passage appears, hit@10 measures whether it is in the first ten results, and recall@100 measures candidate availability for reranking. For Portuguese MuPLeR, dense retrieval is the stronger standalone ranker, and hybrid retrieval is the best candidate-generation setting.
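With a single relevant passage the ideal DCG is 1. nDCG@10 therefore reduces to 1 / log2(r + 1) when the positive sits at 1-based rank r ≤ 10, and to 0 otherwise. Hit@10 and recall@100 become per-query 0/1 indicators. The sketch below computes these per query and averages them. The function names are hypothetical, and the sketch assumes binary relevance and best-first rankings.

```python
import math


def single_positive_metrics(ranking, positive_id):
    """nDCG@10, hit@10 and recall@100 for a query with exactly one positive."""
    rank = ranking.index(positive_id) + 1 if positive_id in ranking else None
    in_top10 = rank is not None and rank <= 10
    in_top100 = rank is not None and rank <= 100
    return {
        "ndcg@10": 1.0 / math.log2(rank + 1) if in_top10 else 0.0,
        "hit@10": 1.0 if in_top10 else 0.0,
        "recall@100": 1.0 if in_top100 else 0.0,
    }


def mean_metrics(per_query):
    """Average per-query metric dicts into split-level scores."""
    return {m: sum(q[m] for q in per_query) / len(per_query) for m in per_query[0]}


q1 = single_positive_metrics(["d9", "d3", "d1"], "d3")  # positive at rank 2
q2 = single_positive_metrics(["d1", "d2"], "d1")        # positive at rank 1
q3 = single_positive_metrics(["d1"], "d5")              # positive not retrieved
summary = mean_metrics([q1, q2])
```

A positive at rank 2 earns only about 0.63 nDCG@10 but a full hit@10. This is why nDCG@10 separates ranking quality more finely than hit@10 on this split.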

This makes the split a useful test for models that claim to improve semantic legal retrieval without losing exact-match behavior on EU terminology, dates, and institutional references.

Query and Relevance Type Tendencies

Queries are formal Portuguese legal questions about aid proportionality, accession wording, economic models, authorization-file revisions, and Treaty rules on state aid. Relevant passages are formal EU legal or administrative texts that often explain a rule through long clauses.

Relevance is strict: a passage may share the same legal domain or vocabulary and still be wrong if it does not answer the requested condition or rationale.

Representative Failure Modes

Failures include retrieving a related aid provision that lacks the proportionality rule, matching the same economic topic but the wrong model assumption, confusing authorization-file procedures, or selecting a state-aid passage that lacks the requested objective-criteria condition. Sparse systems can miss paraphrases, while dense systems can overgeneralize among nearby legal topics.

Training Data That May Help

Useful training data includes non-overlapping Portuguese EUR-Lex and DGT-Acquis retrieval pairs, Portuguese legal QA, multilingual legal bitext, and hard negatives from adjacent EU acts or opinions. Evaluation queries and exact positive passages should be excluded.
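Excluding evaluation material from training pairs can start with a normalized exact-match filter. The sketch below assumes a hypothetical format of (query, passage) string pairs. It catches only exact matches after Unicode NFKC normalization, lowercasing, and whitespace collapsing. Near-duplicate passages, which are common in parallel legal corpora, would need fuzzier matching such as n-gram overlap.

```python
import re
import unicodedata


def normalize(text):
    """Normalize text for overlap checks: NFKC, lowercase, collapsed whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def decontaminate(train_pairs, eval_queries, eval_positives):
    """Drop training pairs whose query or passage matches evaluation text."""
    blocked = {normalize(t) for t in list(eval_queries) + list(eval_positives)}
    return [
        (q, p)
        for q, p in train_pairs
        if normalize(q) not in blocked and normalize(p) not in blocked
    ]


eval_queries = ["Qual modelo analítico de 1983 é criticado?"]
eval_positives = ["O modelo específico foi elaborado por William Baxter em 1983."]
train_pairs = [
    ("qual modelo analítico de 1983  é criticado?", "outro texto"),
    ("Pergunta nova", "Passagem nova"),
]
clean = decontaminate(train_pairs, eval_queries, eval_positives)
```

The first pair is dropped because its query matches an evaluation query once case and spacing are normalized.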

Model Improvement Notes

Portuguese legal retrieval models should preserve exact legal names, percentages, dates, and institutions while learning semantic alignment for legal rationale and procedural paraphrase. Hard negatives should share the same policy area and many surface terms but fail the requested legal condition. Hybrid candidate generation is the strongest setup for downstream reranking.
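A simple way to collect candidates that share surface terms and policy area with the positive is to take the top non-positive results from a first-stage retriever run on training queries. The sketch below does that. The `skip_top` option skips the very first non-positives and is an assumed guard against unlabeled near-duplicates of the answer. A retriever can only approximate "same policy area". Confirming that a negative actually fails the requested legal condition still needs manual or model-based checking.

```python
def mine_hard_negatives(ranking, positive_ids, n=5, skip_top=0):
    """Return up to n top-ranked non-positive doc ids as hard negatives.

    ranking: best-first doc ids from a retriever (e.g., BM25 or dense) for a
    training query. skip_top: number of leading non-positives to skip
    (an assumed safeguard; tune per corpus).
    """
    positives = set(positive_ids)
    candidates = [d for d in ranking if d not in positives]
    return candidates[skip_top:skip_top + n]


ranking = ["d7", "d2", "d9", "d4", "d1"]
negatives = mine_hard_negatives(ranking, ["d2"], n=2)
guarded = mine_hard_negatives(ranking, ["d2"], n=2, skip_top=1)
```

Mining from a BM25 ranking favors lexically similar negatives, and mining from a dense ranking favors semantically close ones. Mixing both mirrors the complementary behavior seen in the hybrid profile.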

Example Data

Query | Positive document
Por que auxílio público escasso deve ser proporcional, dirigido a regiões desfavorecidas e justificado apesar de maior distorção da concorrência? [145 chars] | Os auxílios regionais só podem ser eficazes se forem utilizados com parcimónia e de forma proporcional e se se concentrarem nas regiões mais desfavorecidas da União Europeia. Em especial, os limites máximos admissíveis devem reflectir a gravidade relativa dos problemas que afectam o desenvolvimento das regiões em causa. Além disso, as vantagens dos auxílios em termos de desenvolvimento de uma região desfavorecida devem ser superiores às distorções da concorrência provocadas. O peso atribuído às vantagens dos auxílios é susceptível de variar consoante a derrogação aplicada, podendo aceitar-se uma distorção mais significativa no caso das regiões mais desfavorecidas abrangidas pelo n.o 3, alínea a), do artigo 87.o do que no caso das regiões abrangidas pela alínea c) do mesmo número. [790 chars]
Que intenção sinalizam comentários que estendem concessão a empresas com volume de negócios intracomunitário abaixo de seis dígitos em euros? [141 chars] | Cabe notar, por outro lado, que a Comissão refere as pequenas e médias empresas, quando os documentos oficiais e, em particular, os protocolos de adesão evocam apenas as pequenas empresas: trata-se de um indício evidente da vontade de minimizar a importância da autorização e de a estender, na prática, a empresas de dimensão muito diferente. A proposta de directiva evita aprofundar a questão da classificação, limitando-se a falar de sujeitos passivos com um volume de negócios intracomunitário que não exceda 100000 euros. A vontade de estender o benefício da isenção a todas as empresas, independentemente da sua dimensão, é, pois, evidente. [645 chars]
Qual modelo analítico de 1983 é criticado por presumir benefícios uniformes a comerciantes, compradores e vendedores não reativos nos pagamentos? [145 chars] | O modelo específico subjacente às CIM da Master Card foi elaborado por William Baxter em 1983. Contudo, este modelo padece de limitações importantes por considerar que a procura dos consumidores e dos comerciantes é um dado adquirido e que nenhum deles reage estrategicamente às eventuais acções do outro. O modelo Baxter também se baseia no pressuposto irrealista de que não se verifica qualquer variação dos benefícios para os comerciantes que aceitam cartões, ou seja, pressupõe que os comerciantes são um universo homogéneo. Por último, os resultados do modelo Baxter baseiam-se no pressuposto irrealista de que a actividade bancária em questão é exercida em condições de concorrência perfeita. [698 chars]

Source Reference Table

Title | Year | Type | URL
MuPLeR: Multilingual Parallel Legal Retrieval | n/a | dataset card | https://huggingface.co/datasets/mteb/MuPLeR-retrieval
An overview of the European Union's highly multilingual parallel corpora | 2014 | source paper | https://link.springer.com/article/10.1007/s10579-014-9277-0
DGT-Acquis | n/a | source corpus | https://joint-research-centre.ec.europa.eu/language-technology-resources/dgt-acquis_en

Dataset Information

Field | Value
Nano set | NanoMuPLeR
Backing dataset | NanoMuPLeR
Task / split | pt
Hugging Face dataset | hakari-bench/NanoMuPLeR
Language | pt
Category | natural_language
Queries | 200
Documents | 10,000
Positive qrels | 200
Positives / query, avg | 1.00
Positives / query, min | 1
Positives / query, median | 1.00
Positives / query, max | 1
Multi-positive queries | 0 (0.00%)
Query length, avg (chars) | 135.46
Document length, avg (chars) | 702.90

Candidate Subsets

Profile | Config | nDCG@10 | Hit@10 | Recall@100 | Candidates
BM25 | bm25 | 0.8222 | 0.8950 | 0.9750 | top-500
Dense | harrier_oss_v1_270m | 0.8552 | 0.9300 | 0.9750 | top-500
Reranking hybrid | reranking_hybrid | 0.8895 | 0.9650 | 0.9900 | top-100