HAKARI-Bench

MNanoBEIR / NanoBEIR-pt / NanoDBPedia

Overview

NanoBEIR-pt NanoDBPedia is a Portuguese entity retrieval task derived from DBpedia-Entity. Queries are short keyword or natural-language entity needs, and documents are translated DBpedia-style entity descriptions. The task is useful for evaluating entity search under many-positive relevance: a query may have many valid entities, and a good system should retrieve a diverse set of matching descriptions rather than only the most obvious name match. It also tests how well multilingual retrieval models combine exact entity cues with broader semantic category matching.

Details

What the Original Data Measures

DBpedia-Entity evaluates ranking entities for information needs over DBpedia. In BEIR, the dataset is used as an entity retrieval task with heterogeneous queries ranging from exact names to category-like descriptions. The MNanoBEIR Portuguese version preserves this objective after translation. It measures whether a retriever can match Portuguese entity needs to concise entity descriptions by using names, aliases, locations, occupations, types, and descriptive attributes.

Observed Data Profile

This Nano subset contains 50 queries, 6,045 documents, and 1,158 positive qrels. It is strongly multi-positive: the average is 23.16 positives per query, with a minimum of 1, median of 18.00, and maximum of 81. There are 48 multi-positive queries, covering 96.0% of the task. Queries are short at 36.62 characters on average, while documents average 354.37 characters. This makes the task a high-coverage entity search benchmark rather than a single-answer retrieval problem.
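The profile figures above can be recomputed from the qrels. The sketch below assumes the qrels are available as a mapping from query id to a dict of document id and relevance label, which is a common layout but not something the source specifies. The function name `positives_profile` and the toy data are hypothetical.

```python
from statistics import mean, median


def positives_profile(qrels):
    """Summarise positives per query from a {query_id: {doc_id: relevance}} mapping."""
    # Count only documents with a relevance label above zero.
    counts = [sum(1 for rel in docs.values() if rel > 0) for docs in qrels.values()]
    multi = sum(1 for c in counts if c > 1)
    return {
        "queries": len(counts),
        "positive_qrels": sum(counts),
        "avg": mean(counts),
        "min": min(counts),
        "median": median(counts),
        "max": max(counts),
        "multi_positive_share": multi / len(counts),
    }


# Toy qrels, invented for illustration (not taken from the dataset).
toy_qrels = {
    "q1": {"d1": 1, "d2": 1, "d3": 1},
    "q2": {"d4": 1},
    "q3": {"d5": 1, "d6": 1},
}
profile = positives_profile(toy_qrels)
print(profile)
```

Applied to the full qrels, the same arithmetic should reproduce the reported values: 1,158 / 50 = 23.16 positives per query and 48 / 50 = 96.0% multi-positive queries.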

BM25 Evaluation Profile

BM25 uses the bm25 top-500 candidate subset. It reaches nDCG@10 0.5110, hit@10 0.9200, and recall@100 0.6114. Entity search favors lexical retrieval because names, places, and category terms often appear directly in both query and document. The hit@10 of 0.9200 means BM25 places at least one relevant entity in the top 10 for 46 of the 50 queries. The harder part is ranking many valid entities and covering the broader positive set: on average, about 39% of a query's positives fall outside the top 100. BM25 can overvalue exact term overlap and miss descriptions that satisfy the query through type or attribute matching rather than direct wording.

Dense Evaluation Profile

Dense retrieval uses the harrier_oss_v1_270m top-500 candidate subset. It scores nDCG@10 0.5816, hit@10 0.9200, and recall@100 0.6865. Compared with BM25, dense retrieval improves ranking (nDCG@10 +0.0706) and coverage (recall@100 +0.0751) while matching its first-page hit rate. This suggests that embedding similarity is better at capturing entity type, category, and attribute relationships that are not expressed with the same surface words. Dense retrieval is particularly helpful for category-style queries such as films, republics, architecture, or collections, where the relevant entities may share meaning more than exact vocabulary.

Reranking Hybrid Evaluation Profile

The reranking hybrid subset uses reranking_hybrid with exactly 100 candidates per query and no safeguard rows. It reaches nDCG@10 0.5620, hit@10 0.9600 (48 of the 50 queries), and recall@100 0.7098. The hybrid profile has the best hit@10 and recall@100, while dense retrieval has the best nDCG@10. This shows that lexical and dense signals are complementary for Portuguese entity search: hybrid retrieval brings more positives into the candidate pool, but dense ordering is slightly cleaner near the top. A reranker that starts from the hybrid pool inherits its broader coverage and can then improve early ordering; because the pool holds exactly 100 candidates, reordering it cannot change recall@100.
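The source does not say how the reranking_hybrid pool is constructed. One common way to merge a lexical and a dense ranking is reciprocal rank fusion (RRF), and the sketch below uses it purely as an illustration of how two candidate lists can be combined into one 100-candidate pool. The function name, the constant `k=60`, and the toy document ids are assumptions, not details of this benchmark.

```python
def reciprocal_rank_fusion(rankings, k=60, top_n=100):
    """Fuse several ranked lists of doc ids with RRF and keep the top_n ids."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it contains.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


# Toy rankings, invented for illustration.
bm25_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Documents found by both retrievers rise to the top, while documents found by only one still enter the pool, which is the kind of behavior that can raise coverage relative to either retriever alone.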

Metric Interpretation for Model Researchers

Because each query often has many positives, hit@10 is not enough to evaluate success. Recall@100 shows how much of the relevant entity set is available, and nDCG@10 shows whether useful entities appear early. The observed scores show that BM25 is strong for exact entity cues, dense retrieval improves semantic category matching, and reranking hybrid gives the broadest top-100 coverage. This task is therefore useful for separating exact-name retrieval from true entity set retrieval.
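To make the distinction between the three metrics concrete, here is a minimal sketch of per-query hit@k, recall@k, and nDCG@k. It assumes binary relevance (a document is either a positive or not); if the underlying qrels carry graded labels, the gain term in nDCG would need to change. The function names and the toy ranking are hypothetical.

```python
import math


def hit_at_k(ranked, positives, k=10):
    """1.0 if at least one positive appears in the top k, else 0.0."""
    return 1.0 if any(doc in positives for doc in ranked[:k]) else 0.0


def recall_at_k(ranked, positives, k=100):
    """Fraction of all positives that appear in the top k."""
    if not positives:
        return 0.0
    return len(set(ranked[:k]) & positives) / len(positives)


def ndcg_at_k(ranked, positives, k=10):
    """Binary-gain nDCG: DCG of the ranking divided by DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, doc in enumerate(ranked[:k])
        if doc in positives
    )
    ideal_hits = min(len(positives), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


# Toy example: three positives, two of them retrieved.
ranked = ["d1", "x1", "d2", "x2"]
positives = {"d1", "d2", "d3"}
print(hit_at_k(ranked, positives), recall_at_k(ranked, positives), ndcg_at_k(ranked, positives))
```

In the toy example hit@10 is already 1.0, yet recall@100 is only 2/3 and nDCG@10 is about 0.70, which is why hit@10 alone says little on a many-positive task.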

Query and Relevance Type Tendencies

Queries include exact or near-exact entity references, short category descriptions, and natural-language requests. Relevant documents are compact entity descriptions containing names, types, locations, dates, or identifying facts. Examples include an auto mall, Alice Munro, Gallo-Roman architecture in Paris, former Yugoslav republics, and films shot in Venice. The task favors models that preserve both surface entity clues and semantic constraints.

Representative Failure Modes

BM25 may retrieve descriptions that repeat a rare name or category word but do not satisfy the full entity need. Dense systems may retrieve semantically related entities that fail a specific location, type, or time constraint. Hybrid systems improve coverage but may still require reranking to diversify and enforce constraints. Translation can also alter category wording while preserving names, making exact lexical matching uneven across query types.

Training Data That May Help

Helpful training data includes non-overlapping entity search, Wikipedia and DBpedia retrieval, alias matching, multilingual entity linking, and short-query to entity-description ranking. Hard negatives should share entity types, places, occupations, or names while violating one query constraint. Training should exclude DBpedia-Entity, BEIR, NanoBEIR, and translated duplicate evaluation records.
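As a minimal sketch of the exclusion step, the code below drops training records whose query text matches an evaluation query after case folding, accent stripping, and whitespace normalization. It assumes training records are dicts with a `query` field, which is a hypothetical layout, and it only catches same-language string duplicates; translated duplicates would need matching on source ids or a cross-lingual similarity check, which is not shown here.

```python
import unicodedata


def normalize(text):
    """Case-fold, strip combining accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.split())


def drop_eval_overlap(train_records, eval_texts):
    """Keep only training records whose normalized query is not an eval query."""
    blocked = {normalize(text) for text in eval_texts}
    return [rec for rec in train_records if normalize(rec["query"]) not in blocked]


eval_queries = ["Arquitetura galo-romana em Paris"]
train = [
    {"query": "arquitetura  galo-romana em paris"},  # duplicate up to case and spacing
    {"query": "Pontes medievais em Lisboa"},         # invented, non-overlapping
]
clean = drop_eval_overlap(train, eval_queries)
print(clean)
```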

Model Improvement Notes

NanoBEIR-pt NanoDBPedia is a useful benchmark for entity-oriented retrieval. Dense retrieval is the best early ranker (nDCG@10 0.5816), while reranking hybrid gives the best coverage (recall@100 0.7098) and first-page success (hit@10 0.9600). Improvements should focus on entity type representation, alias and attribute handling, and reranking that checks constraints rather than only broad semantic similarity. A production entity search system would likely use hybrid candidates followed by a constraint-aware reranker.

Example Data

| Query | Positive document |
| --- | --- |
| Fitzgerald Auto Mall em Chambersburg, PA [40 chars] | Fitzgerald Auto Malls é uma concessionária de automóveis de propriedade e operação familiar fundada em 1966, com sua primeira localização abrindo em Bethesda, Maryland. Em 2014, a Fitzgerald Auto Malls ficou em 59º lugar na lista das "Top 125 Concessionárias" dos EUA, publicada anualmente pela Automotive News. As localizações da Fitzgerald aparecem cinco vezes na lista WardsAuto e-Dealer 100 de 2013, nas posições 8, 25, 30, 43 e 98. [436 chars] |
| Coleção de contos de 1994 de Alice Munro está disponível [56 chars] | Alice Ann Munro (nascida Laidlaw; 10 de julho de 1931) é uma autora canadense. O trabalho de Munro é frequentemente descrito como tendo revolucionado a arquitetura dos contos, especialmente por sua tendência de avançar e retroceder no tempo. Suas histórias são conhecidas por "embutir mais do que anunciar, revelar mais do que exibir." A ficção de Munro é, na maioria das vezes, ambientada em seu condado natal, Huron, no sudoeste de Ontário. Seus contos exploram as complexidades humanas em um estilo de prosa simples e direto. [528 chars] |
| Arquitetura galo-romana em Paris [32 chars] | A Arte em Paris é um artigo sobre a cultura e a história da arte em Paris, a capital da França. Há séculos, Paris atrai artistas de todo o mundo, que chegam à cidade para se educarem e buscar inspiração em seus recursos artísticos e galerias. Como resultado, Paris adquiriu a reputação de "Cidade da Arte". [306 chars] |

Source Reference Table

| Title | Year | Type | URL |
| --- | --- | --- | --- |
| DBpedia Entity Retrieval | 2017 | task paper | https://doi.org/10.1145/3077136.3080751 |
| BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models | 2021 | benchmark paper | https://arxiv.org/abs/2104.08663 |
| MMTEB: Massive Multilingual Text Embedding Benchmark | 2025 | benchmark paper | https://arxiv.org/abs/2502.13595 |
| NanoBEIR: Smaller BEIR dataset subsets | 2024 | dataset collection | https://huggingface.co/collections/zeta-alpha-ai/nanobeir |

Dataset Information

| Field | Value |
| --- | --- |
| Nano set | MNanoBEIR |
| Backing dataset | NanoBEIR-pt |
| Task / split | NanoDBPedia |
| Hugging Face dataset | hakari-bench/NanoBEIR-pt |
| Language | pt |
| Category | natural_language |
| Queries | 50 |
| Documents | 6,045 |
| Positive qrels | 1,158 |
| Positives / query avg | 23.16 |
| Positives / query min | 1 |
| Positives / query median | 18.00 |
| Positives / query max | 81 |
| Multi-positive queries | 48 (96.00%) |
| Query length avg chars | 36.62 |
| Document length avg chars | 354.37 |

Candidate Subsets

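| Profile | Config | nDCG@10 | Hit@10 | Recall@100 | Candidates |
| --- | --- | --- | --- | --- | --- |
| BM25 | bm25 | 0.5110 | 0.9200 | 0.6114 | top-500 |
| Dense | harrier_oss_v1_270m | 0.5816 | 0.9200 | 0.6865 | top-500 |
| Reranking hybrid | reranking_hybrid | 0.5620 | 0.9600 | 0.7098 | top-100 |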