MNanoBEIR / NanoBEIR-ja / NanoDBPedia

Overview

NanoBEIR-ja__NanoDBPedia is the Japanese NanoBEIR version of DBpedia-Entity, an entity retrieval benchmark. The task uses Japanese translated entity-style queries and asks a retriever to rank Japanese translated DBpedia entity descriptions. The Nano split contains 50 queries, 6,045 documents, and 1,158 positive qrels. It is strongly multi-positive, with 23.16 positives per query on average and 48 of 50 queries having more than one positive. This makes it a short-query entity search task where exact names are useful, but category-level and description-level matching are also important.

Details

What the Original Data Measures

DBpedia-Entity V2 evaluates entity search over DBpedia descriptions. BEIR includes it as an entity retrieval task, and this Japanese NanoBEIR version evaluates the same setting through translated queries and translated entity descriptions. Queries may contain an exact entity name, a partial entity reference, or a category-style information need such as films shot in a location or entities related to a historical region. The retriever must rank compact entity descriptions that satisfy that entity need.

Observed Data Profile

The task has 50 queries and 6,045 documents. It contains 1,158 positive qrels, with positives per query ranging from 1 to 81 and a median of 18.00. Queries are short, averaging 28.16 characters, while documents are compact descriptions averaging 174.64 characters. The large number of positives means that many queries represent categories or sets of entities rather than one exact answer. This makes both early precision and result coverage important: a model should surface representative relevant entities near the top and cover many valid entities in the top 100.

BM25 Evaluation Profile

The BM25 top-500 subset reaches nDCG@10 = 0.5843, hit@10 = 0.9200, and Recall@100 = 0.6788. BM25 is strong because entity retrieval often preserves names, locations, titles, and category words. Exact Japanese surface forms, transliterations, and distinctive DBpedia labels can anchor retrieval well. However, BM25 is weaker than dense and hybrid retrieval on every reported metric. Category-style queries can match many descriptions that do not reuse the same wording, and short queries provide limited lexical context.

Dense Evaluation Profile

The dense harrier-oss-270m top-500 subset reaches nDCG@10 = 0.6098, hit@10 = 0.9400, and Recall@100 = 0.7090. Dense retrieval improves both top-10 ranking and top-100 coverage over BM25. This suggests that embedding similarity helps with category and description matching, where a relevant entity can be described through attributes rather than the exact query words. Dense retrieval is especially useful when the query expresses a class of entities, a relation, or a property that is only indirectly stated in the entity description.

Reranking Hybrid Evaluation Profile

The reranking_hybrid subset uses 100 candidates per query and reaches nDCG@10 = 0.6081, hit@10 = 0.9600, and Recall@100 = 0.7263, with no rank-101 safeguard rows. Hybrid retrieval has the best hit@10 and Recall@100, while dense retrieval is very slightly higher on nDCG@10. This is a balanced hybrid pattern: lexical search protects exact entity-name matches, dense search adds category and semantic matches, and the combined candidate pool gives the broadest coverage of relevant entities.

Metric Interpretation for Model Researchers

This task is not a simple exact-name lookup benchmark. BM25 performs well, but the dense and hybrid profiles show that entity search also depends on semantic description matching. Dense retrieval is best for fine top-10 ranking quality, while hybrid retrieval is best for finding at least one relevant entity and for covering more positives in the top 100. Researchers should inspect whether model gains come from exact entity label handling, category matching, or broad multi-positive coverage.

Query and Relevance Type Tendencies

The examples include a dealership and location, a short story collection and author relation, Gallo-Roman architecture in Paris, former Yugoslav republics, and films shot in Venice. Some queries include enough unique terms for lexical retrieval, while others represent classes of entities where many descriptions can be relevant. Relevant documents are short DBpedia-style summaries, so rank errors often come from confusing related entities within the same category.

Representative Failure Modes

BM25 can over-rank exact word matches that do not satisfy the entity relation or category. Dense retrieval can retrieve semantically adjacent entities that are plausible but not judged relevant. Hybrid retrieval can improve coverage but may still miss less obvious positives when a query has dozens of valid entities. For Japanese translation, transliteration variants and translated entity titles can also affect both lexical and dense matching.

Training Data That May Help

Useful training data includes non-overlapping entity search, Wikipedia or DBpedia entity linking, multilingual entity retrieval, and short-query passage retrieval. Hard negatives should be related entities from the same category, location, or time period that fail the specific query relation. Training should exclude DBpedia-Entity, BEIR, NanoBEIR, and translated entity records likely to overlap with this benchmark.

Model Improvement Notes

Strong systems should combine exact entity-name recall with semantic description matching. For category-heavy queries, training should encourage result diversity and broad positive coverage. For exact-name queries, the model must preserve rare names and transliterations. Hybrid candidate generation is a natural fit, with a reranker used to separate truly relevant entities from related but incorrect DBpedia entries.

Example Data

Query	Positive document
フィッツジェラルド・オートモールチャンバーズバーグペンシルベニア州 [35 chars]	フィッツジェラルド・オートモールは1966年に創業された家族経営の自動車販売会社であり、最初の店舗はメリーランド州ベセスダにオープンしました。2014年時点で、フィッツジェラルド・オートモールは『オートモーティブニュース』が毎年発表する米国「トップ125ディーラーグループ」ランキングで59位にランクインしています。また、フィッツジェラルドのディーラー店舗は、2013年のワーズオートe-ディーラー100に第8位、第25位、第30位、第43位、第98位の5店舗で掲載されています。 [240 chars]
1994年の短編小説集『アリス・マーローはオープン』 [26 chars]	アリス・アーン・マンロー（/ˈælɨs ˌæn mʌnˈroʊ/、旧姓レイドロウ /ˈleɪdlɔː/、1931年7月10日生まれ）はカナダの作家である。マンローの作品は、特に時間の前後を行き来する傾向を持つことから、短編小説の構造を革新したと評されている。彼女の物語は「示すよりも内に秘め、誇示するよりも深く明らかにする」と言われている。マンローのフィクションの舞台は、多くの場合、彼女の故郷であるオンタリオ州南西部のヒューロン郡に設定されている。彼女の物語は、簡潔な散文スタイルで人間の複雑さを探求している。 [256 chars]
パリのガロ＝ローマ建築 [11 chars]	パリの芸術は、フランスの首都であるパリにおける芸術文化と歴史に関する記事である。何世紀にもわたり、パリは世界中から芸術家たちを惹きつけてきた。彼らはこの街に訪れ、芸術的な資源や美術館からインスピレーションを得るとともに、研鑽を積んできた。その結果、パリは「芸術の都」という名声を獲得した。 [144 chars]

Source Reference Table

Title	Year	Type	URL
DBpedia-Entity V2	2017	task paper	https://doi.org/10.1145/3077136.3080751
BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models	2021	benchmark paper	https://arxiv.org/abs/2104.08663
MMTEB: Massive Multilingual Text Embedding Benchmark	2025	benchmark paper	https://arxiv.org/abs/2502.13595
NanoBEIR: Smaller BEIR dataset subsets	2024	dataset collection	https://huggingface.co/collections/zeta-alpha-ai/nanobeir

Dataset Information

Field	Value
Nano set	MNanoBEIR
Backing dataset	NanoBEIR-ja
Task / split	NanoDBPedia
Hugging Face dataset	hakari-bench/NanoBEIR-ja
Language	ja
Category	natural_language
Queries	50
Documents	6,045
Positive qrels	1,158
Positives / query avg	23.16
Positives / query min	1
Positives / query median	18.00
Positives / query max	81
Multi-positive queries	48 (96.00%)
Query length avg chars	28.16
Document length avg chars	174.64

Candidate Subsets

Profile	Config	nDCG@10	Hit@10	Recall@100	Candidates
BM25	`bm25`	0.5843	0.9200	0.6788	top-500
Dense	`harrier_oss_v1_270m`	0.6098	0.9400	0.7090	top-500
Reranking hybrid	`reranking_hybrid`	0.6081	0.9600	0.7263	top-100