MNanoBEIR / NanoBEIR-th / NanoDBPedia

Overview

NanoDBPedia in the Thai NanoBEIR slice is an entity-oriented retrieval task derived from DBpedia Entity Retrieval. The queries are Thai translated entity information needs, and the corpus contains Thai translated DBpedia-style entity descriptions. The task asks a model to rank entities that satisfy short needs involving names, categories, places, works, organizations, or entity classes. It is a compact diagnostic for multilingual entity search, alias handling, and type-aware retrieval in Thai.

Details

What the Original Data Measures

DBpedia Entity Retrieval evaluates ranking entities for information needs over an encyclopedic knowledge base. In BEIR-style retrieval, the entity descriptions are treated as documents. Relevance can depend on entity name, alias, type, location, category, relationship, or a combination of these constraints. A relevant result is not merely a document sharing words with the query; it must be the right kind of entity.

The Thai translated version adds difficulty from short Thai queries, translated descriptions, entity names that may remain in English, and tokenization-sensitive matching. A strong model must preserve exact names and categories while also recognizing semantic entity constraints when the query wording differs from the description wording.

Observed Data Profile

The task contains 50 queries, 6,045 documents, and 1,158 relevance judgments. It is strongly multi-positive, with an average of 23.16 positives per query. The minimum is 1, the median is 18.0, the maximum is 81, and 48 queries are multi-positive, or 96.0% of the query set. This makes the benchmark a broad entity ranking task rather than single-answer lookup.

Queries average 30.92 characters, while documents average 316.44 characters. The queries are short and often keyword-like, while entity descriptions are concise summaries. Since one query can match many entities, the model must recover the relevant set, not just one obvious name match.

BM25 Evaluation Profile

BM25 reaches nDCG@10 of 0.5043, hit@10 of 0.9200, and recall@100 of 0.6520 using the top-500 BM25 candidate subset. This is a strong lexical profile. Entity retrieval often contains names, places, or category terms that transfer directly into descriptions, and BM25 can exploit these anchors effectively.

The recall@100 value shows the remaining challenge. Many queries have broad relevant sets, and exact overlap may miss aliases, translated category labels, or entities that satisfy the intent without repeating the query terms. BM25 is useful for initial entity matching but does not fully cover the relevant set.

Dense Evaluation Profile

The dense harrier-oss-270m run reaches nDCG@10 of 0.5468, hit@10 of 0.9400, and recall@100 of 0.6675. Dense retrieval improves over BM25 across the main metrics, although the margin is moderate. This indicates that embedding similarity helps identify type and category matches beyond exact lexical overlap.

Dense retrieval is especially useful when the query is short and underspecified. It can connect entity needs to descriptions that use different phrasing, and it can capture broader semantic categories such as films, former republics, architecture, or collections. Remaining errors are likely to involve ambiguous entity types, names that require exact matching, or broad queries with many acceptable results.

Reranking Hybrid Evaluation Profile

The reranking_hybrid candidate set reaches nDCG@10 of 0.5482, hit@10 of 0.9000, and recall@100 of 0.6986. It uses exactly 100 candidates per query, with no rank-101 safeguard rows. The hybrid profile has the best nDCG@10 and recall@100, while dense retrieval has the best hit@10.

This pattern suggests that combining lexical and dense evidence is useful for ranking the broader entity set. BM25 contributes exact names and type terms, while dense retrieval adds semantic category matching. The small nDCG gain over dense indicates that the two signals are complementary, though the lower hit@10 also shows that hybrid ordering can occasionally displace an early obvious positive.

Metric Interpretation for Model Researchers

Because almost every query has many positives, hit@10 alone is insufficient. A model can find one relevant entity early and still fail to rank the relevant set well. nDCG@10 measures first-page density and ordering quality, while recall@100 measures whether the candidate set covers enough relevant entities for downstream reranking.

The method comparison shows a balanced task. BM25 is strong because entity names and categories are lexical. Dense retrieval improves semantic type matching. reranking_hybrid gives the best graded top-10 ranking and recall. This makes NanoDBPedia-th useful for testing entity search systems that combine exact name handling with semantic category understanding.

Query and Relevance Type Tendencies

Queries include entity needs such as Fitzgerald Auto Mall in Chambersburg, a 1994 Alice Munro short-story collection, Roman architecture in Paris, former Yugoslav republics, and films shot in Venice. Relevant documents are DBpedia-style descriptions of entities that satisfy those needs.

The task rewards models that track type constraints and entity attributes. A query may specify a location, work type, author, historical category, or class of films. The right result must satisfy those constraints, not merely mention one of the words.

Representative Failure Modes

Likely failures include retrieving entities with the wrong type, over-ranking famous entities that share a name fragment, missing aliases or translated category names, and failing to cover many positives for broad entity-class queries. BM25 may be too literal, while dense retrieval may blur neighboring entity types. Hybrid systems need careful ordering to preserve both exact names and type semantics.

Training Data That May Help

Useful training data includes entity retrieval, Wikipedia or DBpedia search, multilingual entity linking, alias matching, and type-aware ranking. Thai encyclopedic text can help with translated descriptions and segmentation behavior. Hard negatives should share names, locations, categories, or entity types while violating a key constraint.

Model Improvement Notes

A model targeting this task should combine exact entity-name precision with semantic type awareness. Sparse systems need Thai-aware tokenization and alias normalization. Dense systems should improve constraint tracking for short queries. Hybrid systems are well aligned with the benchmark when the final ranking balances lexical anchors and semantic entity class matching.

Example Data

Query	Positive document
ฟิตซ์เจอรัลด์ออโต้มอลล์แชมเบอร์สเบิร์กเพนซิลเวเนีย [50 chars]	ฟิตซ์เจอรัลด์ ออโต้ มอลล์ เป็นตัวแทนจำหน่ายรถยนต์ที่เป็นของครอบครัวและดำเนินการโดยครอบครัว ซึ่งก่อตั้งขึ้นในปี 1966 โดยมีสถานที่ตั้งแรกเปิดในเบธesda รัฐแมรี่แลนด์ จนถึงปี 2014 ฟิตซ์เจอรัลด์ ออโต้ มอลล์ ได้รับอันดับที่ 59 ในรายชื่อ "กลุ่มตัวแทนจำหน่าย 125 อันดับแรก" ในสหรัฐอเมริกา ซึ่งเผยแพร่ทุกปีโดย Automotive News สถานที่ตัวแทนจำหน่ายฟิตซ์เจอรัลด์ปรากฏใน 2013 WardsAuto e-Dealer 100 ถึงห้าครั้ง ที่อันดับที่ 8, 25, 30, 43 และ 98 [431 chars]
การรวบรวมเรื่องสั้นปี 1994 ของอลิส มุนโร เปิดอยู่ [49 chars]	อลิซ แอน มันโร (/ˈælɨs ˌæn mʌnˈroʊ/, ชื่อเดิม เลดลอว์ /ˈleɪdlɔː/; เกิด 10 กรกฎาคม 1931) เป็นนักเขียนชาวแคนาดา ผลงานของมันโรถูกอธิบายว่าปฏิวัติสถาปัตยกรรมของเรื่องสั้น โดยเฉพาะในแนวโน้มที่จะเคลื่อนที่ไปข้างหน้าและถอยหลังในเวลา เรื่องราวของเธอถูกกล่าวว่า "ฝังมากกว่าประกาศ เผยมากกว่าขบวนพาเหรด" นิยายของมันโรมักตั้งอยู่ในมณฑลฮูรอนบ้านเกิดของเธอในออนแทรีโอทางตะวันตกเฉียงใต้ เรื่องราวของเธอสำรวจความซับซ้อนของมนุษย์ในสไตล์การเขียนที่ไม่ซับซ้อน [440 chars]
สถาปัตยกรรมโรมันในปารีส [23 chars]	ศิลปะในปารีสเป็นบทความเกี่ยวกับวัฒนธรรมและประวัติศาสตร์ศิลปะในปารีส เมืองหลวงของฝรั่งเศส เป็นเวลาหลายศตวรรษที่ปารีสดึงดูดศิลปินจากทั่วโลกให้เดินทางมาที่เมืองนี้เพื่อศึกษาและค้นหาแรงบันดาลใจจากทรัพยากรศิลปะและแกลเลอรีต่างๆ ดังนั้นปารีสจึงมีชื่อเสียงในฐานะ "เมืองแห่งศิลปะ" [271 chars]

Source Reference Table

Item	Reference
Original dataset	DBpedia Entity Retrieval
Retrieval benchmark framing	BEIR
Multilingual benchmark context	MMTEB
NanoBEIR collection	NanoBEIR on Hugging Face
NanoBEIR-th dataset	hakari-bench/NanoBEIR-th

Representative query and positive entity snippets:

Query	Positive document snippet
ฟิตซ์เจอรัลด์ออโต้มอลล์แชมเบอร์สเบิร์กเพนซิลเวเนีย	ฟิตซ์เจอรัลด์ ออโต้ มอลล์ เป็นตัวแทนจำหน่ายรถยนต์ที่เป็นของครอบครัว...
การรวบรวมเรื่องสั้นปี 1994 ของอลิส มุนโร เปิดอยู่	อลิซ แอน มันโร เป็นนักเขียนชาวแคนาดา ผลงานของมันโรถูกอธิบายว่า...
สถาปัตยกรรมโรมันในปารีส	ศิลปะในปารีสเป็นบทความเกี่ยวกับวัฒนธรรมและประวัติศาสตร์ศิลปะในปารีส...
สาธารณรัฐของยูโกสลาเวียเดิม	รัฐธรรมนูญยูโกสลาเวียปี 1974 เป็นรัฐธรรมนูญฉบับที่สี่...
ภาพยนตร์ที่ถ่ายทำในเวนิส	A Little Romance เป็นภาพยนตร์ตลกโรแมนติก Technicolor และ Panavision ของอเมริกา...

Dataset Information

Field	Value
Nano set	MNanoBEIR
Backing dataset	NanoBEIR-th
Task / split	NanoDBPedia
Hugging Face dataset	hakari-bench/NanoBEIR-th
Language	th
Category	natural_language
Queries	50
Documents	6,045
Positive qrels	1,158
Positives / query avg	23.16
Positives / query min	1
Positives / query median	18.00
Positives / query max	81
Multi-positive queries	48 (96.00%)
Query length avg chars	30.92
Document length avg chars	316.44

Candidate Subsets

Profile	Config	nDCG@10	Hit@10	Recall@100	Candidates
BM25	`bm25`	0.5043	0.9200	0.6520	top-500
Dense	`harrier_oss_v1_270m`	0.5468	0.9400	0.6675	top-500
Reranking hybrid	`reranking_hybrid`	0.5482	0.9000	0.6986	top-100