NanoJMTEB-v2
Overview
NanoJMTEB-v2 is a compact Japanese retrieval group derived from JMTEB, MTEB, and related Japanese datasets. It covers Japanese casual web search, government FAQ matching, quiz-to-entity retrieval, answer-label retrieval, MIRACL and Mr. TyDi passage retrieval, long-document retrieval, and four Japanese NLP Journal paper-component matching tasks.
The group is useful because it is not simply Japanese passage retrieval. Some tasks retrieve short answer labels, some retrieve noisy web snippets or FAQ answers, some retrieve Wikipedia-like passages or full entity pages, and some match titles or abstracts to academic paper sections. BM25 exposes Japanese term and entity anchoring, dense retrieval tests semantic passage and label matching, and reranking_hybrid indicates whether sparse and dense retrieval recover complementary candidates.
What This Group Measures
JMTEB and MTEB define Japanese retrieval tasks for embedding evaluation. This Nano group collects several Japanese retrieval sources: JaCWIR, JaGovFaqs, JAQKET, Mintaka Japanese, MIRACL, Mr. TyDi, MultiLongDocRetrieval, and Japanese NLP Journal paper-component matching tasks.
The group measures Japanese retrieval robustness across target types. A relevant document may be a title/description snippet, an official FAQ answer, a Wikipedia entity page, a short answer label, an answer-bearing passage, a full long article, or an academic paper section. A model that works well on one of these surfaces may not work on the others.
Task Families
- Web and FAQ retrieval: ja_cwir and ja_gov_faqs retrieve web snippets or government FAQ answers.
- Entity and answer-label retrieval: jaqket retrieves entity pages from quiz clues, while mintaka_ja retrieves short answer labels.
- Japanese passage retrieval: miracl_ja and mr_tidy_japanese retrieve answer-bearing Japanese passages with some multi-positive qrels.
- Long-document retrieval: multi_long_doc_ja retrieves long documents from generated questions.
- Academic component matching: the four nlpjournal_* tasks link titles, abstracts, introductions, and full articles from Japanese NLP papers.
Dataset Shape
NanoJMTEB-v2 contains 11 task pages, 2,200 queries, 64,140 split-local documents, and 2,432 positive qrel rows. Each split has 200 queries. Nine tasks are single-positive, with 200 positives for 200 queries; miracl_ja (373 positives) and mr_tidy_japanese (259 positives) contain multi-positive queries.
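The group totals can be checked against the per-task counts. The following is a small sketch that copies the query, document, and positive counts from the Task Summary table and re-derives the totals; the names `TASK_COUNTS` and `summarize` are illustrative and not part of any released tooling.

```python
# Per-task counts copied from the Task Summary table: (queries, docs, positives).
TASK_COUNTS = {
    "ja_cwir": (200, 10_000, 200),
    "ja_gov_faqs": (200, 10_000, 200),
    "jaqket": (200, 10_000, 200),
    "mintaka_ja": (200, 1_592, 200),
    "miracl_ja": (200, 10_000, 373),
    "mr_tidy_japanese": (200, 10_000, 259),
    "multi_long_doc_ja": (200, 10_000, 200),
    "nlpjournal_abs_article": (200, 637, 200),
    "nlpjournal_abs_intro": (200, 637, 200),
    "nlpjournal_title_abs": (200, 637, 200),
    "nlpjournal_title_intro": (200, 637, 200),
}


def summarize(counts):
    """Aggregate per-task counts into group-level totals."""
    queries = sum(q for q, _, _ in counts.values())
    docs = sum(d for _, d, _ in counts.values())
    positives = sum(p for _, _, p in counts.values())
    # More positives than queries means at least one query has several positives.
    multi_positive = sorted(t for t, (q, _, p) in counts.items() if p > q)
    return {
        "tasks": len(counts),
        "queries": queries,
        "docs": docs,
        "positives": positives,
        "positives_per_query": round(positives / queries, 2),
        "multi_positive_tasks": multi_positive,
    }


summary = summarize(TASK_COUNTS)
print(summary)
```

The derived values match the figures in the Metadata Summary: 11 tasks, 2,200 queries, 64,140 documents, 2,432 positives, and 1.11 positives per query on average.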
Text length varies widely. mintaka_ja documents are short answer labels, while multi_long_doc_ja and nlpjournal_abs_article contain very long documents. The NLP Journal abstract queries are much longer than web, FAQ, entity, or passage queries. This makes the group a mix of short Japanese search, entity inference, evidence passage retrieval, and long academic-document matching.
Retrieval Behavior
BM25 Profile
BM25 is very strong on the Japanese NLP Journal component tasks (nDCG@10 from 0.9132 to 0.9982) and on ja_cwir (0.9181), where exact technical terms, titles, web keywords, and shared paper vocabulary are highly informative. It is also solid on jaqket (0.7837) and ja_gov_faqs (0.7196), and on multi_long_doc_ja its 0.5929 is well ahead of dense retrieval at 0.3956. These tasks often preserve Japanese surface anchors that sparse retrieval can exploit.
BM25 is weaker on mintaka_ja (0.2561), where the target is a short answer label, and on MIRACL/Mr. TyDi passage retrieval (0.5361 and 0.5518), where short factual questions may not repeat the evidence passage wording. It remains useful, but semantic inference and answerability become more important.
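The contrast between surface-anchored tasks and answer-label tasks can be illustrated with a toy BM25 scorer. This is a minimal sketch, not the configuration behind the reported scores: it assumes character-bigram tokenization as a stand-in for a Japanese tokenizer, uses common default values for `k1` and `b`, and the example strings are invented for illustration.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Split text into overlapping character bigrams (an assumed stand-in tokenizer)."""
    text = "".join(text.split())
    return [text[i:i + 2] for i in range(len(text) - 1)] or [text]


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [char_bigrams(d) for d in docs]
    avg_len = sum(len(toks) for toks in tokenized) / len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    n_docs = len(docs)
    query_terms = set(char_bigrams(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Paper-style query: the matching title repeats the query's technical term.
docs = ["係り受け解析の精度向上", "東京都の人口統計", "日本語の形態素解析手法"]
scores = bm25_scores("係り受け解析", docs)

# Answer-label query: the correct label shares no characters with the question.
label_scores = bm25_scores("日本で一番高い山は", ["富士山", "琵琶湖"])
print(scores, label_scores)
```

In the first case the title that repeats the technical term scores highest, while in the second case every label scores zero because nothing in the question overlaps with the answer, which is the failure mode described for short answer labels.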
Dense Profile
Dense retrieval leads on the passage and answer-oriented tasks mintaka_ja, miracl_ja, and mr_tidy_japanese, and it beats BM25 on ja_gov_faqs, where only reranking_hybrid scores higher. Embedding similarity connects short Japanese questions to passages or labels that express the requested answer without exact overlap.
Dense retrieval is not always best. On the four NLP Journal tasks and multi_long_doc_ja, BM25 outperforms dense, by nearly 0.20 nDCG@10 on multi_long_doc_ja, because titles, abstracts, and long articles share distinctive technical terms. That makes this group useful for testing whether dense models preserve exact Japanese terminology.
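Dense ranking by embedding similarity can be sketched as follows. The vectors here are invented three-dimensional placeholders, since the embedding model behind the dense profile is not specified in this section; in practice they would come from a sentence encoder. The function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_cosine(query_vec, doc_vecs, top_k=10):
    """Return document ids sorted by descending cosine similarity to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical embeddings: the question and its answer label point the same way
# even though their surface strings share no tokens.
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "label_fuji": [0.8, 0.2, 0.1],
    "label_biwa": [0.1, 0.9, 0.2],
    "label_tokyo": [0.0, 0.1, 0.9],
}
ranking = rank_by_cosine(query_vec, doc_vecs)
print(ranking)
```

Because the score depends only on vector direction, a label can rank first without lexical overlap, but distinctive terminology helps only to the extent that the encoder preserves it in the embedding.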
Reranking Hybrid Profile
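reranking_hybrid is best on ja_gov_faqs (0.7614) and jaqket (0.7876), although on jaqket the three profiles sit within 0.005 nDCG@10 of each other. These are cases where exact Japanese terms and semantic matching both matter: an FAQ answer or entity page may contain the right keywords, but ranking still needs intent or entity disambiguation. On eight of the remaining nine tasks the hybrid score lands between BM25 and dense, and on nlpjournal_abs_intro it is marginally below both.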
For reranker experiments, the hybrid profile is most useful on tasks where BM25 and dense disagree. It provides a safer candidate pool for FAQ, entity, and passage retrieval than either signal alone.
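One way to build a combined candidate pool and to quantify how much BM25 and dense disagree is sketched below. This section does not state how reranking_hybrid merges its inputs, so the sketch assumes reciprocal rank fusion (RRF) with the commonly used constant `k=60` purely as an illustration; the document ids are made up.

```python
def reciprocal_rank_fusion(rankings, k=60, top_k=10):
    """Fuse several ranked id lists with RRF (an assumed fusion rule)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so the output is deterministic.
    fused = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return fused[:top_k]


def topk_overlap(a, b, k=10):
    """Jaccard overlap of two top-k lists; low values mean the retrievers disagree."""
    set_a, set_b = set(a[:k]), set(b[:k])
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


bm25_top = ["d1", "d2", "d3"]
dense_top = ["d2", "d4", "d1"]

pool = reciprocal_rank_fusion([bm25_top, dense_top])
overlap = topk_overlap(bm25_top, dense_top)
print(pool, overlap)
```

Documents that both retrievers rank highly move to the front of the pool, while documents found by only one retriever are still kept for the reranker. A low overlap value marks the queries where the hybrid pool adds the most over either list alone.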
Task Summary
| Task | Retrieval shape | Queries | Docs | Positives | BM25 nDCG@10 | Dense nDCG@10 | Reranking hybrid nDCG@10 | Best profile |
|---|---|---|---|---|---|---|---|---|
| ja_cwir | generated web question to page snippet | 200 | 10,000 | 200 | 0.9181 | 0.8367 | 0.8810 | BM25 |
| ja_gov_faqs | government FAQ question to answer | 200 | 10,000 | 200 | 0.7196 | 0.7487 | 0.7614 | Reranking hybrid |
| jaqket | quiz clue to entity page | 200 | 10,000 | 200 | 0.7837 | 0.7830 | 0.7876 | Reranking hybrid |
| mintaka_ja | complex question to answer label | 200 | 1,592 | 200 | 0.2561 | 0.3687 | 0.3354 | Dense |
| miracl_ja | Japanese question to Wikipedia passage | 200 | 10,000 | 373 | 0.5361 | 0.6923 | 0.6252 | Dense |
| mr_tidy_japanese | Japanese question to Mr. TyDi passage | 200 | 10,000 | 259 | 0.5518 | 0.7399 | 0.6633 | Dense |
| multi_long_doc_ja | generated question to long article | 200 | 10,000 | 200 | 0.5929 | 0.3956 | 0.5008 | BM25 |
| nlpjournal_abs_article | abstract to full article | 200 | 637 | 200 | 0.9982 | 0.9763 | 0.9863 | BM25 |
| nlpjournal_abs_intro | abstract to introduction | 200 | 637 | 200 | 0.9896 | 0.9553 | 0.9545 | BM25 |
| nlpjournal_title_abs | title to abstract | 200 | 637 | 200 | 0.9526 | 0.9290 | 0.9428 | BM25 |
| nlpjournal_title_intro | title to introduction | 200 | 637 | 200 | 0.9132 | 0.8632 | 0.8704 | BM25 |
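For readers who want to compute a comparable per-query score, nDCG@10 can be sketched as below. This assumes binary relevance taken from the qrels and the common log2 rank discount; the evaluation harness behind the table may differ in details such as tie handling, and the function name is illustrative.

```python
import math


def ndcg_at_k(ranked_ids, positives, k=10):
    """nDCG@k with binary relevance: positives is the set of relevant doc ids."""
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, doc_id in enumerate(ranked_ids[:k])
        if doc_id in positives
    )
    # The ideal ranking places every positive (up to k of them) at the top.
    ideal_hits = min(len(positives), k)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


# Single-positive query: the positive at rank 1 versus rank 2.
top_hit = ndcg_at_k(["a", "x", "y"], {"a"})
second_hit = ndcg_at_k(["x", "a", "y"], {"a"})

# Multi-positive query, as in miracl_ja and mr_tidy_japanese.
multi = ndcg_at_k(["a", "x", "b"], {"a", "b"})
print(top_hit, second_hit, multi)
```

A task-level score is then the mean of the per-query values over the 200 queries of that task.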
Interpretation Notes for Model Researchers
NanoJMTEB-v2 is best read by target format. Academic component matching is lexically strong because titles, abstracts, and papers share technical terms. FAQ, entity, Mintaka, MIRACL, and Mr. TyDi tasks require more semantic matching or entity inference. A single Japanese average will hide those differences.
The BM25-heavy tasks are not trivial; they show that exact Japanese terminology and paper vocabulary matter. Dense-led tasks show where answerability, short-label matching, or passage semantics matter more. Hybrid-led tasks show where exact entity anchors and semantic disambiguation both contribute.
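The effect of averaging can be checked directly from the reported scores. The sketch below copies the nDCG@10 values from the Task Summary table and groups tasks by the families listed under Task Families; the family keys and helper names are illustrative.

```python
# nDCG@10 per task from the Task Summary table: (BM25, dense, reranking_hybrid).
SCORES = {
    "ja_cwir": (0.9181, 0.8367, 0.8810),
    "ja_gov_faqs": (0.7196, 0.7487, 0.7614),
    "jaqket": (0.7837, 0.7830, 0.7876),
    "mintaka_ja": (0.2561, 0.3687, 0.3354),
    "miracl_ja": (0.5361, 0.6923, 0.6252),
    "mr_tidy_japanese": (0.5518, 0.7399, 0.6633),
    "multi_long_doc_ja": (0.5929, 0.3956, 0.5008),
    "nlpjournal_abs_article": (0.9982, 0.9763, 0.9863),
    "nlpjournal_abs_intro": (0.9896, 0.9553, 0.9545),
    "nlpjournal_title_abs": (0.9526, 0.9290, 0.9428),
    "nlpjournal_title_intro": (0.9132, 0.8632, 0.8704),
}

# Family grouping follows the Task Families list; the keys are illustrative names.
FAMILIES = {
    "web_faq": ["ja_cwir", "ja_gov_faqs"],
    "entity_label": ["jaqket", "mintaka_ja"],
    "passage": ["miracl_ja", "mr_tidy_japanese"],
    "long_doc": ["multi_long_doc_ja"],
    "academic": [
        "nlpjournal_abs_article",
        "nlpjournal_abs_intro",
        "nlpjournal_title_abs",
        "nlpjournal_title_intro",
    ],
}

PROFILES = ("bm25", "dense", "reranking_hybrid")


def mean_by_profile(tasks):
    """Average each profile's nDCG@10 over the given task names."""
    rows = [SCORES[task] for task in tasks]
    return {p: sum(row[i] for row in rows) / len(rows) for i, p in enumerate(PROFILES)}


overall = mean_by_profile(SCORES)
by_family = {name: mean_by_profile(tasks) for name, tasks in FAMILIES.items()}
best_by_family = {name: max(means, key=means.get) for name, means in by_family.items()}

print({p: round(v, 4) for p, v in overall.items()})
print(best_by_family)
```

On these numbers the three overall means fall within 0.01 of each other (about 0.747, 0.754, and 0.755), while the family means separate clearly: dense leads the passage family, and BM25 leads the long-document and academic families.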
Training and Leakage Notes
Useful training data includes Japanese search logs, government FAQ pairs, Wikipedia passage retrieval, JAQKET-style quiz/entity data, Mintaka-style complex QA, long-document question generation, and Japanese academic paper component matching. Hard negatives should come from same entities, same government programs, same article families, or same research subfields.
Exclude NanoJMTEB-v2 evaluation queries, positives, qrels, paper sections, answer labels, and direct synthetic variants from training data. Upstream JMTEB, MTEB, JAQKET, Mintaka, MIRACL, Mr. TyDi, MultiLongDocRetrieval (MLDR), and NLP Journal evaluation examples should be audited before use in training.
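A first-pass exclusion filter might look like the sketch below. It only catches training texts that equal an evaluation text after Unicode NFKC normalization, whitespace removal, and case folding, so paraphrases and synthetic variants still need a separate audit. The function names and example strings are illustrative.

```python
import unicodedata


def normalize(text):
    """Fold width variants (NFKC), drop all whitespace, and casefold."""
    text = unicodedata.normalize("NFKC", text)
    return "".join(text.split()).casefold()


def filter_leaks(train_texts, eval_texts):
    """Split training texts into (kept, dropped) by normalized exact match."""
    blocked = {normalize(t) for t in eval_texts}
    kept, dropped = [], []
    for text in train_texts:
        (dropped if normalize(text) in blocked else kept).append(text)
    return kept, dropped


# The evaluation query uses a full-width question mark; the training copies
# differ only in punctuation width and spacing.
eval_queries = ["富士山の高さは？"]
train_queries = ["富士山の高さは?", "富士山 の高さは？", "琵琶湖の面積は?"]

kept, dropped = filter_leaks(train_queries, eval_queries)
print(kept, dropped)
```

The same filter can be applied to positives, paper sections, and answer labels by passing those texts as `eval_texts`.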
Source Reference Table
| Source | Year | Type | URL |
|---|---|---|---|
| MTEB: Massive Text Embedding Benchmark | 2022 | paper | https://arxiv.org/abs/2210.07316 |
| MIRACL: A Multilingual Retrieval Dataset Covering 18 Diverse Languages | 2023 | paper | https://aclanthology.org/2023.tacl-1.63/ |
| Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval | 2021 | paper | https://arxiv.org/abs/2108.08787 |
| Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering | 2022 | paper | https://aclanthology.org/2022.coling-1.138/ |
Metadata Summary
| Field | Value |
|---|---|
| Task pages | 11 |
| Queries | 2,200 |
| Split-local documents | 64,140 |
| Positive qrels | 2,432 |
| Languages | ja |
| Categories | natural_language |
| Positives / query avg | 1.11 |
Task Metadata Summary
| Task | Backing dataset | Lang | Category | Queries | Docs | Positives | BM25 nDCG@10 | Dense nDCG@10 | Reranking hybrid nDCG@10 | Best profile |
|---|---|---|---|---|---|---|---|---|---|---|
| ja_cwir | NanoJMTEB-v2 | ja | natural_language | 200 | 10,000 | 200 | 0.9181 | 0.8367 | 0.8810 | BM25 |
| ja_gov_faqs | NanoJMTEB-v2 | ja | natural_language | 200 | 10,000 | 200 | 0.7196 | 0.7487 | 0.7614 | Reranking hybrid |
| jaqket | NanoJMTEB-v2 | ja | natural_language | 200 | 10,000 | 200 | 0.7837 | 0.7830 | 0.7876 | Reranking hybrid |
| mintaka_ja | NanoJMTEB-v2 | ja | natural_language | 200 | 1,592 | 200 | 0.2561 | 0.3687 | 0.3354 | Dense |
| miracl_ja | NanoJMTEB-v2 | ja | natural_language | 200 | 10,000 | 373 | 0.5361 | 0.6923 | 0.6252 | Dense |
| mr_tidy_japanese | NanoJMTEB-v2 | ja | natural_language | 200 | 10,000 | 259 | 0.5518 | 0.7399 | 0.6633 | Dense |
| multi_long_doc_ja | NanoJMTEB-v2 | ja | natural_language | 200 | 10,000 | 200 | 0.5929 | 0.3956 | 0.5008 | BM25 |
| nlpjournal_abs_article | NanoJMTEB-v2 | ja | natural_language | 200 | 637 | 200 | 0.9982 | 0.9763 | 0.9863 | BM25 |
| nlpjournal_abs_intro | NanoJMTEB-v2 | ja | natural_language | 200 | 637 | 200 | 0.9896 | 0.9553 | 0.9545 | BM25 |
| nlpjournal_title_abs | NanoJMTEB-v2 | ja | natural_language | 200 | 637 | 200 | 0.9526 | 0.9290 | 0.9428 | BM25 |
| nlpjournal_title_intro | NanoJMTEB-v2 | ja | natural_language | 200 | 637 | 200 | 0.9132 | 0.8632 | 0.8704 | BM25 |