NanoMTEB-Misc / 2022_fa

Overview

2022_fa is the Persian news retrieval split of the TREC 2022 NeuCLIR task. Queries are Persian information needs, and documents are Persian news articles drawn from a NeuCLIR hard-negative retrieval pool. The Nano split contains 45 queries, 8,882 documents, and 1,131 positive qrels. It is highly multi-positive: queries have 25.13 positives on average, the median is 12, and 93.33% of queries have more than one positive. Queries average 83.13 characters, while documents average 2,818.76 characters. This task is useful for evaluating Persian topical news retrieval, broad event coverage, and ranking over many relevant long-form articles.

Details

What the Original Data Measures

The paper "Overview of the TREC 2022 NeuCLIR Track" describes NeuCLIR as a neural cross-language information retrieval benchmark over Chinese, Persian, and Russian news. The original TREC setting used English topics to retrieve documents in the target language, with translated topic variants and pooled relevance judgments. This Nano split instead pairs Persian topic text with Persian documents, so it measures monolingual Persian news retrieval within the NeuCLIR topic and judgment structure.

NeuCLIR topics are not factoid questions. They describe information needs such as finding articles about domestic vaccine production, AI in agriculture, or health risks. A relevant set can include many articles covering the same event or issue from different angles.

Observed Data Profile

The split has 45 Persian queries, 8,882 documents, and 1,131 positive judgments. Documents are long Persian news articles with headlines and article bodies. The large positive pools make this a many-relevant-documents ranking task rather than a single-answer retrieval task.
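The multi-positive statistics quoted here can be recomputed from a qrels mapping. The following is a minimal sketch, assuming qrels are available as a dictionary from query id to a set of positive document ids; that layout, the function name `positive_stats`, and the toy ids are illustrative assumptions, not the dataset's actual schema.

```python
from statistics import mean, median

def positive_stats(qrels):
    """Summarize positives per query.

    qrels: dict mapping query id -> set of positive doc ids
    (hypothetical layout, chosen for illustration only).
    """
    counts = [len(docs) for docs in qrels.values()]
    multi = sum(1 for c in counts if c > 1)
    return {
        "queries": len(counts),
        "positives": sum(counts),
        "avg": mean(counts),
        "median": median(counts),
        "min": min(counts),
        "max": max(counts),
        "multi_positive_share": multi / len(counts),
    }

# Toy qrels: one single-positive query and two multi-positive queries.
toy_qrels = {
    "q1": {"d1"},
    "q2": {"d2", "d3"},
    "q3": {"d4", "d5", "d6", "d7", "d8", "d9"},
}
stats = positive_stats(toy_qrels)
print(stats)
```

An average well above the median, as in the reported 25.13 versus 12, suggests that a few queries carry very large positive pools.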

Examples include articles about calcium-rich vegetables, Iran's COVIran Barekat vaccine, AI in agriculture, China's 5G role in Russia, and myopia among Chinese students. Queries are user information needs, often phrased as "I am looking for articles..." rather than short keyword queries.

BM25 Evaluation Profile

BM25 reaches nDCG@10 of 0.2600, hit@10 of 0.5778, and recall@100 of 0.5208. Lexical matching helps when topic terms, named entities, or domain phrases are repeated in the article text. However, BM25 is weak at top-10 ranking because the relevant set is broad and many long news articles share event vocabulary without necessarily being judged relevant.

A recall@100 of 0.5208 also shows that BM25 leaves nearly half of the judged positives outside the top 100, even though there are many relevant documents per query. Sparse matching alone is not enough for semantic event coverage across long Persian news articles.
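The lexical behavior described above can be illustrated with a minimal Okapi BM25 scorer. This is a sketch under stated assumptions: whitespace tokenization, `k1=1.2`, `b=0.75`, and English toy documents are all chosen for readability. The source does not state which BM25 implementation, tokenizer, or parameters produced the reported numbers.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25.

    Tokenization is plain whitespace splitting (an assumption; real
    Persian text would need proper normalization and tokenization).
    """
    tokenized = [d.split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n

    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    query_terms = set(query.split())
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        length_norm = k1 * (1 - b + b * len(toks) / avgdl)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores

# Toy corpus: a focused article, an unrelated one, and a long article
# that shares topic vocabulary only in passing.
docs = [
    "vaccine production iran vaccine",
    "football match report",
    "iran economy report vaccine mentioned once among many other words here",
]
scores = bm25_scores("iran vaccine production", docs)
print(scores)
```

The long third document still receives a nonzero score from shared vocabulary alone, which is the kind of near-topic match that can crowd the top of a BM25 ranking.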

Dense Evaluation Profile

Dense retrieval is the strongest profile, with nDCG@10 of 0.4915, hit@10 of 0.9556, and recall@100 of 0.6897. The dense model is much better at connecting Persian information needs to relevant article semantics, especially when the article wording differs from the topic statement or covers the same issue from a different angle.

This task is therefore a useful Persian dense-retrieval diagnostic. Improvements should reflect better topical and event-level semantic matching, not only keyword overlap. The long document length also rewards models that can represent article-level meaning beyond the headline.

Reranking Hybrid Evaluation Profile

The reranking_hybrid profile reaches nDCG@10 of 0.4138, hit@10 of 0.8444, and recall@100 of 0.6490. It improves substantially over BM25 and approaches dense recall, but dense retrieval remains stronger on all three metrics. The hybrid lists contain 100 to 101 candidates, with one safeguard-positive row.

This pattern suggests that the lexical component adds useful anchors, but the dominant signal is semantic news-topic similarity. Hybrid search yields a reasonable reranking pool, but dense-only candidates rank better on this split.
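To make the idea of combining lexical and dense candidates concrete, here is a minimal sketch using reciprocal rank fusion (RRF). The source does not say how the reranking_hybrid lists are built, so RRF, the constant `k=60`, the `top_n=100` cutoff, and the function name `rrf_fuse` are illustrative assumptions rather than the benchmark's method.

```python
def rrf_fuse(rankings, k=60, top_n=100):
    """Fuse several ranked lists of doc ids with reciprocal rank fusion.

    Each document receives 1 / (k + rank) from every list it appears in.
    Ties are broken by doc id so the output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]

# Toy candidate lists standing in for a lexical and a dense retriever.
bm25_ranking = ["a", "b", "c"]
dense_ranking = ["b", "d", "a"]
fused = rrf_fuse([bm25_ranking, dense_ranking])
print(fused)
```

Documents found by both retrievers rise to the top of the fused list, while documents found by only one remain in the pool at lower ranks.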

Metric Interpretation for Model Researchers

2022_fa is dense-favorable. BM25 is limited by long documents and broad topic-event relevance, dense retrieval is strongest, and reranking_hybrid sits between them. Because the task is highly multi-positive, hit@10 only tells whether at least one relevant article appears in the top 10; recall@100 shows how much of the relevant event cluster is surfaced.

nDCG@10 measures whether highly useful relevant articles are ranked early. Models should not be judged as single-answer retrievers here: many documents may be relevant to the same information need.
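The three metrics can be computed per query as in the sketch below. It assumes binary relevance (every positive qrel has gain 1) and a ranked list without duplicate ids; the source does not state the exact gain scheme or evaluation tool, and the function names are illustrative.

```python
import math

def hit_at_k(ranked, positives, k=10):
    """1.0 if at least one positive appears in the top k, else 0.0."""
    return 1.0 if any(doc in positives for doc in ranked[:k]) else 0.0

def recall_at_k(ranked, positives, k=100):
    """Fraction of all positives that appear in the top k."""
    if not positives:
        return 0.0
    return len(set(ranked[:k]) & set(positives)) / len(positives)

def ndcg_at_k(ranked, positives, k=10):
    """Binary-gain nDCG: DCG of the ranking over DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, doc in enumerate(ranked[:k])
        if doc in positives
    )
    ideal_hits = min(len(positives), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Toy query with three positives; the ranking finds two of them.
positives = {"d1", "d2", "d3"}
ranked = ["d9", "d1", "d5", "d2"]

hit = hit_at_k(ranked, positives)
recall = recall_at_k(ranked, positives)
ndcg = ndcg_at_k(ranked, positives)
print(hit, round(recall, 4), round(ndcg, 4))
```

In this toy case a single positive is enough to satisfy hit@10, while recall and nDCG still register the missing third positive and the non-positive document ranked first.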

Query and Relevance Type Tendencies

Queries are Persian topical news information needs. Positive documents are Persian news articles that satisfy the topic, often with multiple relevant articles per query. Relevance can depend on event angle, entity, country, policy issue, or health/science topic.

The task rewards broad topical recall and strong early ranking across news clusters. It is closer to ad hoc news search than QA retrieval.

Representative Failure Modes

BM25 can over-rank articles with shared named entities or repeated topic words but the wrong event angle. Dense retrieval can retrieve semantically related articles that are not judged relevant or miss narrow named-entity constraints. Hybrid retrieval can improve coverage while still mixing near-topic articles with true positives.

Long article bodies can dilute both sparse and dense signals, especially when the relevant information is concentrated in a headline or a few paragraphs.

Training Data That May Help

Useful training data includes Persian news retrieval pairs, NeuCLIR-style topic-document pairs, multilingual CLIR corpora, and hard negatives from the same event, entity, or news category. Training should avoid NeuCLIR evaluation topics, qrels, and article pools that overlap with the Nano split.

Synthetic data should generate Persian information needs from non-evaluation news clusters and pair each topic with several relevant articles. Hard negatives should share key entities but cover a different angle or event.

Model Improvement Notes

Models should represent Persian topical intent, named entities, and event semantics across long news articles. Dense encoders should be trained on multi-positive news clusters. Rerankers should compare whether an article actually satisfies the topic, not only whether it shares headline terms.

Example Data

Query	Positive document
من علاقه مند به یافتن مقالاتی هستم که سبزیجات سرشار از کلسیم را معرفی می کنند [77 chars]	۵ خوراکی سرشار از کلسیم را بشناسید کد مطلب: 474072 نصف فنجان از انجیر خشک شده و خام مقدار بالایی کلسیم در حدود ۱۲۰ میلی گرم دارد که شامل ۱۲ درصد ارزش روزانه می شود. انجیر خشک همچنين داراي فسفر است. فسفر در شكل‌گيري استخوان‌ها يا جوش خوردن آنها پس از آسيب‌ديدگي يا شكستگي موثر است.کلسیم از مهم‌ترین مواد معدنی موجود در بدن است که به‌طـور طبیـعی می‌توان آن را از انـواع مختـلف غذاها، نوشیدنی‌ها و… دریافت کرد. امـا نکته حائزاهمیت‌ آن است که این ماده معدنی به حفظ عروق خونی سالم در بدن، تنظیم فشارخون و حتی پیشگیری از مقاومت به انسولین (ابتلا به دیابت نوع ۲) نیز کمک می‌کند. اگر می‌خواهید با ۵ خوراکی سرشار از کلسیم غیرلبنی آشنا شوید این مقاله را به نقل از سایت بازده بخوانید و تصویری که در پایان مقاله آمده است را با دوستان و خانواده‌تان به اشتراک بگذارید.۱. آلو بخاراآلو بخارا به عنوان یک میوه خشک باعث غنی شدن ذخایر کلاژن بدن می‌شود که بدن را قوی و استخوان‌ها را نرم نگه می‌دارد. کربوهیدرات‌های انرژی‌زا در این میوه خشک عملکرد را بهبود می‌بخشد و سلامت استخوان‌ها و قدرت بدنی را بالا می‌برد.۲. برگه زر... [1,000 / 1,918 chars]
دنبال مقالاتی هستم که در مورد تولید بومی واکسن کووید کوویران برکت توسط ایران گزارش دهند. [88 chars]	واکسن مؤثر و ایمن «کوو ایران برکت» خیال مردم را راحت می‌کند واکسن مؤثر و ایمن «کوو ایران برکت» خیال مردم را راحت می‌کند مدیر گروه تحقیقات تولید واکسن ستاد اجرایی فرمان امام(ره): ۹ کشور تاکنون متقاضی خرید این واکسن هستند به گزارش ایرنا، مینو محرز روز گذشته در نشست خبری مرحله سوم تست انسانی واکسن کووایران برکت افزود: نخستین واکسن ایران به نام کوو ایران برکت ساخته شده و ویروس آن از بیمار اردبیلی گرفته شده و مراحل حیوانی، فازهای اول و دوم را با موفقیت طی کرده است.این متخصص عفونی ادامه داد: از صفر تا ۱۰۰ این واکسن در داخل کشور و توسط محققان ایرانی انجام شده است. محرز با تاکید بر اینکه باید کمک کنیم تا مردم هرچه زودتر واکسن کرونا بزنند، گفت: از مهر ماه امسال با زدن واکسن کرونا باید مدارس و دانشگاه‌ها بازگشایی شوند. وی با بیان اینکه واکسن ایرانی کوو ایران برکت مصون و موثر است، اظهار امیدواری کرد که هرچه زودتر واکسن در اختیار مردم قرار گیرد. محرز افزود: با این کار هم به درصد بالایی از واکسیناسیون جامعه می‌رسیم تا کارها تقریبا به روال عادی برگردد.وی تاکید کرد که اکنون در دنیا با کمبود واکسن کرو... [1,000 / 7,035 chars]
من به دنبال مقالاتی در مورد چگونگی استفاده از هوش مصنوعی در کشاورزی هستم. [73 chars]	کاربرد اینترنت اشیا در کشاورزی هوشمند نسخه الکترونیکی شماره اول از سری پنجم نشریه «صنعت سبز نوین»؛ فصلنامه انجمن علمی دانشجویی گروه مهندسی ماشین‌های کشاورزی دانشگاه تهران منتشر شد. به گزارش باشگاه دانشجویان ایسنا، در این شماره "مصاحبه با کارآفرین جوان رشته مهندسی مکانیک بیوسیستم"، "ذخیره‌سازی انرژی‌های تجدیدپذیر در کشاورزی"، "گلخانه‌های هوشمند"، "بررسی انواع روش‌های فیلتراسیون و کاربرد آن‌ها"، "کاربرد اینترنت اشیا در کشاورزی هوشمند" و "معرفی شبیه‌سازی تزریق پلیمر به کمک نرم افزار اتودسک مولدفلو" از جمله مطالب و مقالاتی است که در این شماره می خوانیم. نشریه «صنعت سبز نوین» به صاحب امتیازی انجمن علمی دانشجویی مهندسی ماشین‌های کشاورزی دانشگاه تهران و مدیر مسئولی محمد قوشچیان و زیر نظر شورای سردبیری، به صورت فصلنامه منتشر شده است. علاقه‌مندان می‌توانند برای مطالعه نشریه به سامانه sanatsabzsj.ut.ac.ir مراجعه کنند. انتهای پیام [836 chars]

Source Reference Table

Title	Year	Type	URL
Overview of the TREC 2022 NeuCLIR Track	2023	Benchmark paper	https://arxiv.org/abs/2304.12367
NeuCLIR official site	2022	Project page	https://neuclir.github.io/
mteb/NeuCLIR2022RetrievalHardNegatives	2025	Dataset card	https://huggingface.co/datasets/mteb/NeuCLIR2022RetrievalHardNegatives

Dataset Information

Field	Value
Nano set	NanoMTEB-Misc
Backing dataset	NanoMTEB-Misc
Task / split	2022_fa
Hugging Face dataset	hakari-bench/NanoMTEB-Misc
Language	fa
Category	natural_language
Queries	45
Documents	8,882
Positive qrels	1,131
Positives / query avg	25.13
Positives / query min	1
Positives / query median	12.00
Positives / query max	100
Multi-positive queries	42 (93.33%)
Query length avg chars	83.13
Document length avg chars	2,818.76

Candidate Subsets

Profile	Config	nDCG@10	Hit@10	Recall@100	Candidates
BM25	`bm25`	0.2600	0.5778	0.5208	top-500
Dense	`harrier_oss_v1_270m`	0.4915	0.9556	0.6897	top-500
Reranking hybrid	`reranking_hybrid`	0.4138	0.8444	0.6490	top-100