NanoMedical / NanoNFCorpus

Overview

NanoMedical / NanoNFCorpus is an English consumer-health to biomedical-evidence retrieval task derived from NFCorpus. Queries are very short lay health topics, food names, disease labels, acronyms, or nutrition-related questions, and documents are PubMed or PubMed Central-style article titles and abstracts. The original NFCorpus dataset was built from NutritionFacts.org pages and the research articles cited by those pages, making it a benchmark for bridging consumer health language to technical biomedical literature. This Nano split is heavily multi-positive, so it evaluates both top-rank evidence quality and broad retrieval of many related abstracts.

Details

What the Original Data Measures

NFCorpus measures medical information retrieval where non-expert health topics must retrieve scientific articles. The original dataset uses links from NutritionFacts.org to cited research articles, with relevance signals derived from direct links, indirect links, and topic or tag relations.

This task is not medical FAQ answer retrieval. A query such as a food item, supplement, acronym, or health claim may correspond to many biomedical abstracts whose relevance depends on the cited evidence relationship, not only on shared words.

Observed Data Profile

The Nano split contains 200 queries, 3,593 documents, and 3,718 positive qrel rows. Queries have 18.59 positives on average, with a median of 9 and a maximum of 97. There are 160 multi-positive queries, or 80.0% of the set. Queries average only 17.15 characters, while documents average 1,589.52 characters.
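The reported average is consistent with the row counts: 3,718 positive qrel rows over 200 queries gives 18.59 positives per query. As a minimal sketch, the same profile statistics could be recomputed from a list of positive qrel pairs as below. The `(query_id, doc_id)` pair layout and the function name are assumptions for illustration, not the dataset's actual schema, and the toy pairs are not taken from the data.

```python
from collections import Counter
from statistics import mean, median


def positives_per_query_stats(qrels):
    """Summarize positives per query from (query_id, doc_id) pairs.

    Assumes each pair is a unique positive judgment; queries without
    any positive are simply absent from the input.
    """
    counts = Counter(query_id for query_id, _ in qrels)
    per_query = list(counts.values())
    multi = sum(1 for c in per_query if c > 1)
    return {
        "queries": len(per_query),
        "positive_rows": sum(per_query),
        "avg": mean(per_query),
        "median": median(per_query),
        "max": max(per_query),
        "multi_positive": multi,
        "multi_positive_share": multi / len(per_query),
    }


# Toy qrels for illustration only (hypothetical ids).
toy_qrels = [
    ("q1", "d1"), ("q1", "d2"), ("q1", "d3"),
    ("q2", "d4"),
    ("q3", "d1"), ("q3", "d5"),
]
stats = positives_per_query_stats(toy_qrels)
```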

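The examples include short topics such as avocados, grapes, Dr. Walter Willett, chlorophyll, and Native Americans. Documents are long technical abstracts that cover study background, methods, or biomedical mechanisms. This creates a large lexical and register gap between query and document: the query "avocados", for instance, is paired with a review titled "Role of insulin in the pathogenesis of free fatty acid-induced insulin resistance in skeletal muscle", whose title shares no words with the query.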

BM25 Evaluation Profile

The BM25 candidate subset uses top-500 candidates and reaches nDCG@10 of 0.2921, hit@10 of 0.6200, and recall@100 of 0.2066. BM25 can help when short queries contain exact biomedical terms, distinctive foods, or acronyms. However, recall is low because each query can have many relevant abstracts and because consumer phrasing may not appear in the abstract.

Sparse retrieval often reaches the right vocabulary cluster but misses the citation or health-claim relation. It can also over-rank abstracts that repeat a food or disease term while studying a different outcome, population, or mechanism.

Dense Evaluation Profile

The dense candidate subset from harrier_oss_v1_270m uses top-500 candidates and reaches nDCG@10 of 0.3070, hit@10 of 0.6150, and recall@100 of 0.2633. Dense retrieval improves nDCG@10 and recall@100 over BM25, although hit@10 is slightly lower.

This indicates that semantic matching helps bridge lay health topics and technical abstracts, but the task remains difficult. Many relevant documents are related through citation context, nutrition claims, or evidence roles that a general embedding model may not capture fully.

Reranking Hybrid Evaluation Profile

The reranking_hybrid subset uses top-100 candidates, with 48 queries carrying a rank-101 safeguard positive. It reaches nDCG@10 of 0.3182, hit@10 of 0.6500, and recall@100 of 0.2604. Hybrid retrieval gives the best nDCG@10 and hit@10, while dense retrieval has slightly better recall@100.

The profile suggests that sparse and dense signals are complementary: exact food names and acronyms matter, but semantic evidence matching is also needed. A reranker should benefit from the hybrid pool if it can judge biomedical relevance rather than only topic overlap.
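The source does not state how the reranking_hybrid pool is constructed. As an illustration of how sparse and dense rankings can be combined, the sketch below uses reciprocal rank fusion (RRF), a common fusion technique that is swapped in here as an assumption. The function name, the constant `k=60`, and the toy document ids are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=100):
    """Fuse several ranked lists of doc ids with reciprocal rank fusion.

    Each document receives 1 / (k + rank) from every list it appears in,
    with ranks starting at 1. This is an assumed fusion method, not the
    one documented for the benchmark.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; break ties by doc id for determinism.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused[:top_n]


# Toy rankings for illustration only.
bm25_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d2", "d4", "d1"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

Documents that rank well in both lists (here `d2` and `d1`) rise to the top, which is the complementary behavior the profile points to. A reranker would then reorder the fused pool.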

Metric Interpretation for Model Researchers

Because 80.0% of queries are multi-positive and some have dozens of positives, recall@100 is a key metric: it measures what fraction of a query's positives appears in the top 100. Hit@10 only shows whether at least one positive abstract appears in the top 10. nDCG@10 measures how many positives appear in the top 10 and how close to the top they are ranked.
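The three metrics can be sketched per query as follows. This is a minimal version that assumes binary relevance (a document is either a positive or not); if the qrels carry graded labels, the gain term in nDCG would need to change. The function names and toy ids are illustrative, and reported benchmark scores are averages of such per-query values.

```python
import math


def hit_at_k(ranked, positives, k=10):
    """1.0 if any positive appears in the top k, else 0.0."""
    return 1.0 if any(doc in positives for doc in ranked[:k]) else 0.0


def recall_at_k(ranked, positives, k=100):
    """Fraction of the query's positives found in the top k."""
    if not positives:
        return 0.0
    return len(set(ranked[:k]) & positives) / len(positives)


def ndcg_at_k(ranked, positives, k=10):
    """Binary-relevance nDCG: discounted gain divided by the ideal gain."""
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, doc in enumerate(ranked[:k])
        if doc in positives
    )
    ideal_hits = min(len(positives), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example: two of three positives retrieved, at ranks 2 and 4.
ranked = ["d3", "d1", "d9", "d2"]
positives = {"d1", "d2", "d7"}
hit = hit_at_k(ranked, positives)
recall = recall_at_k(ranked, positives)
ndcg = ndcg_at_k(ranked, positives)
```

In the toy example hit@10 is already saturated while recall is 2/3 and nDCG is well below 1, which is why hit@10 alone says little on a many-positive task.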

The low recall@100 values (0.2066 to 0.2633 across the three profiles) are not surprising given the many-positive structure and the short queries. This task is better understood as consumer-health evidence retrieval than as ordinary question answering.

Query and Relevance Type Tendencies

Queries are short lay topics, food names, health concerns, public-health labels, or acronyms, and relevant documents are biomedical abstracts. The relevance relation is evidence or citation affinity between a health topic and the biomedical literature: a positive may reflect a cited evidence link from a NutritionFacts.org page rather than direct answer wording.

Representative Failure Modes

Common failures include over-matching exact food or disease names, missing abstracts that use technical terminology for a lay topic, retrieving same-topic studies with different outcomes, and failing on ambiguous acronyms. Dense models may retrieve broad health-related abstracts; sparse models may miss semantically related evidence with little term overlap.

Training Data That May Help

Useful training data includes non-overlapping consumer-health to biomedical retrieval pairs, nutrition article citation links, biomedical abstract retrieval with lay queries, and hard negatives from the same food, disease, exposure, or mechanism. To keep evaluation clean, training should exclude overlapping NFCorpus test queries, the source NutritionFacts page links, and the PubMed or PMC abstracts that appear as positives in the evaluation qrels.
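A minimal sketch of such a leakage filter is shown below. It assumes training data arrives as `(query_text, doc_id)` pairs and that document ids are comparable between the training source and the evaluation qrels; both are assumptions, as are the function names and toy ids. It covers overlapping queries and positive documents only; filtering by source page link would need the page URLs, which this sketch does not model.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial variants match."""
    return " ".join(text.lower().split())


def filter_training_pairs(train_pairs, eval_queries, eval_positive_doc_ids):
    """Drop training pairs that overlap with the evaluation set.

    A pair is removed if its query matches an evaluation query after
    normalization, or if its document is a positive in the evaluation qrels.
    """
    blocked_queries = {normalize(q) for q in eval_queries}
    blocked_docs = set(eval_positive_doc_ids)
    kept = []
    for query, doc_id in train_pairs:
        if normalize(query) in blocked_queries:
            continue
        if doc_id in blocked_docs:
            continue
        kept.append((query, doc_id))
    return kept


# Toy data for illustration only (hypothetical ids).
train_pairs = [("Avocados ", "doc-a"), ("kale", "doc-b"), ("turmeric", "doc-c")]
kept = filter_training_pairs(
    train_pairs,
    eval_queries=["avocados"],
    eval_positive_doc_ids=["doc-b"],
)
```

Exact matching after normalization is a conservative baseline; near-duplicate queries would need fuzzy matching on top of it.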

Model Improvement Notes

Models should learn both exact biomedical terminology and lay-to-technical bridging. Citation-informed training and hard negatives that share topic but differ in outcome or population are likely valuable. Multi-positive training is important because each query may have many evidence-bearing abstracts.
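One common way to train with many positives per query is a contrastive loss that places all positives in the numerator of the softmax. The sketch below is one such formulation, written in plain Python for clarity. It is an illustration under stated assumptions, not the training recipe of any model named here; the temperature value and the expectation of cosine-like scores are assumptions.

```python
import math


def multi_positive_nce_loss(scores, positive_mask, temperature=0.05):
    """Multi-positive contrastive loss for one query.

    loss = -log( sum_{p in positives} exp(s_p / t) / sum_{j} exp(s_j / t) )

    scores: similarity of the query to each candidate (assumed cosine-like,
            roughly in [-1, 1], so the exponentials do not underflow).
    positive_mask: booleans, True where the candidate is a positive.
    """
    if not any(positive_mask):
        raise ValueError("at least one positive candidate is required")
    logits = [s / temperature for s in scores]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - max_logit) for l in logits]
    positive_mass = sum(e for e, is_pos in zip(exps, positive_mask) if is_pos)
    return -math.log(positive_mass / sum(exps))


# Toy scores: two positives and one hard negative on the same topic.
mask = [True, True, False]
good_loss = multi_positive_nce_loss([0.9, 0.8, 0.1], mask)
bad_loss = multi_positive_nce_loss([0.1, 0.2, 0.9], mask)
```

The loss falls when the positives collectively outscore the negatives, so a same-food or same-disease hard negative that scores high is penalized even if one positive already ranks first.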

Example Data

Query	Positive document
avocados [8 chars]	Role of insulin in the pathogenesis of free fatty acid-induced insulin resistance in skeletal muscle. Insulin resistance is a pathophysiological link of obesity to type 2 diabetes. The initial cause of insulin resistance is critical for prevention and treatment of type 2 diabetes. Lipotoxicity is a well-known concept in the explanation of initiation of insulin resistance. Although there are several prevailing hypotheses about the cellular/molecular mechanisms of lipotoxicity, such as inflammation, oxidative stress, hyperinsulinemia, and ER stress, the relative importance of these hypothesized events remains to be determined. The role of hyperinsulinemia is relatively under documented in the literature for the initiation of insulin resistance. In this review, an interaction of fatty acid and beta-cells, and a synergy between free fatty acids (FFAs) and insulin are emphasized for the role of hyperinsulinemia. This article presents the evidence about FFA-induced insulin secretion in vitro... [1,000 / 1,694 chars]
grapes [6 chars]	A berry thought-provoking idea: the potential role of plant polyphenols in the treatment of age-related cognitive disorders. Today, tens of millions of elderly individuals worldwide suffer from dementia. While the pathogenesis of dementia is complex and incompletely understood, it may be, at least to a certain extent, the consequence of systemic vascular pathology. The metabolic syndrome and its individual components induce a proinflammatory state that damages blood vessels. This condition of chronic inflammation may damage the vasculature of the brain or be directly neurotoxic. Associations have been established between the metabolic syndrome, its constituents and dementia. A relationship has also been observed between certain dietary factors, such as constituents of the 'Mediterranean diet', and the metabolic syndrome; similar associations have been noted between these dietary factors and dementia. Fruit juices and extracts are under investigation as treatments for cognitive impairme... [1,000 / 1,862 chars]
Dr. Walter Willett [18 chars]	Coconut oil predicts a beneficial lipid profile in pre-menopausal women in the Philippines Coconut oil is a common edible oil in many countries, and there is mixed evidence for its effects on lipid profiles and cardiovascular disease risk. Here we examine the association between coconut oil consumption and lipid profiles in a cohort of 1,839 Filipino women (age 35–69 years) participating in the Cebu Longitudinal Health and Nutrition Survey, a community based study in Metropolitan Cebu City. Coconut oil intake was measured as individual coconut oil intake calculated using two 24-hour dietary recalls (9.54 ± 8.92 grams). Cholesterol profiles were measured in plasma samples collected after an overnight fast. Mean lipid values in this sample were total cholesterol (TC) (186.52 ± 38.86 mg/dL), high density lipoprotein cholesterol (HDL-c) (40.85 ± 10.30 mg/dL), low density lipoprotein cholesterol (LDL-c) (119.42 ± 33.21 mg/dL), triglycerides (130.75 ± 85.29 mg/dL) and the TC/HDL ratio (4.80... [1,000 / 1,360 chars]

Source Reference Table

Title	Year	Type	URL
A Full-Text Learning to Rank Dataset for Medical Information Retrieval	2016	ECIR paper PDF	https://www.cl.uni-heidelberg.de/~sokolov/pubs/boteva16full.pdf
NFCorpus: A Full-Text Learning to Rank Dataset for Medical Information Retrieval	2016	official dataset page	https://www.cl.uni-heidelberg.de/statnlpgroup/nfcorpus/
A Full-Text Learning to Rank Dataset for Medical Information Retrieval	2016	DOI	https://doi.org/10.1007/978-3-319-30671-1_58
mteb/nfcorpus		dataset card	https://huggingface.co/datasets/mteb/nfcorpus

Dataset Information

Field	Value
Nano set	NanoMedical
Backing dataset	NanoMedical
Task / split	NanoNFCorpus
Hugging Face dataset	hakari-bench/NanoMedical
Language	en
Category	natural_language
Queries	200
Documents	3,593
Positive qrels	3,718
Positives / query avg	18.59
Positives / query min	1
Positives / query median	9.00
Positives / query max	97
Multi-positive queries	160 (80.00%)
Query length avg chars	17.15
Document length avg chars	1,589.52

Candidate Subsets

Profile	Config	nDCG@10	Hit@10	Recall@100	Candidates
BM25	`bm25`	0.2921	0.6200	0.2066	top-500
Dense	`harrier_oss_v1_270m`	0.3070	0.6150	0.2633	top-500
Reranking hybrid	`reranking_hybrid`	0.3182	0.6500	0.2604	top-100

Training and Leakage Metadata

Original train split: available
Evaluation split origin: NFCorpus retrieval split sampled into NanoMedical
Train/eval overlap audit: not_audited
Leakage note: exclude overlapping NFCorpus test queries, source NutritionFacts page links, and positive PubMed/PMC abstract qrels for clean evaluation
Multi-positive training: support many positives per query and graded or citation-derived relevance
Useful training data: non-overlapping consumer-health to biomedical retrieval pairs, nutrition article citation links, biomedical abstract retrieval with layperson queries, hard negatives from the same food, disease, exposure, or mechanism