NanoBRIGHT / NanoBrightSustainableLivingLong
Overview
NanoBrightSustainableLivingLong is the long-document Sustainable Living StackExchange slice of NanoBRIGHT. Queries are practical sustainability questions, and relevant documents are full cited pages, reports, official guidance pages, or long environmental references. The task measures whether a retriever can identify the source page containing evidence for a specific environmental decision, even when the answer-bearing section is embedded inside a long document.
Details
What the Original Data Measures
BRIGHT's long-document variants retrieve full source pages rather than split passages. In Sustainable Living, those pages can be environmental reports, product guidance, government pages, energy-program documentation, encyclopedia pages, or practical sustainability articles.
The task retains the practical reasoning nature of the passage-level slice: the model must find evidence for a decision about reuse, energy, carbon, materials, recycling, or environmental risk. Long documents add extra difficulty because pages often cover many related topics and include navigation, legal context, examples, and caveats.
Observed Data Profile
The task contains 108 queries, 551 documents, and 129 relevance judgments. It is mostly single-positive: there are 1.19 positives per query on average, a minimum of 1, a median of 1.0, a maximum of 5, and 15 multi-positive queries, or 13.89% of the set.
Queries average 682.84 characters, while documents average 38,204.30 characters. The corpus has far fewer documents than the passage version, but each candidate may include several sections and broad sustainability vocabulary.
BM25 Evaluation Profile
BM25 reaches nDCG@10 of 0.3277, hit@10 of 0.5000, and recall@100 of 0.8992 using the top-500 BM25 candidate subset. Sparse retrieval has strong recall because long environmental pages contain many query terms, product names, materials, and policy expressions.
The top-rank quality is much weaker than recall. A long page can mention carbon, recycling, energy, pesticides, or water while not containing the evidence needed for the user's specific decision. BM25 is useful for candidate coverage but struggles to order the most supportive source pages first.
Dense Evaluation Profile
The dense harrier-oss-270m run reaches nDCG@10 of 0.5501, hit@10 of 0.7870, and recall@100 of 0.9690. Dense retrieval is the strongest top-ranking profile in this task and greatly improves over BM25 for nDCG@10 and hit@10.
This indicates that semantic matching is highly valuable for long sustainability sources. Dense retrieval can connect a practical question to the page whose overall content supports the relevant decision, even when the page does not repeat the query wording exactly.
Reranking Hybrid Evaluation Profile
The reranking_hybrid candidate set reaches nDCG@10 of 0.4436, hit@10 of 0.6852, and recall@100 of 0.9845. It uses a top-100 candidate range with an optional rank-101 safeguard; this task has 2 safeguard rows, candidate counts from 100 to 101, and a mean of 100.02 candidates.
The hybrid profile has the best recall@100 but does not beat dense retrieval at the top of the ranking. This means the fused pool is excellent for downstream reranking, while dense retrieval alone provides the strongest first-page ordering for this long-document slice.
Metric Interpretation for Model Researchers
This task separates top-rank semantic quality from candidate coverage. BM25 has strong recall because long source pages contain many environmental terms. Dense retrieval is best for nDCG@10 and hit@10 because it captures the practical decision intent. Reranking_hybrid is best for recall@100 and is therefore attractive as a reranker input pool.
Researchers should evaluate whether systems retrieve the page that actually contains evidence for the environmental decision. A page can be about sustainability and still fail to answer the specific question. Long-document models should ideally combine source-page retrieval with section-level evidence extraction.
Query and Relevance Type Tendencies
Queries ask about reusing bacon grease, carbon-reduction upgrades, solar water circulation, neonicotinoid recognition, deposit-label rules, product lifecycle impacts, and household environmental decisions. Positive long documents may be government pages, energy-lab documents, practical guides, encyclopedia pages, or product and program references.
The relevance relation is source-level evidence support. The document is positive because it contains the necessary environmental, technical, or regulatory evidence, not because every section is relevant.
Representative Failure Modes
Likely failures include retrieving a long environmental page with broad vocabulary but no decision evidence, over-ranking a product page that lacks lifecycle or regulatory detail, missing official guidance because the query uses everyday wording, and confusing adjacent impact categories such as energy use, emissions, waste, and toxicity.
BM25 is exposed to long-page term dilution. Dense retrieval can still prefer a semantically plausible source with insufficient evidence. Hybrid retrieval improves coverage but needs reranking to recover dense-like top precision.
Training Data That May Help
Useful training data includes long environmental report retrieval, document-level sustainability QA, cited-source retrieval from environmental forums, and passage-to-full-page supervision that maps evidence spans to source documents.
Synthetic data should generate long environmental pages about products, materials, energy, lifecycle impacts, and policy caveats. Questions should ask practical comparisons or decision criteria. Hard negatives should cover the same product or impact category but fail to answer the exact question.
Model Improvement Notes
Strong systems should combine practical decision understanding with source-page evidence retrieval. Dense retrieval is the best observed first-stage ranker, while reranking_hybrid gives the broadest candidate coverage. A useful production system would retrieve with a hybrid pool and rerank or extract the section that directly supports the decision.
The task is a good probe for whether retrieval models can support environmental advice with evidence rather than merely matching green-living vocabulary.
Example Data
| Query | Positive document |
| More uses for bacon grease We (my family) consume good amounts of bacon which produce a lot of bacon grease. I don't like wasting anything that I could reuse or repurpose, including this. I use this byproduct in many different ways, including: cooking. Filtered it can be used in cooking other foods or greasing the pans. pet food. Mixed with other foods, it is a good addition to the animals' diet. lubricant. Good for certain tools, or snow sleds. candles. Good source of light while camping or in... [500 / 604 chars] |  * Organic Gardening * Fruits * Garden Planning * Garden Tools * Gardening Techniques * Herbs * Ornamentals * Pest Control * Vegetables * Homesteading & Livestock * [ Livestoc... [1,000 / 35,063 chars] |
| Determining carbon reduction vs cost of various home upgrades I've done some amount of upgrades to my house to reduce my overall carbon emissions, and reading online there are all kinds of suggestions for doing even more: Replace my natural gas water heater with electric Put solar panels on the roof Buy wind energy credits Other kinds of carbon offsets Geothermal Don't replace my electric oven with gas (which I had been thinking about, since I hear how great they are for cooking) And of course I... [500 / 2,158 chars] | Skip to main content  Toggle Search Search NREL.gov Search Buildings Menu * Research * Research __ * Building Energy Modeling * Communities & Urban Districts * Extreme Climates * Industrialized Construction * Lighting * Resilient Buildings * Sensors & Controls * Systems Technologies * Thermal Energy Storage * Windows * Workforce Development * Staff * Publications * Publications __ * [ Newsle... [1,000 / 17,158 chars] |
| Forcing water circulation in solar hot water installation I'm planning an installation for heating water using solar "exchanger" panels (solar used to heat water directly, not through electricity). I don't want to bind the reservoir, panel and tap locations to the natural circulation cycle though (hot water traveling up etc). So, in order for this to work, I'd have to force some very slow water circulation between the reservoir and the panels; a pump that takes very little power and provides ver... [500 / 1,071 chars] | Skip to content Username or Email Address Password Remember Me Lost your password? Search for: Search _ _ Search Cart _ _  * Hardware Menu Toggle * Whole Heater Packages * Firebox Doors * Oven Doors * [ Cleanout Doors & Ash Doors ](https://www.firespeaking.co... [1,000 / 19,918 chars] |
Source Reference Table
| Item | Reference |
| Original benchmark paper | BRIGHT |
| Project page | BRIGHT project page |
| Source dataset | xlangai/BRIGHT |
| NanoBRIGHT dataset | hakari-bench/NanoBRIGHT |
Representative query and positive source snippets:
| Query | Positive document snippet |
| What are additional uses for bacon grease instead of wasting it? | A long practical living article discusses reuse of rendered fats and related household applications. |
| Which home upgrades reduce carbon most cost-effectively? | A long NREL-style page describes building energy optimization and efficiency package evaluation. |
| How should water circulate in a solar hot water installation? | A long plumbing guide discusses thermosiphon loop and solar hot water details. |
| How can products containing neonicotinoid pesticides be recognized? | A long reference page explains neonicotinoid products, markets, and pesticide context. |
| Why do some sparkling or mineral water cans not carry deposit labels? | A government environmental page lists beverage deposit categories and exclusions. |
Dataset Information
| Field | Value |
| Nano set | NanoBRIGHT |
| Backing dataset | NanoBRIGHT |
| Task / split | NanoBrightSustainableLivingLong |
| Hugging Face dataset | hakari-bench/NanoBRIGHT |
| Language | en |
| Category | natural_language |
| Queries | 108 |
| Documents | 551 |
| Positive qrels | 129 |
| Positives / query avg | 1.19 |
| Positives / query min | 1 |
| Positives / query median | 1.00 |
| Positives / query max | 5 |
| Multi-positive queries | 15 (13.89%) |
| Query length avg chars | 682.84 |
| Document length avg chars | 38,204.30 |
Candidate Subsets
| Profile | Config | nDCG@10 | Hit@10 | Recall@100 | Candidates |
| BM25 | bm25 | 0.3277 | 0.5000 | 0.8992 | top-500 |
| Dense | harrier_oss_v1_270m | 0.5501 | 0.7870 | 0.9690 | top-500 |
| Reranking hybrid | reranking_hybrid | 0.4436 | 0.6852 | 0.9845 | top-100 |
Training and Leakage Metadata
- Original train split: unknown
- Evaluation split origin: BRIGHT Sustainable Living long-document evaluation split
- Train/eval overlap audit: not_audited
- Leakage note: exclude NanoBRIGHT SustainableLivingLong queries and full cited source pages
- Multi-positive training: multi_positive_objective
- Useful training data: long environmental report retrieval, document-level sustainability QA, cited-source retrieval from environmental forums