NanoLaw
Overview
NanoLaw is a compact legal retrieval group spanning English, German, and Chinese legal data. It includes Indian precedent and statute retrieval, German legal passage and QA retrieval, Chinese criminal-case retrieval, LegalBench-derived consumer-contract and corporate-lobbying retrieval, and plain-English contract-summary retrieval.
The group is useful because legal retrieval is not one search pattern. Some tasks map long fact scenarios to cases or statutes. Others match contract questions to clauses, bill descriptions to summaries, German questions to judgments, or Chinese criminal cases to related cases. A model can be topically close and still be wrong if it misses jurisdiction, legal role, statutory function, contract obligation, procedural posture, or case analogy. BM25, dense retrieval, and reranking_hybrid expose different legal matching behaviors.
What This Group Measures
NanoLaw draws from several legal NLP resources rather than a single benchmark. AILA-style tasks measure Indian legal assistance retrieval. GerDaLIR and LegalQuAD measure German legal information access. LeCaRDv2 measures Chinese legal case retrieval. LegalBench contributes consumer-contract and corporate lobbying tasks, and LegalSummarization turns contract simplification into summary-to-clause retrieval.
The shared measurement target is legal relevance. The positive document must support the requested legal relation, not merely share broad topic terms. That relation can be precedent analogy, statutory applicability, contractual right, legislative policy match, or related-case reasoning.
Task Families
- Scenario to law: NanoAILACasedocs and NanoAILAStatutes map long Indian legal fact patterns to cases or statutory provisions.
- Long legal document retrieval: NanoGerDaLIRSmall, NanoLegalQuAD, and NanoLeCaRDv2 retrieve German or Chinese legal decisions and related cases.
- Contract retrieval: NanoLegalBenchConsumerContractsQA and NanoLegalSummarization retrieve clauses or snippets for consumer questions and simplified contract summaries.
- Legislative retrieval: NanoLegalBenchCorporateLobbying retrieves bill titles or summaries from policy descriptions.
Dataset Shape
NanoLaw contains 8 task pages, 1,259 queries, 15,142 split-local documents, and 5,488 positive qrel rows (about 4.36 positives per query). Relevance density varies sharply. AILA, LeCaRDv2, and LegalSummarization are multi-positive; the LegalBench and LegalQuAD tasks are single-positive, and NanoGerDaLIRSmall is nearly so (235 positives for 200 queries). NanoLeCaRDv2 dominates the qrel count: its 3,896 positive rows, about 24.5 related cases per query, make up roughly 71% of the group total.
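The relevance-density figures can be recomputed directly from the task table. The sketch below is illustrative only: the `TASKS` dictionary and `density_report` function are hypothetical helpers, not NanoLaw tooling, and the counts are copied from the Task Summary table.

```python
# Per-task counts copied from the NanoLaw task table: (queries, docs, positive qrel rows).
TASKS = {
    "NanoAILACasedocs": (50, 186, 195),
    "NanoAILAStatutes": (50, 82, 217),
    "NanoGerDaLIRSmall": (200, 9969, 235),
    "NanoLeCaRDv2": (159, 3795, 3896),
    "NanoLegalBenchConsumerContractsQA": (200, 153, 200),
    "NanoLegalBenchCorporateLobbying": (200, 319, 200),
    "NanoLegalQuAD": (200, 200, 200),
    "NanoLegalSummarization": (200, 438, 345),
}


def density_report(tasks):
    """Return group totals [queries, docs, positives] and positives per query for each task."""
    totals = [sum(column) for column in zip(*tasks.values())]
    per_task = {name: pos / q for name, (q, _docs, pos) in tasks.items()}
    return totals, per_task


totals, per_task = density_report(TASKS)
print("queries, docs, positives:", totals)
print(f"group average positives/query: {totals[2] / totals[0]:.2f}")
# Sort tasks from most to least positives per query.
for name, ratio in sorted(per_task.items(), key=lambda kv: -kv[1]):
    print(f"{name:36s} {ratio:6.2f}")
```

The group average hides the skew: one Chinese task carries most of the qrels, while four tasks have exactly one positive per query.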
The text profile is broad. AILA and LeCaRDv2 queries are long legal narratives. German legal documents can average around 19,000 characters. Contract and legislative tasks are shorter but require precise clause or bill matching. This makes NanoLaw both a legal reasoning benchmark and a long-document retrieval benchmark.
Retrieval Behavior
BM25 Profile
BM25 is strongest when legal formulas, bill phrases, German legal terms, or contract keywords repeat verbatim between query and document. It leads on NanoGerDaLIRSmall (0.5911 nDCG@10 vs. 0.2405 for dense) and NanoLegalQuAD, and it stays close to the best profile on corporate lobbying (0.8955 vs. 0.9108 for dense). This reflects the importance of exact legal vocabulary, citations, and statutory phrasing.
BM25 is weaker on AILA scenario-to-law tasks because long fact patterns imply statutory or precedent relevance without necessarily repeating the authority's language. It can also over-rank contract or case documents that share topic words but miss the decisive legal relation.
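A minimal Okapi BM25 sketch makes this failure mode concrete: a query term contributes nothing unless it appears in the document. The two "statute" strings and both queries are invented for illustration, the whitespace tokenizer is a simplifying assumption, and `k1=1.5`, `b=0.75` are commonly used values rather than NanoLaw's actual settings.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25 (Lucene-style idf).

    Tokenization is naive lowercase whitespace splitting; real legal text
    needs a language-appropriate tokenizer (German compounds, Chinese).
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # absent terms earn zero credit, however related in meaning
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy provisions; the first is the "relevant" one for both queries.
statutes = [
    "whoever commits murder shall be punished with imprisonment for life",
    "whoever commits theft shall be punished with imprisonment",
]
exact = bm25_scores("punished for murder", statutes)
facts = bm25_scores("the accused stabbed the victim who died at the scene", statutes)
print(exact)
print(facts)
```

The query that reuses statutory wording ranks the murder provision first, while the fact-pattern query scores zero against both provisions even though it describes a killing. That is the AILA scenario-to-law gap in miniature.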
Dense Profile
Dense retrieval helps with legal paraphrase and fact-to-authority mapping. Relative to BM25, it improves both AILA tasks, NanoLeCaRDv2, consumer-contract QA, corporate lobbying, and LegalSummarization. It is especially useful when the query is a plain-English question or a factual narrative and the target is written in legal or contractual language.
Dense retrieval is not always enough. German long-document tasks show that exact legal terminology can outperform broad semantic matching. Legal retrieval often requires preserving precise words, names, sections, and citations.
Reranking Hybrid Profile
reranking_hybrid is best on NanoLeCaRDv2, NanoLegalBenchConsumerContractsQA, and NanoLegalSummarization. These tasks benefit from combining exact legal terms with semantic or analogical matching. Hybrid is also useful where a reranker needs candidate diversity from both sparse and dense retrieval.
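This page does not specify how reranking_hybrid builds its candidate list. A minimal sketch, assuming it pools the top results from a sparse and a dense ranking and then reorders the pool with a reranker, looks like this; `pool_candidates`, `rerank`, the document ids, and the toy reranker scores are all hypothetical.

```python
from typing import Callable, Sequence


def pool_candidates(sparse: Sequence[str], dense: Sequence[str], depth: int) -> list[str]:
    """Interleave the top-`depth` ids from each ranking, keeping first occurrences only."""
    pooled, seen = [], set()
    for i in range(depth):
        for ranking in (sparse, dense):
            if i < len(ranking) and ranking[i] not in seen:
                seen.add(ranking[i])
                pooled.append(ranking[i])
    return pooled


def rerank(query: str, candidates: list[str], score: Callable[[str, str], float]) -> list[str]:
    """Reorder pooled candidates by a reranker score, highest first."""
    return sorted(candidates, key=lambda doc_id: score(query, doc_id), reverse=True)


# Hypothetical first-stage rankings for one Chinese criminal-case query.
sparse_run = ["c12", "c07", "c33"]  # exact charge and statute terms
dense_run = ["c07", "c41", "c12"]   # semantically similar fact patterns
pooled = pool_candidates(sparse_run, dense_run, depth=3)

# Stand-in for a cross-encoder: fixed scores per document id.
toy_scores = {"c41": 0.9, "c07": 0.7, "c12": 0.4, "c33": 0.1}
reranked = rerank("query text", pooled, lambda q, d: toy_scores[d])
print(pooled)
print(reranked)
```

Here the reranker's top document (`c41`) came only from the dense run and `c33` only from the sparse run; a reranker fed by either retriever alone would never have seen one of them.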
For reranking, read multi-positive legal tasks with Recall@100 in mind: a reranker can only reorder what first-stage retrieval returns, and a system that retrieves one plausible case or clause may still miss other valid authorities.
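The gap between a decent nDCG@10 and a low recall is easy to see with both metrics side by side. This sketch assumes binary relevance (graded judgments would change the gains), and the query and document ids are hypothetical.

```python
import math


def ndcg_at_k(ranking, relevant, k=10):
    """Binary-relevance nDCG@k: each hit at 0-based rank r gains 1 / log2(r + 2)."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, doc_id in enumerate(ranking[:k]) if doc_id in relevant)
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0


def recall_at_k(ranking, relevant, k=100):
    """Fraction of all relevant documents that appear in the top k."""
    return len(set(ranking[:k]) & set(relevant)) / len(relevant) if relevant else 0.0


# A multi-positive query with five valid authorities; the system finds only one, at rank 1.
relevant = {"case_a", "case_b", "case_c", "case_d", "case_e"}
ranking = ["case_a"] + [f"other_{i}" for i in range(99)]

multi_ndcg = ndcg_at_k(ranking, relevant)
multi_recall = recall_at_k(ranking, relevant)
# The same ranking judged against a single-positive query looks perfect.
single_ndcg = ndcg_at_k(ranking, {"case_a"})
print(f"multi-positive: nDCG@10={multi_ndcg:.3f}, Recall@100={multi_recall:.2f}")
print(f"single-positive: nDCG@10={single_ndcg:.3f}")
```

With five positives, one hit at rank 1 yields an nDCG@10 of about 0.34 but a Recall@100 of only 0.2, and no reranker can recover the four authorities that never entered the candidate list.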
Task Summary
| Task | Retrieval focus | Lang | Queries | Docs | Positives | BM25 nDCG@10 | Dense nDCG@10 | Reranking hybrid nDCG@10 | Best profile |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NanoAILACasedocs | legal fact pattern to precedent case | en | 50 | 186 | 195 | 0.2805 | 0.4003 | 0.3667 | Dense |
| NanoAILAStatutes | legal fact pattern to statute | en | 50 | 82 | 217 | 0.2070 | 0.2711 | 0.2564 | Dense |
| NanoGerDaLIRSmall | German legal passage to judgment | de | 200 | 9,969 | 235 | 0.5911 | 0.2405 | 0.4287 | BM25 |
| NanoLeCaRDv2 | Chinese criminal case to related cases | zh | 159 | 3,795 | 3,896 | 0.6528 | 0.6940 | 0.7225 | Reranking hybrid |
| NanoLegalBenchConsumerContractsQA | contract question to clause | en | 200 | 153 | 200 | 0.7556 | 0.7785 | 0.8054 | Reranking hybrid |
| NanoLegalBenchCorporateLobbying | policy description to bill summary | en | 200 | 319 | 200 | 0.8955 | 0.9108 | 0.9068 | Dense |
| NanoLegalQuAD | German legal question to judgment | de | 200 | 200 | 200 | 0.7420 | 0.5819 | 0.7043 | BM25 |
| NanoLegalSummarization | plain-English summary to contract snippet | en | 200 | 438 | 345 | 0.5678 | 0.5861 | 0.6085 | Reranking hybrid |
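The Best profile column is the argmax of the three nDCG@10 scores, and the margin over the runner-up says how decisive each win is. A small sketch, with scores copied from the table above and a hypothetical `best_profile` helper:

```python
# nDCG@10 per task, copied from the task table: (BM25, dense, reranking hybrid).
SCORES = {
    "NanoAILACasedocs": (0.2805, 0.4003, 0.3667),
    "NanoAILAStatutes": (0.2070, 0.2711, 0.2564),
    "NanoGerDaLIRSmall": (0.5911, 0.2405, 0.4287),
    "NanoLeCaRDv2": (0.6528, 0.6940, 0.7225),
    "NanoLegalBenchConsumerContractsQA": (0.7556, 0.7785, 0.8054),
    "NanoLegalBenchCorporateLobbying": (0.8955, 0.9108, 0.9068),
    "NanoLegalQuAD": (0.7420, 0.5819, 0.7043),
    "NanoLegalSummarization": (0.5678, 0.5861, 0.6085),
}
PROFILES = ("BM25", "Dense", "Reranking hybrid")


def best_profile(scores):
    """Return the highest-scoring profile and its margin over the runner-up."""
    ranked = sorted(zip(scores, PROFILES), reverse=True)
    (top, name), (second, _) = ranked[0], ranked[1]
    return name, round(top - second, 4)


for task, scores in SCORES.items():
    name, margin = best_profile(scores)
    print(f"{task:36s} {name:17s} +{margin:.4f}")
```

Margins range from about 0.16 on NanoGerDaLIRSmall (BM25) down to 0.004 on NanoLegalBenchCorporateLobbying, where dense and hybrid are effectively tied.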
Interpretation Notes for Model Researchers
NanoLaw should be interpreted by jurisdiction and legal relation. English contract retrieval, Indian scenario-to-law retrieval, German legal judgment retrieval, and Chinese related-case retrieval have different relevance rules. One overall score can hide whether a model is learning legal vocabulary, jurisdiction-specific structure, or broader semantic analogy.
The BM25/dense split is instructive. BM25-led German tasks show the value of exact legal terminology. Dense-led AILA tasks show fact-to-authority semantic matching. Hybrid-led Chinese and contract tasks show that both exact legal anchors and semantic relevance are needed for candidate generation.
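Grouping the reported scores by language shows how one overall number can hide the split. This sketch uses an unweighted mean over tasks, which is an assumption (a query-weighted mean would favor the 200-query tasks); the scores are copied from the task table and the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

PROFILES = ("BM25", "Dense", "Reranking hybrid")
# (language, BM25, dense, reranking hybrid) nDCG@10 per task, from the task table.
ROWS = {
    "NanoAILACasedocs": ("en", 0.2805, 0.4003, 0.3667),
    "NanoAILAStatutes": ("en", 0.2070, 0.2711, 0.2564),
    "NanoGerDaLIRSmall": ("de", 0.5911, 0.2405, 0.4287),
    "NanoLeCaRDv2": ("zh", 0.6528, 0.6940, 0.7225),
    "NanoLegalBenchConsumerContractsQA": ("en", 0.7556, 0.7785, 0.8054),
    "NanoLegalBenchCorporateLobbying": ("en", 0.8955, 0.9108, 0.9068),
    "NanoLegalQuAD": ("de", 0.7420, 0.5819, 0.7043),
    "NanoLegalSummarization": ("en", 0.5678, 0.5861, 0.6085),
}


def macro_by_language(rows):
    """Unweighted mean nDCG@10 per profile within each language."""
    grouped = defaultdict(list)
    for lang, *scores in rows.values():
        grouped[lang].append(scores)
    return {lang: [mean(col) for col in zip(*lists)] for lang, lists in grouped.items()}


def overall(rows):
    """Unweighted mean nDCG@10 per profile across all tasks."""
    return [mean(col) for col in zip(*(row[1:] for row in rows.values()))]


def leader(means):
    """Name of the profile with the highest mean."""
    return PROFILES[means.index(max(means))]


by_lang = macro_by_language(ROWS)
leaders = {lang: leader(means) for lang, means in by_lang.items()}
overall_leader = leader(overall(ROWS))
for lang, means in sorted(by_lang.items()):
    print(lang, [f"{m:.4f}" for m in means], leaders[lang])
print("overall:", [f"{m:.4f}" for m in overall(ROWS)], overall_leader)
```

Reranking hybrid leads the unweighted overall mean, yet BM25 leads the German tasks and dense narrowly leads the English tasks (by under 0.001 over hybrid), so the headline number says little about any single jurisdiction.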
Training and Leakage Notes
Useful training data includes jurisdiction-specific case retrieval, fact-to-statute retrieval, citation prediction, German legal QA, Chinese related-case retrieval, consumer-contract QA, contract clause retrieval, and legislative search. Hard negatives should share statutes, charges, agencies, contract topics, or legal vocabulary while failing the decisive legal relation.
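One way to act on the hard-negative advice is to rank non-positive documents by how many legal anchors they share with the query. This is a sketch under loud assumptions: anchor extraction (statute sections, charges, agencies) is taken as already done, and the function name, case ids, and Indian Penal Code section labels are illustrative, not drawn from NanoLaw.

```python
def mine_hard_negatives(query_anchors, candidates, positives, n=3):
    """Pick non-positive documents that share the most legal anchors with the query.

    `candidates` maps doc id -> set of anchors (e.g. statute sections or charges).
    Documents with no shared anchor are skipped: they are easy negatives.
    """
    scored = [
        (len(query_anchors & anchors), doc_id)
        for doc_id, anchors in candidates.items()
        if doc_id not in positives and query_anchors & anchors
    ]
    # Most shared anchors first; ties broken by id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:n]]


# Illustrative anchors for a fact pattern involving murder with common intention.
query_anchors = {"IPC 302", "IPC 34"}
candidates = {
    "case_1": {"IPC 302", "IPC 34"},
    "case_2": {"IPC 302"},
    "case_3": {"IPC 379"},
    "case_4": {"IPC 302", "IPC 34", "IPC 120B"},
}
negatives = mine_hard_negatives(query_anchors, candidates, positives={"case_4"})
print(negatives)
```

Anchor overlap alone cannot confirm that a candidate fails the decisive legal relation. In multi-positive tasks especially, some mined "negatives" may be unlabeled positives, so a manual or model-based check is prudent before training on them.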
Exclude NanoLaw evaluation queries, positives, qrels, legal cases, statutes, contract clauses, bill summaries, and direct synthetic variants. Legal datasets often reuse public benchmark splits, so source and text-overlap audits are necessary before training.
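A first-pass text-overlap audit can be as simple as word n-gram containment: the share of an evaluation text's shingles that also occur in a training text. In the sketch below the function names, `n`, and `threshold` are arbitrary assumptions. For Chinese text, which has no spaces, character n-grams would be a better unit, and at corpus scale a MinHash-style near-duplicate index would replace the quadratic loop.

```python
import re


def shingles(text, n=8):
    """Word n-gram shingles after lowercasing and stripping punctuation."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def flag_overlaps(train_texts, eval_texts, n=8, threshold=0.5):
    """Return (train_idx, eval_idx, containment) where at least `threshold` of the
    eval text's shingles also appear in the training text."""
    eval_shingles = [shingles(t, n) for t in eval_texts]
    hits = []
    for i, train in enumerate(train_texts):
        train_sh = shingles(train, n)
        for j, ev in enumerate(eval_shingles):
            if not ev:
                continue  # text shorter than n words: nothing to compare
            containment = len(ev & train_sh) / len(ev)
            if containment >= threshold:
                hits.append((i, j, round(containment, 3)))
    return hits


# Hypothetical clause texts: the first training text embeds the eval clause verbatim.
eval_texts = ["The lessee shall not sublet the premises without the prior written consent of the lessor."]
train_texts = [
    "Clause 4. The lessee shall not sublet the premises without the prior written consent of the lessor.",
    "The buyer may return goods within thirty days.",
]
hits = flag_overlaps(train_texts, eval_texts, n=5)
print(hits)
```

Containment rather than Jaccard similarity is used here so that a short evaluation clause buried inside a long training document is still flagged.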
Source Reference Table
| Source | Year | Type | URL |
| --- | --- | --- | --- |
| Overview of the FIRE 2019 AILA Track: Artificial Intelligence for Legal Assistance | 2019 | paper | https://ceur-ws.org/Vol-2517/T1-1.pdf |
| LeCaRDv2: A Large-Scale Chinese Legal Case Retrieval Dataset | 2023 | paper | https://arxiv.org/abs/2310.17609 |
| LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models | 2023 | paper | https://arxiv.org/abs/2308.11462 |
Metadata Summary
| Field | Value |
| --- | --- |
| Task pages | 8 |
| Queries | 1,259 |
| Split-local documents | 15,142 |
| Positive qrels | 5,488 |
| Languages | de, en, zh |
| Categories | natural_language |
| Positives / query avg | 4.36 |
Task Metadata Summary
| Task | Backing dataset | Lang | Category | Queries | Docs | Positives | BM25 nDCG@10 | Dense nDCG@10 | Reranking hybrid nDCG@10 | Best profile |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NanoAILACasedocs | NanoLaw | en | natural_language | 50 | 186 | 195 | 0.2805 | 0.4003 | 0.3667 | Dense |
| NanoAILAStatutes | NanoLaw | en | natural_language | 50 | 82 | 217 | 0.2070 | 0.2711 | 0.2564 | Dense |
| NanoGerDaLIRSmall | NanoLaw | de | natural_language | 200 | 9,969 | 235 | 0.5911 | 0.2405 | 0.4287 | BM25 |
| NanoLeCaRDv2 | NanoLaw | zh | natural_language | 159 | 3,795 | 3,896 | 0.6528 | 0.6940 | 0.7225 | Reranking hybrid |
| NanoLegalBenchConsumerContractsQA | NanoLaw | en | natural_language | 200 | 153 | 200 | 0.7556 | 0.7785 | 0.8054 | Reranking hybrid |
| NanoLegalBenchCorporateLobbying | NanoLaw | en | natural_language | 200 | 319 | 200 | 0.8955 | 0.9108 | 0.9068 | Dense |
| NanoLegalQuAD | NanoLaw | de | natural_language | 200 | 200 | 200 | 0.7420 | 0.5819 | 0.7043 | BM25 |
| NanoLegalSummarization | NanoLaw | en | natural_language | 200 | 438 | 345 | 0.5678 | 0.5861 | 0.6085 | Reranking hybrid |