NanoVNMTEB / fi_qa2018_vn

Overview

fi_qa2018_vn is the Vietnamese NanoVNMTEB version of the FiQA 2018 financial question-answer retrieval task. FiQA was introduced for financial opinion mining and question answering, and BEIR uses it as a domain-specific retrieval benchmark. In this VN-MTEB split, translated investor and personal-finance questions are used to retrieve translated answer posts or financial discussion snippets.

The Nano split contains 200 queries, 10,000 candidate documents, and 549 positive qrels. Queries average 69.43 characters, while documents average 811.03 characters. The task is finance-domain retrieval rather than generic QA: many questions depend on products, jurisdictions, account types, taxation, transfers, investments, and practical decision context. reranking_hybrid is strongest across nDCG@10, hit@10, and recall@100, with dense retrieval second and BM25 third. The task rewards systems that combine semantic financial reasoning with exact product, country, and terminology matching.

Details

What the Original Data Measures

The FiQA 2018 challenge includes an opinion-based financial QA task over answer posts. Questions ask about investment decisions, tax treatment, market terminology, personal-finance planning, company roles, transfers, inheritance, and financial instruments. In retrieval form, the goal is to find answer posts that address the same financial decision or interpretation.

The Vietnamese version translates the questions and answers while preserving finance-specific tokens such as product names, currencies, jurisdictions, account types, stock-market terms, and company roles. Relevance depends on matching the same financial situation, not just the same product: a document about bonds, for example, is relevant to a passive-investment or portfolio question only if it answers that specific question.

Observed Data Profile

The task has 549 positives across 200 queries. The average is 2.745 positives per query, the median is 2, and 129 queries have multiple positives, giving a multi-positive rate of 64.5%. The maximum positive count is 15. This is a moderately multi-positive task where several answer posts may address the same financial need.
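The profile statistics above can be recomputed from any qrels mapping. This sketch assumes qrels are held in memory as a dict from query id to a set of positive document ids (a hypothetical layout, not the dataset's on-disk format); the toy ids are illustrative only.

```python
from statistics import median


def positive_profile(qrels: dict[str, set[str]]) -> dict:
    """Summarize positives per query from a mapping of query id -> positive doc ids."""
    counts = [len(doc_ids) for doc_ids in qrels.values()]
    return {
        "queries": len(counts),
        "positives": sum(counts),
        "avg": sum(counts) / len(counts),
        "median": median(counts),
        "max": max(counts),
        # A query is multi-positive when it has more than one relevant document.
        "multi_positive_rate": sum(c > 1 for c in counts) / len(counts),
    }


# Toy qrels for illustration only; these ids are not from the benchmark.
toy_qrels = {
    "q1": {"d1"},
    "q2": {"d2", "d3"},
    "q3": {"d4", "d5", "d6"},
    "q4": {"d7", "d8"},
}
profile = positive_profile(toy_qrels)
print(profile)

# The card's own totals reproduce the reported average and multi-positive rate.
print(549 / 200, 129 / 200)  # 2.745 0.645
```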

Documents are often long and caveated. They may include assumptions about country, law, tax status, account type, product structure, or risk preference. A query may be short, but the relevant answer may explain conditions under which the advice applies. This makes retrieval sensitive to context and constraints.

BM25 Evaluation Profile

BM25 reaches nDCG@10 of 0.3388, hit@10 of 0.6000, and recall@100 of 0.5902 with a top-500 candidate set. Lexical retrieval can use strong terms such as ETF, Vanguard, director, China, shareholder return, tax, bid, ask, bond, or inheritance. These anchors help when the same financial product or term appears in the answer.

The limitation is that finance answers frequently use explanatory language rather than repeating the exact question. A relevant answer may discuss the mechanism behind a bid-ask spread, taxation rule, short-sale counterparty risk, or passive investment strategy using different wording. BM25 can also retrieve same-product documents that give advice for a different jurisdiction or decision.
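A minimal Okapi BM25 scorer illustrates both the strength and the limitation described above: shared tokens such as ETF or Vanguard contribute to the score, while a paraphrased answer with no shared tokens scores zero. The parameters k1=1.5 and b=0.75, the Lucene-style IDF, the whitespace tokenizer, and the toy documents are all assumptions; the benchmark's BM25 configuration is not stated here, and Vietnamese text would normally need word segmentation.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization (no Vietnamese word segmentation).
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25 (Lucene-style IDF)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # terms absent from this document contribute nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Toy documents (hypothetical): doc 1 paraphrases passive-investing advice without the query's tokens.
docs = [
    "vanguard etf fees are low for passive investors",
    "a broad index fund held for decades suits a hands-off strategy",
    "the bid ask spread is the gap between two quotes",
]
scores = bm25_scores("vanguard etf", docs)
print(scores)
```

The zero score for the paraphrased document is the gap that semantic retrieval is meant to close.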

Dense Evaluation Profile

Dense retrieval with harrier-oss-270m (config `harrier_oss_v1_270m`) reaches nDCG@10 of 0.4057, hit@10 of 0.6500, and recall@100 of 0.6612. It is clearly stronger than BM25, showing that semantic matching is important for finance QA. Dense retrieval can connect a financial question with an answer that explains the same situation even when the exact wording differs.

Dense retrieval is useful for questions about investment strategy, legal or tax interpretation, transfers, inheritance planning, and market mechanics. Its risk is that finance topics can be semantically close while requiring different advice. Same-product or same-country answers may be wrong if the account type, time horizon, legal status, or risk assumption differs.
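Dense retrieval ranks documents by vector similarity. The sketch below uses brute-force cosine similarity over hand-made toy vectors (hypothetical; real vectors would come from an encoder such as the harrier-oss model) to illustrate the risk described above: an answer written for the wrong jurisdiction can sit right next to the correct answer in embedding space.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dense_top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Brute-force dense retrieval: rank doc ids by cosine similarity to the query."""
    return sorted(doc_vecs, key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]), reverse=True)[:k]


# Hand-made toy embeddings (hypothetical), not produced by any real encoder.
query_vec = [1.0, 0.0, 0.2]
doc_vecs = {
    "same_situation": [0.9, 0.1, 0.2],
    "other_jurisdiction": [0.8, 0.3, 0.1],
    "unrelated": [0.0, 1.0, 0.0],
}
print(dense_top_k(query_vec, doc_vecs, k=2))  # ['same_situation', 'other_jurisdiction']
```

Similarity alone cannot tell the two top results apart on jurisdiction; that distinction has to come from exact-token evidence or a reranker.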

Reranking Hybrid Evaluation Profile

reranking_hybrid is strongest: nDCG@10 is 0.4118, hit@10 is 0.6700, and recall@100 is 0.7031. Its candidate pool is nominally top-100 with a mean candidate count of 100.105; there are 21 safeguard-positive rows, matching the 21 rows that contain 101 candidates instead of 100. Hybrid retrieval improves modestly over dense at the top ranks (+0.006 nDCG@10) and more clearly in recall@100 (+0.042).
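As a quick consistency check, and assuming one candidate row per query (200 rows), the reported mean candidate count matches 179 rows with 100 candidates plus 21 rows with 101:

```python
# Figures from the card; "one row per query" is an assumption.
rows_total = 200
rows_with_101 = 21
mean_candidates = ((rows_total - rows_with_101) * 100 + rows_with_101 * 101) / rows_total
print(mean_candidates)  # 100.105
```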

The result fits the domain. Dense retrieval captures financial intent, while sparse retrieval preserves product names, countries, currencies, and technical market terms. Hybrid search can recover answers that share a decisive product or jurisdiction token while still benefiting from semantic matching. The small top-rank gain over dense suggests that final reranking quality matters more than candidate expansion alone.
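The card does not specify how reranking_hybrid merges sparse and dense candidates. Reciprocal rank fusion (RRF) is one common way to do it and is shown here only as an illustrative assumption; k=60 is the constant commonly used with RRF, the doc ids are hypothetical, and the final reranking step over the fused pool (for example a cross-encoder) is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(runs: list[list[str]], k: int = 60, depth: int = 100) -> list[str]:
    """Fuse ranked doc-id lists: score(d) = sum over runs of 1 / (k + rank of d in that run)."""
    scores: dict[str, float] = defaultdict(float)
    for run in runs:
        for rank, doc_id in enumerate(run[:depth], start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; keep a pool of at most `depth` candidates.
    return sorted(scores, key=scores.get, reverse=True)[:depth]


# Toy runs (hypothetical doc ids): BM25 favors exact product tokens, dense favors intent.
bm25_run = ["vanguard_etf_answer", "generic_etf_answer", "tax_answer"]
dense_run = ["passive_strategy_answer", "vanguard_etf_answer", "bond_answer"]
fused = reciprocal_rank_fusion([bm25_run, dense_run])
print(fused)
```

A document found by both runs rises to the top, which mirrors the intuition above: an answer that matches both the financial intent and the exact product token is the strongest candidate.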

Metric Interpretation for Model Researchers

The metric ordering shows a domain where semantic similarity matters more than lexical overlap, but lexical constraints still improve coverage. BM25 alone is not enough: dense retrieval raises nDCG@10 from 0.3388 to 0.4057, a gain of about 0.067 (roughly 20% relative). reranking_hybrid is best, with its largest margin on recall@100, which suggests that exact finance tokens remain important for coverage.
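For readers comparing these numbers, the three metrics can be computed per query as below, assuming binary relevance labels; the reported figures are averages over the 200 queries. This is a standard formulation, not necessarily the evaluation harness's exact code.

```python
import math


def dcg(gains: list[int]) -> float:
    # Rank 1 is discounted by log2(2) = 1, rank 2 by log2(3), and so on.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranking: list[str], positives: set[str], k: int = 10) -> float:
    gains = [1 if doc_id in positives else 0 for doc_id in ranking[:k]]
    ideal = [1] * min(len(positives), k)
    return dcg(gains) / dcg(ideal) if ideal else 0.0


def hit_at_k(ranking: list[str], positives: set[str], k: int = 10) -> float:
    return 1.0 if any(doc_id in positives for doc_id in ranking[:k]) else 0.0


def recall_at_k(ranking: list[str], positives: set[str], k: int = 100) -> float:
    # Every query in this split has at least one positive, so the division is safe.
    return len(set(ranking[:k]) & positives) / len(positives)


# Toy query with two positives retrieved at ranks 2 and 4.
ranking = ["d3", "d1", "d9", "d2"]
positives = {"d1", "d2"}
print(round(ndcg_at_k(ranking, positives), 4), hit_at_k(ranking, positives), recall_at_k(ranking, positives))
```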

With a median of 2 positives per query and 64.5% of queries having more than one, multi-positive training is likely to help. Many questions have multiple acceptable answers, but those answers may differ in caveats or assumptions. A strong model should retrieve several answer posts while ranking highest those that match the query's decision context.
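The card lists `multi_positive_objective` without defining it. One possible multi-positive loss is an InfoNCE variant that puts the probability mass of all positives in the numerator; it is sketched here as an assumption, with a hypothetical temperature of 0.05. Another common variant averages a separate loss per positive.

```python
import math


def multi_positive_infonce(scores: list[float], is_positive: list[bool], temperature: float = 0.05) -> float:
    """-log( sum of exp(s/t) over positives / sum of exp(s/t) over all candidates ) for one query."""
    if not any(is_positive):
        raise ValueError("each training query needs at least one positive")
    scaled = [s / temperature for s in scores]
    shift = max(scaled)  # subtract the max before exp for numerical stability
    exps = [math.exp(x - shift) for x in scaled]
    positive_mass = sum(e for e, pos in zip(exps, is_positive) if pos)
    return -math.log(positive_mass / sum(exps))


# Toy similarity scores for one query: two acceptable answers, two negatives.
scores = [0.80, 0.75, 0.20, 0.10]
multi = multi_positive_infonce(scores, [True, True, False, False])
single = multi_positive_infonce(scores, [True, False, False, False])
print(multi < single)  # True: the second acceptable answer is not treated as a negative
```

The log-sum form rewards ranking any acceptable answer highly, which suits questions whose answers differ mainly in caveats.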

Query and Relevance Type Tendencies

Queries include market terminology, tax filing, short-selling consequences, passive-investment allocation, inheritance, international transfers, company directors invoicing their own company, real-estate investment, and shareholder-return headlines. Relevant documents often explain mechanisms or conditions rather than giving a short fact.

Relevance is context-sensitive. A financial answer can be wrong for the query if it assumes a different country, account structure, tax rule, or investment objective. The task therefore tests domain-aware retrieval with constraints, not just broad finance-topic similarity.

Representative Failure Modes

BM25 can retrieve same-term answers that do not match the jurisdiction or decision. Dense retrieval can retrieve advice-like answers from the same financial neighborhood but with incompatible assumptions. Hybrid retrieval can improve recall but still needs reranking to prefer answers with the right legal, product, and timing context.

Another failure mode is treating financial terms as generic semantics. Terms such as bid, ask, short seller, bond, ETF, and tax deduction have specific meanings. Models should preserve these meanings and avoid overly broad paraphrase.

Training Data That May Help

Useful training data includes official FiQA training pairs with overlap removed, Vietnamese personal-finance and investment QA, financial forum answer ranking data, and translated finance retrieval data. Data should include product names, currencies, countries, account types, and caveats.

Synthetic data can generate Vietnamese finance questions from answer posts, but it should include hard negatives that share the same product or country while giving different advice. Generation should also avoid benchmark content: finance questions can be specific enough that a synthetic question built from a test answer effectively leaks that test item.
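One simple way to mine the hard negatives described above is to pick non-positive documents that share product or country anchors with a positive answer, preferring those that share the most. The anchor lexicon, function names, and toy corpus are hypothetical. Shared-anchor documents can also be unlabeled positives, so mined negatives are worth filtering before training.

```python
# Hypothetical anchor lexicon; a real one would list Vietnamese and English product and country terms.
ANCHORS = {"etf", "vanguard", "bond", "ira", "uk", "canada", "vietnam"}


def anchor_terms(text: str) -> set[str]:
    return {tok.strip(".,?!()").lower() for tok in text.split()} & ANCHORS


def mine_hard_negatives(positive_text: str, corpus: dict[str, str], positive_ids: set[str],
                        max_negatives: int = 5) -> list[str]:
    """Return non-positive doc ids sharing at least one anchor with the positive, most shared first."""
    anchors = anchor_terms(positive_text)
    candidates = []
    for doc_id, text in corpus.items():
        if doc_id in positive_ids:
            continue
        shared = len(anchors & anchor_terms(text))
        if shared:
            candidates.append((shared, doc_id))
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in candidates[:max_negatives]]


corpus = {
    "pos": "In the UK, a Vanguard ETF inside an ISA avoids capital gains tax.",
    "neg_ca": "In Canada, a Vanguard ETF in a TFSA is taxed differently.",
    "neg_uk": "UK bank accounts pay little interest.",
    "off_topic": "How to negotiate rent with a landlord.",
}
print(mine_hard_negatives(corpus["pos"], corpus, {"pos"}))  # ['neg_ca', 'neg_uk']
```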

Model Improvement Notes

The main improvement direction is domain-aware hybrid retrieval. Dense retrieval should model the financial decision; sparse retrieval should preserve exact financial terms and jurisdictional clues. Rerankers should compare assumptions, caveats, and product context between query and answer.

Error analysis should separate failures by product mismatch, jurisdiction mismatch, missing caveats, and generic topic confusion. Because financial content can be stale or jurisdiction-specific, benchmark use should remain separate from real advice generation.
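A rule-based first pass can pre-sort failures into the buckets named above before manual review. Everything here is hypothetical: the lexicons are tiny English stand-ins, the input is assumed to be (query, top-ranked non-relevant document) pairs, and missing caveats cannot be judged by keyword rules, so same-anchor failures go to a manual-review bucket where caveats can be checked.

```python
from collections import Counter

# Hypothetical lexicons for a first-pass tagger.
JURISDICTIONS = {"usa", "uk", "canada", "india", "vietnam"}
PRODUCTS = {"etf", "bond", "ira", "401k", "mortgage", "stock"}


def terms(text: str, vocab: set[str]) -> set[str]:
    return {tok.strip(".,?!()").lower() for tok in text.split()} & vocab


def tag_failure(query: str, top_wrong_doc: str) -> str:
    q_j, d_j = terms(query, JURISDICTIONS), terms(top_wrong_doc, JURISDICTIONS)
    if q_j and d_j and not (q_j & d_j):
        return "jurisdiction_mismatch"
    q_p, d_p = terms(query, PRODUCTS), terms(top_wrong_doc, PRODUCTS)
    if q_p and d_p and not (q_p & d_p):
        return "product_mismatch"
    if (q_p & d_p) or (q_j & d_j):
        return "needs_manual_review"  # same anchors: check caveats and assumptions by hand
    return "generic_topic_confusion"


failures = [
    ("Tax on selling an ETF in the UK", "In Canada, selling an ETF triggers capital gains."),
    ("Should I buy a bond fund?", "Mortgage rates depend on your credit score."),
    ("Is an ETF good for retirement?", "An ETF can be traded during market hours."),
    ("How do I budget for groceries?", "Tips for negotiating rent with a landlord."),
]
tags = Counter(tag_failure(q, d) for q, d in failures)
print(tags)
```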

Example Data

Query	Positive document
"Sell on ask", "sell on bid" trong chứng khoán là gì? [53 chars]	Giá mua (bid) và giá bán (ask) là mức giá cao nhất để mua và thấp nhất để bán trên thị trường, điều đó không có nghĩa là bạn chỉ nên mua/bán ở mức giá này. Tuy nhiên bạn có thể mua/bán theo mức giá mà bạn muốn mặc dù việc thực hiện ở một mức giá khác với giá mua/bán thường khó khăn hơn vì các nhà đầu tư khác sẽ hướng tới mức giá mua/bán đang được áp dụng. Về lý thuyết, bạn có thể mua ở mức giá ask và bán ở mức giá bid, tuy nhiên việc có thực hiện được hay không lại là một vấn đề khác. [489 chars]
Giải thích chi phí sinh viên - Để khai thuế cho năm tiếp theo [61 chars]	Giả sử ở đây bạn đang nói về việc khấu trừ học phí của bạn như một khoản khấu trừ dưới mức cơ bản như một chi phí kinh doanh hoặc tương tự, sau đó nó phụ thuộc. Theo 1.162-5, nếu giáo dục: Sau đó nó được coi là một chi phí kinh doanh hợp pháp và có thể khấu trừ. Nếu không - nếu bạn đi học để theo đuổi một nghề nghiệp khác, chẳng hạn như người làm việc như một người bồi bàn nhưng đi học để lấy bằng y tá, hoặc người làm việc như một giáo viên lấy bằng luật - sau đó không phải; bạn sẽ phải đủ điều kiện theo một trong các khoản tín dụng khác (nhưng ít phức tạp hơn). Đọc thêm về chủ đề này tại Chủ đề thuế 513. Lưu ý rằng khoản khấu trừ phổ biến nhất khác - Khoản khấu trừ học phí và lệ phí trên mức cơ bản - hết hạn vào năm 2016 và không áp dụng (vẫn chưa?) năm 2017, và hơn nữa sẽ không yêu cầu hầu hết những gì bạn mô tả vì nó chỉ tính học phí và lệ phí trả trực tiếp cho cơ sở giáo dục và yêu cầu như một điều kiện tham dự, vì vậy sách, đỗ xe, v.v. không tính. [966 chars]
Điều gì xảy ra với "người mua dài" của một cổ phiếu khi người bán ngắn khác thất bại (đó là, thua lỗ không giới hạn phá sản người bán ngắn) [139 chars]	Nếu không có gì tinh tế mà tôi bỏ lỡ, chẳng có gì xảy ra với người mua. Giả sử Alice muốn bán khống 1000 cổ phiếu XYZ ở mức $5. Cô ta vay cổ phiếu từ Bob và bán chúng cho Charlie. Bây giờ Charlie thực sự sở hữu cổ phiếu; chúng nằm trong tài khoản của anh ta. Nếu sau này cổ phiếu tăng lên đến $10, Charlie sẽ vui mừng; anh ta có thể bán những cổ phiếu mà anh ta hiện đang sở hữu, và thu được lợi nhuận $5000. Alice vẫn còn số tiền $5000 mà cô ta nhận được từ việc bán khống, nhưng cô ta nợ 1000 cổ phiếu cho Bob. Như vậy cô ta đã nợ $5000. Nếu Bob đòi lại khoản vay, cô ta sẽ phải tìm cách để có thêm $5000 để mua 1000 cổ phiếu ở mức giá $10 trên thị trường mở. Nếu cô ta không làm được, thì đó là chuyện giữa cô ta và Bob. Có thể cô ta sẽ phá sản và Bob sẽ phải ghi nhận khoản lỗ. Nhưng tất cả những điều này đều không ảnh hưởng đến Charlie! Anh ta đã có được số cổ phiếu mà anh ta đã trả tiền, và không ai có thể lấy chúng đi khỏi anh ta. Anh ta không có lý do gì để quan tâm đến nguồn gốc của chún... [1,000 / 1,159 chars]

Source Reference Table

Source	Role
FiQA 2018	Original financial opinion mining and QA challenge
FiQA project page	Challenge and dataset context
BEIR	Retrieval benchmark framing
VN-MTEB	Vietnamese benchmark collection using translated retrieval tasks
GreenNode dataset card	Public dataset entry for this Vietnamese split

Dataset Information

Field	Value
Nano set	NanoVNMTEB
Backing dataset	NanoVNMTEB
Task / split	fi_qa2018_vn
Hugging Face dataset	hakari-bench/NanoVNMTEB
Language	vi
Category	natural_language
Queries	200
Documents	10,000
Positive qrels	549
Positives / query avg	2.75
Positives / query min	1
Positives / query median	2.00
Positives / query max	15
Multi-positive queries	129 (64.50%)
Query length avg chars	69.43
Document length avg chars	811.03

Candidate Subsets

Profile	Config	nDCG@10	Hit@10	Recall@100	Candidates
BM25	`bm25`	0.3388	0.6000	0.5902	top-500
Dense	`harrier_oss_v1_270m`	0.4057	0.6500	0.6612	top-500
Reranking hybrid	`reranking_hybrid`	0.4118	0.6700	0.7031	top-100

Training and Leakage Metadata

Original train split: available
Evaluation split origin: translated VN-MTEB FiQA2018 test split from GreenNode/fiqa-vn
Train/eval overlap audit: not_audited
Leakage note: Exclude translated FiQA-VN test questions, qrels, and positive answers used by this Nano split.
Multi-positive training: multi_positive_objective
Useful training data: official FiQA training question-answer pairs with overlap removed, Vietnamese personal-finance and investment QA, finance-domain forum answer ranking data, translated financial QA data with overlap removed
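The leakage note above asks for exclusion of translated FiQA-VN test questions and positive answers. A minimal exact-match filter after Unicode NFC normalization, case folding, and whitespace collapsing is sketched below; the function names and toy strings are hypothetical, and exact matching will miss paraphrased or differently translated duplicates, which would need fuzzy or embedding-based matching.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    # NFC puts Vietnamese diacritics in one canonical form before comparison.
    text = unicodedata.normalize("NFC", text).casefold()
    return re.sub(r"\s+", " ", text).strip()


def remove_overlap(train_pairs: list[tuple[str, str]], eval_queries: list[str],
                   eval_positive_docs: list[str]) -> list[tuple[str, str]]:
    """Drop training (query, answer) pairs whose query or answer matches evaluation content."""
    blocked_queries = {normalize(q) for q in eval_queries}
    blocked_docs = {normalize(d) for d in eval_positive_docs}
    return [
        (q, a) for q, a in train_pairs
        if normalize(q) not in blocked_queries and normalize(a) not in blocked_docs
    ]


eval_queries = ["Sell on ask, sell on bid trong chứng khoán là gì?"]
train_pairs = [
    # Same question as the eval query, but in decomposed (NFD) form with different case and spacing.
    (unicodedata.normalize("NFD", "  SELL on ask,  sell on bid trong chứng khoán là gì? "), "answer a"),
    ("Quỹ ETF là gì?", "answer b"),
]
kept = remove_overlap(train_pairs, eval_queries, eval_positive_docs=[])
print(kept)  # [('Quỹ ETF là gì?', 'answer b')]
```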