Researchers are grappling with the persistent problem of “hallucinations,” in which financial retrieval-augmented generation (RAG) systems produce factually incorrect information. Ant Group’s Taoye Ying, Haoyuan Hu, Yaxin Fan, along with Xinhao Chen, Xinya Wu, Kai Deng, and others, introduce a reinforcement learning framework powered by fine-grained knowledge verification (RLFKV) to address this problem. Their work matters because it goes beyond scoring a response as a whole: it breaks financial answers down into individual units of knowledge and validates each one against the source documents. Experiments on both public and newly created datasets indicate that this fine-grained approach gives the model more precise feedback, improves factual consistency, and reduces the generation of misleading financial information.
The approach decomposes financial responses into individual knowledge units and evaluates the correctness of each unit against the retrieved information, providing more precise signals for model optimization.
At the heart of RLFKV is the ability to break down complex financial answers into “atomic knowledge units” that represent minimal, self-contained financial facts. Each unit is validated against the retrieved documents, producing a fine-grained reward that directly improves alignment with the source material.
This fine-grained approach goes beyond traditional reinforcement learning methods that rely on costly human annotations and coarse binary reward signals. Additionally, to prevent the model from producing overly terse responses as a shortcut to higher rewards, the framework incorporates an “informativeness reward” that requires the policy model to retain at least as many knowledge units as the baseline model.
Experiments on the publicly available Financial Data Description (FDD) task and the newly created FDD-ANT dataset show consistent improvements in accuracy and fidelity. These results support the effectiveness of RLFKV in reducing hallucinations and increasing the reliability of financial RAG systems.
This advance is particularly important because financial inquiries are time-sensitive and even a slight inaccuracy can have significant consequences. Because the framework operates without human-annotated reference answers, it lowers operating costs and is easier to scale.
The study describes a system that uses a financial quadruple structure (entity, metric, value, timestamp) to capture the smallest unit of knowledge in financial documents. This design addresses the strict temporal sensitivity and quantitative nature of financial data, making the evaluation process more robust and accurate. By using specialized prompts to guide the evaluation model, the system decomposes responses, verifies factual consistency, and ultimately yields more reliable and useful output.
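The quadruple structure can be pictured as a small record type. The sketch below is illustrative only: the class name `KnowledgeUnit`, the helper `is_supported`, and the sample company and figures are hypothetical, and exact matching stands in for the prompt-based verification that the study performs with an evaluation model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeUnit:
    """One atomic financial fact: (entity, metric, value, timestamp)."""
    entity: str
    metric: str
    value: str
    timestamp: str


def is_supported(unit: KnowledgeUnit, document_facts: set) -> bool:
    # Assumption: exact match against facts extracted from the retrieved
    # documents. The study uses an LLM evaluator for this step instead.
    return unit in document_facts


# Hypothetical facts extracted from the retrieved documents.
doc_facts = {
    KnowledgeUnit("ExampleCorp", "closing price", "12.50", "2025-03-03"),
    KnowledgeUnit("ExampleCorp", "trading volume", "1.2M", "2025-03-03"),
}

# Hypothetical units decomposed from a generated response.
response_units = [
    KnowledgeUnit("ExampleCorp", "closing price", "12.50", "2025-03-03"),
    KnowledgeUnit("ExampleCorp", "closing price", "12.50", "2025-03-04"),
]

verdicts = [is_supported(u, doc_facts) for u in response_units]
print(verdicts)
```

The second unit differs from the source only in its timestamp, which is enough for it to count as unsupported. That is the kind of temporal error the quadruple design is meant to expose.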
Fine-grained fidelity and informativeness rewards for tuning financial language models
Decomposing financial responses into atomic knowledge units underpins this effort to reduce hallucinations in retrieval-augmented generation systems. The work targets the inaccuracies that arise when large language models generate responses that are inconsistent with the retrieved financial documents, a critical issue given the time-sensitive nature of the domain.
First, each generated response is split into minimal, self-contained representations of financial facts, allowing a detailed assessment of factual correctness. Each knowledge unit is then checked against the retrieved documents to determine whether it is consistent with them, and the results are used to compute a fine-grained fidelity reward.
This reward provides precise optimization signals and improves the consistency between the generated text and the source material. An informativeness reward is added to counter potential reward hacking, in which the model generates overly terse responses to maximize its reward. This secondary reward requires the policy model to produce at least as many knowledge units as the baseline model, preserving coverage.
The policy model is then optimized to maximize both the fidelity and informativeness rewards, leading to accurate and informative financial summaries. The experiments used BizFinBench’s public financial data description task and the newly proposed FDD-ANT dataset.
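How the two rewards interact can be sketched in a few lines. The article does not give the exact formulas, so everything here is an assumption made for illustration: the fidelity reward is taken as the fraction of supported units, the informativeness reward as a binary comparison of unit counts, and the two are combined as a weighted sum. All function names and the `weight` parameter are hypothetical.

```python
def fidelity_reward(verdicts: list) -> float:
    """Assumed form: share of knowledge units supported by the documents."""
    if not verdicts:
        return 0.0
    return sum(verdicts) / len(verdicts)


def informativeness_reward(n_policy_units: int, n_base_units: int) -> float:
    """Binary check: 1.0 if the policy keeps at least as many units as the base model."""
    return 1.0 if n_policy_units >= n_base_units else 0.0


def total_reward(verdicts: list, n_base_units: int, weight: float = 1.0) -> float:
    # Assumption: a weighted sum. The source only says both rewards are maximized.
    return fidelity_reward(verdicts) + weight * informativeness_reward(
        len(verdicts), n_base_units
    )


# A full-length response with one unsupported unit (base model produced 3 units).
full = total_reward([True, True, False, True], n_base_units=3)

# A terse response with a single, correct unit.
terse = total_reward([True], n_base_units=3)

print(full, terse)
```

Under these assumptions the terse response earns a perfect fidelity score but loses the informativeness term, so it ends up with a lower total than the longer response. That is the shortcut the secondary reward is designed to close.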
Qwen3-32B served as the evaluation model for response decomposition and knowledge unit verification. Compared with the coarse binary rewards typically obtained from human annotation, this methodology supports a more stable training process, reduces labeling costs, and improves the quality of the generated financial data descriptions. The study reports improved factual consistency on both datasets, supporting the effectiveness of the proposed reinforcement learning framework.
Fine-grained factual consistency evaluation using atomic knowledge unit decomposition and reinforcement learning
Breaking financial answers down into atomic knowledge units and verifying them against the retrieved documents makes it possible to assess factual consistency rigorously. The framework yields detailed optimization signals and improves consistency with the retrieved information without requiring annotated reference answers. Each answer is decomposed into atomic knowledge units, which are self-contained representations of minimal financial facts, and each unit is checked for support in the retrieved documents.
The evaluation results focus on factual accuracy and directly provide fine-grained rewards to guide model optimization. To prevent the model from producing overly terse responses as a shortcut to higher rewards, a binary pairwise constraint requires the policy model to produce at least as many knowledge units as the base model.
The study uses a financial quadruple structure (entity, metric, value, timestamp) to capture the minimal units of knowledge within financial texts, addressing the temporal sensitivity and quantitative nature of the field. The structure also enforces an integrity constraint: an assertion that is missing any of these key elements is treated as invalid.
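The integrity constraint amounts to a completeness check over the four fields. The following is a minimal sketch under that reading; the `Quadruple` type, the `is_complete` helper, and the sample values are hypothetical and not taken from the paper.

```python
from typing import NamedTuple, Optional


class Quadruple(NamedTuple):
    entity: Optional[str]
    metric: Optional[str]
    value: Optional[str]
    timestamp: Optional[str]


def is_complete(q: Quadruple) -> bool:
    """A unit is valid only if all four fields are present and non-empty."""
    return all(field is not None and field.strip() != "" for field in q)


with_date = Quadruple("ExampleCorp", "revenue", "4.2B", "FY2024")
without_date = Quadruple("ExampleCorp", "revenue", "4.2B", None)

print(is_complete(with_date), is_complete(without_date))
```

A claim such as "ExampleCorp's revenue was 4.2B" with no time reference cannot be checked against time-stamped source data, which is why a unit like `without_date` would be rejected rather than verified.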
The evaluation model uses specialized prompts to decompose responses, with the four elements of each financial fact defined explicitly. Experiments on the BizFinBench financial data description task and the newly proposed FDD-ANT dataset demonstrate the effectiveness of this approach. The framework provides fine-grained rewards for stable optimization, leads to higher-quality generation, and removes the need for costly human annotation. The method targets hallucinations, in which models generate responses that are inconsistent with the retrieved data, a serious concern in time-sensitive financial fields.
By decomposing the response into individual knowledge units, the framework evaluates the correctness of each unit, providing precise optimization signals and increasing consistency with the retrieved documents. The study also introduces the FDD-ANT dataset, a new resource for evaluating financial data description tasks with diverse data types, and incorporates an informativeness reward to prevent overly terse responses during reinforcement learning.
Experiments on both publicly available and newly created datasets demonstrate consistent performance improvements and validate the effectiveness of the proposed approach. Error analysis revealed that the remaining inaccuracies are primarily related to handling relative time expressions, converting fiscal years to calendar years, and rounding numbers.
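The rounding category is easy to see with a small example. This sketch is not the authors' method; it only illustrates one way a verifier could decide whether a reported figure is a legitimate rounding of the source figure. The function name `matches_after_rounding` and the half-up rounding rule are assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP


def matches_after_rounding(reported: str, source: str) -> bool:
    """True if `source`, rounded to the precision of `reported`, equals `reported`."""
    reported_dec = Decimal(reported)
    source_dec = Decimal(source)
    # Build a quantum with the same number of decimal places as the reported
    # value, e.g. "12.5" gives Decimal("0.1").
    quantum = Decimal(1).scaleb(reported_dec.as_tuple().exponent)
    return source_dec.quantize(quantum, rounding=ROUND_HALF_UP) == reported_dec


print(matches_after_rounding("12.5", "12.46"))  # legitimate rounding
print(matches_after_rounding("12.4", "12.46"))  # truncated, not rounded
```

A strict string comparison would flag both cases as mismatches, while a tolerance that is too loose would accept the truncated value. Deciding where that line sits is part of what makes numerical verification hard.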
The findings establish a clear path towards more reliable financial language models. Limitations acknowledged by the researchers include continued challenges with temporal and numerical accuracy, suggesting room for improvement. Future work will specifically focus on addressing these issues and improving the reward mechanism to further increase the accuracy of the responses generated.
👉 More information
🗞 Reducing hallucinations in financial retrieval-augmented generation through fine-grained knowledge verification
🧠ArXiv: https://arxiv.org/abs/2602.05723
