FIGR Advances AI Reasoning by Integrating Visual and Textual Data

In a notable step for AI reasoning, researchers Meiqi Chen, Fandong Meng, and Jie Zhou have introduced FIGR, a model that integrates visual reasoning into multi-turn reasoning and is trained with reinforcement learning. Described in arXiv:2512.24297v1, FIGR targets complex reasoning tasks, especially in mathematics, and achieves higher accuracy and more stable reasoning than text-only approaches.

The Story

AI models have long relied on text-based reasoning, which often falls short on problems that hinge on spatial or structural relationships. FIGR addresses this by combining visual and textual information, so that complex relationships are represented explicitly instead of being described only in words. The result is better performance and reliability than text-only reasoning achieves on these problems.

The Context

FIGR’s key innovation is its use of visual elements to externalize intermediate structural hypotheses during reasoning, making the structural assumptions it forms partway through a solution explicit rather than leaving them implicit in text. This matters most in math, where diagrams and spatial cues convey relationships that text alone captures poorly. Reinforcement learning teaches FIGR when to bring in visual reasoning, which helps the model keep track of the problem’s global structure and stay coherent across reasoning turns.
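The article does not describe FIGR’s actual policy or reward design, so the following is only a toy sketch of the general idea of using reinforcement learning to learn when a visual step is worth invoking. It trains a hypothetical one-feature logistic gate with REINFORCE: each simulated problem is either "structural" or "algebraic", and the success probabilities in SUCCESS_PROB are invented numbers under which a visual step helps on structural problems and slightly hurts on algebraic ones. The names SUCCESS_PROB, train_gate, and p_visual are illustrative assumptions, not part of FIGR.

```python
import math
import random

# HYPOTHETICAL success probabilities, keyed by (is_structural, used_visual).
# Invented for illustration only: a visual step helps on structural problems
# and slightly hurts on purely algebraic ones. Not taken from FIGR.
SUCCESS_PROB = {
    (True, True): 0.8,
    (True, False): 0.4,
    (False, True): 0.5,
    (False, False): 0.6,
}


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def p_visual(w: list[float], structural: bool) -> float:
    """Probability that the (hypothetical) gate invokes a visual step."""
    return sigmoid(w[0] + w[1] * (1.0 if structural else 0.0))


def train_gate(steps: int = 20000, lr: float = 0.1, seed: int = 0) -> list[float]:
    """Train a logistic gate p(use_visual | problem type) with REINFORCE."""
    rng = random.Random(seed)
    w = [0.0, 0.0]   # [bias, weight on the 'structural' feature]
    baseline = 0.0   # running-average reward, used to reduce gradient variance
    for _ in range(steps):
        structural = rng.random() < 0.5
        x = [1.0, 1.0 if structural else 0.0]
        p = p_visual(w, structural)
        use_visual = rng.random() < p
        # Binary reward: 1 if the simulated solution is correct.
        solved = rng.random() < SUCCESS_PROB[(structural, use_visual)]
        reward = 1.0 if solved else 0.0
        advantage = reward - baseline
        # For a Bernoulli policy, d log pi(a) / d logit = a - p.
        grad_logit = (1.0 if use_visual else 0.0) - p
        # Gradient ascent on expected reward.
        for i in range(len(w)):
            w[i] += lr * advantage * grad_logit * x[i]
        baseline += 0.01 * (reward - baseline)
    return w


if __name__ == "__main__":
    w = train_gate()
    print(f"P(visual | structural) = {p_visual(w, True):.2f}")
    print(f"P(visual | algebraic)  = {p_visual(w, False):.2f}")
```

Under these made-up probabilities, the gate should learn to invoke the visual step on structural problems and mostly skip it on algebraic ones. Subtracting the running-average baseline lowers the variance of the gradient estimate without biasing it, because the expected score-function term is zero for each problem type.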

The benchmarks support this: FIGR outperformed text-only models by 13.12% on AIME 2025 and by 11.00% on BeyondAIME, evidence of its advantage in both accuracy and stability.

In short, FIGR brings visual information into multi-turn reasoning when it is useful, connecting abstract textual reasoning with concrete visual representations. The team’s work points toward multimodal AI reasoning, in which combining data types helps solve problems that text alone handles poorly.

Key Takeaways

Multimodal Reasoning: FIGR blends visual and textual data for deeper problem understanding.
Strong Performance: Beats text-only models by double-digit margins on two math benchmarks (13.12% on AIME 2025, 11.00% on BeyondAIME).
Adaptive Learning: Uses reinforcement learning to decide when visual reasoning is needed.
Research Impact: Opens new paths for AI models that handle diverse data types.
Team Behind FIGR: Developed by Meiqi Chen, Fandong Meng, and Jie Zhou.

FIGR’s results suggest that AI models which combine multiple forms of data can tackle complex problems with greater precision. Beyond improving current systems, the approach could lay the groundwork for stronger reasoning in other fields.