Research
Why Top AI Models Still Fail High School Geometry
GeoBench reveals that vision-language models like OpenAI-o3 aren’t reasoning through geometry—they’re just recalling answers.
PATHWAYS Benchmark Reveals Critical Reasoning Failures in Web-Based AI Agents
A new benchmark exposes how current web-based AI agents stumble on multi-step reasoning, often fabricating their decision process and falling prey to misleading information.
AI Interprets Brain MRIs: A Step Forward in Neurological Diagnosis
An AI model shows promise in reading brain MRIs, aiming to speed up diagnosis and improve outcomes. But can it match human radiologists?
Reinforcement Learning’s Stability Problem Gets a Mathematical Fix
MSACL applies 19th-century stability theory to modern AI, aiming to stop robots from spiraling into chaos. A crucial step toward safe robotics—if the math holds up beyond the lab.
RoboMIND 2.0 Releases 310,000 Trajectories to Fix Robot Clumsiness
A massive new dataset and the MIND-2 framework tackle the sim-to-real gap, giving dual-arm robots the data they need to stop fumbling.
New Study Reveals Risks in Automated Vehicle Crash Patterns
Analysis of 2,500+ AV crashes exposes safety challenges and calls for smarter policies.
AI Models Reveal Bias in Lung Cancer Risk Estimates
New research exposes significant disparities in AI lung cancer risk tools, spotlighting urgent fairness issues in healthcare.
OpenAI’s New Research Proves That More Optimization Isn’t Always Better
A new study on reward model scaling laws suggests that pushing AI performance too hard can backfire: Goodhart’s Law on steroids.
OpenAI’s New Math Whiz Can Solve High School Olympiad Problems
By training a neural theorem prover on the Lean language, OpenAI is moving past chatbot hallucinations and into the rigid world of formal logic.
OpenAI Research: Teaching Models to Admit When They’re Guessing
The lab’s latest research into 'verbalized uncertainty' aims to fix the AI hallucination problem by teaching models to say 'I don’t know.'
R-Debater Sets New Standard for AI Multi-Turn Debates
R-Debater uses argumentative memory to deliver more consistent and coherent AI debates, outperforming existing models.
Teaching Self-Driving Cars to Read the Room—Without the Lag
LSRE compresses complex vision-language model reasoning into a lightweight system, letting autonomous vehicles spot social hazards in real time.
Entropy-Aware Speculative Decoding Boosts Language Model Reasoning
New method uses entropy penalties to improve large language models’ accuracy and efficiency.
Nordlys Labs’ Mixture-of-Models Hits 75.6% Accuracy on SWE-Bench
New architecture routes tasks to specialized models based on success history, beating single-model performance without new foundation models.
Latent Motion Reasoning Advances Text-to-Motion AI
New research introduces Latent Motion Reasoning, overcoming key hurdles in Text-to-Motion generation with better semantic and motion alignment.