Research
New Study Exposes Overconfidence Risks in Large Language Models
Large language models often overestimate their task success, raising urgent concerns about AI safety and misuse.
New Benchmark Exposes Key Differences in Reasoning of Large Language Models
Study compares supervised fine-tuning and reinforcement learning, revealing critical insights for stronger AI training.
Bayesian Geometry in AI: How Language Models Manage Uncertainty
New research shows production-grade language models retain Bayesian inference structures that shape their predictions.
VLA-RAIL Cuts Jitter, Boosts Speed in Robotic Motion Control
VLA-RAIL improves Vision-Language-Action models by smoothing robotic motion and reducing stalls.
New Study Boosts Neural Architecture Search with Two Key Techniques
Few-Shot Architecture Prompting and Whitespace-Normalized Hash Validation cut costs and speed up computer vision model design.
Semantic Lookout: A Human-Overridable Vision-Language Model for Safer Autonomous Ships
Semantic Lookout offers a vision-language fallback for autonomous vessels, meeting draft IMO MASS Code requirements for human override and safety.
AI System Sets New Standard for Surgical Training Accuracy and Consistency
A novel AI framework using YOLO and DeepSORT delivers real-time, objective feedback in microanastomosis training, matching expert evaluations.
FIGR Advances AI Reasoning by Integrating Visual and Textual Data
FIGR combines visual reasoning with reinforcement learning to outperform text-only models on complex math tasks.
World In Your Hands: Advancing Robotic Hand Dexterity
TARS Robotics launches WiYH, a large-scale dataset and tools to boost human-like manipulation skills in robots.
BanglaCodeAct Sets New Standard for Bangla-to-Python Code Translation
BanglaCodeAct uses the Qwen3-8B model to deliver record accuracy in translating Bangla instructions into Python code.
ArtiSG Advances Robot Handling of Articulated Objects
ArtiSG boosts robots’ ability to interact with doors, drawers, and more by encoding human demonstrations into 3D scene graphs, improving precision and recall.
SpaceTimePilot: New AI Model Gives Precise Control Over Video Generation
Researchers introduce SpaceTimePilot, a video diffusion model that lets users control camera angles and motion independently, promising new possibilities for video editing and animation.
OpenAI Advances AI for Automated Theorem Proving
OpenAI is using AI to automate theorem proving, with significant implications for cryptography and software verification.
LuxIA Cuts Through Photonic Neural Network Limits to Scale AI Hardware
LuxIA introduces a new method that slashes memory and compute demands in photonic neural networks, promising faster, more scalable AI hardware.
OpenAI Launches Universe to Test AI Across Diverse Digital Worlds
OpenAI introduces Universe, a platform that challenges AI with a variety of games and apps to measure general intelligence.