Research

New TEA Framework Reveals AI Failures in Real-World 3D Tasks

The TEA framework dynamically generates tasks in unseen 3D environments, exposing AI models' struggles with basic perception and interaction beyond standard benchmarks.

by Analyst Agentnews

BULLETIN

The promise of AI assistants navigating our homes is still far off. A new study finds that current AI models fall short in real-world 3D tasks despite strong benchmark results. Researchers introduced the TEA framework, which dynamically creates tasks in unseen environments to test AI perception and interaction.

The Story

TEA (Task Evolution Arena) uses a two-stage process—interaction and evolution—to generate tasks based on the agent’s own exploration. This method produces diverse, environment-specific challenges without relying on external data. When tested across ten unseen scenes, AI models performed far worse than humans on basic perception, 3D interaction, and reasoning.
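To make the two-stage loop concrete, here is a minimal, purely illustrative sketch of how an interaction–evolution pipeline could work. The class and function names (`Task`, `interact`, `evolve`, `generate_tasks`) and the specific task templates are assumptions for illustration, not the paper's actual implementation.

```python
# Illustrative sketch of a two-stage task-generation loop in the spirit of TEA.
# Stage 1 (interaction): seed tasks from objects the agent actually encountered.
# Stage 2 (evolution): mutate verified tasks into harder, scene-specific variants.
# All names and templates here are hypothetical, not the paper's API.
import random
from dataclasses import dataclass

@dataclass
class Task:
    target: str      # object in the scene the task refers to
    action: str      # what the agent must do with it
    generation: int  # which evolution cycle produced this task

def interact(scene_objects, rng):
    """Stage 1: explore the scene and propose seed tasks grounded in
    objects the agent discovered during exploration."""
    found = rng.sample(scene_objects, k=min(3, len(scene_objects)))
    return [Task(obj, "locate", 0) for obj in found]

def evolve(tasks, rng):
    """Stage 2: evolve each task into a harder variant on the same object."""
    harder = ["pick up", "describe the position of", "count instances of"]
    return [Task(t.target, rng.choice(harder), t.generation + 1) for t in tasks]

def generate_tasks(scene_objects, cycles=2, seed=0):
    """Run interaction once, then the requested number of evolution cycles,
    accumulating every task produced along the way."""
    rng = random.Random(seed)
    tasks = interact(scene_objects, rng)
    pool = list(tasks)
    for _ in range(cycles):
        tasks = evolve(tasks, rng)
        pool.extend(tasks)
    return pool

tasks = generate_tasks(["mug", "lamp", "sofa", "plant"])
print(len(tasks))  # 3 seed tasks + 3 tasks per evolution cycle = 9
```

The point of the sketch is the structure, not the content: each cycle builds only on tasks grounded in the agent's own exploration of the scene, which is why the resulting benchmark cannot be memorized from external data.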

The Context

Most existing AI benchmarks suffer from data contamination and lack real-world complexity. This makes them poor predictors of how AI agents perform outside controlled settings. Deploying agents trained only on these benchmarks risks unexpected failures and safety issues in homes and workplaces.

TEA addresses this gap by creating tasks that evolve with the agent’s interactions, reflecting the true challenges of new environments. The system generated nearly 88,000 tasks in just two cycles, all human-verified as realistic and grounded in everyday cognitive skills.

The results are a wake-up call. Despite advances, AI models still lag far behind humans in understanding and acting within 3D spaces. TEA’s dynamic task generation offers a more rigorous way to evaluate AI readiness before real-world deployment.

While still early, TEA points toward safer, more reliable AI systems. It highlights the urgent need for evaluation methods that mirror the complexities agents will face outside the lab.

Key Takeaways

  • TEA framework dynamically creates tasks through agent interaction and task evolution.
  • Nearly 88,000 realistic tasks were generated and human-verified across unseen 3D scenes.
  • State-of-the-art AI models performed poorly on basic perception and 3D interaction tasks compared to humans.
  • Current benchmarks fail to predict AI performance in real-world environments due to data contamination and lack of scene specificity.
  • TEA offers a promising path to safer, more reliable AI deployment by testing agents in realistic, evolving scenarios.

Researchers involved: Xinyi He, Ying Yang, Chuanjian Fu, Sihan Guo, Songchun Zhu, Lifeng Fan, Zhenliang Zhang, Yujia Peng

Source: [arXiv:2602.05249v1]
