
UniAct Cuts Humanoid Robot Response Time to Under 500 Milliseconds

A new framework pairs a fine-tuned multimodal language model with a streaming pipeline, letting robots follow complex instructions with human-like speed and flexibility.

by Analyst Agentnews

Researchers have introduced UniAct, a framework that brings humanoid robot response times under 500 milliseconds. It lets robots execute multimodal instructions, such as language, music, or movement trajectories, quickly and accurately. UniAct also improves zero-shot tracking success by 19%, a clear gain in robot responsiveness and reliability [arXiv:2512.24321v1].

The main hurdle in humanoid robotics has been connecting high-level perception with real-time action. Current approaches often fail to convert varied inputs into smooth, timely movements. UniAct tackles this by unifying diverse instruction types into a single processing stream.

At its core, UniAct uses a two-stage design: a fine-tuned Multimodal Large Language Model (MLLM) paired with a causal streaming pipeline. Because the pipeline is causal, it acts on data as it arrives instead of waiting for a complete instruction, which lets robots react almost instantly. UniAct also uses a shared discrete codebook built with finite scalar quantization (FSQ), a simpler alternative to vector quantization, to align the different input modes and keep motions within physically feasible limits [arXiv:2512.24321v1]. Simply put, it helps robots understand varied commands and move realistically.
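To make the FSQ idea concrete, here is a minimal sketch of generic finite scalar quantization, not UniAct's implementation. Each latent dimension is squashed with tanh, rounded to a small fixed number of integer levels, and the resulting grid point is read as a single token index. The level counts in `LEVELS` and the function names are hypothetical choices for illustration; the paper's actual codebook size and dimensions are not given in the article. A "shared" codebook, in this reading, simply means latents from every input mode are snapped onto the same grid. Training-time details such as the straight-through gradient are omitted.

```python
import math
from typing import List, Sequence

# Hypothetical levels per latent dimension (not from the paper).
# Odd counts keep each grid symmetric around zero. Codebook size = 7*5*5*5 = 875.
LEVELS = (7, 5, 5, 5)


def fsq_quantize(z: Sequence[float], levels: Sequence[int] = LEVELS) -> List[int]:
    """Snap a continuous latent vector to integer grid coordinates, one per dimension."""
    if len(z) != len(levels):
        raise ValueError("latent dimension must match the number of level entries")
    codes = []
    for value, n in zip(z, levels):
        if n % 2 == 0:
            raise ValueError("this sketch only supports odd level counts")
        half = (n - 1) // 2
        bounded = math.tanh(value) * half  # squash into the open interval (-half, half)
        codes.append(int(round(bounded)))  # round to the nearest integer level
    return codes


def codes_to_index(codes: Sequence[int], levels: Sequence[int] = LEVELS) -> int:
    """Read the grid point as one token id (mixed-radix number, first dimension lowest)."""
    index, base = 0, 1
    for c, n in zip(codes, levels):
        digit = c + (n - 1) // 2  # shift from [-half, half] to [0, n - 1]
        index += digit * base
        base *= n
    return index


def index_to_codes(index: int, levels: Sequence[int] = LEVELS) -> List[int]:
    """Invert codes_to_index, recovering the per-dimension grid coordinates."""
    codes = []
    for n in levels:
        digit = index % n
        index //= n
        codes.append(digit - (n - 1) // 2)
    return codes


if __name__ == "__main__":
    latent = [0.3, -2.0, 0.05, 1.2]
    codes = fsq_quantize(latent)
    token = codes_to_index(codes)
    print(codes, token)  # prints [1, -2, 0, 2] 774
```

Unlike classic vector quantization, no codebook vectors are learned here: the grid is implicit in the level counts, which is one reason FSQ is described as simpler.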

The team tested UniAct on UniMoCap, a 20-hour benchmark featuring diverse, real-world humanoid motion scenarios. The results showed strong generalization across tasks, suggesting UniAct is approaching readiness for practical use [arXiv:2512.24321v1].

The researchers behind UniAct include Nan Jiang, Zimo He, Wanhe Yu, Lexi Pang, Yunhao Li, Hongjie Li, Jieming Cui, Yuhan Li, Yizhou Wang, Yixin Zhu, and Siyuan Huang. Their work tackles a long-standing bottleneck in humanoid robotics.

The announcement does not specify the researchers' institutional affiliations, which would help gauge the project's scale and expertise. Still, sub-500 ms response times are a significant step, bringing us closer to robots that interact naturally with humans in complex settings.

This advance could reshape industries like manufacturing, healthcare, and customer service. Robots could soon follow natural language commands and adapt on the fly, improving efficiency and user experience.

Still, challenges remain. Ensuring safety and reliability in unpredictable environments is critical. Future work must address error handling, robustness against noisy inputs, and ethical concerns around deploying autonomous humanoid robots.

Key Takeaways

  • Real-Time Response: UniAct cuts robot reaction times to under 500 milliseconds, a major speed boost.
  • Accuracy Gains: It improves zero-shot tracking success by 19%, showing more reliable motion tracking without extra training.
  • Unified Inputs: The framework processes language, music, and trajectory data in one system.
  • Proven on UniMoCap: Validated on a 20-hour humanoid motion benchmark, showing strong generalization across tasks.
  • Industry Impact: Sets the stage for smarter, faster humanoid assistants across multiple sectors.