Alibaba Releases Qwen-Robot Series, Its First Embodied AI Model Family
Alibaba has released the Qwen-Robot series, its first embodied AI model family, the company announced on Tuesday. The release marks a major push into physical intelligence, connecting large language models directly to robotic action in the real world.
The Qwen-Robot suite includes three models, each targeting a different layer of physical intelligence. Qwen-RobotNav handles vision-language navigation, unifying instruction following, point and target navigation, object tracking, and autonomous driving in a single model trained on 15.6 million samples. Qwen-RobotManip addresses robotic manipulation with a vision-language-action architecture that pairs a Qwen3.5-4B VL backbone with a flow-matching diffusion transformer action head, trained on more than 38,100 hours of operational data drawn entirely from open-source datasets. Qwen-RobotWorld serves as a world model for physical agents, predicting physics-compliant futures across manipulation, driving, and navigation scenarios through a natural language action interface.
In one of the most striking demonstrations, Qwen-RobotNav was deployed on a Unitree Go2 quadruped robot equipped with NVIDIA Jetson Thor hardware and a single low-resolution camera. Following verbal instructions step by step, the robot navigated an unfamiliar apartment and traversed multiple rooms without any prior mapping, with an inference latency of 196 milliseconds.
Alibaba also introduced Qwen-RobotClaw, an internal robotics agent framework that lets Qwen VLM agents call the Qwen-Robot models as physical-world tools while managing context and memory for long-horizon tasks. Researchers demonstrated the framework in a real-world scenario in which an agent searched a building for an available restroom, detected an out-of-order sign, and autonomously replanned its route to find an alternative.
The company also open-sourced Chat2Robot, a browser-based evaluation platform for embodied intelligence where users can chat with a robot and watch it respond in real time. The platform currently supports a Qwen-RobotManip model trained on 50 tasks from the RoboTwin-Clean dataset. The move positions Alibaba alongside major global players in the race to connect large language models with physical-world interaction, a space that analysts estimate could become a multibillion-dollar market within the next three years.