Robbyant open-sources LingBot-VLA model as a ‘universal brain’ for robots

Robbyant, an embodied AI company within Alibaba affiliate Ant Group, has announced the open-source release of LingBot-VLA, a vision-language-action (VLA) model designed to serve as a “universal brain” for real-world robotics, which “helps reduce post-training costs and accelerate the path to scalable deployment”, according to the company.

To date, LingBot-VLA has been successfully adapted to robots from leading manufacturers, including Galaxea Dynamics and AgileX Robotics, demonstrating strong cross-morphology transfer capabilities across diverse robotic platforms.

The model’s performance was evaluated on the GM-100 benchmark, a comprehensive evaluation suite open-sourced by Shanghai Jiao Tong University that comprises 100 real-world tasks.

In tests conducted across three distinct physical robot platforms, LingBot-VLA achieved higher task success rates than the other evaluated models.

Notably, when depth information was incorporated, the model’s spatial perception improved significantly, setting a new record for task success rate.

Moreover, on the RoboTwin 2.0 simulation benchmark, which features 50 challenging tasks under intense environmental randomization, including varied lighting, clutter, and height perturbations, LingBot-VLA leveraged its learnable query alignment mechanism to integrate depth cues effectively and achieved a higher task success rate in complex scenarios, demonstrating strong performance in both simulation and real-world deployment.
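
The article does not describe the internals of the query alignment mechanism, so the following is only an illustrative sketch of the general technique it names: a fixed set of learnable query vectors cross-attending over concatenated RGB and depth tokens to produce an aligned, fixed-size fused representation. All names, shapes, and the single-head attention form here are assumptions, not LingBot-VLA's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def query_align(rgb_tokens, depth_tokens, queries):
    """Fuse RGB and depth features via learnable-query cross-attention.

    Each query attends over the concatenated RGB+depth token sequence,
    yielding a fixed-size representation regardless of input token count.
    (Hypothetical sketch; single head, no projections or residuals.)
    """
    kv = np.concatenate([rgb_tokens, depth_tokens], axis=0)  # (N_rgb+N_depth, d)
    d = queries.shape[-1]
    attn = softmax(queries @ kv.T / np.sqrt(d), axis=-1)     # (Q, N_rgb+N_depth)
    return attn @ kv                                         # (Q, d)

# Toy example: 16 queries, 64 RGB tokens, 64 depth tokens, feature dim 32.
rng = np.random.default_rng(0)
queries = rng.standard_normal((16, 32)) * 0.02  # would be learned in training
rgb = rng.standard_normal((64, 32))
depth = rng.standard_normal((64, 32))
fused = query_align(rgb, depth, queries)
print(fused.shape)  # (16, 32)
```

The appeal of this pattern is that the downstream action head sees a constant-size input even when depth is added or token counts change across camera setups.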

To date, the deployment of embodied AI has been hampered by cross-platform generalization challenges stemming from differences in robot morphology, task definitions, and operating environments.

Developers are often forced to repeatedly collect data, retrain models, and fine-tune parameters for each new deployment, resulting in high costs, low reusability, and limited scalability.

To address these challenges, LingBot-VLA was pre-trained on over 20,000 hours of large-scale real-world interaction data, covering nine mainstream dual-arm robot configurations, including AgileX, Galaxea R1Pro, RILite, and AgiBot G1.

This enables a single model, or universal brain, to be deployed across a wide range of robot morphologies, including single-arm, dual-arm, and humanoid platforms, while maintaining high success rates and robustness despite variations in tasks, environments, or hardware configurations.

Beyond generalization, LingBot-VLA also demonstrates strong data and computational efficiency. With comprehensive optimizations to its underlying codebase, LingBot-VLA achieves a 1.5x to 2.8x improvement in training speed compared with other frameworks such as StarVLA and OpenPI.

Notably, this open-source release includes not only the model weights but also a complete, production-ready codebase, featuring tools for data processing, efficient fine-tuning, and automated evaluation.

This toolchain helps shorten training cycles and reduces both the compute requirements and the time cost of industrial deployment, allowing developers to rapidly adapt LingBot-VLA to their own robots and use cases with minimal overhead.

Zhu Xing, CEO of Robbyant, said: “For embodied intelligence to achieve large-scale adoption, we need highly capable and cost-effective foundation models that work reliably on real hardware.

“With LingBot-VLA, we aim to push the boundaries of reusable, verifiable, and scalable embodied AI for real-world deployment. Our goal is to accelerate the integration of AI into the physical world so it can serve everyone sooner.”

“LingBot-VLA is Ant Group’s first open-source embodied AI model and marks another milestone in our efforts toward Artificial General Intelligence (AGI),” Zhu added. “Ant Group is committed to advancing AGI through an open and collaborative approach.

“To this end, we have launched InclusionAI, a comprehensive technological ecosystem spanning foundational models, multimodal intelligence, reasoning, novel architectures, and embodied AI. The open-sourcing of LingBot-VLA is a key step in this initiative.

“We look forward to working with developers worldwide to accelerate the development and large-scale adoption of embodied intelligence and help advance progress toward AGI.”

The announcement was made as part of Robbyant’s “Evolution of Embodied AI Week” initiative. On January 27, Robbyant unveiled LingBot-Depth, a high-precision spatial perception model.

When paired with LingBot-Depth, LingBot-VLA can leverage higher-quality depth representations, effectively upgrading the system’s “vision” and enabling robots to “see more clearly and act more intelligently”.

To learn more about LingBot-VLA, visit:

  • Code:
  • Tech Report:
  • Hugging Face: