AGIBOT holds World Challenge 2026 to see how AI models perform on real tasks

Participants in the challenge tested and debugged robots working on different tasks. | Source: AGIBOT

AGIBOT Innovation Technology Co. last week hosted the AGIBOT World Challenge 2026 alongside ICRA 2026 in Vienna. The company brought together 526 research and enterprise teams from 27 countries to compete across two embodied AI tracks: “Reasoning to Action” and “World Model.”

Shanghai-based AGIBOT said the competition highlighted a key shift in how embodied AI is evaluated. The company said it showed that the industry is moving beyond simulation scores toward closed-loop testing on real robots, real tasks, and standardized benchmarks.

The competition adopted a benchmark-driven format that combined online automated evaluation with an offline real-robot final in Vienna. Built on AGIBOT’s EWMBench and Genie Sim Benchmark, the consistent framework enabled automated testing, standardized metrics, and reproducible results.

During the offline final, finalist teams completed tasks using the AGIBOT G2 humanoid robot. By incorporating real-robot validation into the evaluation process, the competition placed robot stability, real-world adaptability, and long-horizon task reliability at the center of the scoring system. The company, also known as Zhiyuan Robotics Co., said this more closely aligns technical evaluation with practical deployment needs.

The challenge drew research and industry teams from leading institutions and companies, including the Chinese Academy of Sciences, Tsinghua University, the University of Science and Technology of China, the University of California San Diego, Russia’s Sber Robotics Center, Alibaba, Amap, and vivo. More than 100 teams surpassed the official baseline.

What’s the difference between the R2A and WM tracks?

The two tracks at the AGIBOT World Challenge 2026 reflected the broader evolution of embodied AI from task execution toward understanding, prediction, and decision-making, according to AGIBOT.

The Reasoning to Action (R2A) track evaluated how robots understand tasks, plan actions, and execute them in physical environments. The R2A track, upgraded from the 2025 Manipulation track, expanded the evaluation from action execution to the full process of environment understanding, task planning, and physical execution.

The World Model (WM) track focused on how AI systems predict physical-world changes and model interactions based on robot actions and sensor inputs.

Teams trained reasoning-and-manipulation models using the AGIBOT WORLD open-source dataset and evaluated them through Genie Sim 3.0, with the benchmark covering language understanding, spatial reasoning, atomic skills, disturbance adaptation, and zero-shot transfer.

In the final ranking, PrismBot from vivo won the championship with 43.47 points, followed by Shanghai RoboParty’s RP-VLA with 35.66 points and Russia’s GreenVLA with 33.19 points.

AGIBOT targets supermarket tasks with the challenge

Alongside the competition, AGIBOT and Dexmal launched a supermarket benchmark track focused on end-to-end decision-making and whole-body control. This track included non-ideal physical interactions, such as object drops and grasping failures, to better reflect the complexity of real-world interaction and provide a more practical evaluation framework for world model research.

Set in a realistic retail environment, the track required models to complete the full mobile manipulation process, from autonomous navigation and item picking to item transport and placement, under physical constraints such as shelf height limits and randomized item placement. Through API-based remote control, participants’ algorithms directly controlled real robots, creating a practical benchmark for evaluating embodied intelligence in deployment-oriented scenarios.

In the World Model (WM) track, NeoVerse-ABot, a joint team from the Institute of Automation of the Chinese Academy of Sciences and Amap CV Lab, won first place. The PAI@IAII team from the Institute of Industrial Artificial Intelligence at the Chinese Academy of Sciences ranked second. The Loop team from the University of Science and Technology of China placed third.

With the World Challenge, AGIBOT hoped to contribute to a more practical and reproducible evaluation framework for embodied AI. | Source: AGIBOT

AGIBOT releases full-stack toolchain for robot validation

Beyond the competition itself, AGIBOT opened a full-stack toolchain covering real-world data, simulation evaluation, and real-robot testing. The toolchain included the AGIBOT WORLD open-source dataset, Genie Sim 3.0, and the AGIBOT G2 robot platform, helping developers validate models across the path from training to simulation and physical deployment.

EWMBench and Genie Sim Benchmark supported standardized metrics, automated evaluation, and comparable results across simulation and physical testing. They addressed common challenges such as inconsistent evaluation criteria and the gap between simulated performance and real-world deployment.

AGIBOT said it will integrate the technical and ecosystem resources developed through the competition with its ongoing benchmark development and open-source efforts. The company also plans to launch an online simulation leaderboard, introduce more test tasks and diversified benchmarks, and support more comprehensive quantitative evaluation of model capabilities.

In addition, AGIBOT said it will continue to refine its benchmarks and full-stack toolchain, working with global research institutions, developers, and industry partners. Its stated goal is to help embodied AI move from individual algorithmic advances toward systems that can be deployed and scaled in real-world settings.

In other benchmark news, Fraunhofer IPA last month offered a new test benchmark for humanoid robots, and NIST proposed its own baseline performance benchmark for humanoids.



The post AGIBOT holds World Challenge 2026 to see how AI models perform on real tasks appeared first on The Robot Report.