Humanoid robot ventures into the mountains to pick up trash



A new demonstration video from Flexion Robotics shows how its new “brain” enables a humanoid robot to navigate outdoor terrain and perform cleanup tasks independently.

In a clip shot in a forest, a robot identifies trash, picks it up, and throws it into a trash can without any prior training, marking a new step in real-world robot autonomy.

The Switzerland-based company’s technology is a reinforcement learning and simulation-to-reality (sim-to-real) platform designed to enhance humanoid robots across a variety of designs and tasks.

“Just as large language models automate tasks involving reasoning, writing, and creativity, robots should be able to navigate and adapt on their own. We’re building for that future,” the company said in a statement.

Scaling real-world intelligence

Collecting real-world data for every situation a robot might face simply does not scale. The world is too complex and unpredictable, and every object, surface, and motion adds new edge cases. Rather than manually teaching a robot how to respond to everything it might encounter, a better approach is to give it a reliable set of core skills and let a high-level model decide when and how to use them.

According to Flexion, recent research supports this direction, showing that language models can decompose tasks by combining what they know from language with visual understanding. Meanwhile, advances in sim-to-real transfer have proven that low-level skills can be learned safely and efficiently in simulation and transferred to physical robots. Building on these ideas, Flexion has developed a three-tier system.

At the top, a large language model or vision-language model handles task planning and common-sense reasoning: it breaks goals into steps, chooses the right tools, and applies everyday rules. A motion generator then uses perception and task instructions to propose short, safe actions, such as reaching for an object or moving through space. Finally, a reinforcement learning controller executes these actions reliably regardless of the environment or the type of robot.
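For intuition, here is a minimal sketch of that three-tier split in Python. The class and method names are hypothetical stand-ins, since Flexion has not published its interfaces: a planner proposes steps, a motion generator turns each step into a short action, and a low-level controller executes it.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    duration_s: float


class Planner:
    """Tier 1: stand-in for the LLM/VLM that plans and reasons."""
    def plan(self, goal: str) -> list[str]:
        # A real system would query a language model here.
        return ["approach trash", "grasp trash", "approach bin", "release"]


class MotionGenerator:
    """Tier 2: turns perception plus a task step into a short, safe action."""
    def propose(self, step: str) -> Action:
        return Action(name=step.replace(" ", "_"), duration_s=1.5)


class RLController:
    """Tier 3: executes each action robustly across robots and terrain."""
    def execute(self, action: Action) -> None:
        print(f"executing {action.name} for {action.duration_s}s")


planner, motion, controller = Planner(), MotionGenerator(), RLController()
for step in planner.plan("throw the trash into the trash can"):
    controller.execute(motion.propose(step))
```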

The company says this modular approach increases versatility and avoids the brittleness of end-to-end systems. It also allows heavy use of simulation and synthetic data, with real data added only where needed to fill gaps, keeping training scalable, efficient, and grounded in physics.

Next-generation robot autonomy

A recently published video shows Flexion testing its new autonomy framework on a long-horizon task: identifying scattered toys, picking them up, and placing them in a basket to tidy the space. Although the task sounds simple, it challenges nearly every layer of robot intelligence, from perception and locomotion to manipulation and decision-making.

The setup uses no manual control or scripted state machines. Instead, the vision-language model acts as a high-level coordinator, deciding which skills the robot uses and when. The system is built as a modular hierarchy. At its foundation are reinforcement learning motor skills trained entirely in simulation, including walking on uneven terrain and whole-body control, which let the robot balance, recover from slips, and grasp objects at different heights.
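As a rough illustration of what “trained entirely in simulation” means, the sketch below learns a toy balance skill in a stand-in simulator. Simple random search substitutes for a full reinforcement learning pipeline, and all numbers and names are illustrative assumptions, not Flexion’s setup.

```python
import random


def simulate(policy_gain: float, steps: int = 200) -> float:
    """Toy balance task: keep a tilt angle near zero on bumpy terrain.

    Returns total reward (higher is better).
    """
    angle, reward = 0.0, 0.0
    for _ in range(steps):
        bump = random.uniform(-0.05, 0.05)  # uneven-terrain disturbance
        action = -policy_gain * angle       # simple proportional policy
        angle += bump + 0.1 * action
        reward += -abs(angle)               # penalize tilting
    return reward


# Random-search "training": keep whichever gain scores best in simulation.
best_gain, best_score = 0.0, simulate(0.0)
for _ in range(100):
    candidate = best_gain + random.gauss(0.0, 0.5)
    score = simulate(candidate)
    if score > best_score:
        best_gain, best_score = candidate, score

print(f"learned gain: {best_gain:.2f}, reward: {best_score:.1f}")
```

The learned policy can then, in principle, be transferred to the physical robot, which is the sim-to-real step the article describes.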

Above this layer sit reusable high-level skills such as navigation and object picking, which combine multiple motor primitives for reliable movement and manipulation in the real world. At the top, the VLM agent interprets the user’s instructions, breaks them into steps, and triggers the appropriate skill. It uses perception models to identify objects, track them in 3D, and sequence actions through tool calls.
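The tool-call pattern mentioned here can be sketched as follows. The JSON schema, skill names, and registry are assumptions for illustration; the VLM’s actual interface is not public.

```python
import json
from typing import Callable, Dict

TOOLS: Dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a skill so the agent can invoke it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def navigate_to(target: str) -> str:
    return f"navigating to {target}"


@tool
def pick_object(label: str) -> str:
    return f"picking up {label}"


# A real system would receive this JSON from the VLM; here it is hard-coded.
agent_output = json.loads(
    '[{"tool": "navigate_to", "args": {"target": "toy"}},'
    ' {"tool": "pick_object", "args": {"label": "toy"}}]'
)

for call in agent_output:
    print(TOOLS[call["tool"]](**call["args"]))
```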

On the hardware side, a Jetson Orin module handles low-latency control onboard, while planning runs as cloud-based inference. In future versions, Flexion aims to run everything onboard so the robot can operate fully independently.
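A hedged sketch of that compute split, assuming a simple queue-based handoff (the actual interface is not described): the fast onboard loop never blocks on the slower remote planner.

```python
import queue
import threading
import time

plan_queue: "queue.Queue[str]" = queue.Queue()


def cloud_planner() -> None:
    """Stand-in for cloud-based inference: slow, emits high-level steps."""
    for step in ["walk to toy", "pick up toy", "walk to basket", "drop toy"]:
        time.sleep(0.5)  # simulated network + inference latency
        plan_queue.put(step)
    plan_queue.put("done")


threading.Thread(target=cloud_planner, daemon=True).start()

current_step = None
while current_step != "done":
    # Fast onboard loop: polls for new steps but never blocks,
    # leaving each tick free for balance and motor control.
    try:
        current_step = plan_queue.get_nowait()
        print(f"new step from planner: {current_step}")
    except queue.Empty:
        pass
    time.sleep(0.01)  # ~100 Hz control tick (placeholder work)
```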

According to the company, this is the first step toward Flexion’s modular autonomy stack. Next, it plans integrated whole-body control, diffusion-based motion generators for richer interactions, improved 3D spatial reasoning, and end-to-end fine-tuning of the agent. These components will be trained both on real robots and in simulation to increase versatility and long-horizon autonomy.


