This is not the first time that Figure AI, a robotics start-up from Sunnyvale, has impressed with its bipedal, two-handed robots. The very first video, in which the robot Figure 01 talks to a human and performs tasks for them, showed where the journey was heading. While that robot was still integrated with a language model from OpenAI, the startup is now demonstrating its own vision-language-action model called Helix.
The following video, published by Figure CEO Brett Adcock, shows two Figure robots being presented with objects they have never seen before, which they must sort into the right places. A fridge, a drawer, a tray and a bowl are available as destinations. The two robots also hand each other different objects, such as cookie packets, a ketchup bottle and an apple.
Particularly striking is how the two robots appear to communicate with each other in an almost human way, nodding and looking at each other. This behavior is no coincidence: in the presentation of the vision-language-action model (VLA model) Helix, it is described as a deliberate design feature. Here are some details:
- Control of the entire upper body: Helix is the first VLA that enables continuous, high-speed control of the entire humanoid upper body, including the wrists, torso, head and individual fingers.
- Collaboration with multiple robots: Helix is the first VLA to work simultaneously on two robots, allowing them to solve a common, wide-ranging manipulation task with objects they have never seen before.
- Pick up anything: Helix-equipped Figure robots can now pick up virtually any small household item, including thousands of objects they have never encountered before, simply by following natural-language prompts.
- A neural network: Unlike previous approaches, Helix uses a single set of neural network weights to learn all behaviors – picking up and putting down objects, using drawers and refrigerators, and interacting across robots – without task-specific fine-tuning.
- Commercially viable: Helix is the first VLA to run entirely on embedded, low-power GPUs, making it immediately ready for commercial use.
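The feature list above can be pictured as a single control interface. The following is a minimal, hypothetical sketch — Figure has not published Helix's architecture or API, and every name here is invented — of what a vision-language-action policy looks like in principle: one set of network weights maps a camera frame plus a natural-language prompt to a continuous action vector for the whole upper body.

```python
# Illustrative sketch only; all names (VLAPolicy, Observation, act) are
# hypothetical and not part of any published Figure API.
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    image: List[List[float]]  # stand-in for a camera frame
    prompt: str               # natural-language instruction


class VLAPolicy:
    """One policy, one set of weights for every task -- no per-task fine-tuning."""

    def __init__(self, action_dim: int = 20):
        # action_dim would cover wrist, torso, head and finger joints
        self.action_dim = action_dim

    def act(self, obs: Observation) -> List[float]:
        # Toy stand-in for the neural-network forward pass: derive a
        # deterministic pseudo-action from the prompt so the sketch runs.
        seed = sum(ord(c) for c in obs.prompt)
        return [((seed + i) % 7) / 10.0 for i in range(self.action_dim)]


policy = VLAPolicy()
action = policy.act(Observation(image=[[0.0]], prompt="pick up the apple"))
```

In a real system, this forward pass would run repeatedly on the robot's embedded low-power GPU, emitting continuous joint targets at control-loop rate rather than toy numbers.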
Just like a human, Helix understands language, reasons through problems and can grasp objects – all without task-specific training or code. In tests, Helix was able to grasp almost any household object.

