14/04/2026
Multimodal Large Language Models for Real-Time Situated Reasoning
This work by Giulio Antonio Abbo, Senne Lenaerts and Tony Belpaeme shows that Large Language Models can go beyond merely detecting human values in text and images: they can act on those values, while reasoning about the preferences, comfort and safety of the people and pets involved… all in real time! As an example, we observed how an LLM would control a simple vacuum cleaning robot.
To achieve this, we combined an LLM with a TurtleBot 4 platform simulating a smart vacuum cleaning robot in a home. The robot periodically rotates and sends images from its camera to the LLM, which evaluates the environment through a vision pipeline and determines whether it is appropriate to initiate cleaning. The graphical user interface shows the reasoning process. While this is just a toy example, in the future we plan to use LLMs as the reasoning engine controlling the robot's behaviours. The publication and more information on this research can be found at https://giubots.net/publications/multimodal-large-language/
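The perception-decision loop described above can be sketched roughly as follows. This is a hedged, simplified illustration, not the authors' actual implementation: the function names (`query_llm`, `control_step`), the prompt wording and the JSON response format are all assumptions introduced here for clarity, with the multimodal LLM call stubbed out.

```python
import json

# Hypothetical prompt asking the multimodal LLM to judge the scene.
# The wording and JSON schema are assumptions, not the paper's prompt.
PROMPT = (
    "You control a vacuum cleaning robot. Given this camera image of the "
    "room, decide whether it is appropriate to start cleaning, considering "
    "the preferences, comfort and safety of any people and pets present. "
    'Reply as JSON: {"clean": true or false, "reasoning": "..."}'
)

def parse_decision(llm_reply: str) -> tuple[bool, str]:
    """Extract the clean/no-clean decision and the reasoning trace."""
    data = json.loads(llm_reply)
    return bool(data["clean"]), str(data["reasoning"])

def control_step(image_bytes: bytes, query_llm) -> bool:
    """One loop iteration: send the camera frame and prompt, act on the answer.

    `query_llm` stands in for the real multimodal LLM API call.
    """
    reply = query_llm(PROMPT, image_bytes)
    clean, reasoning = parse_decision(reply)
    print(reasoning)  # in the real system, shown in the graphical interface
    return clean      # True -> initiate cleaning

# Example with a stubbed LLM answer (no robot or API needed):
def fake_llm(prompt: str, image: bytes) -> str:
    return '{"clean": false, "reasoning": "A cat is sleeping on the rug."}'

decision = control_step(b"", fake_llm)  # decision is False here
```

In the real setup the loop would repeat each time the robot rotates and captures a new frame, so the decision is continuously re-evaluated as the scene changes.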