Convening the AI-safety and robot-learning communities to establish a shared science — and a shared research agenda — for the interpretability, alignment, and control of robot foundation models.
Robotics is undergoing a paradigm shift, from modular perception–planning–control pipelines to large transformer-based robot foundation models (RFMs) that act directly on the physical world.
Because RFMs inherit the same transformer substrate that frontier AI-safety research has spent years dissecting, the field's tools for interpretability, alignment, and control may already port to robots REF x, y, z.
This workshop brings the two communities into one room to establish the Science of Physical AI Safety — anchored around three questions the field has not yet settled.
The safety case is acute. Failure modes researchers study in LLMs — goal misgeneralization, specification gaming, jailbreaks, deceptive or unintended behavior — become physical harms when the model controls a robot.
The day is built around active, collaborative problem-solving: anchor talks and spotlights put concrete context on the table, then breakout rooms reach grounded consensus, with speakers and authors circulating as mentors. Every participant leaves as a contributor to a single, jointly authored paper.
The workshop is organized around three questions that the field has not yet answered. Each anchors a talk, a breakout room, and a section of the final paper.
Do AI-safety techniques built for LLMs transfer to robot foundation models?
Does classical robotics safety transfer to robot foundation models?
Are evaluations for RFMs meaningfully different from those for LLMs?
Each anchor speaker frames one of the core questions, putting concrete technical context on the table for the day's collaborative work.
Participants opt into a room aligned to one of the three key questions. Each group works to produce:
Breakout artifacts are aggregated into a single paper that names the field's open research problems and sets a shared research agenda. Every participant is a contributor.
Both calls will open following workshop acceptance — submission portals and deadlines will be announced here. Accepted submissions are spotlighted on the day, and their authors join the breakout rooms as mentors.
Work on interpretability, alignment, and control for robot foundation models. Contributed papers are spotlighted and their authors mentor breakout groups.
Two sentences plus optional media describing an RFM safety failure beyond collision. Selected demos are spotlighted on the day.