Brown University researchers have developed an artificial intelligence model that can generate motion in robots and animated figures in much the same way that AI models like ChatGPT generate text.
A paper describing this work is published on the arXiv preprint server.
The model, called MotionGlot, enables users to simply type an action, such as “walk forward a few steps and take a right,” and the model can generate accurate representations of that motion to command a robot or animated avatar.
The model's key advance, according to the researchers, is its ability to “translate” motion across robot and figure types, from humanoids to quadrupeds and beyond. That allows the generation of motion for a wide range of robot embodiments and in all kinds of spatial configurations and contexts.
“We're treating motion as simply another language,” said Sudarshan Harithas, a Ph.D. student in computer science at Brown, who led the work. “And just as we can translate languages, from English to Chinese for example, we can now translate language-based commands into corresponding actions across multiple embodiments. That enables a broad set of new applications.”
The research will be presented later this month at the 2025 International Conference on Robotics and Automation in Atlanta. The work was co-authored by Harithas and his advisor, Srinath Sridhar, an assistant professor of computer science at Brown.
Large language models like ChatGPT generate text through a process called “next token prediction,” which breaks language down into a series of tokens, or small chunks, such as individual words or characters. Given a single token or a string of tokens, the language model predicts what the next token might be.
These models have been incredibly successful at generating text, and researchers have begun using similar approaches for motion. The idea is to break the components of motion, such as the discrete positions of the legs during walking, into tokens. Once the motion is tokenized, fluid movements can be generated through next token prediction.
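As a rough illustration of how next token prediction carries over to motion, here is a minimal Python sketch in which the motion has already been discretized into a handful of made-up pose tokens, and a hand-written transition table stands in for the learned model. Everything here (the token names, the TRANSITIONS table, the generate_motion function) is hypothetical and is not taken from MotionGlot's code.

```python
import random

# Toy "model": a hand-written table of next-token probabilities over a tiny,
# made-up vocabulary of motion tokens. Each token stands for a discretized
# pose or short movement segment; a real system would use a trained
# transformer conditioned on the user's text prompt.
TRANSITIONS = {
    "<start>": {"step_left": 0.5, "step_right": 0.5},
    "step_left": {"step_right": 0.7, "turn_right": 0.2, "<end>": 0.1},
    "step_right": {"step_left": 0.7, "turn_right": 0.2, "<end>": 0.1},
    "turn_right": {"step_left": 0.4, "step_right": 0.4, "<end>": 0.2},
}

def generate_motion(max_tokens=20, seed=0):
    """Autoregressively sample motion tokens until <end> or max_tokens."""
    rng = random.Random(seed)
    sequence = ["<start>"]
    while len(sequence) < max_tokens:
        probs = TRANSITIONS[sequence[-1]]
        next_token = rng.choices(list(probs), weights=list(probs.values()))[0]
        if next_token == "<end>":
            break
        sequence.append(next_token)
    return sequence[1:]  # drop the <start> marker

if __name__ == "__main__":
    # Prints something like ['step_right', 'step_left', 'turn_right', ...]
    print(generate_motion())
```

In the actual system, the sampled tokens would be decoded back into continuous joint motions that drive a robot or animated figure, and the predictions would be conditioned on the text command rather than drawn from a fixed table.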
One challenge with this approach is that motions for one body type can look very different on another. For example, when a person is walking a dog down the street, the person and the dog are both doing something called “walking,” but their actual motions are very different. One is upright on two legs; the other is on all fours.
According to Harithas, MotionGlot can translate the meaning of walking from one embodiment to another. So a user commanding a figure to “walk forward in a straight line” will get the correct motion output whether they happen to be commanding a humanoid figure or a robot dog.
To train their model, the researchers used two datasets, each containing hours of annotated motion data. QUAD-LOCO features dog-like quadruped robots performing a variety of actions along with rich text describing those actions. A similar dataset called QUES-CAP contains real human motion, along with detailed captions and annotations appropriate to each movement.
Using that training data, the model reliably generates appropriate actions from text prompts, even actions it has never specifically seen before. In testing, the model was able to recreate specific instructions, like “a robot walks backwards, turns left and walks forward,” as well as more abstract prompts like “a robot walks happily.”
It can even use motion to answer questions. When asked, “Can you show me movement in cardio activity?” the model generates a person jogging.
“These models work best when they're trained on lots and lots of data,” Sridhar said. “If we could collect large-scale data, the model can easily be scaled up.”
The model's current functionality and its adaptability across embodiments make for promising applications in human-robot collaboration, gaming and virtual reality, and digital animation and video production, the researchers say. They plan to make the model and its source code publicly available so other researchers can use it and build on it.
More information:
Sudarshan Harithas et al, MotionGlot: A Multi-Embodied Motion Generation Model, arXiv (2024). DOI: 10.48550/arxiv.2410.16623
Citation:
AI model translates text commands into motion for various robots and avatars (2025, May 8)
retrieved 16 May 2025
from https://techxplore.com/information/2025-05-ai-motion-kinds-robots.html