In today’s column, I closely explore the rapidly emerging advancement known as large behavior models (LBMs), which are becoming the go-to approach for creating AI that runs robots and robotic systems. You might not be familiar with LBMs. No worries. I will explain what an LBM is, along with identifying how LBMs end up leveraging large language models (LLMs) and contemporary generative AI.
All told, large behavior models are quite promising and an exciting new combination of LLMs boosted with behavior-oriented specialized capacities. It is a real meal deal, one might cheekily say.
Let’s talk about it.
This analysis of an innovative proposition is part of my ongoing Forbes.com column coverage on the latest in AI including identifying and explaining various impactful AI complexities (see the link here).
How To Learn New Tasks
Before we jump into the AI aspects, let’s examine a crucial means of how humans often learn new tasks.
The other day, I wanted to improve my cooking skills, so I watched as my son — he’s a better cook than me, by far — prepared a new dish. The meal was new to him too, but he leaned into his knowledge of other similar dishes to get the job done. The proof was in the pudding, as they say, and the final cooked meal was delicious.
In what way did I learn from his display of cooking skills?
You might say that I intently observed his cooking behaviors.
Here’s what I mean. I watched as he carefully selected the cooking utensils needed. He prepared the ingredients with close attention to detail. The stovetop, air fryer, and additional cooking equipment were deftly utilized. I observed as he put in various spices, stirred here and there, and kept a watchful eye all the while on the temperature and on how the food was cooking.
I also asked him numerous questions along the way. As an aside, I must publicly thank him for his patience since having someone pester you with questions while trying to cook a new meal must certainly be exasperating. Please put him up for the best son of the year trophy, thanks.
Anyway, the crux is that via the use of observation and inquisitive questioning, I learned how to cook that particular meal and indubitably picked up other broader cooking-related insights. Notice that I didn’t read a book or study up on the topic per se. Instead, I used an interactive observational technique to garner a new skill and boost my existing prowess.
I’m sure you’ve done something like this throughout your life and in all walks of life, such as learning how to drive a car, paint a house, use a spreadsheet, play a sport, and so on. Maybe we can use the same kind of interactive observational technique to aid in advancing AI. Sure, that makes a lot of keen sense. Let’s see how.
Training AI On A Skill Involving More Than Language Alone
Let’s begin by discussing large language models and generative AI.
Hundreds of millions of people are daily using generative AI apps such as the widely popular ChatGPT by OpenAI, along with other well-known major AI wares such as GPT-4o, o1, Anthropic Claude, Google Gemini, Meta Llama, etc. These generative AI apps are based on large language models. Put simply, an LLM is shaped around natural language such as English. The AI is data-trained via extensive pattern-matching of how humans write, doing so by scanning online essays, narratives, poems, and the like found across the Internet; for my in-depth explanation, see the link here.
They are models of human language.
They are large in size, which is how they gain a semblance of computational fluency and appear to amazingly mimic human writing. You can enter a prompt and ask questions of the AI. The AI responds with answers that at times are indistinguishable from what a real human might write.
One mode of using generative AI and LLMs is to merely interact with AI in a natural language manner. You write a question; you get a written answer. We are gradually improving the AI so that you can speak to the AI and get spoken answers, somewhat like Siri and Alexa have been for many years, but with much greater fluency.
There is a kind of missing ingredient, in the sense that we haven’t especially ventured into the realm of behaviors. We can exploit the grand value of behaviors, including behavioral observation and the accompanying inquisitive questioning.
Secret Sauce Is Behaviors
Here’s the deal.
Suppose that I’ve set up a cooking robot in my kitchen (I actually do have this, but at my AI lab rather than at home). The robot has a camera and can visually detect me and what’s throughout the kitchen. I can observe the AI. The AI can observe me. The AI is meanwhile making use of generative AI or an LLM under the hood. The beauty of this overall setup is that the AI can learn a task from me by observing me and asking me questions, plus, I can possibly learn from the AI by watching what it does and being inquisitive about what the AI is doing.
Note that natural language is not the only element of the underway learning effort. Observations are a vital part too. Since we have an LLM amid this setup, the customary capability of computational fluency is at hand. Nice. The bonus is that the role of observations mightily ups the ante and immensely extends how things are learned.
A catchy moniker for this type of AI is a large behavior model. This is a logical nomenclature. The AI has a similar semblance of largeness and modeling as does an LLM. The icing on the cake is that the AI can make observations, be observed, and undertake and advance in training via behaviors.
Just as an aside, the overall definition and notion of LBMs is still being bandied around. Some AI researchers refer to LBMs or large behavior models but might mean something different from the way I am depicting LBMs. That’s fine. Just know that the realm of LBMs is in flux and a diverse range of research endeavors are underway. Progress is happening and at some point, there will be crisper delineations of what large behavior models consist of in a globally standardized and official way.
Sample Dialogue With LBM For Cooking Demo
A sample dialogue of interacting with an LBM might be helpful to illustrate how things can proceed.
You were earlier given a heads-up that I’m somewhat into cooking, well, not proficiently, but you get the idea. Let’s go ahead and continue the cooking theme and see what an LBM can do. Make sure to keep in mind that this same type of conversation can occur in other domains, such as a robot that works on car engines, a robot that cleans windows, a robot that guards your yard, and as many other scenarios as can be imagined.
I initiated my cooking robot and opted to have the AI prepare me a meal. Yes, in a sense, my son is being replaced by a robot, but only for cooking purposes and only when he’s out of town. To be abundantly clear, my son is still my son, and, by gosh, no robot can ever overtake that resolute precept.
Okay, here we go.
- My entered prompt: “Let’s prepare a stir-fry. Can you help chop the vegetables?”
- LBM generated response: “Yes. I’ve observed your preferred chopping style from previous cooking sessions. Would you like me to use a rough or fine chop this time?”
- My entered prompt: “Go with a rough chop, please.”
- LBM generated response: “Understood. I’ll start with the carrots. Based on my multi-modal data, I estimate they should be about 1/2 inch thick for your usual texture preferences. Does that work?”
- My entered prompt: “That’s good. And let’s keep it consistent for the zucchini and bell peppers too.”
- LBM generated response: “Got it! I’ll adjust my pressure and speed to match the density of each vegetable. While I chop, I will also heat the pan for the stir-fry. Pan is preheating to medium-high. Would you like me to add oil now or wait until the vegetables are chopped?”
- My entered prompt: “Add it now, thanks.”
- LBM generated response: “Adding oil. I’ve learned from our past sessions that you like the vegetables to have a bit of sear. I’ll keep the heat steady and let you know when the pan reaches optimal temperature. I’m continuously learning from our sessions to refine my techniques and better assist you.”
The dialogue goes on for quite a while, but I think the above excerpt is sufficient to showcase some key points about LBMs that I’ll cover next.
Notable Characteristics Of LBMs
Time to do some unpacking of what just happened.
I’m guessing you instantly recognized the interactive nature of the dialogue as being relatively representative of what happens when you converse with generative AI and LLMs. The discussion was easygoing. I didn’t have to stipulate precise commands or use some quirky specialized vocabulary.
The AI and I were using everyday natural language.
I mention this noteworthy point because programming of robots has historically entailed the use of arcane coding and specialized computer programming skills. A software engineer would laboriously write lines of code to program a robot. The robot would be narrowly capable and usually respond to only a limited set of words or commands.
Fortunately, we are gradually connecting generative AI to robots, which I detail at the link here. This makes the use of robots and their so-called programming a lot simpler. Happy face. But this isn’t a free lunch. There are lots of potential problems and troubles afoot. Big sad face.
We are gradually giving generative AI the opportunity to physically do things in the real world.
That is both exciting and unnerving.
Suppose a robot that is roaming around your yard as a guard dog goes awry due to the generative AI encountering a so-called AI hallucination, see my coverage of such AI confabulations at the link here. All kinds of errors and AI-related issues can arise. I’m not suggesting we avoid connecting generative AI to robots. That’s the wave of the future. Don’t think you can stop this progress. I am instead emphasizing that we need to do so mindfully, cautiously, and must weigh the ethical and legal ramifications. Period, end of story.
Moving on, another aspect of the interaction involved multi-modal data. You probably are using generative AI that is based on a single mode of data, such as only conversing with you via text. Or maybe you are using an image generator that takes text and produces a nifty picture for you. I’ve been touting that we are increasingly heading toward multi-modal generative AI, see my predictions at the link here. This includes text-to-text, text-to-images, image-to-text, text-to-audio, audio-to-text, text-to-video, video-to-text, and otherwise multi-modal in the type of content being utilized.
With LBMs, the AI has usually been data-trained in a multi-modal fashion. This contrasts with many conventional generative AI models, which are pretty much trained on one or two modes of data. Even when they employ multiple modes of data, they often do so in a separate way and not in a fully integrated manner. LBMs gain their strengths by using multi-modal data that is well-integrated, or, some say, the modes are fused with each other (this is somewhat like AI-based self-driving cars and multi-sensor data fusion or MSDF, see my explanation at the link here).
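To make the fusion idea concrete, here is a toy sketch of "early fusion," in which embeddings from different modalities are combined into one vector that a downstream behavior policy consumes jointly. The encoders here are deliberately simplistic stand-ins (real LBMs use learned neural encoders); all function names are hypothetical illustrations, not any particular system's API.

```python
# Toy sketch of early fusion of multi-modal inputs.
# The encoders are hypothetical stand-ins for learned neural encoders.
import numpy as np

def encode_text(tokens: list[str], dim: int = 8) -> np.ndarray:
    # Toy text encoder: hash each token into a fixed-size vector.
    vec = np.zeros(dim)
    for tok in tokens:
        vec[hash(tok) % dim] += 1.0
    return vec

def encode_image(pixels: np.ndarray, dim: int = 8) -> np.ndarray:
    # Toy vision encoder: pool pixel intensities into coarse bins.
    flat = pixels.flatten()
    bins = np.array_split(flat, dim)
    return np.array([b.mean() for b in bins])

def fuse(text_vec: np.ndarray, image_vec: np.ndarray) -> np.ndarray:
    # Early fusion: concatenate modality embeddings into one vector
    # so downstream processing sees both modalities at once.
    return np.concatenate([text_vec, image_vec])

fused = fuse(encode_text(["chop", "carrots"]),
             encode_image(np.random.rand(16, 16)))
print(fused.shape)  # (16,)
```

The design point is simply that the modalities end up in one shared representation rather than being handled in separate silos, which is the "fused" quality the column describes.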
Behaviors Are A Prime Consideration
During my interaction with the cooking robot, you might have subtly detected that the AI kept saying that I had been previously observed while cooking. For example, my chopping style had already been observed and the AI was data-trained on how I like to chop vegetables.
The LBM then asked me whether, this time, I wanted the AI to copy my rough chop style or my fine chop style. All in all, the keystone is that based on observations, the LBM was able to mimic my cooking regimen. I hadn’t explicitly instructed or taught the LBM how to chop vegetables; instead, I merely showed the LBM via my efforts of chopping vegetables. It was based on behavior and observations.
This illustrates that LBMs are devised to go above and beyond a natural language approach and encompass behaviors too.
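A bare-bones way to picture learning-by-observation is a nearest-neighbor mimicry policy: record (situation, action) pairs from demonstrations, then replay the action whose situation most resembles the current one. This is a deliberately minimal sketch of the general idea, not how any production LBM is implemented; the class and feature encoding are hypothetical.

```python
# Toy "learn from observation" sketch: store observed (situation, action)
# pairs and mimic the action from the most similar past situation.
import numpy as np

class ObservedPolicy:
    def __init__(self):
        self.situations: list[np.ndarray] = []
        self.actions: list[str] = []

    def observe(self, situation: np.ndarray, action: str) -> None:
        # Record one demonstrated behavior.
        self.situations.append(situation)
        self.actions.append(action)

    def act(self, situation: np.ndarray) -> str:
        # Nearest-neighbor mimicry of the closest observed situation.
        dists = [np.linalg.norm(situation - s) for s in self.situations]
        return self.actions[int(np.argmin(dists))]

policy = ObservedPolicy()
policy.observe(np.array([1.0, 0.0]), "rough chop")  # e.g., carrots
policy.observe(np.array([0.0, 1.0]), "fine chop")   # e.g., garlic
print(policy.act(np.array([0.9, 0.1])))  # rough chop
```

Note that nothing here was explicitly programmed as a chopping rule; the behavior emerges from what was observed, which is the crux the column is making.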
Wow, think of the possibilities.
I don’t want to seem like a gloomy Gus, but this once again has an upside plus a knotty downside.
What if the AI observed me chopping vegetables and, while I was doing so, I inadvertently dropped the knife? Would the AI be data-trained such that each time vegetables are chopped, the knife is supposed to be dropped? That’s a real possibility of what the computational mimicry might consist of.
I doubt that any adult would make that copycat mistake. Why? Partially due to common sense. It is worrisome that we do not yet have AI that somehow encompasses common sense, see my analysis at the link here, and yet we are connecting AI to robots that move around in the physical world. For the moment, other programmatic and data training guardrails will need to serve in that safety related role.
LBM Interest And Advancement Is Expanding Rapidly
A few additional comments and then I’ll do a quick wrap-up.
An AI research project that initially helped put LBMs into the limelight was titled “TRI’s Robots Learn New Manipulation Skills in an Afternoon. Here’s How.” by Siyuan Feng, Ben Burchfiel, Toffee Albina, and Russ Tedrake, Medium, September 14, 2023, which made these salient points (excerpts):
- “Most real-world tasks can be solved in many different ways. When picking up a cup, for example, a person might grab it from the top, the side, or even the bottom. This phenomenon, behavioral multimodality, has historically been very difficult for behavior learning methods to cope with, despite its ubiquity in normal human behavior.”
- “Currently, robots are meticulously programmed to accomplish tasks, with humans explicitly anticipating edge cases and instructing the robot how to recover from mistakes.”
- “This can’t scale to the complexity required for future, more capable, robots operating in the wild.”
- “Existing Large Language Models possess the powerful ability to compose concepts in novel ways and learn from single examples. The next big milestone is the creation of equivalently powerful Large Behavior Models that fuse this semantic capability with a high level of physical intelligence and creativity.”
- “These models will be critical for general-purpose robots that are able to richly engage with the world around them and spontaneously create new dexterous behaviors when needed.”
The advent of LBMs is still going strong and gaining daily traction.
Plenty of opportunities exist in this burgeoning realm. Large behavior models are only in their infancy. The growth is going to be astronomical. We must first though iron out the kinks and resolve very tough problems.
I would stridently advise AI researchers who are seeking grand challenges to give LBMs a good strong look. How can we ensure that the AI suitably identifies the right behavior? What can be done to prevent mistakes in behavioral copycatting? Are there guardrails that will, on the one hand, stop calamities but at the same time not unduly constrain or limit what the LBM can accomplish? Do we need new AI-related laws that will suitably govern the design, development, fielding, and use of large behavior models?
From a technological perspective, adaptability is a huge keyword for the future of LBMs.
Speaking of adaptability, you might know of this famous quote, often attributed to Charles Darwin: “The most important factor in survival is neither intelligence nor strength but adaptability.” Making AI, and especially LBMs, adaptable is crucial. Let’s do so intelligently, carefully, and with assurance.