
AI imitates toddler learning to unlock human cognition

Summary: A new AI model, based on the PV-RNN framework, learns to generalize language and actions in a manner similar to toddlers by integrating vision, proprioception and linguistic instructions. Unlike large language models (LLMs), which rely on vast datasets, this system achieves compositionality through embodied interactions while requiring less data and computing power.

Researchers have found the AI's modular, transparent design useful for studying how humans learn cognitive skills such as combining language with actions. The model offers insights into developmental neuroscience and could lead to safer, more ethical AI by grounding learning in behavior and keeping decision-making processes transparent.

Key facts:

  • Toddler-like learning: The AI learns compositionality by integrating sensory input, language and actions.
  • Transparent design: Its architecture allows researchers to study internal decision-making pathways.
  • Practical advantages: Requires far less data than LLMs and points toward more ethical, embodied AI development.

Source: OIST

We humans excel at generalization. If you teach a toddler to identify the color red by showing them a red ball, a red truck and a red rose, they will likely correctly identify the color of a tomato, even if it is the first time they have seen one.

An important step in learning to generalize is compositionality: the ability to compose and decompose a whole into reusable parts, like the redness of an object. How we achieve this ability is a key question in developmental neuroscience – and in AI research.


The first neural networks, which later evolved into the large language models (LLMs) that are revolutionizing our society, were developed to study how information is processed in our brains.

Ironically, as these models became more sophisticated, the information-processing pathways within them also became increasingly opaque, with some models now having billions of tunable parameters.

But now, members of the Cognitive Neurorobotics Research Unit at the Okinawa Institute of Science and Technology (OIST) have created an embodied intelligence model with a novel architecture that gives researchers access to the various internal states of the neural network, and which appears to learn how to generalize in the same way that children do.

Their findings have now been published in Science Robotics.

“This paper demonstrates a possible mechanism for neural networks to achieve compositionality,” explains Dr. Prasanna Vijayaraghavan, first author of the study.

“Our model achieves this not by inference based on large datasets, but by combining language with vision, proprioception, working memory and attention – just like toddlers do.”

LLMs, built on a transformer network architecture, learn the statistical relationships between the words that appear in sentences from vast amounts of textual data. Essentially, they have access to every word in every conceivable context, and from this understanding they predict the most likely response to a given prompt.

The new model, by contrast, is based on the PV-RNN (Predictive-coding-inspired Variational Recurrent Neural Network) framework, trained through embodied interactions that integrate three simultaneous input streams: vision, as a video of a robot arm moving colored blocks; proprioception, the sense of our limbs' movement, as the joint angles of the robot arm while it moves; and a language instruction such as “put red on blue.”

The model is then tasked with generating either a visual prediction and the corresponding joint angles in response to a language instruction, or a language instruction in response to sensory input.
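To make the setup concrete, here is a minimal sketch in PyTorch of this kind of multimodal recurrent model. It is illustrative only, not the authors' published architecture: the module names, dimensions and GRU core are all assumptions, but it shows the three input streams feeding one shared recurrent state, from which either actions or language can be decoded, one time step at a time.

```python
import torch
import torch.nn as nn

class MultimodalRNNSketch(nn.Module):
    """Illustrative stand-in for a PV-RNN-style model (not the published
    architecture): three sensory streams feed a shared recurrent state,
    and task-specific heads decode either actions or language."""

    def __init__(self, vis_dim=64, prop_dim=7, lang_vocab=32, hidden=128):
        super().__init__()
        self.vis_enc = nn.Linear(vis_dim, hidden)         # video-frame features
        self.prop_enc = nn.Linear(prop_dim, hidden)       # arm joint angles
        self.lang_emb = nn.Embedding(lang_vocab, hidden)  # instruction tokens
        self.core = nn.GRUCell(3 * hidden, hidden)        # shared latent state
        self.action_head = nn.Linear(hidden, prop_dim)    # next joint angles
        self.lang_head = nn.Linear(hidden, lang_vocab)    # next word logits

    def step(self, vis, prop, tok, h):
        # One time step: fuse the three modalities, then update the state.
        x = torch.cat([self.vis_enc(vis),
                       self.prop_enc(prop),
                       self.lang_emb(tok)], dim=-1)
        h = self.core(x, h)
        return self.action_head(h), self.lang_head(h), h

# Toy rollout: the model processes its inputs sequentially, step by step,
# rather than ingesting a whole sequence at once.
model = MultimodalRNNSketch()
h = torch.zeros(1, 128)
vis, prop = torch.randn(1, 64), torch.randn(1, 7)
tok = torch.tensor([3])  # hypothetical token id, e.g., "red"
for _ in range(5):
    next_angles, next_word_logits, h = model.step(vis, prop, tok, h)
```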

The system is inspired by the free energy principle, which suggests that our brains continually predict sensory inputs based on past experiences and take steps to minimize the difference between prediction and observation.

This difference, quantified as “free energy,” is a measure of uncertainty, and by minimizing free energy, our brain maintains a steady state.
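In the predictive-coding literature this quantity is commonly written as a variational free energy with two terms, a prediction-error term and a complexity term. The formulation below is a standard textbook form, not notation taken from the paper: x is the sensory observation, z the internal latent state, q(z) the network's current belief about that state and p(z) its prior.

```latex
F \;=\; \underbrace{\mathbb{E}_{q(z)}\!\big[-\log p(x \mid z)\big]}_{\text{prediction error}}
\;+\; \underbrace{D_{\mathrm{KL}}\!\big(q(z) \,\Vert\, p(z)\big)}_{\text{complexity}}
```

Minimizing F therefore improves the network's predictions of its sensory input while keeping its internal beliefs close to what past experience suggests.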

Combined with a limited working memory and attention span, this makes the AI mirror human cognitive constraints, requiring it to process inputs and update its predictions sequentially rather than all at once, as LLMs do.

By studying the flow of information within the model, researchers can better understand how it integrates different inputs to generate its simulated actions.

It is through this modular architecture that the researchers have learned more about how infants may develop compositionality. As Dr. Vijayaraghavan recounts: “We found that the more the model is exposed to the same word in different contexts, the better it learns that word.

“This reflects real life, where a toddler will learn the concept of the color red much faster by interacting with various red objects in different ways than by just repeatedly pushing a red truck.”

“Our model requires a significantly smaller training set and much less computing power to achieve compositionality. It makes more mistakes than LLMs do, but its mistakes are similar to those humans make,” says Dr. Vijayaraghavan.

It is precisely this feature that makes the model so useful to cognitive scientists, as well as AI researchers trying to map the decision-making processes of their models.

Although it serves a different purpose from the LLMs currently in use, and so cannot be meaningfully compared with them on performance, the PV-RNN nevertheless shows how neural networks can be organized to offer greater insight into their information-processing pathways: its relatively shallow architecture allows researchers to visualize the network's latent state – the evolving internal representation of the information retained from the past and used in current predictions.
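As an illustration of what inspecting that latent state could look like in practice (a hypothetical sketch, not the authors' analysis code), one could record the hidden state at every time step of a rollout and project the trajectory down to two dimensions for plotting:

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for latent states recorded at each of 50 time steps of a
# rollout; a real analysis would log the network's hidden vector here.
latents = np.stack([np.random.randn(128) for _ in range(50)])

# Project the 128-dimensional latent trajectory to 2D so the evolution
# of the internal representation can be visualized.
trajectory_2d = PCA(n_components=2).fit_transform(latents)
print(trajectory_2d.shape)  # (50, 2)
```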

The model also speaks to the poverty-of-the-stimulus argument, which holds that the linguistic input available to children is too sparse to explain their rapid language acquisition.

Despite a very limited dataset, especially compared with LLMs, the model still achieves compositionality, suggesting that grounding language in behavior may be an important catalyst for children's impressive language-learning ability.

This embodied learning could also pave the way for safer, more ethical AI in the future, both by improving transparency and by helping the AI better understand the effects of its actions.

Learning the word “suffering” from a purely linguistic perspective, as LLMs do, would carry less emotional weight than it does for a PV-RNN, which learns meaning through embodied experience as well as through language.

“We are continuing our work to improve the capabilities of this model and are using it to explore various areas of developmental neuroscience.

“We are excited to see what future insights into cognitive development and language learning processes we can discover,” says Professor Jun Tani, head of the research unit and lead author of the paper.

How we acquire the intelligence needed to build our society is one of the great questions of science. Although the PV-RNN has not answered it, it opens new avenues of research into how information is processed in our brains.

“By observing how the model learns to combine language and action,” summarizes Dr. Vijayaraghavan, “we gain insight into the fundamental processes that underlie human cognition.

“This has already taught us a lot about compositionality in language acquisition, and it shows the potential for more efficient, transparent and safer models.”

About this AI and learning research news

Author: Jun Tani
Source: OIST
Contact: Jun Tani – OIST
Image: The image is credited to Neuroscience News

Original research: Closed access.
“Development of compositionality through interactive learning of robot language and action” by Prasanna Vijayaraghavan et al. Science Robotics


Abstract

Development of compositionality through interactive learning of robot language and action

Humans excel at applying learned behavior to unlearned situations. A crucial element of this generalization behavior is our ability to compose/decompose a whole into reusable parts, an attribute known as compositionality.

A fundamental question in robotics concerns this characteristic: How can linguistic compositionality be developed concomitantly with sensorimotor skills through associative learning, particularly when individuals learn only partial linguistic compositions and their corresponding sensorimotor patterns?

To answer this question, we propose a brain-inspired neural network model that integrates vision, proprioception, and language into a framework of predictive coding and active inference based on the free energy principle.

The effectiveness and capabilities of this model were evaluated through various simulation experiments conducted with a robotic arm.

Our results show that generalization to unlearned verb-noun compositions improves significantly when the variation in learned task compositions is increased.

We attribute this to self-organized compositional structures in linguistic latent state space that are substantially influenced by sensorimotor learning.

Ablation studies show that visual attention and working memory are essential for accurately generating visuomotor sequences to achieve linguistically represented goals.

These findings advance our understanding of the mechanisms underlying the development of compositionality through the interaction of linguistic and sensorimotor experience.
