Teaching algorithms to mimic humans typically requires hundreds or thousands of examples. But a new AI from Google DeepMind can pick up new skills from human demonstrators on the fly.

One of humanity’s greatest tricks is our ability to acquire knowledge rapidly and efficiently from each other. This kind of social learning, often referred to as cultural transmission, is what allows us to show a colleague how to use a new tool or teach our children nursery rhymes.

It’s no surprise that researchers have tried to replicate the process in machines. Imitation learning, in which AI watches a human complete a task and then tries to mimic their behavior, has long been a popular approach for training robots. But even today’s most advanced deep learning algorithms typically need to see many examples before they can successfully copy their trainers.

When humans learn through imitation, they can often pick up new tasks after just a handful of demonstrations. Now, Google DeepMind researchers have taken a step toward rapid social learning in AI with agents that learn to navigate a virtual world from humans in real time.

“Our agents succeed at real-time imitation of a human in novel contexts without using any pre-collected human data,” the researchers write in a paper in Nature Communications. We identify a surprisingly simple set of ingredients sufficient for generating cultural transmission.”

The researchers trained their agents in a specially designed simulator called GoalCycle3D. The simulator uses an algorithm to generate an almost endless number of different environments based on rules about how the simulation should operate and what aspects of it should vary.

In each environment, small blob-like AI agents must navigate uneven terrain and various obstacles to pass through a series of colored spheres in a specific order. The bumpiness of the terrain, the density of obstacles, and the configuration of the spheres varies between environments.

The agents are trained to navigate using reinforcement learning. They earn a reward for passing through the spheres in the correct order and use this signal to improve their performance over many trials. But in addition, the environments also feature an expert agent—which is either hard-coded or controlled by a human—that already knows the correct route through the course.

Over many training runs, the AI agents learn not only the fundamentals of how the environments operate, but also that the quickest way to solve each problem is to imitate the expert. To ensure the agents were learning to imitate rather than just memorizing the courses, the team trained them on one set of environments and then tested them on another. Crucially, after training, the team showed that their agents could imitate an expert and continue to follow the route even without the expert.

This required a few tweaks to standard reinforcement learning approaches.

The researchers made the algorithm focus on the expert by having it predict the location of the other agent. They also gave it a memory module. During training, the expert would drop in and out of environments, forcing the agent to memorize its actions for when it was no longer present. The AI also trained on a broad set of environments, which ensured it saw a wide range of possible tasks.

It might be difficult to translate the approach to more practical domains though. A key limitation is that when the researchers tested if the AI could learn from human demonstrations, the expert agent was controlled by on person during all training runs. That makes it hard to know whether the agents could learn from a variety of people.

More pressingly, the ability to randomly alter the training environment would be difficult to recreate in the real world. And the underlying task was simple, requiring no fine motor control and occurring in highly controlled virtual environments.

Still, social learning progress in AI is welcome. If we’re to live in a world with intelligent machines, finding efficient and intuitive ways to share our experience and expertise with them will be crucial.

Image Credit: Juliana e Mariana Amorim / Unsplash

By