Why machines dream of spiders with 15 legs

27 November 2018

Richard Gray

Features correspondent•@chalkmark

The weird way machines see the world (Credit: DeepMind)

Look carefully at each of the following images – none of them show real things. They have all been dreamed up by a machine. And they provide insights into how our own brains work.

Advances in artificial intelligence are allowing algorithms to be trained to not only recognise objects in the world around them, but draw them too (Credit: DeepMind)

Advances in artificial intelligence are allowing algorithms to be trained to not only recognise objects in the world around them, but draw them too. Much like a human artist, they can create images based on what they have seen.

But rather than building up a picture with brush strokes, they do it pixel by pixel until they produce something of photographic quality. The resulting images look real at first sight, but closer inspection reveals something is very wrong with each big cat in these pictures.

Algorithms like BigGan can reproduce textures but don't know that spiders should only have eight legs (Credit: DeepMind)

How do we know that spiders have eight legs, or that cars need wheels that all point in the same direction in order to work properly? The answer lies in the information we begin to absorb about the world from an early age. The networks of neurons in our brains are shaped by the information sent to them by our eyes and our other senses as we experience our lives.

Machine learning algorithms are trained in a similar way – they are shown many, many examples and slowly, over time, learn to recognise them. The main difference is that while our brains can learn from just a few examples, an AI requires millions before it starts to recognise patterns.

The cars dreamed up by BigGan often have wheels in the wrong place and body shapes that seem to defy physics (Credit: DeepMind)

These images were generated by an AI algorithm called BigGan, developed by researchers at DeepMind, the Alphabet-owned AI company. It is a type of machine learning approach that uses systems known as Generative Adversarial Networks, or Gans.

A Gan pairs two neural networks that work in opposition to each other: a generator trained to produce images of objects, and a discriminator that examines those images and tries to tell them apart from pictures of real-life objects. Each network's output feeds back into the other's training, so both continually improve.
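The tug-of-war between the two networks can be sketched in a few lines of Python. This is a toy illustration only, not DeepMind's BigGan code: instead of images it works with single numbers, the "real" data is assumed to be numbers clustered around 4, and the names `Generator`, `Discriminator` and `train` are hypothetical. The generator learns by trying to raise the discriminator's score on its fakes, while the discriminator learns by trying to score real samples high and fakes low.

```python
import math
import random


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


class Discriminator:
    """Scores a number: near 1 means 'looks real', near 0 means 'looks fake'."""

    def __init__(self) -> None:
        self.w, self.c = 0.1, 0.0  # arbitrary starting values (assumption)

    def score(self, x: float) -> float:
        return sigmoid(self.w * x + self.c)

    def train_step(self, real, fake, lr: float) -> None:
        # Gradient ascent on mean log D(real) + mean log(1 - D(fake)).
        gw = gc = 0.0
        for x in real:
            d = self.score(x)
            gw += (1.0 - d) * x / len(real)
            gc += (1.0 - d) / len(real)
        for x in fake:
            d = self.score(x)
            gw -= d * x / len(fake)
            gc -= d / len(fake)
        self.w += lr * gw
        self.c += lr * gc


class Generator:
    """Turns random noise z into a fake sample a*z + b."""

    def __init__(self) -> None:
        self.a, self.b = 1.0, 0.0  # arbitrary starting values (assumption)

    def sample(self, z: float) -> float:
        return self.a * z + self.b

    def train_step(self, noise, disc: Discriminator, lr: float) -> None:
        # Gradient ascent on mean log D(G(z)): nudge the fakes toward
        # whatever the discriminator currently scores as real.
        ga = gb = 0.0
        for z in noise:
            d = disc.score(self.sample(z))
            ga += (1.0 - d) * disc.w * z / len(noise)
            gb += (1.0 - d) * disc.w / len(noise)
        self.a += lr * ga
        self.b += lr * gb


def train(steps: int = 500, batch: int = 64, lr: float = 0.05,
          real_mean: float = 4.0, seed: int = 0):
    """Alternate discriminator and generator updates on toy 1-D data."""
    rng = random.Random(seed)
    gen, disc = Generator(), Discriminator()
    for _ in range(steps):
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [gen.sample(z) for z in noise]
        disc.train_step(real, fake, lr)   # learn to spot the fakes
        gen.train_step(noise, disc, lr)   # learn to fool the discriminator
    return gen, disc


if __name__ == "__main__":
    gen, disc = train()
    print("generator offset b:", gen.b)
    print("discriminator score for a fake:", disc.score(gen.sample(0.0)))
```

In a toy like this the generator's offset tends to drift toward the real data, although adversarial training is known to oscillate rather than settle neatly. A real system such as BigGan replaces both one-line models with very large neural networks trained on photographs.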

“Over time, the model learns what these features are and how these features are composed,” says Andrew Brock, a computer scientist at Heriot-Watt University who helped to develop BigGan during a placement with DeepMind.

But as we can see here, while it recognises that cars and school buses should have wheels and certain shapes of bodywork, it doesn’t quite understand how they fit together.

Would you want to get on board an aircraft or boat that looked like this? (Credit: DeepMind)

Few people would want to get on board an aircraft or a boat that looks like it defies the laws of physics.

The BigGan algorithm often leaves out crucial details like eyes and beaks (Credit: DeepMind)

All this training can help the AI to learn which features distinguish an image from a square of random pixels. These features can be textures like fur or feathers, structures like a face or a building, or photographic elements like lighting or reflections.

“Studying generative image models in isolation helps us tease out what they are currently good at understanding, and what they struggle with,” says Brock. “Two obvious examples with BigGan are that it’s good at generating textures like fur, grass and the sky, but it struggles with counting, hence spiders with too many legs.”

This might be why at first glance the images look right, but on closer inspection we can see that something occurs too often, or is missing entirely.

Elephants appear with too many legs, or no trunk (Credit: DeepMind)

“There’s nothing in the BigGan algorithm that explicitly learns to count,” says Brock. “So, it doesn’t really have the capacity to say ‘there’s too many legs’ or ‘there’s not enough toes.’”

The houses it creates have a dystopian feel to them (Credit: DeepMind)

Where Gans appear to fall down is in their understanding of how things can vary when viewed from different angles. The human brain is very good at this task: we can often imagine what might lie behind a newspaper being held by someone on a train, for example, or roughly what shape a house might be if we are presented with one view of it.

Machines are not so good at this. Instead, they rely upon the data they are trained on, and variations in the angle or viewpoint can confuse them. To overcome this, an algorithm needs many more examples to learn from. But researchers at DeepMind are already developing systems that can do some of this mental ‘filling in’ to help them produce more realistic images.

The dogs look realistic but all have something out of place (Credit: DeepMind)

Ask a four-year-old child to draw a dog and she’ll probably give you something that doesn’t look anything like man’s best friend. But it would have the right number of legs, ears and eyes, and a nose.

AI algorithms, by comparison, can produce dogs that look very like the real thing, but should their eyes really be that high up on their face (top left) or should they have that many legs (bottom left)? Sometimes the algorithm decides to give them none of these things at all and concentrates on what it can do really well – fur.

These poor dogs are missing eyes and legs (Credit: DeepMind)

Pictures of Queen's guards and swimmers look terrifying (Credit: DeepMind)

When it comes to replicating images of humans, AI can have mixed results. These BigGan images of the Queen's Guard look far from the well-polished facade of the real thing, while the swimmers look like they have surfaced from some sort of surreal nightmare.

According to Jeff Donahue and Karen Simonyan, who worked with Brock on BigGan, this is due to the high variation that exists between people in the first place. We all look different, so the algorithm gets confused about what a typical human should look like.

“BigGan needs a lot more images than there are in the dataset we trained it on for the paper,” they say. But there are other Gans that are already producing photo-realistic pictures of entirely fictitious people (see “g is for…” in our recent a-z of AI).

BigGan can create beautiful looking soap bubbles but as it has never handled a wine glass or coffee cup, it does less well with this (Credit: DeepMind)

Capturing the beauty of a soap bubble is something most of us would struggle with, but the AI does it almost perfectly here. Yet, never having drunk from a wine glass or a coffee cup, it struggles to know why the designs it comes up with are not ideal.

“This model is only trained on still images, so it doesn’t learn anything about how objects move or interact with each other or the environment – it only sees snapshots,” says Brock. “It also doesn’t have any sort of decision making or interaction capabilities, so it can’t reach out and touch something in the image to see how the image would change.”

Having never touched or handled objects, the AI does not learn in the same way as we do and has no concept of how things work (Credit: DeepMind)

“Exploration and observation is a big part of how we learn, but it’s not something that’s incorporated into this particular algorithm,” says Brock.

These look like turtles, but on closer examination they are not (Credit: DeepMind)

Producing near picture-perfect images is a useful parlour trick (although these turtles are far from perfect), but why train a machine to do this in the first place?

“Looking forward, we expect our findings to be useful in the development of more complex intelligent systems,” says Brock. “For example, we would like to create models which understand the rich structure that underlies our complex visual world.”

This would enable machines to start making more sense of the baffling, data-rich environment we live in. Our brains have an astonishing ability to make sense of this, picking out what they need and discarding useless details.

“Learning to generate realistic images is one way to do that,” adds Brock. “In order to draw something, one must, on some level, understand it.”

In BigGan's world, cats have oddly placed features such as ears and eyes (Credit: DeepMind)

With ears cocked at strange angles and eyes woefully misplaced, these cats are far from being unrecognisable – but they also look utterly wrong.

Humans can almost instantly recognise a picture of a cat or dog. Images of these pets are among the most shared things on the internet. So why do machines seem to see them differently from us?

Shopping trolleys in BigGan's world look a mess while teapots have an air of the improbable (Credit: DeepMind)

“The kinds of deep learning neural networks used by today’s engineers operate nothing like the brain,” says Simon Stringer, head of the laboratory for theoretical neuroscience and artificial intelligence at the University of Oxford.

The human brain, he explains, is able to understand the relationships between visual features at every scale in a scene. It can tell that an edge, for example, is part of one object, while another edge belongs to something else. An algorithm like BigGan does not understand this, and so it creates shopping trolleys that look like tangled messes and teapots that no-one would be able to drink from.

“This obviously underpins the ability of the brain to interpret and make sense of complex visual scenes,” says Stringer.

Solving this problem in AI could lead to smarter machines that can think far more like humans do.
