Deep Learning: Creating graphics from computer vision
- posted in: Imaging Technology
The Painting Fool is a computer program created with the aim of ‘being taken seriously as an artist’. The question is: can a program following instructions given to it by a human ever be considered truly creative? After all, surprisingly complex behaviours can emerge from the simplest of rules. One such example is the flocking of starlings, where there is no overall control of the group but instead a kind of ‘hivemind’ in operation – each bird keeps track of its 6 or 7 closest neighbours in the flock and adjusts its direction to match theirs, producing the apparently synchronised movement of the whole. It is easy to imagine how, in a similar way, a few simple rules in a software program can create unforeseen images (a minimal sketch of those flocking rules follows below). This is known as machine-generated imaging, and is not to be confused with computer-generated imagery (CGI).
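To give a sense of just how little is needed, here is a minimal ‘boids’-style sketch of those flocking rules. The number of birds, the neighbour count, and the steering weights are illustrative choices, not values taken from any particular study.

```python
import numpy as np

# Minimal flocking sketch: each agent steers towards the average
# heading and position of its K nearest neighbours. Flock-like motion
# emerges from this single local rule with no central control.

N, K, STEPS = 50, 7, 100            # 50 birds, 7 nearest neighbours
rng = np.random.default_rng(0)
pos = rng.uniform(0, 100, (N, 2))   # positions in a 100x100 arena
vel = rng.uniform(-1, 1, (N, 2))    # velocities

for _ in range(STEPS):
    # pairwise distances, then each bird's K nearest neighbours
    d = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
    nearest = np.argsort(d, axis=1)[:, 1:K + 1]   # column 0 is self

    # alignment: nudge velocity towards the neighbours' mean velocity
    vel += 0.05 * (vel[nearest].mean(axis=1) - vel)
    # cohesion: nudge position towards the neighbours' mean position
    vel += 0.01 * (pos[nearest].mean(axis=1) - pos)

    pos += vel
```

Run for a few hundred steps and plot the positions, and coherent group movement appears, despite no line of code describing a flock.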
As far as art goes, if you ask a human to draw something from memory, they will make an attempt. Admittedly, the result will vary depending on the person’s artistic ability and familiarity with the subject. Until recently, it was believed that the ability to form concepts of ‘things’ – objects, environments, lighting, weather, and so on – was the preserve of the human mind (and, to an extent, the minds of other creatures in the animal kingdom).
Deep Learning Neural Networks
The field of computer vision has exploded in the last 5 years, facilitated by deep learning. Neural networks now rival – and on some benchmarks surpass – humans at recognising and categorising images. The internet contains all the source material a neural network could ever need, and processing power keeps growing at an exponential rate. Moore’s Law – the observation that the number of transistors on a chip doubles roughly every 2 years – has held for decades and shows no sign of stopping.
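To show how accessible this recognition ability has become, the sketch below runs a pretrained network over a single photo using the open-source PyTorch and torchvision libraries. The filename is hypothetical, and the exact weights API varies slightly between torchvision versions.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Image recognition with a pretrained network in a few lines.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

# Standard ImageNet preprocessing: resize, crop, normalise.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("dancers.jpg")).unsqueeze(0)  # add batch dim
with torch.no_grad():
    logits = model(img)
print(logits.argmax().item())   # index of the predicted ImageNet class
```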
We now have computers 8 orders of magnitude (around 100 million times!) more powerful than when neural networks were first proposed back in the 1960s, with huge growth thanks to the massively parallel nature of GPU computing. Standard microprocessors (CPUs) are usually dual or quad core, whereas GPUs are graphics chips that often contain over a thousand processing cores. This means that, for certain tasks, a GPU can perform many more instructions than a CPU can in the same amount of time. This vast amount of computing power allows today’s neural networks to look through hundreds of thousands of images in order to learn about the visual world.
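As a quick sanity check on those figures, the arithmetic below asks how long Moore’s Law would take to deliver a 100-million-fold increase. The answer lines up neatly with the roughly half-century since the 1960s.

```python
import math

# How many doublings is a 100-million-fold (1e8) increase,
# and how long would one doubling every 2 years take?
growth = 1e8
doublings = math.log2(growth)   # ~26.6 doublings
years = doublings * 2           # one doubling every 2 years
print(f"{doublings:.1f} doublings = {years:.0f} years")
# ~53 years: roughly the span from the 1960s to today
```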
Once a computer has an idea of what something looks like, it can reproduce it. These dancers were not copied from a photograph; they were ‘drawn’ from the program’s own idea of what dancers look like, formed by looking at thousands of images of people dancing.
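The sketch below shows the general shape of such a generative network: random noise goes in, an image comes out, and no photograph appears anywhere in the loop. The architecture and layer sizes here are illustrative, not those of the program described above.

```python
import torch
import torch.nn as nn

# A toy generator: it maps a random noise vector to a small image.
# After training on thousands of examples of a concept, each noise
# vector would yield a new, never-photographed image of that concept.
generator = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 512), nn.ReLU(),
    nn.Linear(512, 28 * 28), nn.Tanh(),   # pixel values in [-1, 1]
)

noise = torch.randn(1, 100)               # a random 'seed' vector
image = generator(noise).view(28, 28)     # one 28x28 grayscale image
print(image.shape)
```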
The 3D World
Humans and machines can both extrapolate the third dimension from a 2D image. If you ask a sculptor to model an object using a picture as a reference, they will have no problem. Just as the person who drew an object or scene from memory created a 2D drawing of a 3D concept, the reverse is also possible: 2D and 3D are interchangeable in the mind’s understanding of the world. That’s why watching the 2D or 3D version of the same film isn’t very different in terms of the overall experience; the two are near-identical. Visual depth is something humans construct from multiple 2D images, either from two viewpoints simultaneously, as in binocular vision, or from many viewpoints in rapid succession, as with film (obviously, other depth cues such as perspective and focus also play a part).
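The binocular case can be captured in a single line of geometry: depth equals focal length times baseline, divided by disparity (the pixel shift of a feature between the two views). The camera parameters below are hypothetical.

```python
# Recovering depth from two viewpoints, as in binocular vision.
focal_length_px = 700.0   # focal length, in pixels
baseline_m = 0.06         # distance between the two 'eyes', in metres

def depth_from_disparity(disparity_px: float) -> float:
    """Depth in metres of a feature shifted `disparity_px` pixels
    between the left and right images."""
    return focal_length_px * baseline_m / disparity_px

print(depth_from_disparity(30.0))  # 1.4 m: nearby features shift more
print(depth_from_disparity(3.0))   # 14 m: distant features shift less
```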
One area in which computer graphics excels is 3D. Where a sculptor might mould an object from clay, an AI would do the same with polygons or voxels (volumetric pixels). The AI can then improve the digital model with textures, normal maps, diffuse and specular material simulation, lighting, and so on. The object can be animated automatically using the machine’s understanding of how it behaves in a physically simulated environment. A virtual camera can then be placed in the scene to render an image that we humans can recognise. All of this can be automated, and may not require technology much more advanced than our current deep learning neural networks.
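To make the polygon representation concrete, this sketch defines a single triangle, the smallest possible mesh, and computes the surface normal a renderer would use for lighting. A real pipeline layers textures, materials, lighting, and a virtual camera on top of exactly this kind of data.

```python
import numpy as np

# The building blocks of a 3D model: vertices and triangles (polygons).
vertices = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
])
triangles = np.array([[0, 1, 2]])   # indices into the vertex list

def face_normal(tri):
    """Unit normal of one triangle, used by a renderer for shading."""
    a, b, c = vertices[tri]
    n = np.cross(b - a, c - a)
    return n / np.linalg.norm(n)

print(face_normal(triangles[0]))    # [0. 0. 1.]: this face points +Z
```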
The Future
While computers have recently been taught to recognise stylistic themes, humans go beyond that and learn how to represent mood and feel – things that master artists and photographers take years to perfect. The awesome thing is that advanced concepts like the subtleties of conveying emotion could soon be learned by computers in a matter of days, or even hours, given enough processing power and source material. Skip ahead a few years, and our current state-of-the-art AI will seem as antiquated as the room-sized computers NASA used to put a man on the moon – your phone has more processing power than they did. Imagine a future where a book can be brought to life much the same way as a film adaptation, only automatically.
Soon, any work of fiction will be able to have every detail fully realised in its own fleshed-out universe – as movies, games, VR experiences, and more. This is the future we are headed for, and I can’t be the only one who’s excited about that.