Deep Learning: Creating graphics from computer vision
- posted in: Imaging Technology
The Painting Fool is a computer program created with the aim of ‘being taken seriously as an artist’. The question is: can a program following instructions given to it by a human ever be considered truly creative? After all, surprisingly complex behaviours can emerge from the simplest of rules. One such example is the flocking of starlings, where there is no overall control of the group but instead a kind of ‘hivemind’ in operation – each bird keeps track of its 6 or 7 closest neighbours in the flock and adjusts its direction to match theirs, producing the apparently synchronised movement of the whole. It is easy to imagine how, in a similar way, a few simple rules in a software program can create unforeseen images (a minimal sketch of those flocking rules follows below). This is known as machine-generated imaging, and is not to be confused with computer-generated imagery (CGI).
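To give a sense of just how little is needed, here is a minimal ‘boids’-style sketch of those flocking rules. The number of birds, the neighbour count, and the steering weights are illustrative choices, not values taken from any particular study.

```python
import numpy as np

# Minimal flocking sketch: each agent steers towards the average
# heading and position of its K nearest neighbours. Flock-like motion
# emerges from this single local rule with no central control.

N, K, STEPS = 50, 7, 100            # 50 birds, 7 nearest neighbours
rng = np.random.default_rng(0)
pos = rng.uniform(0, 100, (N, 2))   # positions in a 100x100 arena
vel = rng.uniform(-1, 1, (N, 2))    # velocities

for _ in range(STEPS):
    # pairwise distances, then each bird's K nearest neighbours
    d = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
    nearest = np.argsort(d, axis=1)[:, 1:K + 1]   # column 0 is self

    # alignment: nudge velocity towards the neighbours' mean velocity
    vel += 0.05 * (vel[nearest].mean(axis=1) - vel)
    # cohesion: nudge position towards the neighbours' mean position
    vel += 0.01 * (pos[nearest].mean(axis=1) - pos)

    pos += vel
```

Run for a few hundred steps and plot the positions, and coherent group movement appears, despite no line of code describing a flock.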
As far as art goes, if you ask a human to draw something from memory, they will make an attempt. Admittedly, the result will vary depending on the person’s artistic ability and familiarity with the subject. Until recently, it was believed that the ability to form concepts of ‘things’ – objects, environments, lighting, weather, and so on – was the preserve of the human mind (and, to an extent, the minds of other creatures in the animal kingdom).
Deep Learning Neural Networks
The field of computer vision has exploded in the last 5 years, facilitated by deep learning. Neural networks now rival – and on some benchmarks surpass – humans at recognising and categorising images. The internet contains all the source material a neural network could ever need, and processing power keeps growing at an exponential rate. Moore’s Law – the observation that the number of transistors on a chip doubles roughly every 2 years – has held for decades and shows no sign of stopping.
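To show how accessible this recognition ability has become, the sketch below runs a pretrained network over a single photo using the open-source PyTorch and torchvision libraries. The filename is hypothetical, and the exact weights API varies slightly between torchvision versions.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Image recognition with a pretrained network in a few lines.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

# Standard ImageNet preprocessing: resize, crop, normalise.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("dancers.jpg")).unsqueeze(0)  # add batch dim
with torch.no_grad():
    logits = model(img)
print(logits.argmax().item())   # index of the predicted ImageNet class
```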
We now have computers 8 orders of magnitude (around 100 million times!) more powerful than when neural networks were first proposed back in the 1960s, with huge growth thanks to the massively parallel nature of GPU computing. Standard microprocessors (CPUs) are usually dual or quad core, whereas GPUs are graphics chips that often contain over a thousand processing cores. This means that, for certain tasks, a GPU can perform many more instructions than a CPU can in the same amount of time. This vast amount of computing power allows today’s neural networks to look through hundreds of thousands of images in order to learn about the visual world.
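As a quick sanity check on those figures, the arithmetic below asks how long Moore’s Law would take to deliver a 100-million-fold increase. The answer lines up neatly with the roughly half-century since the 1960s.

```python
import math

# How many doublings is a 100-million-fold (1e8) increase,
# and how long would one doubling every 2 years take?
growth = 1e8
doublings = math.log2(growth)   # ~26.6 doublings
years = doublings * 2           # one doubling every 2 years
print(f"{doublings:.1f} doublings = {years:.0f} years")
# ~53 years: roughly the span from the 1960s to today
```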
Once a computer has an idea of what something looks like, it can reproduce it. These dancers were not copied from a photograph; they were ‘drawn’ from the program’s own idea of what dancers look like, formed by looking at thousands of images of people dancing.
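The sketch below shows the general shape of such a generative network: random noise goes in, an image comes out, and no photograph appears anywhere in the loop. The architecture and layer sizes here are illustrative, not those of the program described above.

```python
import torch
import torch.nn as nn

# A toy generator: it maps a random noise vector to a small image.
# After training on thousands of examples of a concept, each noise
# vector would yield a new, never-photographed image of that concept.
generator = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 512), nn.ReLU(),
    nn.Linear(512, 28 * 28), nn.Tanh(),   # pixel values in [-1, 1]
)

noise = torch.randn(1, 100)               # a random 'seed' vector
image = generator(noise).view(28, 28)     # one 28x28 grayscale image
print(image.shape)
```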
The 3D World
Humans and machines can both extrapolate the third dimension from a 2D image. If you ask a sculptor to model an object using a picture as a reference, they will have no problem. Just as the person who drew an object or scene from memory created a 2D drawing of a 3D concept, the reverse is also possible: 2D and 3D are interchangeable in the mind’s understanding of the world. That’s why watching the 2D or 3D version of the same film isn’t very different in terms of the overall experience; the two are near-identical. Visual depth is something humans construct from multiple 2D images, either from two viewpoints simultaneously, as in binocular vision, or from many viewpoints in rapid succession, as with film (obviously, other depth cues such as perspective and focus also play a part).
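The binocular case can be captured in a single line of geometry: depth equals focal length times baseline, divided by disparity (the pixel shift of a feature between the two views). The camera parameters below are hypothetical.

```python
# Recovering depth from two viewpoints, as in binocular vision.
focal_length_px = 700.0   # focal length, in pixels
baseline_m = 0.06         # distance between the two 'eyes', in metres

def depth_from_disparity(disparity_px: float) -> float:
    """Depth in metres of a feature shifted `disparity_px` pixels
    between the left and right images."""
    return focal_length_px * baseline_m / disparity_px

print(depth_from_disparity(30.0))  # 1.4 m: nearby features shift more
print(depth_from_disparity(3.0))   # 14 m: distant features shift less
```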
One area in which computer graphics excels is 3D. Where a sculptor might mould an object from clay, an AI would do the same with polygons or voxels (volumetric pixels). The AI can then improve the digital model with textures, normal maps, diffuse and specular material simulation, lighting, and so on. The object can be animated automatically using the machine’s understanding of how it behaves in a physically simulated environment. A virtual camera can then be placed in the scene to render an image that we humans can recognise. All of this can be automated, and may not require technology much more advanced than our current deep learning neural networks.
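To make the polygon representation concrete, this sketch defines a single triangle, the smallest possible mesh, and computes the surface normal a renderer would use for lighting. A real pipeline layers textures, materials, lighting, and a virtual camera on top of exactly this kind of data.

```python
import numpy as np

# The building blocks of a 3D model: vertices and triangles (polygons).
vertices = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
])
triangles = np.array([[0, 1, 2]])   # indices into the vertex list

def face_normal(tri):
    """Unit normal of one triangle, used by a renderer for shading."""
    a, b, c = vertices[tri]
    n = np.cross(b - a, c - a)
    return n / np.linalg.norm(n)

print(face_normal(triangles[0]))    # [0. 0. 1.]: this face points +Z
```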
The Future
While computers have recently been taught to recognise stylistic themes, humans go beyond that and learn how to represent mood and feel – things that master artists and photographers take years to perfect. The awesome thing is that advanced concepts like the subtleties of conveying emotion could soon be learned by computers in a matter of days, or even hours, given enough processing power and source material. Skip ahead a few years, and our current state-of-the-art AI will seem as antiquated as the room-sized computers NASA used to put a man on the moon – your phone has more processing power than they did. Imagine a future where a book can be brought to life much the same way as a film adaptation, only automatically.
Soon, any work of fiction will be able to have every detail fully realised in its own fleshed-out universe – as movies, games, VR experiences, and more. This is the future we are headed for, and I can’t be the only one who’s excited about that.