When people think of the greatest artists who’ve ever lived, they probably think of names like Beethoven or Picasso. No one would ever think of a computer as a great artist. But what if one day, that was indeed the case. Could computers learn to create incredible drawings like the Mona Lisa? Perhaps one day a robot will be capable of composing the next great symphony. Some experts believe this to be the case. In fact, some of the greatest minds in artificial intelligence are diligently working to develop programs that can create drawing and music independently from humans. The use of artificial intelligence in the field of art has even been picked up by tech giants the likes of Google.
The projects that are included in this paper could have drastic implications in our everyday lives. They may also change the way we view art. They also showcase the incredible advancement that has been made in the field of artificial intelligence. Image recognition is not as far as the research goes. Nor is the ability to generate music in the styling of the great artists of our past. Although these topics will be touched upon, we will focus on several more advanced achievements such as text descriptions being turned into images and generating art and music that is totally original. Each of these projects bring something new and innovative to the table and show us exactly how the art space is a great place to further explore applications of artificial intelligence. We will be discussing problems that have been faced in these projects and how they have been overcome. The future of AI looks bright. Let’s look at what the future may hold. In doing this, we may be able to better understand the impact that artificial intelligence can have in an area that is driven by human creativity.
GAN and Its Evolved Forms
Machines must be educated. They learn from instruction. How do we lead machines away from emulating what already exists, and have them create new techniques? “No creative artist will create art today that tries to emulate the Baroque or Impressionist style, or any other traditional style, unless trying to do so ironically” (Elgammal, Liu, Elhoseiny, & Mazzone, 2017). This problem isn’t limited to paintings either. Music can be very structured in some respects, but is also a form of art that requires vast creativity. So how do we go about solving such a problem? The first concept we will discuss is something called GAN (Generative Adversarial Networks). GANs, although quite complex, are becoming an outdated model. If artificial intelligence in the art space is to advance, researchers and developers will have to work to find better methods to allow machines to generate art and music. Two of these such methods are presented in the form of Sketch-RNN and CAN (Creative Adversarial Networks). Each of these methods have their advantages over GANs.
First, let’s explore what exactly a GAN is. Below is a small excerpt explaining how a GAN works:
Generative Adversarial Network (GAN) has two sub networks, a generator and a discriminator. The discriminator has access to a set of images (training images). The discriminator tries to discriminate between “real” images (from the training set) and “fake” images generated by the generator. The generator tries to generate images similar to the training set without seeing the images (Elgammal, Liu, Elhoseiny, & Mazzone, 2017).
The more images the generator creates, the closer they get to the images from the training set. The idea is that after a certain number of images are generated, the GAN will create images that are very similar to what we consider art. This is a very impressive accomplishment to say the least. But what if we take it a step further?
Many issues associated with the GAN are simply limitations on what it can do. The GAN is powerful, but can’t do quite as much as we would like. For example, the generator in the model described above will continue to create images closer and closer to the images given to the discriminator that it isn’t producing original art. Could a GAN be trained to draw alongside a user? It’s not likely. The model wouldn’t be able to turn a text-based description of an image into an actual picture either. As impressive as the GAN may be, we would all agree that it can be improved. Each of the shortcoming mentioned have actually been addressed and, to an extent, solved. Let’s look at how this is done.
Sketch-RNN is a recurrent neural network model developed by Google. The goal of Sketch-RNN is to help machines learn to create art in a manner similar to the way a human may learn. It has been used in a Google AI Experiment to be able to sketch alongside a user. While doing so, it can provide the users with suggestions and even complete the user’s sketch when they decide to take a break. Sketch-RNN is exposed to a massive number of sketches provided through a dataset of vector drawings obtained through another Google application that we will discuss later. Each of these sketches are tagged to let the program know what object is in the sketch. The data set represents the sketch as a set of pen strokes. This allows Sketch-RNN to then learn what aspects each sketch of a certain object has in common. If a user begins to draw a cat, Sketch-RNN could then show the user other common features that could be on the cat. This model could have many new creative applications. “The decoder-only model trained on various classes can assist the creative process of an artist by suggesting many possible ways of finishing a sketch” (Eck & Ha, A Neural Representation of Sketch Drawings, 2017). The Sketch-RNN team even believes that, given a more complex dataset, the applications could be used in an educational sense to teach users how to draw. These applications of Sketch-RNN couldn’t be nearly as easily achieved with GAN alone.
Another method used to improve upon GAN is the Creative Adversarial Network. In their paper regarding adversarial networks generating art, several researchers discuss a new way of generating art through CANs. The idea is that the CAN has two adversary networks. One, the generator, has no access to any art. It has no basis to go off of when generating images. The other network, the discriminator, is trained to classify the images generated as being art or not. When an image is generated, the discriminator gives the generator two pieces of information. The first is whether it believes the generated image comes from the same distributor as the pieces of art it was trained on, and the other being how the discriminator can fit the generated image into one of the categories of art it was taught. This technique is fantastic in that it helps the generator create images that are both emulative of past works of art in the sense that it learns what was good about those images and creative in a sense that it is taught to produce new and different artistic concepts. This is a big difference from GAN creating art that emulated the training images. Eventually, the CAN will learn how to produce only new and innovative artwork.
One final future for the vanilla GAN is StackGAN. StackGAN is a text to photo-realistic image synthesizer that uses stacked generative adversarial networks. Given a text description, the StackGAN is able to create images that are very much related to the given text. This wouldn’t be doable with a normal GAN model as it would be much too difficult to generate photo-realistic images from a text description even with a state-of-the-art training database. This is where StackGAN comes in. It breaks the problem down into 2 parts. “Low-resolution images are generated by our Stage-I GAN. On the top of our Stage-I GAN, we stack Stage-II GAN to generate realistic high-resolution images conditioned on Stage-I results and text descriptions” (Huang, et al., 2016). It is through the conditioning on Stage-I results and text descriptions that Stage-II GAN can find details that Stage-I GAN may have missed and create higher resolution images. By breaking the problem down into smaller subproblems, the StackGAN can tackle problems that aren’t possible with a regular GAN. On the next page is an image showing the difference between a regular GAN and each step of the StackGAN.
This image came from the StackGAN paper (Huang, et al., 2016).
It is through advancements like these that have been made in recent years that we can continue to push the boundaries of what AI can do. We have just seen three ways to improve upon a concept that was already quite complex and innovative. Each of these advancements have a practical, everyday use. As we continue to improve on artificial intelligence techniques, we will able to do more and more in regard to, not just art and music, but a wide variety of tasks to improve our lives.
DeepBach, Magenta, and NSynth
Images aren’t the only type of art that artificial intelligence can impact though. Its effect on music is being explored as we speak. We will now explore some specific cases and their impact on both music and artificial intelligence. In doing this, we should be able to see how art can do as much for AI as AI does for it. Both fields benefit heavily from the types of projects that we are exploring here.
Could a machine ever be able to create a piece of music the likes of Johann Sebastian Bach? In a project known as DeepBach, several researchers looked to create pieces similar to Bach’s chorales. The beauty of DeepBach is that it “is able to generate coherent musical phrases and provides, for instance, varied reharmonizations of melodies without plagiarism” (Hadjeres, Pachet, & Neilsen, 2016). What this means it that DeepBach can create music with correct structure and be original. It is just in the style of Bach. It isn’t just a mashup of his works. DeepBach is creating new content. The developers of DeepBach went on to test whether their product could actually fool listeners.
As part of the experiment, over 1,250 people were asked to vote whether pieces presented to them were in fact composed by Bach. The subjects had varying degrees of musical expertise. The results showed that as the model for DeepBach’s complexity increased, the subjects had more and more trouble distinguishing the chorales of Bach from those of DeepBach. This experiment shows us that through the use of artificial intelligence and machine learning, it is quite possible to recreate original works in the likeness of the greats. But is that the limit to what artificial intelligence can do in the field of art and music?
DeepBach has achieved something that would have been unheard of in the not so distant past, but it certainly isn’t the fullest extent of what AI can do to benefit the field of music. What if we want to create new and innovative music? Maybe AI can change the way music is created all together. There must be projects that do more to push the envelope. As a matter of fact, that is exactly what the team behind Magenta look to do.
Magenta is a project being conducted by the Google Brain team and lead by Douglas Eck. Eck has been working for Google since 2010, but that isn’t where his interest in Music began. Eck helped found Brain Music and Sound, an international laboratory for brain, music, and sound research. He was also involved at the McGill Centre for Interdisciplinary Research in Music Media and Technology, and was an Associate Professor in Computer Science at the University of Montreal.
Magenta’s goal is to be “a research project to advance the state of the art in machine intelligence for music and art generation” (Eck, Welcome to Magenta!, 2016). It is an open source project that uses TensorFlow. Magenta aims to learn how to generate art and music in a way that is indeed generative. It must go past just emulating existing music. This is distinctly different that projects along the line of DeepBach which set out to emulate existing music in a way that wasn’t plagiarizing existing pieces of music. Eck and company realize that art is about capturing elements of surprise and drawing attention to certain aspects. “This leads to perhaps the biggest challenge: combining generation, attention and surprise to tell a compelling story. So much of machine-generated music and art is good in small chunks, but lacks any sort of long-term narrative arc” (Eck, Welcome to Magenta!, 2016). Such a perspective gives computer-generated music more substance, and helps it to become less of a gimmick.
One of the projects the magenta team has developed is called NSynth. The idea behind NSynth is to be able to create new sounds that have never been heard before, but beyond that, to reimagine how music synthesis can be done. Unlike ordinary synthesizers that focus on “a specific arrangement of oscillators or an algorithm for sample playback, such as FM Synthesis or Granular Synthesis” (Engel, et al., 2017), NSynth generates sounds on an individual level. To do this, it uses deep neural networks. Google has even launched an experiment that allows users to really see what NSynth can do by allowing them to fuse together the sounds of existing instruments to create new hybrid sounds that have never been heard before. As an example, users can take two instruments such as a banjo and a tuba, and take parts of each of their sounds to create a totally new instrument. The experiment also allowed users to decide what percentage of each instrument would be used.
Projects like Magenta go above and beyond in showing us the full extent of what artificial intelligence can do in the way of generating music. They explore new applications of artificial intelligence that can generate new ideas independent of humans. It is the closest we have come to machine creativity. Although machines aren’t yet able to truly think and express creativity, they may soon be able to generate new and unique art and music for us to enjoy. Don’t worry though. Eck doesn’t intend to replace artists with AI. Instead he looks to provide artists with tools to create music in an entirely new way.
Deep Dream and Quick, Draw!
As we look ahead to a few more of the ways that AI has been used to accomplish new and innovative ideas in the art space, we look at projects like Quick, Draw! and Deep Dream. These projects showcase amazing progress in the space while pointing out some issues that researchers in AI will have to work out in the years to come.
Quick, Draw! is an application from the Google Creative Lab, trained to recognize quick drawings much like one would see in a game of Pictionary. The program can recognize simple objects such as cats and apples based on common aspects of the many pictures it was given before. Although the program will not get every picture right each time it is used, it continues to learn from the similarities in the picture drawn and the hundreds of pictures before it.
The science behind Quick, Draw! “uses some of the same technology that helps Google Translate recognize your handwriting. To understand handwritings or drawings, you don’t just look at what the person drew. You look at how they actually drew it” (Developers, Google, 2016). It is presented in the form of a game, with the user drawing a picture of an object chosen by the application. The program then has 20 seconds to recognize the image. In each session, the user is given a total of 6 objects. The images are then stored to the database used to train application. This happens to be the same database we saw earlier in the Sketch-RNN application. This image recognition is a very practical use of artificial intelligence in the realm of art and music. It can do a lot to benefit us in our everyday lives. But this only begins to scratch the surface of what artificial intelligence can do in this field. Although this is very impressive, we might point out that the application doesn’t truly understand what is being drawn. It is just picking up on patterns. In fact, this distinction is part of the gap between simple AI techniques and true artificial general intelligence. Machines that truly understand what the objects in images are don’t appear to be coming in the near future.
Another interesting project in the art space is Google’s Deep Dream project, which uses AI to create new and unique images. Unfortunately, the Deep Dream Generator Team wouldn’t go into too much detail about the technology itself (mostly fearing it would be too long for an email) (Team, 2017). They did, however, explain that convolutional neural networks train on the famous ImageNet dataset. Those neural networks are then used to create art-like images. Essentially, Deep Dream takes the styling of one image and uses it to modify another image. The results can be anything from a silly fusion to an artistic masterpiece. This occurs when the program identifies the unique stylings of an image provided by the user and imposes those stylings onto another image that the user provides. What can easily be observed through the use of Deep Dream is that computers aren’t yet capable of truly understanding what they are doing with respect to art. They can be fed complex algorithms to generate images, but don’t fundamentally understand what it is they are generating. For example, a computer may see a knife cutting through an onion and assume the knife and onion are one object. The lack of an ability to truly understand the contents of an image is one dilemma that researchers have yet to solve.
Perhaps as we continue to make advances in artificial intelligence we will be able to have machines that do truly understand what objects are in an image and even the emotions evoked by their music. The only way for this to be achieved is by reaching true artificial general intelligence (AGI). IN the meantime, the Deep Dream team believes that generative models will be able to create some really interesting pieces of art and digital content.
Where Do We Go From Here?
For this section, we will consider where artificial intelligence could be heading in the art space. We will take a look at how AI has impacted the space and in what ways it can continue to do so. We will also look at ways art and music could continue to impact AI in the years to come.
Although I don’t feel that we have completely mastered the ability to emulate the great artists of our past, it is just a matter of time before that problem is solved. The real task to be solved is that of creating new innovations in art and music. We need to work towards creation without emulation. It is quite clear that we are headed in that direction through projects like CAN and Magenta. Artificial general intelligence (AGI) is not the only way to complete this task. As a matter of fact, even those who dispute the possibility of AGI would have a hard time disputing the creation of unique works of art by a machine.
One path that may be taken to further improve art and music through AI is to create more advanced datasets to use in training the complex networks like Sketch-RNN and Deep Dream. AI needs to be trained to be able to perform as expected. That training has a huge impact on the results we get. Shouldn’t we want to train our machines in the most beneficial way possible. Even developing software like Sketch-RNN to use the ImageNet dataset used in Deep Dream could be huge in educating artists on techniques for drawing complex, realistic images. Complex datasets could very well be our answer to more efficient training. Until our machines can think and learn like we do, we will need to be very careful what data is used to train them.
One of the ways that art and music can help to impact AI is by providing another method of Turing Testing machines. For those who dream of creating AGI, what better way to test the machine’s ability that to create something that tests the full extent of human-like creativity? Art is the truest representation of human creativity. That is, in fact, its essence. Although art is probably not the ultimate end game for artificial intelligence, it could be one of the best ways to test the limits of what a machine can do. The day that computers can create original musical composition and create images based on descriptions given by a user could very well be the day that we stop being able to distinguish man from machine.
There are many benefits to using artificial intelligence in the music space. Some of them have already been seen in the projects we have discussed so far. We have seen how artificial intelligence could be used for image recognition as well as their ability to turn our words into fantastic images. We have also seen how AI can be used to synthesize new sounds that have never been heard. We know that artificial intelligence can be used to create art alongside us as well as independently from us. It can be taught to mimic music from the past and can create novel ideas. All of these accomplishments are a part of what will drive AI research into the future. Who knows? Perhaps one day we will achieve artificial general intelligence and machines will be able to understand what is really in the images it is given. Maybe our computers will be able to understand how their art makes us feel. There is a clear path showing us where to go from here. I firmly believe that it is up to us to continue this research and test the limits of what artificial intelligence can do, both in the field of art and in our everyday lives.
Developers, Google. (2016, November 15). A.I. Experiments: Quick, Draw! United States of America.
Eck, D. (2016, June 1). Welcome to Magenta! Retrieved from Magenta: https://magenta.tensorflow.org/welcome-to-magenta
Eck, D., & Ha, D. (2017). A Neural Representation of Sketch Drawings. eprint arXiv:1704.03477.
Elgammal, A., Liu, B., Elhoseiny, M., & Mazzone, M. (2017). CAN: Creative Adversarial Networks Generating “Art” by Learning About Styles and Deviating from Style Norms.
Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., & Norouzi, M. (2017). Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders. eprint arXiv:1704.01279.
Hadjeres, G., Pachet, F., & Neilsen, F. (2016). DeepBach: a Steerable Model for Bach Chorales Generation. eprint arXiv:1612.01010.
Huang, X., Li, H., Metaxas, D., Wang, X., Xu, T., Zhang, H., & Zhang, S. (2016). StackGAN: Text to Photo-realisitc Image Synthesis with Stacked Generative Adersarial Networks. eprint arXiv:1612.03242.
Team, D. D. (2017, September 22–25). E-Mail. (C. Kalahiki, Interviewer)