I have always been interested in art, but I never became a real artist. Still, my interest remained, and although I never had the time to truly hone my skills, I liked to dabble in something creative from time to time. In the last two years, with the introduction of advanced AI tools, it has become much easier to create visually appealing work. We now have access to tools that help us express our ideas visually without having to invest a lot of time in mastering a craft. Naturally, I was curious about these tools and what could be created with them. But lately I’ve been wondering how AI actually creates art, and what the scene looked like before AI entered the creative arena. In this article, I want to explore different AI models for creating art, discuss the differences between generative art and AI art, and look at other ways to be artistic.
AI models for image generation
I think the first time I saw art generated by AI tools was about a year ago, when some of my colleagues started experimenting with a text-to-image engine called Midjourney. The results were quite impressive, though of course it took some time to learn how to use the tool properly. Even then, there was an obvious shift in the way we make art and the way we perceive it. A few months later, DALL-E appeared on the scene, and since I had ChatGPT Plus, I was able to try it out right away. Another popular tool is Stable Diffusion. Although I never got to try it, these three are often compared, so it’s impossible to discuss one without acknowledging the others, as they do essentially the same thing. At some point, I had to wonder what was going on under the hood of these tools. My curiosity was piqued when I noticed the similarities in the images they created, and over time it became easy to tell whether an image was created by AI.
To begin with, there are several families of AI models for image generation, but four primary ones: Diffusion models, Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Flow-based models.
I mentioned earlier that these three text-to-image engines do essentially the same thing: they all rely on diffusion models. And since these tools have gained quite a bit of popularity, it is fair to say that the majority of AI-generated art today is created with a diffusion model.
So, what is the main principle behind these models? Doing a bit of research, I stumbled upon a huge number of explanations full of math and formulas that make it all sound extremely complicated (which it is). But I want to give a very short and simple description of the main working principle of each of these models.
Diffusion models gradually add Gaussian noise to an input image in a sequential manner through a series of steps. (In digital images, Gaussian noise looks like random speckles of light or dark spots that make a picture grainy; it can come from poor lighting or camera imperfections.) This process is called forward diffusion. A neural network is then trained to recover the original data by reversing the noising process. Because it can model this reverse process, it can generate new data starting from pure noise. This is called the reverse diffusion process or, more generally, the sampling process of a generative model.
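To make that a bit more concrete, here is a minimal sketch of the forward diffusion process in Python with NumPy. It only shows the noising half; the reverse (denoising) half is what the neural network learns. The number of steps and the noise schedule are illustrative assumptions, not the values any particular tool actually uses.

```python
# Minimal sketch of forward diffusion: mix an image with more and more
# Gaussian noise over a series of steps. Step count and schedule are
# illustrative assumptions.
import numpy as np

def forward_diffusion(image, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    betas = np.linspace(beta_start, beta_end, num_steps)  # noise schedule
    alphas_cumprod = np.cumprod(1.0 - betas)              # how much signal survives at each step
    noisy_images = []
    for t in range(num_steps):
        noise = np.random.randn(*image.shape)
        # Closed-form sample of the noisy image at step t:
        # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
        x_t = np.sqrt(alphas_cumprod[t]) * image + np.sqrt(1.0 - alphas_cumprod[t]) * noise
        noisy_images.append(x_t)
    return noisy_images  # the last entries are close to pure Gaussian noise

# Example: a random 64x64 grayscale "image" normalized to [-1, 1]
image = np.random.rand(64, 64) * 2.0 - 1.0
steps = forward_diffusion(image)
```

A trained diffusion model runs this process backwards: starting from random noise, it repeatedly predicts and removes a little bit of noise until an image emerges.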
Generative Adversarial Networks can be explained in a very simple way. If you want to generate new data, you build two models: the first is trained to generate fake data, and the second is trained to distinguish real data from fake data. Then you let them compete with each other. Sounds simple, right? But some details are helpful here. Both diffusion models and GANs generate images from noise, but GANs work on a different underlying principle. The first model is a neural network called the Generator, and its role is to produce fake data with only noise as input. The second model is called the Discriminator, and it learns to identify whether an image is fake or not by receiving as input both real images and the fake ones produced by the generator. The magic begins when you pit them against each other and train them simultaneously. The generator gets better and better at generating images as it tries to fool the discriminator. The discriminator gets better and better at distinguishing fake from real images because it doesn’t want to be fooled. The result is incredibly realistic fake data from the generator.
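For a rough idea of what this looks like in practice, here is a minimal GAN sketch in PyTorch. The network sizes, learning rates, and the random stand-in for "real" data are all illustrative assumptions; a real setup would train on an actual image dataset.

```python
# Minimal GAN sketch: a Generator turns noise into flat 28x28 "images",
# a Discriminator judges real vs. fake, and the two are trained in turns.
import torch
import torch.nn as nn

latent_dim, image_dim = 64, 28 * 28

generator = nn.Sequential(
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, image_dim), nn.Tanh(),        # fake images in [-1, 1]
)
discriminator = nn.Sequential(
    nn.Linear(image_dim, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid(),             # probability that the input is real
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
loss_fn = nn.BCELoss()

real_images = torch.rand(256, image_dim) * 2 - 1  # stand-in for a real dataset

for step in range(100):
    # Train the discriminator: real images -> 1, generated images -> 0
    noise = torch.randn(64, latent_dim)
    fake_images = generator(noise).detach()
    batch = real_images[torch.randint(0, 256, (64,))]
    d_loss = loss_fn(discriminator(batch), torch.ones(64, 1)) + \
             loss_fn(discriminator(fake_images), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Train the generator: try to make the discriminator output 1 for fakes
    noise = torch.randn(64, latent_dim)
    g_loss = loss_fn(discriminator(generator(noise)), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```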
Variational Autoencoders are generative models that learn the distribution of data, allowing them to generate new data points. Essentially, they learn to compress (encode) data into a compact representation and then reconstruct (decode) the original data from that representation as accurately as possible.
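Here is a minimal sketch of that encode/decode idea in PyTorch. The layer sizes, the latent dimension, and the stand-in data are illustrative assumptions.

```python
# Minimal VAE sketch: the encoder compresses an image into the mean and
# (log-)variance of a small latent vector, the decoder reconstructs the
# image from a sample of that vector.
import torch
import torch.nn as nn

image_dim, latent_dim = 28 * 28, 16

encoder = nn.Sequential(nn.Linear(image_dim, 128), nn.ReLU(),
                        nn.Linear(128, 2 * latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                        nn.Linear(128, image_dim), nn.Sigmoid())

def reconstruct(x):
    mu, log_var = encoder(x).chunk(2, dim=-1)                   # latent distribution parameters
    z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)    # reparameterization trick
    return decoder(z), mu, log_var

x = torch.rand(8, image_dim)                                    # stand-in for a batch of images
x_hat, mu, log_var = reconstruct(x)

# Training balances reconstruction quality against keeping the latent
# distribution close to a standard Gaussian (the KL divergence term).
recon_loss = nn.functional.binary_cross_entropy(x_hat, x, reduction="sum")
kl_loss = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
loss = recon_loss + kl_loss
```

Once trained, you can throw away the encoder, sample a random latent vector, and let the decoder turn it into a new image.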
Flow-based generative models operate on the principle of transforming data by learning invertible mappings (sequences of invertible transformations) between the data distribution and a known distribution (usually a Gaussian). The key feature of flow-based models is that they allow exact likelihood computation and invertibility. This makes them uniquely suited for certain generative tasks, such as high-fidelity image synthesis or modeling complex data distributions.
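A toy example can make the idea of an invertible mapping and exact likelihood concrete. The sketch below uses a single fixed affine transformation in NumPy; real flow-based models stack many learned invertible layers, so this is only an illustration of the change-of-variables principle.

```python
# Minimal flow sketch: an invertible (affine) map between data space and a
# standard Gaussian gives exact log-likelihoods and easy sampling.
import numpy as np

scale, shift = 2.0, 1.5
f = lambda x: (x - shift) / scale          # data -> latent (Gaussian) space
f_inv = lambda z: z * scale + shift        # latent -> data space (sampling direction)

def log_likelihood(x):
    z = f(x)
    # log N(z; 0, 1) plus the log-determinant of the Jacobian of f
    log_prob_z = -0.5 * (z ** 2 + np.log(2 * np.pi))
    log_det_jacobian = -np.log(scale)
    return log_prob_z + log_det_jacobian

# Exact density of a data point, and new samples by pushing noise through f_inv
print(log_likelihood(np.array([2.0])))
new_samples = f_inv(np.random.randn(5))
```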
A little experiment
As mentioned above, tools like DALL-E, Midjourney, and Stable Diffusion rely on diffusion models, but that wasn’t obvious to me when I first started exploring the field. Once I learned about different generative models, I went to DALL-E and asked it to generate some images using specific models. To my surprise, I got some results even though it wasn’t really capable of doing what I asked. Funny enough, when I recently tried to use one of the prompts I used before, DALL-E replied:
I can only create images with the tools I have access to, which currently include generating images based on your descriptions using a model similar to DALL-E, rather than specifying the type of AI model like a Variational Autoencoder (VAE). If you'd like another forest image or have a different request, I'm here to help!
Unlike before, it didn’t try to generate something that somehow fit the description, so I can only assume some limitations were introduced. Here are some examples of images I got earlier this year with the prompts I used:
Generative Art vs AI Art
Now that we’ve gone through various generative models and hopefully gotten a basic idea of how these modern tools work, it’s time to clarify a few things. We talked about AI models for image generation and called them generative models, but does that mean we can call their output AI generative art? First of all, the term “generative art” has been around since the 1960s, when programmers and artists started using computer-controlled robots to create paintings. Since then, it has become common to refer to such art as algorithmic art (computer-generated artwork). Essentially, this type of art is created using a set of pre-defined rules or algorithms, which are often mathematical or logical in nature. We often take patterns we see around us and try to replicate them with custom parameters like color schemes, geometric shapes and so on, but the final output is always determined by the algorithm itself.
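As a small illustration of how a pre-defined algorithm plus a handful of parameters fully determines the result, here is a sketch in Python with Matplotlib. The specific rule (rotate and shrink a square) and the parameter values are arbitrary choices; change them and you get a different, but equally deterministic, image.

```python
# Minimal rule-based generative art: repeatedly rotate and shrink a square.
# The algorithm and its parameters fully determine the final picture.
import numpy as np
import matplotlib.pyplot as plt

def spiral_squares(n_steps=120, angle_step=0.06, shrink=0.985):
    square = np.array([[-1, -1], [1, -1], [1, 1], [-1, 1], [-1, -1]], dtype=float)
    fig, ax = plt.subplots(figsize=(6, 6))
    for i in range(n_steps):
        rotation = np.array([[np.cos(i * angle_step), -np.sin(i * angle_step)],
                             [np.sin(i * angle_step),  np.cos(i * angle_step)]])
        points = square @ rotation.T * (shrink ** i)
        ax.plot(points[:, 0], points[:, 1],
                color=plt.cm.viridis(i / n_steps), linewidth=0.8)
    ax.set_aspect("equal")
    ax.axis("off")
    plt.savefig("generative_pattern.png", dpi=200)

spiral_squares()
```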
On the other side is AI-generated art, which is created in a different way, using machine learning techniques such as neural networks. This art can take much more effort to produce, as you need to train a neural network on a large dataset of images, and only then can you use the trained network to create artwork. Although the artist has some control over the parameters of the neural network and the data used to train it, the final output will never be exactly the same twice and is ultimately determined by the network itself.
What else can we do besides AI?
Obviously, there are countless ways to make art and be creative, but it’s always nice to mention things in the hopes of sparking someone’s creativity.
We have so many different tools at our disposal, from engines, programs, and algorithms to actual physical machines, printers, sensors, and tons of other technical stuff, that the number of possibilities becomes overwhelming.
Sometimes it’s really nice to take a step back to simpler things and explore the horizon. Combining art with other subjects like math or programming can yield amazing results. Fractal art, where simple mathematical formulas produce infinitely detailed images that show self-similarity at different scales, has always amazed me. The same goes for algorithmic sculptures: forms that are impossible to create by human hands alone but can be designed through computational processes and manufactured using 3D printing technology.
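As an example of how little code such a fractal needs, here is a minimal Mandelbrot set sketch in Python. The resolution, iteration count, and color map are arbitrary choices.

```python
# Minimal Mandelbrot sketch: iterating z -> z^2 + c per pixel and recording
# when the value escapes produces an infinitely detailed, self-similar image.
import numpy as np
import matplotlib.pyplot as plt

width, height, max_iter = 800, 600, 80
x = np.linspace(-2.5, 1.0, width)
y = np.linspace(-1.2, 1.2, height)
c = x[np.newaxis, :] + 1j * y[:, np.newaxis]   # one complex number per pixel

z = np.zeros_like(c)
escape_step = np.full(c.shape, max_iter)
for i in range(max_iter):
    z = z ** 2 + c
    newly_escaped = (np.abs(z) > 2) & (escape_step == max_iter)
    escape_step[newly_escaped] = i
    z[np.abs(z) > 2] = 2                        # keep values from overflowing

plt.imshow(escape_step, cmap="twilight", extent=(-2.5, 1.0, -1.2, 1.2))
plt.axis("off")
plt.savefig("mandelbrot.png", dpi=200)
```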
There is another example that I really like, and that is image corruption, or glitching. There are many ways to do this, some smarter than others, but the easiest is to modify the raw contents of an image file and get interesting results with very little manipulation. All you need is Notepad++ or another text editor and an image you want to modify. When you open an image with a text editor, what you see is essentially the raw binary data of the image file represented as text. This binary data encodes all the information needed to display the image, such as color values for each pixel, image dimensions, compression type, and possibly metadata about the image. The cool thing is that you don’t need to understand it: just pick a few lines of that text, copy them, and paste them over some other lines elsewhere in the file. A few repetitions are enough to change the original image beyond recognition.
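If you prefer to script the process instead of copy-pasting in a text editor, here is a small Python sketch that does the same thing with raw bytes. The file names, chunk sizes, and the margin that protects the file header are illustrative assumptions; results vary a lot depending on the image format.

```python
# Minimal glitching sketch: read an image file as raw bytes, copy a few
# random chunks over other positions, and save the result.
import random

with open("original.jpg", "rb") as f:
    data = bytearray(f.read())

header_safety_margin = 4096            # leave the start of the file alone
for _ in range(10):                    # "just a few repetitions"
    chunk_size = random.randint(500, 2000)
    src = random.randint(header_safety_margin, len(data) - chunk_size - 1)
    dst = random.randint(header_safety_margin, len(data) - chunk_size - 1)
    data[dst:dst + chunk_size] = data[src:src + chunk_size]

with open("glitched.jpg", "wb") as f:
    f.write(data)
```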
Here is an example of the corrupted image:
Conclusion
It is much easier to be creative and make art now, but I still think that AI at this stage is just a tool that can be used to create things, rather than an actual artist. There is definitely art in what these AI models generate, but mostly because they rely on data created by humans and usually require input from us, which means we still have to put in some effort to generate ideas and foster our creativity. Most of the work for us lies in growing the idea rather than implementing it. However, opinions on this differ widely, and that is exactly what makes the discussion interesting.
This article is part of XPRT.#16. You can download the magazine here