GANs and creative computers

Diffusion models

Diffusion models are a type of machine learning algorithm that creates new images by starting from pure random noise and gradually refining it, step by step, into a detailed picture.

An example (a small code sketch follows this list):

  1. Start from a "canvas" of pure random noise, with the goal of turning it into an image of a bird.
  2. The algorithm looks at the colors, textures, edges and contours of the current image and predicts what a slightly cleaner, slightly more detailed version is likely to look like.
  3. Based on that prediction, remove a little of the noise, and add back a small amount of fresh noise so the image does not get locked in too early. In other words, you apply subtle changes to the existing pixels.
  4. Repeat the previous two steps many times, gradually increasing the level of detail at each step (the detail spreads gradually and "diffusely" over the entire image).
  5. Use sampling techniques to produce a final image from the series of intermediate images. This final image is much more complex and detailed than the noise you started from: a brand-new picture of a bird.
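A minimal sketch of this denoising loop, with a placeholder standing in for the trained neural network (a real diffusion model would predict the noise still present in the image; here predict_denoised just nudges the pixels so the loop runs end to end):

    import numpy as np

    rng = np.random.default_rng(seed=0)
    TOTAL_STEPS = 50

    def predict_denoised(image, step):
        # Placeholder for the trained neural network. A real model
        # would predict (and subtract) the noise still present in
        # the image at this step.
        return image * (1.0 - 1.0 / TOTAL_STEPS)

    # Step 1: start from a "canvas" of pure random noise (64x64 grayscale).
    image = rng.standard_normal((64, 64))

    for step in range(TOTAL_STEPS):
        # Steps 2-3: predict a slightly cleaner image...
        image = predict_denoised(image, step)
        # ...then add back a little fresh noise, less at every step,
        # so detail emerges gradually over the whole image (step 4).
        noise_scale = 1.0 - (step + 1) / TOTAL_STEPS
        image += 0.1 * noise_scale * rng.standard_normal(image.shape)

    # Step 5: after the last iteration, `image` would be the finished picture.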

An AI that generates images

A cafe in Montreal, 2014. Some friends were talking over a beer (very important!) about their work in the field of AI. The problem on the table: deep learning had become good at classifying images, but it still had a lot of trouble generating new ones. One of them, Ian Goodfellow, had a flash of inspiration: let multiple neural networks challenge each other and generate new content that way. You see... beer can help with creative problem-solving ;-). That same night he wrote the necessary code and tested it successfully. Together with his colleagues at the University of Montreal he developed the model further and published a paper on it called "Generative Adversarial Nets". It earned Goodfellow a job at Google Research and later a position at OpenAI. GANs went on to spark a wave of machine learning innovations.

Generator and discriminator

Image classification is often still time-consuming and expensive. Suppose you want to teach a deep neural network to recognize cats: you first have to feed it a mass of pictures in which the cats are labeled. That work is done by humans. The AI may get better and better at recognizing cats in new images, but don't ask it to generate a "fictional cat".

A GAN combines two neural networks. The first, the generator, generates new data (for example, a picture of a cat). The second network is the discriminator, which works like a classical classification network (think of a network that recognizes cats). The discriminator receives the output from the generator and rates it on a scale between 0 and 1 (which may seem like a narrow range, but there are countless decimal values in between). If the score is too low (say, between 0 and 0.5), the generator adjusts its output and sends it to the discriminator again. That cycle repeats at high speed, over many rounds, until the generator produces data that matches the desired output.

Suppose you present a classification network with a series of paintings by Vincent Van Gogh, labeled in advance by a human. The AI thus learns the difference between paintings painted by Van Gogh and those not painted by him. Then you have a generator produce a fake Van Gogh painting, obviously not with brush and paint on canvas, but in "pixel data". The generator keeps going until it can convince the discriminator that it is an authentic Van Gogh. A minimal code sketch of this generator-discriminator duel follows below.
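As an illustration, here is a minimal sketch of that duel in PyTorch. It is a toy setup under stated assumptions: random tensors stand in for real (labeled) cat pictures, and both networks are far smaller than anything used in practice:

    import torch
    import torch.nn as nn

    # The generator turns a random "latent" vector into a 28x28 image.
    generator = nn.Sequential(
        nn.Linear(64, 256), nn.ReLU(),
        nn.Linear(256, 28 * 28), nn.Tanh(),
    )

    # The discriminator scores an image between 0 (fake) and 1 (real).
    discriminator = nn.Sequential(
        nn.Linear(28 * 28, 256), nn.LeakyReLU(0.2),
        nn.Linear(256, 1), nn.Sigmoid(),
    )

    opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
    loss_fn = nn.BCELoss()

    real_images = torch.rand(32, 28 * 28)  # stand-in for real training data

    for step in range(100):
        # Train the discriminator: real images should score 1, fakes 0.
        fakes = generator(torch.randn(32, 64)).detach()
        d_loss = (loss_fn(discriminator(real_images), torch.ones(32, 1))
                  + loss_fn(discriminator(fakes), torch.zeros(32, 1)))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

        # Train the generator: try to make the discriminator answer 1.
        scores = discriminator(generator(torch.randn(32, 64)))
        g_loss = loss_fn(scores, torch.ones(32, 1))
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()

Each round, the discriminator gets a little better at spotting fakes and the generator a little better at fooling it; that is the cycle described above.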

GANs have already proven their value in creating and modifying imagery. They can fix errors or missing data in images, and colorize black-and-white photos and film... But there is also a danger in them: they can be used to manipulate imagery and video, think "deepfake" videos. For the music industry they generate new compositions in various styles, which musicians can then modify or correct. They have even managed to give a film an appropriate soundtrack.

Text prompt: "a cute cat sleeping in bed, leica 5 type 240, gigapixel, ultra resolution"

It is not hard to appreciate the power of GANs and to think of new applications. A robot or self-driving car can use GANs to generate imaginary working conditions and learn to navigate through them, without having to train on a real shop floor or in a real environment.

Deepfake photos and film

As brilliant and simple as the idea behind GANs is, they do have some limitations. They still require an abundance of training data. If you want to generate a deepfake video of Putin, for example, you need a mass of existing video footage of the president. Without enough historical data, the GAN cannot produce a convincing end result.

Creativity? 

GANs cannot invent totally new things either. They can, however, combine existing data in new ways. But isn't that exactly how human creativity works as well?  

Since the summer of 2022, image generators such as DALL-E (https://openai.com/dall-e-2/), Midjourney (https://midjourney.com/) and Stable Diffusion have grown enormously successful. It is downright amazing what these generators are already capable of. Based on a simple text prompt as an "input command", the AI generates a brand-new image. The way is now open for generators that produce not only still images, but also animation and film.

Text prompt: "white colored mud house, timber framed along an old small brick road, November, fall, scene from a horror movie, middle ages, insanely detailed, cinematic, color image, dramatic lighting, insanely detailed, hyperrealistic."
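What does firing off such a prompt look like in code? A minimal sketch, assuming the open-source Hugging Face diffusers library and a publicly released Stable Diffusion checkpoint (the model name and prompt are illustrative; DALL-E and Midjourney are instead used through their own web interfaces and APIs):

    import torch
    from diffusers import StableDiffusionPipeline

    # Download the publicly released weights and move them to the GPU.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # One short text prompt in, one brand-new image out.
    image = pipe("a cute cat sleeping in bed").images[0]
    image.save("cat.png")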

Different results with the same prompt

When you ask an image generator to create an image from a text prompt, you may get different results each time you make the request, even if you use the same prompt. This is because image generation is stochastic: the generator deliberately incorporates randomness into its outputs.

Specifically, the generator uses a random input (often called a "latent vector") to generate the image. This means that even if you use the same text prompt, the random input used by the generator will be different each time, which can result in different outputs.
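Continuing the Stable Diffusion sketch above (it reuses the pipe object from that block): seeding the random number generator pins down the latent vector, so the same prompt reproduces the same image, while a different seed gives a different one:

    import torch

    prompt = "a cute cat sleeping in bed"

    # Same seed -> same latent vector -> the same image every time.
    gen = torch.Generator("cuda").manual_seed(42)
    image_a = pipe(prompt, generator=gen).images[0]

    # A different seed -> a different latent vector -> a different image.
    gen = torch.Generator("cuda").manual_seed(1234)
    image_b = pipe(prompt, generator=gen).images[0]

This is also why many generator interfaces display the seed of each result: keep the seed and the prompt, and you can reproduce the image later.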

Additionally, the generator is trained to create images that are similar to the images in the original dataset, but not identical. This means that even if you use the same prompt, the generator may generate different images that still match the general characteristics of the prompt.

Finally, it's worth noting that image generation using machine learning is still a relatively new and rapidly evolving field, and the algorithms used by image generators are not yet perfect. There may be some variability in the outputs due to imperfections in the algorithm or limitations in the training data.

Overall, while you may get different results each time you use an image generator with the same prompt, the generator will still create images that match the general characteristics of the prompt and are often visually interesting and creative.

