What are diffusion models and how do they work?
What are diffusion models?
Diffusion models are inspired by the physics concept of non-equilibrium thermodynamics.
In physics, thermodynamics is the study of the relations between heat, work, temperature, and energy, and of the physical properties of matter and radiation.
In equilibrium thermodynamics, systems are in a state of balance and do not change over time. In non-equilibrium thermodynamics, systems are continuously changing and evolving.
Diffusion models of this kind are applied across fields such as physics, chemistry, biology, and economics to describe how particles, substances, or ideas spread over time and space. They help us understand the dynamics of complex systems and the patterns of their behavior.
By studying diffusion, we can gain insight into how things spread and how they interact with their environment. Consider, for example, a drop of paint in a glass of water:
When the drop of paint first hits the water, the density of the paint is very high in one spot and zero everywhere else. By the laws of physics, the paint diffuses through the water until it reaches an equilibrium. In the physical world, it is not possible to reverse this diffusion of paint into water.

The goal of diffusion models is to learn a model that can reverse this process and bring the drop of paint back to its original state. In other words, the paint being concentrated in one spot encodes information, and as the diffusion process progresses, we lose that information.

When you think of diffusion models for generative AI art, the process is similar to the drop of paint in a glass of water. The information carried by the concentrated drop of paint is equivalent to a clear image, and working backwards from the diffused paint corresponds to working backwards from noise towards a clear image.
How diffusion models work in AI
Diffusion models in machine learning are generative models that can generate new data based on their training data. They work by 'destroying' the training data through the gradual addition of Gaussian noise, and then learning to recover the data by reversing this noising process. Simply put, diffusion models can generate coherent images from noise.
The noise applied to the training data is based on a Markov chain.
A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. In a chain of events, the current time step therefore depends only on the one immediately before it; there are no dependencies between time steps that do not directly follow each other.
This step-by-step procedure makes it tractable to later reverse the noise added to the training data. A diffusion model for image generation runs a Markov process that adds noise to the image step by step until the image consists of nothing but noise; it then learns how to reverse this noising process.
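The forward (noising) Markov process described above can be sketched in a few lines of Python. This is a simplified illustration, not a production implementation: the function name, the toy image, and the fixed noise level `beta` are assumptions for the example, and real diffusion models use a schedule of varying noise levels.

```python
import numpy as np

def forward_diffusion(x0, num_steps=1000, beta=0.02, seed=0):
    """Sketch of the forward (noising) Markov process.

    At each step, the image from the previous step is scaled down
    slightly and a small amount of Gaussian noise is added. Each
    step depends only on the step immediately before it, which is
    exactly the Markov property described in the text.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(num_steps):
        noise = rng.normal(size=x.shape)       # fresh Gaussian noise each step
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
    return x

# Start from a flat toy "image" and noise it until its structure is gone.
image = np.ones((4, 4))
noised = forward_diffusion(image)
```

After enough steps, `noised` carries essentially no trace of the original image; it is this destruction that the model later learns to reverse.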
Once a diffusion model has been trained, when it is given only noise, it is now able to generate high quality images.
The diffusion process: adding the Gaussian noise
Gaussian noise is noise whose values follow a Gaussian (normal) probability distribution. A normal distribution can have a different mean and variance, which shift the location and width of the distribution, but the bell-curve shape remains the same.
Adding Gaussian noise to an image means perturbing each pixel slightly, with the size of each perturbation drawn from that Gaussian distribution.
Mathematically, the Gaussian noise is added like so:
Suppose we have an uncontaminated, noise-free source image \(S(x, y)\), and we add noise \(e(x, y)\) drawn from a Gaussian distribution.
Then the observed image \(O(x, y)\) is given by:
\(O(x, y) = S(x, y) + e(x, y)\)
Adding Gaussian noise is simply a repeated process of adding more of \(e(x, y)\), drawn randomly from the probability density function of its Gaussian distribution.
Originally published at https://blog.stablematic.com on March 8, 2023.