https://arxiv.org/abs/2310.04378
https://arxiv.org/abs/2311.05556
https://github.com/luosiallen/latent-consistency-model
Hey I’m @naklecha. In this post I’m going to walk you through latent consistency models (LCMs). This blog is going to be highly detailed and catered towards people who want to learn about LCMs, their math and their code. Let’s jump right in!
Traditional diffusion models (like Stable Diffusion) are very slow, because they need many sequential denoising steps to generate a single image. We need to fix this.
But before we get to LCMs, we need to understand how diffusion models work. That's explained next.
The image below is the architecture of Stable Diffusion.
x is the input image.
Then x is encoded using the encoder ε, which produces z. This z is called a latent space vector (it’s much smaller than the image). Encoding images into latent vectors could be a whole blog by itself, so I’m not going to get into it right now.
For now, all you need to know is that z is a compressed representation of an input image x.
PS: when you hear latent vector, think of a list of numbers that represents an image in a compressed format.
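To make "much smaller" concrete, here's a toy sketch of the compression. Stable Diffusion's VAE maps a 512x512 RGB image to a 4x64x64 latent. The real encoder ε is a trained neural network; the average-pooling "encoder" below is purely hypothetical, just to show the shapes and the size reduction.

```python
import numpy as np

def toy_encode(x: np.ndarray, factor: int = 8, channels: int = 4) -> np.ndarray:
    # Fake stand-in for the VAE encoder: average-pool by `factor`
    # in each spatial dimension (the real ε is a learned conv network).
    c, h, w = x.shape
    pooled = x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))
    # Pad channels 3 -> 4 with the channel mean (a learned projection in the real VAE).
    z = np.concatenate([pooled, pooled.mean(axis=0, keepdims=True)], axis=0)
    return z[:channels]

x = np.random.rand(3, 512, 512)   # input image x
z = toy_encode(x)                 # latent z, shape (4, 64, 64)
print(x.size / z.size)            # 48.0 -- the latent has 48x fewer numbers
```

So the U-Net inside Stable Diffusion gets to denoise a 4x64x64 tensor instead of a 3x512x512 one, which is a big part of why latent diffusion is feasible at all.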