Stable Diffusion and SDXL get a speed boost from Latent Consistency Models (LCM) and LoRAs

You probably know Stable Diffusion and its big brother SDXL, the AI image generation models that let you create images from simple text descriptions. But did you know that it is now possible to significantly speed up their inference pipeline thanks to Latent Consistency Models (or LCM)?

Developed by a Chinese team, LCMs are a distillation technique that drastically reduces the number of steps needed to generate an image with Stable Diffusion or SDXL, while keeping image quality close to that of the original model. Instead of the usual 25 to 50 steps, we can go down to just 4 to 8 steps!

Concretely, this means speed gains of a factor of 10 on a recent Mac, or the possibility of generating images in less than a second on an RTX 3090. Enough to change habits and workflows, making AI image generation accessible to everyone, even without high-end hardware.
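To get a feel for where such gains come from, here is a rough back-of-the-envelope sketch. It assumes that generation time scales linearly with the number of denoising steps, which ignores fixed costs such as text encoding and image decoding, so real-world numbers will differ. The function name is purely illustrative.

```python
def step_speedup(baseline_steps, lcm_steps):
    """Rough speedup estimate, ASSUMING time is proportional to step count."""
    if baseline_steps <= 0 or lcm_steps <= 0:
        raise ValueError("step counts must be positive")
    return baseline_steps / lcm_steps


# Best case from the figures above: 50 steps down to 4.
print(step_speedup(50, 4))   # 12.5
# Most conservative case: 25 steps down to 8.
print(step_speedup(25, 8))   # 3.125
```

Under this simplification, going from the usual 25 to 50 steps down to 4 to 8 gives anything from roughly 3x to 12x fewer denoising passes, which is in line with the order of magnitude mentioned above.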

But the most interesting thing is that, thanks to a new method called LCM LoRA, it is possible to apply this optimization to any fine-tuned SDXL or Stable Diffusion model without having to distill it entirely. As a reminder, LoRAs (for Low-Rank Adaptation) are small adapters that are added to a model to give it superpowers, a bit like plugins. They make it possible to combine the advantages of LCM (ultra-fast inference) with the flexibility of fine-tuning.
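To see why a LoRA counts as a "small" adapter, here is a minimal sketch of the low-rank idea in plain Python: instead of storing a new full weight matrix, the adapter stores two thin matrices B and A whose product is added to the original weights. The layer size (1280) and rank (64) below are hypothetical values chosen for illustration, not the actual dimensions of any LCM LoRA.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def apply_lora(W, B, A, scale=1.0):
    """Return W + scale * (B @ A), the adapted weight matrix."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


def lora_params(d_out, d_in, rank):
    """Parameters stored by the adapter: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


# Tiny worked example: a 2x2 weight matrix with a rank-1 adapter.
W = [[1, 0], [0, 1]]
B = [[1], [2]]      # 2 x 1
A = [[3, 4]]        # 1 x 2
adapted = apply_lora(W, B, A)

# Hypothetical layer: 1280 x 1280 weights, adapter of rank 64.
full = 1280 * 1280
adapter = lora_params(1280, 1280, 64)
print(adapter / full)   # 0.1
```

In this made-up configuration the adapter holds a tenth of the parameters of the layer it modifies, which is why a LoRA can be shipped and swapped like a plugin while the base model stays untouched.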

And all this is fully integrated into the Diffusers library by Hugging Face. So, with just a few lines of code, you can load an SDXL pipeline, apply an LCM LoRA to it, change the scheduler and presto, you're ready for lightning-fast inference!
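Here is a minimal sketch of those few lines, using Hugging Face's diffusers library. The two Hub identifiers are assumptions (the public SDXL base checkpoint and the published LCM LoRA for SDXL) and can be replaced with your own fine-tuned model. The function names are illustrative, and the heavy imports sit inside the functions so that the small argument helper can be used without a GPU stack installed.

```python
# Assumed Hub identifiers; replace them with the models you actually use.
BASE_MODEL = "stabilityai/stable-diffusion-xl-base-1.0"
LCM_LORA = "latent-consistency/lcm-lora-sdxl"


def lcm_call_kwargs(prompt, steps=4, guidance_scale=1.0):
    """Build the arguments for a few-step LCM generation call."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return {
        "prompt": prompt,
        "num_inference_steps": steps,      # 4 to 8 instead of 25 to 50
        "guidance_scale": guidance_scale,  # LCM works with low guidance values
    }


def build_lcm_pipeline(device="cuda"):
    """Load SDXL, switch to the LCM scheduler, then attach the LCM LoRA."""
    import torch
    from diffusers import DiffusionPipeline, LCMScheduler

    pipe = DiffusionPipeline.from_pretrained(
        BASE_MODEL, variant="fp16", torch_dtype=torch.float16
    )
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights(LCM_LORA)
    return pipe.to(device)


def generate(prompt, path="lcm_sdxl.png", device="cuda"):
    """Generate one image in a handful of steps and save it to disk."""
    pipe = build_lcm_pipeline(device)
    image = pipe(**lcm_call_kwargs(prompt)).images[0]
    image.save(path)
    return path
```

Calling `generate("a close-up photo of an old fisherman")` downloads the weights on first use and writes the image to `lcm_sdxl.png`. This assumes a CUDA GPU; on a Mac you would pass `device="mps"` instead, and results may vary with the precision settings.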

This acceleration opens the way to exciting new use cases for AI image generation:

  • Accessibility: generative tools become usable by everyone, even without the latest GPU.
  • Rapid iteration: artists and researchers can test more ideas and variations in record time.
  • On-demand generation: we can imagine personalized image services running in near real time.
  • Cost reduction: ultra-fast inference makes production workloads realistic, even on CPU or with a limited budget.

And for more information, here are some links:

Ready to generate images at full speed?

Over to you, and thanks again to Lorenper for the info!


