DayFR Euro

xAI’s image-generation AI impresses… and unsettles

xAI unveiled a new image generation model in December 2024 with highly advanced photorealistic capabilities and no restrictions.

The days when deepfakes could be told apart from real photographs seem to be over. Launched on December 9, 2024, Aurora, xAI’s new artificial intelligence model, can generate photorealistic images of public figures without any safety filter. Beyond the absence of guardrails, the model produces strikingly realistic images thanks to an unusual technological approach. Here is how it works.

xAI abandons latent diffusion

xAI is clearly starting to make its mark on the generative AI landscape. After unveiling Grok 2, an LLM with near-state-of-the-art performance, the teams at Elon Musk’s AI laboratory developed Aurora by moving away from the traditional architecture of text-to-image models. Unlike Midjourney, DALL-E or Firefly, Aurora is not based on a latent diffusion architecture but on a mixture-of-experts (MoE) architecture, which is usually used to build LLMs.

More concretely, the difference lies in how the models construct the image. Latent diffusion models start from random noise, which they gradually denoise to bring out the desired image. Aurora, on the other hand, builds the image sequentially, token by token, much as an LLM generates text word by word. The MoE architecture could, in particular, allow the model to call on different specialized experts depending on the aspects of the image to be generated: one expert could focus on faces, another on textures, yet another on the overall composition.
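To make the contrast concrete, here is a deliberately simplified Python sketch. It is not xAI’s code: every function name, the tiny "vocabulary" of image tokens, the fixed target image and the gate scores are assumptions made up for illustration. It only shows the shape of the two generation loops (refining a whole image from noise versus appending one token at a time) and what picking a single expert from gate scores could look like.

```python
import math
import random

# Toy illustration only: none of these names or numbers come from xAI.

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top1(gate_scores):
    """Pick the expert with the highest gate probability (top-1 routing)."""
    probs = softmax(gate_scores)
    return max(range(len(probs)), key=probs.__getitem__)

def generate_autoregressive(num_tokens, vocab_size, rng):
    """Build an 'image' as a sequence of discrete tokens, one at a time.

    Each token is sampled conditioned on what came before. Here that
    conditioning is a hypothetical stand-in: a bias toward repeating
    the previous token. A real model would use a learned network.
    """
    tokens = []
    for _ in range(num_tokens):
        weights = [1.0] * vocab_size
        if tokens:
            weights[tokens[-1]] += 2.0  # toy dependence on the previous token
        tokens.append(rng.choices(range(vocab_size), weights=weights)[0])
    return tokens

def generate_by_denoising(target, steps, rng):
    """Start from random noise and refine every pixel at each step.

    The fixed `target` stands in for what a trained denoiser would
    predict; a real diffusion model learns that prediction.
    """
    image = [rng.gauss(0.0, 1.0) for _ in target]
    for _ in range(steps):
        # All pixels are updated together, halving the remaining error.
        image = [x + 0.5 * (t - x) for x, t in zip(image, target)]
    return image

rng = random.Random(0)
tokens = generate_autoregressive(16, 8, rng)             # sequential, token by token
image = generate_by_denoising([0.2, 0.8, 0.5], 20, rng)  # whole image, step by step
expert = route_top1([0.1, 2.0, -1.0])                    # hypothetical gate scores
```

In the first loop, each token is fixed once it is emitted and later tokens depend on it; in the second, the whole image exists from the first step and is refined as a block.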

Aurora was also trained on a dataset that mixes text and images, unlike other models, which process these data separately. xAI mentions “billions” of images and texts from the web. The dataset very likely includes images and text retrieved from X: the social network modified its terms of service in November to state clearly that shared information would be used to train AI systems.

Better understanding of prompts

Using an autoregressive model (in this case built on an MoE) is not new. The technique comes directly from OpenAI’s 2020 work on ImageGPT, an image generator already based on a Transformer. Although model developers had moved away from this approach, it seems to be making a comeback. The latest version of Gemini (Gemini 2.0 Flash) appears to adopt a similar approach by unifying the generation of text and other modalities (image and audio).

This approach offers concrete advantages over traditional models (DALL-E, Midjourney, Stable Diffusion, etc.). By building the image sequentially, like text, Aurora demonstrates a finer understanding of prompts and generates more consistent details. For example, when a user asks for “a ginger cat with white paws”, the model builds the image gradually and so stays more faithful to the details requested in the prompt.

Prompt: “a ginger cat with white paws”. © Aurora / Grok

Autoregressive models particularly excel at rendering text in images. Signs, logos and inscriptions are now perfectly legible, whereas diffusion models often produce distorted or illegible characters.

Prompt: “A paper newspaper with the title: “JOURNAL DU NET””. © Aurora / Grok

Unprecedented photorealism

Aurora’s strong point undoubtedly lies in the realism of the images it generates. The model performs particularly well on faces and complex scenes, with remarkable consistency in details and textures. In the name of total freedom of expression, the model can reproduce public figures to perfection.

For example, it is possible to generate fake meetings between public figures. The example below shows a fictional meeting between Donald Trump, Elon Musk and Vladimir Putin.

Prompt: “A photograph depicting a meeting between Donald Trump, Elon Musk and Vladimir Putin on the Champs-Élysées in .” © Aurora / Grok

Even more disturbing, it is possible to generate fake historical archive images. The example below shows a fictional meeting between Nikola Tesla and Elon Musk in 1940.

Prompt: “Archival image from 1940 in black and white. Nikola Tesla meets Elon Musk.” © Aurora / Grok

Another notable point is that the xAI model can perfectly reproduce copyrighted logos. For example, below we got Aurora to imagine a car bearing the Kering logo.

Prompt: “A modern and elegant car with the Kering logo on the hood.” © Aurora / Grok

Legal risks

In conclusion, using Aurora in a professional context requires great caution. Unlike other image generation models on the market (Midjourney, DALL-E, Firefly), Aurora does not currently have safety filters limiting the creation of sensitive or protected content.

Additionally, X has not clarified the licensing of images generated via Aurora in Grok. The upcoming launch of a dedicated API by xAI should be accompanied by more precise conditions of commercial use, paving the way for supervised professional exploitation of the model.
