Google Whisk: the image generator that reinvents the creative process

Friday 20th December 2024 05:35 AM

Google has launched Whisk, its new image generator, on the Google Labs platform. More than just a rival to OpenAI's DALL-E, Midjourney or Grok 2, Whisk brings a touch of originality to the way it works: for once, you do not have to type a written prompt. For now, France is not among the countries where Whisk is available. It is nevertheless possible to try the tool by using a VPN set to a location across the Atlantic and by creating a Google account whose main language is US English.

Creative process and precise refinement of renderings

Whisk has three main sections from which images are generated: subject, scene and style. The principle is to combine them to obtain creative results. For each section, you upload an existing image, which Whisk uses as a basis. It is also possible to write a text prompt to generate an image for a specific section. For example, for the “style” section, you could type the word “plush” (in English). This generates an image of a stuffed toy, which is then used to set the look of the rendering Whisk creates. Note in passing that several images can be added to each section, so a single generation can have two subjects.

To generate this illustration in Whisk, we uploaded a photo of a Shiba Inu as the subject and described a cozy, kawaii-style cafe for the scene. We then entered this general prompt for the rendering: “Subject eats a delicious pastry.”

Once Whisk has generated the image, you can modify it by adding details or removing certain elements. To do this, the tool provides a text box in which you can request changes with a short prompt. Whisk allows unlimited modifications, and what is striking is that the generator keeps the rest of the image almost identical with each new prompt.

Use Gemini to describe an image… then generate a new one

Although Whisk lets anyone, whether familiar with generative AI or not, easily create personalized images, the tool still works from a text prompt. When you send a photo to Whisk, it is processed by… Gemini, which analyzes it and describes it in text. The descriptions from the three sections are then combined into a new prompt, from which the final rendering is generated.
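To make this describe-then-combine flow concrete, here is a minimal Python sketch. It is not Google's implementation: the names `WhiskInputs` and `build_prompt`, the "Subject / Scene / Style" prompt layout and the canned descriptions are all assumptions made for illustration, and the `describe` function simply stands in for the captioning step that Gemini performs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class WhiskInputs:
    """Hypothetical container for the three sections; each may hold several images."""
    subjects: List[str] = field(default_factory=list)
    scenes: List[str] = field(default_factory=list)
    styles: List[str] = field(default_factory=list)


def build_prompt(inputs: WhiskInputs,
                 describe: Callable[[str], str],
                 instruction: str = "") -> str:
    """Describe every image as text, then merge the descriptions into one prompt."""
    sections = [
        ("Subject", inputs.subjects),
        ("Scene", inputs.scenes),
        ("Style", inputs.styles),
    ]
    parts = []
    for label, images in sections:
        if images:
            # Several images in one section are joined, e.g. two subjects.
            text = " and ".join(describe(image) for image in images)
            parts.append(f"{label}: {text}.")
    if instruction:
        parts.append(instruction)
    return " ".join(parts)


# Stand-in for the captioning model: canned descriptions (assumed, not real output).
CANNED = {
    "shiba.jpg": "a Shiba Inu dog",
    "cafe.jpg": "a cozy kawaii cafe",
    "plush.jpg": "a soft plush toy",
}

inputs = WhiskInputs(subjects=["shiba.jpg"], scenes=["cafe.jpg"], styles=["plush.jpg"])
prompt = build_prompt(inputs, CANNED.__getitem__, "Subject eats a delicious pastry.")
print(prompt)
```

Only the text in `prompt` would reach the image generator in this sketch; the original pixels are never passed along, which is the key point of the design described above.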

This mode of operation also explains why the generated images are sometimes imprecise: for technical reasons, Gemini extracts only the information it considers important from the images you send it. In other words, if you upload your best portrait to the “Subject” section, Gemini should correctly pick up the color of your clothes, the color of your hair, whether or not you wear glasses, and so on. However, Whisk will not be able to create a carbon copy of your likeness.
