Amazon was the last of the three hyperscalers to join the generative AI dance. At its re:Invent 2024 conference, currently taking place in Las Vegas, Amazon boss Andy Jassy lifted the veil on a family of foundation models called Nova, designed "for a wide range of tasks, as well as an industry-leading price/performance ratio". Available in the giant's Amazon Bedrock service, the family includes six models.
These include Nova Micro (a very fast text-to-text model) and Nova Lite, Nova Pro and Nova Premier (multimodal models that can process text, images and videos to generate text). The firm has launched two other multimodal models: Nova Canvas, which generates studio-quality images, and Nova Reel, which generates studio-quality videos.
According to the firm, Nova Micro offers the best price-performance ratio of the family. Nova Premier, for its part, is best suited to complex reasoning tasks and to serving as a base for fine-tuning custom models.
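As an illustration of how these models are consumed once they land in Bedrock, here is a minimal sketch of a text request to Nova Micro through the Bedrock Converse API with boto3; the region and the model identifier used below are assumptions and may differ from what your account exposes.

```python
# Minimal sketch: calling Nova Micro through Amazon Bedrock's Converse API.
# Assumptions: boto3 is configured with valid AWS credentials, the region
# below offers the Nova models, and the model ID matches your account.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.nova-micro-v1:0",  # assumed identifier for Nova Micro
    messages=[
        {"role": "user", "content": [{"text": "Summarize the Nova model family in one sentence."}]}
    ],
    inferenceConfig={"maxTokens": 256, "temperature": 0.3},
)

# The Converse API returns the generated text under output.message.content.
print(response["output"]["message"]["content"][0]["text"])
```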
Performance that matches Llama 3.1, Gemini 1.5 and even GPT-4o mini
Amazon is not afraid of comparisons. In its series of benchmarks, the giant reports performance for its models comparable to that of larger ones. Nova Micro, for example, performed as well as or better than Llama 3.1 8B on all 11 applicable benchmarks, and Gemini 1.5 Flash-8B on all 12 applicable benchmarks. The Seattle firm also highlights a peak speed of 210 output tokens per second, making the model ideal for applications requiring rapid responses.
Nova Lite, for its part, is also very competitive with models of the same class, matching or beating OpenAI's GPT-4o mini on 17 of 19 benchmarks and Google's Gemini 1.5 Flash-8B on 17 of 21. Another surprise: this multimodal model can hold its own against Anthropic's Claude 3.5 Haiku in around ten tests. The other multimodal model, Nova Pro, competes with GPT-4o, Gemini 1.5 Pro and Claude 3.5 Sonnet v2. These two members of the Nova family excel at instruction following and multimodal agentic workflows, Amazon assures.
These results owe something to the fairly long context window of each of these models: Nova Micro supports a context length of 128K input tokens, while Nova Lite and Nova Pro support a context length of 300K tokens, or 30 minutes of video processing. "At the beginning of 2025, Amazon will support a context length of more than 2M input tokens," says the firm. Note that the Micro, Lite and Pro versions support more than 200 languages.
Competition is getting tough
Amazon Nova Micro, Nova Lite and Nova Pro are generally available today, while Nova Premier will arrive in the first quarter of 2025. The message is clear: like Google, Microsoft and OpenAI, Amazon can do multimodal, at low cost and with very low latency. In addition, the Nova models have been optimized to be easy to use in agentic applications that need to interact with a company's proprietary systems and data via multiple APIs, the firm adds.
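To give an idea of what such an agentic integration might look like, here is a hedged sketch that exposes a company API to a Nova model through the Converse API's tool-use mechanism; the tool name, its schema and the model identifier are hypothetical.

```python
# Sketch of an agentic call: letting a Nova model request a company API via
# Bedrock's tool-use mechanism. The tool name, its schema and the model ID
# are hypothetical; only the Converse API structure comes from boto3.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

tool_config = {
    "tools": [
        {
            "toolSpec": {
                "name": "get_order_status",  # hypothetical in-house API
                "description": "Look up the status of a customer order by ID.",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {"order_id": {"type": "string"}},
                        "required": ["order_id"],
                    }
                },
            }
        }
    ]
}

response = client.converse(
    modelId="amazon.nova-pro-v1:0",  # assumed identifier for Nova Pro
    messages=[{"role": "user", "content": [{"text": "Where is order 42-17?"}]}],
    toolConfig=tool_config,
)

# If the model decides to call the tool, the response contains a toolUse block
# with the arguments to forward to the proprietary system.
for block in response["output"]["message"]["content"]:
    if "toolUse" in block:
        print(block["toolUse"]["name"], block["toolUse"]["input"])
```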
This is also a way of going head-to-head with Microsoft, which already offers a collection of AI-powered agents that can be customized to suit the sector and domain in which they are meant to operate. And to show its ambition in this area, Amazon is already preparing what comes next.
The Canvas and Reel models ready to compete against DALL-E 3, Stable Diffusion or even Gen-3 Alpha
As for its two image and video generation models, Canvas and Reel, Amazon claims they can compete with other solutions on the market, starting with OpenAI's DALL-E 3 and Stable Diffusion for Canvas, and Runway's Gen-3 Alpha for Reel. Both models come with features such as watermarking, to trace the source of a generated image, and content moderation, which limits the generation of potentially harmful content.
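For the image side, a text-to-image call to Nova Canvas would go through Bedrock's InvokeModel operation, along the lines of the sketch below; the model identifier and the request/response payload follow the pattern of Amazon's earlier Titan image models and should be treated as assumptions rather than the documented Nova Canvas schema.

```python
# Sketch of a text-to-image request to Nova Canvas via Bedrock's InvokeModel.
# The model ID and the request/response schema are assumptions; check the
# Bedrock documentation for the exact payload.
import base64
import json

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

body = {
    "taskType": "TEXT_IMAGE",  # assumed task type
    "textToImageParams": {"text": "A studio-quality photo of a red bicycle"},
    "imageGenerationConfig": {"numberOfImages": 1, "width": 1024, "height": 1024},
}

response = client.invoke_model(
    modelId="amazon.nova-canvas-v1:0",  # assumed identifier
    body=json.dumps(body),
)

# The response body is assumed to carry base64-encoded images.
payload = json.loads(response["body"].read())
with open("bicycle.png", "wb") as f:
    f.write(base64.b64decode(payload["images"][0]))
```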
Nova Reel currently generates six-second videos and will support the generation of videos up to two minutes long in the coming months. By comparison, Meta launched a text-to-video model last October that generates videos up to 16 seconds long, while Google has just unveiled Veo, a similar model capable of generating one-minute scenes. OpenAI, the first to unveil such a solution, presented Sora in February, also capable of generating one-minute scenes. So far, however, it has not been made available to the general public.
Other multimodal models to come during 2025
The giant wants to add a speech-to-speech model to its Nova family in the first quarter of 2025. "The model is designed to transform conversational AI applications by understanding streaming voice inputs in natural language, interpreting verbal and non-verbal cues (like tone and cadence), and providing natural, human-like back-and-forth interactions with low latency," the firm indicates.
Another model is expected in the course of 2025. It will be able to take text, images, audio and video as inputs and generate outputs in any of these modalities, thanks to native multimodal capabilities. The objective: to simplify the development of applications in which a single model can perform a wide variety of tasks, such as translating content from one modality to another, editing it, and powering AI agents that can understand and generate all modalities.
Beta users on deck
They include 123RF, Deloitte, Musixmatch, Palantir, SAP and Shutterstock, all of which have decided to integrate the various Nova models into their processes to power their own products and services. For example, 123RF and Shutterstock use Nova Canvas and Nova Reel to simplify the design process with faster, easier-to-use tools for visual creators. A new market is opening up: AI-generated images whose quality is vouched for by these stock image banks.
In another field, music, Musixmatch intends to do much the same thing. With 80 million users and a database of more than 11 million unique lyrics, the platform wants to integrate Nova Reel into Musixmatch Pro to help artists produce video clips that match their lyrics.
SAP, for its part, intends to add Amazon Nova models to the family of LLMs supported by the generative AI hub in SAP AI Core. With them, developers will be able to build additional functionality for Joule, SAP's AI copilot, and, above all, deliver AI-driven solutions that can draw on the German software vendor's data.