At its re:Invent 2024 developer conference, Amazon unveiled its own family of AI foundation models, called “Amazon Nova.” According to the press release, the Nova models are available now through Amazon Bedrock, the AWS AI platform. Amazon highlights several strengths, ranging from analyzing complex documents and building sophisticated AI agents to generating videos. Nova is also said to offer lower latency and lower costs across all types of generative AI tasks.
The Nova family includes models specialized in different tasks. The simplest, “Nova Micro”, accepts text prompts and generates text responses. The next model up, “Nova Lite”, can process images and videos up to 30 minutes long, but produces only text responses. In his keynote at re:Invent 2024, Amazon CEO Andy Jassy compared this model to rival OpenAI’s GPT-4o mini.
The more powerful “Nova Pro” is a “high-performance multimodal model offering the best combination of accuracy, speed and cost for a wide range of tasks,” according to Amazon. It processes up to 300,000 input tokens and is suitable, among other things, for agent-based workflows that require calling APIs and tools to perform complex tasks. The model can process text, video and images, and can also analyze financial documents or codebases of up to 15,000 lines. Like the smaller models, it produces only text responses. During the keynote, the Amazon CEO compared Nova Pro to GPT-4o.
All three Nova models are available now on Bedrock, though for the time being only in a few US AWS cloud regions. Linguistically, however, the models already seem ready for a wider rollout: according to the release, they understand and generate more than 200 languages. They reportedly work particularly well in German, French and Italian, the vendor notes.
Amazon is preparing further versions of Nova. In early 2025, the company plans to launch “Nova Premier”, the “most capable multimodal model”, according to the press release. It is intended for complex “reasoning” tasks and for building custom AI models, a process in which an existing AI model is retrained for a specific use.
Nova will also generate photos and videos
In addition to these versions, there are the “creative content generation models”. Unlike the models mentioned so far, the creative models do not generate text: “Nova Canvas” produces images from text and “Nova Reel” produces videos. Videos can initially run up to 6 seconds and will later reach up to 2 minutes, Jassy explained during the keynote.
“Nova Reel” reportedly outperforms existing models in human evaluations of video quality and consistency, AWS writes. These models are also already available in US regions of the AWS cloud. Note that the creative models currently work only with English prompts.
Those who prefer to talk and listen will have to wait until next year. During the keynote, the Amazon CEO announced that a version of Nova that understands and produces spoken language is due in spring 2025. Finally, an “any-to-any” version of Nova is expected in mid-2025, that is, a version capable of taking any type of input and producing any type of output.