OpenAI pushes the limits of AI

Unlike previous launches, OpenAI is taking a phased deployment approach for GPT-4o. Initially accessible to ChatGPT Plus subscribers, GPT-4o is also available to “Team” plan users and will soon reach the “Enterprise” edition. The new model is additionally available to users of the free version of ChatGPT, opening the door to wide adoption. On the API side, GPT-4o can be used on the Chat Completions, Assistants and Batch endpoints, and is available in the OpenAI Playground as well as through Microsoft’s Azure OpenAI Service offering.
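
As an illustration, here is a minimal sketch of a Chat Completions call to GPT-4o using OpenAI’s Python SDK; the prompt is invented and the API key is assumed to be set in the environment.

```python
# Minimal sketch: calling GPT-4o through the Chat Completions endpoint
# with OpenAI's Python SDK. The prompt is illustrative.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the GPT-4o launch in one sentence."},
    ],
)
print(response.choices[0].message.content)
```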

What makes the difference

OpenAI claims that GPT-4o generates text faster than GPT-4 Turbo, a claim confirmed by several user reviews. Its performance in text processing, reasoning and code is reported to be on par with GPT-4 Turbo, but GPT-4o particularly stands out in languages other than English, as well as in vision and audio. It is worth noting that GPT-4o’s knowledge base ends in October 2023, two months before that of GPT-4 Turbo, although this does not hinder its extensive capabilities. The context window remains the same at 128k tokens, with a maximum output of 4k tokens. Another notable improvement is cost efficiency: inference with GPT-4o costs half as much as with GPT-4 Turbo, at $5 per million input tokens and $15 per million output tokens.
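
A quick back-of-the-envelope calculation makes the pricing gap concrete. The token counts below are hypothetical, and the GPT-4 Turbo rates ($10 and $30 per million tokens) reflect OpenAI’s published pricing at the time of launch.

```python
# Back-of-the-envelope inference cost at the quoted GPT-4o rates
# ($5 / 1M input tokens, $15 / 1M output tokens), with GPT-4 Turbo
# shown for comparison ($10 / $30). Token counts are hypothetical.
def cost_usd(input_tokens: int, output_tokens: int,
             in_rate: float, out_rate: float) -> float:
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

prompt, completion = 100_000, 4_000  # example request near the 4k output cap
print(f"GPT-4o:      ${cost_usd(prompt, completion, 5, 15):.4f}")
print(f"GPT-4 Turbo: ${cost_usd(prompt, completion, 10, 30):.4f}")
```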

A multimodal era

GPT-4o is described as “natively multimodal”, capable of processing a variety of input modes, including voice, text and images. OpenAI plans to launch real-time video processing functionality, although this capability is currently limited to splitting videos into sequences of images. The company emphasizes that additional work is needed on infrastructure, post-training and safety before making this feature widely available.
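
The frame-by-frame workaround can be sketched as follows, in the spirit of OpenAI’s own examples: frames are sampled from a video with OpenCV, base64-encoded, and passed to GPT-4o as image inputs. The file name and sampling rate here are illustrative.

```python
# Sketch of the current video workaround: sample frames with OpenCV,
# base64-encode them, and send them to GPT-4o as image inputs.
import base64
import cv2
from openai import OpenAI

client = OpenAI()

video = cv2.VideoCapture("demo.mp4")  # illustrative file name
frames = []
while True:
    ok, frame = video.read()
    if not ok:
        break
    _, buf = cv2.imencode(".jpg", frame)
    frames.append(base64.b64encode(buf).decode("utf-8"))
video.release()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what happens in this video."},
            # Keep every 30th frame to stay within the context window
            *[{"type": "image_url",
               "image_url": {"url": f"data:image/jpeg;base64,{f}"}}
              for f in frames[::30]],
        ],
    }],
)
print(response.choices[0].message.content)
```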

A trial of these modalities is planned for a restricted circle (alpha version) on ChatGPT Plus and on the API in the coming months. GPT-4o also incorporates the latest capabilities of the Advanced Data Analysis feature (formerly Code Interpreter), enabling ChatGPT to perform complex operations such as anomaly detection and correction, data aggregation and integration, as well as statistical and time-series analysis. The model can create interactive tables and charts using libraries like pandas and Matplotlib. In terms of voice processing, and unlike previous models that used separate networks for speech recognition and synthesis, GPT-4o processes all types of content with a single neural network.
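
For a sense of what this looks like in practice, here is the kind of script Advanced Data Analysis might generate and execute with the libraries mentioned; the data set and the 3-sigma anomaly rule are made up for illustration.

```python
# Illustrative anomaly-detection script of the sort ChatGPT's Advanced
# Data Analysis generates: flag points beyond 3 standard deviations,
# then plot them. Data and threshold are invented for this example.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
series = pd.Series(rng.normal(100, 5, 365),
                   index=pd.date_range("2023-01-01", periods=365))
series.iloc[[50, 200]] = [160, 30]  # inject two anomalies

z = (series - series.mean()) / series.std()
anomalies = series[z.abs() > 3]

ax = series.plot(figsize=(8, 3), label="daily value")
ax.scatter(anomalies.index, anomalies, color="red", label="anomaly")
ax.legend()
plt.show()
```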

Competitive context

The launch of GPT-4o came at a strategic time, just before the Google I/O conference, where Google unveiled new AI products, notably around its Gemini project. Sam Altman, CEO of OpenAI, described GPT-4o as an important step in the evolution of the company’s vision. Initially focused on creating benefits for the world, this vision has shifted toward a more pragmatic approach centered on making AI models available through paid APIs. During the GPT-4o live demonstration, the model impressed with its ability to interact naturally with users, processing multimodal data such as audio, video and text in real time.

Highlighted capabilities include solving complex math problems, recognizing facial emotions, generating audio content and translating conversations in real time. With notable improvements in speed, cost, and language and data processing capabilities, GPT-4o is well positioned for adoption by a wide range of users, from individuals to large enterprises. Ultimately, the future of these technologies will depend on their successful integration into various sectors, offering unprecedented possibilities for interaction between humans and machines.
