To deploy its open source language model Lucie at scale, Linagora is in discussions with Exaion, Outscale, OVH and Scaleway to set up a suitable infrastructure.
For a player wishing to distribute a generative AI assistant on a large scale, the GAFAM's cloud offerings represent a ready-made solution. Hyperscalers provide suitable machine resources, with near-unlimited capacity to absorb very high volumes. The underlying challenge: managing a colossal traffic load on relatively heavy processing.
“We are currently working with Exaion (EDF’s cloud subsidiary, editor’s note), Outscale, OVH and Scaleway with the aim of deploying at scale our open source language model Lucie, which has 7 billion parameters,” confides Michel-Marie Maudet, managing director of the free software services company (SS2L) Linagora. A model the executive describes as an SLM, for small language model. The stated objective of the Issy-les-Moulineaux-based company: to prove at the Paris Open Source AI Summit, which it is organizing on January 22, that it is possible to offer an open source equivalent of ChatGPT built on a sovereign cloud infrastructure.
Faced with this challenge, the head of Linagora remains lucid. “No French cloud has yet carried out such an operation. We will therefore be the ones ironing out the kinks,” he says bluntly. “From our point of view, the most advanced of them remains Scaleway (with more than 1,000 Nvidia H100-type GPUs already deployed, editor’s note). It is moving toward an experience quite similar to that of Amazon Bedrock (the AWS service dedicated to generative AI, editor’s note).”
Is the multicloud path essential?…
To define its infrastructure needs, Linagora began by evaluating traffic scenarios, notably by estimating the number of requests and the volume of input and output tokens per user. From there, the SS2L evaluated several Nvidia cards: the RTX A4000, the L4, the L40S and the H100, running a standardized benchmark in each case. Linagora’s target architecture combines web front ends serving the chat interface with, behind the scenes, a load balancer built on the open source LiteLLM component, responsible for routing requests to the GPU inference endpoints of the most suitable sovereign cloud. For example, if a user wishes to keep their data on a trusted cloud, the traffic will be routed to Outscale and handled by the latter’s SecNumCloud-certified GPUs.
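LiteLLM exposes this kind of multi-backend routing through its Router class. The sketch below is a minimal illustration of the pattern described, not Linagora’s actual configuration: the endpoint URLs, API keys and model aliases are hypothetical, and it simply shows how a request could be steered to a SecNumCloud-certified pool when the user requires a trusted cloud.

```python
from litellm import Router

# Hypothetical deployments of Lucie on several sovereign clouds.
# Entries sharing the same model_name form a pool that the Router
# load-balances across; a separate alias isolates the trusted pool.
model_list = [
    {
        "model_name": "lucie-7b",  # general-public pool
        "litellm_params": {
            "model": "openai/lucie-7b",
            "api_base": "https://inference.cloud-a.example/v1",  # hypothetical
            "api_key": "CLOUD_A_KEY",
        },
    },
    {
        "model_name": "lucie-7b",
        "litellm_params": {
            "model": "openai/lucie-7b",
            "api_base": "https://inference.cloud-b.example/v1",  # hypothetical
            "api_key": "CLOUD_B_KEY",
        },
    },
    {
        "model_name": "lucie-7b-trusted",  # SecNumCloud-certified pool only
        "litellm_params": {
            "model": "openai/lucie-7b",
            "api_base": "https://inference.trusted-cloud.example/v1",  # hypothetical
            "api_key": "TRUSTED_KEY",
        },
    },
]

router = Router(model_list=model_list, routing_strategy="simple-shuffle")

def chat(prompt: str, trusted_cloud: bool = False) -> str:
    """Send the request to the trusted pool when the user asks for it."""
    alias = "lucie-7b-trusted" if trusted_cloud else "lucie-7b"
    response = router.completion(
        model=alias,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

In this setup, the web front ends only ever see the two aliases; adding or removing a cloud provider behind an alias is a configuration change rather than an application change.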
“We are currently moving towards a multicloud architecture, as we believe that a single sovereign cloud will not be able to cover all our use cases, nor to provision on its own the capacity needed for a general-public launch,” underlines Michel-Marie Maudet. “From there, the challenge is to demonstrate our ability to run inference on our model across several French cloud operators.”
…”No”, answer the sovereign clouds
On the Scaleway side, the company maintains that it can absorb, even on an LLM (for large language model) of more than 100 billion parameters, load spikes of several hundred or even several thousand simultaneous users. “We handled the global launch of the Kyutai Foundation’s Moshi voice chat, which represented a significant ramp-up in load,” recalls Frédéric Bardolle, AI lead product manager at Scaleway. Behind the scenes, Moshi relies on a model called Helium, which turns out to be quite close to Lucie: like the latter, it has 7 billion parameters.
“We can handle up to several hundred thousand requests per second”
What about OVHcloud? The Roubaix-based cloud provider offers AI Endpoints, a service, currently in beta, designed to serve language models with per-token billing. Under the hood, the provider already markets around forty models, including Llama-3.1-70B-Instruct and Mixtral-8x22B-Instruct. “This offer is fully suited to Lucie,” maintains Gilles Closset, global AI ecosystem leader at OVHcloud. “We fully manage the underlying infrastructure layer, knowing that we can handle up to several hundred thousand requests per second without difficulty.”
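Token-billed services of this kind are typically consumed through an OpenAI-compatible API. As a purely illustrative sketch, assuming a Lucie deployment on AI Endpoints, a client call could look like the following; the base URL, access token and model identifier are hypothetical, not OVHcloud’s published values.

```python
from openai import OpenAI

# Hypothetical endpoint: AI Endpoints-style services expose an
# OpenAI-compatible API, so the standard client works unchanged.
client = OpenAI(
    base_url="https://lucie-7b.endpoints.example.net/api/openai_v1",  # hypothetical
    api_key="AI_ENDPOINTS_TOKEN",  # hypothetical access token
)

response = client.chat.completions.create(
    model="lucie-7b-instruct",  # hypothetical model identifier
    messages=[{"role": "user", "content": "Présente-toi en une phrase."}],
    max_tokens=128,
)
print(response.choices[0].message.content)

# Per-token billing means the usage object is what the invoice follows:
# prompt_tokens (input) plus completion_tokens (output).
print(response.usage)
```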
In terms of graphics cards, OVHcloud uses resources adapted to each model. “We offer L4 graphics cards for small models, L40S for intermediate models, and H100 for large models,” explains Gilles Closset. In the coming months, OVHcloud also plans to make available AMD MI325X and Nvidia Blackwell GPUs, not forgetting the Nvidia H200.
Outscale (Dassault Systèmes group) also projects confidence. “Since September 2024, we have been offering Mistral’s premium language models as part of an LLM-as-a-Service offering that aims to host other generative AIs in the future,” indicates David Chassan, director of strategy at Outscale. Geared toward inference, the offering includes Mistral AI’s Codestral, Mistral Small, Ministral 8B 24.10 and Mistral Large. For each model, the provider deploys an ad hoc machine infrastructure: the stack includes, for example, two L40 graphics cards for Mistral Small and four H200 GPUs for Mistral Large. Configurations designed for enterprise use, but far from sized for a general-public audience and its traffic volumes.
Asked whether Outscale can handle the load at a larger scale, David Chassan is reassuring. “Dassault Systèmes has more than 350,000 customers around the world (and generates 24% of its revenue in the cloud, editor’s note). This gives us significant firepower in terms of machine capacity,” he underlines. “However, our main added value in AI, as in the cloud in general, is provisioning a dedicated stack for each client. From that standpoint, Outscale remains the only cloud equipped with SecNumCloud-certified GPUs,” summarizes David Chassan. “Our primary aim is to serve organizations and institutions that wish to protect their data and intellectual property.” A message that has the merit of being clear.