Hugging Chat, Whisper, Stable Diffusion… Here are the open source alternatives to the best AI on the market.
Open, accessible AI is a reality in 2024. Alongside the proprietary players in generative AI, the open source ecosystem has grown considerably over the last 24 months, and free alternatives are now nearly as capable as their proprietary equivalents. The JDN lists the best free AI tools and models on the market for generating text and images, and even for transcription.
Hugging Chat: the alternative to ChatGPT
Hugging Chat stands out as one of the most promising open source alternatives to ChatGPT. Developed by Hugging Face, the chatbot can be configured with several cutting-edge models: Llama-3.1 70B from Meta, Command R+ from Cohere, Qwen2.5-72B (from Qwen), Llama-3.1-Nemotron 70B from Nvidia, Llama-3.2-11B Vision from Meta, Hermes 3 from NousResearch, Mistral Nemo from Mistral AI and finally Phi 3.5 from Microsoft. For summarization or text generation, Llama-3.1 70B is the better choice. To send images to the model for analysis, use Llama-3.2-11B Vision.
Over the past few months, Hugging Chat has matured considerably. As with ChatGPT, you can create your own custom assistants and use tools: web search, image generation, image editing, calculator… You can also call on one of the 37 tools (as of November 2024) developed by the community.
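For developers, the same open models that power Hugging Chat can also be queried programmatically through Hugging Face's inference services. Below is a minimal sketch using the huggingface_hub client; the chosen model ID, the placeholder token and the prompt are assumptions for illustration, not part of Hugging Chat itself.

```python
# Minimal sketch: querying one of the open models listed above via the
# Hugging Face Inference API (model choice and token are assumptions).
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="meta-llama/Llama-3.1-70B-Instruct",  # hypothetical pick among the listed models
    token="hf_xxx",  # replace with your own Hugging Face access token
)

response = client.chat_completion(
    messages=[{"role": "user", "content": "Summarize the plot of Les Misérables in three sentences."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```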
Stable Diffusion: the alternative to Dall-E and Midjourney
While proprietary AI still dominates image generation, open source models have made significant progress over the past 12 months. The most popular, Stable Diffusion, can generate images in a host of different styles. The latest version, 3.5, offers better prompt adherence, more detailed images and overall more realistic results. Its strength? It can be run locally on a relatively modest configuration (with Nvidia's RTX range in particular).
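As an illustration, here is a minimal sketch of local generation with the diffusers library, assuming access to the gated Stable Diffusion 3.5 Medium checkpoint on Hugging Face; the prompt and sampling parameters are purely illustrative.

```python
# Minimal sketch, assuming the diffusers library and access to the gated
# Stable Diffusion 3.5 Medium weights on Hugging Face.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")  # an RTX-class GPU is enough for the Medium variant

image = pipe(
    prompt="a watercolor painting of a lighthouse at dawn",
    num_inference_steps=28,
    guidance_scale=4.5,
).images[0]
image.save("lighthouse.png")
```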
Another alternative, FLUX.1 Dev, developed by Black Forest Labs, delivers excellent image quality. It performs particularly well on complex prompts and shows a very good understanding of detailed scenes. Thanks to its hybrid architecture, the model is often faster than Stable Diffusion at inference, but it requires more hardware resources.
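A comparable local setup for FLUX.1 Dev might look like the following sketch, assuming a diffusers version with FLUX support and access to the gated weights; CPU offloading is shown only as one way to keep GPU memory requirements reasonable.

```python
# Minimal sketch, assuming diffusers with FLUX support and access to the
# gated FLUX.1 [dev] weights; offloading trades speed for lower VRAM usage.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keeps memory usage manageable on consumer GPUs

image = pipe(
    prompt="a detailed street scene in Paris on a rainy evening, cinematic lighting",
    num_inference_steps=50,
    guidance_scale=3.5,
).images[0]
image.save("paris.png")
```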
Whisper: the alternative to the STT models of cloud providers
Google Speech-to-Text from Google Cloud, Amazon Transcribe from AWS, Azure Speech to Text from Microsoft… Cloud providers have long dominated AI transcription. But the arrival of Whisper, from OpenAI, is starting to change the game. Available as open source with regular updates (at least once a year), Whisper offers a solid alternative to proprietary speech-to-text models.
The model remains highly accurate, even on recordings with a lot of background noise and regardless of the language. Its only limitation? A restricted vocabulary, particularly in highly specialized lexical fields (e.g. medical acronyms). Finally, the latest Turbo version (large-v3-turbo) offers much faster transcription with only a marginal loss of accuracy (less than 5%).
To run inference without a dedicated server or a paid API, the model can be executed free of charge on the GPUs available in Google Colab.
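For example, transcription with the open source openai-whisper package can be as short as the following sketch; the audio file name is an assumption, and the same few lines run unchanged in a Colab notebook.

```python
# Minimal sketch with the open source openai-whisper package; the audio
# file name is an assumption. Runs locally or on a free Colab GPU.
import whisper

model = whisper.load_model("turbo")  # alias for large-v3-turbo
result = model.transcribe("interview.mp3")
print(result["text"])
```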
Audio, video: open source models lagging behind
The much newer field of generative AI for audio and video still lacks maturity. Several proprietary models, such as Runway and Pika for video, or Suno AI and MusicFX for audio, are starting to produce acceptable results. Open source, on the other hand, is still lagging behind. The rare capable models, such as AudioCraft from Meta or Stable Video Diffusion from Stability AI, remain close to research prototypes and do not yet produce truly high-quality results.
Despite the considerable resources of the tech giants, open source AI now competes with proprietary solutions in several areas. This success is largely due to Meta, which made powerful models like Llama freely available, but also to Hugging Face, whose platform hosts new community-refined models every day.
The main challenge for open source AI no longer lies so much in the quality of models as in access to inference resources. Hosting and running models represents substantial costs, and even the open source players that currently offer free inference platforms, like Hugging Face with Hugging Chat, may not be able to maintain this free access indefinitely.