Artificial intelligence companies like OpenAI are seeking to overcome unexpected delays and challenges in the pursuit of ever-larger language models by developing training techniques that use more human-like ways for algorithms to “think.”
A dozen AI scientists, researchers and investors told Reuters they believe these techniques, which underpin OpenAI’s recently released o1 model, could reshape the AI arms race and affect the types of resources for which AI companies have an insatiable demand, from energy to types of chips.
OpenAI declined to comment for this article. Since the release of the viral chatbot ChatGPT two years ago, tech companies, whose valuations have benefited greatly from the AI boom, have publicly argued that “scaling” current models by adding more data and computing power will consistently lead to better AI models.
But today, some of the most prominent scientists in the field of AI are speaking out about the limits of this “bigger is better” philosophy.
Ilya Sutskever, co-founder of the AI labs Safe Superintelligence (SSI) and OpenAI, recently told Reuters that results from scaling up pre-training, the phase of training an AI model that uses a vast amount of unlabeled data to understand language patterns and structures, have plateaued.
Sutskever is widely credited as an early advocate of achieving major advances in generative AI by using more data and computing power in pre-training, an approach that eventually gave rise to ChatGPT. Sutskever left OpenAI earlier this year to found SSI.
“The 2010s were the age of scaling, but we are once again in the age of wonder and discovery. Everyone is looking for the next thing,” Sutskever said. “It’s more important than ever to scale the right thing.”
Sutskever declined to elaborate on how his team is addressing the problem, saying only that SSI is working on an alternative approach to scaling up pre-training.
Behind the scenes, researchers at leading AI labs have experienced delays and disappointing results in the race to release a large language model that outperforms OpenAI’s nearly two-year-old GPT-4 model, according to three sources familiar with private matters.
“Training runs” for large models can cost tens of millions of dollars, since they involve running hundreds of chips simultaneously. Researchers may not know how a model performs until the run is complete, which can take months.
Another problem is that large language models consume huge amounts of data, and AI models have exhausted all the easily accessible data in the world. Power shortages have also hampered training runs, as the process requires vast amounts of energy.
To overcome these challenges, researchers are exploring “test-time compute,” a technique that enhances existing AI models during the so-called “inference” phase, that is, when the model is being used. For example, instead of immediately choosing a single answer, a model could generate and evaluate multiple possibilities in real time, ultimately choosing the best path forward.
This method allows models to devote more processing power to difficult tasks such as math or coding problems or complex operations that require human-like reasoning and decision-making.
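One simple way to picture "generate and evaluate multiple possibilities" is best-of-N sampling: draw several candidate answers, score each with a checker, and keep the top one. The sketch below is only an illustration under stated assumptions and is not a description of how o1 or any lab's system works. The names `best_of_n`, `make_toy_generator` and `toy_score` are hypothetical, the "model" is a random guesser, and the toy task is to find a number whose square is close to a target.

```python
import random
from typing import Callable, Tuple


def best_of_n(
    prompt: float,
    generate: Callable[[float], float],
    score: Callable[[float, float], float],
    n: int,
) -> Tuple[float, float]:
    """Sample n candidates and return (best_score, best_candidate).

    A larger n spends more compute at inference time on the same,
    unchanged generator.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(prompt) for _ in range(n)]
    scored = [(score(prompt, c), c) for c in candidates]
    return max(scored, key=lambda pair: pair[0])


def make_toy_generator(rng: random.Random) -> Callable[[float], float]:
    """Hypothetical stand-in for a model: guesses uniformly in [0, target]."""
    def generate(target: float) -> float:
        return rng.uniform(0.0, target)
    return generate


def toy_score(target: float, candidate: float) -> float:
    """Hypothetical verifier: checks how close candidate squared is to the
    target without knowing the right answer. Higher is better, 0 is perfect."""
    return -abs(candidate * candidate - target)


if __name__ == "__main__":
    # Same seed for every n, so larger n only adds extra candidates.
    for n in (1, 4, 16, 64):
        rng = random.Random(0)
        best_score, best = best_of_n(2.0, make_toy_generator(rng), toy_score, n)
        print(f"n={n:>2}  best guess={best:.4f}  score={best_score:.4f}")
```

Because each run reuses the same seed, the larger sample always contains the smaller one, so the best score can only stay the same or improve as `n` grows. The generator itself never changes; only the amount of computation spent per question does.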
“It turned out that having a bot think for just 20 seconds in a hand of poker achieved the same performance boost as scaling up the model 100,000 times and training it for 100,000 times longer,” Noam Brown, an OpenAI researcher who worked on o1, said at the TED AI conference in San Francisco last month.
OpenAI has adopted this technique in its new model known as “o1,” formerly known as Q* and Strawberry, which Reuters first reported on in July. The o1 model can “think” through problems in multiple steps, similar to human reasoning. It also involves the use of data and feedback from PhDs and industry experts. The secret sauce of the o1 series is another round of training carried out on top of “base” models like GPT-4, and the company says it plans to apply this technique to more and larger base models.
Meanwhile, researchers at other leading AI labs, such as Anthropic, xAI and Google DeepMind, have also been working to develop their own versions of the technique, according to five people familiar with the work.
“We see a lot of low-hanging fruit that we can pick to improve these models very quickly,” Kevin Weil, chief product officer at OpenAI, said at a technology conference in October. “By the time people catch up, we’ll try to be three steps ahead.”
Google and xAI did not respond to requests for comment and Anthropic had no immediate comment.
The implications could alter the competitive landscape for AI hardware, dominated until now by insatiable demand for Nvidia’s AI chips. Prominent venture capital investors, from Sequoia to Andreessen Horowitz, who have poured billions into funding the expensive development of AI models at numerous AI labs, including OpenAI and xAI, are taking note of the transition and weighing the impact on their expensive bets.
“This change will move us from a world of massive pre-training clusters to inference clouds, which are distributed cloud-based servers for inference,” Sonya Huang, partner at Sequoia Capital, told Reuters.
Demand for Nvidia’s cutting-edge AI chips has fueled its rise to become the world’s most valuable company, overtaking Apple in October. Unlike in training chips, where Nvidia dominates, the chip giant could face more competition in the inference market.
Asked about the possible impact on demand for its products, Nvidia pointed to recent company presentations on the importance of the technique behind the o1 model. Its CEO, Jensen Huang, has spoken of growing demand for using its chips for inference.
“We have now discovered a second scaling law, and it is the scaling law at the time of inference… All of these factors have led to incredibly high demand for Blackwell,” Huang said last month at a conference in India, referring to the company’s latest AI chip.