The open-source AI boom is built on Big Tech’s handouts. How long will it last?

Either way, the bumper crop of free and open large language models puts this technology into the hands of millions of people around the world, inspiring many to create new tools and explore how they work. “There’s a lot more access to this technology than there really ever has been before,” says Biderman.

“The incredible number of ways people have been using this technology is frankly mind-blowing,” says Amir Ghavi, a lawyer at the firm Fried Frank who represents a number of generative AI companies, including Stability AI. “I think that’s a testament to human creativity, which is the whole point of open-source.”

Melting GPUs

But training large language models from scratch—rather than building on or modifying them—is hard. “It’s still beyond the reach of the vast majority of people,” says Mostaque. “We melted a bunch of GPUs building StableLM.”

Stability AI’s first release, the text-to-image model Stable Diffusion, worked as well as—if not better than—closed equivalents such as Google’s Imagen and OpenAI’s DALL-E. Not only was it free to use, but it also ran on a good home computer. Stable Diffusion did more than any other model to spark the explosion of open-source development around image-making AI last year.  

This time, though, Mostaque wants to manage expectations: StableLM does not come close to matching GPT-4. “There’s still a lot of work that needs to be done,” he says. “It’s not like Stable Diffusion, where immediately you have something that’s super usable. Language models are harder to train.”


Another issue is that models are harder to train the bigger they get. That’s not just down to the cost of computing power. The training process breaks down more often with bigger models and needs to be restarted, making those models even more expensive to build.

In practice, there is an upper limit to the number of parameters that most groups can afford to train, says Biderman. This is because large models must be trained across multiple GPUs, and wiring all that hardware together is complicated. “Successfully training models at that scale is a very new field of high-performance computing research,” she says.

The exact number changes as the tech advances, but right now Biderman puts that ceiling roughly in the range of 6 to 10 billion parameters. (In comparison, GPT-3 has 175 billion parameters; LLaMA has 65 billion.) It’s not an exact correlation, but in general, larger models tend to perform much better.   
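To give a feel for the scale behind those numbers, here is a rough back-of-the-envelope sketch of how much memory it takes just to hold a model's weights. The parameter counts come from the figures above; the 2 bytes per parameter (16-bit floating point) is an assumption made for illustration, not something the article states, and the function name is hypothetical. Training typically needs considerably more memory than this, since gradients and optimizer state must be stored as well, so treat the results as a lower bound.

```python
# Rough sketch: memory needed just to store model weights.
# ASSUMPTION (not from the article): each parameter takes 2 bytes (16-bit floats).
BYTES_PER_PARAM = 2
GIB = 1024 ** 3  # bytes in one gibibyte


def weight_memory_gib(num_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Return the memory, in GiB, needed to hold num_params weights."""
    return num_params * bytes_per_param / GIB


# Parameter counts quoted in the article.
models = {
    "ceiling, low end (6 billion)": 6_000_000_000,
    "ceiling, high end (10 billion)": 10_000_000_000,
    "LLaMA (65 billion)": 65_000_000_000,
    "GPT-3 (175 billion)": 175_000_000_000,
}

for name, num_params in models.items():
    print(f"{name}: about {weight_memory_gib(num_params):.0f} GiB of weights")
```

Under that assumption, the weights of the largest models quoted here would not fit in the memory of a single accelerator, which is one reason training has to be spread across many GPUs.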
