As most anyone following AI knows, the R&D company OpenAI recently released a paper on GPT-3, its third-generation language model, which, at 175 billion parameters, has the claim to fame of being about an order of magnitude larger than any language model that came before it.
Currently available in private beta to select developers, GPT-3 has shown that it can generate everything from believable short stories, rap songs, and press releases to HTML code for creating web page layouts, all with minimal inputs or prompts. This is a very big deal.
Until now, the most advanced language models have included Google’s BERT, Microsoft’s Turing Natural Language Generation, and GPT-3’s predecessor, GPT-2. These models can do things like complete sentences in a natural-sounding way, suggest short replies to email messages, answer basic questions, and generate text that seems like it could have been written by a human. While impressive, these models also oftentimes generate clunky or absurd results, giving skeptics reason to believe that we’re still a very long way from machines being able to approximate human-level language capabilities.
With Language Models, Bigger Is Better
What the GPT-3 paper demonstrates is how the size of language models influences accuracy on a number of language tasks that humans show mastery of: closed-book question answering, resolving ambiguous pronouns, common-sense reasoning, and advanced reading comprehension, to name a few. Owing to its sheer size, GPT-3 goes well beyond what simpler language models can achieve.
Looking across all the different models, clear trendlines emerge: adding more parameters has a fairly predictable impact on accuracy. And following those trends, we consistently see an intersection with human-level accuracy in the 15-20 trillion parameter range, about 100 times larger than GPT-3.
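To make that concrete, here’s a minimal sketch of the kind of trendline fitting I’m describing: fit accuracy against the log of parameter count, then solve for where the line crosses a human baseline. The model sizes, accuracies, and 0.95 baseline below are made-up placeholders, not numbers from the paper, but even these values put the crossing in the tens of trillions of parameters, the same ballpark as the trend above.

```python
import numpy as np

# Illustrative (parameter count, accuracy) points. These are made-up
# placeholders, NOT benchmark results from the GPT-3 paper.
params = np.array([1.3e9, 13e9, 175e9])   # model sizes in parameters
accuracy = np.array([0.55, 0.65, 0.75])   # aggregate task accuracy

# Accuracy tends to improve roughly linearly in log(parameters),
# so fit: accuracy = a * log10(params) + b
a, b = np.polyfit(np.log10(params), accuracy, 1)

# Solve for the parameter count where the trend crosses a
# hypothetical human baseline of 0.95 accuracy.
human_level = 0.95
needed = 10 ** ((human_level - b) / a)
print(f"Extrapolated human-level size: {needed:.1e} parameters")
```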
The Cost of All That Computing Power
Estimates for the cost of training GPT-3 range from $4 million to $12 million, and training cost appears to grow linearly with model size. Therefore, we can extrapolate that a 15-20 trillion parameter model would require an investment somewhere in the $300 million to $1.3 billion range.
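The arithmetic is easy to check; here’s a quick sketch using the figures above (linear scaling with parameter count is the assumption doing all the work):

```python
# Extrapolate training cost linearly with parameter count, using the
# $4M-$12M estimates for the 175-billion-parameter GPT-3 cited above.
gpt3_params = 175e9
cost_low, cost_high = 4e6, 12e6  # dollars

for target in (15e12, 20e12):  # the 15-20 trillion parameter range
    scale = target / gpt3_params
    print(f"{target / 1e12:.0f} trillion params (~{scale:.0f}x GPT-3): "
          f"${cost_low * scale / 1e6:,.0f}M to "
          f"${cost_high * scale / 1e6:,.0f}M")
```

Taking the low end at 15 trillion parameters and the high end at 20 trillion gives roughly the $300M to $1.3B window quoted above.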
The promise would be that, given a handful of examples of some textual task, the model could perform at a human level without any retraining, hence general AI: rather than using examples as training data to update its weights (the way learning changes the synapses in a brain), it figures out how to do the task just by “processing” the examples in its input. Frequently, a description of the task with no examples at all would suffice.
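To illustrate what that looks like in practice, here’s a sketch of a few-shot prompt, modeled on the English-to-French illustration in the GPT-3 paper. Nothing about the model is retrained; the “examples” live entirely in the input text.

```python
# A few-shot prompt: the "training examples" are just text in the
# input, and the model's weights are never updated.
prompt = (
    "Translate English to French:\n\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "cheese => fromage\n"
    "plush giraffe =>"
)

# Sent to a sufficiently large language model, this prompt would be
# expected to complete with "girafe peluche", the task having been
# inferred from the pattern alone.
print(prompt)
```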
Some Caveats
While this is just rough trendline fitting, several caveats exist: 1) small errors could lead to order-of-magnitude effects in the extrapolation, 2) there’s no guarantee the existing trend holds, and 3) it’s unclear whether there’s sufficient text available to just keep scaling up what’s being done today. Even the authors note the potential shortcomings of the model, writing:
“A more fundamental limitation of the general approach described in this paper — scaling up any LM-like model, whether autoregressive or bidirectional — is that it may eventually run into (or could already be running into) the limits of the pretraining objective.”
So, although the tasks GPT-3 performs are difficult for most machines, and even for humans at times, they seem unlikely to plumb the depths of human abilities like complex programming, novel writing, or sophisticated strategic reasoning. More concerning, GPT-3 has shown, like many AIs before it, that it is susceptible to generating biased language around gender, ethnicity, sexuality, and race.
So, Will $1B Lead to General AI?
Caveats aside, what’s incredible is that GPT-3 suggests we’re within striking distance of the computational power needed to run a real general-purpose AI. For a $1 billion investment, it seems plausible we could achieve an AI capable of passing the Turing Test; holding general conversations on wide-ranging topics in a completely human way (or even an inhuman one, since it would know more than any normal human); following complicated instructions; and forming the backbone of virtual assistants that can do basically anything a human assistant could, not to mention applications in business contexts like sales, marketing, and programming.
For decades, people have been predicting the imminent emergence of true artificial intelligence. For the first time, these extrapolations don’t seem so far-fetched to me. Core improvements to our deep learning architectures are almost certainly still required, but the basic building blocks seem to be here, and the extrapolated computational budget seems well within reach of large corporations and governments, never mind what further advances in hardware might bring.