Generative AI has a fairly short history, with the technology first introduced in the 1960s in the form of chatbots. It is a form of artificial intelligence that can currently produce high-quality text, images, videos, audio, and synthetic data in seconds. However, it wasn’t until 2014, when the concept of the generative adversarial network (GAN) was introduced, that generative AI evolved to the point of being able to create images, videos, and audio that seem like authentic recordings of real people.
Currently, generative AI is a major component of ChatGPT and its variations.
The 1950s
Generative AI is based on machine learning and deep learning algorithms. The first machine learning algorithm was developed by Arthur Samuel in 1952 for playing checkers – he also came up with the phrase “machine learning.”
The first “neural network” capable of being trained was called the Perceptron, and was developed in 1957 by a Cornell University psychologist, Frank Rosenblatt. The Perceptron’s design was very similar to that of modern neural networks, but it had only one layer of adjustable thresholds and weights separating the input and output layers. The system ultimately failed because training it was extremely time-consuming and a single layer sharply limited what it could learn.
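To make the design concrete, here is a minimal Python sketch of a single-layer perceptron with one set of adjustable weights and a threshold; the AND-gate data, learning rate, and epoch count are illustrative assumptions, not Rosenblatt’s original implementation.

```python
import numpy as np

# A minimal single-layer perceptron sketch (illustrative; not Rosenblatt's original system).
# One layer of adjustable weights plus a threshold (bias) separates input from output.

def train_perceptron(X, y, epochs=20, lr=0.1):
    w = np.zeros(X.shape[1])  # adjustable weights
    b = 0.0                   # adjustable threshold (bias)
    for _ in range(epochs):
        for xi, target in zip(X, y):
            prediction = 1 if np.dot(w, xi) + b > 0 else 0
            error = target - prediction
            w += lr * error * xi  # nudge weights toward the correct output
            b += lr * error
    return w, b

# Toy example: learn the logical AND function (linearly separable).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([(1 if np.dot(w, xi) + b > 0 else 0) for xi in X])  # [0, 0, 0, 1]
```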
The 1960s and 1970s
The first historical example of generative AI was called ELIZA; it could also be considered an early chatbot. It was created by Joseph Weizenbaum in the mid-1960s. ELIZA was a conversational computer program that responded to humans in natural language, with responses designed to sound empathetic.
During the 1960s and ’70s, the groundwork research for computer vision and basic pattern recognition was carried out. Facial recognition took a dramatic leap forward when Ann B. Lesk, Leon D. Harmon, and A. J. Goldstein significantly increased its accuracy (Man-Machine Interaction in Human-Face Identification, 1972). The team developed 21 specific markers, including characteristics such as lip thickness and hair color, to identify faces automatically.
In the 1970s, Seppo Linnainmaa laid the groundwork for backpropagation. Backpropagation is the process of propagating errors backward through a network as part of the learning process. The steps involved are as follows (illustrated in the sketch after this list):
The error is computed at the output end
The error is sent backward, distributed across the layers
Weights are adjusted as the error moves through the network’s layers, which is how training and learning occur
(Backpropagation is used in training deep neural networks.)
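As a concrete illustration of these steps, the Python sketch below trains a tiny two-layer network with hand-coded backpropagation; the layer sizes, learning rate, iteration count, and XOR training data are illustrative assumptions, not any historical implementation.

```python
import numpy as np

# A minimal backpropagation sketch: a tiny two-layer network learning XOR.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(10000):
    # Forward pass; the error is then processed at the output end
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # The error is sent backward, distributed through the network's layers
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Weights are adjusted layer by layer -- this is the training/learning step
    W2 -= 0.5 * h.T @ d_out
    b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h
    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2))  # should approach [[0], [1], [1], [0]]
```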
The First AI Winter Separates Machine Learning and Artificial Intelligence
The first AI winter lasted from roughly 1973 to 1979 – promises were made, but expectations weren’t met. Agencies that had funded research into artificial intelligence (DARPA, the NRC, and the British government) were suddenly embarrassed by the lack of forward movement in its development.
However, machine learning (ML) continued to evolve. Not because it was still receiving government funding, but because machine learning had become extremely useful to business as a response tool. Machine learning had started as a training technique for AI, but it was discovered it could also be used to perform simple tasks, such as answering the phone and transferring calls to the appropriate person. While ML programs might not be capable of carrying on an intelligent conversation, they could perform basic, but very useful tasks. Businesses were not interested in giving up on a tool that was both cost-efficient and useful.
Businesses chose to fund their own research for the development of machine learning, and former researchers reorganized themselves into a separate industry – until merging with AI again in the 1990s.
Although neural networks were proposed in 1943 by researchers Warren McCulloch and Walter Pitts, the first functional “multilayered” artificial neural network, the Cognitron, was developed in 1975 by Kunihiko Fukushima.
Neural networks lay a foundation for the use of machine learning and deep learning. Their design supports input and output layers, and the hidden layers between them are used to transform the input data, making it useful to the output layer. With this new design, facial and speech recognition improved dramatically. Hidden layers also provide the foundation for deep learning.
In 1979, Kunihiko Fukushima proposed a hierarchical, multilayered artificial neural network that he named the Neocognitron. This was the first deep learning neural network. His design supported the computer’s ability to learn to identify visual patterns, and more specifically, to recognize handwritten characters. The design also allowed important parameters to be adjusted manually, letting humans increase the “weight” of certain connections.
In 1982, another discovery was made by John Hopfield, who developed a new form of neural network – the Hopfield net – using an entirely different approach. The Hopfield net collected and retrieved memories more like the human brain does than previous systems did.
However, the second AI winter, which began roughly in 1984 and continued until 1990, slowed the development of artificial intelligence, including generative AI. The anger and frustration with broken promises and unmet expectations were so intense that the term “artificial intelligence” took on the status of pseudoscience and was often spoken of with contempt. A broad sense of skepticism had developed regarding AI. Funding was, unfortunately, cut for the majority of AI and deep learning research.
In 1986, David Rumelhart and his team introduced a new way of training neural networks, using the backpropagation technique developed in the 1970s.
Deep learning became a functional reality in the year 1989, when Yann LeCun and his team used a backpropagation algorithm with neural networks to recognize handwritten ZIP codes.
Deep learning uses algorithms to process data and imitate the human thinking process. It employs layers of algorithms designed to process data, visually recognize objects, and understand human speech. Data moves through each layer, with the output of one layer providing the input for the next. In deep learning, the additional layers provide higher-level “abstractions,” producing better predictions and better classifications. The more layers used, the greater the potential for better predictions.
Deep learning has become an extremely useful training process, supporting image recognition, voice recognition, and processing vast amounts of data.
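As a toy illustration of this layer-by-layer flow, the Python sketch below pushes one input through a stack of layers; the layer sizes, random weights, and ReLU activation are assumptions made only to show how each layer’s output becomes the next layer’s input.

```python
import numpy as np

# Toy sketch of layered data flow: each layer's output feeds the next layer,
# building progressively higher-level abstractions (sizes and weights are made up).
rng = np.random.default_rng(1)
relu = lambda z: np.maximum(0, z)

layer_sizes = [64, 32, 16, 8, 2]              # input -> three hidden layers -> output
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

x = rng.normal(size=(1, 64))                  # one input example (e.g., a flattened image patch)
for i, W in enumerate(weights):
    x = relu(x @ W)                           # output of layer i becomes input to layer i + 1
    print(f"after layer {i + 1}: shape {x.shape}")
```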
The 1990s and AI Research Recovers
Because funding for artificial intelligence resumed in the 1990s, machine learning, as a training mechanism, also received funding. The machine learning industry had continued to research neural networks through the second AI winter and began to flourish in the 1990s. Much of machine learning’s continued success came from character and speech recognition, combined with the overwhelming growth of the internet and the use of personal computers.
The concept of “boosting” was shared in 1990, in the paper The Strength of Weak Learnability, by Robert Schapire. He explained that a set of weak learners can create a single strong learner. Boosting algorithms reduce bias during the supervised learning process and include machine learning algorithms that are capable of transforming several weak learners into a single strong one. (Weak learners make correct predictions slightly over 50% of the time.)
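As a hands-on (and somewhat anachronistic) illustration, the Python sketch below uses AdaBoost, a later boosting algorithm from Freund and Schapire that ships with scikit-learn, to combine many one-level decision trees (“stumps”), each only slightly better than chance, into a single strong classifier; the synthetic dataset and parameter values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Boosting sketch: many weak learners (decision stumps by default) are combined,
# each reweighted toward the examples the previous ones got wrong.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)  # synthetic data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

strong_learner = AdaBoostClassifier(n_estimators=50, random_state=0)
strong_learner.fit(X_train, y_train)
print("test accuracy:", strong_learner.score(X_test, y_test))
```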
The computer gaming industry deserves significant amounts of credit for helping in the evolution of generative AI. 3D graphics cards, the precursors to graphic processing units (GPUs), were first introduced during the early 1990s to improve the presentation of graphics in video games.
In 1997, Juergen Schmidhuber and Sepp Hochreiter created “long short-term memory” (LSTM), a recurrent neural network architecture. Currently, the majority of speech recognition training uses this technique. LSTM supports learning tasks that require a memory of events thousands of steps earlier, which are often important during conversations.
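A minimal PyTorch sketch of an LSTM layer is shown below; the batch size, sequence length of 1,000 timesteps, and feature dimension are assumptions meant to suggest a speech-style task with long-range memory, not any particular system.

```python
import torch
import torch.nn as nn

# Minimal LSTM sketch: the cell state lets information persist across many timesteps.
lstm = nn.LSTM(input_size=13, hidden_size=64, batch_first=True)

x = torch.randn(8, 1000, 13)       # batch of 8 sequences, 1,000 timesteps, 13 features each
output, (h_n, c_n) = lstm(x)       # c_n is the cell state carrying long-range memory

print(output.shape)                # torch.Size([8, 1000, 64]) -- one output per timestep
print(h_n.shape, c_n.shape)        # final hidden and cell states: torch.Size([1, 8, 64])
```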
In 1999, Nvidia (responsible for many advancements in gaming technology) released its first GPU, the GeForce 256, which dramatically increased computational speed.
It was a surprising realization that GPUs could be used for more than video games. The new GPUs were applied to artificial neural networks, with amazingly positive results. GPUs have become quite useful in machine learning, using approximately 200 times the number of processors per chip as compared to a central processing unit. (Central processing units, or CPUs, however, are more flexible, and perform a broader selection of computations, while GPUs tend to be tailored for specific use cases.)
The 2000s
The Face Recognition Grand Challenge, a promotion to improve facial recognition technology, was funded by the U.S. government and took place from 2004 to 2006. It resulted in new facial recognition techniques and significantly improved performance. The newly developed algorithms were up to ten times more accurate than the face recognition algorithms used in 2002. Some of the algorithms could even identify differences between identical twins.
The 2010s and Virtual Assistants and Chatbots
On October 4, 2011, Siri, the first digital virtual assistant considered truly functional, was introduced as a feature of the iPhone 4S. The use of chatbots also increased significantly.
In 2014, the concept of the generative adversarial network (GAN) was presented. GANs are used to create images, videos, and audio that seem like authentic recordings of real situations.
A generative adversarial network uses two neural networks trained simultaneously in an adversarial setup: one acts as a discriminator and the other as a generator. The discriminator is trained to distinguish between generated data and real data. The generator creates synthetic data and tries to imitate real data. With practice, the generator becomes better at producing ever more realistic data to trick the discriminator. GANs can create synthetic data that is difficult, if not impossible, to recognize as artificial.
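The compact PyTorch sketch below is one way to express this adversarial loop on a toy problem; the network sizes, optimizer settings, loss choice, and the stand-in “real” data distribution are all assumptions for illustration, not a production GAN.

```python
import torch
import torch.nn as nn

# Toy GAN loop: the generator maps noise to fake samples; the discriminator
# learns to tell real from fake, and each network improves against the other.
latent_dim, data_dim = 16, 2
generator = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, data_dim) * 0.5 + 2.0          # stand-in "real" data: a Gaussian blob
    fake = generator(torch.randn(64, latent_dim))

    # Train the discriminator to label real data as 1 and generated data as 0
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Train the generator to fool the discriminator into labeling fakes as real
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```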
The 2020s and Smarter Chatbots
In November 2022, OpenAI introduced ChatGPT, a generative AI built on a large language model. ChatGPT, and its variations, have achieved a new level of artificial intelligence. These “smarter chatbots” can perform research, support reasonably good writing, and generate realistic videos, audio, and images.
The combination of generative AI training with large language models has resulted in artificial intelligence that has the ability to think and reason. These systems might also have the ability to “imagine.” ChatGPT has been accused of hallucinating, which could be interpreted as a use of imagination.