It’s almost unbelievable that only a few years ago we were struggling with the very limited technology of chatbots, which, while useful, could only provide highly scripted answers to a client’s questions. And now, enter 2022, the year in which a Google engineer claimed that a chatbot was sentient, and in which you can use AI to talk to the deceased. Welcome to the future.
The idea of a talking computer is nothing new; in fact, it can be traced to the early days of computer science. The first electromechanical and electronic computing machines (back in the day, “computer” was the term for people who made calculations by hand) were used to crack the Enigma machine, a cipher device widely used by the Third Reich for communications during World War II.
Years later, Alan Turing posited that a machine could be said to think if its conversational ability was so advanced that a human could not reliably tell it apart from another human being. This is what we know today as the Turing test.
How did we go from scripted, algorithmic answers to AI capable of fully fledged conversations? And what does that mean for the tech industry?
How Do Language Models Work?
By most historical accounts, the first “chatbot” was ELIZA, a project created by Joseph Weizenbaum from MIT in 1966. By today’s standards, it was a rudimentary algorithm that detected certain keywords and returned an open-ended question. For example, if I wrote about my mother’s cooking, ELIZA would follow up with “Tell me more about your mother.”
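ELIZA’s keyword trick can be sketched in a few lines of Python. This is a hypothetical miniature with made-up rules, not Weizenbaum’s original DOCTOR script, but the mechanism was essentially this:

```python
import re

# A tiny, hypothetical ELIZA-style rule set: each rule maps a keyword
# pattern to a canned, open-ended reply template.
RULES = [
    (re.compile(r"\bmother\b", re.I), "Tell me more about your mother."),
    (re.compile(r"\bfather\b", re.I), "Tell me more about your father."),
    (re.compile(r"\bI am (.+)", re.I), "Why do you say you are {0}?"),
]

def eliza_reply(text: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Echo any captured fragment back inside the template.
            return template.format(*match.groups())
    # No keyword matched: fall back to a generic prompt.
    return "Please, go on."

print(eliza_reply("My mother's cooking is wonderful"))
# → Tell me more about your mother.
```

There is no understanding anywhere in this loop, which is exactly why the illusion collapses as soon as the user strays from the keyword list.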
ALICE, released in 1995, was the first chatbot to build replies using natural-language pattern matching (via its AIML markup language), which allowed for more sophisticated conversations. Unfortunately, the longer you talked with ALICE, the more the model was strained and the more oddities popped up. In other words, it was fine for simple exchanges, but it couldn’t handle complex inputs or long-running conversations. Why?
ALICE, like most language models, is based on probability. As far as we know, computers don’t understand language the way we do. If I think of the word “apple,” I can ponder the meaning of the word, as in, what is an apple? I can think about its qualities, the taste of an apple, examples of apples I’ve tasted in my life, and so on. A language model, by contrast, thinks in terms of statistical weights: which words tend to appear together, and in what order.
Let’s say, for example, that I ask someone if they want an apple. They ponder for a few seconds, perhaps evaluating whether they are hungry, whether they trust me enough, or whether they have an appetite for fruit at this precise moment. Based on this evaluation, they reply, “Sure, I would love an apple.”
In contrast, the language model takes the question “Do you want an apple?” and knows that this sentence correlates with either an affirmative or a negative answer. It also calculates that, in the context of an affirmative answer to that question, the reply is likely to involve words such as “appetite” and “thank you.” The end result could be something like, “Yes, that sounds appetizing, thank you very much,” or even “Of course not, silly. I’m a computer, I don’t eat.”
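The “probability over words” idea can be illustrated with a toy bigram model: count which word follows which in a corpus, then always pick the most frequent successor. This is a deliberately tiny sketch on a made-up corpus, not how GPT-3 works internally (it uses a transformer network over much longer contexts), but the underlying principle of choosing likely next words is the same:

```python
from collections import Counter, defaultdict

# Toy training corpus (an assumption purely for illustration).
corpus = (
    "do you want an apple . "
    "yes i would love an apple . "
    "yes that sounds appetizing thank you . "
    "i would love an orange . "
).split()

# Count how often each word follows each other word (bigram counts).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def most_likely_next(word: str) -> str:
    """Return the highest-probability next word after `word`."""
    return follows[word].most_common(1)[0][0]

print(most_likely_next("an"))  # "apple" follows "an" twice, "orange" once
# → apple
```

Scale this idea up from one-word contexts to 2,048-token contexts, and from a few sentences to a large slice of the web, and you get the family of models this article is about.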
GPT-3 and the Power of Big Data
Considering the almost infinite number of possible words and their relations to one another, we need two things to train a language model capable of human-like conversation. First, much like a child, a computer needs to learn words and how they are connected. Second, we need the processing power to handle the very complex nature of language. A model can only go as far as the hardware it runs on.
The year 2020 saw the release of GPT-3 by OpenAI, a San Francisco-based artificial intelligence laboratory that has garnered the attention of investors and computer scientists worldwide. As the name implies, this is the third version of their Generative Pre-trained Transformer model, and it’s leaps and bounds better than anything that came before.
GPT-3 is one of the most advanced language models on the planet. It was trained on hundreds of billions of words of text from websites, Wikipedia, and books. The end result is a language model of unprecedented scale: a 2,048-token context window and 175 billion parameters, making it one of the biggest language models ever built and requiring around 800 gigabytes of storage. That is absolutely massive by any standard.
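A 2,048-token context window means the model can only “see” a bounded amount of text at once; in a chatbot, older messages simply fall out of view. The sketch below illustrates that trimming. The 4-characters-per-token ratio is a common rule of thumb for English, an assumption here; GPT-3’s actual byte-pair-encoding tokenizer counts differently:

```python
CONTEXT_TOKENS = 2048   # GPT-3's context window size
CHARS_PER_TOKEN = 4     # rough heuristic, not the real tokenizer

def estimate_tokens(text: str) -> int:
    """Crude token estimate based on character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def truncate_to_context(history: list[str], budget: int = CONTEXT_TOKENS) -> list[str]:
    """Keep only the most recent messages that fit in the token budget."""
    kept, used = [], 0
    for message in reversed(history):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))

# A long conversation gets silently trimmed to its most recent part:
history = ["x" * 400] * 30  # 30 messages of ~100 estimated tokens each
print(len(truncate_to_context(history)))
# → 20
```

This is why even very large models can appear forgetful in long conversations: everything that doesn’t fit in the window effectively never happened.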
Big Data solved one of the biggest hurdles of artificial intelligence: having enough data to train a model accurately. GPT-3 wouldn’t be what it is without Common Crawl, a nonprofit organization that offers petabytes of web data, a massive archive with millions of webpages freely available to data scientists worldwide.
That kind of connectivity, plus the exponential growth in computing power over the last couple of decades, has created nothing short of a revolution in AI and, more specifically, natural language processing. GPT-3 is so advanced that it was able to write an academic paper about itself, which was subsequently submitted for peer review.
Lessons To Learn From GPT-3
OpenAI is offering its commercial API to the public. For any company interested in a powerful language model for their own projects or chatbots, it’s a gift from the heavens. As an example, Replika, the AI chatbot marketed as an empathic friend that garnered the attention of both the media and investors, was originally built using the OpenAI API.
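Using the API boils down to sending a prompt and a few generation parameters to a completion endpoint. The sketch below builds such a request for the pre-1.0 `openai` Python package; the engine name and parameter values are assumptions for illustration, and the actual network call (shown in a comment) requires the package and a valid API key:

```python
def build_completion_request(prompt: str) -> dict:
    """Assemble parameters for a GPT-3 completion request (a sketch)."""
    return {
        "engine": "text-davinci-002",  # one GPT-3 model name of that era
        "prompt": prompt,
        "max_tokens": 64,       # cap on the length of the generated reply
        "temperature": 0.7,     # higher values produce more varied wording
    }

params = build_completion_request("Summarize today's meeting notes in one line:")

# With the `openai` package installed and an API key configured,
# the call would look like:
#   import openai
#   response = openai.Completion.create(**params)
#   print(response.choices[0].text)
```

The appeal for startups is that all of the training cost and infrastructure sits behind that one call.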
Why would we want to work with a language model like GPT-3? On one hand, this is opening the doors to a massive disruption in the healthcare industry. Imagine AI doctors or counselors serving as the first line of interaction with clients and patients, redirecting patients based on their symptoms to human doctors or psychologists, while also providing simple advice and reminders such as how and when to take certain medicines.
And that’s just the tip of the iceberg. Elon Musk’s robot presentation showed that there is a market out there for AI assistants, be they robots or virtual companions, and models like GPT-3 provide the more natural human-computer interaction they need.
Imagine GPT-3 helping students with learning disabilities, guiding their writing process, and providing support to foster their growth. Or how about an office companion that takes raw data and produces an executive summary in seconds? What about digital content creators?
On the other hand, while few companies and startups have the resources of a massive institution like OpenAI, that doesn’t mean there isn’t room for smaller players. GPT-3 is impressive, but it’s not the only model out there: GPT-J and GPT-Neo are open-source alternatives that offer similar functionality to OpenAI’s solutions without the constraints of a commercial license. Similar technology, from smaller groups.
The lesson here is that we are living in a moment where even a casual gamer owns a graphics card with enough processing power to train an AI model. One look at GPT-3 should tell us the value of nonprofit organizations and open-source technologies and the role they play in empowering small-scale projects.
And finally, GPT-3 is a reminder that we are living in volatile times where massive leaps in technology are creating new opportunities. It’s time to keep our ears to the ground for whatever is coming next and ask ourselves how technology can empower our own businesses.