Introduction to Generative AI

Written by Matti van Engelen & The SparkOptimus Blog Team

This article is the first in our new series on Generative AI; you will find an overview of the full series at the end of this article.

In this series, we will guide you through the hype, showing where you can leverage Generative AI in your business to create value. We aim to educate and inspire CEOs, CCOs, innovation directors, and other business leaders interested in Generative AI. In this first article, we will provide you with a history of Generative AI, an overview of how it works, and a preview of other topics we will touch on in the rest of the series.

Introduction to AI

To fully appreciate why Generative AI is such a disruptive technology, we need to place it in the context of the broader AI landscape and explain which tasks classic (non-generative) AI systems have focused on. Let’s start with some basic terminology and buzzwords.

In essence, AI is any machine performing tasks that would typically require human intelligence. Almost all AI-focused development currently relies on machine learning: algorithms that allow machines to train themselves on existing data and then use what they have learned to make predictions about new data. Some examples of machine learning algorithms are neural networks, decision trees, and support-vector machines. A specific sub-field of machine learning is deep learning, where the ‘deep’ refers to such algorithms using multiple ‘layers’, in essence increasing the complexity of the model to allow it to ‘learn’ better.

Machine learning algorithms have historically focused on two main tasks:

  1. Classification – grouping objects into preset categories. E.g., segmenting your customer base based on their buying patterns, or classifying emails as spam or not spam
  2. Prediction – predicting a future value based on historical data. E.g., identifying a customer’s most likely next purchase, or predicting a stock price

Importantly, both of these tasks are about predicting a ‘bounded’ value. In classification algorithms, the desired categories are determined beforehand (e.g., ‘spam’ and ‘not spam’), and in prediction algorithms the value or item to be predicted also comes from a fixed option set (e.g., a ‘stock price’ is always a € amount, while a ‘next purchase’ is always an available product).
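
To make the difference between these two ‘bounded’ tasks concrete, here is a minimal sketch using the scikit-learn library on made-up data; the features, labels, and numbers are purely illustrative, not taken from a real use case.

```python
# A minimal sketch of classic (non-generative) machine learning,
# using scikit-learn on made-up data. Features and labels are hypothetical.
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

# 1. Classification: label emails as spam (1) or not spam (0), based on two
#    simple features: number of links and number of exclamation marks.
X_emails = [[8, 5], [0, 0], [6, 7], [1, 1]]
y_labels = [1, 0, 1, 0]                      # the categories are fixed in advance
clf = DecisionTreeClassifier().fit(X_emails, y_labels)
print(clf.predict([[7, 4]]))                 # -> [1], i.e., 'spam'

# 2. Prediction: estimate a numeric value (e.g., next month's spend) from
#    historic spend; the output is always of a known type: a € amount.
X_history = [[100], [150], [200], [250]]
y_next = [110, 160, 210, 260]
reg = LinearRegression().fit(X_history, y_next)
print(reg.predict([[300]]))                  # -> roughly [310]
```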

Classic machine learning algorithms, focused on classification and prediction, have been heavily used across industries for over a decade.

How is generative AI different?

Generative AI is any AI model that can generate new content, i.e., content that has never been seen before. There are already Generative AI systems focused on text, images, audio, video, code, and more. These models are a specific type of deep learning model.

The key difference from classic AI systems is that Generative AI algorithms are extremely ‘unbounded’, i.e., they can generate content that is not constrained by a predefined set of outcomes (e.g., ChatGPT generates answers to questions, but these answers are often text that has never been seen before, and they certainly don’t come from a fixed set of available answers).

Why now?

So why has Generative AI only really taken off in 2022, when these classic AI models have been around for over a decade? There are three main drivers of the recent progress in Generative AI:

  1. New machine learning models
    The machine learning models used for Generative AI have only been around since 2017, when Google published a paper called ‘Attention Is All You Need’, which introduced the ‘Transformer’ model. This model was then used to develop the first Large Language Models (LLMs); more on this in the next section.
  2. Capital investments
    Microsoft invested $1 billion in OpenAI (the creator of ChatGPT and DALL-E) in 2019, and followed up with a reported multi-billion-dollar investment in 2023 after the launch of ChatGPT. Besides funding for model development, there are significant costs associated with model training and upkeep. It’s estimated that training a model like the one behind ChatGPT costs anywhere from $5 million to $25 million. Generating an answer to a question on ChatGPT is estimated to cost around $0.01 to $0.10, roughly 20-200x the cost of executing a Google search.
  3. Decreasing computing costs
    As processing power has increased and more and more specialized chips have become available to run large machine learning models, the cost of training these models has come down significantly. Training is still expensive, but it would have been prohibitively costly even a few years ago.

How does generative AI work?

To understand how complex models like ChatGPT are trained, we need to take a slightly deeper look at neural networks, Transformers, Large Language Models, and Reinforcement Learning from Human Feedback. While these are all expansive research topics in their own right, we will cover the basics of what they mean, how they come together to create ChatGPT, and how they explain some of the current limitations of Generative AI.

Note: The way Generative AI works for image and video creation is different, but Transformers are still at the core of developments in this area. Since text generation models are the most advanced, we will focus on them for the remainder of this section.

Let’s start with one of the most popular machine learning models out there: the Neural Network.

In essence, a neural network is a machine learning algorithm that translates some input into some output (e.g., Spanish into English, or a picture of a cat or a dog into the word ‘cat’ or the word ‘dog’). Neural networks are loosely modelled on the neurons in a biological brain. They consist of layers of neurons that transmit ‘signals’ to other neurons. These ‘signals’ are essentially real numbers, and the way signals are combined in a neuron is a function of those numbers. Neural networks ‘learn’ by changing the function each neuron uses to combine its input signals.
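
To make the idea of ‘layers of neurons combining signals’ more concrete, below is a minimal sketch of a two-layer neural network in plain NumPy. The weights are random, so this network has not learned anything yet; it only shows how signals flow from input to output.

```python
# A minimal sketch of a two-layer neural network in plain NumPy.
# The weights are random here; 'learning' would mean adjusting them
# so that inputs map to the desired outputs.
import numpy as np

rng = np.random.default_rng(0)

def layer(inputs, weights, biases):
    # Each neuron combines its incoming signals (real numbers) as a
    # weighted sum, then applies a simple non-linearity (ReLU).
    return np.maximum(0, inputs @ weights + biases)

x = np.array([0.5, -1.2, 3.0])                   # input signals (3 features)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)    # layer 1: 3 inputs -> 4 neurons
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)    # layer 2: 4 neurons -> 2 outputs

hidden = layer(x, W1, b1)     # signals passed to the hidden layer
output = hidden @ W2 + b2     # e.g., scores for 'cat' vs 'dog'
print(output)
```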

One thing neural networks have traditionally been quite poor at is processing a large input sequence all at once. For example, when translating from Spanish to English, they would treat the Spanish sentence word for word, leading to potentially wrong translations (e.g., “No hablo Español” would lead to “No speak Spanish”).

Enter the Transformer: a specific type of neural network that contains an efficient way to process a large piece of input all at once (leading, in our example, to the correct translation “I don’t speak Spanish”).
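
The mechanism that lets a Transformer look at the whole input at once is called (self-)attention, the idea behind the ‘Attention Is All You Need’ paper mentioned earlier. Below is a heavily simplified NumPy sketch of scaled dot-product attention; the word vectors are random stand-ins, and real Transformers add learned projections and many layers on top.

```python
# A heavily simplified sketch of scaled dot-product attention in NumPy.
# Each word looks at every other word in the sentence and decides how much
# to 'attend' to it, so the whole input is processed at once.
import numpy as np

rng = np.random.default_rng(1)
words = ["No", "hablo", "Español"]
d = 8                                    # size of each word vector
X = rng.normal(size=(len(words), d))     # random stand-ins for word embeddings

Q, K, V = X, X, X                        # queries, keys, values (no learned projections here)
scores = Q @ K.T / np.sqrt(d)            # how relevant is each word to each other word
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax per word
context = weights @ V                    # each word's new representation mixes in the others

print(np.round(weights, 2))              # attention weights: one row per word
```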

These Transformer models were then used to create the first Large Language Models (LLMs). An LLM is still a neural network (Transformers are neural networks), so it translates some input into some output: it takes an input sentence and produces an output sentence that adds one word. For example, the input might be ‘The Shawshank’ and the output would be ‘The Shawshank Redemption’. These models are typically trained on large datasets consisting of internet data and books. By taking a long piece of input text into account and repeatedly predicting the next word, LLMs can generate very natural-sounding text.
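
To see this next-word prediction in action, the sketch below uses the small, publicly available GPT-2 model (an early LLM) via the Hugging Face transformers library. The exact continuation you get may differ, and production LLMs are far larger than this example.

```python
# A minimal sketch of next-word prediction with a small, public LLM (GPT-2),
# using the Hugging Face transformers library.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The Shawshank"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits           # scores for every possible next token
next_token_id = int(logits[0, -1].argmax())   # pick the single most likely next token
print(prompt + tokenizer.decode(next_token_id))  # often: "The Shawshank Redemption"
```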

There are many such LLMs out there, including GPT-3 (the basis for the first version of ChatGPT), LaMDA (used by Google to power Bard), and LLaMA (by Meta). An LLM is used as the basis to train models like ChatGPT, but it can also be trained to perform other tasks, like answering customer service questions or generating contracts.

To illustrate how this training process works, let’s take a look at how ChatGPT was trained. In essence, ChatGPT is still a neural network (specifically a Transformer): it takes some input (a question) and transforms it into some output (an answer). As mentioned before, it uses an LLM as a base model and is further trained to perform this specific task.

One of the key challenges for models like ChatGPT is that there is limited training data available: there simply aren’t enough question-answer pairs to train the model (compare this, for example, to the abundance of training data for translating Spanish to English). Therefore, OpenAI hired 40 people (called ‘labelers’) to manually write and answer 15,000 questions. These inputs and outputs were used to train a basic version of ChatGPT. OpenAI then used this basic version to answer 35,000 additional questions multiple times each, and asked the labelers to rank these answers from best to worst.

Their feedback was used to improve the model even further. And now comes the key step: a second neural network, called a ‘reward model’, is trained on the 35,000 questions and ranked answers to mimic the ranking behavior of the labelers. This reward model can then be used in a loop with the basic version of ChatGPT to improve it: ask a question, answer it multiple times, rank the answers, and use that ranking to improve the model. This type of training process is called ‘reinforcement learning from human feedback’ (RLHF).
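
The loop described above can be summarized in a short, pseudocode-style sketch. The function names (generate, score, update) are hypothetical placeholders rather than OpenAI’s actual code; in practice, the update step uses a reinforcement learning algorithm such as PPO.

```python
# A pseudocode-style sketch of the reinforcement-learning-from-human-feedback loop.
# All method names are hypothetical placeholders, not OpenAI's implementation;
# in practice the 'update' step uses an RL algorithm such as PPO.

def rlhf_training_loop(chat_model, reward_model, prompts, n_iterations):
    for _ in range(n_iterations):
        for prompt in prompts:
            # 1. Ask the current model to answer the same question several times.
            answers = [chat_model.generate(prompt) for _ in range(4)]

            # 2. Score each answer with the reward model, which was trained
            #    to mimic how the human labelers ranked answers.
            scores = [reward_model.score(prompt, answer) for answer in answers]

            # 3. Nudge the chat model towards the answers the reward model prefers.
            chat_model.update(prompt, answers, scores)

    return chat_model
```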

This training process immediately illustrates why certain behaviors arise in this type of model. They can be very convincingly wrong, because labelers prefer convincing answers (especially when they don’t know the answer to a question themselves). They are generally quite friendly, again because the labelers prefer friendly answers. In the end, these 40 labelers have quite a significant influence on the output of the model, as the reward model is based on their rankings.

How will consumers use generative AI?

If you’ve been using ChatGPT or similar tools over the past months, you’ve probably already experienced that the possible uses of Generative AI for consumers are almost limitless: from planning holidays to finding recipes, and from personal fitness programs to movie recommendations. The essential difference between Generative AI and other ways of finding this information is that consumers can adapt their queries to receive completely personalized responses. They no longer need to sift through blogs or rely on search engines to provide the information they need; they can find it using Generative AI.

With Generative AI being integrated into search engines, even more consumers will start using it to find the information they need, which will most likely have a significant impact on search engine advertising (SEA).

How will generative AI impact your business?

In our next article, we will discuss in more detail how Generative AI will impact your organization across business functions, including marketing, legal, IT, and customer service. Although Generative AI is still an immature technology, there are plenty of use cases right around the corner.

Next up in this article series

We have learned a lot from helping our clients over the years, and we’ll be sharing our key insights with you in a number of publications. Below is the list of topics we will cover:

  • Generative AI use cases – Exploring what Generative AI applications are out there already, and which we expect to be possible in the (near) future
  • Generative AI and the future of work – How Generative AI will impact our work, which jobs will change or disappear, and which new roles will be required
  • The importance of data in Generative AI – Why high-quality data is (even more) important for Generative AI, and how you can get your data ready for it
  • Tooling & prompting – What tools are available to implement Generative AI, and how you can write efficient prompts to leverage tools like ChatGPT in your work
  • Generative AI risks, pitfalls, and ethical and diversity concerns – What can go wrong, and how to avoid it

Stay tuned!

We hope you’re as excited as we are. Please let us know if there are specific topics or questions you would like us to cover.

Matti van Engelen
Practice Lead Data & AI

Ready to transform your organization? Discover how with our 1-week scan.