Everyone has an idea of what “AI” is and what it does based on colloquial usage. It’s hard to tell hype from reality. We’ve all argued over whether its output is genius or madness. Companies are making dramatic overtures towards its adoption––some of them are even telling the truth. Its ability to write poetry is both shockingly adept and coldly mechanical.

What’s it really doing under the hood? Some have taken to calling it a “fancy autocomplete.” That’s an overly reductive view, but not completely off-base. It uses a Transformer, a type of neural network, to predict the next token of a text given the tokens that came before it. That’s the “T” in “GPT.”
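To make “predict the next token” concrete, here is a deliberately tiny sketch in plain Python. It builds a bigram model––just counting which word follows which––over a made-up eight-word corpus. A real Transformer is vastly more sophisticated, but the interface is the same: given the text so far, score every candidate next token.

```python
from collections import Counter, defaultdict

# Toy corpus, purely for illustration.
corpus = "the cat sat on the mat the cat ate".split()

# Count how often each word follows each other word.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(word):
    """Return the most likely next token after `word`."""
    return counts[word].most_common(1)[0][0]

print(predict_next("the"))  # → cat ("cat" follows "the" twice, "mat" once)
```

The model never “knows” anything; it only reports which continuation was most frequent in its training data. Scale that idea up by billions of parameters and you get something that looks far less like counting.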

The details are unimportant to most business owners; just know that it’s eerily effective at predictions given sufficient pre-training. That’s the “P” in “GPT.”

You can think of AI as a big math problem with billions of variables that gets solved every time you ask it a question. A neural network isn’t like an app that runs on your phone––it’s a purely mathematical construct: a stack of matrices your input passes through, layer by layer, to generate its output. That’s the “G” in “GPT.”

A neural network is loosely modeled on the neurons in the human brain, which is where the name comes from. It’s a vast lattice of nodes, each of which takes some tiny input, transforms it, and passes it along to the next node. At a large enough scale, that combined behavior starts to resemble what we call “intelligence.”
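The lattice of nodes can be sketched in a few lines of plain Python. This is a toy with made-up weights, not how any real inference engine works, but it shows the shape of the idea: each node computes a weighted sum of its inputs, applies a simple transformation, and the output of one layer becomes the input of the next.

```python
def node(inputs, weights, bias):
    """One 'neuron': weighted sum of inputs, then a nonlinearity."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, total)  # ReLU: negative signals don't propagate

def layer(inputs, weight_rows, biases):
    """A layer is just many nodes reading the same inputs."""
    return [node(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Two layers chained together: the output of one feeds the next.
# All numbers here are arbitrary stand-ins for learned weights.
x = [1.0, 2.0]
h = layer(x, [[0.5, -0.2], [0.3, 0.8]], [0.0, 0.1])  # hidden layer
y = layer(h, [[1.0, -1.0]], [0.0])                   # output layer
```

Real models do exactly this, just with billions of weights and much faster matrix math on specialized hardware.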

Also like the human brain, a neural network is trained by iterating many times over wrong answers, getting marginally closer each time to what the scientists defined in advance as correct; essentially, it’s practicing. Once it gets close enough, it can be fine-tuned and inched ever closer to correctness. What the network adjusts along the way are its weights: the billions of numbers inside it that come to encode the statistical patterns of the training data (the corpus). This is how it knows whether the word “blue” means a color or a mood.
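The “practicing” loop can be illustrated with a single weight. This sketch is a drastic simplification of gradient descent––the actual training procedure juggles billions of weights at once––but the rhythm is the same: measure the error, nudge the weight a little in the right direction, repeat.

```python
target = 3.0   # what the trainers defined in advance as "correct"
w = 0.0        # the model's weight, starting out wrong
lr = 0.1       # the learning rate: how big each nudge is

for step in range(100):
    error = w - target   # how wrong are we right now?
    w -= lr * error      # nudge the weight toward correct

# After a hundred rounds of practice, w sits very close to 3.0.
```

Each pass shrinks the remaining error by a fixed fraction, which is why the weight gets marginally closer every iteration rather than jumping straight to the answer.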

As we’ve noted: a neural network is a graph with parameters, not a computer program. When a model is finished and ready for distribution, the actual data that physically exists on disk isn’t code ready to execute––it’s those weights, which are simply long sequences of numbers. The actual program that runs the graph is the inference engine, a totally distinct concern with its own hardware requirements.
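Here is a hedged illustration of that point: a “model file” is just long runs of numbers, not executable code. This toy saves and reloads eight fake weights with Python’s standard library; real models ship in formats like safetensors or GGUF and contain billions of values, but the principle is identical.

```python
import json
import os
import random
import tempfile

# A stand-in for billions of learned weights: just numbers.
weights = [random.uniform(-1.0, 1.0) for _ in range(8)]

path = os.path.join(tempfile.gettempdir(), "toy_model.json")
with open(path, "w") as f:
    json.dump(weights, f)   # what ships on disk: numbers, nothing else

with open(path) as f:
    loaded = json.load(f)   # the inference engine reads them back in
```

Nothing in that file can run by itself; it takes a separate program, the inference engine, to load those numbers and push inputs through the graph they describe.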

Practically speaking, no one is training their models from scratch––they start with one that’s already trained by one of the big players and then fine-tune it for their needs. Those big players produce the large language models (LLMs) you associate with AI, although LLMs are not the entirety of the AI world.

You may have heard of AI hallucinations. Since a model’s output is entirely statistical, the “creativity” of that output can be adjusted with a knob called the temperature, which makes the model more or less adventurous as it traverses the graph. Much like your own brain, sometimes you want it to be more creative and other times you don’t. In a sense, all answers are hallucinated, even the right ones; the right ones just happen to match reality. And because sampling makes a model’s output non-deterministic even given the same input, opinions vary on how reliable it is.
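The temperature knob is simple enough to sketch. The same raw preference scores, divided by the temperature before being turned into probabilities, produce either a sharply confident distribution or a flat, adventurous one. The scores below are invented for illustration.

```python
import math

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities, scaled by temperature."""
    scaled = [s / temperature for s in scores]
    exps = [math.exp(s) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]  # toy raw preferences for three tokens

cautious = softmax_with_temperature(scores, 0.5)     # low temp
adventurous = softmax_with_temperature(scores, 2.0)  # high temp

# At low temperature the top token dominates; at high temperature
# the probabilities flatten, so unlikely tokens get picked more often.
```

That is the whole trick: the model’s underlying preferences never change, only how boldly it samples from them.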

Over a large enough sample size, it starts to resemble the statistical average of the human condition itself. That’s why it tends to sound a certain way unless prompted otherwise (and even then it struggles). It repeats memetic patterns, clichés, punctuation, emoji, and other artifacts of our analog communication. It holds a mirror up to our collective psyche and we don’t always like what it shows us. Some have expressed concern that we are the ones now being trained on AI rather than the reverse––that our thoughts are being warped by what it tells us.

But that’s beyond the concern of business owners. Hopefully we’ve established a solid baseline of knowledge for future articles. We will be diving into each of these steps in more detail, so stay tuned.