The Dopamine Series Part 1: The Learning Algorithm Hidden in Your Brain
The algorithm that beats world champions at Go is running in your brainstem right now.
In 2016, DeepMind's AlphaGo crushed Lee Sedol, one of history's greatest Go players. Its core engine: reinforcement learning built on temporal difference methods. The same lab later cracked protein folding, a 50-year unsolved problem, and won a Nobel Prize for AlphaFold.
The strange part: computer scientists didn't invent this algorithm. They discovered it. In your neurons. Dopamine had been running temporal difference learning for millions of years before anyone wrote a line of code.
The Incomplete Story
You've heard the simplified versions:
Dopamine equals pleasure. Wrong. Dopamine fires before the reward, not during. The spike is anticipation, not enjoyment.
Dopamine encodes reward prediction error. Closer. Reality exceeds expectations, dopamine spikes. Reality disappoints, it dips. This is the Rescorla-Wagner model from 1972—correct, but incomplete.
There's a deeper story. One that explains why AI researchers found the same algorithm in their code that neuroscientists found in your brain.
The Temporal Difference Insight
The old model: compare what you expected to what you got. Predicted reward of 5, received 7, prediction error of +2. Dopamine fires. Learning happens.
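That old model fits in a few lines of code. Here is an illustrative sketch, not a biological model; the learning rate and the reward numbers are invented for the example.

```python
# Illustrative sketch of a Rescorla-Wagner-style update. The learning
# rate alpha and all numbers are invented for this example.
def rescorla_wagner_update(V, reward, alpha=0.1):
    prediction_error = reward - V        # predicted 5, got 7: error of +2
    return V + alpha * prediction_error  # nudge the prediction toward reality

V = 5.0                                  # predicted reward
V = rescorla_wagner_update(V, reward=7.0)
# positive error, the "dopamine spike": V rises from 5.0 to 5.2
```

The whole model is that one subtraction: learning only happens at the moment the outcome arrives.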
But this breaks in the real world. Rewards are delayed. A foraging animal doesn't know if a path leads to food until it reaches the end. If learning only happens at final outcomes, you're stuck waiting. No updates mid-journey.
Richard Sutton and Andrew Barto solved this in the 1980s, with Sutton's 1988 paper formalizing the method. The insight: don't compare expectations to outcomes. Compare successive expectations to each other.
Picture an animal searching for food. It starts down a path with some expectation of success. Five steps later, it spots a familiar landmark—this path led to berries before. Its expectation jumps. That jump is the reward signal. Dopamine fires not because it found food, but because its prediction of finding food just improved.
No waiting for the end. The system updates continuously. Every moment, dopamine compares "what I expected a second ago" to "what I expect now." Improvement triggers a spike. Deterioration triggers a dip.
Learning happens in real time. Long behavior sequences work because each step reinforces the previous one. The algorithm bootstraps itself through time.
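The foraging story can be sketched as a toy TD(0) loop. The state names, reward, and parameters below are invented for illustration; real systems generalize this with function approximation over huge state spaces.

```python
# Toy TD(0) sketch of the foraging path: start -> landmark -> berries.
# Each step compares the new expectation to the previous one; that gap
# (the TD error) plays the role of the dopamine signal.
alpha, gamma = 0.1, 0.9   # learning rate and discount, chosen arbitrarily
values = {"start": 0.0, "landmark": 0.0, "berries": 0.0}
path = ["start", "landmark", "berries"]

for episode in range(200):            # repeat the journey many times
    for i in range(len(path) - 1):
        s, s_next = path[i], path[i + 1]
        reward = 1.0 if s_next == "berries" else 0.0
        # TD error: (reward + discounted new expectation) - old expectation
        td_error = reward + gamma * values[s_next] - values[s]
        values[s] += alpha * td_error

# The landmark now predicts the berries, and the start predicts the
# landmark: the prediction has propagated backward through the sequence.
```

Notice that the "start" state gets credit long before any berries appear. That backward chaining is the bootstrapping the paragraph above describes.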
The Algorithm That Crawled Out of Your Mind
In the 1990s, neuroscientists started recording from dopamine neurons during learning tasks. The results were startling.
Firing patterns matched Sutton and Barto's algorithm with striking precision. Dopamine neurons weren't just computing reward prediction error; they were computing temporal difference reward prediction error. Comparing successive expectations. Updating in real time. Chaining predictions forward through time.
The algorithm that computer scientists derived from first principles was already installed in every mammal on earth.
Fast forward to DeepMind. AlphaGo, the system that beat the world Go champion? Trained with temporal difference methods. AlphaFold, which cracked the 50-year protein folding problem and won a Nobel Prize? A different architecture, but the same lab and the same deep learning lineage.
As neuroscientist Read Montague puts it: the algorithm "crawled out of your mind into a program." Now systems running that algorithm routinely exceed human performance at tasks humans invented.
A strange loop. We discovered an algorithm in our neurons. We implemented it in silicon. The silicon beat us at our own games. Now we use those AI systems to understand our neurons even better.
The Opponent: Serotonin
Dopamine doesn't work alone. It has an opponent.
When dopamine rises, serotonin often falls. When serotonin rises, dopamine often drops. They're not just different neurotransmitters—they appear to compute different things, though the relationship is more complex than simple opposition.
Dopamine tracks positive expectations. Things getting better. Rewards approaching.
Serotonin's role involves behavioral inhibition, patience, and time perception. It modulates waiting and impulse control rather than simply tracking negative predictions.
This hints at a puzzle with SSRIs. These antidepressants boost serotonin, but the dopamine and serotonin systems are entangled: shifting one affects the other, through mechanisms that vary by brain region and context.
This might be why SSRIs work brilliantly for some people and backfire for others. The dopamine-serotonin balance is individual and region-specific. Manipulate one, and you influence both—in ways nobody can fully predict.
The opponent-like dynamics also explain why both systems exist. Dopamine alone creates reckless optimism—every path looks good, every risk worth taking. Serotonin alone creates paralysis—too much waiting, too much caution. Together they create balanced learning. Opportunity and restraint, both computed.
When the System Shifts
Dopamine signaling can change under chronic stress or trauma. The system doesn't simply "flip"—rather, baseline states shift and receptor sensitivity alters.
Under prolonged stress, deprivation, or perceived emergency, dopamine circuits can begin predicting threats instead of rewards. This represents adaptation to a hostile environment, not the same neurons reversing their computation.
This may explain the famous "hungry judge" finding: judges handed down harsher sentences right before lunch and more lenient ones after eating. On this reading it isn't a willpower failure but altered dopamine signaling. In a depleted state, the prediction system tilts toward negativity.
Trauma can make this chronic. If your nervous system stays in emergency mode, dopamine keeps computing threats instead of rewards. The same algorithm that learned to chase pleasure now learns to anticipate pain.
This isn't weakness. It's not moral failure. It's an algorithm adapting to survival conditions. The fix isn't trying harder—it's signaling to your nervous system that the emergency is over.
What This Means for You
Understanding dopamine as an algorithm changes how you think about motivation, learning, and decisions.
Dopamine is currency. Your brain uses it to compare unlike things—food, sex, achievement, status—on a common scale. This is why you can choose between options that have nothing in common. Dopamine converts everything to the same unit: expected value.
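The common-scale idea can be sketched as a simple expected-value comparison. The options, probabilities, and payoffs below are all invented; the point is only that unlike things collapse onto one number.

```python
# Sketch of "common currency" choice: unlike options reduced to a single
# expected-value scale. All probabilities and payoffs are made up.
options = {
    "snack now":      {"p_success": 0.95, "payoff": 2.0},
    "finish project": {"p_success": 0.40, "payoff": 10.0},
    "scroll phone":   {"p_success": 0.99, "payoff": 0.5},
}

def expected_value(opt):
    return opt["p_success"] * opt["payoff"]

best = max(options, key=lambda name: expected_value(options[name]))
# "finish project" wins here (0.40 * 10.0 = 4.0): three things with
# nothing in common, ranked on one scale
```

Once everything is one number, choosing between food, status, and achievement stops being a category error and becomes a comparison.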
Effort strengthens the circuit. Deliberately slowing down and making things harder strengthens the dopamine learning system. Easy dopamine—phones, sugar, cheap entertainment—trains weak predictions. Effortful dopamine—delayed rewards, hard problems, real achievement—trains strong ones.
You're always foraging. The dopamine system evolved for animals making continuous decisions: stay in this patch or move on? Apply for this job or keep looking? Date this person or keep searching? The algorithm doesn't know it's 2026. It thinks you're still hunting berries.
This foraging logic explains why "good enough" feels so hard. Your dopamine system constantly compares the current option to the expected value of alternatives. Even when you're satisfied, the algorithm keeps computing: stay or go?
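A crude stay-or-go rule in the spirit of optimal foraging theory looks like this. The threshold logic and every number here are my own simplification, not a model from the literature.

```python
# Leave the current patch when its reward rate falls below what the
# rest of the environment offers, minus the cost of moving on.
# The travel_cost default is an arbitrary illustrative value.
def should_stay(current_rate, expected_rate_elsewhere, travel_cost=0.2):
    return current_rate > expected_rate_elsewhere - travel_cost

should_stay(1.5, 1.0)  # rich patch: True, keep picking berries
should_stay(0.5, 1.0)  # depleted patch: False, the algorithm says go
```

The unsettling part is the second argument: the comparison never stops running, because the expected value of "elsewhere" is always being estimated too.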
The Takeaway
Your brain runs algorithms that computer scientists independently discovered and built into machines. Those machines now beat humans at games humans invented, predict protein structures no human could model, and keep improving.
This isn't metaphor. It's not "like" an algorithm. It is the algorithm. Temporal difference learning. Successive expectations updating in real time. Rewards chained through predictions of predictions.
Understanding this reframes everything. Motivation isn't mysterious—it's output from a prediction system. Learning isn't vague—it's prediction errors updating weights. Decisions aren't arbitrary—they're foraging computations on a dopamine scale.
The algorithm in your head powers systems like AlphaGo. It's also plastic: it updates based on experience. Every prediction that resolves, every expectation that exceeds or disappoints reality, trains the model.
You're training it right now.
The question is whether you're doing it deliberately.
What You Can Do Right Now
Your dopamine system updates constantly. Here's how to train it deliberately:
This week: Pick one task where you'll track prediction accuracy. Before starting, write down: "I expect this to take X minutes and result in Y outcome." After finishing, compare expectation to reality. You're training your prediction system with accurate data.
Today: Notice when you switch tasks mid-effort. That's your dopamine system pulling you toward something with better predicted value. Don't fight it—investigate it. What expectation failed? What alternative looked better?
The algorithm learns from every resolved prediction. You're already training it. These exercises make the training deliberate.
Next: In Part 2, discover why twenty million people paid money to die hundreds of times in Elden Ring—and how the game accidentally implements your brain's learning algorithm perfectly. You can steal the technique.
Read Part 2: The Elden Ring Effect →
The Dopamine Series
This is Part 1 of a series exploring how your brain's reward system actually works:
- Part 1: The Learning Algorithm (you are here) — What dopamine really does
- Part 2: The Elden Ring Effect — See the algorithm in action through gaming
- Part 3: The Controller in Your Skull — How to hijack your reward circuitry
- Part 4: Motivation Is Not a Resource — Apply it to daily life
This article draws on Richard Sutton and Andrew Barto's work on temporal difference learning, Wolfram Schultz's recordings of dopamine neurons, and Read Montague's computational neuroscience research on valuation and decision-making.