Written by futurist Thomas Frey
From the shirtless philosopher of 1943 to ChatGPT: the people, the breakthroughs, the winters, and a single idea that refused to die
In January 2026, Marc Andreessen sat down for an 81-minute conversation on his a16z podcast and did something most technology commentators wouldn't bother to do. He started at the beginning. Not the beginning of the current AI hype cycle, or of large language models, or even of deep learning. He started in 1943: papers, a seaside villa, and a neurophysiologist who appears shirtless in archival footage from 1946, discussing the future of computing with none of the formality that usually comes with the subject.
That man was Warren McCulloch. His insight that computers could one day be built on models of the human brain, using neural networks rather than pure mathematical logic, was the path not taken for most of the next 80 years. Andreessen's point was simple and important: what feels like an overnight revolution is actually the payoff of 80 years of bets placed by a small group of people who spent most of that time being ignored, defunded, and told they were wrong. Understanding that history explains why what's happening now is different from everything that came before, and why it probably won't stop.

Shirtless philosopher Warren McCulloch, conceiving the first artificial neuron, advanced a revolutionary idea: that intelligence can be built rather than programmed, setting a course that would take decades to fully unfold.
1943: The paper that started it all
Neurophysiologist Warren McCulloch and Walter Pitts, a self-taught mathematical prodigy who had run away from home as a teenager to attend university lectures and was technically homeless when the two met, published "A Logical Calculus of the Ideas Immanent in Nervous Activity" at the University of Chicago. The paper proposed the first mathematical model of a neural network: an artificial neuron that takes inputs, applies a weighted threshold, and produces an output according to logical rules.
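To make the abstraction concrete, here is a minimal sketch of a McCulloch-Pitts-style threshold neuron in Python. The weights and thresholds are illustrative choices of mine, not values from the 1943 paper.

```python
# A minimal sketch of a McCulloch-Pitts-style threshold neuron.
# Weights and thresholds are illustrative, not from the 1943 paper.

def neuron(inputs, weights, threshold):
    """Fire (output 1) if the weighted sum of inputs meets the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# With the right weights and threshold, a single unit computes logic gates.
AND = lambda a, b: neuron([a, b], [1, 1], threshold=2)
OR  = lambda a, b: neuron([a, b], [1, 1], threshold=1)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
```

A single unit like this can compute AND or OR. The limits of what one unit cannot compute would come back to haunt the field in 1969.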
The idea embedded in it was radical: the logic of the human brain could be formally described and replicated computationally, not imitated by clever programming but actually reproduced by interconnected units that learn by adjusting their own weights. The computer industry chose a different path, building literal mathematical machines that execute explicit instructions at high speed. The neural path would take another 80 years to fully develop, but it was always there, tended by a minority who believed it was the more important direction. John von Neumann cited the paper. Norbert Wiener found it fundamental. Inspired by McCulloch, Marvin Minsky, who would later become one of the central figures in AI, built an early neural network machine in 1951 that used 3,000 vacuum tubes to simulate 40 neurons. The seed had been planted.

“Can machines think?” – Alan Turing’s single question in 1950 ignited a field that is currently reshaping what it means to be human.
1950–1956: Turing’s question and the birth of the field
In 1950, Alan Turing published "Computing Machinery and Intelligence," which opened with the question that became the field's defining provocation: "Can machines think?" Rather than wade into philosophy, he proposed a practical test: if a machine can convince a human judge that it is human through a text-only conversation, that is evidence of intelligence worth taking seriously. The Turing Test was born.
Six years later, John McCarthy organized a two-month workshop at Dartmouth with a bold premise. McCarthy, Marvin Minsky, IBM's Nathaniel Rochester, and Claude Shannon, the father of information theory, proposed that "every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it." It was at Dartmouth in 1956 that McCarthy coined the term "artificial intelligence." The field had a name, researchers, and ambitions. What it lacked, for decades, was the ability to deliver.
1957–1969: Perceptrons, promises, and the first winter
In 1957, psychologist Frank Rosenblatt built the Perceptron at Cornell, the first artificial neural network that could learn from data, updating its internal connections based on its errors. The Navy provided funding. The New York Times declared that it would one day "walk, talk, see, write, reproduce itself and be conscious of its existence." By the mid-1960s, perceptron research was everywhere.
Minsky, who had been a schoolmate of Rosenblatt's at the Bronx High School of Science, published Perceptrons with Seymour Papert in 1969. The book mathematically proved that a single-layer network cannot compute basic logical functions such as XOR. Funding collapsed. Researchers fled to symbolic AI. The first AI winter had arrived. The tragedy, as Minsky later acknowledged, was that the book conceded a multilayer network could solve XOR; nobody yet knew how to train one. Solving that problem would take another 17 years.
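The limitation is easy to see in code. In the sketch below, with an illustrative learning rate and epoch count, a single-layer perceptron trained with the classic perceptron rule never reproduces XOR, while a hand-wired two-layer network of the kind the book conceded was possible solves it immediately.

```python
# XOR and the single-layer perceptron: a hedged demonstration.
# Learning rate and epoch count are illustrative choices.

DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def step(z):
    return 1 if z >= 0 else 0

# Single-layer perceptron, trained with the classic perceptron rule.
w, b = [0.0, 0.0], 0.0
for _ in range(100):                        # plenty of epochs; it still fails
    for (x1, x2), target in DATA:
        err = target - step(w[0] * x1 + w[1] * x2 + b)
        w[0] += 0.1 * err * x1
        w[1] += 0.1 * err * x2
        b    += 0.1 * err

print("single layer:", [step(w[0]*x1 + w[1]*x2 + b) for (x1, x2), _ in DATA])
# Never matches [0, 1, 1, 0]: no single line separates XOR's classes.

# Two layers solve it: XOR(a, b) = OR(a, b) AND NOT AND(a, b).
def xor_two_layer(x1, x2):
    h1 = step(x1 + x2 - 0.5)                # hidden unit: OR
    h2 = step(x1 + x2 - 1.5)                # hidden unit: AND
    return step(h1 - h2 - 0.5)              # output: OR and not AND

print("two layers:  ", [xor_two_layer(x1, x2) for (x1, x2), _ in DATA])
```

The catch, and the reason for the 17-year wait, is that the two-layer weights above are set by hand. Nobody in 1969 knew how to learn them.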

Geoffrey Hinton's backpropagation was not an immediate success, but it made deep learning possible, then waited decades for computing to catch up and unleash modern AI.
1986: The algorithm that changed everything
Geoffrey Hinton spent the AI winters convinced that the brain's massive parallelism held the key to machine intelligence, at a time when continuing to work on neural networks was close to professional suicide. In 1986, Hinton, David Rumelhart, and Ronald Williams published "Learning Representations by Back-propagating Errors," which solved the credit assignment problem and made training multilayer networks mathematically tractable.
Backpropagation works by running the network forward, measuring the output error, and propagating that error signal backward through all the layers, adjusting each connection in proportion to its contribution to the error. Multilayer networks could now learn, all the way down. A brief second spring followed, and then a second winter. Expert systems, sophisticated rule-based programs, briefly went mainstream in the 1980s but proved too fragile and expensive to maintain. Funding dried up again. But backpropagation was real. The tools existed. They were waiting for computers fast enough to run them at scale.
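For readers who want the mechanics, here is a minimal pure-Python sketch of backpropagation on a tiny 2-3-1 sigmoid network learning XOR, the very problem from 1969. The layer sizes, learning rate, and epoch count are illustrative assumptions, not values from the 1986 paper.

```python
# A minimal sketch of backpropagation: forward pass, measure the error,
# push it back through the layers, adjust each weight by its contribution.
# Network size, learning rate, and epochs are illustrative choices.
import math, random

random.seed(0)
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
H, lr = 3, 0.5                              # hidden units, learning rate

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
W2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0

for _ in range(20000):
    for (x1, x2), target in DATA:
        # Forward pass: run the network and get an output.
        h = [sigmoid(W1[j][0]*x1 + W1[j][1]*x2 + b1[j]) for j in range(H)]
        y = sigmoid(sum(W2[j] * h[j] for j in range(H)) + b2)
        # Backward pass: output error, then each hidden unit's share of it.
        dy = (y - target) * y * (1 - y)
        dh = [dy * W2[j] * h[j] * (1 - h[j]) for j in range(H)]
        # Adjust every connection in proportion to its contribution.
        for j in range(H):
            W2[j]    -= lr * dy * h[j]
            W1[j][0] -= lr * dh[j] * x1
            W1[j][1] -= lr * dh[j] * x2
            b1[j]    -= lr * dh[j]
        b2 -= lr * dy

for (x1, x2), _ in DATA:
    h = [sigmoid(W1[j][0]*x1 + W1[j][1]*x2 + b1[j]) for j in range(H)]
    y = sigmoid(sum(W2[j] * h[j] for j in range(H)) + b2)
    print((x1, x2), round(y, 2))  # should approach 0, 1, 1, 0 (seed-dependent)
```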
Quiet days: LeCun, Bengio, and the believers
In the 1990s and 2000s, a small community kept the neural network program alive on the fringes. Yann LeCun at Bell Labs demonstrated that convolutional neural networks could read handwritten digits reliably enough for a real bank check-processing system, a quiet achievement in actual commercial deployment that received little attention. Yoshua Bengio at the University of Montreal published foundational research on neural language models and distributed word representations, an intellectual forerunner of the large language models that would emerge 20 years later. Hinton, LeCun, and Bengio, who would jointly receive the 2018 Turing Award for their contributions to deep learning, kept building the theoretical and empirical foundations through years when the prevailing sentiment was that neural networks were a dead end. They weren't. The rest of the field was wrong about them.
2012: The moment when the current era began
In October 2012, Geoffrey Hinton and his students Alex Krizhevsky and Ilya Sutskever entered the ImageNet visual recognition competition with a deep convolutional neural network called AlexNet. They won, and the margin shocked everyone. AlexNet achieved a top-5 error rate of 15.3%; the next best entry came in at 26.2%. That is not an incremental improvement. It is a discontinuity, the kind of gap that signals one team is playing a fundamentally different game. The key was the combination of a deep neural network architecture, a large training dataset, and gaming GPUs repurposed for the parallel matrix math the training required. Hinton later summed it up with characteristic dryness: "Ilya thought we should do it, Alex made it work, and I got the Nobel Prize."
Within months, every major technology company was hiring neural network researchers. Google acquired the startup Hinton founded. Facebook opened an AI lab. Funding that had twice abandoned the field came flooding back, and this time the technology actually worked at scale, on real problems with real economic value. The third spring had arrived. Unlike the first two, it didn't end.

When Google's DeepMind mastered Go and defeated South Korea's Lee Sedol, it was more than a game. It was the moment human intuition met a machine that had surpassed it.
2017 and beyond: Transformers, scale, and arrival
In June 2017, eight researchers at Google published "Attention Is All You Need," perhaps the most consequential research paper in the history of AI. The transformer architecture replaced sequential recurrent networks with self-attention, a mechanism in which every part of a sequence can attend to every other part simultaneously, weighted by relevance. The transformer could be trained in parallel, scaled to far larger datasets, and, crucially, improved in ways not entirely predictable from smaller versions. Scale the model, scale the data, scale the compute, and you get a qualitatively better system. That scaling behavior drove everything that followed.
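The core mechanism needs surprisingly little machinery. Here is a minimal sketch of scaled dot-product self-attention in NumPy; the sequence length, dimensions, and random projection matrices are illustrative stand-ins for weights a real transformer would learn.

```python
# A minimal sketch of scaled dot-product self-attention, the mechanism
# at the heart of the transformer. Dimensions and random projections
# are illustrative; real models learn these weights.
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Every position attends to every position, weighted by relevance."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # all pairwise relevance, at once
    weights = softmax(scores, axis=-1)    # each row sums to 1
    return weights @ V                    # relevance-weighted mix of values

seq_len, d_model = 4, 8                   # a 4-token sequence, illustrative
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one updated representation per token
```

Because every position is processed at once rather than one step at a time, the whole computation parallelizes across GPUs, which is what made the scaling race possible.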
OpenAI's GPT series demonstrated the trajectory: GPT-1 in 2018, GPT-2 in 2019, and GPT-3 in 2020 with 175 billion parameters, each generation doing things the previous one simply couldn't. In 2016, DeepMind's AlphaGo mastered Go well enough to defeat the world's best human players. In 2020, AlphaFold cracked a protein-folding problem that had stumped structural biologists for 50 years. And on November 30, 2022, OpenAI released ChatGPT. It reached 100 million users in two months, the fastest adoption of any consumer technology in history. Not because it introduced new capabilities, but because a conversational interface made the full power of large language models accessible to anyone with a browser. Millions of people sat down, typed, and watched something like thought happen in response.
The people who made it happen
McCulloch and Pitts provided the founding concept. Turing provided the philosophical framework and the field's organizing question. McCarthy gave the field its name. Minsky built its institutional architecture even as his 1969 book nearly destroyed it. Rosenblatt gave the field its first learning machine. Hinton carried neural networks through two winters, solved the training problem, and watched with a mixture of pride and trepidation as the systems he helped build outran his expectations. LeCun was the first to prove that convolutional networks and their learned representations could beat hand-engineered features at real-world scale. Bengio supplied much of the underlying theory and became the field's most prominent voice on safety. Sutskever co-authored AlexNet, co-founded OpenAI, and drove the GPT series until he left to found Safe Superintelligence. Sam Altman, who co-founded OpenAI alongside Sutskever, made the decision to open ChatGPT to the public, turning an abstract technical discussion into a pop-culture phenomenon.
Andreessen's 80-year framing is not of merely historical interest. It is a structural argument about where we are. Technologies that reshape civilizations rarely materialize on the schedules their inventors hope for. The right ideas, the right hardware, and the right data have to converge and reach the public at the right moment; the ideas usually come first, and the rest can take decades. What began as a shirtless philosopher's conversation about building machines modeled on the brain has become the most important technological transition of our lifetimes. It took 80 years. It was worth the wait.
