On December 8th, 2024, Geoffrey Hinton, winner of the Nobel Prize in Physics, took the stage and gave a lecture entitled “Boltzmann Machines.”
The Aula Magna auditorium at Stockholm University was packed, and the world's attention was focused on it.
He briefly recounted the journey he shared with John Hopfield toward foundational discoveries that enable machine learning with neural networks.
The core content of Hinton's lecture was officially published on August 25th in Reviews of Modern Physics, a journal of the American Physical Society (APS).
Paper link: https://journals.aps.org/rmp/pdf/10.1103/revmodphys.97.030502
In the 1980s, there were two promising techniques for computing gradients.
One is the backpropagation algorithm, which is now the core engine of deep learning and is almost ubiquitous.
The other is the Boltzmann machine learning algorithm, which is no longer in use and has gradually faded from view.
This time, the focus of Hinton's lecture was the “Boltzmann machine.”
At the outset, he said with humor that he was going to do something “silly”: explain complex technical concepts to everyone without using formulas.
Hopfield Network
Finding energy minima
What is a “Hopfield Network”?
Hinton began with a simple binary neural network to introduce the core idea of the “Hopfield network.”
Each neuron has only two states: 1 or 0. Most importantly, neurons are connected by symmetric weights.
The global state of the entire network is called a “configuration,” and each configuration has a “goodness.”
The goodness is the sum of the weights between all pairs of active neurons. For example, in the diagram above, the weights in the red boxes add up to 4.
That is the goodness of this configuration, and the energy is simply the negative of the goodness.
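As a minimal sketch of the goodness and energy just described, the following Python computes both for a small network. The weight values and the function names are illustrative assumptions and are not taken from Hinton's slides; the weights are simply chosen so that the active pairs add up to 4.

```python
from itertools import combinations

def goodness(weights, state):
    """Sum of the weights between every pair of neurons that are both on."""
    total = 0
    for i, j in combinations(range(len(state)), 2):
        if state[i] == 1 and state[j] == 1:
            total += weights[i][j]
    return total

def energy(weights, state):
    """Energy is the negative of goodness."""
    return -goodness(weights, state)

# Hypothetical symmetric weights for a 4-neuron network
# (weights[i][j] == weights[j][i], no self-connections).
weights = [
    [0, 3, 2, -4],
    [3, 0, -1, 1],
    [2, -1, 0, 2],
    [-4, 1, 2, 0],
]
state = [1, 1, 1, 0]  # neurons 0, 1 and 2 are on

print(goodness(weights, state))  # 3 + 2 - 1 = 4
print(energy(weights, state))    # -4
```

Only pairs in which both neurons are on contribute, which is why neuron 3's weights play no role in this configuration.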
The whole point of a Hopfield network is that each neuron can work out, through a local computation, how to reduce the energy.
Here, energy represents “badness.” Whether a neuron should turn on or off therefore depends entirely on the sign of its total weighted input.
Through continual updates of the neuron states, the network ultimately settles at an energy minimum.
However, this is not the only energy minimum, since a Hopfield network can have many of them. Which one the network ends up in depends on the initial state and on the random sequence in which the neurons make their decisions.
Below is a lower energy minimum: when the neurons on the right-hand side of the network are active, the goodness is 3 + 3 - 1 = 5 and the energy is -5.
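The settling process can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the network from the lecture: the weights are hypothetical, the helper names `energy` and `settle` are invented here, and a tie (total input of exactly zero) is resolved by turning the neuron off.

```python
import random

def energy(weights, state):
    """Negative of the goodness: minus the summed weights between active pairs."""
    n = len(state)
    return -sum(weights[i][j] for i in range(n) for j in range(i + 1, n)
                if state[i] and state[j])

def settle(weights, state, rng):
    """Update neurons one at a time, in random order, until none changes."""
    state = list(state)
    n = len(state)
    changed = True
    while changed:
        changed = False
        for i in rng.sample(range(n), n):  # random update order
            total_input = sum(weights[i][j] * state[j] for j in range(n) if j != i)
            new_value = 1 if total_input > 0 else 0  # only the sign matters
            if new_value != state[i]:
                state[i] = new_value
                changed = True
    return state

# Hypothetical symmetric weights: neurons 0-1 support each other,
# neurons 2-3 support each other, and the two pairs inhibit each other.
weights = [
    [0, 2, -1, -1],
    [2, 0, -1, -1],
    [-1, -1, 0, 3],
    [-1, -1, 3, 0],
]
rng = random.Random(0)
for start in ([1, 1, 0, 0], [0, 1, 1, 1]):
    final = settle(weights, start, rng)
    print(start, "->", final, "energy", energy(weights, final))
# [1, 1, 0, 0] -> [1, 1, 0, 0] energy -2
# [0, 1, 1, 1] -> [0, 0, 1, 1] energy -3
```

The two starting states end in different minima, one with energy -2 and a deeper one with energy -3. Other starting states, such as `[1, 1, 1, 0]`, can end in either minimum depending on the update order.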
The appeal of the Hopfield network lies in its ability to associate energy minima with memories.
Hinton explained vividly, “If you enter an incomplete memory fragment and apply the binary decision rule repeatedly, the network can complete the memory.”
So, if energy minima represent memories, the process of the network settling into an energy minimum is what is called “content-addressable memory.”
This means you can access a particular item in memory by activating only part of it and then applying this rule, and the network fills in the rest.
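Completing a memory from a fragment can be illustrated with a short Python sketch. The article does not say how the weights are set, so the storage rule in `store` is an assumption: a Hebbian-style rule often used with 0/1 Hopfield networks. The function names and the six-neuron pattern are hypothetical, and neurons are updated in a fixed order so that the result is reproducible.

```python
def store(patterns):
    """Build symmetric weights from 0/1 patterns (assumed Hebbian-style rule)."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    # +1 if neurons i and j agree in the pattern, -1 otherwise
                    weights[i][j] += (2 * p[i] - 1) * (2 * p[j] - 1)
    return weights

def recall(weights, cue, max_sweeps=20):
    """Apply the binary decision rule repeatedly until the state stops changing."""
    state = list(cue)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            total_input = sum(weights[i][j] * state[j] for j in range(n) if j != i)
            new_value = 1 if total_input > 0 else 0
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state

memory = [1, 1, 1, 0, 0, 0]
weights = store([memory])
fragment = [1, 1, 0, 0, 0, 0]  # only part of the stored item is active

print(recall(weights, fragment))  # [1, 1, 1, 0, 0, 0]
```

The fragment activates two of the three neurons in the stored item, and the decision rule switches on the missing one while keeping the unrelated neurons off.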
Not just for storing memories
But also for interpreting “sensory input”
Next, Hinton described an innovative application of the Hopfield network that he developed with Terrence Sejnowski (a student of Hopfield).
It is used to construct interpretations of sensory input as well as to store memories.
They divided the network into “visible neurons” and “hidden neurons.”
The former receive sensory input, such as a binary image. The latter are used to construct an interpretation of that input. The energy of a particular configuration of the network represents how bad the interpretation is, so the goal is to find a low-energy interpretation.
Hinton used a classic ambiguous line drawing, the Necker cube, as an example of how the network handles the complexity of visual information.
Some people see the drawing below as a “convex polyhedron” and others as a “concave polyhedron.”
So how can a neural network arrive at two different interpretations of this line drawing? Before answering, we must ask: what can a line in the image tell us about edges in three dimensions?
Visual Interpretation: From 2D to 3D
Imagine looking at the outside world through a window and tracing the outline of the scene onto the glass.
The black line on the window is then an edge you have drawn.
The two red lines are the lines of sight that start from your eye and pass through the two ends of this black line.
The question is which edge in the real world produced this black line.
In fact, there are many possibilities. Many different three-dimensional edges all produce the same line in the image.
So the hardest problem for a visual system is to work back from these two-dimensional lines and determine which edges actually exist.
For this reason, Hinton and Sejnowski designed a network that converts the lines in an image into the activation of “line neurons.”
These are connected through excitatory connections to “three-dimensional edge neurons” (green), which inhibit one another so that only one interpretation is active at a time.
In this way, many principles of the optics of perception are captured.
The same approach is then applied to all of the line neurons. The question is which edge neurons should be activated.
You need more information to answer this question.
When humans interpret images, they follow certain principles. For example, if two lines meet in the image, we assume that the corresponding edges also meet at the same point in three-dimensional space, at the same depth.
Furthermore, the brain tends to see edges as meeting at right angles.
By setting the connection strengths appropriately, the network can form two stable states. These correspond to the two three-dimensional interpretations of the Necker cube: a concave and a convex polyhedron.
This approach to visual interpretation raises two core problems.
Search Problem: The network can get stuck in a local optimum, settling on a poor interpretation and unable to jump to a better one.
Learning Problem: How can the network learn its connections automatically rather than having them set by hand?
Search Problem: Noisy Neurons
For the search problem, the most basic solution is to introduce neurons with noise, or “stochastic binary neurons.”
The states of these neurons are still binary (either 1 or 0), but their decisions are probabilistic.
A strong positive input turns them on. Strong negative inputs turn them off. Inputs close to zero introduce randomness.
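A stochastic binary neuron can be sketched as follows. The article only says that decisions are probabilistic, so the logistic function used here is an assumption (it is the usual choice for Boltzmann machines), and the `temperature` parameter and function names are likewise illustrative.

```python
import math
import random

def prob_on(total_input, temperature=1.0):
    """Probability of turning on: an assumed logistic function of the input."""
    x = total_input / temperature
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # written this way to avoid overflow for very negative inputs
    return z / (1.0 + z)

def stochastic_update(total_input, rng, temperature=1.0):
    """Return 1 with probability prob_on(total_input), otherwise 0."""
    return 1 if rng.random() < prob_on(total_input, temperature) else 0

rng = random.Random(0)
for total_input in (10.0, 0.0, -10.0):
    samples = [stochastic_update(total_input, rng) for _ in range(1000)]
    fraction_on = sum(samples) / len(samples)
    print(total_input, round(prob_on(total_input), 3), fraction_on)
```

With a strongly positive input the neuron is on almost every time, with a strongly negative input it is almost always off, and with an input of zero it is on about half the time.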
Noise allows the network to “climb uphill” in energy and jump from a poor interpretation to a better one, much like escaping a shallow dip while searching for the lowest point in a valley.
Boltzmann distribution + machine learning
By randomly updating the hidden neurons, the network ultimately approaches what is called “thermal equilibrium.”
Once thermal equilibrium is reached, the states of the hidden neurons form an interpretation of the input.
In thermal equilibrium, low-energy states (corresponding to better interpretations) are more likely to occur.
Taking the Necker cube as an example, the network tends to settle on a more plausible three-dimensional interpretation.
Of course, thermal equilibrium does not mean that the system stays in a single state. Instead, the probability distribution over all possible configurations is stable and follows the Boltzmann distribution.
Under the Boltzmann distribution, once a system reaches thermal equilibrium, the probability of finding it in a particular configuration is determined entirely by the energy of that configuration.
Furthermore, the system is more likely to be found in a low-energy configuration.
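For a network small enough to enumerate, the Boltzmann distribution can be computed exactly. The sketch below uses the standard form, in which the probability of a configuration is proportional to exp(-energy / temperature). The three-neuron weights, the `temperature` parameter and the function names are assumptions made for illustration.

```python
import math
from itertools import product

def energy(weights, state):
    """Minus the summed weights between all pairs of active neurons."""
    n = len(state)
    return -sum(weights[i][j] for i in range(n) for j in range(i + 1, n)
                if state[i] and state[j])

def boltzmann_distribution(weights, temperature=1.0):
    """Exact probability of every configuration at thermal equilibrium."""
    n = len(weights)
    configs = list(product((0, 1), repeat=n))
    unnormalized = [math.exp(-energy(weights, c) / temperature) for c in configs]
    partition = sum(unnormalized)  # normalizing constant
    return {c: u / partition for c, u in zip(configs, unnormalized)}

# Hypothetical symmetric weights for a 3-neuron network.
weights = [
    [0, 2, -1],
    [2, 0, -1],
    [-1, -1, 0],
]
distribution = boltzmann_distribution(weights)
for config, p in sorted(distribution.items(), key=lambda item: -item[1]):
    print(config, energy(weights, config), round(p, 3))
```

The configuration `(1, 1, 0)` has the lowest energy (-2) and therefore the highest probability, while the two configurations with energy +1 are the least likely.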
Physicists have a trick for understanding thermal equilibrium: imagine a huge “ensemble” consisting of many identical networks.
Hinton asked the audience to imagine a great many identical Hopfield networks, each starting from a random state; through random updates, the proportion of networks in each configuration gradually stabilizes.
As before, low-energy configurations make up a larger proportion of the ensemble.
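The ensemble picture can be simulated directly. The sketch below runs many copies of one small network, each from a random starting state, and reports the share of copies that end in each configuration. The weights, the logistic update rule, the number of copies and the number of sweeps are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter

# Hypothetical symmetric weights for a 3-neuron network.
WEIGHTS = [
    [0, 2, -1],
    [2, 0, -1],
    [-1, -1, 0],
]

def run_one_network(weights, rng, sweeps=20):
    """Start from a random state and apply noisy updates for a number of sweeps."""
    n = len(weights)
    state = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(sweeps):
        for i in rng.sample(range(n), n):  # random update order
            total_input = sum(weights[i][j] * state[j] for j in range(n) if j != i)
            p_on = 1.0 / (1.0 + math.exp(-total_input))  # assumed logistic rule
            state[i] = 1 if rng.random() < p_on else 0
    return tuple(state)

def ensemble_proportions(weights, copies=2000, seed=0):
    """Share of the ensemble found in each configuration after the updates."""
    rng = random.Random(seed)
    counts = Counter(run_one_network(weights, rng) for _ in range(copies))
    return {config: count / copies for config, count in counts.items()}

proportions = ensemble_proportions(WEIGHTS)
for config, share in sorted(proportions.items(), key=lambda item: -item[1]):
    print(config, round(share, 3))
```

For these weights the lowest-energy configuration is `(1, 1, 0)`, and it should account for the largest share of the ensemble, a little over half of the copies.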
In summary, the principle of the Boltzmann distribution is that low energy configurations are much more likely to occur than high energy configurations.
In a Boltzmann machine, the goal of learning is to make the images the network generates when it is “dreaming” (imagining things at random) resemble the impressions it forms when it perceives real images while “awake.”
If this match can be achieved, the states of the hidden neurons effectively capture the underlying causes of the image.
