The “inner loop” of the HRM architecture consists of two iterative modules. Both modules use the attention mechanism in a standard transformer block configuration. The “L module” is designed to handle fast, low-level computation. The “H module” is designed to handle long-term planning and more abstract reasoning.
The L module works much like a standard RNN: it tends to converge quickly on short-term patterns and then stop updating its hidden state. The difference lies in what each update is conditioned on. In a standard RNN, the state update at timestep t is conditioned only on the hidden state at the previous timestep, t-1. Updates to the L module's hidden state zL, and therefore what it converges toward, are also conditioned on the current hidden state of the H module, zH.
The hidden state of the H module changes much more slowly than that of the L module. The inner loop operates in cycles of T timesteps: after the L module has updated its hidden state zL T times, the H module uses the resulting final value of zL to update zH. By the end of each cycle of T timesteps, the L module has often already converged to a local equilibrium and stopped updating. However, because updates to zL are conditioned on the current value of zH, each update to zH establishes a new context for the L module. This starts a new “convergence phase” that allows the lower-level module to continue learning.
This means that every time the L module “solves” a short-term task, the H module is updated. That update in turn directs the L module toward a new short-term task. The H module essentially performs long-term planning, and the L module carries out the smaller subtasks associated with that plan. This cycle of T L module updates followed by one H module update is executed N times. Both T and N are tunable hyperparameters.
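The nested schedule described above can be sketched in a few lines of Python. This is only an illustration of the update timing, not the actual HRM implementation: the real modules are transformer blocks, whereas the `l_update` and `h_update` functions here are hypothetical stand-ins built from small random matrices and `tanh`. The input vector `x` passed to the L module is also an assumption made so that the toy example has something to process.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.5):
    """Random weight matrix; a placeholder for learned parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def l_update(z_l, z_h, x, params):
    """Hypothetical L module: new zL depends on previous zL, current zH, and x."""
    w_l, w_h, w_x = params
    pre = [a + b + c for a, b, c in
           zip(matvec(w_l, z_l), matvec(w_h, z_h), matvec(w_x, x))]
    return [math.tanh(p) for p in pre]


def h_update(z_h, z_l, params):
    """Hypothetical H module: new zH depends on previous zH and the final zL."""
    w_h, w_l = params
    pre = [a + b for a, b in zip(matvec(w_h, z_h), matvec(w_l, z_l))]
    return [math.tanh(p) for p in pre]


def inner_loop(x, z_l, z_h, l_params, h_params, n_cycles, t_steps):
    """Run N cycles; each cycle is T L-module updates, then one H-module update."""
    l_updates = 0
    h_updates = 0
    for _ in range(n_cycles):
        for _ in range(t_steps):
            # zH is held fixed here, so it acts as the context for this cycle.
            z_l = l_update(z_l, z_h, x, l_params)
            l_updates += 1
        # The H module reads the final zL of the cycle and sets a new context.
        z_h = h_update(z_h, z_l, h_params)
        h_updates += 1
    return z_l, z_h, l_updates, h_updates


rng = random.Random(0)
dim = 4
l_params = tuple(make_matrix(dim, dim, rng) for _ in range(3))
h_params = tuple(make_matrix(dim, dim, rng) for _ in range(2))
x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]

z_l, z_h, l_updates, h_updates = inner_loop(
    x, [0.0] * dim, [0.0] * dim, l_params, h_params, n_cycles=3, t_steps=5
)
print(l_updates, h_updates)
```

With N = 3 and T = 5, the L module is updated 15 times and the H module 3 times, which makes the difference in timescales between the two hidden states concrete.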
Overall, the core HRM architecture that powers the inner loop includes four learnable components.
- An output network that receives the final value of zH and uses a softmax function to convert that hidden state into probabilities, which are used to predict the values of the output tokens (which collectively represent the solution to the puzzle).
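As a rough illustration of the output step, the sketch below applies a linear projection followed by softmax to a final hidden state and picks the most probable token at each position. The names `output_head` and `w_out` are hypothetical, and treating zH as one vector per output position is an assumption made for the example rather than a detail taken from the text above.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def output_head(z_h_positions, w_out):
    """Predict one token per position from the final zH.

    z_h_positions: list of hidden vectors, one per output position (assumed).
    w_out: hypothetical projection matrix with one row per vocabulary token.
    """
    predictions = []
    for hidden in z_h_positions:
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
        probs = softmax(logits)
        # Choose the token with the highest probability.
        predictions.append(max(range(len(probs)), key=probs.__getitem__))
    return predictions


# Toy example: 2 positions, hidden size 2, vocabulary of 3 tokens.
w_out = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
final_z_h = [[2.0, 0.1], [0.0, 3.0]]
predicted_tokens = output_head(final_z_h, w_out)
print(predicted_tokens)
```

In a puzzle setting, each predicted token would correspond to one cell or symbol of the proposed solution.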
