New ways to prepare quantum states advance both quantum simulation and quantum computing. Xiaotian Nie and colleagues from Intelligent Quantum Inception Co, in collaboration with Tsinghua University, have presented an adaptive measurement-feedback protocol, driven by reinforcement learning, that works with limited information. The method does not require knowledge of the complete quantum state; it relies solely on the history of measurement outcomes to guide the process. Using a new stochastic reward scheme, the team prepared the ground state and generated GHZ states in the Bose-Hubbard model, demonstrating a scalable and experimentally viable route to robust quantum state preparation.
Adaptive reinforcement learning accelerates ground state preparation and entanglement generation
The protocol achieved a six-fold improvement in the energy convergence rate for ground state preparation, reducing the required measurement time in a non-interacting system from γT > 3 to γT = 1.2. This surpasses previous methods based on fixed protocols, which have struggled to reach comparably low energies, especially in the strongly interacting and near-critical regimes that are harder to control. Because the protocol is trained from measurement records alone, it prepares quantum states without requiring complete knowledge of the quantum system. The team also prepared a GHZ state, a key resource for quantum information processing, using only single-qubit measurements and feedback rotations, reaching energies close to the ground-state value of -4. However, the current results do not demonstrate performance on large systems or address the challenge of maintaining consistent performance on real hardware.
Implementation of discretized weak measurement and feedback control
Time is discretized into short intervals of duration $\delta t$. During each interval, the system undergoes a weak measurement of the observable $c_t$, after which the feedback unitary generated by $F_t$ is applied. A weak measurement of a Hermitian observable $c_t$ with measurement strength $\gamma$ is described by the Kraus operator

$$M_t(c_{m,t}) = \left(\frac{4\gamma\,\delta t}{\pi}\right)^{1/4} e^{-2\gamma\,\delta t\,(c_t - c_{m,t})^2},$$

where $c_{m,t}$ is the corresponding noisy measurement outcome, which follows a normal distribution:

$$P(c_{m,t}) \sim \mathcal{N}\!\left(\mu = \langle c_t\rangle,\ \sigma^2 = \frac{1}{8\gamma\,\delta t}\right).$$

The parameter $\gamma$ controls the tradeoff between information gain and measurement backaction: increasing it improves measurement accuracy but also disturbs the quantum state more strongly. Based on the measurement outcome $c_{m,t}$, a feedback operator $F_t$ is selected according to the policy and applied to correct the evolution of the system. The time-evolution unitary is $U_t = e^{-i(H + F_t)\,\delta t}$, where $H$ is the original system Hamiltonian.
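The step described above can be sketched numerically. The following is a minimal illustration, not the authors' code: it assumes a small Hilbert space where matrices can be diagonalized directly, samples the outcome from the Gaussian centered at the expectation value as stated in the text, and uses a single qubit with hypothetical choices of observable, Hamiltonian and zero feedback purely as an example. The function names are illustrative.

```python
import numpy as np

def herm_func(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def measurement_feedback_step(psi, c, H, F, gamma, dt, rng):
    """One discretized step: weak measurement of c, then evolution under H + F."""
    # Noisy outcome: normal with mean <c> and variance 1 / (8 * gamma * dt).
    mean_c = np.real(np.vdot(psi, c @ psi))
    sigma = np.sqrt(1.0 / (8.0 * gamma * dt))
    c_m = rng.normal(mean_c, sigma)
    # Gaussian Kraus operator; its scalar prefactor cancels on renormalization.
    M = herm_func(c, lambda w: np.exp(-2.0 * gamma * dt * (w - c_m) ** 2))
    psi = M @ psi
    psi = psi / np.linalg.norm(psi)
    # Feedback evolution U = exp(-i (H + F) dt); H + F is assumed Hermitian.
    U = herm_func(H + F, lambda w: np.exp(-1j * w * dt))
    return U @ psi, c_m

# Hypothetical single-qubit example: measure sigma_z, evolve under sigma_x, no feedback.
rng = np.random.default_rng(0)
sz = np.array([[1.0, 0.0], [0.0, -1.0]])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
psi = np.array([1.0, 0.0], dtype=complex)
record = []
for _ in range(100):
    psi, c_m = measurement_feedback_step(
        psi, c=sz, H=sx, F=np.zeros((2, 2)), gamma=0.3, dt=0.05, rng=rng
    )
    record.append(c_m)
```

The list `record` plays the role of the measurement history that a policy would see; the state `psi` itself would not be available in an experiment.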
As a result, the system state |ψ
This means that the weight vectors $\alpha_t$ and $\beta_t$ completely specify the measurement and feedback actions at step $t$. A GRU recurrent network observes the measurement weights $\alpha_t$ and the measurement outcome $c_{m,t}$, and outputs the feedback weights $\beta_t$ and the next measurement weights $\alpha_{t+1}$. Once $c_{m,t}$ has been recorded, the evolution under feedback is deterministic, so $\alpha_{t+1}$ can be generated in the same forward pass. To remain compatible with experiments, the reward must also be experimentally accessible. Ideally, the reward would be the negative energy expectation value $\langle -H\rangle$ in the final state, but this cannot be obtained within a single trajectory because an expectation value must be averaged over many trajectories.
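A single forward pass of such a recurrent policy might look like the sketch below. This is an assumption-laden illustration rather than the network used in the paper: the class name, hidden size, random initialization, omitted bias terms and linear output heads are all hypothetical, and the outputs are deterministic values, whereas a policy-gradient agent would typically sample actions around them.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUPolicySketch:
    """Hypothetical GRU cell mapping (alpha_t, c_m) to (beta_t, alpha_{t+1})."""

    def __init__(self, n_sites, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        n_in = n_sites + 1  # measurement weights plus the scalar outcome

        def init(rows, cols):
            return 0.1 * rng.standard_normal((rows, cols))

        self.Wz, self.Uz = init(hidden, n_in), init(hidden, hidden)  # update gate
        self.Wr, self.Ur = init(hidden, n_in), init(hidden, hidden)  # reset gate
        self.Wh, self.Uh = init(hidden, n_in), init(hidden, hidden)  # candidate state
        self.W_beta = init(2, hidden)         # head for (beta_1, beta_2)
        self.W_alpha = init(n_sites, hidden)  # head for the next measurement weights
        self.hidden = hidden

    def step(self, h, alpha_t, c_m):
        x = np.concatenate([alpha_t, [c_m]])
        z = sigmoid(self.Wz @ x + self.Uz @ h)
        r = sigmoid(self.Wr @ x + self.Ur @ h)
        h_tilde = np.tanh(self.Wh @ x + self.Uh @ (r * h))
        h_new = (1.0 - z) * h + z * h_tilde
        # Both outputs come from the same forward pass, as described in the text.
        return h_new, self.W_beta @ h_new, self.W_alpha @ h_new

policy = GRUPolicySketch(n_sites=4)
h = np.zeros(policy.hidden)
alpha = np.ones(4) / 4
h, beta, alpha_next = policy.step(h, alpha, c_m=0.37)
```

The hidden state `h` is what carries the measurement history from step to step, which is why no access to the quantum state is needed.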
Additionally, non-commuting Hamiltonian terms require incompatible measurement settings. In the Bose-Hubbard model, the hopping term $H_{\mathrm{kin}}$ is measured by time-of-flight imaging, whereas the interaction term $H_{\mathrm{int}}$ requires in-situ imaging. The final reward is therefore constructed from a single randomly sampled term. Writing $H = \sum_k H_k$, one term $H_k$ is selected with probability $p_k$ and measured, yielding an eigenvalue $E_{k,i}$. The importance-weighted reward $R = -\frac{1}{p_k} E_{k,i}$ is an unbiased estimator of the total negative energy: $\mathbb{E}[R] = -\sum_k \langle H_k\rangle = -\langle H\rangle$. To improve training stability, each term is centered around its expectation value in the target ground state, $\tilde H_k = H_k - \langle H_k\rangle_0$, where $\langle\cdots\rangle_0$ denotes the expectation value in the target ground state.
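The unbiasedness of the single-term reward can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's setup: it uses a single qubit with two non-commuting terms (Pauli Z and Pauli X) in place of the hopping and interaction terms, uniform sampling probabilities, and an arbitrary test state. The function name is hypothetical, and the centering step is not applied here.

```python
import numpy as np

def sample_rewards(psi, terms, p, n, rng):
    """Draw n rewards R = -E_k / p_k, each from one randomly chosen term H_k."""
    p = np.asarray(p, dtype=float)
    ks = rng.choice(len(terms), size=n, p=p)  # which term each shot measures
    rewards = np.empty(n)
    for k, Hk in enumerate(terms):
        w, V = np.linalg.eigh(Hk)
        born = np.abs(V.conj().T @ psi) ** 2  # Born probabilities of the outcomes
        born = born / born.sum()
        idx = np.flatnonzero(ks == k)
        outcomes = rng.choice(w, size=idx.size, p=born)  # projective outcomes
        rewards[idx] = -outcomes / p[k]  # importance weighting by 1 / p_k
    return rewards

rng = np.random.default_rng(1)
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
X = np.array([[0.0, 1.0], [1.0, 0.0]])
theta = 0.7  # arbitrary test state on the Bloch sphere
psi = np.array([np.cos(theta / 2), np.sin(theta / 2)])

rewards = sample_rewards(psi, [Z, X], p=[0.5, 0.5], n=200_000, rng=rng)
exact = np.real(np.vdot(psi, (Z + X) @ psi))  # <H> for H = Z + X
estimate = rewards.mean()                     # should approach -<H>
```

Each individual reward comes from a single measurement setting, yet the average over trajectories converges to the negative energy, which is the property the training relies on.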
The centered reward $\tilde R = -\frac{1}{p_k}\tilde E_{k,i}$, where $\tilde E_{k,i}$ is a measured eigenvalue of the centered term $\tilde H_k = H_k - \langle H_k\rangle_0$, has zero mean in the ground state for every sampled term and changes the objective only by a constant. Centering and optimal term sampling significantly suppress the variance while keeping the reward unbiased and experimentally compatible. With this reward design, the measurement and feedback policy can be trained without being confined to simulation environments that rely on privileged access to the complete quantum state and are ultimately limited by the exponential growth of the Hilbert space.
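The claim that centering shifts the objective only by a constant follows in one line from the definitions in the text. Here $E_0 = \langle H\rangle_0$ denotes the target ground-state energy, a symbol introduced only for this worked step:

```latex
\mathbb{E}[\tilde R]
  = -\sum_k \langle \tilde H_k \rangle
  = -\sum_k \bigl( \langle H_k \rangle - \langle H_k \rangle_0 \bigr)
  = -\langle H \rangle + E_0 .
```

Since $E_0$ does not depend on the policy, maximizing $\mathbb{E}[\tilde R]$ is equivalent to minimizing $\langle H\rangle$, and the expected centered reward reaches its maximum value of zero when the prepared state is the ground state.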
Instead, the same training framework is compatible with experimental trajectories. The policy parameters are optimized with Proximal Policy Optimization (PPO, implemented in the PureJaxRL library), a stable policy gradient method that limits overly large updates between successive iterations. In each training round, the agent interacts with the measurement-feedback loop to collect trajectories, estimates the corresponding returns and advantages from the stochastic final reward, and updates the policy accordingly.
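The mechanism by which PPO limits large updates is its clipped surrogate objective. The sketch below shows the standard form of that objective in plain NumPy; it is not the PureJaxRL implementation, and the helper for terminal-reward advantages assumes an undiscounted episode with a simple value baseline, which is an illustrative simplification rather than the estimator used in the paper.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (to be minimized)."""
    ratio = np.exp(logp_new - logp_old)  # probability ratio new / old policy
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the minimum removes the incentive to move the ratio outside the clip range.
    return -np.mean(np.minimum(unclipped, clipped))

def terminal_reward_advantages(final_reward, values):
    """Assumed simplification: with only a final reward and no discounting,
    the return at every step equals that reward; subtract a value baseline."""
    returns = np.full_like(values, final_reward, dtype=float)
    return returns - values

# Unchanged policy: the loss is minus the mean advantage.
adv = terminal_reward_advantages(final_reward=-0.4, values=np.array([-1.0, -0.8, -0.5]))
loss_same = ppo_clip_loss(np.zeros(3), np.zeros(3), adv)

# Ratio of 2 with a positive advantage is clipped to 1 + clip_eps.
loss_clipped = ppo_clip_loss(np.log([2.0]), np.log([1.0]), np.array([1.0]))
```

In the second call the policy has doubled the probability of an advantageous action, but the objective only credits it up to a ratio of 1.2, which is what keeps successive updates small.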
Repeating this step yields a closed-loop measurement-feedback strategy that gradually drives the system toward the target low-energy state. Numerical demonstrations illustrate the framework on two representative tasks: ground state preparation and GHZ state preparation in the Bose-Hubbard model (BHM). The first task targets the ground state of the one-dimensional four-site Bose-Hubbard model at unit filling, whose Hamiltonian contains only hopping and on-site interaction terms:

$$H_{\mathrm{BHM}} = -J \sum_i \left(a_i^\dagger a_{i+1} + \mathrm{h.c.}\right) + \frac{U}{2} \sum_i n_i (n_i - 1).$$

Following the choice of operators by Wu et al., the measured observable and the feedback operator are

$$c_t = \sum_j \alpha_{t,j}\, n_j, \qquad F_t = (\beta_{t,1} + i\beta_{t,2}) \sum_j a_j^\dagger a_{j+1} + \mathrm{h.c.},$$

so the policy adaptively selects the weights of the density measurement and the complex hopping amplitude of the feedback. Three regimes are studied: non-interacting, strongly interacting, and near-critical. In each, the measurement strength is fixed at $\gamma/J = 0.3$ and the system is initialized in the unit-filling state $|1, 1, 1, 1\rangle$ with a 10% admixture of single particle-hole excitations to model perturbations.
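For a system this small, the target Hamiltonian can be built and diagonalized exactly, which gives a reference energy to compare against. The sketch below constructs the four-site, four-boson Bose-Hubbard matrix in the Fock basis. The article does not state the boundary conditions, so open boundaries are assumed by default (with a `periodic` flag as an alternative), and the function names are illustrative.

```python
import numpy as np
from itertools import product

def fock_basis(n_sites, n_particles):
    """All occupation tuples (n_1, ..., n_L) with fixed total particle number."""
    return [s for s in product(range(n_particles + 1), repeat=n_sites)
            if sum(s) == n_particles]

def bose_hubbard(n_sites=4, n_particles=4, J=1.0, U=0.0, periodic=False):
    """Dense Bose-Hubbard Hamiltonian; open boundaries unless periodic=True."""
    basis = fock_basis(n_sites, n_particles)
    index = {s: i for i, s in enumerate(basis)}
    H = np.zeros((len(basis), len(basis)))
    bonds = [(i, i + 1) for i in range(n_sites - 1)]
    if periodic:
        bonds.append((n_sites - 1, 0))
    for col, s in enumerate(basis):
        # On-site interaction (U / 2) * sum_i n_i (n_i - 1)
        H[col, col] = 0.5 * U * sum(n * (n - 1) for n in s)
        for i, j in bonds:
            # Hopping in both directions: a_dst^dagger a_src and its conjugate
            for src, dst in ((j, i), (i, j)):
                if s[src] > 0:
                    t = list(s)
                    amp = np.sqrt(t[src] * (t[dst] + 1))
                    t[src] -= 1
                    t[dst] += 1
                    H[index[tuple(t)], col] += -J * amp
    return H, basis

# Non-interacting case: all four bosons occupy the lowest single-particle level.
H0, basis = bose_hubbard(J=1.0, U=0.0)
E0 = np.linalg.eigvalsh(H0)[0]

# Strongly interacting case, with the unit-filling state located in the basis.
H_int, _ = bose_hubbard(J=1.0, U=10.0)
unit_filling_index = basis.index((1, 1, 1, 1))
```

With open boundaries and $U = 0$, the lowest single-particle energy of a four-site chain is $-2J\cos(\pi/5)$, so `E0` should equal four times that value; this provides a quick sanity check on the matrix construction.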
Quantum state preparation from routine measurements avoids detailed system characterization
Preparing quantum states is critical both for simulating complex systems and for building future quantum computers, and it is attracting growing attention. The main advantage of this work is practical: the method prepares quantum states using only measurements that are routinely collected during experiments. This avoids the need for detailed knowledge of the system's quantum state, which is often inaccessible and has been a long-standing obstacle in the field. The approach enters a crowded field of competing quantum control techniques, such as adiabatic protocols, shortcuts to adiabaticity, and reinforcement learning.
Scientists have demonstrated a new way to prepare the ground state of a quantum system and to generate GHZ states using only routinely collected measurement data. This matters because existing techniques often require complete knowledge of the system's quantum state, which is impractical for larger, more complex systems. The researchers applied the adaptive measurement-feedback protocol to ground state and GHZ state preparation in the Bose-Hubbard model, establishing a scalable approach to quantum state preparation. The authors point out that the method allows the target energy to be estimated accurately while avoiding reconstruction of the complete quantum state.
👉 More information
🗞 Experiment-ready measurements – Preparing feedback quantum states with reinforcement learning
🧠ArXiv: https://arxiv.org/abs/2606.13005
