Grover’s Search powers massive MIMO user scheduling for 5G and B5G

Researchers are tackling the considerable computational challenges of 5G and Beyond 5G (B5G) massive MIMO user scheduling. This problem is complicated by its scale and the need for efficient channel state information (CSI) management. Ruining Fan from University College Dublin, along with Xingyu Huang, Mouli Chakraborty and colleagues, introduces a new framework that integrates Grover’s search algorithm with quantum reinforcement learning (QRL) to more effectively navigate exponentially large scheduling spaces. Their work is important because it demonstrates a practical implementation using quantum gate-based circuits that mirror reinforcement learning layers. And importantly, the simulation results show significant performance improvements over both classical convolutional networks and existing quantum deep learning benchmarks.

Quantum reinforcement learning for massive MIMO scheduling

The model is implemented using quantum gate-based circuits, mimicking the layered architecture of reinforcement learning, where quantum operations serve as policy updates and decision-making units. Efficient user scheduling remains a key algorithmic challenge in massively MIMO systems. Existing research uses a variety of methods to address this problem, including classical machine learning (ML) and quantum computing-based proposals. Due to the limited computing power of base stations (BSs), traditional approaches are limited to predicting the CSI for future time intervals based on existing datasets.

Previous research has often required supplementary methods such as long-term short-term memory networks for support. By leveraging quantum-based algorithms, CSI database traversal becomes more feasible, resulting in higher average sum rates for hybrid quantum communication systems compared to those using traditional ML techniques. The scheduling task can be viewed as a combinatorial search for the optimal user, antenna assignment, and Grover’s amplitude amplification can effectively emphasize high-reward policies. Moreover, oracles can guarantee the trade-off between exploration and exploitation in reinforcement learning, making them a natural basis for the proposed QRL scheme.

Grover’s search algorithm is a QRL-based search algorithm that establishes a quantum search technique that can amplify the probability of finding an optimal solution in an unsorted database quadratically faster than classical algorithms. The oracle identifies feasible solutions and guides the learning path toward near-optimal policies through amplitude amplification. This implementation includes a quantum gate-based circuit that mimics the hierarchical functionality of reinforcement learning architectures, with one-to-one quantum operations acting as decision makers and policy updates. This integration reduces the computational burden of CSI traversal, enables faster convergence to high-throughput scheduling strategies, and overcomes the scalability barriers faced by traditional methods.

It also lays the groundwork for quantum communication algorithms inspired by Grover’s search, which could play a key role in 5G and B5G. At the same time, the system achieves near-optimal scheduling with higher throughput than traditional methods. Additionally, the evaluation and scalability of the model is demonstrated by comparing the proposed QRL model with traditional CNN and quantum neural network (QNN) benchmarks under different scenarios. The scalability of this method is demonstrated by consistently achieving higher average sum rates as user antenna configurations are scaled up, which may be essential for B5G networks.

The remaining sections present the detailed design of the proposed algorithm, followed by numerical analysis and comparative evaluation with existing methods. A single cell massive MIMO downlink system is considered to include A antennas at the BS, with a single antenna serving T terminal users. The downlink channel of each user is denoted as nt ∈CA. The antenna configuration of the BS is flexible but uses a rectangular configuration. That is, the BS contains X rows of antenna strings and Y antennas in each row. The number of users served by the BS simultaneously is T ≤ A = X · Y.

The received signal at each terminal user t can be expressed as yt = nt · x + gt. where x = PT t=1 √pt · st is the signal transmitted through all antennas in the BS and E[|st|2] = 1 is the data symbol for user t. The noise component gt follows the pattern of additive white Gaussian noise (AWGN) gt ~CN(0, σ2 t). To maintain fairness, each user receives the same amount of energy P = √pt = p Ptotal/T, ∀t. The ergodic sum rate can be expressed as Ssum = TX t=St. St is the ergodic rate estimated by the Shannon capacity and the corresponding signal-to-interference-plus-noise ratio (SINR) concept: St ≈log2[1 + Et(SINRt)] .

Due to the large-scale Grover search problem, an approximate form of the ergodic rate for each user t can be used without affecting the final training result of the system. Compared to the Rayleigh fading channel condition widely used in massive MIMO systems, the Rician fading channel model has a Rician K factor that balances the channel condition between pure line-of-sight (LoS) and full scattering, which provides additional robustness to the proposed algorithm by generating flexible feedback under structured channels. Stacked channels in the system are written as N =. [n1, ., nu] ∈CA×T, known as traditional CSI. For correlated Rician fading channels, the channel vector is defined as Nr t = K · nLoS t + K · nNLoS tp Mt. where K and K are coefficients subtracted from the Rice K-factor, nLoS t ∈CT ×1 represents the LoS channel vector component, and nNLoS t represents the non-line-of-sight (NLoS) mean centered component with normalized variance.

Also, Mt ∈CT ×T is the channel correlation matrix of the NLoS component. With the help of Rician fading channels, equation (1) can be expressed in a detailed beamforming method. yt = TX t=1 PNr tftSt + gt = PNr tftSt + TX x=1,x=t PNr xfxSx. Note that time (t + 1) represents the ergodic sum rate achievable in the long run, ω ∈(0, 1) is the forgetting factor of past rates, and a is a small number that prevents division by zero. Grover Search is an algorithm that uses quantum superposition and parallelism to traverse databases quadratically faster than classical computers with the help of quantum circuits.

The abstract architecture is shown in Figure 0.1, with solid lines representing its three main layers. The initial superposition layer, also known as the Hadamard layer, is responsible for initializing the state of each input qubit. The Hadamard gate creates a uniform superposition across all possible states, allowing Grover’s agents to search the entire solution space in parallel. For T users, the same number of Hadamard gates and wires are required. First, N = T qubits are used to encode all users |ψ0⟩= |0⟩ NN, which are then fed into the Hadamard gate in the corresponding dimension, resulting in |ψ1⟩= HNN |ψ0⟩= A 2N−1X i=0 · |θ⟩ (where A = 1 √ 2N is the uniform probability amplitude for each superposed state) is obtained. |θ⟩= |θ1θ2 . θN⟩ is the computational basis state transformed from the scheduling vector in Equation (8).

Additionally, in the second oracle layer, any positive action in the iterations is marked, using Pauli-X preconditioning followed by multi-controlled Z gates to flip the phase of the state. The marked states are stored in the space M ⊆{0, 1}N, and the marking action of the oracle follows the rule OM |θ⟩= ( −|θ⟩, θ ∈M, |θ⟩, θ /∈M. Moreover, the diffusion operator layer mirrors and reflects the oracle from the last layer, but centers on the uniform state. This layer depends on the diffusion operator defined as Diff = . 2 · |U⟩⟨U| . |U⟩ is the homogeneous state in Equation (12) and I denotes the identity matrix that guarantees the core amplification step of the Grover search that performs an inversion about the mean value.The detailed design of the proposed Grover search quantum circuit is shown in Figure 0.2 using only 5 qubits for simplicity.

Grover search improves scheduling efficiency in quantum reinforcement learning

The results showed a 51% performance improvement compared to CNN and 43% performance improvement compared to QDL, establishing a significant advance in mMIMO user scheduling capabilities. The study focused on a single-cell massive MIMO downlink system with A antennas at the base station, each serving T terminal users with one antenna. The system employed a rectangular antenna configuration with X rows of antenna strings and Y antennas in each row. Here, the number of served users, T, is less than or equal to the total number of antennas, A, defined as the product of X and Y. The study measured the received signal at each terminal user, where yt = nt · x + gt. where x represents the signal transmitted through all antennas and gt represents additive white Gaussian noise with a complex normal distribution with zero mean and variance. σ2t.

Scientists calculated the ergodic sum rate to evaluate the system’s performance, taking into account the effects of channel conditions and transmit power. Tests have demonstrated the scalability of this method, consistently achieving higher average sum rates as user antenna configurations are scaled up. This is potentially essential for future B5G networks. This breakthrough enables quantum communication algorithms inspired by Grover’s search, which could play a key role in improving the efficiency and capacity of wireless communication systems. This effort establishes the foundation for more efficient and scalable mMIMO systems, paving the way for improved performance in next-generation wireless technologies.

Quantum reinforcement learning improves performance of mMIMO scheduling

Specifically, the QRL model exhibits a more sensitive response to high signal-to-noise ratio (SNR) environments, achieves higher average sum rates, provides more than 5 dB improvement in certain configurations, and remains robust across a variety of user and antenna scaling scenarios. The authors acknowledge that the current validation is limited to a six-user and eight-antenna configuration, which may not fully represent all potential mMIMO deployments. Future research may focus on extending the model to larger and more complex network configurations and exploring applications in future quantum communication networks.

👉 More information
🗞 Quantum Reinforcement Learning for Massively MIMO User Scheduling Inspired by Grover’s Search
🧠ArXiv: https://arxiv.org/abs/2601.20688

Source link