Taking advantage of noise in quantum reservoir computing

The QML task considered in this work consists of predicting the excited electronic energy $E_1$ of the LiH molecule from the corresponding ground state ${|{\psi _0}\rangle }_{R}$ with energy $E_0$, using noisy QRs. Three noise models are considered in this study: the depolarizing channel, the amplitude damping channel, and the phase damping channel. A full description of the QML task and of the noise models is provided in the “Methods” section below.
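The three noise channels can be written in Kraus form, $\epsilon(\rho) = \sum_m M_m \rho M_m^\dagger$. The sketch below uses common textbook single-qubit parameterizations with error probability `p`. These conventions (in particular the depolarizing one, $\rho \to (1-p)\rho + p\,\mathbb{I}/2$) and all function names are assumptions for illustration; the exact definitions used in this work are those given in “Methods”.

```python
import numpy as np

# Single-qubit Pauli matrices
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def depolarizing_kraus(p):
    # Assumed convention: rho -> (1 - p) rho + p I/2
    return [np.sqrt(1 - 3 * p / 4) * I2] + [np.sqrt(p / 4) * P for P in (X, Y, Z)]


def amplitude_damping_kraus(p):
    # |1> relaxes to |0> with probability p
    return [np.array([[1, 0], [0, np.sqrt(1 - p)]], dtype=complex),
            np.array([[0, np.sqrt(p)], [0, 0]], dtype=complex)]


def phase_damping_kraus(p):
    # Populations are untouched; off-diagonal terms shrink by sqrt(1 - p)
    return [np.array([[1, 0], [0, np.sqrt(1 - p)]], dtype=complex),
            np.array([[0, 0], [0, np.sqrt(p)]], dtype=complex)]


def apply_channel(rho, kraus):
    # epsilon(rho) = sum_m M_m rho M_m^dagger
    return sum(M @ rho @ M.conj().T for M in kraus)


# Example: the excited state |1><1| and the superposition |+><+|
one = np.array([[0, 0], [0, 1]], dtype=complex)
plus = np.full((2, 2), 0.5, dtype=complex)
p = 0.1
print(np.round(apply_channel(one, amplitude_damping_kraus(p)).real, 3))  # diag(p, 1 - p)
print(np.round(apply_channel(plus, phase_damping_kraus(p)).real, 3))     # off-diagonals 0.5*sqrt(1 - p)
```

Amplitude damping moves population from $|1\rangle$ to $|0\rangle$, whereas phase damping only erodes coherences; this difference underlies the results discussed below.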

Figure 1 shows the mean squared error (MSE) of the values of $E_1$ predicted with our QRs as a function of the number of gates, for different values of the error probability $p$ (colored curves) and noise models (panels), together with the results for the corresponding noiseless reservoir (in black).

As expected, the MSE generally grows with the noise level characterized by $p$. However, a careful comparison of the three plots in Fig. 1 reveals, surprisingly, that the amplitude damping noise yields results significantly different from those obtained in the other two cases. Indeed, if the number of gates and the error probability are small enough, the QRs with amplitude damping noise provide better results than the noiseless QR. The same conclusion applies to the higher values of $p$, although in those cases the threshold number of gates below which the noisy QRs perform better decreases. This is a very significant result, since it means that, contrary to the commonly accepted belief, the presence of noise is here beneficial for the performance of the quantum algorithm and, more importantly, this occurs within the limitations of the NISQ era. As an example, for $p=0.0005$ (green curve) all noisy reservoirs perform better than their noiseless counterpart when the number of gates is smaller than 135. Current quantum processors typically have error rates around $p=0.001$, which are expected to be significantly reduced soon by employing error-correction techniques¹².

Table 1 provides a practical criterion to decide when noise can be exploited to improve the performance of QRC. It shows the averaged fidelity between the noisy output state $\rho$ and the noiseless state ${|{\psi }\rangle }$ for circuits subject to amplitude damping noise with different values of the error probability. For each value of $p$, the number of gates has been chosen as the largest one for which the noisy reservoirs still outperform the noiseless ones. These results indicate that, when the fidelity is greater than 0.96, the noisy reservoirs outperform the noiseless ones at this QML task, and accordingly the noise should not be corrected. Notice also that for $p=0.0001$ the fidelity is always higher than 0.96, and thus the performance of the noisy QRs is always higher than or equal to that of their noiseless counterparts. Table 1 also shows that the noisy reservoirs outperform the noiseless ones for circuits of the order of 100 quantum gates, which corresponds to an average circuit depth of 10–15 gates. Several recent applications of quantum machine learning algorithms have used shallow quantum circuits of similar depth. In particular, multiple shallow quantum neural networks have been combined to solve six benchmark classification problems in Ref.²⁸. Also, a hybrid quantum-classical graph neural network using quantum circuits of depth $<10$ was developed for particle track reconstruction in particle accelerator experiments²⁹. Finally, in Ref.³⁰, the authors propose quantum datasets for quantum machine learning tasks in which the classification label is encoded in the amount of entanglement of the quantum states; their results show that these datasets can be implemented with circuit depths smaller than 7. Therefore, quantum circuits with depths of 10–15 gates can provide useful applications in various domains.
This is the regime in which the amplitude damping noise provides an advantage over noiseless quantum circuits in our setting, which suggests that this type of noise may also be beneficial for other QML tasks. The extension of this analysis to other applications, such as time series forecasting, will be explored in future work.

Table 1 (Averaged) Fidelity between the noisy and noiseless final quantum states for the circuits with amplitude damping noise (see text for details).
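The fidelity criterion of Table 1 can be sketched as follows. For a pure noiseless state the fidelity reduces to $\langle\psi|\rho|\psi\rangle$. The threshold 0.96 is the empirical value reported for amplitude damping on this LiH task, not a general law, and the helper `keep_noise` is hypothetical; note also that Table 1 reports fidelities averaged over circuits, whereas this sketch evaluates a single state.

```python
import numpy as np

# Empirical threshold from Table 1 (amplitude damping, LiH task)
FIDELITY_THRESHOLD = 0.96


def fidelity(psi, rho):
    # Fidelity between a pure noiseless state |psi> and a noisy density matrix rho
    psi = psi / np.linalg.norm(psi)
    return float(np.real(np.vdot(psi, rho @ psi)))


def keep_noise(psi, rho, threshold=FIDELITY_THRESHOLD):
    # Hypothetical decision rule: leave amplitude damping noise uncorrected
    # while the fidelity stays above the threshold
    return fidelity(psi, rho) > threshold


# Toy example: |+> after a single amplitude damping step with probability p
p = 0.05
psi = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho = 0.5 * np.array([[1 + p, np.sqrt(1 - p)],
                      [np.sqrt(1 - p), 1 - p]], dtype=complex)
print(fidelity(psi, rho), keep_noise(psi, rho))  # F = (1 + sqrt(1 - p)) / 2
```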

A second conclusion from comparing the plots in Fig. 1 is that the behavior for the depolarizing and phase damping channels is significantly different from that for the amplitude damping one. In the former cases, the performance of the noisy reservoirs is always worse than that of the noiseless one, even for small error probabilities.

A third result that can be extracted from our calculations is that, for reservoirs with a large number of gates, the performance follows the same trend for the three noise models considered (except for the smallest value, $p=0.0001$). While the performance of the noiseless reservoirs stabilizes at a constant value as the number of gates increases, the performance of the noisy reservoirs deteriorates, and their MSE curves seem to converge to the same growing trend. This is because the quantum channels are applied after each gate, so circuits with a large number of gates accumulate more noise, which strongly decreases the fidelity of the output state. For this reason, even though increasing the number of gates has no effect in the noiseless simulations once the performance has stabilized, it strongly affects the performance of the noisy circuits, and thus the number of gates should be optimized in this case.
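The accumulation of noise with circuit length can be illustrated with a single-qubit toy model: random gates, each followed by a noise channel. This is a sketch under assumptions, not the simulation used in this work: it uses one qubit, Haar-random gates and the depolarizing convention $\rho \to (1-p)\rho + p\,\mathbb{I}/2$, and all function names are hypothetical.

```python
import numpy as np


def random_unitary(dim, rng):
    # Haar-random unitary from the QR decomposition of a complex Gaussian matrix
    z = (rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # fix the column phases


def depolarize(rho, p):
    # Assumed single-qubit convention: rho -> (1 - p) rho + p I/2
    return (1 - p) * rho + p * np.eye(2) / 2


def fidelity_vs_gates(n_gates, p, seed=0):
    # Apply n_gates random gates, each followed by the noise channel,
    # and record <psi_N| rho_N |psi_N> after every gate
    rng = np.random.default_rng(seed)
    psi = np.array([1, 0], dtype=complex)
    rho = np.outer(psi, psi.conj())
    fids = []
    for _ in range(n_gates):
        U = random_unitary(2, rng)
        psi = U @ psi
        rho = depolarize(U @ rho @ U.conj().T, p)
        fids.append(float(np.real(np.vdot(psi, rho @ psi))))
    return np.array(fids)


fids = fidelity_vs_gates(200, p=0.001)
print(fids[[9, 99, 199]])  # fidelity after 10, 100 and 200 gates
```

In this toy model the fidelity after $N$ gates is exactly $F_N = [1 + (1-p)^N]/2$, independently of which gates are drawn, so it decreases monotonically with the number of gates, while the noiseless state keeps unit fidelity.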

Having analyzed the MSE results, we next provide a theoretical explanation for the different behavior of the three noisy reservoirs. First, the depolarizing and phase damping channels give similar results, except that the performance with the former decreases faster than with the latter. This effect can be explained with the aid of Table 2, which gives the averaged fidelity of each noise model over the first 200 gates.

Table 2 (Averaged) Fidelity between the noisy and noiseless final quantum states for the circuits with the three noise models.

As can be seen, the depolarizing channel decreases the fidelity of the output much faster than the phase damping channel, which explains the different trends in the corresponding ML performances. On the other hand, the amplitude damping channel is the only one that can improve on the performance of the noiseless reservoirs, in the case of few gates and small error rates. The main difference between amplitude damping and the other channels is that the former is not unital, i.e., it does not map the identity operator to itself: $\epsilon (\mathbb {I}) \ne \mathbb {I}$.
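Unitality can be checked directly from the Kraus operators: a channel is trace preserving if $\sum_m M_m^\dagger M_m = \mathbb{I}$ and unital if $\epsilon(\mathbb{I}) = \sum_m M_m M_m^\dagger = \mathbb{I}$. The sketch below assumes textbook single-qubit parameterizations (the helper `kraus_ops` is hypothetical) and shows that all three channels are trace preserving, but only amplitude damping fails to be unital.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def kraus_ops(kind, p):
    # Assumed textbook parameterizations with error probability p
    if kind == "depolarizing":
        return [np.sqrt(1 - 3 * p / 4) * I2] + [np.sqrt(p / 4) * P for P in (X, Y, Z)]
    M0 = np.array([[1, 0], [0, np.sqrt(1 - p)]], dtype=complex)
    if kind == "amplitude_damping":
        return [M0, np.array([[0, np.sqrt(p)], [0, 0]], dtype=complex)]
    if kind == "phase_damping":
        return [M0, np.array([[0, 0], [0, np.sqrt(p)]], dtype=complex)]
    raise ValueError(f"unknown channel: {kind}")


def is_trace_preserving(kraus):
    # sum_m M_m^dagger M_m = I
    return np.allclose(sum(M.conj().T @ M for M in kraus), I2)


def is_unital(kraus):
    # epsilon(I) = sum_m M_m M_m^dagger = I
    return np.allclose(sum(M @ M.conj().T for M in kraus), I2)


p = 0.01
for kind in ("depolarizing", "phase_damping", "amplitude_damping"):
    k = kraus_ops(kind, p)
    print(kind, is_trace_preserving(k), is_unital(k))
# depolarizing True True
# phase_damping True True
# amplitude_damping True False
```

With this parameterization the amplitude damping channel maps the identity to $\mathbb{I} + pZ$, which is the extra term that the analysis below exploits.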

Let us now consider how this fact affects the distribution of noisy states in the Pauli space. For this purpose, let $\rho '$ be the $n$-qubit density matrix obtained after applying $N-1$ noisy gates (with the noise described by the quantum channel $\epsilon$), and let us then apply the $N$-th noisy gate $U$. The state becomes $\epsilon (\rho )$, defined as

$$\begin{aligned} \epsilon (\rho ) = \sum _{m} M_m \rho M_m^\dagger , \quad \rho = U \, \rho ' \, U^\dagger , \end{aligned} \tag{1}$$

where $\rho$ is the state after applying gate $U$ without noise and $\{M_m\}$ are the Kraus operators of $\epsilon$. Both $\rho$ and $\epsilon (\rho )$ can be written as linear combinations of the Pauli basis operators $\{P_i\}_i$, each of which is a tensor product of single-qubit operators from $\{ \mathbb {I}, X, Y, Z\}$:

$$\begin{aligned} \rho = \sum _i a_i P_i, \quad \text {with }a_i = \frac{1}{2^n} \text {tr}(P_i \rho ), \end{aligned} \tag{2}$$

$$\begin{aligned} \epsilon (\rho ) =\sum _i b_i P_i, \quad \text {with }b_i = \frac{1}{2^n}\text {tr}[P_i \epsilon (\rho )]. \end{aligned} \tag{3}$$
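Eqs. (1)–(3) can be made concrete with a short NumPy sketch: one noisy gate is applied to a two-qubit state, and the state is expanded in the Pauli basis before ($a_i$) and after ($b_i$) the noise. Applying the single-qubit channel to every qubit the gate acts on is an assumption of this sketch, as are the amplitude damping parameterization and the helper names `embed`, `noisy_gate` and `pauli_coefficients`.

```python
import itertools
from functools import reduce

import numpy as np

PAULI = {"I": np.eye(2, dtype=complex),
         "X": np.array([[0, 1], [1, 0]], dtype=complex),
         "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
         "Z": np.array([[1, 0], [0, -1]], dtype=complex)}


def embed(op, j, n):
    # I ⊗ ... ⊗ op (qubit j, 0-based, qubit 0 leftmost) ⊗ ... ⊗ I
    factors = [PAULI["I"]] * n
    factors[j] = op
    return reduce(np.kron, factors)


def noisy_gate(rho_prev, U, kraus_1q, qubits, n):
    # Eq. (1): rho = U rho' U^dagger, then epsilon on each qubit the gate touches
    rho = U @ rho_prev @ U.conj().T
    for j in qubits:
        ops = [embed(M, j, n) for M in kraus_1q]
        rho = sum(K @ rho @ K.conj().T for K in ops)
    return rho


def pauli_coefficients(rho):
    # Eqs. (2)-(3): coefficient of P_i is tr(P_i rho) / 2**n
    n = rho.shape[0].bit_length() - 1
    coeffs = {}
    for labels in itertools.product("IXYZ", repeat=n):
        P = reduce(np.kron, [PAULI[c] for c in labels])
        coeffs["".join(labels)] = float(np.real(np.trace(P @ rho))) / 2**n
    return coeffs


# |00>, Hadamard on qubit 0, amplitude damping on qubit 0 (assumed Kraus form)
p, n = 0.2, 2
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
amp = [np.array([[1, 0], [0, np.sqrt(1 - p)]], dtype=complex),
       np.array([[0, np.sqrt(p)], [0, 0]], dtype=complex)]
rho0 = np.zeros((4, 4), dtype=complex)
rho0[0, 0] = 1.0
U = embed(H, 0, n)
a = pauli_coefficients(U @ rho0 @ U.conj().T)              # noiseless a_i
b = pauli_coefficients(noisy_gate(rho0, U, amp, [0], n))   # noisy b_i
print(a["ZI"], b["ZI"])  # a_ZI = 0, while b_ZI = p/4
```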

Note that, once all the gates of the circuit have been applied, some of the coefficients $b_i$ are used as inputs to the ML model that makes the final predictions. Expanding the final quantum states in this basis is therefore suitable for understanding the behavior of the QRC algorithm. Next, we study the relation between the coefficients $\{a_i\}$ and $\{b_i\}$. Since the operators $P_i$ are tensor products of Pauli operators, it is sufficient to study how each of the noise models $\epsilon$ maps the four Pauli operators. The results are shown in Table 3, where we see that $\epsilon (P_i)$ is always proportional to $P_i$, except for $\epsilon (\mathbb {I})$ with the amplitude damping channel. It is for this reason that, with depolarizing or phase damping noise, the quantum channel can only mitigate coefficients in the Pauli space, whereas the amplitude damping channel can introduce additional non-zero terms in the Pauli decomposition. This also explains why, for low noise rates, the shapes of the MSE curves for depolarizing and phase damping noise are similar to that of the noiseless scenario, while the shape for amplitude damping is not. Table 3 also explains why the phase damping channel yields states with higher fidelity than the depolarizing channel: the phase damping channel leaves the $Z$ operator invariant and mitigates the $X$ and $Y$ coefficients less than the depolarizing channel does. For this reason, even though both the depolarizing and phase damping channels are unital, the depolarizing channel decreases the ML performance faster, and its correction should be prioritized.

Table 3 Expressions for the error channel $\epsilon$ when applied to the four basis Pauli operators.
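The single-qubit content of Table 3 can be summarized as a Pauli transfer matrix, $R_{ij} = \tfrac{1}{2}\text{tr}[P_i\,\epsilon(P_j)]$, so that $\epsilon(P_j) = \sum_i R_{ij} P_i$ in the basis order $(\mathbb{I}, X, Y, Z)$. The sketch below assumes textbook parameterizations, which may differ from the conventions behind Table 3; the helper names are hypothetical. With these conventions the matrix is diagonal for the depolarizing and phase damping channels, while for amplitude damping the only off-diagonal entry is $R_{Z\mathbb{I}} = p$, i.e. $\epsilon(\mathbb{I}) = \mathbb{I} + pZ$.

```python
import numpy as np

PAULIS = [("I", np.eye(2, dtype=complex)),
          ("X", np.array([[0, 1], [1, 0]], dtype=complex)),
          ("Y", np.array([[0, -1j], [1j, 0]], dtype=complex)),
          ("Z", np.array([[1, 0], [0, -1]], dtype=complex))]


def kraus_ops(kind, p):
    # Assumed textbook parameterizations with error probability p
    I2, X, Y, Z = [P for _, P in PAULIS]
    if kind == "depolarizing":
        return [np.sqrt(1 - 3 * p / 4) * I2] + [np.sqrt(p / 4) * P for P in (X, Y, Z)]
    M0 = np.array([[1, 0], [0, np.sqrt(1 - p)]], dtype=complex)
    if kind == "amplitude_damping":
        return [M0, np.array([[0, np.sqrt(p)], [0, 0]], dtype=complex)]
    if kind == "phase_damping":
        return [M0, np.array([[0, 0], [0, np.sqrt(p)]], dtype=complex)]
    raise ValueError(f"unknown channel: {kind}")


def pauli_transfer_matrix(kraus):
    # R[i, j] = tr(P_i epsilon(P_j)) / 2, so that epsilon(P_j) = sum_i R[i, j] P_i
    R = np.zeros((4, 4))
    for j, (_, Pj) in enumerate(PAULIS):
        out = sum(M @ Pj @ M.conj().T for M in kraus)
        for i, (_, Pi) in enumerate(PAULIS):
            R[i, j] = np.real(np.trace(Pi @ out)) / 2
    return R


p = 0.1
for kind in ("depolarizing", "phase_damping", "amplitude_damping"):
    print(kind)
    print(np.round(pauli_transfer_matrix(kraus_ops(kind, p)), 4))
```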

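Let us now demonstrate mathematically why the depolarizing and phase damping channels can only mitigate the Pauli coefficients, while the amplitude damping channel can also create new ones. Since the first two channels map each Pauli operator to a multiple of itself, $\epsilon (P_i) = \alpha _i P_i$ (Table 3), the coefficient associated with any Pauli operator $P_i$ for these channels is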

$$\begin{aligned} b_i = \frac{1}{2^n}\text {tr}[P_i\,\epsilon (\rho ) ] = \frac{1}{2^n}\alpha _i \,\text {tr}(P_i \rho ) = \alpha _i \, a_i, \quad 0 \le \alpha _i \le 1, \end{aligned} \tag{4}$$

and therefore the noisy channel can only mitigate the coefficient $a_i$. The situation is different for a gate with amplitude damping noise of probability $p$. Suppose that the channel $\epsilon$ acts non-trivially only on qubit $j$, that is, its Kraus operators are of the form $\tilde{M}_m = \mathbb {I} \otimes \cdots \otimes M_m \otimes \mathbb {I} \otimes \cdots \otimes \mathbb {I}$, with $M_m$ in the $j$-th position. Suppose now that we measure $P_i$ (the $i$-th operator of the Pauli basis, associated with the coefficient $a_i$), where $P_i$ acts as a $Z$ operator on the $j$-th qubit: $P_i = P^1 \otimes \cdots \otimes P^{j-1} \otimes Z \otimes P^{j+1} \otimes \cdots \otimes P^n$. Let us also define $P_k=P^1 \otimes \cdots \otimes P^{j-1} \otimes \mathbb {I} \otimes P^{j+1} \otimes \cdots \otimes P^n$, which differs from $P_i$ only in the $j$-th factor, with associated coefficient $a_k$. Then, the coefficient $b_i$ is

$$\begin{aligned} \begin{array}{rl} b_i = &{}\displaystyle \frac{1}{2^n} \text {tr}[P_i\epsilon (\rho )] = \frac{1}{2^n} \sum _l a_l \text {tr}[P_i \epsilon (P_l)] \\ =&{} \displaystyle \frac{1}{2^n} \Big (a_i \text {tr}[P_i \epsilon (P_i)] + a_k \text {tr}[P_i \epsilon (P_k)]\Big ) \\ =&{}\displaystyle \frac{1}{2^n}\Big (a_i (1-p) \text {tr}[P_i^2] + a_k\text {tr}[P_i(P_k + pP_i)]\Big )\\ =&{} (1-p)a_i + p a_k \end{array} \end{aligned}$$

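where we have used $\epsilon (P_i) = (1-p)P_i$, $\epsilon (P_k) = P_k + pP_i$, $\text {tr}(P_i^2) = 2^n$ and $\text {tr}(P_i P_k) = 0$. Only the terms $l=i$ and $l=k$ survive in the sum, because for any other $P_l$ the operator $\epsilon (P_l)$ is a combination of Pauli strings different from $P_i$, so that $\text {tr}[P_i \epsilon (P_l)] = 0$. When $a_i=0$ but $a_k \ne 0$, the coefficient $b_i = p\,a_k$ is non-zero, and thus the amplitude damping noise introduces an extra coefficient in the Pauli space. We conclude that the amplitude damping channel can introduce additional non-zero coefficients in the Pauli space instead of only mitigating the existing ones and, for small enough $p$, it does so without mitigating the remaining coefficients too much.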
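Both coefficient relations, Eq. (4) for the unital channels and $b_i = (1-p)a_i + p\,a_k$ for amplitude damping, can be checked numerically on a random two-qubit state. The sketch assumes textbook Kraus parameterizations and a single-qubit channel acting on qubit $j$; the helper names and the random seed are hypothetical. The last lines reproduce the "new coefficient" case: for $|{+}\rangle|{+}\rangle$ with damping on the second qubit, $a_{XZ}=0$ and $a_{XI}=1/4$, so $b_{XZ}=p/4$.

```python
import itertools
from functools import reduce

import numpy as np

PAULI = {"I": np.eye(2, dtype=complex),
         "X": np.array([[0, 1], [1, 0]], dtype=complex),
         "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
         "Z": np.array([[1, 0], [0, -1]], dtype=complex)}


def kraus_ops(kind, p):
    # Assumed textbook parameterizations with error probability p
    if kind == "depolarizing":
        return [np.sqrt(1 - 3 * p / 4) * PAULI["I"]] + [np.sqrt(p / 4) * PAULI[c] for c in "XYZ"]
    M0 = np.array([[1, 0], [0, np.sqrt(1 - p)]], dtype=complex)
    if kind == "amplitude_damping":
        return [M0, np.array([[0, np.sqrt(p)], [0, 0]], dtype=complex)]
    return [M0, np.array([[0, 0], [0, np.sqrt(p)]], dtype=complex)]  # phase damping


def channel_on_qubit(rho, kraus_1q, j, n):
    # Kraus operators I ⊗ ... ⊗ M_m (position j) ⊗ ... ⊗ I
    ops = []
    for M in kraus_1q:
        factors = [PAULI["I"]] * n
        factors[j] = M
        ops.append(reduce(np.kron, factors))
    return sum(K @ rho @ K.conj().T for K in ops)


def coefficients(rho, n):
    # a_i = tr(P_i rho) / 2**n for every Pauli string of length n
    return {"".join(s): float(np.real(np.trace(reduce(np.kron, [PAULI[c] for c in s]) @ rho))) / 2**n
            for s in itertools.product("IXYZ", repeat=n)}


def predicted(a, label, kind, p, j):
    # Eq. (4) for unital channels; b_i = (1 - p) a_i + p a_k for amplitude damping
    c = label[j]
    if c == "I":
        return a[label]
    if kind == "depolarizing":
        return (1 - p) * a[label]
    if c in "XY":
        return np.sqrt(1 - p) * a[label]
    if kind == "phase_damping":
        return a[label]  # Z is left invariant
    k = label[:j] + "I" + label[j + 1:]  # P_k: same string with I on qubit j
    return (1 - p) * a[label] + p * a[k]


def max_deviation(rho, kind, p, j, n):
    a = coefficients(rho, n)
    b = coefficients(channel_on_qubit(rho, kraus_ops(kind, p), j, n), n)
    return max(abs(b[l] - predicted(a, l, kind, p, j)) for l in a)


# Random two-qubit density matrix
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
rho = A @ A.conj().T
rho /= np.trace(rho)
for kind in ("depolarizing", "phase_damping", "amplitude_damping"):
    print(kind, max_deviation(rho, kind, p=0.1, j=1, n=2))  # ~1e-17

# A coefficient created by the noise: |+>|+> has a_XZ = 0 but a_XI = 1/4
plus = np.full((2, 2), 0.5, dtype=complex)
rho_pp = np.kron(plus, plus)
b = coefficients(channel_on_qubit(rho_pp, kraus_ops("amplitude_damping", 0.1), 1, 2), 2)
print(b["XZ"])  # approximately p / 4 = 0.025
```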

The amplitude damping result derived above can be further illustrated with a two-qubit toy model. We design a QR with each of the three noise models and calculate the distribution of the Pauli coefficients at the end of the circuit. Figure 2 shows the outcomes of the measurements for a random circuit with 10 gates and an error rate of $p=0.2$. All noise models mitigate the non-zero coefficients. However, the shaded area highlights a region where the noiseless simulation (as well as the depolarizing and phase damping simulations) gives zero expectation values, whereas the amplitude damping circuit has non-zero expectation values for the same operators; that is, this quantum channel has introduced new non-zero terms in the Pauli distribution. For small error rates, the noisy QRs therefore perform better, since amplitude damping noise has an effect on QR performance similar to that of adding more quantum gates to the circuit, as can be seen in Fig. 1 and in Fig. 3 of Ref.²⁴. To visualize this effect further, we generate 4000 random circuits and examine how the final states $\rho$ fill the Pauli space. Since the Pauli space of a 2-qubit system is 16-dimensional, we use the dimensionality reduction technique UMAP³¹ to visualize the distribution in 2D. The results are shown in Fig. 3: the amplitude damping channel fills the Pauli space faster than the other noisy circuits and the noiseless QR, which supports the hypothesis that the amplitude damping channel acts similarly to adding more quantum gates.
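A deterministic miniature of the two-qubit toy model makes the support effect easy to reproduce: a hypothetical two-gate circuit (Hadamard on qubit 0, then CNOT) acting on $|00\rangle$, with the noise channel applied after each gate to every qubit it touches. This is not the 10-gate random circuit of Fig. 2, and the Kraus parameterizations and the per-qubit noise placement are assumptions. The sketch lists the Pauli operators with non-zero coefficients: the noiseless, depolarizing and phase damping runs share the Bell-state support, whereas amplitude damping adds the $IZ$ and $ZI$ terms.

```python
import itertools
from functools import reduce

import numpy as np

I2 = np.eye(2, dtype=complex)
PAULI = {"I": I2,
         "X": np.array([[0, 1], [1, 0]], dtype=complex),
         "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
         "Z": np.array([[1, 0], [0, -1]], dtype=complex)}
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
# CNOT with control qubit 0 (leftmost) and target qubit 1
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)


def kraus(kind, p):
    # Assumed single-qubit parameterizations; "noiseless" is the identity channel
    if kind == "noiseless":
        return [I2]
    if kind == "depolarizing":
        return [np.sqrt(1 - 3 * p / 4) * I2] + [np.sqrt(p / 4) * PAULI[c] for c in "XYZ"]
    M0 = np.array([[1, 0], [0, np.sqrt(1 - p)]], dtype=complex)
    if kind == "amplitude":
        return [M0, np.array([[0, np.sqrt(p)], [0, 0]], dtype=complex)]
    return [M0, np.array([[0, 0], [0, np.sqrt(p)]], dtype=complex)]  # phase damping


def on_qubit(op, j, n=2):
    factors = [I2] * n
    factors[j] = op
    return reduce(np.kron, factors)


def run_circuit(kind, p):
    # |00> -> H(q0) -> CNOT(q0, q1), noise after each gate on the qubits it touches
    rho = np.zeros((4, 4), dtype=complex)
    rho[0, 0] = 1.0
    for U, qubits in [(on_qubit(H, 0), [0]), (CNOT, [0, 1])]:
        rho = U @ rho @ U.conj().T
        for j in qubits:
            ops = [on_qubit(M, j) for M in kraus(kind, p)]
            rho = sum(K @ rho @ K.conj().T for K in ops)
    return rho


def pauli_support(rho, tol=1e-10):
    # Labels of the Pauli operators whose coefficient tr(P_i rho) / 4 is non-zero
    support = []
    for labels in itertools.product("IXYZ", repeat=2):
        P = reduce(np.kron, [PAULI[c] for c in labels])
        if abs(np.trace(P @ rho)) / 4 > tol:
            support.append("".join(labels))
    return support


for kind in ("noiseless", "depolarizing", "phase", "amplitude"):
    print(kind, pauli_support(run_circuit(kind, 0.2)))
# noiseless ['II', 'XX', 'YY', 'ZZ']
# depolarizing ['II', 'XX', 'YY', 'ZZ']
# phase ['II', 'XX', 'YY', 'ZZ']
# amplitude ['II', 'IZ', 'XX', 'YY', 'ZI', 'ZZ']
```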