Femto-joule threshold reconfigurable all-optical nonlinear activators for picosecond pulsed optical neural networks

Silicon-based reconfigurable PhC cavity ANA

Owing to silicon’s relatively small third-order nonlinear coefficient, exciting third-order nonlinearity in conventional silicon waveguides requires high optical power, which results in significant power consumption during device operation⁴⁵. To address this challenge, a resonant line-defect PhC cavity was designed for reconfigurable ANAs. The device was designed and fabricated on a standard silicon-on-insulator photonics platform and consists of a two-dimensional periodic array of circular air holes with a line defect. A scanning electron microscope image of the device is shown in Fig. 1a. This PhC resonant cavity offers two key advantages. First, the slow-light effect^46,47 of the PhC cavity enhances the interaction between light and the device (Fig. 1b), which strengthens nonlinear effects and allows a smaller device footprint. We deliberately designed the PhC cavity with a relatively weak slow-light effect, which still reduces the device size while mitigating the issues of insertion loss and narrow bandwidth, preserving the overall performance and functionality of the device. Second, light pulses resonate in the PhC cavity and build up energy within the device, further enhancing the third-order nonlinear effects⁴⁸. The Kerr third-order nonlinearity changes the effective refractive index of the silicon device and thereby redshifts the device’s transmission spectrum, as shown in Fig. 1e. Multiple types of NAFs can therefore be constructed by selecting different incident light wavelengths on the basis of specific resonant peaks, thereby achieving reconfigurable ANAs (Fig. 1f).

**Fig. 1: Design, simulation and performance of a silicon reconfigurable PhC cavity ANA.**

The transmission spectrum of the device was measured with a continuously tunable laser, as shown in Fig. 1c. More details of the measurement system are provided in Section Ⅰ in the Supplementary Information. The nonlinear absorption curve was obtained by measuring how the device’s transmittance changes with the energy of broad-spectrum femtosecond input pulses, as shown in Fig. 1g. Owing to the two-photon absorption effect⁴⁹, the relative transmittance of the device tends to decrease as the input light energy increases. The device exhibited several resonant peaks designed to amplify third-order nonlinearity (specific design details in Section Ⅱ in the Supplementary Information), with resonant peak Q factors on the order of hundreds. The femtosecond laser output was coupled into the device through grating coupling, and the output spectra for different input pulse energies are depicted in Fig. 1h. The output spectrum redshifts with increasing input pulse energy. This is attributed to the third-order nonlinear effect in silicon, which increases the effective refractive index of the silicon cavity and thereby redshifts the device’s resonant peaks. This phenomenon can be explained by classical cavity perturbation theory⁵⁰. A simplified formula for the resonant peak shift $\Delta \lambda$ caused by third-order nonlinearity in the microcavity is derived in Section Ⅲ of the Supplementary Information:

$$\Delta \lambda = \frac{\Delta n_{eff}}{n_{g}} \cdot \lambda_{0} = {n_{2}}_{eff} \cdot Q P_{peak} \cdot \lambda_{0} \tag{1}$$

where $\Delta {n}_{{eff}}$ denotes the change in the waveguide effective index that results from the Kerr-induced change in the material index, ${n}_{g}$ denotes the modal group index, ${\lambda }_{0}$ represents the probe resonant wavelength, ${{n}_{2}}_{{eff}}$ represents the effective third-order nonlinear coefficient of the waveguide, ${P}_{{peak}}$ represents the pump pulse peak power coupled into the cavity, and $Q$ is the quality factor of the PhC resonant cavity.
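As a minimal numerical sketch of the first equality in Eq. (1), the snippet below converts between an effective-index change and a resonance shift. The function names and the example values of the group index and index change are illustrative assumptions, not values reported in this work; only the order of the resonant wavelength (about 1540 nm) follows the text.

```python
def resonance_shift_nm(delta_n_eff, n_g, lambda0_nm):
    """First equality of Eq. (1): delta_lambda = (delta_n_eff / n_g) * lambda0."""
    return delta_n_eff / n_g * lambda0_nm


def index_change_from_shift(delta_lambda_nm, n_g, lambda0_nm):
    """Invert Eq. (1) to estimate delta_n_eff from a measured resonance shift."""
    return delta_lambda_nm / lambda0_nm * n_g


# Assumed, illustrative values (not taken from the measurements).
N_G = 10.0            # hypothetical group index of the slow-light mode
LAMBDA0_NM = 1540.0   # probe resonance wavelength, order of magnitude from the text
DELTA_N_EFF = 1e-3    # hypothetical Kerr-induced effective index change

shift_nm = resonance_shift_nm(DELTA_N_EFF, N_G, LAMBDA0_NM)
recovered = index_change_from_shift(shift_nm, N_G, LAMBDA0_NM)
print(f"shift = {shift_nm:.3f} nm, recovered delta_n_eff = {recovered:.1e}")
```

The inverse function mirrors how an index change can be inferred from a measured peak shift once the group index is known.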

Thus, the shift of the resonant peak is amplified by the quality factor ($Q$) of the resonant cavity. Figure 1i shows how the center wavelength of the resonance peak at 1539–1540 nm varies with the input light power, along with the calculated change in the corresponding effective refractive index $\Delta {n}_{{eff}}$. These results clearly indicate that, as predicted above, coupling a femtosecond pulsed laser into the ANA induces strong third-order nonlinear effects that shift the device’s resonant peak. The response time is less than 2 ps, as shown in the inset of Fig. 1i.

In addition, the shifts of the resonant peaks make it possible to achieve reconfigurability and programmability of the nonlinear response in the PhC cavity ANA. When a single-wavelength pulse is input, the potential two-photon absorption effect, which is not enhanced by the photonic crystal cavity, is far smaller than the cavity-enhanced Kerr effect. The trend of the transmittance change induced by the resonant peak shift differs from one wavelength to another around the resonant peaks. In other words, different NAF curves can be generated by changing the wavelength of the incident light. By introducing a filter with a 1 nm 3 dB spectral bandwidth into the saturable absorption measurement setup (Section Ⅰ in the Supplementary Information), we measured how the device’s transmittance for picosecond pulses at different wavelengths varies with the input light power. When excited by light pulses of less than 500 fJ at different wavelengths, the device generates distinct activation function curves whose trends follow the changes in the transmission spectrum, as shown in Fig. 1j.
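The wavelength dependence described above can be illustrated with a toy model. The sketch below assumes a single Lorentzian transmission peak whose center redshifts linearly with pulse energy (consistent with the linear dependence on peak power in Eq. (1) for a fixed pulse shape). The lineshape, the Q of 500, the shift rate and the probe wavelengths are all illustrative assumptions rather than fitted device parameters.

```python
def lorentzian_transmission(wavelength_nm, center_nm, q_factor):
    """Normalised Lorentzian transmission peak with FWHM = center / Q."""
    half_width_nm = center_nm / q_factor / 2.0
    detuning_nm = wavelength_nm - center_nm
    return 1.0 / (1.0 + (detuning_nm / half_width_nm) ** 2)


def activation_curve(probe_nm, energies_fj, center0_nm=1540.0,
                     q_factor=500.0, shift_nm_per_fj=0.002):
    """Transmittance seen by a fixed-wavelength probe while the resonance
    redshifts linearly with pulse energy (all defaults are assumed values)."""
    curve = []
    for energy in energies_fj:
        center_nm = center0_nm + shift_nm_per_fj * energy
        curve.append(lorentzian_transmission(probe_nm, center_nm, q_factor))
    return curve


energies_fj = [0, 100, 200, 300, 400, 500]
red_side = activation_curve(1541.5, energies_fj)   # probe on the long-wavelength side
blue_side = activation_curve(1538.5, energies_fj)  # probe on the short-wavelength side
print([round(t, 3) for t in red_side])
print([round(t, 3) for t in blue_side])
```

In this toy model a probe on the long-wavelength side of the peak sees transmittance rise with energy as the resonance moves toward it, while a probe on the short-wavelength side sees it fall, which is the qualitative mechanism behind wavelength-selected activation shapes.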

Therefore, ANAs with thresholds at the level of hundreds of femtojoules can be reconfigured by taking advantage of the Kerr effect in silicon-based PhC devices. However, a picosecond pulsed optical neural network needs an ANA with a lower threshold for higher-performance optical computing. The threshold power of the device can be reduced further in several ways. First, the design of the PhC cavity can be optimized to increase its quality factor, which enhances the cavity’s ability to concentrate the energy of optical pulses and thus strengthens the Kerr effect in the cavity. Second, the device fabrication process can be optimized to reduce device losses and improve energy utilization efficiency, thereby lowering the activation threshold. Third, integrating graphene allows its excellent optical properties, such as saturable absorption, to enhance the nonlinear response of the device. We discuss the third approach in detail in the next section.

Femto-joule threshold graphene-silicon PhC ANA

To further reduce the threshold of the ANA, the graphene material was integrated into the silicon PhC cavity (Fig. 2a). As shown in Fig. 2b, owing to the Pauli blocking effect, the optical absorption of graphene gradually decreases with increasing light intensity, and once the intensity exceeds the threshold power, it saturates, with a femtosecond-level response time^51,52,53.

**Fig. 2: Principle, properties and performance of graphene-silicon PhC cavity ANAs.**

Therefore, by leveraging the saturable absorption effect of graphene, we designed a graphene-silicon PhC cavity ANA. Graphene was transferred to the PhC device via a standard wet transfer process⁵⁰ and patterned through electron beam lithography. Figure 2c shows the Raman spectrum of graphene transferred to the sample. The fabrication process flow of our devices and the material properties of the graphene are shown in Section Ⅳ in the Supplementary Information. Figure 2d shows the transmittance spectra of the device before and after graphene transfer. Although the transfer of graphene increases the device’s losses, the resonant peaks are preserved. Owing to the slow-light effect, the interaction between the light pulses and graphene was enhanced⁵¹, significantly reducing the saturation threshold power of graphene and guaranteeing an ultrafast saturation response time.

To verify the ultralow threshold power and ultrafast response speed of the device combined with graphene, saturable absorption tests and pump-probe tests were performed on a graphene-silicon PhC cavity ANA. The saturable absorption curves are shown in Fig. 2e, f. A comparison between a conventional straight waveguide device covered with 15 μm of graphene (Fig. 2e) and a graphene-silicon PhC cavity ANA (Fig. 2f) shows that the latter reaches an ultralow threshold power of 4 fJ (50% saturation transmittance)^36,54 owing to slow-light and cavity-enhanced effects. Compared with the activation threshold of several hundred femtojoules for the silicon PhC cavity ANA (Fig. 1j), the threshold of the graphene-silicon PhC cavity ANA is significantly reduced. Pump-probe measurements were also conducted on the device, as shown in Fig. 2g. The transmittance of the device increased after the pump light passed through and returned to its original value within 2 ps, with a full width at half maximum response time of 1.05 ps.
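To make the "50% saturation" threshold concrete, the sketch below uses the textbook two-level saturable absorber model, in which the saturable part of the absorption falls as 1/(1 + E/E_sat). This model, the reading of the 4 fJ threshold as E_sat, and the absorption depths `alpha_sat` and `alpha_ns` are assumptions made for illustration; the measured curves are those in Fig. 2e, f.

```python
def absorption(energy_fj, e_sat_fj=4.0, alpha_sat=0.3, alpha_ns=0.2):
    """Saturable part bleaches with energy; non-saturable loss stays constant."""
    return alpha_ns + alpha_sat / (1.0 + energy_fj / e_sat_fj)


def transmittance(energy_fj, e_sat_fj=4.0, alpha_sat=0.3, alpha_ns=0.2):
    """Transmittance of the absorber, ignoring all other loss mechanisms."""
    return 1.0 - absorption(energy_fj, e_sat_fj, alpha_sat, alpha_ns)


def saturation_fraction(energy_fj, e_sat_fj=4.0, alpha_sat=0.3, alpha_ns=0.2):
    """Fraction of the full transmittance swing reached at a given pulse energy."""
    t_low = transmittance(0.0, e_sat_fj, alpha_sat, alpha_ns)
    t_high = 1.0 - alpha_ns  # limit of transmittance at very high energy
    t_now = transmittance(energy_fj, e_sat_fj, alpha_sat, alpha_ns)
    return (t_now - t_low) / (t_high - t_low)


for energy in (0.0, 1.0, 4.0, 16.0, 64.0):
    print(energy, round(transmittance(energy), 3), round(saturation_fraction(energy), 3))
```

In this model the transmittance swing reaches exactly half of its full range when the pulse energy equals `e_sat_fj`, which is one way to read a threshold defined at 50% saturation transmittance.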

Here, an optical nonlinear switch device with ultralow threshold power and ultrafast response time was realized by combining the graphene saturable absorption effect with the slow-light cavity enhancement effect. We survey the current state-of-the-art ANAs in Table 1. Our device achieves a figure of merit at least four orders of magnitude greater than those of other on-chip ANAs. In addition, by tuning the incident wavelength on the basis of the designed PhC cavity resonant peaks, multiple types of NAFs can be achieved.

Table 1 Comparison of state-of-the-art ANAs

By taking advantage of the Kerr third-order nonlinearity of silicon, as discussed in the above section, the nonlinear response of the graphene-silicon PhC cavity ANA can be reconfigured. When the input pulse was selected near the wavelengths of 1541 nm, 1540 nm, and 1534 nm, a ReLU-type NAF (threshold: 120 fJ)¹⁸, a sigmoid-type NAF (threshold: 30 fJ)⁵⁵ and a linear-type NAF could be achieved, as shown in Fig. 2h–j (details of the configuration can be found in Section Ⅴ in the Supplementary Information). Overall, a wavelength-modulated reconfigurable high-speed ANA has been achieved. The device can realize various NAFs on the basis of the design of the transmittance spectrum, with response times of less than 4.5 ps for activation functions. The reconfigurable ANA saturates at such low power levels with a picosecond response time, which indicates the potential for achieving more energy-efficient all-optical neural networks.

On-chip picosecond pulsed optical neural network and neural network training

Current on-chip optical computing architectures are based on modulating continuous wave light^56,57, which has the issue of low power density, making it difficult to activate the material’s nonlinear properties. By using ultrafast pulsed light, it is possible to increase the instantaneous power density without exceeding the material’s thermal damage threshold while effectively stimulating its nonlinear properties. Therefore, pulsed light is highly suitable for realizing all-optical computing architectures. Here, as shown in Fig. 3a, we propose a wavelength division cascaded picosecond pulse optical computing network architecture and analyze the performance requirements of the devices involved.

**Fig. 3: General block diagram of an on-chip picosecond pulsed optical neural network and the performance of graphene/silicon heterojunction nonlinear response activation functions (GSNR AFs) on three binary classification datasets.**

The entire architecture consists of a spatiotemporal misalignment multiplexed signal loading layer, a fully connected layer with picosecond-response nonlinear activation capability, and an output layer, as shown in Fig. 3a.ⅰ. The signal loading layer includes a high repetition rate picosecond light source, a high-speed broadband modulator, and time-division misalignment units. The picosecond light source with a high repetition rate (100 GHz) and narrow pulse width (150 fs) can be implemented either off-chip (fiber femtosecond source) or on-chip (mode-locked laser source); both approaches currently face significant technical challenges that require further research and development. After being coupled into the on-chip system, the light pulses are encoded by a balanced broadband high-speed modulator (100 GHz bandwidth)⁵⁸. After passing through an on-chip broadband filter, the pulses are split into multiple beams at 1 nm intervals by a wavelength division device (an ID-WDM⁵⁹, as shown in Fig. 3a.ⅴ), offset in time by waveguide delays to encode the spatiotemporal misalignment, and then combined through an ID-WDM into a single waveguide. The spatial thermal noise introduced by fabrication imperfections and the temporal thermal noise caused by thermal fluctuations in WDMs can be compensated for via partially coherent light illumination methods^60,61,62.
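The timing budget implied by a 100 GHz pulse train can be sketched as follows. The code assumes that the wavelength channels are offset evenly within one pulse period and that each offset is produced by extra waveguide length L = c·Δt / n_g. The even spacing, the channel count of 4 and the group index of 4.0 are illustrative assumptions, not design values from this work.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def pulse_period_ps(rep_rate_ghz):
    """Time between successive pulses for a given repetition rate."""
    return 1e3 / rep_rate_ghz


def channel_delays_ps(rep_rate_ghz, n_channels):
    """Evenly spaced time offsets so that n channels interleave in one period."""
    period = pulse_period_ps(rep_rate_ghz)
    return [k * period / n_channels for k in range(n_channels)]


def delay_length_um(delay_ps, group_index):
    """Extra waveguide length that produces a given group delay."""
    return C_M_PER_S * delay_ps * 1e-12 / group_index * 1e6


REP_RATE_GHZ = 100.0  # from the text
N_CHANNELS = 4        # assumed channel count
GROUP_INDEX = 4.0     # assumed waveguide group index

delays = channel_delays_ps(REP_RATE_GHZ, N_CHANNELS)
lengths = [delay_length_um(d, GROUP_INDEX) for d in delays]
print(delays)
print([round(length, 1) for length in lengths])
```

Under these assumptions the pulse period is 10 ps, so each additional channel needs a delay line on the order of a few hundred micrometres.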

The fully connected layer with picosecond-response nonlinear activation capability comprises a signal distribution layer, regenerative signal neurons (Fig. 3a.ⅲ) with linear weights (Fig. 3a.ⅳ) and NAFs (Fig. 3a.ⅱ), and a signal bundling layer. The pulses encoded by the previous layer are passed through multiple layers of MMIs, which distribute the encoded pulse signals to different neurons for processing. Each neuron consists of synapses and activations. The output pulses from the previous layer are sent to an ID-WDM and split into different wavelengths (λ₁ + λ₂ + ··· + λ_n), weighted differently⁶³ and combined through an ID-WDM into a waveguide. Next, these pulses pass through an ANA as pump light and are filtered out at the output (the filter is not shown in the architectural diagram). Then, a new single-wavelength pulse (λ_k, k = 1, 2, ···, n; these wavelengths can be the same as those of the previous stage to achieve wavelength multiplexing) is nonlinearly activated and transmitted as the output of a single neuron to the next layer of the network. By cascading, changing the intervals between the splitting and wavelength division multiplexing channels, and regenerating wavelengths, the scale of the fully connected layer can be changed arbitrarily, which enables matrix compression, pooling, transformation, and other optical computing functional modules. After the fully connected operation is completed, the signals are sent to the output layer, a fully connected layer that connects directly to high-speed detectors for signal output.
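The signal flow through one neuron can be written as a small behavioural sketch: the wavelength channels are weighted, summed into a pump energy, and the pump sets the transmittance seen by a fresh probe pulse. The function names, the restriction of weights to the range 0 to 1 (pure attenuation), and the toy activation are assumptions for illustration only, not a model of the fabricated device.

```python
def neuron_output(channel_energies_fj, weights, activation, probe_energy_fj=1.0):
    """Weighted sum of wavelength channels acts as pump; the activation maps
    pump energy to the transmittance applied to a new probe pulse."""
    if len(channel_energies_fj) != len(weights):
        raise ValueError("one weight is needed per wavelength channel")
    if any(w < 0.0 or w > 1.0 for w in weights):
        raise ValueError("weights are modelled as attenuations in [0, 1]")
    pump_fj = sum(w * e for w, e in zip(weights, channel_energies_fj))
    return probe_energy_fj * activation(pump_fj)


def toy_activation(pump_fj, e_half_fj=4.0):
    """Hypothetical saturating response: transmittance rises with pump energy."""
    return pump_fj / (pump_fj + e_half_fj)


inputs_fj = [10.0, 2.0, 8.0]   # energies on three wavelength channels
weights = [0.5, 1.0, 0.25]     # per-channel attenuation
out = neuron_output(inputs_fj, weights, toy_activation)
print(round(out, 4))
```

Because the output is carried by a regenerated probe pulse rather than by the pump itself, the same wavelengths can in principle be reused from layer to layer, as the text notes.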

In the picosecond pulsed optical neural network architecture, the multiply-accumulate operations based on phase-change materials exhibit near-zero static power consumption. When the activation energy per computing unit is maintained below 30 fJ, the system demonstrates the potential to achieve computing power density on the order of 10³ TOPS/mm² and computing power energy efficiency density reaching 10⁶ TOPS/W/mm². Consequently, the development of reconfigurable all-optical nonlinear activators (ANAs) featuring ultralow threshold (<30 fJ), picosecond-scale response, and multi-wavelength compatibility will be pivotal for overcoming the power consumption bottleneck in next-generation ultra-high-speed optical computing networks.

To provide an initial assessment of the classification ability of the picosecond pulse optical neural network proposed above, we simplified it into a picosecond pulsed optical fully connected neural network for classification tasks (details of the architecture can be found in Section Ⅵ in the Supplementary Information). We then built a fully connected network based on PyTorch and scikit-learn libraries to simulate its performance. The nonlinear responses generated by our ANAs were fitted into an NAF curve through the linear interpolation method and normalization adjustment (see Section Ⅶ in the Supplementary Information). The NAF curves replaced the classical activation functions in the fully connected network accordingly to solve three kinds of binary classification problems.
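One plausible way to turn measured response points into an activation function is sketched below: both axes are min-max normalised and intermediate values are obtained by linear interpolation. The exact normalisation used in this work is described in the Supplementary Information and is not reproduced here, so the scheme, the helper name `make_interpolated_naf` and the sample data are assumptions.

```python
from bisect import bisect_right


def make_interpolated_naf(energies, transmittances):
    """Return f(x) on [0, 1] built from measured points by min-max
    normalisation of both axes and piecewise-linear interpolation.
    Assumes at least two points with distinct energies and a non-flat response."""
    pairs = sorted(zip(energies, transmittances))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    x_min, x_max = xs[0], xs[-1]
    y_min, y_max = min(ys), max(ys)
    xn = [(x - x_min) / (x_max - x_min) for x in xs]
    yn = [(y - y_min) / (y_max - y_min) for y in ys]

    def naf(x):
        # Clamp inputs outside the measured range to the end values.
        if x <= xn[0]:
            return yn[0]
        if x >= xn[-1]:
            return yn[-1]
        i = bisect_right(xn, x)  # index of the first knot to the right of x
        x0, x1 = xn[i - 1], xn[i]
        y0, y1 = yn[i - 1], yn[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return naf


# Hypothetical measurement points (pulse energy in fJ, transmittance).
sample_energies = [0.0, 10.0, 20.0, 40.0, 80.0]
sample_transmittances = [0.10, 0.12, 0.20, 0.40, 0.50]
naf = make_interpolated_naf(sample_energies, sample_transmittances)
print([round(naf(x), 3) for x in (0.0, 0.125, 0.375, 1.0, 2.0)])
```

A function built this way can stand in for a classical activation in a simulated network, provided the layer inputs are scaled to the normalised range.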

Three binary datasets are generated for statistical analysis: concentric circles, crescent moon shapes, and linearly separable classification, as shown in Fig. 3. Each binary classification dataset contains 1000 instances, divided into training, validation, and testing sets at a 6:2:2 ratio. We compare our designed ANAs with the identity function (no activation). As illustrated in Fig. 3b–e, the activation functions have distinct impacts on the decision boundaries in binary classification tasks, resulting in different levels of final model training accuracy. The sigmoid-type NAF (Fig. 3c) has the best classification accuracy on the concentric circle dataset (96%) and on the crescent moon dataset (94.5%). The ReLU-type NAF (Fig. 3b) has the best classification accuracy (89%) on the linearly separable dataset. Figure 3f displays the learning curves for the three datasets. The results align with the widely accepted understanding that sigmoid-type activation functions perform well in binary classification tasks. This is primarily because a sigmoid-type function maps any real number to a range between 0 and 1, making its output highly suitable for interpretation as a probability. However, owing to the shallow depth of our model, the nonlinear transformations introduced by the activation functions have a more direct and visible impact on the shape of the final decision boundary, which produces the sharp angular features in the decision boundary of GSNR AF2. Compared with GSNR AF2, GSNR AF1 and AF3 display smoother decision boundaries, which reflects their more gradual activation curves. Overall, GSNR AF2 is the best option for our network, achieving an average classification accuracy of 92.7% while maintaining high energy efficiency with a low threshold of 60 fJ.
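The dataset preparation described above can be sketched with the standard library alone. The generator below is a stand-in for the concentric circles dataset; its radii, noise level and seed are assumptions. Only the dataset size of 1000 and the 6:2:2 split follow the text.

```python
import math
import random


def make_concentric_circles(n=1000, inner_radius=0.5, noise=0.05, seed=0):
    """Generate (x, y, label) points on two noisy concentric circles."""
    rng = random.Random(seed)
    points = []
    for i in range(n):
        label = i % 2  # alternate classes so the dataset is balanced
        radius = inner_radius if label == 1 else 1.0
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x = radius * math.cos(angle) + rng.gauss(0.0, noise)
        y = radius * math.sin(angle) + rng.gauss(0.0, noise)
        points.append((x, y, label))
    rng.shuffle(points)
    return points


def split_622(samples):
    """Split samples into training, validation and test sets at 6:2:2."""
    n = len(samples)
    n_train = n * 6 // 10
    n_val = n * 2 // 10
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test


dataset = make_concentric_circles()
train_set, val_set, test_set = split_622(dataset)
print(len(train_set), len(val_set), len(test_set))
```

With 1000 instances the split yields 600 training, 200 validation and 200 test samples.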

The on-chip picosecond pulse ONN not only works effectively on simple tasks such as binary classification tasks but also performs well in more complex image classification tasks. To solve these more challenging tasks, the spatiotemporal misalignment multiplexed picosecond pulsed optical neural network proposed above was used, as depicted in Fig. 3a. This architecture could significantly enhance device reusability and efficiency. Two neural networks are constructed via PyTorch for image classification tasks on the MNIST and CIFAR-10 datasets. The network structures are based on convolutional neural networks⁶⁴ and residual networks⁶⁵, and the details of the networks are illustrated in Figure S8 (see Section Ⅷ in the Supplementary Information). The raw input data samples are shown in Fig. 4a, f. Both datasets consist of ten classes and follow a standard class-balanced split: 40,000 images for training, 10,000 for validation, and 10,000 for testing. Comprehensive visualizations of the trained networks’ internal representations are provided in Fig. 4b, g. These figures offer an in-depth look at the output of each neural network block, with color coding representing activation intensities. This detailed representation allows for a holistic understanding of how information propagates through the network, from input to output, highlighting the transformations at each stage of the model.

**Fig. 4: Performance comparison of NAFs on the MNIST and CIFAR-10 datasets.**

To monitor the training process, the current model is evaluated on the validation set at each epoch, generating the learning curves shown in Fig. 4c, h. When different activation functions are used to train on a dataset, variations in accuracy occur owing to their influence on the model’s nonlinear capabilities and gradient propagation. On both datasets, the ReLU-type GSNR AF shows the best performance, with 97.53% classification accuracy on the MNIST dataset and 82.96% classification accuracy on the CIFAR-10 dataset. The confusion matrices for the test dataset images are presented in Fig. 4d, i, providing a comprehensive visualization of the models’ classification performance and highlighting potential areas of misclassification. Compared with the identity function, GSNR AF1 demonstrated a 0.38% accuracy improvement on the MNIST test set and a 46.51% accuracy improvement on the CIFAR-10 test set; its training results are also compared with those of several commonly used activation functions in Section Ⅸ in the Supplementary Information. This substantial difference in accuracy improvement between the two datasets can be attributed to their inherent characteristics and complexity levels. The MNIST dataset consists of simple black-and-white handwritten digit images with relatively linear features. Consequently, a simple linear model could also achieve good classification results. In contrast, the CIFAR-10 dataset contains complex color images of objects that exhibit greater intraclass variations and a more intricate feature space, which requires more robust nonlinear feature extraction capabilities. Accordingly, the ReLU-type GSNR AF demonstrates a significant advantage on the CIFAR-10 dataset because it effectively captures and represents complex nonlinear relationships in the data, such as the interactions between object shapes, textures, and colors. The heatmaps in Fig. 4e, j illustrate the networks’ activation patterns across different image regions.
Models employing ReLU-type activation functions effectively highlight key features extracted by convolutional layers, such as areas potentially corresponding to car wheels, license plates, and the circular contours of digit ‘0’. In contrast, although a model without an NAF can detect simple features such as the central void in digit ‘0’, it struggles to effectively learn and emphasize more complex features of the car. In conclusion, GSNR AF1 demonstrates remarkable versatility by effectively capturing nonlinear features, thereby significantly enhancing the model’s classification accuracy and feature extraction capabilities across diverse datasets.

From the above model results, different tasks require distinct optimal NAFs, which emphasizes the need for reconfigurable ANAs (see Section Ⅸ in the Supplementary Information). Furthermore, to evaluate the practical performance of our picosecond optical pulse neural network architecture, we theoretically projected its classification capability on the MNIST dataset in Section Ⅹ in the Supplementary Information. Using optimistic yet reasonable estimation methods, our architecture demonstrates the potential to achieve a computational density of 2.13 × 10³ TOPS/mm² and an energy efficiency density of 0.71 × 10⁶ TOPS/W/mm² within a compact 4.15 mm² chip area, revealing the promising potential of all-optical neural networks compared to conventional electronic approaches.
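The projected figures can be related to one another with simple arithmetic. The sketch below assumes that both densities are normalised over the same 4.15 mm² chip area, so that chip-level throughput and efficiency follow by multiplying by the area; the implied power is a derived quantity under that assumption, not a value reported in the text.

```python
COMPUTE_DENSITY_TOPS_PER_MM2 = 2.13e3        # from the text
EFFICIENCY_DENSITY_TOPS_PER_W_MM2 = 0.71e6   # from the text
CHIP_AREA_MM2 = 4.15                         # from the text

# Chip-level throughput: density times area.
total_tops = COMPUTE_DENSITY_TOPS_PER_MM2 * CHIP_AREA_MM2

# Chip-level efficiency, assuming the efficiency density uses the same area.
efficiency_tops_per_w = EFFICIENCY_DENSITY_TOPS_PER_W_MM2 * CHIP_AREA_MM2

# Power implied by the two figures (the area cancels in the ratio).
implied_power_w = total_tops / efficiency_tops_per_w

print(f"throughput = {total_tops:.1f} TOPS")
print(f"efficiency = {efficiency_tops_per_w:.3e} TOPS/W")
print(f"implied power = {implied_power_w * 1e3:.1f} mW")
```

Under this reading the two densities together correspond to a chip-level power on the order of a few milliwatts.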
