Genetically programmable optical random neural networks



Numerical studies

Before constructing the experimental setup, we use an accurate beam propagation method (BPM) simulation to validate our main idea of controlling optical random projection. The details of the simulation are given in Supplementary Note 1. Our design leverages the Johnson-Lindenstrauss lemma [32], which implies that random projection can be used for dimensionality reduction [33] while approximately preserving pairwise distances, a property at the core of learning from high-dimensional data. We benchmark our idea on a binary classification task using the Breast MNIST dataset, which contains 780 samples. With an 80/20 train-test split, we obtain the GA accuracy evolution plot presented in Fig. 1c. In GA accuracy plots, the fitness of each candidate angular orientation is denoted with a dot. Boxes extend from the first quartile (Q1) to the third quartile (Q3) for a given generation, and the whiskers extend to the farthest data point lying within 1.5× the inter-quartile range (Q3–Q1) of the box edges. Flier points are those beyond the ends of the whiskers. The median value in a generation is denoted with a horizontal line. As can be seen from the figure, the simulation results improve on the baseline ridge classification accuracy of 66.67%, which corresponds to unprocessed samples. After the first GA iteration, a ridge classification accuracy of 67.31% is obtained, which is already above the baseline. Guided by the classification accuracies in the following iterations, the GA identifies an optimal point at which the ridge classification accuracy reaches its maximum, 77.56%. We illustrate the improvement in classification accuracy by plotting confusion matrices at the beginning (Fig. 1a) and the end (Fig. 1b) of the programming process. The confusion matrices are normalized such that the entries in each row sum to 100, barring rounding errors.
These confusion matrices for the simulated Breast MNIST dataset demonstrate that classification performance can be significantly improved by programming the optical computing platform and searching for an optimal random projection kernel heuristically. After validating the programmability of optical random projection, we move on to the experimental setup and results in the next subsection.
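The row-wise normalization used for the confusion matrices can be sketched as follows. This is a minimal illustration in plain Python, not the authors' processing code; the function names and the toy labels are invented for the example.

```python
def row_normalized_confusion(y_true, y_pred, n_classes):
    """Confusion matrix whose rows (true classes) each sum to 100."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    normalized = []
    for row in counts:
        total = sum(row)
        # A class that never occurs in y_true is left as a row of zeros.
        normalized.append([100.0 * c / total if total else 0.0 for c in row])
    return normalized


def accuracy(y_true, y_pred):
    """Overall accuracy in percent."""
    return 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels, invented purely for illustration (not data from the study).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0, 1, 1]
cm = row_normalized_confusion(y_true, y_pred, n_classes=2)
acc = accuracy(y_true, y_pred)
```

For scale, a 20% test share of 780 samples is 156 samples, so a single additional correct prediction moves the accuracy by about 0.64 percentage points.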

Fig. 1: Simulated learning results for the Breast MNIST dataset.

Row-wise normalized confusion matrices (CMs) corresponding to the first (a) and the last (b) genetic algorithm (GA) iteration. As classification accuracy (Acc) increases, the confusion matrices become more diagonal. c Evolution of the accuracy during GA. The starting and the best accuracy metrics are reported with blue- and red-filled squares, respectively. The fitness metric of each gene is denoted with a data point, and the distribution of fitness metrics for a given generation is visualized with a box–whisker plot. Box edges range from the first quartile (Q1) to the third quartile (Q3), and the whiskers (error bars) extend to the farthest data point lying within 1.5× the inter-quartile range (Q3–Q1). The median value in a generation is illustrated with a horizontal line.
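The box–whisker convention described in the caption can be reproduced with a few lines of standard-library Python. This is a sketch under the assumption that quartiles are computed by linear interpolation (the "inclusive" method of `statistics.quantiles`); the plotting tool used for the figures may interpolate differently, and the fitness values below are invented for illustration.

```python
import statistics


def box_whisker_stats(values):
    """Quartiles, whisker ends, and fliers for one GA generation."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the farthest data points still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Fliers are the points beyond the whiskers.
        "fliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical fitness values (accuracy in %) for one generation.
generation = [60.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 90.0]
stats = box_whisker_stats(generation)
```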

Experimental studies

The experimental realization of our genetically programmable optical random neural network is illustrated in Fig. 2. It consists of a continuous-wave (CW) laser source, a spatial light modulator (SLM), a pair of lenses to perform Fourier transforms, a scattering medium placed on a disk located in the Fourier plane to provide optical random projection, and a camera to collect the optically processed information. Optical information is encoded by displaying each sample of the dataset, an 8-bit image, as a phase pattern on the SLM. After the encoding step, the information-carrying optical field is processed linearly, with no power consumption, all the way to the recording plane. The optical random projection (mapping) is realized with a diffuser.
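As a rough illustration of phase encoding, the sketch below maps 8-bit gray levels to a unit-amplitude complex field. The linear mapping of gray levels onto the interval [0, 2π) is an assumption made for this example; the actual SLM calibration is not stated in the text, and `phase_encode` is a hypothetical helper name.

```python
import cmath
import math


def phase_encode(image, max_phase=2 * math.pi):
    """Map 8-bit pixels to exp(i*phi), with phi = max_phase * pixel / 256.

    The linear gray-level-to-phase mapping is an assumption, not a
    calibration taken from the source.
    """
    return [[cmath.exp(1j * max_phase * px / 256) for px in row] for row in image]


# A tiny made-up 2 x 2 "image" with 8-bit values.
image = [[0, 128], [64, 255]]
field = phase_encode(image)
```

Every element of `field` has unit magnitude: the information is carried only by the phase, so the SLM does not attenuate the beam in this idealized picture.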

Fig. 2: Schematic of random projection-based programmable optical computing setup.

The experimental setup consists of a spatial light modulator for encoding information onto laser light, a scattering medium on an off-axis rotation stage, an imaging system to relay the light, and a camera to decode the information. The scattering medium performs random matrix multiplication in the complex Fourier domain, corresponding to convolution in the space domain. Because the center of rotation of the disk is aligned off-axis, rotating the disk produces an approximate translation of the random pattern, along with a minor rotation, at the detection plane. Therefore, at each angular orientation, pixel values at the camera are modified either partially or completely.

When coherent light passes through the diffuser, different optical paths are formed at different points on the scattering medium, and at the camera plane, we obtain a speckle pattern. A small change in the orientation of the diffuser changes the result of the element-wise matrix multiplication. Since the element-wise matrix multiplication is performed in the Fourier domain, it corresponds to a convolution in the space domain. In other words, the convolution kernel depends on the orientation of the scattering medium. Therefore, we propose searching for an optimum kernel (diffuser surface) with better feature extraction capabilities. Since the information-carrying laser beam occupies a small region on the diffuser, rotating the diffuser exposes the beam to a new region and thereby changes the random projection kernel. For this purpose, we constructed a disk covered with adhesive tape, which yields a search space with only one degree of freedom (angular position), allowing us to search for better kernels easily. The center of rotation of the disk is positioned off-axis, so rotating the disk produces an approximate translation of the random pattern, along with a minor rotation, at the detection plane. If the center of rotation of the disk were aligned on-axis, a mere rotation of the random pattern would be obtained at the detection plane, leaving the pixel values unaffected. In such a scenario, since rotation is a linear operation, the readout layer would produce identical accuracy metrics across iterations. The difference between aligning the center of rotation of the disk on- and off-axis is further illustrated in Fig. S3 of Supplementary Note 5.
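The equivalence between element-wise multiplication in the Fourier plane and convolution in the space domain can be checked numerically. The following is a one-dimensional toy sketch, not a model of the actual setup: the diffuser is represented by a hypothetical strip of random unit-modulus phases, a window of which (`mask_at`) plays the role of the illuminated region, and a naive DFT replaces the lenses.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); fine for a small demo."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def circular_convolve(x, h):
    """Direct circular convolution of two equal-length sequences."""
    n = len(x)
    return [sum(x[m] * h[(k - m) % n] for m in range(n)) for k in range(n)]


def camera_intensity(x, mask):
    """Transform, multiply element-wise by the mask, transform back, detect.

    The inverse transform stands in for the second lens; a second forward
    transform would only mirror the output.
    """
    spectrum = dft(x)
    field = dft([s * m for s, m in zip(spectrum, mask)], inverse=True)
    return [abs(v) ** 2 for v in field]


rng = random.Random(0)
n = 16
x = [complex(rng.random(), 0.0) for _ in range(n)]  # toy input field
# Hypothetical diffuser strip: random phases, longer than the beam footprint.
strip = [cmath.exp(2j * cmath.pi * rng.random()) for _ in range(4 * n)]


def mask_at(position):
    """Region of the strip illuminated at a given (toy) disk position."""
    return strip[position:position + n]


kernel = dft(mask_at(0), inverse=True)  # space-domain convolution kernel
direct = [abs(v) ** 2 for v in circular_convolve(x, kernel)]
optical = camera_intensity(x, mask_at(0))
max_err = max(abs(a - b) for a, b in zip(direct, optical))

# Moving to a new region of the strip changes the kernel and the output.
moved = camera_intensity(x, mask_at(3))
change = max(abs(a - b) for a, b in zip(optical, moved))
```

Here `max_err` is at the level of floating-point round-off, while `change` is of the same order as the intensities themselves, which mirrors the statement that the kernel depends on the position of the scattering medium.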

A stepper motor is employed to control the orientation of the diffuser with the half-stepping method, where one full rotation of the disk is divided into 4096 steps. Without an optimization algorithm, trying each possible kernel individually and evaluating the resulting accuracy would take a long time experimentally. Thus, we utilize a heuristic search algorithm, the genetic algorithm (GA) [34], which is also employed in the training and design of optical neural networks [35,36,37], to maximize the accuracy, which is a function of the angular position, \(\theta\). Such a method requires only several trials to meet the designated task and decreases the optimization time in experiments. More broadly, GA belongs to the family of model-free (or black-box) optimization algorithms [38]. In contrast to model-based approaches, it does not require full knowledge of the system, and the classification accuracy can be effectively parametrized with a single parameter, the angular orientation of the disk. Although model-based approaches allow the system to be modeled with a collection of random matrices, memory issues arise with 4096 different complex random matrices. Therefore, the choice of GA simplifies the optimization process to a large degree. The use of a heuristic search algorithm like GA is also motivated by the uneven structure of the complex medium, where there is no a priori relationship between the complex random matrices obtained from distinct points significantly far from each other. However, we want to emphasize that since the rotation steps are small, we observe similarities in the output images corresponding to matrices originating from a neighborhood of consecutive angular positions in our experiments (see Supplementary Note 7 for quantitative metrics). The GA parameters used in this study and the backend processing steps are explained in the “Methods” section.
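A single-parameter genetic search over the 4096 angular steps can be sketched as below. This is a generic GA with elitism, written only to illustrate the idea; the selection, crossover, and mutation rules, the default population size and generation count, and the `toy_fitness` function are all assumptions for this example, whereas the actual GA parameters are those given in the "Methods" section. In an experiment, the fitness call would be replaced by measuring the ridge classification accuracy at the given disk position.

```python
import math
import random

N_STEPS = 4096  # half-steps per full disk rotation (from the text)


def toy_fitness(theta):
    """Stand-in for the measured accuracy at step `theta` (hypothetical)."""
    angle = 2 * math.pi * theta / N_STEPS
    return 70 + 5 * math.sin(3 * angle) + 4 * math.cos(7 * angle + 1.0)


def run_ga(fitness, pop_size=6, generations=7, mutation_span=64, seed=0):
    """Search integer angular positions in [0, N_STEPS) for high fitness."""
    rng = random.Random(seed)
    population = [rng.randrange(N_STEPS) for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: max(2, pop_size // 2)]
        children = [ranked[0]]  # elitism: keep the best angle unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: midpoint of the parents, or a copy of one of them.
            child = (a + b) // 2 if rng.random() < 0.5 else rng.choice((a, b))
            # Mutation: small random offset, wrapped around the disk.
            child = (child + rng.randint(-mutation_span, mutation_span)) % N_STEPS
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best_theta, history = run_ga(toy_fitness)
```

With these defaults the search evaluates at most 6 × 8 = 48 candidate positions, a small fraction of the 4096 available, and elitism guarantees that the best fitness never decreases from one generation to the next.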

In experiments, we first employ GA on the Breast MNIST dataset and obtain the GA evolution plot shown in Fig. 3c. With the same 80/20 train-test split as in simulations, the ridge classification accuracy is improved from 73.72% to 82.05%. The confusion matrices corresponding to the first and the best GA iteration are given in Fig. 3a, b, respectively. To further ensure that the claimed improvement is not due to noise, we perform 5-fold cross-validation on the randomly projected samples corresponding to the best GA iteration. As a result, a mean classification accuracy of 73.08% with a standard deviation of 7.95% is obtained. For comparison, the first GA iteration produces 5-fold cross-validation metrics with a mean classification accuracy of 71.28% and a standard deviation of 2.88%. We additionally perform a paired-sample Student’s t-test, and the returned value of \(h=1\) indicates that the t-test rejects the null hypothesis at the 1% significance level.
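The decision rule behind a paired-sample t-test can be illustrated with the standard library alone. The sketch below assumes that the paired quantities are per-fold accuracies of two conditions, which the text does not state explicitly, and the fold values are invented for illustration; they are not results from the study. The constant 4.604 is the two-sided critical value of Student's t distribution for 4 degrees of freedom at the 1% level.

```python
import math
import statistics

# Two-sided critical value of Student's t, 4 degrees of freedom, alpha = 0.01.
T_CRIT_DF4_1PCT = 4.604


def paired_t_statistic(a, b):
    """t statistic of the paired differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


# Invented per-fold accuracies (%), for illustration only.
first_iteration = [70.0, 72.0, 71.0, 69.0, 73.0]
best_iteration = [75.0, 78.0, 76.0, 75.0, 79.0]

t_stat = paired_t_statistic(best_iteration, first_iteration)
# h = 1 means the null hypothesis of zero mean difference is rejected.
h = 1 if abs(t_stat) > T_CRIT_DF4_1PCT else 0
```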

Fig. 3: Experimental learning results for the Breast MNIST dataset.

Row-wise normalized confusion matrices (CMs) corresponding to the first (a) and the last (b) genetic algorithm (GA) iteration. c Evolution of the accuracy (Acc) during GA illustrated with a box–whisker plot. As opposed to other experimental results, a population size of \(p=6\) is employed. The starting and the best accuracy metrics are reported with blue- and red-filled squares, respectively. The fitness metric of each gene is denoted with a data point, and the distribution of fitness metrics for a given generation is visualized with a box–whisker plot. Box edges range from the first quartile (Q1) to the third quartile (Q3), and the whiskers (error bars) extend to the farthest data point lying within 1.5× the inter-quartile range (Q3–Q1). The median value in a generation is illustrated with a horizontal line.

After experimentally demonstrating and validating the tunability of optical random projection on a small-scale dataset with samples of 28 × 28 pixels, we note that our experimental setup can accommodate high-resolution samples and fully leverage the parallelism provided by our optical computing platform; the only limit is the SLM resolution, which is 600 × 800 pixels. For this reason, we scale up and tackle a more complex classification task using the COVID-19 X-Ray dataset [24], which has higher-resolution (400 × 400 pixels) samples. However, the SLM used in this study operates at a 60 Hz repetition rate since liquid crystal technology is limited in terms of speed. To decrease the time spent on programming the random projection kernel and make our optical computing method useful for larger datasets, we propose forming a smaller subset of the dataset that serves as a proxy for the full set during the programming step. For this purpose, out of the full dataset (2481 samples), we create a subset of 300 samples containing 150 positive and 150 negative samples selected at random. As before, an 80/20 train-test split is used. Our goal is to reduce the programming time while maintaining the improvements in classification accuracy. As demonstrated in Fig. 4b, we observe a significant increase in the ridge classification accuracy, from 78.33% to 93.33%, while programming the optical computing platform with GA.
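Drawing a class-balanced proxy subset and splitting it 80/20 can be sketched as follows. The helper name, the seed, and the placeholder label list are assumptions made for this example; in particular, the class balance of the placeholder labels is invented and is not the real balance of the dataset, and whether the authors' split is stratified is not stated.

```python
import random


def balanced_subset(labels, per_class=150, test_fraction=0.2, seed=0):
    """Pick `per_class` random indices per class, then split train/test."""
    rng = random.Random(seed)
    chosen = []
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        chosen.extend(rng.sample(indices, per_class))
    rng.shuffle(chosen)  # plain random split, not stratified
    n_test = round(len(chosen) * test_fraction)
    return chosen[n_test:], chosen[:n_test]  # train indices, test indices


# Placeholder labels for 2481 samples; the 1200/1281 balance is made up.
labels = [1] * 1200 + [0] * 1281
train_idx, test_idx = balanced_subset(labels)
```

With 300 samples and an 80/20 split, the test set holds 60 samples, so each test sample accounts for about 1.67 percentage points of accuracy.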

Fig. 4: Experimental learning results for the COVID-19 X-Ray dataset when a subset of the full dataset is used for the genetic algorithm (GA).

a Confusion matrix (CM) corresponding to the first GA iteration. b Evolution of the accuracy (Acc) during GA illustrated with a box–whisker plot. c CM obtained at the end of GA. d CM corresponding to the full dataset when all the samples are passed through the optimal angular position yielded by GA. The starting and the best accuracy metrics are reported with blue- and red-filled squares, respectively. The fitness metric of each gene is denoted with a data point, and the distribution of fitness metrics for a given generation is visualized with a box–whisker plot. Box edges range from the first quartile (Q1) to the third quartile (Q3), and the whiskers (error bars) extend to the farthest data point lying within 1.5× the inter-quartile range (Q3–Q1). The median value in a generation is illustrated with a horizontal line.

To evaluate the performance of the subset method, we optically process all the samples of the COVID-19 X-Ray dataset with the optimized random projection kernel proposed by GA, employing an 80/20 train-test split on the whole dataset. An accuracy of 90.14% is achieved, compared with the baseline accuracy of the dataset, which is 74.85%. These results demonstrate that the genetically programmed optical computing platform provides approximately 15 percentage points higher accuracy. In contrast, when we run GA on the full dataset instead of 300 samples, we observe a maximum accuracy of 91.15%. This shows that by utilizing a randomly selected subset and decreasing the programming time significantly, we sacrifice only about 1 percentage point of classification accuracy, and that even a small subset gives rise to improvements in classification accuracy. The evolution of accuracy can be observed in the confusion matrices presented in Fig. 4a, c. The confusion matrix corresponding to the entire dataset is given in Fig. 4d. With the COVID-19 X-Ray dataset results, we validate the high-resolution sample processing capability of optical random projection on top of the programmability demonstrated earlier. We would like to note that such high resolutions are atypical in conventional machine learning frameworks.

After experimentally validating the programmability and high-resolution sample processing capabilities of our scheme, we explore the popular Fashion MNIST dataset with 70,000 samples [31] to benchmark the performance of our programmable optical random projection scheme and to evaluate our method’s performance for multiclass classification. As with the COVID-19 X-Ray dataset, we utilize a subset to decrease the programming time of our optical neural network. Thus, a subset of 3000 randomly selected samples with an 80/20 train-test split is created from the original dataset. To set a baseline for our method, we apply ridge classification to the selected subset and obtain 73.83% accuracy. At the beginning of the programming, after the first GA iteration, an accuracy of 71.33% is obtained. At the end of the programming, this classification accuracy is increased to 81.00% (Fig. S7b), an improvement of approximately 10 percentage points. The entire Fashion MNIST dataset with 60,000 training and 10,000 test samples is optically processed for the programmed condition, and a classification accuracy of 83.06% is achieved. The row-wise normalized confusion matrices captured during the evolution of GA (Fig. 5a, c) and inference (Fig. 5d) further support our claim that programming random neural networks effectively decreases classification errors. Additional results on high-resolution datasets can be found in Supplementary Notes 10 and 11.

Fig. 5: Experimental learning results for the Fashion MNIST dataset when a subset of the full dataset is used for the genetic algorithm (GA).

a Confusion matrix (CM) corresponding to the first GA iteration. b Evolution of the accuracy (Acc) during GA illustrated with a box–whisker plot. c CM obtained at the end of GA. d CM corresponding to the full dataset when all the samples are passed through the optimal angular position yielded by GA. As opposed to other experimental results, a population size of \(p=6\) and \(n=7\) generations are employed. The starting and the best accuracy metrics are reported with blue- and red-filled squares, respectively. The fitness metric of each gene is denoted with a data point, and the distribution of fitness metrics for a given generation is visualized with a box–whisker plot. Box edges range from the first quartile (Q1) to the third quartile (Q3), and the whiskers (error bars) extend to the farthest data point lying within 1.5× the inter-quartile range (Q3–Q1). The median value in a generation is illustrated with a horizontal line.


