Figure 1 provides a comprehensive schematic of the aggregate neural network framework. We divide the framework into four major subcategories: FDTD simulations, surrogate network, inverse neural network, and post-processing. As visualized in Fig. 1, the general process flow is FDTD simulations are used to train the surrogate neural network and the surrogate is used to make large-scale predictions derived from a library of materials. The predictions are then used to train an inverse neural network component. The input of the inverse network is a desired optical spectrum (λ, ε, R, T) and the output is the predicted micropyramid base span, height, substrate thickness (Xspan, Zspan, and tsub), and a vector of complex refractive index values (n(λ), k(λ)) that correspond to the input wavelength. The post-processing module then interprets the output. Here, user constraints—such as the maximum aspect ratio—are used to adjust the predicted output and provide appropriate new solutions that satisfy the restrictions. The adjusted solutions are then passed through both FDTD and the surrogate model. Based on two metrics—the error between the desired input and the constrained output, and the error between the surrogate and FDTD outputs—a decision is made to either retrain the surrogate with the new simulation data, or to stop the loop if the solution is deemed to be sufficient and accurate. Any details not discussed in any of the major subcategories for all modules and connections can be found in the methods section, the supplementary materials, or in the linked code repository.

Flow-chart representation of aggregated neural network methodology and inverse network architecture. Solutions generated in the surrogate (forward) solution are used to train the inverse solver. The input to the inverse neural network is a 400 × 1 vector of wavelength, and the wavelength dependent emissivity, reflectivity, and transmissivity. The output of the inverse neural network is a set of material properties that correspond to the input wavelength and geometric properties. The output is evaluated in FDTD and if the results violate user set constraints, alternative solutions are calculated that fall within the set constraints.
FDTD simulations
Solutions using the FDTD method, while accurate, can be time intensive—this is especially true for large or geometrically complex structures. For this work, the training, validation, and testing data used by the surrogate neural network is compiled from simulations completed in Lumerical’s commercially available 2D/3D FDTD solver. The simulation framework provides exact solutions for Maxwell’s equations across a finite element mesh, and the absorption and dispersion are derived from the resulting electrical fields28,74. Rather than simulating a 3-dimensional (3D) pyramid microstructure, we simulate the middle-cross section in 2-dimensions (2D) to minimize simulation time and to enable the generation of large quantities of simulation data. While this does lead to an overestimation of the micropyramid’s emissivity20 compared to the 3D micropyramid simulations, the results are still accurate as we do not vary the incidence angle in our simulations and assume the broadband wavelength source to be at a normal angle to the material’s surface. Additionally, while we could choose to use a semi-analytical approach like RCWA to run the simulations75 to estimate the optical properties of a 2D structure, FDTD’s accuracy, scalability, and its applicability to other more complex geometries make it a far viable long-term solution. The simulations are based upon a micropyramid geometry visualized in Fig. 2, with the key independent geometric parameters being the triangle base span (xspan), height (zspan), and substrate thickness (tsub). For this work, we assume that Kirchhoff’s law is valid and the emissivity can be derived from α = ε = 1 – R – T, where reflectivity (R) and transmissivity (T) are calculated from power monitors above and below and domain respectively and where absorptivity (α) is synonymous with emissivity (ε)72. To develop the simulation datasets used to train the surrogate, we generate and simulate matrices of randomly generated uniformly values for the xspan, zspan, and tsub for each material included. For simplicity, we assume no additional coating materials, hierarchical structures or surface roughness. Additional details on our FDTD simulation methodology can be found in both the methods section and in our prior work2,11,20,32.

(a) Images for the convolutional neural network are formulated using the wavelength dependent material data. Each image contains information for a singular wavelength point. (b) Diagram of the convolutional neural network process for predicting material dependent optical properties from the generated images.
Surrogate neural network
Deep learning modules have been shown to be exceptionally strong and versatile in solving the so called “forward” problem36,52. In this case, the problem to solve is the optical response from a geometric and material input for a uniform periodic micropyramid surface. The design intent of the surrogate neural network is to act as an ultra-fast and accurate predictor of optical properties such that we can rapidly and accurately predict the optical properties of vast quantities of simulations. Furthermore, it is important that the surrogate network can extrapolate optical properties for materials beyond its original training scope. As such, we compare two methods that serve this function: an improved version of a previously developed deep-neural network72, and an image-based deep convolutional neural network (DCNN). Both methods utilize datasets generated in FDTD that are subdivided into training, validation, and test datasets.
Surrogate neural network: deep neural network
The architecture of the deep neural network—visualized in Fig. 1 of our previous work72 and in the supplementary materials—is designed to emulate the critical simulation inputs that influence the computed optical properties. The network employs a total of 8 input neurons: 3 geometric inputs (xspan, zspan, tsub), the source wavelength (λ), and 4 material inputs (n, k, εreal, εim). The substrate thickness is a key geometric parameter to consider as it enables the model to interpret the relationship between the spectral optical properties and underlying material thickness and ultimately to predict the broadband spectral behavior of transmissive materials. Micropyramids of different materials are differentiated using discrete material inputs for the complex refractive index (n and k) and the correlated permittivity values (εreal, εim). Compared to using only n and k, using both the complex refractive index and the permittivity is essential to accurately extrapolate the optical properties of materials not seen in the training process. The source wavelength (or frequency) is the fundamental factor that links the geometric and material inputs. For each FDTD simulation, we simulate 100 wavelength points (100 frequency points), each of which has a discrete solution for reflectivity and transmissivity. Accordingly, each simulation is divided into 100 discrete input vectors as the solution to Maxwell’s equations is not sequentially dependent. To strengthen the connection between the input and the output optical properties (R, T) and the key independent parameter (λ), we utilize two smaller multi-layer perceptron groupings (MLPs) to separately consider the relationship between the geometry/wavelength and the material data/wavelength. The MLPs outputs are concatenated and fed into a larger deep neural network, the output of which is a reflectivity and transmissivity value.
The DNN method is effective at being both quick to predict and in making accurate predictions, even when extrapolating. The DNN surrogate neural network has a mean absolute error (MAE) and mean-squared error (MSE) between the simulation data and predictions of 0.0033 and 1.35e-4 respectively for the “test” dataset—data that is held back from the training/validation process. As the network’s design is not limited by constraints in material input, a fundamental evaluation of the surrogate’s performance is in the prediction of the optical properties of microstructures made of materials that are outside of the scope of training. Thus, we evaluate the network on two large (1500 simulation) “unseen” datasets Al2O3/Ti, and on 100 simulations of 25 other materials in a “library”. When the simulations from these datasets are completely unseen by the training/validation process, the DNN yields an MAE between prediction and simulation of 0.0175, 0.0131, and 0.0279 for the Ti, Al2O3, and Material library datasets respectively. As the optical properties are already on a scale of 0 to 1, these errors indicate an exceptional degree of prediction accuracy when extrapolating for new materials. To improve the prediction accuracy when extrapolating, the model benefits from small “calibration” datasets. By including 5–10 simulations from the “unseen” datasets (< 1%) to the training/validation process of the surrogate, we reduce the prediction MAE to 0.0073, 0.0049, and 0.0118 for the Ti, Al2O3, and material library datasets respectively. The included simulations represent an almost insignificant number of simulations when compared to the original training and validation dataset (< 0.05%). Despite this, the inclusion has a dramatic impact on the remainder of the extrapolated data, indicating the model has strong physical understanding and only several simulations are needed to “calibrate” the model to new material behavior. In addition to the observed accuracy, the model can make predictions exceedingly fast– with over 1 million individual input sets per minute.
Surrogate neural network: convolutional deep neural network (CDNN)
The architecture of a second proposed surrogate method based on image processing is shown in Fig. 2. Here, we enhance our neural network design philosophy of mimicking the FDTD optical solver by making a network that analyzes a pseudo-mesh. In FDTD, the optical solution for a given combination of material and geometry is derived from solving Maxwell’s equations across a discretized mesh76. The only way the model can differentiate between two distinct materials (e.g., air and the pyramid) is by assigning the λ-dependent material properties to each cell. Here, we approximate that process by generating an image that utilizes the spectrally dependent material and geometric information.
An image is effectively just a tensor—as shown in Fig. 2, we take a three-dimensional matrix of material information and translate it to a standard RGB image, with each pixel containing a vector of material data. While the convolutional process is compatible with higher or lower order tensors, for ease of use and to simplify data storage/the image generator, we utilize a standard 3-channel color image. The vector used to generate each image is the same 8-input vector described in the previous section, with two additional static background material properties (nbkg = 1 and kbkg = 0). As the maximum Xspan and Zspan in the simulations are fixed to a maximum of 10 μm, each image is set to be an effective 10 × 10 μm—with the vertical and horizontal pixel resolution defining the “cell” length. To minimize memory consumption, we employ a 256 × 256-pixel configuration. This effectively means that each pixel represents ~ 40 nm, indicating that the minimum feature size we can effectively depict is ~ 40 nm. Thus, we eliminate any simulations with a pyramid base size or height less than 40 nm. As visualized in Fig. 2, we build pyramids symmetrically about the center of the image, filling in the remainder of the space symmetrically until the combined pyramid base span is 10 um. As an example, a pyramid with a base span of 10 um will perfectly fill the bottom horizontal axis of the image. A pyramid with a base span of 1 um will be replicated a total of 10 times in the image.
Two set of inputs CDNN architecture evaluates two inputs: the generated image and the 8-input vector of geometric, material, and wavelength information. The image component is interpreted by a convolutional neural network. The convolutional neural network is comprised of multiple “units”—each “unit” contains a convolutional layer with a ReLU activation function (defined by Eq. 1) followed by a max-pooling layer.
$$f\left( x \right) = \left\{ {\begin{array}{*{20}l} 0 \hfill & {for\; x < 0} \hfill \\ x \hfill & {for\; x \ge 0} \hfill \\ \end{array} } \right.$$
(1)
After several convolutions with different filter configurations, we flatten the output and apply a dropout before a dense layer to limit overfitting. The second component, the 8-input vector is used as an input to a deep neural network. This input vector contains details that either cannot be in the image—such as the thickness and excluded material properties—or information present in the image such as the material information and geometry to reinforce the model’s interpretation and prediction performance. The output of this DNN is concatenated with the output of the CNN, and then passed through a final set of dense hidden layers. The model’s output is the same as the DNN’s output—the reflection and transmission value. Precise network design details are in the methods section.
This surrogate method is more effective at accurately extrapolating optical properties for new materials when compared to the DNN only surrogate method. When only 20% of the available simulation data is used in the training/validation/testing process, we can match or exceed the performance of the DNN. The precise performance is dependent upon the selection of the material properties used in the three available pixel matrix dimensions. While the selection of the first two-pixel dimensions (the complex refractive index) is straightforward, the third quantity was a point of study. In Table 1, we show the performance of the CDNN architecture in predicting the Ti, Al2O3, and material library datasets when different quantities are used in the third matrix dimension. Based on these results, we observe that the wavelength is the most effective parameter to use in the third dimension. This provides additional confirmation that the wavelength is an extremely important parameter in enabling the model to build proper connections between the input and output. The model evaluations shown in Table 1 are performed with the complete unseen datasets, we apply the 20% limitation only to the model’s training data.
When we enable the model to see the full simulation dataset in training that the DNN does (3.55 input vectors, or 3.55 million images taken from 35,500 simulations), the CDNN method significantly outperforms the DNN method. The Ti, Al2O3, and library datasets have an evaluated MAE of 0.0155, 0.0113, and 0.0226 respectively. When we calibrate the model with 10 simulations as previously demonstrated in the DNN, this decreases to 0.0067, 0.0043, and 0.0098 respectively. Despite being more accurate than the DNN, due to the relative increase in parameters and memory scale, the training time and prediction time for the CDNN is significantly longer than the DNN.
Inverse neural network
We harness the rapid prediction capabilities of the surrogate network to iteratively train a neural network that solves the inverse design problem. That is, we invert the forward problem to predict what material and microstructure geometry will best match a desired system optical response. The input of this network is the spectrally dependent reflectivity, transmissivity, and emissivity corresponding to a desired wavelength range.
The architecture of the inverse neural network, as depicted in Fig. 1, solves the inverse problem by considering the entire spectral distribution. The network input (400 × 1) is a vertically stacked combination of the predicted reflectivity, transmissivity, derived emissivity (ε = 1 – R – T), and the wavelength vector the optical properties are sequenced to. Correspondingly, the inverse network output is a vertically stacked combination of the geometric input and wavelength dependent material properties (n, k, εreal, εim, 403 × 1) as visualized in Fig. 1. Unlike the surrogate network, we cannot separate the inputs of an inverse network into single input vectors. An “inverted” solution for a single set of wavelength dependent optical properties has an unbounded number of potential solutions, so to design an effective inverted network the input must be the entire sequence. In our first design iterations of the deep neural network surrogate, we considered using the entire sequence of wavelengths/material data as an input and reflectivity/transmissivity as the output. While this method was effective, because Maxwell’s field equations are not sequentially dependent, the surrogate solver was much more effective when the sequences were broken up and individual vectors based on a single wavelength point were used as an input. This method also dramatically expands the scope of the training set from 35,500 simulations to 3.55 million input sets, making a limited number of simulations more effective in developing a surrogate with physical insight that can solve the forward problem more accurately. Once the surrogate is trained and can produce accurate results, however, the number of simulations becomes trivial, as we can effectively estimate the solutions to 10,000 FDTD simulations (with 100 wavelength points each) in approximately 60 s using the surrogate network.
To generate training data for the reverse neural network, we pass in large grids of data to the surrogate network for prediction and collate the output into discrete input and output sets. For each material in a library, we generate a grid of 200 × 200 geometric combinations. These combinations are formed by meshing linearly spaced vectors for the Xspan and Zspan. The minimum and maximum values for these vectors are based on the minimum and maximum observed value of Xspan and Zspan in the surrogate’s training dataset. In total, the grid has 40,000 geometric combinations (or 40,000 simulations) for a single material. For each geometric combination we attach a wavelength vector of length 100, leading to a sum of 4 million inputs per material that are passed to the surrogate. While the wavelength vector attached to each geometric combination was originally a linearly spaced vector that ranged from λ = 0.3 um to λ = 16 um, we found that using a linearly spaced wavelength vector with randomized min/max values for each geometric combination increased the versatility of the training dataset and thereby increased the robustness of the inverse neural network. All generated grid data is normalized before being passed into the surrogate network for prediction. The final non-material parameter—the substrate thickness—is also randomized via a uniform random generation process. Details on the random generation process for the substrate thickness and wavelength vector can be found in the methods section. This grid generation process is repeated across all materials in a material library. The material library contains 50 materials: a list of the materials and their references are provided in the supplementary document. The number of materials in the library is easily scalable and are a non-exhaustive representation of material properties available for a microstructure. In total, we use the surrogate network to estimate 2 million simulations, or 200 million sets of inputs. We then sequence the predicted optical properties using the wavelength vector (of length 100) for each simulation.
The inverse network contains three distinct neural network components that are designed to work in tandem to extrapolate a geometry and material that best fit the desired optical response. The first of the components is a deep neural network consisting of multiple hidden layers that directly take in the (400 × 1) input vector. On a rudimentary level, simply inverting the surrogate’s DNN structure—but with the progression of wavelengths instead of an individual wavelength point—could be effective. Through our development process, however, we discovered that this more simplistic approach lacked physical insight and would often result in a non-physically viable output. Although the solutions to Maxwell’s equations for a given wavelength, material, and geometry—the problem the forward network addresses—are not sequentially dependent, abrupt changes or singularities in material properties across a spectrum are seldom. Thus, developing insight on the relationship between a sequence of optical inputs and material properties is crucial in building a physically grounded model. To address this, we remap the linear sequence of optical properties into a “time”-dependent matrix and use it as an input to a recurrent neural network (RNN). That is, we map the 400 × 1 vector of (λ, ε, R, T) into a 1 × 100 × 4 matrix (λ, ε(λ), R(λ), T(λ)). We select bi-directional long-short term memory (LSTM) layers as the constituent component to the RNN. LSTM networks are more effective than other RNN methods for long-range dependencies in data77, and the bi-directional attributes enables the network to learn both dependencies in the forward and reverse direction. Additionally, we utilize dropout layers between LSTM layers in conjunction with L2 regularization to reduce overfitting. The outputs of the RNN and DNN components are then combined using a matrix multiplication and fed into a third component, another DNN. As opposed to directly linking the network output to the final DNN/RNN layers, a DNN between these two networks and the final network output facilitates an additional layer of non-linear abstraction and learning from the outputs of the two preceding neural network components.
Figure 3 shows the output of the inverse neural network for several broadband test inputs and results once the network outputs are simulated in FDTD. We utilize three thermally relevant test spectrums as a baseline evaluator of the inverse network—unity emissivity, an ideal heating emission spectrum, and an ideal cooling spectrum. These emission spectrums are shown in Fig. 3g–i. For these test cases, we set the spectral transmission to be 0 and R = 1 – ε. In Fig. 3a–c we compare the material properties predicted by the neural network to the material properties of a material in the library that best matches it. The predicted geometric conditions are given in Table 2. For all cases, the projected material properties have a close match in the library. In Fig. 3d–e we compare the results of FDTD simulations using the network generated material and the closest match material for the same predicted optimal geometry. For both the heating and unity case, we observe that the neural network generated material outperforms the library material. Additionally, we note that both the generated material and the library material produce a result that matches the desired input to a high degree of accuracy. This is despite the input having a non-physical step-function behavior. The ideal cooling spectrum (Fig. 3i) has a larger departure between the desired spectrum and the true outcome for both the ML and library generated materials. The observed error is attributed to the physical limitation of material properties and the imposition of zero spectral transmission. This assumption is outside of usual physical intuition for the ideal cooling case, where due to the physical material limitations, most emissive materials (e.g., TiO2, Al2O3, PDMS, etc.) in the infrared are transmissive in the ultraviolet (UV) to NIR wavelengths. Thus, this represents a design challenge for a single material to perform both functions, and the inverse neural network attempts to abstract a physically bounded material that fits zero spectral transmission. The identified properties match well across the broader spectrum but do not capture the intended performance in the visible to near infrared regions (λ = 0–4 um). If we allow transmission in this region, we receive an expected output of PDMS, details for which can be found in the supplementary document.

(a–c) Emissivity spectrums of three test cases (Ideal heating, ideal cooling, and unity emissivity) input into the inverse spectrum. The reflectivity is computed as R = 1 – E, and transmissivity is set to 0. (d–f) ML generated refractive index (n) and extinction coefficient (k) for each of the test cases compared to the material properties of a material in the library that most closely matches it. (g–i) FDTD simulation results for both the ML generated material and the closest matching library material.
The inverse neural network is not limited to broadband design. In Fig. 4, we show how the inverse neural network can be applied to narrowband microstructure design. In this case, we define narrowband as 2 emissivity points with a unity value around the intended wavelength peak. The inverse neural network results for 6 different wavelength points (1, 2, 3, 4, 5, and 15 um) are shown in Fig. 4a–f. The design outcome highlights both the strengths and weaknesses of the presented inverse neural network methodology. The geometric design space is limited to the relatively simple micropyramid geometry and can only utilize one material. Thus, with the implemented neural network architecture, our model stays within constrained and physical material behavior, attempting to find valid solutions without creating a completely arbitrary material. This leads to valid narrowband solutions in the low wavelength regions where geometry can be attenuated to generate plasmonic resonance and resonant behavior. This behavior is particularly evident in the solutions visualized in Fig. 4c,d, with peaks at or near the desired location, albeit with either reduced performance or peaks beyond the desired location. The neural network can readily identify solutions that are physically feasible, but is challenged to find resonant behavior that results in narrowband solutions for the mid-infrared wavelengths. These plots result from the both the physical limitations imposed by the input, the training data available to the network, and the fundamental physics of the micropyramid system. Despite these challenges, the inverse network still can be shown to identify physical behavior outside of the scope of its training data. In Fig. 4d, the surrogate model’s predictions do not indicate resonant narrowband behavior at 4 um, but when simulated the inverse neural network output shows a significant degree of narrowband performance. This indicates that the reverse neural network can abstract solutions beyond the training data and identify behavior that the surrogate cannot, but the network is still constrained by the fundamental physics.

(a–f) Narrowband simulation results using the inverse neural network. The input spectrum has an emissivity of 0.05 throughout the λ = 0.3 to 16 um, except for two points that define the “peak” location which have an emissivity of 1. The reflectivity is defined by E = 1 – R and the transmission is set to 0. The results are compared to the desired input as well as the surrogate predictions for the same material.
Post-processing: constraints and solution viability
While the inverse neural network output can accurately predict material and geometric properties that result in the desired optical spectrum, a key limitation of our open-ended neural network architecture is that it cannot directly accommodate design constraints. This presents a significant challenge in making the inverse design process functional. Ideally, our aggregated network will output a solution that is translatable to a fabricated surface morphology. This challenge is evident from the geometric results for the ideal heating case in Table 2; the aspect ratio of the ML predicted structure is ~ 400 (Zspan/Xspan), which is clearly an impractical microstructure. To address this, we take the output of the neural network and use a post-processing methodology to provide new solutions that fit within user set constraints. Though it is possible to constrain the neural network itself via methods such as custom activation functions on the output neurons, limiting the input dataset, or introducing limits in the input, we choose to post-process the neural network output to maintain a robust inverse solver. For this work, we focus on constraining the aspect ratio as it plays a key role in determining if a microstructure is manufacturable. Other constraints, such as a material’s maximum temperature, thickness limitations, etc., are important and can be easily incorporated for more advanced design optimization.
The post-processing methodology has several stages: inverse prediction, material matching, geometric adjustment, surrogate prediction, simulation, and finally output comparison. Precise details for all the stages of the post-processing method are provided in the methods section. The first stage is to take an optical spectrum, pass it through the inverse network, and output a set of geometric properties and spectral material information. From there, the ML generated material data is compared to existing materials in the material library. We then adjust the ML generated geometry to align with the set maximum aspect ratio. Using the adjusted geometry, we randomly select new constrained/viable geometries and simulate them using the surrogate; the most optimal solutions are passed to FDTD. The post-processing method then compares the “ground-truth” FDTD to both the surrogate output and the desired input. This process is performed for both the ML-generated material and “best-fit” library material and identifies a constrained geometry that is optimal for both the ML-generated material and selected library material.
Figure 5a,b shows the post-processing method’s application to select a new viable solution for the ideal heating case discussed above. For this demonstration, we show solutions when the aspect ratio (Z/X) is limited to 10, 5, and 1. The new geometric solutions are simulated using the surrogate, and the results are compared to the predictions for the unconstrained ML generated geometry. Compared to the desired input spectrum, the constrained cases have an LSE value of 1.229, 1.396, and 1.567 for AR = 10, 5, and 1 respectively. It is evident that while the results decrease in adherence to the desired spectrum as the aspect ratio is limited, it is also evident from Fig. 5c that the adjustments to the geometry to accommodate the limited aspect ratio constraint still yield highly optimal results. It is evident that these solutions deviate from the global maximum but are still highly effective when constrained. If the constrained solution is deemed to not be viable enough, additional materials can be included in the search, forming a more advanced material matching algorithm than previously utilized20,72.

(a) Example of vertical and (b) horizontal reorientation of inverse ML output and subsequent generation of randomly distributed solution points about the adjusted geometric solution. Example shown is using an aspect ratio limitation of 5. These solutions are evaluated using both FDTD and the surrogate, the most optimal new solution is using a process described in the methods section. (c) The identified optimal point at each aspect ratio is shown for each aspect ratio. While the solution is not as optimal as the original ML generated geometry, the we can still identify geometric designs with exceptional performance.
Aggregated neural network
By combining the individual components, we form the aggregated system loop visualized in Fig. 1. The aggregated system is designed in a way that it can learn from previous mistakes and enhance its capabilities and accuracy over subsequent generations. Despite the accuracy of the image-based surrogate, especially for unseen materials, the time required to train and retrain the image-based convolutional network compared to the simpler deep neural network surrogate eliminates it as an option for the aggregate system. The geometric/wavelength simulation grid data generated for each material is passed into the DNN surrogate and the ensuing predictions are the basis of the inverse network training data. If changes are made to the surrogate or more materials are included in the library, the grid data will be regenerated. We then employ the broadband and narrowband test cases in Figs. 3 and 4 to test the performance of the inverse network in making predictions. If either significant deviations between the desired result and actual result are encountered or the surrogate prediction deviates significantly from the FDTD results, we perform additional simulations via our post-processing solution generation method. These results are incorporated in the training data for the surrogate, and the surrogate/inverse network are retrained. Post-processing then serves an additional function as a pseudo-adversarial checkpoint where the generated results are compared to true results and if there is an undesired outcome the network is retrained with new simulation information.
