In-orbit demonstration of a re-trainable machine learning payload for processing optical imagery



The ML payload

The first and simplest component of a machine-learning-enabled satellite network is the ML payload. This is a self-contained machine learning software module that can be considered analogous to a hardware payload, such as a camera or sensor. ML payloads are typically encapsulated in a virtualised software container, isolated from the base computing environment and adjacent software. Podman13 and Docker14 are two of the most popular virtualisation frameworks, enabling guest operating systems with custom environments to be distributed as single ‘image’ files with all dependencies included. Containers are usually compiled in a layered fashion, such that multiple containers can extend a shared ‘base’ image containing common dependencies. This greatly simplifies the process of developing software for onboard processing, allowing the satellite computing module to offer a familiar and consistent base system (e.g., a standard Linux-based tool-chain and Python software stack), with hardware and data access exposed through a simple application programming interface (API).

ML payloads also offer a simple, relatively risk-free pathway to upgrade, correct or enhance satellite capabilities. For example, neural networks can be re-trained to perform better (e.g., by utilising newly available data, taking advantage of new acquisition parameterisations, adapting to the specifics of a new sensor, or responding to previously unseen events), or even to recognise more classes of terrain in images. At a minimum, only the weights of the network need to be altered and the pre-validated supporting software stack can be left unchanged. Hence, the risk of introducing bugs through code changes is greatly reduced. Network weight definitions are also significantly smaller in size (\(\sim\) 1–20 MB) than a full software stack, meaning much lower upload costs.
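This weights-only update path can be sketched as follows. The file names, format and `SegmentationModel` class are purely illustrative, not the mission's actual interface; the point is that the validated inference code never changes, only the small weight file.

```python
import json
from pathlib import Path

class SegmentationModel:
    """Toy stand-in for the on-board network: fixed code, swappable weights."""

    def __init__(self, weights_path):
        self.load_weights(weights_path)

    def load_weights(self, weights_path):
        # Only the weight file changes during an upgrade; the pre-validated
        # inference code stays exactly as flown.
        self.weights = json.loads(Path(weights_path).read_text())

# Simulate the initial flight weights and a re-trained replacement set.
Path("weights_v1.json").write_text(json.dumps({"version": 1, "conv1": [0.2, -0.1]}))
Path("weights_v2.json").write_text(json.dumps({"version": 2, "conv1": [0.3, -0.4]}))

model = SegmentationModel("weights_v1.json")
model.load_weights("weights_v2.json")  # uplink only the small weight file
```

In practice the uploaded file would be a serialised tensor archive rather than JSON, but the pattern is the same: a small data upload replaces a full software redeployment.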

Figure 1. Picture of the D-Orbit ION Satellite Carrier Dauntless David being prepared for launch. After reaching LEO on board a SpaceX Falcon 9 rocket the satellite disconnects and ferries client SmallSats into custom orbits. The satellite also carries internal hardware payloads, one of which is D-Orbit’s Cloud Computing in Space module, which is used to run the WorldFloods ML payload. Image used with permission of D-Orbit.

The Wild Ride mission: an ML payload testbed

Trillium has partnered with D-Orbit15, Unibap16 and ESA \(\Phi\)-Lab17 to build and test an ML payload on a prototype satellite constellation node. D-Orbit is a space logistics and transportation company offering MicroSat and CubeSat deployment services through their ION Satellite Carrier18. The D-Orbit Wild Ride mission for the carrier ION SCV Dauntless David successfully launched into LEO on a SpaceX Falcon 9 rocket on June 30th 2021 (see Fig. 1). In addition to seven satellites destined for deployment to multiple orbits, the carrier also included three internal demonstrator payloads, including D-Orbit’s Cloud Computing in Space module, the first iteration of an on-orbit cloud computing module being developed by Unibap.

The Cloud Computing in Space module can be considered a precursor to a fully-fledged space cloud node, offering a quad-core x86 64-bit processor, a Microsemi SmartFusion2 FPGA and an Intel Movidius Myriad X Vision Processing Unit (VPU). In particular, the onboard Myriad X processor accelerates machine learning inference and makes it possible to deploy deep artificial neural networks (ANNs) in a power-constrained environment (1 TFLOPS of compute at a nominal consumption of 1 W11). The Myriad X chip underwent radiation characterisation in ESA test facilities and has already been tested in space on the \(\Phi\)-Sat-1 mission11. The module also carries the D-Sense sensor module19, which includes a basic RGB camera, similar to a standard webcam. Dauntless David will remain in low Earth orbit for approximately two years, conducting engineering tests and experiments.

Figure 2. Schematic diagram of the WorldFloods ML payload for the Unibap SpaceCloud Framework. The inference pipeline is built in Python and uses the Intel OpenVINO Inference Engine for the Myriad X processor. The application is contained within a Docker environment and accesses data in externally mounted directories, colour-coded green here. The SpaceCloud Framework manages sensor access and communications, and provides a standard Linux/GNU computing environment, which greatly simplifies the development process.

The WorldFloods ML payload

For this project we chose to deploy the WorldFloods ML payload, which was developed in partnership with ESA during Frontier Development Lab (FDL) Europe 201920. WorldFloods12 is a comprehensive dataset and suite of machine learning models that can be used to create flood masks from multi-spectral Earth-observation images. The segmentation models can distinguish between cloud, land and water, and were trained on multi-band images from ESA’s Sentinel-2 (S2) satellite, including the infrared bands.

The multi-spectral instrument of S2 is a push-broom sensor with high radiometric resolution (12-bit). Its spectral response covers the visible, near-infrared and shortwave-infrared ranges (490–2380 nm) with 13 bands, and its spatial resolution varies from 10 to 60 m depending on the band. In this work we re-sampled all the bands to 10 m, the resolution of the visible and near-infrared bands. As in Mateo-Garcia et al.12, we used Level 1C S2 products, which are processed to calibrated top-of-atmosphere reflectances, geo-referenced and ortho-rectified (see the S2 User Handbook for details21).
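The resampling step can be illustrated with a toy NumPy sketch. Nearest-neighbour upsampling is shown only to make the grid alignment concrete; the interpolation scheme actually used in the pipeline may differ.

```python
import numpy as np

def upsample_to_10m(band, factor):
    """Nearest-neighbour upsampling of a coarser S2 band onto the 10 m grid.

    A 20 m band needs factor=2, a 60 m band factor=6. (This is an
    illustration of grid alignment, not the production resampler.)
    """
    return np.repeat(np.repeat(band, factor, axis=0), factor, axis=1)

band_20m = np.arange(9, dtype=np.uint16).reshape(3, 3)  # toy 20 m SWIR tile
band_10m = upsample_to_10m(band_20m, 2)                 # now 6 x 6 at 10 m
```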

When deployed on a satellite, WorldFloods offers an enhanced ability to rapidly map the spatial extent of water bodies and flooding detected by orbital sensors. At present, creating a flood map at sufficient resolution for first responders (\(\sim\)10 m) can take up to 48 hours due to the lead time involved in downloading, processing and interpreting high-resolution multi-spectral data, followed by transmitting the derived maps to the disaster zone. If the multi-spectral data can be processed in orbit instead, a vectorised polygonal outline of the flooded region could quickly be transmitted to the ground. This data product is potentially tens of times smaller in size, making it feasible to push directly onto mobile devices in the field—within minutes of being acquired. At present, the cost of downloading data from orbit dominates most operational budgets, so even modest decreases in file size offer potentially significant savings.

Table 1. Different models tested to segment flood water in Sentinel-2 images.

Model development

The WorldFloods segmentation models created during FDL Europe 201912 have recently been open-sourced in a public Python package called ‘ML4Floods’26. In this framework, users can train segmentation models using the WorldFloods dataset and different S2 band combinations. These models can subsequently be benchmarked using a dedicated set of test images from WorldFloods, or applied to new S2 images that can also be downloaded with the assistance of the ML4Floods package. For this work, we use the models with all thirteen S2 bands published in Mateo-Garcia et al.12, but we also train new model versions using only the three visible bands B2–B4, to approximate a standard RGB camera (e.g., like the D-Sense camera on the compute module). It is well known that the infrared (IR) and short-wave infrared (SWIR) bands are the dominant discriminators of water in optical EO data23,27,28, so we expect the RGB-only models to perform worse than the multi-band models. In the “Results and discussion” section we directly compare the performance of the RGB models against the all-band models.
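The band selection for the RGB-only models amounts to slicing three channels from the thirteen-band cube. In this sketch the band ordering of the cube is an assumption, and the NDWI computation is included only to illustrate why the IR bands are such strong water discriminators.

```python
import numpy as np

# Hypothetical (bands, height, width) Sentinel-2 cube; band order assumed
# to be B1, B2, B3, B4, B5, B6, B7, B8, B8A, B9, B10, B11, B12.
cube = np.random.rand(13, 64, 64).astype(np.float32)

# The RGB-only models see just B2 (blue), B3 (green) and B4 (red).
rgb = cube[1:4]

# The Normalised Difference Water Index contrasts green against near-infrared:
# water reflects green but absorbs NIR, so it stands out strongly in NDWI.
green, nir = cube[2], cube[7]
ndwi = (green - nir) / (green + nir + 1e-8)
```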

The available model variants are presented in Table 1. The architectures of the Linear, Simple CNN and U-Net models are the same as presented in Mateo-Garcia et al.12, but we added HRNet25,29,30 as an example of a modern architecture that has produced state-of-the-art results in several semantic segmentation tasks, including remote sensing problems (see e.g., Etten and Horgan1). The implementation of all the model-training pipelines is open-sourced in the ML4Floods GitHub package31.

Adapting models to the D-Sense camera after the satellite launch

As we previously highlighted, it is well known that ML models struggle when they are applied outside the context in which they were trained. In the ML literature this problem is known as domain-shift, or data-shift9,32, and it occurs when the distribution of the data differs between training and testing times. In the context of remote sensing, this is a conspicuous problem that arises every time a model (ML-based or otherwise) developed for one sensor is applied to another with slightly different characteristics (radiometric shift), to a previously unseen area (geographical shift), or to images observed through different atmospheric conditions (seasonal shift). In our case, we observe this problem when the WorldFloods models (trained on calibrated S2 images, 10 m resolution, 12-bit depth) are applied to images taken by the D-Sense camera (\(\sim\)1 km resolution, 8-bit depth, no calibration and significantly worse radiometric quality). We show in the “Results and discussion” section that the differences between the images do indeed lead to very poor model transfer performance.

There are inter-calibration and domain adaptation techniques33 that could potentially address this problem and do not require supervised information for the D-Sense sensor. These techniques attempt to align the colour and size distributions of the two domains (S2 and D-Sense) so that a model trained with supervised information from S2 images can work on D-Sense images. We initially tried histogram matching34, which seeks to align the colour distributions of the two sensors, but without success. We also attempted to retrain the models on down-scaled S2 images made to resemble the spatial resolution of the D-Sense camera (as proposed in12,35). However, the segmentation results were still unsatisfactory, and therefore we did not pursue more advanced domain adaptation methods (e.g., Mateo-Garcia et al.36 or Tasar et al.37).
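A generic sort-based formulation of histogram matching can be written in a few lines of NumPy. This is an illustration of the technique (mapping each pixel to the reference value at the same quantile rank), not the exact routine we used; the band values below are invented.

```python
import numpy as np

def match_histograms(source, reference):
    """Map each source pixel to the reference value at the same quantile rank."""
    s_flat = source.ravel()
    r_sorted = np.sort(reference.ravel())
    # Rank of every source pixel, scaled to [0, 1].
    ranks = np.argsort(np.argsort(s_flat)) / (s_flat.size - 1)
    idx = np.round(ranks * (r_sorted.size - 1)).astype(int)
    return r_sorted[idx].reshape(source.shape)

dsense_band = np.array([[10, 20], [30, 40]], dtype=float)  # toy 8-bit-style values
s2_band = np.array([[0.1, 0.2], [0.3, 0.4]])               # toy reflectances
matched = match_histograms(dsense_band, s2_band)
```

After matching, the value distribution of the D-Sense band mirrors that of the S2 band, although (as we found) aligning first-order statistics alone is not always enough for a model to transfer.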

Hence, in order to build a sufficiently good model for processing D-Sense camera data, we incorporated supervised information on native D-Sense images. To build a training dataset we downloaded four D-Sense acquisitions of the Earth (\(2500\times 1950\) pixels each) and annotated regions of water, land and cloud with manually drawn polygons. We trained new models to segment D-Sense images, both by using the S2 RGB WorldFloods models as a starting point (called ‘fine-tuning’ in the literature) and by training from randomly initialised weights. The SCNN model displayed the best trade-off between accuracy and model size and was chosen for uplinking to the satellite. We present the validation metrics of all models and some representative examples in the “Results and discussion” section.
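The difference between fine-tuning and training from scratch amounts to how the weights are initialised. A toy NumPy sketch, with invented layer names and shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
SHAPES = {"conv1": (16, 3, 3, 3), "head": (3, 16)}  # invented layer shapes

def init_model(pretrained=None):
    """From-scratch init draws every layer anew; fine-tuning re-uses learned layers."""
    weights = {name: rng.normal(size=shape) for name, shape in SHAPES.items()}
    if pretrained is not None:
        # Fine-tuning: start from the S2 RGB model's feature layers, then
        # continue training on the small annotated D-Sense dataset.
        weights["conv1"] = pretrained["conv1"].copy()
    return weights

pretrained = {name: rng.normal(size=shape) for name, shape in SHAPES.items()}
finetuned = init_model(pretrained)
scratch = init_model()
```

With only four annotated D-Sense acquisitions available, starting from learned features rather than random weights is the standard way to make the most of a small labelled dataset.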

Engineering the ML payload

The ML4Floods Python toolbox produces trained network definitions and weights in the PyTorch format, and these comprised our starting point. The PyTorch files must be converted to the Intel OpenVINO intermediate representation (IR) format to run on the Myriad X chip. This conversion process quantises the weights and intermediate tensor representations to 16-bit floats (FP16 or ‘half precision’), shrinking the on-disk size of the weight files and speeding up inference on the Myriad X processor. Table 1 shows the size of the model definition files, which vary between 8 KB and 15 MB for the quantised versions in IR format. Deploying these models on the Unibap SpaceCloud hardware required further development steps:

  • Finalise and test the tool chain to convert models from PyTorch to IR format via the Open Neural Network eXchange (ONNX) format.

  • Build an inference pipeline that ingests a multi-band image and produces a vectorised mask outlining cloud, land and water.

  • Encapsulate the inference pipeline in a ML payload software container and integrate into the Unibap SpaceCloud Framework.

  • Test and tune the ML payload so that it functions within the processing envelope of the hardware for the mission: a wall-time under 600 s and memory usage below 2 GB.
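The size saving from the FP16 quantisation in the conversion step is easy to illustrate with NumPy. This is a stand-in for the OpenVINO model optimiser, which performs the conversion (alongside graph-level optimisations) on the real network.

```python
import numpy as np

# Stand-in for a network's weights: one million FP32 parameters (~4 MB).
weights_fp32 = np.random.randn(1_000_000).astype(np.float32)

# IR conversion stores weights in half precision (FP16).
weights_fp16 = weights_fp32.astype(np.float16)

# Half precision halves the on-disk footprint of the weights.
ratio = weights_fp32.nbytes / weights_fp16.nbytes
```

The accuracy cost of this quantisation is typically small for segmentation networks, but it must be verified against the FP32 baseline before flight, which is one reason the test images and metrics matter.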

The Unibap SpaceCloud Framework (SCFW) is a software platform running on the satellite payload computer and providing a Docker host for deploying custom containerised applications. The SCFW abstracts access to satellite sensors and application management routines via a simple API that supports multiple languages via protocol buffer definitions. The containerised environment is based on Ubuntu Linux (for this mission, version 18.04), meaning that SCFW applications can be developed on commodity x86 hardware using popular languages, rather than specialised languages designed for embedded programming. This system greatly accelerated development, which took place over \(\sim 4\) weeks during May–June 2021.

Our WorldFloods payload application targeted the Myriad X processor to speed up machine learning, meaning that it was restricted to using the inference engine provided by the Intel OpenVINO Toolkit38. However, this proved to be a boon, as the inference engine can be called from Python, the language in which previous development had been done. Myriad X processors are also readily available off-the-shelf with USB interfaces39, so testing of network architectures could be done directly on the target hardware, which was essential for space-qualifying the ML payload.

A schematic diagram of the ML payload is shown in Fig. 2. As a prototype SCFW application, it is currently designed to be triggered from the ground when data becomes available in the input directory. The application detects and normalises the data cubes (depending on the pre-processing required by the requested model) and then pushes the data through the neural network in a forward pass, producing spatial per-pixel masks that classify the image into ‘land’, ‘cloud’ and ‘water’ categories. These intermediate pixel masks are written to a temporary directory before being further processed into polygonal mask outlines. The integer masks are converted to polygons using the rasterio Python module, which offers a suitable algorithm in its ‘features.shapes()’ method. Under the hood, this method calls the C routine ‘GDALPolygonize’ of the Geospatial Data Abstraction Library (GDAL)40. This vectorisation process effectively compresses the mask information, with the loss of some fidelity, although the balance between resolution and compression can be tuned. We initially saved the polygons to disk as plain-text files of vertices, but later found that the binary GeoPackage format produces significantly smaller files. These GeoPackage (.gpkg) files are compressed together with some logging information and written to an output directory, which is queued for syncing with ground-based servers.

Mission parameters impose a memory limit of 2 GB and a maximum contiguous processing time of 600 s. The first version of our application significantly exceeded both of these limits when processing the large \(10{,}000 \times 10{,}000\) pixel S2 chips. To solve the memory problem we sliced each data cube into multiple overlapping ‘tiles’ of \(256 \times 256\) pixels, performed inference on each of these separately, and sequentially updated a full-chip pixel mask held on storage, using memory mapping to build the mask iteratively on disk within the memory budget. To tile and stitch the predictions we followed the recommendations of Huang et al.41, making predictions with a 16-pixel overlap and discarding the predictions at the borders of the tiles (this prioritises predictions with larger receptive fields). For the vectorisation step we similarly divided the full-chip pixel mask into overlapping tiles (this time with a larger tile size of \(1024 \times 1024\) pixels). To work around the processing-time limit, we instructed the application to stop and save its state when approaching the cutoff time. On the next processing window, the application would pick up where it left off to complete the analysis, setting a ‘done’ flag in the output directory when complete. The final masks and meta-data were then compressed into a ZIP file, ready for download.
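The tiling-and-stitching scheme can be sketched as follows, assuming the 256-pixel tiles and 16-pixel overlap described above. The flight code used a disk-backed memory map and a real model forward pass where this toy version uses an in-memory array and a constant prediction.

```python
import numpy as np

TILE, OVERLAP = 256, 16
STRIDE = TILE - 2 * OVERLAP  # interior region kept from each tile

def tile_origins(height, width):
    """Upper-left corners of overlapping tiles covering the whole image."""
    rows = list(range(0, height - TILE + 1, STRIDE))
    cols = list(range(0, width - TILE + 1, STRIDE))
    if rows[-1] + TILE < height:          # make the last tile touch the border
        rows.append(height - TILE)
    if cols[-1] + TILE < width:
        cols.append(width - TILE)
    return [(r, c) for r in rows for c in cols]

def stitch(pred, full, r, c):
    """Write one tile's prediction, discarding the overlap border except at edges."""
    H, W = full.shape
    top    = 0 if r == 0 else OVERLAP
    left   = 0 if c == 0 else OVERLAP
    bottom = TILE if r + TILE == H else TILE - OVERLAP
    right  = TILE if c + TILE == W else TILE - OVERLAP
    full[r + top:r + bottom, c + left:c + right] = pred[top:bottom, left:right]

H, W = 600, 700                          # toy chip; real S2 chips are ~10k x 10k
full_mask = np.zeros((H, W), np.uint8)   # flight code: a np.memmap-backed array
for r, c in tile_origins(H, W):
    pred = np.ones((TILE, TILE), np.uint8)  # stand-in for a model forward pass
    stitch(pred, full_mask, r, c)
```

Discarding the tile borders means every kept pixel was predicted with at least 16 pixels of spatial context on each side, which is what "prioritising larger receptive fields" buys.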

The ML payload application is controlled by running a custom Docker command and feeding the controlling script with different parameters. These specify the file system directories to access input and output data, the model name and weight definition directory, the processor device (e.g., Myriad X or CPU) and the processing time-limit. Different experiments can be performed by changing these inputs and—crucially—the application can be pointed towards completely new weight definition files, allowing the models to be updated without significant infrastructure changes.
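A controlling script of this kind might expose its parameters as below. The flag names and defaults here are hypothetical, not the actual payload interface; the sketch only shows how the experiment-defining inputs enumerated above could be surfaced.

```python
import argparse

def build_parser():
    # Flag names and defaults are illustrative, not the real SCFW interface.
    p = argparse.ArgumentParser(description="WorldFloods-style payload runner")
    p.add_argument("--input-dir", required=True, help="directory of input data cubes")
    p.add_argument("--output-dir", required=True, help="directory for result archives")
    p.add_argument("--model-dir", required=True, help="weight definition directory")
    p.add_argument("--model", default="scnn", help="name of the model to load")
    p.add_argument("--device", default="MYRIAD", choices=["MYRIAD", "CPU"],
                   help="processor to run inference on")
    p.add_argument("--time-limit", type=float, default=600.0,
                   help="wall-time budget in seconds before checkpointing")
    return p

args = build_parser().parse_args(
    ["--input-dir", "/data/in", "--output-dir", "/data/out",
     "--model-dir", "/models/scnn_v2", "--device", "MYRIAD"]
)
```

Because the weight directory is just another parameter, pointing the same container at a newly uplinked directory is all an on-orbit model update requires.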


