Highly adaptable deep-learning platform for automated detection and analysis of vesicle exocytosis

Machine Learning


Stationary and random burst events algorithm

Our approach to detecting burst events involves two main parts: automatic region selection and neural network classification. The first part is detecting the potential ROIs. The event detection algorithm used by IVEA employs grayscale foreground detection based on a bidirectional image subtraction technique. This involves subtracting the intensity values in a 16-bit image stack of a reference frame \({I}_{i}\) from an offset frame \({I}_{i+n}\) in both directions—forward \((\Delta {I}_{F}={I}_{i+n}-{I}_{i})\) and backward \((\Delta {I}_{B}={I}_{i}-{I}_{i+n})\). Computing both directions separately keeps all differences non-negative, which avoids handling signed 32-bit matrices and keeps the data processing straightforward. Backward subtraction is employed only with random burst events to detect fusion events visualized with a pH-insensitive stain. In the absence of fluctuations or foreground variations, the subtraction results in an image with pixel values of zero. However, the presence of noise and/or artifactual fluctuations during image acquisition makes it difficult to differentiate real events from artifacts. To distinguish real events from noise, we detect local maxima (LM) in the subtracted images, representing potential exocytic hotspots. Subsequently, an ROI is generated around each identified LM. These ROIs are then employed to generate image patches, which correspond to cropped sections of video frames. This approach captures localized activities over time, thereby enabling the isolation of specific events for classification rather than analyzing the entire video frame-by-frame with the neural network. The LM detection and ROI extraction process is fully automated, incorporating a global thresholding step that learns from the first four frames of the processed video. This ensures that noise and spurious maxima are filtered out, leaving only meaningful events for classification.
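The bidirectional subtraction step can be sketched as follows (a minimal pure-Python illustration with hypothetical helper names, not IVEA's actual implementation; frames are small grayscale images given as nested lists, with negative differences clamped to zero as for unsigned image data):

```python
def subtract_clamped(a, b):
    """Pixel-wise a - b, clamped at 0 (unsigned 16-bit frames cannot go negative)."""
    return [[max(pa - pb, 0) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def bidirectional_subtraction(frame_i, frame_i_plus_n):
    """Forward difference highlights appearing signal; backward difference
    highlights disappearing signal (used only for random burst events)."""
    forward = subtract_clamped(frame_i_plus_n, frame_i)   # ΔI_F = I_{i+n} - I_i
    backward = subtract_clamped(frame_i, frame_i_plus_n)  # ΔI_B = I_i - I_{i+n}
    return forward, backward
```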
The local maxima prominence \({{{\mathcal{p}}}}\) approximation algorithm iterates over the noisy images; each iteration increments the value \({{{{\mathcal{p}}}}}_{n}\) until the number of LM coordinates (\({l}_{n}\)) equals 0. If, instead, four successive iterations yield the same number of maxima (e.g., \({l}_{n}=\,{l}_{n-4}\)), the program sets \({{{\mathcal{p}}}}={{{{\mathcal{p}}}}}_{n}\), where \(n\) is the iteration number. These four images are also utilized to estimate the full width at half maximum (FWHM) \(\sigma\) of the noise LMs, and to measure \(\Delta \mu\), which is the average of the mean intensities \(\Delta {\mu }_{j}\) at LM \({C}_{j}\) within radius \(r\), expressed as:

$$\Delta {\mu }_{j}=\frac{1}{{\left(2r+1\right)}^{2}}{\sum }_{x={C}_{j}^{(x)}-r}^{{C}_{j}^{(x)}+r}{\sum }_{y={C}_{j}^{(y)}-r}^{{C}_{j}^{(y)}+r}\Delta {I}_{i}\left(x,y\right)\,{with}\,{r}\,{\mathbb{\in}}\,{\mathbb{N}}$$

(1)
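The stopping rule of the prominence search can be sketched as below (a hypothetical sketch; `count_maxima` stands in for the local-maxima detector evaluated at a given prominence, which is not shown here):

```python
def estimate_prominence(count_maxima, step=1, max_iter=10_000):
    """Increase the prominence p until no local maxima remain, or until
    four successive iterations yield the same count (l_n == l_{n-4})."""
    counts, p = [], 0
    for _ in range(max_iter):
        p += step
        counts.append(count_maxima(p))
        if counts[-1] == 0:
            return p
        if len(counts) >= 5 and counts[-1] == counts[-5]:
            return p
    raise RuntimeError("prominence search did not converge")
```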

The region selection procedure is similar to parameter automation. We determine \(\Delta {\mu }_{j}\) and \({\sigma }_{j}\) for each event \({E}_{j}\) at \({C}_{j}\). To designate \({E}_{j}\) as a selected region we employ the following condition:

$${E}_{j}\left|\left(\Delta {\mu }_{j} > \Delta \mu \cdot \theta \right)\wedge \left({\sigma }_{j} > \sigma \right)\,{with}\,\sigma,\theta \,{\mathbb{\in }}\,{\mathbb{R}}\right.$$

(2)

Here, θ denotes the sensitivity parameter, which can be adjusted by the user.
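Condition (2) can be sketched as a simple filter (hypothetical helper names; the noise statistics \(\Delta \mu\) and \(\sigma\) are assumed to come from the learning step described above):

```python
def local_mean(img, cx, cy, r=1):
    """Mean intensity in the (2r+1) x (2r+1) window centred on (cx, cy)."""
    window = [img[y][x]
              for y in range(cy - r, cy + r + 1)
              for x in range(cx - r, cx + r + 1)]
    return sum(window) / len(window)

def select_regions(img, candidates, mu_noise, sigma_noise, theta, r=1):
    """Keep a candidate event (cx, cy, sigma) only if both parts of Eq. (2) hold."""
    return [(cx, cy) for cx, cy, sigma in candidates
            if local_mean(img, cx, cy, r) > mu_noise * theta
            and sigma > sigma_noise]
```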

After detection, IVEA performs spatiotemporal tracking for ROI recognition and labeling. This is applied for each detected ROI coordinate over a certain radius and period. For burst events exhibiting temporal dynamics (e.g., lytic granule fusion), a sequence of image patches is extracted, encompassing frames both preceding and following the time point of the event. Each sequence is fed to a shared encoder layer attached to the ViT architecture for image recognition as described in Dosovitskiy et al.36. We designated the modified architecture as eViT. The image patches’ spatial dimensions are variable, but they are scaled to fit the encoder input layer of 32 × 32 dimensions. Each patch represents the extracted area centered at the LM, while the sequence is centered on the fluorescence intensity peak time (Fig. 1a). The encoder network automatically extracts features from each sequence and forwards the encoded data to a multi-layered perceptron (MLP), which in turn forwards the data to the ViT network. The ViT network then performs positional encoding on the extracted features and classifies each sequence as a true event or not (Fig. 1b).

For stationary burst events, we employed a straightforward model architecture comprising an LSTM network37 for exocytosis classification. Our LSTM architecture is designed for multivariate time series classification49 (Fig. 1c). In this case, the image patches undergo feature extraction preprocessing to convert them into one-dimensional time-series vectors (Fig. 1a). These feature vectors are subsequently fed into the LSTM network for classification (Fig. 1d). An additional, optional method is implemented for stationary burst events to detect and track agonist/electric and NH4+ stimulations. This algorithm recognizes and sorts events based on the period in which they occur. Stimulus detection is expressed as:

$${{{{\mathcal{R}}}}}_{i}=\frac{\Delta {\mu }_{i+1}}{\Delta {\mu }_{i}}\, > {\theta }_{s}$$

(3)

Where \({{{{\mathcal{R}}}}}_{i}\) is the mean ratio; \({\theta }_{s}\) is the threshold (default 1.1); \(\Delta {\mu }_{i}\) and \(\Delta {\mu }_{i+1}\) are the mean gray values of images \(\Delta {I}_{i}\) and \(\Delta {I}_{i+1}\), respectively.

To avoid an inflated number of detected events caused by the high fluorescence intensity during the stimulus period, we raise the detection sensitivity to \({\theta }_{i}=\theta \cdot {{{{\mathcal{R}}}}}_{i}\).
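A minimal sketch of the stimulus test (Eq. 3) and the sensitivity rescaling (function names are illustrative):

```python
def stimulus_frames(delta_mu, theta_s=1.1):
    """Indices i at which the mean ratio R_i = Δμ_{i+1}/Δμ_i exceeds θ_s (Eq. 3)."""
    return [i for i in range(len(delta_mu) - 1)
            if delta_mu[i + 1] / delta_mu[i] > theta_s]

def adjusted_sensitivity(theta, ratio):
    """During a stimulus, the sensitivity is raised to θ_i = θ · R_i so that
    the brighter frames do not inflate the number of detected events."""
    return theta * ratio
```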

Feature extraction

Feature extraction is performed by extracting a sequence of image patches around \({C}_{j}\) over a time interval. Each patch is subdivided into smaller regions, and we determine the mean intensity of each region (Fig. 1a). For each selected region denoted as \({E}_{j}\), we extract the spatial neighboring pixels around \({C}_{j}\) as a 2D matrix \({{{{\bf{M}}}}}\,^{j}\in \,{{\mathbb{R}}}^{k\times k}\), where \(k\) is the kernel size defined by the user. The spatiotemporal data \({{{{\bf{V}}}}}_{j}\in \,{{\mathbb{R}}}^{k\times k\times T}\), which represents event \({E}_{j}\) that occurred at time \({t}_{j}\), is extracted over \(T\) frames as expressed in Eq. (4).

$${{{{\bf{V}}}}}_{j}\left(x,y,t\right):\!\!=\left\{{{{{\bf{M}}}}}_{t}^{j}\left(x,y\right)\, \left| \,t\in \left[{t}_{j}-{n}_{b},{t}_{j}+{n}_{a}\right]\right.\right\}$$

(4)

Where \({n}_{b}\) is the number of frames before \({t}_{j}\) and \({n}_{a}\) is the number of frames after \({t}_{j}\). The spatial coordinates of each matrix \({{{{\bf{M}}}}}\,^{j}\) are split into \({\mathfrak{f}}=13\) small regions, \({\mathfrak{f}}\,\epsilon \,{\mathbb{N}}\) (see Supplementary Fig. 13a), which yields the feature matrix \({{{{\bf{P}}}}}_{j}\,\epsilon \,{{\mathbb{R}}}^{T\times {\mathfrak{f}}}\) for event \({E}_{j}\), as in Eq. (5):

$${{{{\bf{P}}}}}_{j}\left({{{\bf{x}}}}\left(t\right){{,}}{\mathfrak{f}}\right):\!\!=\left\{\frac{1}{{n}_{f}}{\sum }_{f=1}^{{n}_{f}}{{{{\bf{V}}}}}_{j}\left({x}_{f},{y}_{f},t\right) | \,f\in \,\left[1,{\mathfrak{f}}\right]\right\}$$

(5)

Where \({n}_{f}\) is the number of pixels in each region \(f\).

This approach forms time series data \({{{\bf{P}}}}\), represented as \({{{\bf{P}}}}\in {{\mathbb{R}}}^{{n}_{s}\times T\times {\mathfrak{f}}}\), where \({n}_{s}\) is the total number of selected events. Each element \({{{{\bf{P}}}}}_{j}\) in \({{{\bf{P}}}}\) represents 13 distinct signals that capture the fluorescence intensity profile of different regions plotted on a single graph (Fig. 1d, Supplementary Fig. 9). The choice of small, symmetrical regions over circular masks enhances feature preservation of the spatiotemporal signal (Supplementary Fig. 13a, c). Additionally, we opted for 13 regions over 9 regions due to their higher sensitivity in capturing slow granule movements (Supplementary Fig. 13a, b).
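Equation (5) reduces each frame of a patch to one mean per region. The sketch below uses a generic region-label mask; the actual 13-region layout is the one shown in Supplementary Fig. 13a:

```python
def region_means(patch, labels, n_regions):
    """Mean intensity per labelled region of one frame (Eq. 5).
    `labels[y][x]` assigns each pixel to a region index 0..n_regions-1."""
    sums = [0.0] * n_regions
    counts = [0] * n_regions
    for row_v, row_l in zip(patch, labels):
        for v, f in zip(row_v, row_l):
            sums[f] += v
            counts[f] += 1
    return [s / c for s, c in zip(sums, counts)]
```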

If the number of frames increases by \({t}_{n}\), i.e., \({{{{\bf{x}}}}}^{{\prime} }\left(t\right)\in {{\mathbb{R}}}^{{{{\mathcal{t}}}}}\) with \({{{\mathcal{t}}}}=T+{t}_{n}\), then \({{{\bf{P}}}}\left({{{\bf{x}}}}\left(t\right){{,}}{\mathfrak{f}}\right)\in {{\mathbb{R}}}^{T\times {\mathfrak{f}}}\) is determined by windowed mean-sampling, expressed as:

$${{{\bf{x}}}}{\left(t\right)}_{i}=\left\{\frac{1}{{{{\mathcal{w}}}}}{\sum }_{k=1}^{{{{\mathcal{w}}}}}{{{{\bf{x}}}}}^{{\prime} }{\left(t\right)}_{{{{\mathcal{w}}}}\left(i-1\right)+k}\, \right| {{{\mathcal{w}}}}=\frac{{{{\mathcal{t}}}}}{T}\,{with}\,{{{\mathcal{w}}}}\,{\mathbb{\in}}\,{\mathbb{N}}$$

(6)

Where \({{{\bf{x}}}}{\left(t\right)}_{i}\) is the \(i\)-th element of the sampled vector \({{{\bf{x}}}}\left(t\right)\), and \({{{{\bf{x}}}}}^{{\prime} }{\left(t\right)}_{j}\) with \(j={{{\mathcal{w}}}}\left(i-1\right)+k\) is the \(j\)-th element of the vector \({{{{\bf{x}}}}}^{{\prime} }\left(t\right)\).
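The windowed mean-sampling of Eq. (6) amounts to non-overlapping mean pooling (a sketch; it assumes \({{{\mathcal{t}}}}\) is an integer multiple of \(T\), as the equation requires):

```python
def mean_downsample(x_prime, T):
    """Shrink a trace of length t = w*T down to length T by averaging
    non-overlapping windows of width w (Eq. 6)."""
    w, rem = divmod(len(x_prime), T)
    if rem:
        raise ValueError("trace length must be an integer multiple of T")
    return [sum(x_prime[w * i: w * (i + 1)]) / w for i in range(T)]
```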

Multivariate LSTM neural network architecture

Our LSTM network comprises four layers and serves as a robust framework for multivariate temporal data analysis. The first input layer, defined by the input-shape specification, establishes the dimensions of the incoming multivariate time series data. This initial stage is not a distinct processing layer but rather a configuration step that aligns the network with the input data’s structure. The architecture continues with a 1D convolution layer employing the Rectified Linear Unit (ReLU) activation function, followed by an LSTM layer designed to recognize sequential patterns. To promote stable training dynamics, a batch-normalization layer is added. The last layer is a fully connected Dense layer that employs the softmax activation function, making the architecture suitable for multiclass classification.

$${\hat{{{{\mathcal{y}}}}}}_{i}=h\left({{{\mathcal{z}}}}\right)=\frac{{e}^{{{{{\mathcal{z}}}}}_{i}}}{{\sum }_{j=1}^{{n}_{c}}{e}^{{{{{\mathcal{z}}}}}_{j}}}$$

(8)

Here, \({h}\left({{{\mathcal{z}}}}\right)\) is the softmax function, \({{{{\mathcal{z}}}}}_{i}\) represents the raw score (logit) for a specific class \(i\) and \({n}_{c}\) is the number of classes. This arrangement encapsulates both localized and temporal patterns inherent to multivariate sequential data, combining convolution, recurrent, and normalization mechanisms. The network is structured to accommodate categorical cross-entropy as the loss function \({{{\mathcal{L}}}}\) (Eq. 9), tailored for multiclass categorization, while optimization leverages the Adam optimizer with a learning rate of \(3\times {10}^{-4}\).

$${{{\mathcal{L}}}}=\frac{1}{{n}_{s}\,}{\sum }_{j=1}^{{n}_{s}\,}L\left({\hat{{{{\mathcal{y}}}}}}^{(j)}\right),{{\mbox{with}}}\,L\left({\hat{{{{\mathcal{y}}}}}}^{(j)}\right)=-{\sum }_{i=1}^{{n}_{c}}{{{{\mathcal{y}}}}}_{i}^{\left(j\right)}\cdot \log \left({\hat{{{{\mathcal{y}}}}}}_{i}^{\left(j\right)}\right)$$

(9)

Where \(L\left({\hat{{{{\mathcal{y}}}}}}^{(j)}\right)\) is the loss for a single data point (sample), \({n}_{s}\) is the number of data points, \({{{{\mathcal{y}}}}}_{i}\) \(\in \left\{0,\,1\right\}\) is the true label for class \({i}\) and \({\hat{{{{\mathcal{y}}}}}}_{i}\) is the predicted probability.
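Equations (8) and (9) in plain Python (a numerically stabilized sketch, not the Keras implementation used in IVEA):

```python
import math

def softmax(z):
    """Eq. (8): convert a vector of logits into class probabilities."""
    m = max(z)                          # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(y_true, y_pred):
    """Eq. (9): mean loss over n_s samples with one-hot labels y_true."""
    per_sample = [-sum(t * math.log(p) for t, p in zip(yt, yp) if t)
                  for yt, yp in zip(y_true, y_pred)]
    return sum(per_sample) / len(per_sample)
```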

Encoder-ViT network architecture

Our eViT network consists of two components: a CNN that acts as the encoder for feature extraction from image patches, and a ViT for classification. The encoder is a shared CNN comprising seven layers: a 2D spatial convolution layer followed by a sequence of 3D convolution layers and 3D max pooling operations (Supplementary Fig. 14a, b).

We have two pre-trained models available with IVEA, namely GranuVision2 and GranuVision3. The model’s encoder input layer accepts time-series image patches of length 26 for GranuVision2 and 28 for GranuVision3. The width, height, and channel dimensions of each patch are 32 × 32 × 1, denoted as \({{{\bf{X}}}}\in {{\mathbb{R}}}^{t\times w\times h\times c}\). If the dimensions of the image patches change, we use bilinear interpolation to resize them to 32 × 32. The shared CNN applies three successive 3D convolution layers with 16, 32, and 64 filters and ReLU activation, each followed by a 3D max pooling layer. The output of the encoder is passed through a Flatten layer and a 64-unit Dense layer. Positional embeddings are added before the sequence enters the transformer block. The transformer block consists of multi-head self-attention and an MLP, each with a residual connection (Fig. 1b). It operates on an input sequence of length equal to the time-series dimension, with a key dimension equal to the output size of the preceding Dense layer. The MLP inside the transformer block consists of two Dense layers: the first has twice the key dimension, and the second projects back to the key dimension. The final MLP comprises two Dense layers: the first with GELU non-linearity and the second with softmax activation, classifying 10 (2 fusion + 8 artifact) classes for GranuVision2 or 11 (3 fusion + 8 artifact) classes for GranuVision3. The eViT architecture underwent an ablation study to assess the impact of the layers on model performance, in which layers were eliminated, the model was retrained, and it was then probed on an evaluation dataset.
The evaluation dataset was divided into two categories: exocytosis (positive labels) or non-exocytosis (negative labels). We performed the ablation study on the shared CNN layers and the penultimate Dense layer. The results show that progressively removing these layers leads to a noticeable decline in performance. The final configuration, with only one convolution layer, exhibited the strongest performance drop. This study underscores the additive role of each layer in our current eViT architecture (see Supplementary Fig. 15).

Gaussian non-max suppression

Various non-maximum suppression techniques are typically used to address the problem of multiple overlapping detections, including the classical intersection over union (IoU) method, weighted boxes fusion (WBF), and others56. However, these methods cannot be applied to our data because exocytotic events cannot be reduced to objects with boundaries or boxes. We therefore developed a new method that models each event as a 3D Gaussian spread over space and time:

$$g\left(x,y,t\right)=\Delta {{{\rm{\mu }}}}\cdot \exp \left(-\frac{\Delta {x}^{2}+\Delta {y}^{2}}{2{\left(\sigma+{SR}\right)}^{2}}\right)\cdot \exp \left(-\frac{\Delta {t}^{2}}{2{\tau }^{2}}\right)\,{with}\,\tau={{{\mathcal{v}}}}\cdot \sigma$$

(10)

where \(\Delta {{{\rm{\mu }}}}\) is the fluorescence intensity of the event area in the subtracted image; \(\sigma\) is the event’s cloud spread; \({\mbox{SR}}\,=1\) is a user-controlled spread radius; \(\tau\) is the temporal cloud spread; and \({{{\mathcal{v}}}}\) is the image acquisition frequency set to 10 Hz.

To isolate prime events from redundant ones, we apply \(g\left(x,y,t\right)\) to each pair of events as follows:

$$\left\{\begin{array}{cc}{E}_{i}={E}_{j}&{{{\rm{if}}}}\,\Delta {\mu }_{i} < {g}_{j}\left({x}_{j},{y}_{j},{t}_{j}\right)\,\hfill \\ {E}_{i}\,\ne {E}_{j}&{\mbox{otherwise}}\,\end{array}\right.$$

(11)

where \({E}_{i}\) and \({E}_{j}\) are two different true positive (TP) events.
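The suppression rule of Eqs. (10) and (11) can be sketched as follows; the greedy strongest-first ordering is our illustrative assumption, and events are plain dicts:

```python
import math

def gaussian_cloud(d_mu, sigma, dx, dy, dt, sr=1, nu=10):
    """Eq. (10): 3D Gaussian spread of an event over space and time
    (SR = 1 by default; ν is the acquisition frequency, here 10 Hz)."""
    tau = nu * sigma
    spatial = math.exp(-(dx * dx + dy * dy) / (2 * (sigma + sr) ** 2))
    temporal = math.exp(-(dt * dt) / (2 * tau * tau))
    return d_mu * spatial * temporal

def suppress_redundant(events):
    """Eq. (11): an event is treated as a duplicate of a stronger neighbour
    when its Δμ falls below that neighbour's Gaussian cloud at its location."""
    kept = []
    for e in sorted(events, key=lambda e: -e["mu"]):
        covered = any(e["mu"] < gaussian_cloud(k["mu"], k["sigma"],
                                               e["x"] - k["x"], e["y"] - k["y"],
                                               e["t"] - k["t"]) for k in kept)
        if not covered:
            kept.append(e)
    return kept
```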

Neural network training and evaluation

Python was utilized for developing and training the LSTM and eViT networks, with Visual Studio Code as the coding environment. The LSTM training data were processed and prepared in MATLAB, enabling visualization of patterns in segmented regions of time series image patches. In contrast, the eViT network’s data labeling was managed using ImageJ. The initial data labeling and IVEA classification validation were performed by the human experts (HE) listed in Supplementary Table 2.

For the eViT, the videos were associated with their corresponding ROI files. These files include ROI center coordinates, frame positions, and radii. The IVEA software was used to export the labeled ROIs as zip files, with the training datasets tagged as _training_rois to differentiate them from the evaluation data. The user can enable this option in the IVEA graphical user interface (GUI) via the Select Model dropdown list. Prior to integrating the neural network with IVEA, the initial procedure involved exporting the selected regions identified by the automation processes and manually labeling them. After the initial integration of the neural model, events in the training datasets were automatically labeled with a uniform naming convention that includes the list number, event ID, frame number, and classification category. For example, the third event in the ROI manager list, with an ID of 3, detected at frame 779 and initially classified as category 1, would be named: 3-event (3) | frame 779_class_1. To prepare the data for neural network training, we developed a Python-based script with a simple interface and a JSON configuration file to extract or load the training data. Extracting the training data entails two steps. First, the ImageJ ROIs are read to identify the positions of the events and their categories. Subsequently, the time-series patches are extracted at each ROI. These data are then stored in the Hierarchical Data Format (.h5), organized into dictionaries containing x_train and y_train data, facilitating efficient loading and archiving of the training data.

During the training process, labels were refined iteratively. Initially, the network was trained to distinguish between exocytosis and non-exocytosis. Later, additional classes were introduced to differentiate between exocytosis subtypes, as well as motion or noise artifacts. Exocytosis classes received non-negative integers (e.g., 0, 1, 2), while non-exocytosis classes (such as noise or artifacts) were assigned negative integers (e.g., -1, -2). Whenever a misclassification was noted (for instance, class_1 instead of class_2 or -6), the label was corrected or a new one was defined, and the data were fed back into the network for retraining. To accommodate the substantial volume of events predicted and labeled by IVEA, a significant number of labels associated with non-exocytosis events were excluded to ease data management. For generating a new model, training files and tools are available on our GitHub page for IVEA (see the data availability section).

For the LSTM, the data were exported as CSV files in the form of \({{{\bf{X}}}}\in {{\mathbb{R}}}^{{n}_{s} \times {\mathfrak{f}} \times t}\) for stationary burst events, and the labeled data as \({{{\bf{Y}}}}\in {{\mathbb{R}}}^{{n}_{s}\times {n}_{c}}\), where \({n}_{s}\) is the number of samples and \({n}_{c}\) is the number of classes. Our LSTM network for stationary burst events was trained on 39 videos with 11,300 data samples of dimensions 13 × 41, while for random burst events the LSTM network was trained on 548 videos with 12,600 data samples. For the eViT network, the data were saved as videos with their associated ROI files in both zip and roi file containers. The input data for the eViT network are \({{{\bf{X}}}}\in {{\mathbb{R}}}^{{n}_{s}\times t\times w\times h\times c}\) and \({{{\bf{Y}}}}\in {{\mathbb{R}}}^{{n}_{s}\times {n}_{c}}\). The eViT network for random burst events was trained on 608 videos and 7931 data samples, augmented to reach 24,916 samples of 26 × 32 × 32 dimensions each (see Supplementary Table 7). These videos were acquired at a rate of 10 Hz. Videos acquired at 50 Hz on which the eViT was tested were reduced by a factor of five using the ImageJ “reduce” function, resulting in a rate of 10 Hz. We used an NVIDIA RTX 3070 to train the neural networks.

The final models were evaluated on videos unseen by the neural network, rather than reserving part of the training data for validation. A diverse array of datasets was utilized in the evaluation process, acquired from multiple laboratories employing a variety of microscopes. These included lytic granule exocytosis in T cells, dense-core granules in INS-1 and chromaffin cells (both pH-sensitive and pH-insensitive fluorescent markers), DRG neurons expressing Synaptophysin-SEP, and dopaminergic neurons examined with dopamine nanosensors (“AndromeDA”). The analysis showed strong concordance with HE annotations. Most of the datasets were processed using default IVEA settings and automated parameter estimation, though users may override these defaults when necessary. For parameter estimation, these datasets contain no fusion events within the initial four frames of the videos, from which IVEA learns. However, users can disable automated learning and opt for manual override by adjusting the sensitivity to 1 or lower, thereby generating additional local maxima coordinates.

For each unseen video analyzed by IVEA, the resulting event ROIs were examined for validation. The HE labeled every detected event as true exocytosis (true positive, TP) or falsely predicted (false positive, FP). Any exocytosis event missed by IVEA but previously observed by the HE was labeled as a false negative (FN). Finally, precision, recall, and F1 score were computed from the TP, FP, and FN counts using the formulas:

$${{\mathrm{Precision}}}=\frac{{TP}}{{TP}+{FP}}$$

(12)

$${{\mathrm{Recall}}}=\frac{{TP}}{{TP}+{FN}}$$

(13)

$${{\mathrm{F1\,{Score}}}}=2\,\times \frac{{Precision}\times {Recall}}{{Precision}+{Recall}}$$

(14)
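For reference, Eqs. (12)-(14) in a few lines of Python:

```python
def detection_scores(tp, fp, fn):
    """Precision, recall, and F1 score from validated event counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```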

The IVEA analysis was conducted on a range of computer systems, with a baseline configuration of an Intel Core i5 processor and 32 GB of RAM without GPU.

Training on new data

Training the neural network on new data involves two steps: data labeling and neural network training. To label the data, the user can create an ROI over the event using the ImageJ “ROI Manager” tool. The user should then label the ROIs with a special tag and their associated category number, and save the ROI(s) as a .roi/.zip file under the same name as the video. Alternatively, users can employ the IVEA “Data labeling” ImageJ plugin, which is provided with IVEA for easy labeling. The next step involves training the neural network using Python. Users should set up a Python development environment, such as the Anaconda platform. To run the IVEA training GUI, users must install the libraries associated with the Google TensorFlow platform. The script launches a GUI that enables users to combine their labeled data with an existing dataset, collect a new dataset, train a new neural network, or refine an existing one. After training, the script generates a trained Keras model and saves it to the designated directory with its associated JSON configuration file. The IVEA plugin enables users to import a custom model for subsequent predictions. If no custom model is imported, IVEA uses its embedded models.

Google TensorFlow-Java implementation

IVEA’s LSTM and eViT networks were developed using Python v3.8.15 and Google’s machine learning framework TensorFlow v2.9.1 or v2.10 with the Keras library. Using Python, we trained our neural networks and exported the trained models as Protocol Buffer (.pb) files. To load and use our models within ImageJ Fiji, our software uses the Google TensorFlow Java v1.15.0 library and deeplearning4j core v1.0.0-M1.1. Integrating Google TensorFlow with Java is a complex task, particularly within the context of a Fiji implementation. While Java offers versatility, it has limitations compared to Python, particularly in providing user-friendly and adaptable tools for machine learning applications. Notably, Java support for Google TensorFlow is constrained and, as of 2024, suffered from deprecated documentation. Additionally, the consolidation of all components into a single Java archive (jar) file poses challenges within the Fiji environment. To simplify the integration of the Google deep learning framework with Java inside Fiji, we provide a concise explanation of the TensorFlow Java implementation at https://github.com/AbedChouaib/IVEA.

Video simulation and noise control

To mimic the CTL’s lytic granules and simulate fusion activity, we first create the vesicles as small spheres with a Gaussian intensity spread, cut off at \(2\sigma\), using the equation:

$$g\left(x,y\right)=\mu \cdot \exp \left(-\frac{{\left(x-{x}_{c}\right)}^{2}+{\left(y-{y}_{c}\right)}^{2}}{2{\sigma }^{2}}\right)$$

(15)

Where \(\mu\) is the intensity of the spot and \(\sigma \in \left[1.1,\,3\right]\), \(\sigma \,{\mathbb{\in }}\,{\mathbb{R}}\), is the standard deviation controlling the spatial spread.
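Rendering one simulated vesicle per Eq. (15), with the \(2\sigma\) cutoff, can be sketched as follows (an illustrative Python transcription; the published simulation itself was implemented in MATLAB):

```python
import math

def gaussian_spot(mu, sigma, xc, yc, size):
    """Eq. (15): one vesicle as a 2D Gaussian of peak intensity mu,
    truncated at a radius of 2*sigma."""
    img = [[0.0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            r2 = (x - xc) ** 2 + (y - yc) ** 2
            if r2 <= (2 * sigma) ** 2:
                img[y][x] = mu * math.exp(-r2 / (2 * sigma ** 2))
    return img
```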

We then added random spatial movement to the vesicles to introduce a motion variable. A vesicle fusion resembles a fluorescence intensity cloud that spreads and disappears. To simulate this, we used a Gaussian spread over time to control the temporal presence of the fusion, and added a further variable for the radial spatial spreading that depends on the time variation. The overall model is expressed as:

$$h\left(x,y,t\right) \,=\,\mu \cdot \exp \left(-\frac{\Psi \left(x,y,t\right)}{{\left(2{\sigma }_{s}\right)}^{2}}\right)\cdot \exp \left(-\frac{{\left(T-t\right)}^{2}}{{\left(2\tau \right)}^{2}}\right)$$

(16)

$$\Psi \left(x,y,t\right)=\,\left(\Delta {x}^{2}+\Delta {y}^{2}\right)\cdot \exp \left(\frac{{\left(T-t\right)}^{2}}{{\left(2\tau \right)}^{2}}\right)\,{with}\,{\sigma }_{S},\tau > 0$$

(17)

Where \(\Psi \left(x,y,t\right)\) simulates the radial dynamic dispersion of vesicle cargo over time, \(t\) is the current frame, \(T\) is the frame at which the fusion occurs, \(\tau\) is the fusion time interval, \({\sigma }_{S}\) is the fusion radial spread, and \(\mu\) is the fluorescence intensity magnitude.
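Equations (16) and (17) combine into a single intensity function; a literal Python transcription for illustration (the fusion centre is placed at the origin by default, which is our assumption):

```python
import math

def fusion_intensity(x, y, t, mu, sigma_s, tau, T, xc=0.0, yc=0.0):
    """Eqs. (16)-(17): a fusion cloud that spreads radially while its
    intensity fades symmetrically around the fusion frame T."""
    dt2 = (T - t) ** 2
    # Psi grows with |T - t|, widening the spatial spread away from the fusion frame
    psi = ((x - xc) ** 2 + (y - yc) ** 2) * math.exp(dt2 / (2 * tau) ** 2)
    return mu * math.exp(-psi / (2 * sigma_s) ** 2) * math.exp(-dt2 / (2 * tau) ** 2)
```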

For the noise control analysis, twenty videos with distinct SNRs were generated using MATLAB. Initially, a baseline video with no noise was created. Artificial white noise was then added to this baseline using the built-in function imnoise(). Subsequently, the MATLAB built-in poissrnd() function was utilized to generate random photon shot noise commonly observed in microscopy. A Gaussian blur was applied to it to replicate the point spread function seen in microscopy images. Finally, the processed noise was added to the video to achieve the desired noise characteristics. The Poisson noise was modeled using the following equation:

$${I}_{{noise}}\left(x,y\right){{{\rm{:\!=}}}}{{{\rm{Poisson}}}}\left(\lambda \cdot I\left(x,y\right)\right)$$

(18)

Where \(\lambda\) is the scaling factor that controls the relative level of noise.

To explore the impact of noise across a range of conditions, the scaling factor \(\lambda\) was varied incrementally from 0.1 up to 10 times the signal. Higher values of \(\lambda\) correspond to higher noise levels (lower SNR), while lower values of \(\lambda\) reduce the noise relative to the signal (higher SNR).
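Since the Python standard library has no Poisson sampler, the shot-noise model of Eq. (18) can be sketched with Knuth's algorithm (an illustrative stand-in for MATLAB's poissrnd(), adequate only for small \(\lambda \cdot I\)):

```python
import math
import random

def poisson_sample(lam, rng=random):
    """Knuth's multiplicative method for drawing Poisson(lam) samples."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def add_shot_noise(img, lam, rng=random):
    """Eq. (18): replace each pixel I(x, y) by a Poisson draw with mean λ·I(x, y)."""
    return [[poisson_sample(lam * v, rng) for v in row] for row in img]
```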

Hotspot area detection algorithm

The IVEA hotspot area extraction is based on the DART algorithms, which employ unsupervised learning to segment the image into different layers. Following image segmentation, the MIC algorithm is applied to address non-uniform regional fluctuations in fluorescence intensity, prior to foreground subtraction. MIC is an enhanced version of the simple ratio and of the previous method used with DART. MIC clusters the first image into a series of layers, wherein each layer comprises a group of labeled pixels with a close range of gray values, as determined by k-means clustering. This process can be expressed as \(I\left(x,y\right)\to \,I\left(x,y,k\right){|\; k}=\,{{\mathbb{N}}}^{5}\) (Supplementary Fig. 12). The conventional approach (DART) adds the difference in cluster gray values between two subsequent images. In contrast, MIC employs a simple ratio equation for each layer, assuming that the lowest cluster value represents the background. In the event of uniform regional fluctuations, the number of MIC clusters can be reduced to \(k=1\), which yields a result similar to that of the simple ratio. MIC is expressed as:

$${I\,}_{i}^{{\prime} }\left(k,x,y\right):\!\!=\left(\left(\frac{{\mu }_{i-n}\left(k\right)}{{\mu }_{i}\left(k\right)}-1\right)\cdot \theta+1\right)\cdot {I}_{i}\left(k,x,y\right)$$

(19)

Where \(i\) is the \(i\)-th frame, \(k\) is the number of layers, \(n\) is the frame difference, and \(\theta\) is a user-input parameter that controls the intensity adjustment (default \(\theta=1\)).
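The cluster-wise correction of Eq. (19) can be sketched as below (illustrative; `labels` holds the k-means cluster index of each pixel, and the per-cluster means of frames \(i-n\) and \(i\) are assumed to be precomputed):

```python
def mic_correct(frame, labels, prev_means, cur_means, theta=1.0):
    """Eq. (19): rescale each pixel of cluster layer k in frame i by the ratio
    of that cluster's mean in frame i-n to its mean in frame i, damped by θ."""
    return [[((prev_means[k] / cur_means[k] - 1.0) * theta + 1.0) * v
             for v, k in zip(row_v, row_k)]
            for row_v, row_k in zip(frame, labels)]
```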

The iterative threshold consists of two distinct parts. Initially, we capture two images in which no events have occurred. Next, we compute \(\Delta I\) and transform it into an 8-bit image, which reduces computation time by limiting the iterations to fewer than 255 steps. We then attempt to clear \(\Delta I\) repeatedly. The clearing process consists of three sequential operations: thresholding, erosion, and median filtering. The threshold starts at half the mean intensity of \(\Delta I\); we then perform erosion, \(\Delta I=\Delta I\ominus {{{{\rm{K}}}}}_{{{{\rm{e}}}}}\), with kernel \({K}_{e}\,\left[n,n\right]\), \(n=3\), to eliminate lone pixels. After erosion, a median filter with a user-defined radius or a preset default value is applied. The mean gray value of the processed image is then checked; if it is not zero, the threshold is incremented by one gray step and the process repeats until the mean value reaches zero. The outcome of this process delivers the first iterative threshold decision \({v}_{1}\). The second threshold decision \({v}_{2}\) is computed for the remaining images to correct the first threshold; it follows the same process, except that a specific area of the segmented background is selected from each image (Fig. 5b). The final threshold decision is \({v}_{i}={v}_{2}\cdot \alpha\), where \(\alpha\) is the threshold sensitivity. If \(\alpha\) is set to zero, the software takes two more frames to learn the sensitivity: it assumes no events have occurred and corrects \({v}_{i}\) by tuning \(\alpha\) automatically. This step adjusts for the difference between iterating over the entire image and iterating exclusively over the image’s background. Regions surpassing the global threshold \({v}_{i}\) are considered detected occurrences.
Subsequently, each contiguous region is isolated and assigned a distinctive label designating it as an event. The fluorescence intensity of each event is spatially and temporally tracked immediately after detection. The mean intensity of each event \({\mu }_{e}\left(t\right)\) is temporally measured over a fixed area, then we determine the mid-intensity \({\mu }_{{\mbox{mid}}}=\frac{1}{2}\left({\mu }_{e}\left({t}_{\min }\,\right)+\,{\mu }_{e}\left({t}_{\max }\,\right)\right)\). When the fluorescence intensity of an event falls below \({\mu }_{{\mbox{mid}}}\,\), the event signal is considered to have disappeared and the tracking stops (Fig. 5c,d).

Mice for T Cell, chromaffin cell and DRG neuron culture

WT mice with C57BL/6N background used in this study were purchased from Charles River. Synaptobrevin2-mRFP knock-in (KI) mice were generated as described in Matti et al.14. Granzyme B-mTFP KI mice with C57BL/6J background were generated as described previously57. Granzyme B-tdTomato KI mice58 were purchased from the Transgenesis and Archiving of Animal Models (TAAM) (National Centre of Scientific Research (CNRS), Orleans, France). Mice were housed in individually ventilated cages under specific pathogen-free conditions in a 12 h light-dark cycle with constant access to food and water. All experimental procedures were approved and performed according to the regulations of the state of Saarland (Landesamt für Verbraucherschutz, AZ.: 2.4.1.1).

Murine CD8+ T cells

Culture

Splenocytes were isolated from 8–20 week-old mice of either sex as described before22. Briefly, naive CD8 T cells were positively isolated from splenocytes using Dynabeads FlowComp Mouse CD8+ kit (Thermo Fisher Scientific, Cat# 11462D) as described by the manufacturer. The isolated naive CD8 + T cells were stimulated with anti-CD3ε /anti-CD28 activator beads (1:0.8 ratio, Thermo Fisher Scientific, Cat# 11453D) and cultured for 5 days at 37 °C with 5% CO2. Cells were cultured at a density of 1 × 106 cells/ml in 12 well plates with AIMV medium (Thermo Fisher Scientific, Cat# 12055083) containing 10% FCS (Thermo Fisher Scientific, Cat# A5256901), 50 U/ml penicillin, 50 μg/ml streptomycin (Thermo Fisher Scientific, Cat# 15140163), 30 U/ml recombinant IL-2 (Thermo Fisher Scientific, Cat# 212-12-100 µg) and 50 μM 2-mercaptoethanol (Sigma, Cat# M6250). Beads and IL-2 were removed from T cell culture 1 day before experiments.

Transfection and constructs

Day 4 effector T cells were transfected 12 h prior to the experiment through electroporation of the Plasmid DNA (Granzyme B-pHuji, Synaptobrevin2-pHuji, CD63-pHuji) using Nucleofector™ 2b Device (Lonza) and the nucleofection kit for primary mouse T cells (Lonza, Cat# VPA-1006), according to the manufacturer’s protocol (Lonza). After nucleofection, cells were maintained in a recovery medium as described by Alawar et al.59. 4 h prior to the experiment the cells were washed with AIMV medium. The pMax_granzyme B-pHuji construct was generated by replacing the mTFP at the C-terminus of pMax-granzyme-mTFP57 with pHuji using a forward primer that included an AgeI restriction site 5′-ATG TAT ATC CAC CGG TCG CCA CCA TGG TGA GCA AGG GCG AGG AG-3′ and a reverse primer that included a NheI restriction site 5′-ATG TAT AGC TAG CTT ACT TGT ACA GCT C-3′. The size of this plasmid was 4.315 kb. The pmax-CD63-pHuji was generated by subcloning from pCMV-CD63-pHuji60, which was a generous gift from Frederik Verweij (Centre de Psychiatrie et neurosciences, Amsterdam/Paris), into pMax with the restriction sites EcoRI and XbaI. Its size was 4.282 kb. Synaptobrevin2-pHuji plasmid was generated as described in ref. 61.

Acquisition conditions

Measurement of exocytosis was performed via TIRFM as follows. We used day 5 bead-activated CTLs isolated from GzmB-mTFP KI, GzmB-tdTomato KI, Synaptobrevin2-mRFP KI or WT mice. The latter were transfected with the constructs described above. 3 × 105 cells were resuspended in 30 μl of extracellular buffer (10 mM glucose, 5 mM HEPES, 155 mM NaCl, 4.5 mM KCl, and 2 mM MgCl2) and allowed to settle for 1–2 min on anti-CD3ε antibody (30 μg/ml, BD Pharmingen, clone 145-2C11) coated coverslips to allow immunological synapse formation triggering lytic granule exocytosis. Cells were then perfused with extracellular buffer containing calcium (10 mM glucose, 5 mM HEPES, 140 mM NaCl, 4.5 mM KCl, 2 mM MgCl2 and 10 mM CaCl2) to stimulate CG secretion. Cells were recorded for 10 min at 20 ± 2 °C.

Imaging

Live cell imaging was done with two setups. The experiments performed with CTL (lytic granule staining with synaptobrevin-mRFP, granzyme B-mTFP or granzyme B-tdTomato) were performed with setup # 1 described previously22,26,45. Briefly, an Olympus IX70 microscope (Olympus, Hamburg, Germany) was equipped with a 100x/1.45 NA Plan Apochromat Olympus objective (Olympus, Hamburg, Germany), a TILL-total internal reflection fluorescence (TILL-TIRF) condenser (TILL Photonics, Kaufbeuren, Germany), and a QuantEM 512SC camera (Photometrics, Tucson, AZ, USA) or Prime 95 B scientific CMOS camera (Teledyne Photometrics, Tucson, AZ, USA). The final pixel size was 160 nm and 110 nm, respectively. A multi-band argon laser (Spectra-Physics, Stahnsdorf, Germany) emitting at 488 nm was used to excite mTFP fluorescence, and a solid-state laser 85 YCA emitting at 561 nm (Melles Griot Laser Group, Carlsbad, CA, USA) was used to excite mRFP and tdTomato. The setup was controlled by Visiview software (Version:4.0.0.11, Visitron GmbH). The acquisition frequency was 10 Hz for all experiments.

Setup # 2, used to acquire CTL secretion in which the lytic granules were labeled by Synaptobrevin-pHuji, granzyme B-pHuji or CD63-pHuji overexpression, was described previously57,61. Briefly, the setup from Visitron Systems GmbH (Puchheim, Germany) was based on an IX83 (Olympus) equipped with the Olympus autofocus module, a UAPON100XOTIRF NA 1.49 objective (Olympus), a 445 nm laser (100 mW), a 488 nm laser (100 mW) and a solid-state 561 nm laser (100 mW, Melles Griot Laser Group, Carlsbad, CA, USA). The TIRFM angle was controlled by the iLAS2 illumination control system (Roper Scientific SAS, France). Images were acquired with a QuantEM 512SC camera (Photometrics, Tucson, AZ, USA) or a Prime 95B scientific CMOS camera (Teledyne Photometrics, Tucson, AZ, USA). The final pixel size was 160 nm and 110 nm, respectively. The setup was controlled by Visiview software (Version 4.0.0.11, Visitron GmbH). The acquisition frequency was 5 or 10 Hz, and the acquisition time was 10 to 15 min.

Murine DRG neurons

Culture and transfection

The training of the stationary burst event neural network and the automatic detection of neuronal exocytosis at synapses were performed on data sets that were previously published45,46. Briefly, DRG neuron cultures from young adult (1–4-week-old) WT mice of either sex were prepared as previously described26. Lentivirus infection to transfect with SypHy was performed on DIV1. The following day, the lentivirus was removed by washing before adding the second-order spinal cord (SC) interneurons (SC neurons) to the culture to allow DRG neurons to form synapses. SC neurons were prepared from WT P0-P2 pups of either sex as previously described45. The DRG/SC co-culture was maintained in Neurobasal A (NBA) medium (Cat# 21103049) supplemented with fetal calf serum (5% v/v, Cat# 11550356), penicillin and streptomycin (0.2% each, Cat# 11548876), B27 supplement (2%, Cat# 17504-044), GlutaMAX (1%, Cat# 35050-061, all products from Thermo Fisher Scientific, Waltham, MA, USA), and human beta-nerve growth factor (0.2 µg/mL, Cat# N245, Alomone Labs, Jerusalem, Israel) at 37 °C and 5% CO2.

Acquisition conditions

Secretion was evoked by electrical stimulation via a bipolar platinum-iridium field electrode (Cat# PI2ST30.5B10, MicroProbes, Gaithersburg, MD, USA) and a pulse stimulator (Isolated Pulse Stimulator Model 2100, A-M Systems, Sequim, WA, USA). The measurement protocol was 30 s without stimulus followed by a biphasic 1 ms long 4 V stimulus train at 10 Hz for 30 s to elicit exocytosis of SVs. At the end of the measurement, NH4Cl was applied to visualize the entire SV pool. During the measurement, the temperature was maintained at 32 °C by a perfusion system with an inline solution heater (Warner Instruments, Holliston, MA, USA). The extracellular solution contained 147 mM NaCl, 2.4 mM KCl, 2.5 mM CaCl2, 1.2 mM MgCl2, 10 mM HEPES, and 10 mM glucose (pH 7.4; 300 mOsm). The NH4Cl solution had the same composition as the extracellular solution, but the NaCl was replaced with 40 mM NH4Cl. All products were from Sigma-Aldrich/Merck.

Imaging

All experiments were performed on Setup # 1 described above for the CTLs.

Chromaffin cells

Data showing bovine chromaffin cell exocytosis were from Becherer et al.62 and Hugo et al.14. Culture conditions were described by Ashery et al.63. Briefly, chromaffin cells were dissociated from the bovine adrenal gland by enzymatic dissociation (20 min) with 129.5 units per ml collagenase (Cat# C1-22, Biochrom AG, Berlin, Germany). They were maintained for 3–5 days in culture in DMEM (Cat# 31966021) containing ITS-X (1:100 dilution, Cat# 51500056) and Penicillin/Streptomycin (1:250, Cat# 15070063), all products from Thermo Fisher Scientific, Waltham, MA, USA. They were electroporated with NPY-mRFP to label the large dense core granules using the Gene Pulser II (Bio-Rad, Hercules, CA, USA, at 230 V, 1 mF) or the Neon™ transfection system (Invitrogen, Karlsruhe, Germany, using one pulse at 1100 V for 30 ms). Cells were patch-clamped in whole-cell recording mode using an EPC-9 patch-clamp amplifier controlled by the PULSE software (Heka Elektronik, Lambrecht, Germany). The extracellular solution contained (in mM): 146 NaCl, 2.4 KCl, 10 HEPES, 1.2 MgCl2, 2.5 CaCl2, 10 glucose and 10 NaHCO3 (pH 7.4, 310 mOsm). Secretion was induced through either depolarization trains62 or perfusion of the cells with 6 μM Ca2+ containing solution via the patch-clamp pipette14. The intracellular solution contained (in mM) either (experiment from ref. 14) 160 Cs-aspartic acid, 10 HEPES, 1 MgCl2, 2 Mg-ATP, 0.3 Na2-GTP (pH 7.2, 300 mOsm) or (experiment from ref. 62) 110 Cs-glutamate, 10 HEPES, 2 Mg-ATP, 0.3 Na2-GTP, 5 CaCl2, 9 HEDTA (pH 7.2, 300 mOsm). All products for the solutions were from Sigma-Aldrich/Merck. The acquisition rate was 10 Hz and the exposure time was 100 ms. The camera was either a Micromax 512BFT camera (Princeton Instruments Inc., Trenton, NJ, USA) with a 100x/1.45 NA Plan Apochromat Olympus objective62, or a QuantEM 512SC camera (Photometrics, Tucson, AZ, USA) with a 100x/1.45 NA Fluar (Zeiss) objective14, giving a final pixel size of 130 or 160 nm, respectively.

INS-1 cells

Culture and transfection

Rat insulinoma cells64 (INS-1 cells, clone 832/13, provided by Hendrik Mulder, Lund University) were maintained in RPMI 1640 (Invitrogen, Cat# 21870076) containing 10 mM glucose and supplemented with 10% fetal bovine serum (Sigma-Aldrich, Cat# F7524), streptomycin (100 µg/ml) and penicillin (100 µg/ml, Biowest, Cat# L0022), Na-pyruvate (1 mM, Gibco, Cat# 11360-070), L-glutamine (2 mM, Biowest, Cat# X0550), HEPES (10 mM, Gibco, Cat# 15630-080) and 2-mercaptoethanol (50 µM, Gibco, Cat# 31350-010). The cells were plated on polylysine-coated coverslips (Sigma-Aldrich, Cat# P5899 and Marienfeld, Cat# 112620), transfected using Lipofectamine 2000 (Invitrogen, Cat# 11668-019) at a ratio of 0.1 µg DNA : 1 µl Lipofectamine, and imaged 24–42 h later.

Acquisition conditions

The bath solution contained (in mM) 138 NaCl, 5.6 KCl, 1.2 MgCl2, 2.6 CaCl2, 10 D-glucose, 0.2 diazoxide (Sigma, Cat# D9035), 0.2 forskolin (Merck, Cat# 93049), and 10 HEPES (Sigma, Cat# H4034-1KG), pH 7.4 adjusted with NaOH. Individual cells were stimulated by computer-controlled air pressure ejection of a solution containing elevated K+ (75 mM, replacing Na+) through a pulled glass pipette (similar to a patch-clamp electrode, Hilgenberg, Cat# 1003027) that was placed near the recorded cell. The bath solution temperature was kept at 35 °C using an FCS13-A electronic heater (Shinho, Cat# FCS11E7 2002.07).

Imaging

INS-1 cells (clone 832/13) that transiently expressed NPY-mGFP, NPY-mNeonGreen or NPY-mCherry were imaged using a custom-built lens-type total internal reflection fluorescence (TIRF) microscope based on an AxioObserver D1 microscope with a 100x/1.46 objective (Carl Zeiss, Cat# 420792-9800-720). Excitation was from a diode laser module at 473 nm or a diode-pumped laser at 561 nm, respectively (Cobolt, Göteborg, Sweden, Cat# 0473-06-01-0300-100 & Cat# 0561-06-91-0100-100), controlled by an acousto-optical tunable filter (AOTF, AA-Opto, France, Cat# AOTFnC-400 650-TN). Light passed through a dichroic Di01-R488/561 (Semrock), and emission light was separated onto the two halves of an sCMOS camera (Prime 95B, Photometrics, Tucson, AZ, USA, Cat# 01-PRIME-95B-R-M-16-C) using an image splitter (Dual View, Photometrics) with a cutoff at 565 nm (565dcxr, Chroma) and emission filters (FF01-523/610, Semrock; and ET525/50m and 600EFLP, both from Chroma). Scaling was 110 nm per pixel (sCMOS camera). The acquisition rate was 50 Hz for NPY-mNeonGreen and 10 Hz for NPY-mCherry. NPY-eGFP-expressing INS-1 cells were imaged using a TIRF microscope based on an AxioObserver Z1 (Zeiss) with a diode-pumped laser at 491 nm (Cobolt, Stockholm, Sweden, Cat# DC-4915615050-300) that passed through a cleanup filter and dichroic filter set (zet405/488/561/640x, Chroma). Imaging was done with a 16-bit EMCCD camera (QuantEM 512SC, Roper) with a final scale of 160 nm per pixel. The acquisition rate was 10 Hz. Image acquisition was conducted with MetaMorph (V7.8.0.0, Molecular Devices).

Human CD8+ T lymphocytes

Cells

Human CD8+ T cell clones were used as the cellular model. Human T cell clones were isolated and maintained as previously described65. Briefly, cells were cultured in RPMI 1640 GlutaMAX medium (Gibco, Cat# 61870036) supplemented with 5% heat-inactivated human AB serum (Institut de Biotechnologies Jacques Boy, Cat# 201021334), 50 μM 2-mercaptoethanol (Gibco, Cat# 31350010), 10 mM HEPES (Gibco, Cat# 15630122), 1× MEM Non-Essential Amino Acids (MEM-NEAA) (Gibco, Cat# 11140035), 1× sodium pyruvate (Sigma-Aldrich, Cat# S8636), ciprofloxacin (10 μg/ml, Sigma-Aldrich, Cat# 17850), human recombinant interleukin-2 (rIL-2; 100 IU/ml, Miltenyi Biotec, Cat# 130-097-748), and human rIL-15 (50 ng/ml, Miltenyi Biotec, Cat# 130-095-766). Blood samples were collected and processed following standard ethical procedures after obtaining written informed consent from each donor and approval by the French Ministry of Research as described (Cortacero et al. 2023, authorization no. DC-2021-4673).

Acquisition conditions

Human CTLs were stained for 30 min with LysoTracker Red (DND-99) dye (2 µM, Invitrogen, Cat# L7528) at 37 °C/5% CO2. The cells were washed 3 times with RPMI 1640 medium (1X) without phenol red (Gibco, Cat# 11835063) supplemented with 10 mM GlutaMAX (Gibco, Cat# 35050061) and 10 mM HEPES (Gibco, Cat# 15630122). To induce immunological synapse formation followed by lytic granule exocytosis, µ-Slide 15 Well 3D glass-bottom slides (Ibidi, Biovalley, Cat# 81507) were coated with poly-D-lysine (1:10, Sigma-Aldrich, Cat# P6407), human monoclonal anti-CD3 antibody (TR66) (5 µg/mL or 10 µg/mL, Enzo Life Sciences, Cat# ALX-804-822) and recombinant human ICAM-1/CD54 Fc Chimera Protein (5 µg/mL or 10 µg/mL, R&D Systems, Cat# 720-IC) at 4 °C overnight. The chambered slides were washed 3 times with PBS 1X (Sigma-Aldrich, Cat# D8537) and mounted on a heated stage within a temperature-controlled chamber maintained at 37 °C and constant 5% CO2. For each recording, 3 × 104 to 5 × 104 cells were seeded on the chambered slides. During acquisition, the cells were kept in RPMI 1640 medium (1X) without phenol red supplemented with 10 mM GlutaMAX, 10 mM HEPES and 5% Fetal Bovine Serum (FBS, Gibco, Cat# A5256701).

Imaging

The TIRFM acquisition setup was based on an Eclipse Ti2-E inverted microscope (Nikon Instruments) equipped with a 100x/1.45 NA Plan Apochromat LBDA objective (Nikon Instruments) and an iLAS 2 illumination control system (Roper Scientific SAS). A diode laser at 561 nm (150 mW) (Coherent) band-passed using a ZET405/488/561/647x filter (Chroma Technology) was used for excitation. The emissions were separated using a ZT405/488/561/647rpc-UF1 dichroic mirror (Chroma Technology) and optically filtered using a ZET405/488/561/647m filter (Chroma Technology). Images were recorded on a Prime 95B Scientific CMOS camera (Teledyne Photometrics, Tucson, AZ, USA). The final pixel size was 110 nm. Image acquisition was controlled using MetaMorph software (Version 7.10.5.476, Molecular Devices) and Modular V2.0 GATACA software. The acquisition frequency was 9 Hz for a duration of 20 to 30 min.

Dopaminergic neurons

Data showing dopaminergic neuron exocytosis monitored by AndromeDA nanosensor paint technology were from Elizarova et al.39. Briefly, ventral midbrain neurons were dissected from postnatal day 0 C57BL/6 mice and enzymatically dissociated using papain (Worthington, Cat# 9001-73-4). Cells were plated on glass coverslips pre-coated with poly-L-lysine (Sigma-Aldrich, Cat# P4707-50ML) and maintained in Neurobasal-A medium (Gibco, Cat# 11540366) supplemented with B-27 (Gibco, Cat# 17504-044), GlutaMAX (Gibco, Cat# 35050-038), and penicillin-streptomycin (Gibco, Cat# 15140-130). Neurons were cultured at 37 °C in a humidified 5% CO₂ atmosphere and imaged between DIV 21 and DIV 42. The imaging setup included a 100× oil-immersion objective (UPLSAPO100XS, Olympus) and a Xenics Cheetah-640-TE1 InGaAs camera (Xenics), yielding a final pixel size of 150 nm. Imaging was performed at 15 Hz.

Cardiomyocytes from mouse

Data showing Fluo-4-measured calcium sparks in mouse cardiomyocytes were from Tian and Lipp44. Mouse ventricular myocytes were isolated as previously described in full detail66. All procedures concerning animal handling conformed to the guidelines of Directive 2010/63/EU of the European Parliament. After isolation, the cardiomyocytes were rinsed in equilibrium solution (in mM, 140 KCl, 0.75 MgCl2, 0.2 EGTA, 10 HEPES, pH 7.20; from Sigma-Aldrich/Merck) for 2–3 min to let the cells settle. The cells were then rinsed with saponin (ChemCruz, Cat# sc-280079, 15 pg/ml dissolved in equilibrium solution) for 40 s. After that, the solution was completely removed and exchanged with artificial internal solution (in mM, 100 K-aspartate, 15 KCl, 5 KH2PO4, 0.5 EGTA, 10 HEPES, 10 phosphocreatine (CalBioChem, Cat# 2380), 8% ~40,000 MW dextran (Sigma, Cat# 31389), 5 MgATP, 5 U/mL creatine phosphokinase (CalBioChem, Cat# 2380), 10 μM Fluo-4 (Invitrogen, Cat# F14200)). The pH was set to 7.2 and the free Ca2+ was calibrated to 100 nM. For more details, please refer to ref. 67. The data were recorded with a 2D-array scanning confocal microscope (Infinity4, Visitech, Sunderland, UK) equipped with a Nikon 60x oil-immersion objective (NA = 1.40) and an sCMOS Flash4 V2 camera from Hamamatsu (Hamamatsu Photonics Deutschland GmbH, Herrsching am Ammersee, Germany). Two-by-two binning (0.217 × 0.217 μm/pixel) was used, and imaging was done with the VolCell software (v8.03.0.3, Visitech, Sunderland, UK). A suitable area of the camera chip was selected such that a final speed of 124 frames per second was achieved.

Statistical analysis and used programs and algorithms

All statistical analyses were performed with SigmaPlot (V14.5.0.101, Systat Software, Inc.). P-values were calculated with two-tailed statistical tests and 95% confidence intervals. ANOVA, ANOVA on ranks and Student's t-test were used as required. Data analysis and processing were performed with MATLAB (MathWorks, R2024b) and Excel 2021 (Microsoft, V2108).

IVEA was developed in Java 1.8.0_322 using Eclipse and the following Java libraries: ij (V1.54c), opencsv, bio-formats_plugins, loci_plugins, deeplearning4j core v1.0.0-M1.1, Google TensorFlow v1.15.0, libtensorflow_jni v1.15.0.

Training for IVEA was done in Python v3.8.15 using the following libraries: Google TensorFlow v2.9.1 or v2.10, Keras, NumPy, scikit-image, Tkinter, shutil, pandas, h5py, and read_roi.

The IVEA training platform was coded with Visual Studio Code V1.100.2.

For software development we used a deep-learning long short-term memory (LSTM) network, a vision transformer network, a convolutional neural network, k-means clustering, iterative thresholding, Gaussian non-maximum suppression and a multilayer intensity correction algorithm.

Imaging data were analyzed by a human expert with Fiji V1.54p. The results were compared to the ExoJ (V1.09), pHusion and SynActJ (V0.3) software.

Figures were prepared with CorelDraw V23.5.0.506 and Adobe Illustrator V29.5.1.

Ethics statement

Mice were treated according to the regulations of the local authorities, the state of Saarland (Landesamt für Verbraucherschutz) under the license AZ.: 2.4.1.1 or the Niedersächsisches Landesamt für Verbraucherschutz und Lebensmittelsicherheit (LAVES, permit numbers 33.19-42502-04-19/3254, 33.19.42502-04-15/1817 and 33.19-42502-04-18/2756). Animals were housed according to European Union Directive 63/2010/EU and ETS 123 at 21 ± 1 °C, 55% relative humidity, under a 12 h/12 h light/dark cycle, and received food and tap water ad libitum. Human blood samples were collected and processed following standard ethical procedures after obtaining written informed consent from each donor and approval by the French Ministry of Research (authorization no. DC-2021-4673).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.


