A deep learning-enriched framework for analyzing brain functional connectivity

Decoding and analysis of functional connectivity: main concepts and notations

In this study, we developed a framework for decoding and analyzing the brain functional connectivity estimated from EEG recordings in a data-driven way, by exploiting the knowledge learned by a learning system (a CNN in this study). In the following, we present the main concepts and useful notations.

Let us denote the participant-specific dataset composed of pre-processed EEG signals by:

$${{D}_{EEG}}^{\left(p\right)}=\{\left({X}_{0},\:{y}_{0}\right),\dots\:,\left({X}_{i},\:{y}_{i}\right),\dots\:,\left({X}_{M-1},\:{y}_{M-1}\right)\},$$

(1)

where $M$ indicates the number of EEG examples (e.g., EEG epochs in a trial-based EEG recording), assumed here to be the same for each participant. ${X}_{i}{\in \mathbb{R}}^{R\times T}$ represents the EEG multi-variate activity recorded in the $i$-th example from $R$ brain regions at $T$ time samples, and ${y}_{i}$ is the associated label among a set $L$ of brain states of interest (e.g., right-hand motor imagery in a motor imagery task). The EEG multi-variate time series can be represented in the spatial domain at the scalp-level – corresponding to signals at EEG sensors – or at the cortex-level – corresponding to signals obtained by back-projecting the EEG activity on the cortical surface. The latter is enabled via cortical source reconstruction¹², a processing step devoted to transform sensor-space signals (scalp signals) into source-space signals (cortical signals), generally aggregated within cortical regions of interest (ROIs). Therefore, depending on the level of the EEG spatial representation, the brain regions characterized in the multi-variate neural activity either are located on the scalp or on the cortical surface.

Starting from pre-processed EEG data, the brain functional connectivity can be estimated, for example by computing directional causal influences between brain regions (at the scalp-level or cortex-level) via Granger causality¹³. Granger causality, as other measures of connectivity, can be formulated in the spectral domain (spectral functional connectivity), quantifying the interactions between regions separately for $F$ different frequency bins, i.e., the connectivity information is provided as a function of frequency. Therefore, for each example ${X}_{i}{\in \mathbb{R}}^{R\times T}$, containing the multi-variate neural activity, the spectral functional connectivity estimate ${A}_{i}{\in \mathbb{R}}^{R\times R\times F}$ is derived. This matrix quantifies the interaction from the $j$-th to the $k$-th brain regions (at the scalp-level or cortex-level) at the $f$-th frequency bin, having denoted with $j$, $k$, $f$ the indices of the matrix ${A}_{i}$ (i.e., ${A}_{i}\left[j,k,f\right]$). By linking each connectivity estimate to the associated brain state (as in Eq. 1), the participant-specific dataset composed of functional connectivity estimates can be represented by:

$${{D}_{FC}}^{\left(p\right)}=\{\left({A}_{0}, {y}_{0}\right),\dots ,\left({A}_{i}, {y}_{i}\right),\dots ,\left({A}_{M-1}, {y}_{M-1}\right)\},$$

(2)

A CNN-based learning system can be trained to realize a classifier $g$ aimed at discriminating between the brain states of interest, starting from the functional connectivity estimates as input. Thus, the CNN describes the non-linear function $g$, mapping the brain interactions – contained in ${A}_{i}$–to a brain state:

$$g\left({A}_{i};\theta \right):{\mathbb{R}}^{R\times R\times F}\to L,$$

(3)

where $g$ is parametrized in the parameter array $\theta$ (whose values are learned during training). By doing so, the system automatically learns the most relevant features about the brain interactions for correctly discriminating the output brain states. Crucially, the knowledge learned by the learning system is encoded in the parameter values $\theta$, and can be leveraged for designing a novel analysis framework for studying the brain functional connectivity in a data-driven way. At a high level, this can be achieved by performing the following analyses.

i.

Analysis of the output of intermediate interpretable layers. In case the network architecture includes intermediate interpretable layers, their output can be easily interpreted and analyzed, when processing the input ${A}_{i}$. As an example, in case the network is forced at extracting the connectivity inflow or outflow that maximize the between-class discrimination in an intermediate layer, this layer output directly provides connectivity measures that can be analyzed.
ii.

Explanation of the network decision. The network can be combined with an explanation technique to identify, while processing the input ${A}_{i}$, the most relevant samples in the feature maps of a target layer that, based on the learned knowledge $\theta$, drive the decision of the network towards a specific output brain state. In this case, it is crucial that the chosen target layer works on representations living in an interpretable domain. The target layer could be the input layer, which works on the spatial and frequency domains in case of spectral functional connectivity estimates, or a hidden interpretable layer, which forces interpretability in a given domain. As an example, with this procedure it is possible to analyze the frequency components that are maximally relevant for discriminating a brain state under analysis (e.g., the right-hand motor imagery in case of a motor imagery task).

The proposed deep learning-enriched analysis framework encapsulates all these aspects in a unique and comprehensive tool, that could be conveniently used by neuroscientists for analyzing functional connectivity estimates in a data-driven way. In the following, we present the core methodology underlying our framework, and we illustrate the experiments conducted on real data using the proposed framework.

Deep learning-enriched analysis of functional connectivity

Here we describe the proposed deep learning-enriched framework for analyzing brain interactions. At first, we describe the developed neural network for decoding brain functional connectivity. Then, we delineate how to exploit the network knowledge for extracting and analyzing connectivity-related measures, by explaining in detail the data-driven analyses sketched in Sect. “Decoding and analysis of functional connectivity: main concepts and notations” (see points i. and ii).

Network architecture and training

We designed a compact convolutional neural network – termed FCNet (‘Functional-Connectivity-Net’) – for decoding functional connectivity estimates derived from EEG signals. FCNet processes the functional connectivity matrix ${A}_{i}\in {\mathbb{R}}^{R\times R\times F}$ by first extracting features in the frequency domain (spectral block), and then in the spatial domain (spatial block). The processing operated in the spatial block is interpretable by design, and it is characterized by the extraction of measures of connectivity inflow and outflow. Finally, the last classification block finalizes the decoding problem. Specifically, in our experiments FCNet was applied to a binary classification task consisting of the discrimination between right-hand vs. left-hand motor imagery (see Sect. “Experiments”). In Fig. 1 we provide a high-level scheme of the processing operated by the fundamental blocks of FCNet. The main FCNet blocks are described in the following, and a complete overview of the network structure can be found in Table 1.

i.

Spectral block. The spectral block aims at searching features that summarize the causal spectra in the frequency domain. First, a 3D convolutional layer is applied, learning $K=32$ kernels with size of $W=(1,1,9),$ using a unitary stride. Feature maps are pooled (average pooling) in the frequency domain, to reduce the computational cost, utilizing a pooling size and stride of $W=S=(1,1,4)$, and neurons of the pooled feature maps are dropped out during training (dropout probability $p=0.1$). Then, a second 3D convolutional layer is applied, continuing the learning of spectral features. This layer learned $K=32$ kernels too, with size of $W=(1,1,9)$. To keep limited the number of trainable parameters, separable convolution (i.e., depthwise convolution followed by pointwise convolution) is implemented, utilizing a depth multiplier $D=1$. Feature maps are average pooled in the frequency domain, utilizing a pooling size and stride of $W=S=(1,1,4)$, and neurons of the pooled feature maps are dropped out during training (dropout probability $p=0.1$). This block, with the chosen hyperparameters, provides as output 32 feature maps with shape $(R,R)$, each summarizing the spectral functional connectivity presented as input to the network, across all frequency bins.
ii.

Spatial block. The spatial block aims at learning connectivity inflow- and outflow-related features from the high-level connectivity feature maps provided by the spectral block. To this aim, this block exploits a parallel dual-branched structure (connectivity inflow branch and connectivity outflow branch). Here, each branch operates in parallel on the input connectivity feature maps by separately learning inflow and outflow features. Specifically, the connectivity inflow branch includes a depthwise 2D convolutional layer (depthwise multiplier of $D=1$) – termed inflow convolutional layer – learning 32 kernels with size $W=\left(R,1\right)$. This way, the network learned 32 features (one per feature map of the spectral block, as $D=1$) aimed at computing the connectivity inflow associated to each brain region from the connectivity feature maps provided by the spectral block. Therefore, the output of the inflow convolutional layer provides a quantification (in 32 variants) of the brain connectivity inflow at each brain region; remarkably, the mathematical formulation of these inflow measures – parametrized in the parameters of the inflow convolutional kernels – is automatically learned by the network from the input data. Then, the network learns how to optimally recombine together the 32 brain connectivity inflows by performing a pointwise 2D convolutional layer, learning 32 kernels with size $W=(1,1)$. The combination of these depthwise and pointwise convolutions realizes a separable convolutional layer. Finally, by exploiting an average pooling layer (pooling size and stride of $W=S=(1,R)$), the recombined inflow measures are pooled together across brain regions, obtaining a single scalar value for each feature map. As concerning the connectivity outflow branch, the same processing as in the inflow branch is performed (same sequence of layers) but operating in space along the other dimension, that is, using convolutional kernels with size $W=(1,R)$ in the depthwise 2D convolutional layer, and pooling feature maps using a pooling size and stride of $W=S=(R,1)$. Similar to the connectivity inflow branch, in the connectivity outflow branch the output of the depthwise 2D convolutional layer – termed outflow convolutional layer – provides a quantification (in 32 variants) of the brain connectivity outflow at each brain region. Overall, each parallel branch (connectivity inflow branch and connectivity outflow branch) outputs a feature array with 32 elements, summarizing the inflow and outflow measures learned by the network across all the brain regions.
iii.

Classification block. This block finalizes the decoding problem starting from the two feature arrays provided by the dual-branched spatial block. The feature arrays are concatenated together into a single feature array with 64 elements, and then provided as input to a fully-connected layer with $N$ units, one per brain state to classify. As the experiments were conducted on a right-hand vs. left-hand motor imagery classification task (2 motor brain states, see Sect. “Experiments”), the fully-connected layer includes $N=2$ units, activated via softmax function. Therefore, the network output produces the conditional probability $p\left({l}_{c}|{A}_{i}\right),$ $\forall {l}_{c}\in L=\left\{\text{right-hand}, \text{left-hand}\right\}$, where $c$ is the index of a specific output class (i.e., output brain state).

For spectral and spatial blocks, after each convolutional layer a batch normalization layer is included, and the resulting feature maps are then activated via rectified linear activations (ReLU functions), introducing non-linearity in the network.

Table 1 FCNet architecture. Each layer is provided with its name, main hyper-parameters, number of parameters to fit and output shape. Where not specified, a unitary stride ($S$) and no padding ($P$) were applied. The output shape is reported as a function of the number of brain regions $R$, that varied in our experiments: $R=22$ in scalp-level experiments and $R=24$ in cortex-level experiments. The total number of parameters to fit was 5.4k and 5.6k, respectively for the conducted scalp-level and cortex-level experiments.

The network trainable parameters – 5.4k and 5.6k, respectively in the scalp-level and cortex-level experiments (see Sect. “Experiments”) – were trained for 500 epochs using the cross-entropy as loss function and Adam as optimizer (learning rate of 5·1e-4, and mini-batch size of 32). The network hyperparameters (e.g., the number and size of the convolutional kernels in the spectral block) were determined via preliminary empirical hyperparameter evaluations. A within-subject training strategy was employed, training the network separately for each participant. We adopted this choice as we were interested in designing an algorithm able to highlight participant-specific connectivity signatures, by leveraging on the participant-specific neural decoders’ knowledge. We designed the training strategy in this way to open important application scenarios of our analysis framework, such as the prospective application to individual patients, to better support the analysis of the neuropathology and guide the personalization of treatments in patients. To assess the decoding performance, the dataset of the $p$-th participant ${{D}_{FC}}^{\left(p\right)}$ was partitioned employing a 10-fold cross-validation scheme. Moreover, from the training set we sampled 20% of examples form the validation set, for performing early stopping, arresting the learning at the training epoch with the highest validation accuracy. Therefore, the remaining 80% of examples were effectively used to train the network. The confusion matrix and accuracy served as performance metrics. For each participant and each cross-validation fold, these metrics were computed on the test set, and then averaged across folds.

Analysis of the output of intermediate interpretable layers

The design of FCNet includes two interpretable layers: the inflow convolutional layer and the outflow convolutional layer. By construction these layers operate by automatically learning, via convolutional operators, 32 connectivity inflow measures and 32 connectivity outflow measures, respectively. Overall, the performed computation is non-linear with respect to the input spectral connectivity, encompassing the spectral block and the inflow/outflow convolutional layer. Moreover, the computation exploits the knowledge learned by the interpretable CNN during the training process, such that it optimally combines the spectral information (frequency domain: spectral block) and it optimally weights the brain interactions (spatial domain: inflow/outflow convolutional layer) in order to maximally separate the output brain states. To process these network-based measures, we performed the following steps.

For each participant and each cross-validation fold (that is, for each trained network), we performed a forward pass through the network using the test examples ${A}_{i}$ as input (i.e., $\forall\:i|{A}_{i}\in \text{test\:set}$). We extracted the 32 inflow measures and 32 outflow measures – each ${\in \mathbb{R}}^{R}$ (i.e., quantifying the inflow/outflow at each brain region) – at the output of the inflow convolutional layer and outflow convolutional layer, respectively. That is, we exploited the neural network up to the interpretable layers as a tool for extracting representations useful to the analysis of brain connectivity. For each trained network, each of the 32 inflow measures was averaged across the test examples associated to the same brain state classified by the network (i.e., one of the two motor imagery conditions, see Sect. “EEG pre-processing and processing”). The same was done for the outflow measures. After this procedure, for each trained network (i.e., for each participant and each cross-validation fold), 32 average inflow measures and 32 average outflow measures were obtained for each brain state (i.e., for each motor imagery condition). It is worth noticing that these resulting measures came out from independently trained models (one model per participant and cross-validation fold), that is, from models with different trained convolutional kernels. Therefore, we performed a clustering procedure for appropriately ordering the measures and aggregating them across models. The clustering procedure was applied separately to inflow and outflow measures. Specifically, constrained K-means⁴⁷ was adopted, and applied to the inflow (/outflow) measures averaged across the brain states. The clustering algorithm searched for 32 clusters (one per measure), while imposing no links between the measures within each participant and each cross-validation fold. That is, the clustering of the 32 inflow (/outflow) measures was achieved only across – and not within – the participant and fold dimensions. This ensured that each cluster was characterized by a peculiar connectivity inflow (/outflow) measure, that was consistent across participants and cross-validation folds. Once found the 32 clusters of inflow/outflow computations, for each brain state, we ordered the measures of each participant and each cross-validation fold according to the cluster they belong to, and we averaged each measure across the 10 cross-validation folds. Overall, this processing resulted into 32 average inflow measures and 32 average outflow measures for each participant and each brain state, termed as FCNet-based inflow measures and FCNet-based outflow measures. These 64 neural network-based connectivity measures were under investigation in this study.

Explanation of the network decision

FCNet decision was explained to understand the samples in the frequency and spatial domains that mostly drove the network decision towards the appropriate brain state. The network explanation analysis has been included in the proposed framework to not only extract useful measures for analyzing brain connectivity but also to visualize the most relevant contributions for the learning system in interpretable domains (frequency and spatial domains).

For each trained neural network (i.e., for each participant and each cross-validation fold), we explained the network decision by using the Deep Learning Important FeaTures (DeepLIFT) algorithm⁴⁸ while the network processed the test set examples ${A}_{i}$ as input (i.e., $\forall i|{A}_{i}\in \text{test set}$). Once performed the forward propagation of the information using an input example ${A}_{i}$, DeepLIFT backpropagates the output prediction back to a target layer (e.g., the input layer), providing a relevance representation map with the same shape of the layer output, quantifying the positive or negative contribution to the output prediction. This explanation technique provides a measure of the change in the output from a reference value with respect to the change in the input from a reference input value. In this study we exploited a reference input connectivity matrix of zeros, which is also the default option for DeepLIFT. DeepLIFT is widely adopted in the literature for explaining the network decision when applying deep neural networks to multi-variate neural activity^35,36,37,38, and represents a good candidate for explaining the decision of the proposed neural network, that processes a connectivity matrix extracted from a multi-variate neural activity. Moreover, a recent benchmark study on explanation techniques applied on simulated EEG time series showed that DeepLIFT is consistently more accurate and robust to detect neural changes in the temporal, spatial, and frequency domains compared to other techniques, e.g., saliency maps, gradient-weighted class activation mapping (Grad-CAM), and guided GradCAM³⁵.

We derived DeepLIFT relevance representations associated to the output neuron of the correct class (i.e., the brain state associated to the input: right-hand or left-hand motor imagery in this study), separately with respect to the input layer and to the inflow/outflow convolutional layers. Crucially, the output of all these layers is interpretable, thus, the derived relevance representations can be easily extracted and interpreted in a domain of interest. For each trained network (i.e., each participant and cross-validation fold), and each input test example ${A}_{i}\in {\mathbb{R}}^{R\times R\times F}$ we derived three types of representations, describing how the network processed the input in distinct domains for providing the correct brain state as output. The first representation quantified the relevance of each input connection, between each pair of brain regions, as a function of the frequency, resulting into an input relevance map $\in {\mathbb{R}}^{R\times R\times F}$ (relevance computed with respect to the input layer). The second and the third representations quantified the relevance of each connectivity inflow and outflow measure (64 in total) for each brain region, resulting into 32 inflow and 32 outflow relevance maps $\in {\mathbb{R}}^{R}$ (relevance computed with respect to the inflow/outflow convolutional layers). These three types of representations were separately averaged across the test examples and underwent the following processing, for easing our understanding of the most relevant connectivity-related features in the frequency and spatial domains.

i.

Frequency domain. As concerning the relevance representations computed with respect to the input layer, we averaged the DeepLIFT maps ($\in {\mathbb{R}}^{R\times R\times F}$) across cross-validation folds, and all brain regions. Therefore, for each participant we obtained an average frequency pattern (spectral relevance, $\in {\mathbb{R}}^{F}$), quantifying the relevance of each frequency bin of the input connectivity matrix, for discriminating the correct brain state. This relevance pattern could be useful for visualizing the frequency range contained in the input connectivity estimate most important for decoding the different brain state conditions (here different motor imagery conditions).
ii.

Spatial domain. As concerning the relevance representations computed with respect to the connectivity inflow and outflow layers, we averaged each of the DeepLIFT maps (i.e., each of the 32 inflow relevance maps and each of the 32 outflow relevance maps $\in {\mathbb{R}}^{R}$) across participants, cross-validation folds, and brain regions. The averaging across participants and folds was accomplished by ordering the 32 relevance maps within each participant and fold according to the results of the clustering procedure described in Sect. “Analysis of the output of intermediate interpretable layers”. After this procedure, we obtained a scalar relevance score specific of each of the 32 inflow measures and of the 32 outflow measures (inflow/outflow measure relevance score), that, in its absolute value, could be useful for ranking the 64 inflow/outflow measures based on their contribution towards the correct output brain state. This score could be exploited for visualizing the most relevant connectivity inflow and outflow, as extracted from the neural network (out of the total 64 measures), when analyzing the output brain states.

Experiments

EEG pre-processing and processing

The proposed framework was applied on functional connectivity estimates derived at both the scalp level and cortex level, to provide a more complete evaluation. We addressed the decoding and analysis of right-hand vs. left-hand motor imagery (2 brain states under analysis), which is a common and simple task performed in the literature, mainly exploited for brain-computer interface applications^28,29. In the following, the datasets used in our experiments are briefly presented, together with the performed pre-processing and processing steps.

i.

Scalp-level experiments. We exploited the BNCI2014-001 dataset⁵¹, also known as ‘dataset IIa’ from BCI competition IV. Widely recognized in the literature as a primary benchmark for assessing the efficacy of new neural networks in EEG applications, this dataset has been extensively employed in previous studies²⁷. It consists of 22-channel EEG sampled at 250 Hz, recorded from 9 healthy participants across 2 recording sessions. Electrodes were placed according to 10–10 international system as displayed in Fig. 2a. The imagination of the right-hand and left-hand movements was performed for 4 s. For each participant and each session, 144 trials were recorded, balanced across classes (288 trials in total, for each participant). Signals were band-pass filtered between 1 and 40 Hz using a 4th order zero-phase Butterworth filter. Then, the EEG was epoched by extracting 4-s length epochs from 0 s to 4 s with respect to the motor imagery onset. Finally, to mitigate potential spurious connections between sensors arising from volume conduction when estimating brain connectivity (see Sect. “Functional connectivity estimation”), a spherical spline surface Laplacian transformation was applied⁸. After this pre-processing procedure, each EEG epoch was ${X}_{i}{\in \mathbb{R}}^{22\times 1000}$ ($R=22$, $T=1000$), and the corresponding label was ${y}_{i}\in L=\left\{\text{right-ha}\text{nd}, \text{left-hand}\right\}$. These epochs defined the ${{D}_{EEG}}^{\left(p\right)}$ for the $p$-th participant (see Eq. 1).
ii.

Cortex-level experiments. We exploited the Lee2019-MI dataset⁵². This dataset consists of 62-channel EEG sampled at 1000 Hz, recorded from 54 participants across 2 recording sessions. The imagination of the right-hand and left-hand movements was performed for 4 s. For each participant and each session, 100 trials were recorded, balanced across classes (200 trials in total, for each participant). Signals were band-pass filtered between 1 and 40 Hz using a 4th order zero-phase Butterworth filter, and downsampled to 250 Hz (same sampling frequency used for the dataset described in point i.). Then, the EEG was epoched by extracting 4-s length epochs from 0 s to 4 s with respect to the motor imagery onset. Finally, the cortical activity was derived as follows from each EEG epoch. Scalp signals were transformed into cortical signals using MNE Python library (version 1.2.2)⁵³. To this aim, a template head anatomy was adopted using the FSaverage template, with the source space restricted to the cortex and discretized into 20,484 vertices. The forward problem⁵⁴ was solved via the boundary element method with MNE default parameters. The inverse problem¹² was solved using eLORETA (exact Low Resolution Electromagnetic Tomography)⁵⁵ with MNE default parameters, with identity noise covariance matrix, and with the dipole source orientation constrained to be perpendicular to the cortex. By doing so, each cortical vertex was associated to one source signal. The cortical surface was parcellated into 24 ROIs (12 per hemisphere) extracted from the Desikan-Killiany atlas⁵⁶, covering frontal, parietal, and occipital areas; these ROIs are known to be involved in the fronto-parietal network active during motor execution and imagination^57,58, and were also considered in prior studies^7,59 when analyzing the cortical activity over a selection of relevant ROIs during movements. The list of the 24 ROIs and of their abbreviations is reported in Table 2, and their location over the cortex is displayed in Fig. 2b. It is worth noting that we adopted this coarser cortical resolution with large cortical ROIs in order to mitigate the effects of spatial blurring and localization inaccuracy due to the relatively low EEG spatial resolution (62 channels). A waveform representative of the neural activity of each ROI was derived by averaging all signals of the vertices belonging to that ROI. To avoid cancelling out the neural activity in case of many vertices within the ROI having dipole orientations in opposite directions, the signs of source signals that were not oriented as the ‘dominant direction’ were flipped before averaging⁶⁰. The dominant direction corresponded to the first principal direction of orientations of dipoles belonging to the ROI. After these pre-processing and processing procedures, each EEG epoch was ${X}_{i}{\in \mathbb{R}}^{24\times 1000}$ ($R=24$, $T=1000$), and the corresponding label was ${y}_{i}\in L=\left\{\text{right-hand}, \text{left-hand}\right\}$. These epochs defined the ${{D}_{EEG}}^{\left(p\right)}$ for the $p$-th participant (see Eq. 1).

Table 2 List of the cortical regions of interest (ROIs) adopted in the cortex-level experiments. The cortex was parcellated into 12 ROIs per hemisphere (24 ROIs in total), based on the Desikan Killiany atlas⁵⁶. The complete name of each ROI is reported in the left column, the lobe to which the ROI belongs to is indicated in the middle column, while the ROI abbreviation is reported on the right. Additionally, the table reports the divisions/cortices composing each selected ROI, according to the chosen atlas.

Functional connectivity estimation

The directional influences between brain regions (either at the scalp-level or at the cortex-level) were estimated by computing pairwise spectral Granger causality (GC)¹⁴, exploiting an order $p=30$ (as done in prior studies^7,61) in the bivariate autoregressive model. Since in both experiments the signals were band-limited in the range 1–40 Hz, spectral GC was computed in this range of frequency for a set of $F=81$ frequencies, by using a frequency resolution of about 0.5 Hz. For the generic $f$-th frequency bin, the spectral GC is represented by a non-symmetric matrix ${\in \mathbb{R}}^{R\times R}$, with the off-diagonal $jk$-th value quantifying the influence exerted by the $j$-th signal (representing the $j$-th brain region) onto the $k$-th signal (representing the $k$-th brain region) at that frequency, i.e., ${GC}_{j\to k, f}$. For each participant, the directional influences in the frequency domain were estimated separately for each EEG epoch. Thus, we obtained a functional connectivity matrix ${A}_{i}\in {\mathbb{R}}^{R\times R\times F}$ for each EEG epoch ${X}_{i}$. To emphasize how much each connectivity value contributed to the overall connectivity across the brain regions and to mitigate inter-trial variability, the connectivity matrix at the $f$-th frequency bin ($f=0,…,80$) was normalized such that the sum of all off-diagonal connectivity values was 1, i.e., ${A}_{i}\left[:, :,f\right]=\raisebox{1ex}{${A}_{i}\left[:, :,f\right]$}\!\left/ \!\raisebox{-1ex}{${\sum }_{j,k,j\ne k}{A}_{i}\left[j,k,f\right]$}\right.$, as done in a previous study⁷.

All functional connectivity estimates and the associated motor imagery conditions were collected to form the dataset ${{D}_{FC}}^{\left(p\right)}$ for the $p$-th participant (see Eq. 2). Each 3D matrix ${A}_{i}$ represented the network input, and the corresponding motor imagery ${y}_{i}$ represented the network output. Therefore, according to the adopted network training (see Sect. “Network architecture and training”), 207/144, 52/36, 29/20 examples were used in the training, validation, and test sets, respectively (scalp-level/cortex-level experiments).

Inflow and outflow measures: statistical analysis between motor states

To assess the effect size and statistical significance of the inflow and outflow measures, a statistical analysis on these measures was performed, by comparing each of them between the considered brain states (i.e., right-hand vs. left-hand motor imagery). These measures included the FCNet-based measures (see Sect. “Analysis of the output of intermediate interpretable layers”) and also measures extracted with a classic approach (graph theory) used for reference. Specifically, the classic analysis approach consisted of the following processing. For each experiment (scalp-level and cortex-level), participant and motor imagery condition:

i.

Computation of the total, alpha-band, and beta-band connectivity matrices, by integrating the connectivity matrix ${A}_{i}$ ($\forall i$), across the entire frequency axis (1–40 Hz), the alpha-band frequency range (8–12 Hz), and the beta-band frequency range (12–30 Hz), respectively. Thus, we obtained ${A}_{i,tot}$, ${A}_{i,\alpha}$, ${A}_{i,\beta}$ connectivity matrices ($\in {\mathbb{R}}^{R\times R}$).
ii.

Computation of centrality indices derived from the graph theory, to synthetize some changes in the topology of the brain connectivity network between the motor imagery conditions. The brain connectivity can be described by a weighted graph, where the magnitude of the connectivity between two brain regions (EEG sensors or cortical ROIs) is represented as the weight of an edge, while the two brain regions connected by an edge represent two nodes of the graph. A centrality index provides a measure of importance of a particular node in the graph. By considering all nodes in the brain graph, a centrality measure can be represented by an array $\in {\mathbb{R}}^{R}$. Here we focus on centrality indices that take into account the direction of connections, specifically the in degree, out degree, authority and hubness, and were computed on each connectivity matrix derived from the previous point i. (i.e., on ${A}_{i,tot}$, ${A}_{i,\alpha}$, ${A}_{i,\beta}$). These indices were considered as prior studies that analyzed changes in the brain network connectivity during motor imagery observed a peculiar directionality in the connectivity pattern^18,19,20. The mathematical formulation of these indices is provided in the following, by considering a generic adjacency matrix $A$ (one among ${A}_{i,tot}$, ${A}_{i,\alpha}$, ${A}_{i,\beta}$), i.e., a matrix defined by all edges’ weights, with the $jk$-th element of the matrix ($A\left[j,k\right]$) representing the weight of the edge connecting the node $j$ to node $k$, where a node corresponds to a brain region, either on the scalp or on the cortex. The in degree and out degree (${in}_{k}$ and ${out}_{k}$) for a node (node $k$) are the sum of the weights of the edges entering and exiting from that node, respectively:

$${in}_{k}=\sum_{j}A\left[j,k\right],$$

(4)

$${out}_{k}=\sum_{j}A\left[k,j\right].$$

(5)

Thus, the in degree and out degree provide a direct interpretation of the nodes most involved in the reception (in degree) and transmission (out degree) of information, by equally weighting the nodes in the graph in the same way across conditions. Authority and hubness are recursive measures that provide a more refined concept of reception and transmission of information than the previous ones. Specifically, the authority (${auth}_{k}$) of a node (node $k$) is the sum of the weights of edges entering a node, multiplied by the hubness of the node the edge originates from. On the other hand, the hubness (${hub}_{k}$) of a node (node $k$) is the sum of the weights of edges exiting from a node, multiplied by the authority of the node the edge points to.

$${auth}_{k}=\sum_{j}A\left[j,k\right]{hub}_{j},$$

(6)

$${hub}_{k}=\sum_{j}A\left[k,j\right]{auth}_{j}.$$

(7)

Like the in degree and out degree, these measures provide insights about the nodes that are mostly involved in the reception (authority) and transmission (hubness) of the information, but they take mutually into account the centrality of receiving and sending nodes. This results in a weighted computation, where the weights are specific of the adjacency matrix $A$ on which the centrality index is computed, related to a specific condition. The recursive formulation implies that strong connections exist between nodes with high authority and nodes with high hubness. This could be useful to further emphasize any existing directionality in the connectivity pattern. As the centrality indices were computed separately for each brain region (i.e., graph node), each centrality index resulted into an array $\in {\mathbb{R}}^{R}$.
iii.

Averaging of the centrality indices computed at point ii. across the examples (i.e., EEG epochs) belonging to the same motor imagery condition, i.e., $\forall i|{y}_{i}=\text{right-hand}$, and $\forall i|{y}_{i}=\text{left-hand}$.

Therefore, after this procedure, for each experiment (scalp-level and cortex-level), each participant and each motor imagery condition, we derived a set of 4 centrality measures (in degree, out degree, authority, hubness), separately for the brain connectivity matrix representing the total connectivity, alpha-band connectivity, and beta-band connectivity. This results in a total of 12 graph theory-based centrality measures (6 inflow and 6 outflow measures).

For each experiment (scalp-level and cortex-level), and each measure derived from the graph theory (12 in total) and from FCNet (64 in total, see Sect. “Analysis of the output of intermediate interpretable layers”), we compared each inflow/outflow measure between the right-hand and left-hand motor imagery condition, by performing a paired permutation test based on t-statistic (two-tail test, 10000 permutations), employing the tmax method for adjusting p-values for multiple comparisons ($R$ tests, one per brain region, for each measure)⁶². The Cohen’s d was used to quantify the effect size⁶³, i.e., $d=t/\sqrt{s}$, where $t$ is the t-value and $s$ is the sample size ($s=9$ and $s=54$, corresponding to the number of participants respectively in the scalp-level and cortex-level experiments).

Comparative analysis of FCNet vs. other decoding approaches

In this study we are not focusing on proposing a new approach for improving the decoding performance in motor imagery neural decoding; rather, the key contribution point is the design and exploitation of an interpretable and explainable artificial intelligence approach to analyze functional connectivity during motor imagery (i.e., neural data analysis). However, even though it is not our main aim, as a secondary and additional contribution, we also performed a comparative analysis of FCNet vs. other neural decoders. To this aim, we compared the decoding performance scored by FCNet – on top of which the proposed data-driven framework is built – with two variants of FCNet and with nine other decoding approaches widely adopted in the literature for decoding motor imagery (4 machine learning algorithms and 5 deep neural networks). Specifically, we compared FCNet with:

i.

FCNet variants. Two non-interpretable variants of FCNet were considered, by changing the interpretable inflow/outflow convolutions of the network (spatial block), each performing a 2D convolution along one of the two dimensions of the high-level connectivity feature maps $\left(32,R,R\right)$ provided by the spectral block (see the interpretable connectivity inflow/outflow branches of the spatial block in Sect. “Network architecture and training”). Specifically, a first non-interpretable variant was obtained by redesigning the spatial block with a single 2D convolutional layer learning 32 kernels with size $W=\left(R,R\right)$, activated via ReLU. The convolutional layer was realized using separable convolutions (i.e., depthwise convolution combined with pointwise convolution), as in the original version. A feature array of 32 elements was returned, which was provided as input to the classification block. This way, the network learned patterns on the high-level connectivity feature maps $\left(32,R,R\right)$ in a mixed way across brain interactions, without exploiting an interpretable feature learning that enables the distinction between inflow/outflow connectivity features. Finally, a second non-interpretable variant was obtained by redesigning the spatial block with a single 2D average pooling layer (pooling size and stride of $W=S=(R,R)$). The high-level connectivity feature maps $\left(32,R,R\right)$ were summarized with a feature array of 32 elements, each containing the mean connectivity value across all brain interactions, and this array was provided as input to the classification block. Like the previous variant, the network summarized the feature maps in a mixed way across brain interactions, without exploiting an interpretable feature learning of inflow/outflow connectivity. The main difference between the variants is that, in the second variant the network does not learn the optimal recombination of the high-level connectivity feature maps across brain interactions, but simply computes the mean value.
ii.

State-of-the-art decoders. We selected 9 state-of-the-art decoders based on prior studies. Four of these algorithms were machine learning solutions proposed for decoding graph theory indices derived from brain connectivity networks. The remaining algorithms were deep neural networks proposed for decoding EEG multi-variate time series. As concerning the machine learning algorithms, graph theory measures were extracted, quantifying the inflow and outflow (in degree and out degree, or authority and hubness), optionally used in combination with the clustering coefficient – an index that indicates the clustering degree (i.e., the segregation degree) of the nodes of a brain network – and classified by using linear discriminant analysis (LDA), inspired from previous decoding studies^39,42. Among the deep neural networks, we included in the benchmark ShallowFBCSPNet and Deep4Net⁶⁴ – the first successful CNNs proposed for motor imagery decoding – together with EEGNet³⁶, a multi-purpose CNN reaching state-of-the-art decoding performance on a variety of tasks (e.g., decoding of motor imagery, P300, steady-state visual evoked potential)⁶⁵, also winning international EEG decoding competitions^66,67. Finally, we also included recent CNN designs that exploit inception modules for learning temporal features, such as EEGInception⁶⁸ and EEGITNet⁶⁹.

All decoders were trained and tested as FCNet, to provide a fair comparison, see Sect. “Network architecture and training”. Then, the decoding accuracy scored by FCNet was compared with the one scored by each of the other algorithms (2 from point i. and 9 from point ii.) by performing a Wilcoxon signed-rank test (two-tail test). To correct p-values for multiple tests (11 in total) we adopted the Benjamini-Hochberg procedure.

Source link