Research goal
The research GraphFedAI provides a topology-aware, scalable, and privacy solution to identify DDoS attacks in IoT systems. The GraphFed AI integrates an FL model with a graph neural network to overcome difficulties while managing privacy and security in heterogeneous IoT systems. The developed framework provides a collaborative model to train the systems without negotiating the privacy factor. The graph network network explores the IoT device’s topological, spatial, temporal, and relational features and communication patterns to recognize abnormal activities. To ensure network adaptability, the dual approach successfully identifies distributed and complex attack patterns like multi-point DDoS attacks. Hence, the ultimate goal of this work is to create an effective framework to identify and detect DDoS attacks in IoT systems by ensuring scalability and privacy concerns.
Framework design
This section discusses GraphFedAI-based DDoS attack detection that uses IoT devices to gather the information processed by data cleaning, training, and classification models to predict normal and DDoS attacks. Then, the overall structure of the GraphFedAI is shown in Fig. 2.

GraphFedAI framework design.
The main objective of this GraphFedAI framework is to reduce the loss function \(\left({\mathcal{L}}_{global}\right)\) value while detecting DDoS attacks in IoT systems. \({\mathcal{L}}_{global}\) is achieved while training the graph neural network over the FL model that ensures the system’s privacy \(\left(\mathfrak{p}\right)\) and scalability \(\left(\mathfrak{y}\right)\) by observing the association of graph structure. Then, \({\mathcal{L}}_{global}\) is defined using Eq. (1).
$$\left\{\begin{array}{c}\underset{\theta }{\text{min}}{\mathcal{L}}_{global}=\frac{1}{N}\sum_{i=1}^{N}{\mathcal{L}}_{local}^{i}+\lambda R\left(W\right)\\ {\mathcal{L}}_{local}^{i}=\frac{1}{\left|{V}_{i}\right|}\sum_{v\in {V}_{i}}{\ell}\left({f}_{\theta }\left({X}_{v},\widehat{{A}_{v}}\right),{y}_{v}\right)\\ R\left(W\right)=\frac{1}{2}{\Vert W\Vert }^{2}\end{array}\right.$$
(1)
In Eq. (1), \(R\left(W\right)\) is utilized to minimize the overfitting issues while exploring the input data \(X\) from IoT devices. \(X\) is examined using the introduced framework that generates the graph to identify the relationship between the features that help to minimize \({\mathcal{L}}_{global}\) value and generalize the framework. \({\mathcal{L}}_{global}\) is predicted from the ith IoT device’s local loss value \({\mathcal{L}}_{local}^{i}\), regularization weights \(\lambda\) and \(N\) number of contributed IoT devices in the environment. The graph \((G)\) is generated using the CICIoT-2023 dataset, which is pre-processed to eliminate irrelevant information. The generated \(G\) consists of nodes \((V)\), edges \((E),\) and feature matrix \((X)\) features that help to understand the IoT devices, communication links, and several attributes like flow statistics, packet size, traffic volume, etc. \(G=(V,E,X)\) identifies IoTdevice state \(S\) like usual \((n)\) and malicious involvement \(\left(\beta \right)\).
One important aspect to consider when dealing with various forms of distributed denial of service assaults, such as volumetric, protocol, or application layer attacks, is the use of GNNs. This guarantees that the model can accurately reflect IoT devices’ complex and dynamic interactions. Due to their built-in modeling of IoT network topology, GNNs may adapt to different attack patterns by studying regular node interactions and detecting attack-induced deviations. Even when assault techniques change or attack intensities vary, the model can retain high accuracy because of this adaptability. Data augmentation approaches are also used during training to expose the model to different assault scenarios with different sizes, timings, and methodologies. For the model to generalize well to various IoT installations and attack vectors, it is important to avoid overfitting certain attack types or environmental variables. To make it even more resilient, we apply multi-task learning and ensemble learning techniques, which simultaneously train many models or tasks to identify different kinds of assaults and network abnormalities. The insights from various models or tasks focusing on certain attack patterns strengthen the system’s ability to withstand unknown or new assault techniques. The model is equipped to manage the scattered nature of IoT devices by integrating FL. This allows for model training across devices without centralized data. By including a wide range of data points, the model becomes more resistant to changes in device settings, network circumstances, and attack types. Another way FL improves security is by reducing the likelihood of data leakage or a single failure point because of its decentralized design.
Data compilation
The first step is data compilation, in which data is prepared for identifying \(\beta\) while making the transmission. Initially, the missing values \((mv)\) are eliminated from \(D,\) which is done by applying interpolation to ensure data continuity. \(mv\) value is replaced according to Eq. (2), which computes the missing value at \(t\) index in \(D\). The computation covers the present \({x}_{t}\) and previous values \({x}_{t-1}\) to identify the replacement value for \(mv\).
$$\left\{ \begin{gathered} x_{t} = \left( {1 – {\mathbb{I}}\left( {x_{t} {\kern 1pt} \,is\,{\kern 1pt} missing} \right)} \right).x_{t} + I\left( {x_{t} {\kern 1pt} \,is{\kern 1pt} \,missing} \right).x_{{t – 1}} \hfill \\ \begin{array}{*{20}c} {x_{{t – 1}} } & {if\,{\kern 1pt} x_{t} {\kern 1pt} \,is\,{\kern 1pt} missing{\kern 1pt} \,;t \le limit} \\ \end{array} \hfill \\ \begin{array}{*{20}c} {x_{t} } & {otherwise} \\ \end{array} \hfill \\ \end{gathered} \right.$$
(2)
In Eq. (2), \({x}_{t}=1\) if \({x}_{t}\) has a missing value that is denoted as \({\mathbb{I}}\left({x}_{t} is missing\right)\) else 0 that is computed from the value of the last time step \({x}_{t-1}\) and \({x}_{t}\) value. This pre-processing is performed up to 15 successive missing values to manage the data continuity in \(D\). Then, encoding changes the categorical attributes into numerical values to manage the compatibility while identifying the \(\beta\). The encoding \(\left({v}_{i}\right)\) is done using Eq. (3).
$$\left\{\begin{array}{c}{v}_{i}=\left[{\mathbb{I}}\left({x}_{i}={C}_{1}\right),{\mathbb{I}}\left({x}_{i}={C}_{2}\right),\dots .{\mathbb{I}}\left({x}_{i}={C}_{C}\right)\right]\\ 1\quad if \, {x}_{i}={C}_{j} \\ 0 \quad if \, {x}_{i}\ne {C}_{j}\end{array}\right.$$
(3)
In Eq. (3), the categorical attributes are denoted as \({x}_{i}\) which are allocated to the respective feature class \({{C}_{1},{C}_{2},\dots \dots C}_{C}\) according to the characteristics function \({\mathbb{I}}\left({x}_{i}={C}_{j}\right)\). \({\mathbb{I}}\left({x}_{i}={C}_{1}\right)\) is defined as \(\left\{\begin{array}{c}1, \quad if \,\, {x}_{i}={C}_{j} \\ 0, \quad if \,\, if \, \,{x}_{i}\ne {C}_{j}\end{array}\right.\) from these values, the \({x}_{i}\)’s encoding variable \({v}_{i}\) is computed. \({x}_{i}\) feature category is checked by \({\mathbb{I}}\left({x}_{i}={C}_{j}\right)\) if the \({x}_{i}\) is depending on \({C}_{j}\) then \({\mathbb{I}}\left({x}_{i}={C}_{j}\right)=1\) else 0. Consider \(D\) has \(attack\_type\) as the feature that belongs to different categories \(\left\{{C}_{1}\left(normal\right), {C}_{2} \left(DDoS\right)and {C}_{3}(Dos)\right\}\). Then, \({x}_{i}\) need to encode to \({C}_{2}\) by exploring \({v}_{i}=\left[I\left({x}_{i}={C}_{1}\right),I\left({x}_{i}={C}_{2}\right),I({x}_{i}={C}_{3})\right]\). If \({x}_{i}\) is belongs to \({C}_{2}\) then \({\mathbb{I}}\left({x}_{i}={C}_{j}\right)\) is defined as \(\left\{\begin{array}{c}I\left({x}_{i}={C}_{1}\right)=0 ({x}_{i}\ne Normal)\\ I\left({x}_{i}={C}_{2}\right)=1 ({x}_{i}=DDoS)\\ I\left({x}_{i}={C}_{3}\right)=0 ({x}_{i}\ne DoS)\end{array}\right.\). Then, \({v}_{i}\) for data point \({x}_{i}\) is defined as \({v}_{i}=\left[\text{0,1},0\right]\). Then, these generalized encoding processes are described in Table 2.
Table 2 shows that \(Gen({x}_{i}({v}_{i})\) while analyzing the attack type using \({\mathbb{I}}\left({x}_{i}={C}_{j}\right)\) which returns the output according to the condition described in Eq. (3). The output \({v}_{i}\) represents that the binary vector for input \({x}_{i}\). Then, the Pearson correlation coefficient \((\rho )\) is computed to minimize the overfitting issues and improve the interpretability characteristics. \(\rho\) value identifies the association between the features that maintain the DDoS prediction system stability. The computed \(\rho\) value lies between \(-1 to 1\) in which \(\rho =1\) represents that features have a linear relationship (positive), \(\rho =-1\) means a linear relationship in negative, and \(\rho =0\) means no linear relationship.
The Pearson correlation coefficient is used to identify and remove highly correlated features (threshold > 0.9), which often introduce redundancy and noise. In the context of DDoS detection, this improves model generalization by reducing overfitting and focusing on more discriminative traffic features. It also lowers false positive rates by minimizing misleading feature combinations that resemble attack-like behavior in benign traffic patterns.
Then, \(\rho\) is computed using Eq. (4).
$$\left\{\begin{array}{c}\rho \left(X,Y\right)=\frac{\sum_{i=1}^{n}\left({X}_{i}-\overline{X }\right)\left({Y}_{i}-\overline{Y }\right)}{\sqrt{\sum_{i=1}^{n}{\left({X}_{i}-\overline{X }\right)}^{2}\sum_{i=1}^{n}{\left({Y}_{i}-\overline{Y }\right)}^{2}}}\\ \rho =1 \quad \quad positive \quad correlation\\ \rho =-1 \quad \quad negative \quad correaltion\\ \rho =0 \quad \quad no \quad correlation\\ 0.95\le \rho \le 1 \quad \quad very \quad high \quad positive \quad correlation\\ -1\le \rho \le -0.95 \quad \quad very \quad high \quad negative \quad correlation \end{array}\right.$$
(4)
According to Eq. (4), the feature continuity is computed from \(\rho\) value, and \(\rho>0.95\) is selected as the threshold value to predict the highly correlated features. \(\rho\) computation was used to minimize the multicollinearity and redundancy issues while classifying \((n \, and \,\beta )\). The CICIoT-2023 dataset \(D\) has different features (\(f)\) gathered from device interaction, sensor data, and traffic. \(f\) has categorical (attack type) and continuous data (packet counts, packet size, and transmission time). These \(f\) are explored using Eq. (4) to identify the highly correlated features. First, the correlation matrix \((R)\) is estimated for \(f=\left\{{x}_{1}, {x}_{2}, {x}_{3},\dots \dots {x}_{n}\right\}\) which is defined in Eq. (5).
$$R=\left[\begin{array}{ccc}\rho \left({x}_{1},{x}_{1}\right) & \rho ({x}_{1}, {x}_{2})& \begin{array}{cc}\dots \dots & \rho \left({x}_{1},{x}_{n}\right)\end{array}\\ \rho \left({x}_{2},{x}_{1}\right)& \rho \left({x}_{2},{x}_{2}\right)& \begin{array}{cc}\dots \dots & \rho ({x}_{2},{x}_{n})\end{array}\\ \begin{array}{c}.\\ \begin{array}{c}.\\ .\\ \rho \left({x}_{n},{x}_{1}\right)\end{array}\end{array}& \begin{array}{c}.\\ \begin{array}{c}.\\ \begin{array}{c}.\\ \rho \left({x}_{n},{x}_{2}\right)\end{array}\end{array}\end{array}& \begin{array}{c}.\\ \begin{array}{c}.\\ \begin{array}{c}.\\ \rho \left({x}_{n},{x}_{n}\right)\end{array}\end{array}\end{array}\end{array}\right]$$
(5)
In Eq. (5), the correlation between \({x}_{i}\) and \({x}_{j}\) is signified as \(\rho \left({x}_{i},{x}_{j}\right)\); here, the diagonal matrix takes 1 as a value because every feature is effortlessly related \(\rho \left({x}_{i},{x}_{j}\right)=1\). Then, \(R\) value is compared with the threshold value of 0.95 to minimize the redundancy. Let the feature \(packe{t}_{size} \,\, and \,\, nu{m}_{packet}\) correlation \(\rho \left( packe{t}_{size}, nu{m}_{packet}\right)\) is computed based on Eq. (5), and the obtained value is defined as \(R=\left[\begin{array}{cc}1.00& 0.98\\ 0.98& 1.00\end{array}\right]\). From the computation, the correlation between \(\rho \left( packe{t}_{size}, nu{m}_{packet}\right)\) is a robust and positive correlation because 0.98 is more significant than a threshold value. In addition, irrelevant features are removed from \(D\) to minimize the multicollinearity. Here, \(\rho \left( packe{t}_{size}, nu{m}_{packet}\right)\) is highly correlated; therefore, \(nu{m}_{packet}\) is eliminated from the list and maintains the \(packe{t}_{size}\) Vice-versa. This selection of \(f\) is done based on the feature relevancy and a few \(f\) like \(rs{{t}_{flag}}_{number}, LIC, to{t}_{size}, fi{n}_{count}, AVG, srate, ac{{k}_{fla}}_{numb}, radius, weight\) are removed from \(f\). Then, the final \(f\) is obtained is defined as \(final f={X}_{original}-droppe{d}_{f}\). Then, the graphical analysis of the data compilation is shown in Fig. 3.

Graphical analysis of data compilation.
Figure 3 illustrates the graphical analysis of data compilation evaluated using heatmap, which provides the relationship between \(f\) used to identify the \(red\left(f\right)\). The heatmap defines \(\rho\) between \(f,\) which has the value of range between \(-1\,\, to \,\, 1\). \(\left\{\begin{array}{c}\rho =1 \quad \quad positive \,\, correlation\\ \rho =-1 \quad \quad negative \,\, correlation\\ \rho =0 \quad \quad no \,\, relationship\end{array}\right.\) computation used to identify the relationship between \(f,\) which is described in Fig. 3c. According to \(\rho\) value, the features are reduced, and relevant features are shown in Fig. 3d. From the computation \(packe{t}_{size}\) and \(ac{k}_{coun{t}_{number}}\) having a moderate relationship and \(ac{k}_{coun{t}_{number}} \,\, and \,\, duration\) has a weaker correlation. The diagonal value indicates that the features have an exact correlation. The successive analysis of these correlation analyses minimizes the overfitting issues and improves the generalizability and interpretability while classifying \((n \,\, and \,\, \beta )\).
Federated learning in GraphFedAI framework
GraphFedAI framework uses the FL model to train the graph network to improve the DDoS detection rate. The FL is an effective collaborative model that trains the neural network to preserve data security and privacy. The FL trains the network \({\mathfrak{f}}_{{\theta }_{k}}\) for every dataset \({D}_{k}\) with \({\theta }_{k}\) parameter. FL transmits the updated parameters \({\Delta \theta }_{k}\) which is obtained from the objective function \({\mathbb{L}}^{(k)}\) to \(c{s}_{i}\). The updating is done according to \({\mathbb{L}}^{(k)}\) that is computed according to Eq. (7). The output from \(c{s}_{i}\) is aggregated with the help of weighted averaging value, which is defined as \({\theta }^{(t+1)}=\sum_{k=1}^{N}\frac{\left|{V}_{k}\right|}{{\sum }_{j=1}^{N}\left|{V}_{j}\right|}{\theta }^{(t+1)}\). The aggregation process ensures data sensitivity and privacy. FL has three components: central aggregator, edge servers, and IoT devices, which optimize the model and process the IoT traffic data to reduce anomaly activities. The first component is IoT devices, which are local endpoints used to gather information by observing the environment. The collected information is sent to the following IoT devices or servers by establishing the communication link. IoT interaction creates the graphs that are denoted as \({G}_{k}=\left({V}_{k},{E}_{k},{X}_{k}\right)\). Node \({V}_{k}\) participate in the local networks \(k\) is denoted as the \({k}_{th}\) IoT device. Every \({V}_{k}\) has edges \({E}_{k}\) to make the interactions between other nodes or servers. \({E}_{k}\) creates pairwise communication, which has a weight value while making the interactions. Then, \({E}_{k}\) is defined as \(\left\{\left(i,j\right)\left|i \,\, and \,\, j \,\, direct \,\, communciation\right.\right\}\). \({X}_{k}\) is defined as the node \(f \left(packe{t}_{size}, packe{t}_{duration}, flo{w}_{stat}\right)\). These \(f\) are defined in the matrix format and the row belongs to the \({N}_{k}\in {G}_{k}\) and the feature vector of node \(i\in {V}_{k}\) is represented as \({X}_{k}=\left[{x}_{i1},{x}_{i2},\dots \dots .{x}_{im}\right]\). \({X}_{k}\) consists of local information on IoT devices that helps to understand network and node behavior in IoT systems. Consider the IoT sensors like \(te{mp}_{sensor}, hu{m}_{sensor} \,\, and \,\, gateway\) the interaction between these devices forms a graph \({G}_{k}\). The \({G}_{k}\) consists of \({V}_{k}:\left\{{v}_{1},{v}_{2}, {v}_{3}\right\}\) and \({E}_{k}:\left\{\left({v}_{1},{v}_{2}\right), \left({v}_{2},{v}_{3}\right)\right\}\) which helps to create the interaction between the \(te{mp}_{sensor}, hu{m}_{sensor} and \,\, gateway\). These devices collect the \(f,\) and the formed matrix is represented as \({X}_{k}=\left[\begin{array}{cc}450& 6\\ 300& 5\\ 200& 4\end{array}\right]\); column 1 denoted as the \(packe{t}_{size} (in\,\, bytes)\) and column 2 denoted as \(packe{t}_{duration}(in \,\, ms)\) for \(te{mp}_{sensor}, hu{m}_{sensor} and \,\, gateway\). The generated graph \({G}_{k}\) is explored using the classifier to identify the \((n \,\, and \,\, \beta )\). After forming the \({G}_{k}=\left({V}_{k},{E}_{k},{X}_{k}\right)\), IoT devices need to be trained using GNN to analyze graph data. The graphical structure of GNN is shown in Fig. 4.

Graphical structure of graph neural networks.
During this process, every IoT device is trained separately, and the information is aggregated while predicting the \(n \,\, and \,\, \beta .\) The main intention of the GNN training is to understand the graph embeddings to observe the pattern and structure of the IoT network for classifying the \(n \,\, and \,\, \beta .\) Then, the overall GNN training process is described by Eq. (6).
$$\left\{\begin{array}{c}{h}_{v}^{(l)}=agg\left(\left\{{h}_{u}^{(l-1)}|u\in \mathcal{N}(v)\right\}\right)\\ {h}_{v}^{(l+1)}=\sigma \left({w}^{(l)}{h}_{v}^{(l)}\right)\\ \widehat{{y}_{v}}=softmax\left({w}^{(l)}{h}_{v}^{(l)}\right)\end{array}\right.$$
(6)
The IoT device \({G}_{k}=\left({V}_{k},{E}_{k},{X}_{k}\right)\) is processed by the GNN to get \({\mathfrak{f}}_{{\theta }_{k}}\) prediction from every embedding \({h}_{v}\) for node \(v\in {V}^{k}\). \({\mathfrak{f}}_{{\theta }_{k}}\) is obtained from the \(agg\)(aggregated) information. The aggregation is done by performing message-passing \(\left\{{h}_{u}^{(l-1)}|u\in \mathcal{N}(v)\right\}\) to the neighboring nodes \(\mathcal{N}(v)\). During this process, \(u\) th previous layer embedding details \({h}_{u}^{(l-1)}\). \({h}_{v}^{(l)}\) value is transformed with the help of \({\theta }_{k}\) trainable weights, which are defined as \({h}_{v}^{(l+1)}\). This transformation uses the \({w}^{(l)}\) and ReLU activation function \(\sigma (.)\) to get the feature transformation. The final \({h}_{v}^{(l+1)}\) is fed into the output layer to predict \(\widehat{{y}_{v}}\) and the training process is continued until to reach the minimum loss value \(\left({\mathbb{L}}^{k}\right)\) (Eq. 7).
$$\left\{\begin{array}{c}{\mathbb{L}}^{(k)}=\frac{1}{\left|{V}^{k}\right|}\sum_{v\in {V}^{k}}l\left( {\mathfrak{f}}_{{\theta }_{k}} \left(v\right),{y}_{v}\right)\\ l\left({\mathfrak{f}}_{{\theta }_{k}}(v),{y}_{v}\right)=-\sum_{c=1}^{C}{y}_{v,c}\text{log}\left({\mathfrak{f}}_{{\theta }_{k}}{\left(v\right)}_{c}\right)\\ \Delta {\theta }_{k}={\Delta \theta }_{k}^{new}-{\Delta \theta }_{k}^{old}\end{array}\right.$$
(7)
Equation (7) is used to reduce the \({\mathbb{L}}^{(k)}\) value of \({G}_{k}\) which indicates how effectively GNN recognizes the \(n \,\, and \,\, \beta .\) The \({\mathbb{L}}^{(k)}\) is computed by considering \(C\) output in which \({\mathfrak{f}}_{{\theta }_{k}} \left(v\right)\) predicted label and true output \({y}_{v}\). According to the \({\mathbb{L}}^{(k)}\) network parameter \(\Delta {\theta }_{k}\) is updated in terms of computing the after \({\Delta \theta }_{k}^{new}\) and before \({\Delta \theta }_{k}^{old}\) local training of parameters. This updated information is transferred to the edge servers to improve the aggregation process. The local training process preserves privacy and maintains decentralized learning directly linked with the system’s scalability and communication overhead. The edge server \(e{s}_{i}\) performs the local optimization \(\left(\tau \right)\) by receiving \(\Delta {\theta }_{k}\) from IoT devices. \(\Delta {\theta }_{k}\) The value obtained from the gradient update, which is defined as \(\Delta {\theta }_{k}={\theta }_{k}^{
(8)
The collected \(\Delta {\theta }_{k}\) values are aggregated in \(e{s}_{i}\) to \(\tau\) the model \({\mathfrak{f}}_{{\phi }_{i}}\). The aggregation process gathers the entire IoT devices \({C}_{i}\). From the aggregated information, the \(e{s}_{i}\) minimizes the \({\mathbb{L}}_{{e}_{i}}\) value for every device \(k\) by using cross-entropy loss function \(l\left( .\right)\). Then, the GNN attributes \({\phi }_{i}\) is optimize \(e{s}_{i}\) using the aggregated \(\Delta {\theta }_{k}\) which is defined in Eq. (9).
$$\left\{\begin{array}{c}\Delta {\phi }_{i}=\eta .\frac{1}{\left|{C}_{i}\right|}\sum_{k{\in C}_{i}}\Delta {\theta }_{k}\\ \Delta {\phi }_{i}\to ca\end{array}\right.$$
(9)
\(e{s}_{i}\) updates \(\Delta {\phi }_{i}\) value by using \(e{s}_{i}^ {\prime}s\) learning rate \(\eta\) and the aggregated parameter value. The selected parameters refine \({\mathfrak{f}}_{{\phi }_{i}}\) and \(\Delta {\phi }_{i}\) is forwarded to \(ca\) (\(\Delta {\phi }_{i}\to ca)\). \(\Delta {\phi }_{i}\) updating was performed for all IoT devices to minimize data transfer needs globally and maximize overall performance while predicting DDoS attacks. The FL process ensures the GraphFedAI framework’s robustness in the IoT environment. The FL learning process allows the GNN to work in different conditions, such as authentic traffic patterns, attack format, and network conditions. The Improvement of GraphFedAI is evaluated before and after applying the FL; the result is shown in Table 3. Missing or partial data, often caused by network interruptions or sensor failures, is handled using interpolation in real-world IoT systems. To ensure the model can train on a more comprehensive dataset and avoid gaps that might lower its performance, interpolation estimates the missing values between known data points. For instance, when data points are missing because of sporadic transmission, interpolation may approximate these values by analyzing patterns in nearby data points. For the model to identify anomalous behavior during DDoS assaults, the data must remain consistent and uninterrupted. This technique enables this to happen. The connections between the dataset’s characteristics, such as packet frequency, device communication patterns, and traffic volume, are evaluated using correlation analysis. The model can distinguish between regular network activity and abnormal due to DDoS assaults by determining the correlation between several variables. During an assault, features that typically exhibit strong correlations may have weak correlations or unexpected patterns. This aids in identifying outliers that may suggest harmful behavior. The model can zero in on the most important aspects using correlation analysis, which improves the DDoS detection mechanism’s accuracy and efficacy by picking up on small changes from typical network patterns.
Table 3 illustrates that the Improvement of FL in \({\mathfrak{f}}_{{\theta }_{k}}\) while predicting the DDoS attacks on various attack types such as SYN flood, UDP flood, HTTP flood, etc. The IoT devices are trained and \(\Delta {\theta }_{k}\) values are aggregated, helps every device learn the attack types and improves \({\mathfrak{f}}_{{\theta }_{k}}\) rate and robustness \((r)\). \(\Delta {\phi }_{i}\to ca\) values identify the high-risk regions like spoofing and flooding and low-risk regions, which helps to predict the DDoS attacks with maximum accuracy. The frequent \(\eta .\frac{1}{\left|{C}_{i}\right|}\sum_{k{\in C}_{i}}\Delta {\theta }_{k}\) The updation process helps the system adapt to attack patterns and IoT conditions.
Decision-making of DDoS attack detection in IoT
The final stage of the GraphFedAI framework is DDoS attack detection \(\left(\delta \right)\) which uses the FL to train the GNN \({\mathfrak{f}}_{{\phi }_{i}}\) that are aggregated from \(e{s}_{i}\). \(\delta\) explores the traffic \(\left(t\right)\) from \({G}_{k}\) which is defined as \({G}_{int}=({V}_{k},{E}_{k},{X}_{k})\). The overall decision-making process computation is described in Fig. 5.

Representation of DDoS decision-making analysis.
The GNN processes the \({G}_{int}\) in the message passing stage \({m}_{v}^{(l)}=\sum_{u\in N(v)}{\phi }^{\left(l\right)}({h}_{u}^{\left(l\right)},{h}_{v}^{\left(l\right)}, {e}_{uv})\) that aggregate the message from layer \(l\), the learning function \({\phi }^{\left(l\right)}\) and edge features \({e}_{uv}\). This \({m}_{v}^{(l)}\) value is utilized for updating the node \(v\) representation using \({h}_{v}^{(l+1)}={\varphi }^{\left(l\right)}({h}_{v}^{\left(l\right)},{m}_{v}^{(l)})\). The values are aggregated over the layers, and the prediction is performed as \(\widehat{{y}_{v}}=\sigma \left({w}_{o}.{h}_{v}^{\left(L\right)}+{b}_{o}\right)\). During this process, network parameters such as \({b}_{o} \, \, and \, \, {w}_{o}\) is used to improve the overall prediction accuracy. According to this process, \({v}_{k}\) status \(n \, \, and \, \, \beta\) is predicted based on the probability distribution of \({C}_{i}\) of input \({x}_{i}\). Then, the output is defined for the node \({v}_{k}\) is computed is defined as \(\widehat{{y}_{v}}=ar \, \, gmax {f}_{\phi }(v)\). The entire \({v}_{k}\) behavior is gathered and analyzed according to the threshold analysis \((\vartheta )\) and \({\beta }_{ratio}\) is computed as \({\beta }_{ratio}=\frac{\sum_{v\in V}{\mathbb{I}}\left(\widehat{{y}_{v}}=\beta \right)}{\left|V\right|}\). The indicator function \({\mathbb{I}}\left(\widehat{{y}_{v}}=\beta \right)\) in \({\beta }_{ratio}\) is true, it has a value of 1, and the analysis uses the total number of \({v}_{k}\) in \({G}_{k}\). Suppose the \({\beta }_{ratio}\) value exceeds \(\vartheta\) value, and then the system shows a flag that \({G}_{k}\) affected by DDoS attacks. Then, the condition is defined as \({\beta }_{dec}=\left\{\begin{array}{c}1 \quad \quad if \,\, {\beta }_{ratio}>\vartheta \\ 0 \quad \quad otherwise\end{array}\right.\). The discussed framework analyzes the high-density area to identify \(\beta\) in \({G}_{int}\) because the attack patterns are indicative. Therefore, the intricate environment is explored with the help of attention concept \({A}_{ij}\) of embedding nodes \({h}_{v}\) which predicts the relationship of \(({v}_{i} \, \, and \, \, {v}_{j})\). From the relationship, \(\beta\) decision is taken by computing the anomaly score \({a}_{s};{a}_{s}=\frac{1}{m}\sum_{i=1}^{m}\left|\frac{{x}_{i}-{\mu }_{i}}{{\sigma }_{i}}\right|\). Then, the overall decision-making of \(\beta\) is carried out according to Eq. (10).
$$D\left(x\right)=\left\{\begin{array}{c}\beta \quad \quad if \, \, {a}_{s}>{\vartheta }_{attack} \, \, or \, \, {p}_{m}\left(x\right)\ge {\vartheta }_{modal} \\ n \quad \quad otherwise\end{array}\right.$$
(10)
$$D\left(x\right)=\left\{\begin{array}{c} \beta \quad \quad \, \, if \left(\frac{1}{m}\sum_{i=1}^{m}\left|\frac{{x}_{i}-{\mu }_{i}}{{\sigma }_{i}}\right|>{\vartheta }_{attack}\right) \bigvee \left(\sigma ({w}_{o}.{h}_{g}\left(x\right)+{b}_{o})\ge {\vartheta }_{modal}\right) \\ n \quad \quad \, \, otherwise\end{array}\right.$$
(11)
Equations (10) and (11) decide the DDoS attacks. If \({a}_{s}\) value greater than \({\vartheta }_{attack}\) then traffic is alerted for DDoS attacks. Suppose the GNN approach probability value is greater than \({\vartheta }_{model}\) then traffic is affected by the DDoS attack, and both conditions are satisfied, then the network is infected by the DDoS attack. The efficiency of \({\delta }_{rate}\) is evaluated for three conditions such as \({a}_{s}, {p}_{m} and {C}_{c}\) and the obtained results for various traffic rates in the IoT environment are shown in Fig. 6.

\({\delta }_{rate}\) Analysis of different decision-making conditions.
Figure 6 shows that \({\delta }_{rate}\) analysis of the GraphFedAI framework using different parameters, such as \({a}_{s}, {p}_{m} \,\, and \,\, {c}_{c}\). \({a}_{s}\) uses various features such as \(co{n}_{dura}, packe{t}_{size}, time and packe{t}_{traffic}\) to explore the DDoS attacks in the IoT environment. \({p}_{m}\) uses \({G}_{k}=\left({V}_{k},{E}_{k},{X}_{k}\right)\) that identifies the relationship between \(f,\) which helps to classify \(n \,\, and \,\, \beta\) effectively under different conditions. The introduced framework combines these conditions \({c}_{c}\) to identify the DDoS with minimum false negative and positive rates. The effective utilization of these factors improves \({\delta }_{rate}\) up to 98.7% for \(co{n}_{dura}\), 98.4% for \(time\), 97.9% of \(packe{t}_{size}\) and 98% for \(packe{t}_{traffic}\) features. The effective utilization of the FL on the GNN training process improves the overall system scalability, robustness, and DDoS detection accuracy. Therefore, the extracted features from the pre-processed data improve the overall \({\delta }_{rate}\) and ensures the system’s robustness and scalability in IoT. Algorithm 1 shows the pseudocode of GraphFedAI for DDoS Detection in IoT Networks.

GraphFedAI for DDoS detection in IoT networks.
The GraphFedAI framework builds the graph by extracting communication flows between IoT devices from the CIC-IoT-2023 dataset. To guarantee that each device is consistently represented in each session, we use a mix of source/destination IP addresses, MAC addresses, and protocol types to identify each device as a node. Suppose two nodes create a communication flow within a sliding time range of 60 s, whether TCP, UDP, or ICMP; an edge is formed between them. Three aggregated flow-level metrics are used to assign edge weights, which define the link between nodes: (1) the number of packets exchanged between the two devices during the interval determines the packet frequency; (2) the timestamp difference between the first and last packet calculates the communication duration; and (3) the average payload size is derived by averaging the total bytes transferred.
Temporal window construction and interpolation
To address missing or irregular values in the time-series input features, a linear interpolation technique is applied across each 14-day sliding window. Let \({x}_{t}\) denote a missing feature value at time t. The interpolated value is computed as:
$${x}_{t}={x}_{t1 }+\frac{\left({x}_{t2}-{x}_{t1} \right)}{\left({x}_{t2}-{x}_{t1} \right)} \bullet \left(t-t1\right)$$
(12)
In Eq. 12, \({x}_{t1}\) and \({x}_{t2}\) are the nearest known values before and after the missing timestamp \(t\). This interpolation ensures temporal continuity in both node attributes and graph edge weights, which is essential for reliable learning of behavior sequences. Without this smoothing step, discontinuities in the data introduce abrupt changes in the input graph structure, weakening the performance of LSTM and attention layers during training.
