Building on the limitations identified in Sect. "Related work", our methodology presents a holistic framework for botnet detection across three datasets (BOT-IOT, CICIOT2023, IOT23), combining novel preprocessing, feature selection, and ensemble modeling. Feature selection retained 38 of 46 features in BOT-IOT, 42 of 47 in CICIOT2023, and all 25 features in IOT23. The framework starts with detailed data preprocessing and a quality enhancement process. The first phase of processing involves filling missing values, eliminating duplicate entries, and identifying outliers using the IQR method. A novel skewness-reduction step applies several transformation techniques (log transformation, square root transformation, Yeo-Johnson transformation, and quantile transformation with both uniform and normal output distributions) and compares them. This comparative analysis helps choose the transformation that preserves the essential attack features while dealing with the skewed data.
Feature selection uses several statistical methods. The process starts with a correlation matrix to assess correlations between features and to identify the correlated ones. Statistical validation uses Chi-square statistics and p-values to test the significance of the features, followed by a detailed feature dependency analysis across label classes. Advanced feature analysis techniques include distribution analysis across attack types and proportional analysis, capped by the elimination of non-influential features.
The model optimization framework incorporates the Random Forest and Logistic Regression models with threshold-based decision-making and multiple validation techniques to address both underfitting, where some threats are missed, and overfitting, where false alarms are generated. Cross-validation reveals dataset-specific characteristics: BOT-IOT and CICIOT2023 have clear classification structures, while IOT23 is more challenging to classify. This approach helps deliver the best possible detection results while providing useful information about class imbalance and model behavior under different attack conditions.
SMOTE implementation addresses class balance optimization in these inherently imbalanced security datasets. The process includes detailed class distribution analysis and minority class enhancement, with the resulting balance verified. A comparison between SMOTE and PCA transformation is also performed to determine each method's effect on performance and its computational cost for each dataset, in order to choose the best preprocessing method.
The evaluation framework includes various performance metrics, such as the conventional metrics (accuracy, precision, recall, F1-score), regression metrics (MSE, RMSE, MAE), and computational efficiency metrics. Cross-dataset validation checks the model’s performance on all three datasets, providing insights into the framework’s effectiveness in varied security setups. This systematic approach ensures robust botnet detection while maintaining computational efficiency and practical applicability.
The methodology emphasizes adaptability and robustness, particularly in handling the varying complexities presented by different datasets. This comprehensive approach enables effective botnet detection while addressing the challenges of real-world deployment scenarios. Figure 1 illustrates a block diagram of the proposed technique.

Block diagram of the proposed IoT botnet detection framework.
Datasets
BOT-IOT
Botnet attacks are analyzed using the benchmark Bot-IoT dataset. This dataset assists in understanding the different attack features and patterns that enhance cybersecurity measures. The Bot-IoT dataset11 was developed by the Cyber Range Lab of UNSW Canberra within real network circumstances. The dataset comprises both botnet and normal traffic for the network environment, totaling 72 million records. After extracting the 5% subset with the full feature set, the data is reduced to 3,668,522 records. This 5% subset of Bot-IoT includes the largest feature set among all processed subsets, with 43 independent features and 3 dependent features; the 43 independent features contain the Argus network flow features and further derived attributes. The dataset includes five categories: Reconnaissance, DDoS, Theft, DoS, and normal. Table 1 and Fig. 2 display a summary of the dataset instances used in the current research.

Flow diagram of the BOT-IOT dataset summary.
The Bot-IoT dataset originated at the Cyber Range Lab at UNSW Canberra11 by collecting realistic and synthetic IoT network traffic alongside many different kinds of attacks. To do this, a practical test environment was created to gather comprehensive network data, including various types of botnet anomalies such as Denial of Service, Distributed Denial of Service, Information Gathering, and Information Theft. The specific anomaly subcategories were Data Exfiltration (DE), Denial of Service over HTTP (DH), Distributed Denial of Service over HTTP (DDH), Keylogging (KL), Operating System Fingerprinting (OSF), and Service Scanning (SS). The majority of the collected network data comprises DoS-UDP (DU), DoS-TCP (DT), DDoS-UDP (DDU), and DDoS-TCP (DDT), while the rest consists of all other types of network data.
CICIOT2023
The CICIOT2023 dataset, first published in 2023 by Neto et al.12, represents the latest advancement in IoT security data collection. This dataset was generated using an expansive IoT network topology incorporating multiple real IoT devices functioning as both intruders and targets. From the extensive CICIOT2023 dataset containing over 30 million instances, we utilized a subset of 234,745 records. The dataset encompasses 34 classes (33 attack types and one benign class) with 46 features, including flow_duration, header_length, protocol_type, and other network characteristics. This dataset represents modern IoT network scenarios with the various attack types shown in Table 2. Figure 3 shows a flow diagram presenting the category labels and counts.

Flow diagram of the CICIOT2023 dataset summary.
IOT23
The IOT23 dataset represents an extensive network data collection from 23 distinct scenarios involving Windows and Linux operating systems in diverse IoT-related attack environments. The data collection period ranged from one hour to 112 hours, depending on pcap file size growth. The complete dataset includes several large scenarios, with the 16th scenario containing over 73 million records, the 2nd scenario 67 million records, and the 10th and 12th scenarios each 54 million records13. We specifically utilized 48,003 records from the IOT-23 dataset. This dataset provides real-world IoT network traffic patterns and attack scenarios, offering diverse attack-to-normal data ratios that reflect actual network conditions. The various attack types and their counts are shown in Table 3, and the flow diagram in Fig. 4 summarizes them for further analysis.

Flow diagram of the IOT23 dataset summary.
Data preprocessing and quality enhancement
This section addresses preprocessing challenges across all three datasets through a systematic approach. First, we identify and handle missing values and duplicate records in each dataset, with particular attention to the varying data formats. In BOT-IOT, the attributes flgs, proto, and state were removed as they contained data redundant with their categorical integer equivalents64. Additionally, daddr, pkSeqID, and saddr were deemed invalid and excluded because they limit generalization and cause overfitting; pkSeqID in particular serves only as a row identifier, as noted by Koroniotis et al.65. Identifying the numerical and categorical features allows each feature to be treated according to its preprocessing needs. Because the numerical features are widely dispersed, different statistical methods are used to quantify skewness and outliers numerically and in plots. Skewness handling involves a comparative framework utilizing various transformation methods to address data imbalances. These methods include IQR-based outlier detection and removal, shown in Fig. 5(b,c), and log transformation, square root transformation, Yeo-Johnson transformation, and quantile transformation (both uniform and normal variants), shown in Fig. 5(a). Each approach is applied to reduce skewness without losing important data, as shown in Table 4, ensuring the data distribution is more suitable for analysis and modeling. Through extensive testing, the Quantile Uniform transformation emerged as optimal, providing effective skewness reduction while preserving critical attack signatures across all datasets. This method proved particularly effective for handling the complex distributions in IOT23 and the large-scale attack patterns in CICIOT2023.

Skewness analysis and correction for dataset preprocessing.
Advantages of quantile uniform transformation
The Quantile Uniform (QU) transformation outperforms the alternatives (Yeo-Johnson, Log, Square Root) in preserving discriminative attack features, as evidenced by its consistent accuracy and F1-score gains across models (Table 5). For instance, with Random Forest, QU achieves a 98% F1-score, a 0.72% improvement over Yeo-Johnson (97.28%), demonstrating superior robustness to skewed attack distributions. Unlike Yeo-Johnson, which assumes parametric scaling, QU's non-parametric approach avoids distorting critical feature tails (e.g., rare attack payloads), thereby maintaining a higher precision/recall balance. While Table 5 shows marginal absolute gains (e.g., +0.15% accuracy for Logistic Regression), these differences are statistically significant (\(p < 0.05\), paired t-test) and practically impactful in IoT security. For example, QU's 98% F1-score with Random Forest reduces false negatives by 1.3% compared to Yeo-Johnson, critical for detecting stealthy botnet attacks.
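As an illustrative sketch (not the study's exact pipeline), the IQR outlier filter and the adopted quantile-uniform transformation can be reproduced with scikit-learn on a synthetic skewed feature; the column name, fence factor, and toy distribution are assumptions:

```python
# Sketch of IQR-based outlier removal followed by the Quantile Uniform
# transformation, on a synthetic right-skewed feature.
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.preprocessing import QuantileTransformer

def remove_outliers_iqr(df: pd.DataFrame, col: str, k: float = 1.5) -> pd.DataFrame:
    """Drop rows whose value in `col` lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    return df[(df[col] >= q1 - k * iqr) & (df[col] <= q3 + k * iqr)].copy()

# Toy skewed feature standing in for e.g. a packet-count attribute
rng = np.random.default_rng(0)
df = pd.DataFrame({"pkts": rng.lognormal(mean=2.0, sigma=1.0, size=5000)})

df_clean = remove_outliers_iqr(df, "pkts")

# Quantile transform to a uniform output distribution (the QU variant)
qt = QuantileTransformer(output_distribution="uniform", random_state=0)
df_clean["pkts_qu"] = qt.fit_transform(df_clean[["pkts"]]).ravel()

print(round(stats.skew(df_clean["pkts"]), 2),
      round(stats.skew(df_clean["pkts_qu"]), 2))
```

The transformed column has near-zero skewness while the rank ordering of values (and hence the attack-discriminative structure) is preserved.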
Feature selection and statistical analysis
The feature selection methodology employs a multi-layered statistical approach across all three datasets. The process begins with correlation analysis to identify highly correlated features, shown in Fig. 6, followed by Chi-square statistics and p-value validation to assess feature-label relationships. The Chi-square statistic and p-value are shown in Table 6 only for the Telnet feature, which shows no statistical relationship with the label class. For correlation analysis, feature interdependency is evaluated using the Pearson correlation coefficient in Eq. (1):
$$\begin{aligned} r = \frac{\sum (x-\mu _x)(y-\mu _y)}{\sqrt{\sum (x-\mu _x)^2 \cdot \sum (y-\mu _y)^2}} \end{aligned}$$
(1)
where x and y are features and \(\mu _x\), \(\mu _y\) are their respective means. Next, feature-label relationship significance is assessed with the Chi-square statistic and its p-value, using Eqs. (2) and (3):
$$\begin{aligned} \chi ^2 = \sum \left[ \frac{(O – E)^2}{E}\right] \end{aligned}$$
(2)
where O represents observed frequencies and E represents expected frequencies. Then, the p-value is calculated through this formula:
$$\begin{aligned} p = 1 – F(\chi ^2,\text {df}) \end{aligned}$$
(3)
where F is the cumulative distribution function of \(\chi ^2\) with df (degrees of freedom) = \((r-1)(c-1)\), r = number of rows, c = number of columns in the contingency table.
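The Chi-square screen of Eqs. (2) and (3) can be sketched with SciPy, which builds the contingency table's expected frequencies and CDF-based p-value internally; the synthetic labels, feature names, and 0.05 threshold here are illustrative assumptions:

```python
# Minimal sketch of the Chi-square / p-value feature screen (Eqs. 2-3)
# on synthetic data: one label-dependent feature, one pure-noise feature.
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

rng = np.random.default_rng(1)
labels = rng.choice(["Normal", "DDoS"], size=2000, p=[0.3, 0.7])
informative = (labels == "DDoS") & (rng.random(2000) < 0.8)  # tracks the label
noise = rng.random(2000) < 0.5                               # independent of it

results = {}
for name, feat in [("informative", informative), ("noise", noise)]:
    table = pd.crosstab(feat, labels)                 # observed frequencies O
    chi2, p, dof, _ = chi2_contingency(table)         # E and p from the chi2 CDF
    results[name] = p
    print(f"{name}: chi2={chi2:.1f}, p={p:.3g}, keep={p < 0.05}")
```

Features whose p-value exceeds the significance threshold (like Telnet in Table 6) would be flagged for removal.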

Correlation matrix of features.
For further assessment and to validate the preceding tests, advanced feature distribution analysis and proportional analysis techniques are adopted; the results are shown in Fig. 7(a,b). This validation focuses on feature dependency across label classes, which is examined through Eq. (4):
$$\begin{aligned} D(f\mid c) = P(f\mid c)\log \left( \frac{P(f\mid c)}{P(f)}\right) \end{aligned}$$
(4)
where f represents features, c represents classes, and P denotes probability. For proportional analysis, feature importance is validated using Eq. (5):
$$\begin{aligned} \text {PI}(f) = \sum [w(c) \cdot D(f\mid c)] \end{aligned}$$
(5)
where w(c) is the class weight and \(D(f\mid c)\) is the feature dependency score.
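A compact way to read Eqs. (4) and (5) together: \(D(f\mid c)\), summed over feature values, is a KL divergence between the class-conditional and marginal feature distributions, and \(\text{PI}(f)\) is its class-weighted average. The sketch below implements this for a discretised feature; using class frequency as \(w(c)\) is an assumption of this illustration:

```python
# Illustrative implementation of the dependency score D(f|c) (Eq. 4)
# and proportional importance PI(f) (Eq. 5) for a discretised feature.
import numpy as np
import pandas as pd

def proportional_importance(feature: pd.Series, label: pd.Series) -> float:
    p_f = feature.value_counts(normalize=True)             # P(f)
    pi = 0.0
    for c, group in feature.groupby(label):
        p_f_given_c = group.value_counts(normalize=True)   # P(f|c)
        # D(f|c) = sum_f P(f|c) * log(P(f|c) / P(f))  -- a KL divergence
        d = sum(p * np.log(p / p_f[v]) for v, p in p_f_given_c.items())
        w_c = len(group) / len(feature)                    # class weight w(c)
        pi += w_c * d
    return pi

rng = np.random.default_rng(2)
label = pd.Series(rng.choice(["Normal", "DoS"], size=3000))
# Feature that tracks the label 90% of the time vs. an independent feature
dependent = label.map({"Normal": 0, "DoS": 1}).where(rng.random(3000) < 0.9, other=2)
independent = pd.Series(rng.integers(0, 3, size=3000))

pi_dep = proportional_importance(dependent, label)
pi_ind = proportional_importance(independent, label)
print(pi_dep, pi_ind)
```

Label-dependent features score much higher than independent ones, which score near zero; low-scoring features are candidates for elimination.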

Visualization of attacks-dependent feature in dataset.
This comprehensive approach ensures robust feature selection while maintaining dataset-specific characteristics. Features showing minimal impact across these analyses are removed from the respective datasets, ensuring optimal model performance while reducing computational overhead.
Rationale for statistical methods in feature selection
The multi-layered feature selection process employs correlation analysis, Chi-square statistics with p-value validation, and distribution/proportional analysis to select discriminative features for IoT botnet detection. These methods were chosen for their complementary strengths and suitability for the diverse BOT-IOT, CICIOT2023, and IOT23 datasets:
Correlation Analysis (Eq. 1): Using the Pearson correlation coefficient, this method identifies highly correlated features \((|r| > 0.8)\) for removal, reducing redundancy in high-dimensional datasets like CICIOT2023. Unlike mutual information, it is computationally efficient for continuous features (e.g., packet size).
Chi-square Statistics with p-value Validation (Eqs. 2-3): Chi-square tests evaluate feature-label relationships, prioritizing features with significant p-values (\(p < 0.05\)) that distinguish attacks (e.g., DDoS in BOT-IOT). Compared to ANOVA, Chi-square is better suited for categorical or discretized IoT features (e.g., protocol types), as shown for Telnet (Table 6), ensuring robust selection in imbalanced datasets like IOT23.
Distribution/Proportional Analysis (Eqs. 4-5): This method captures attack-specific patterns (e.g., irregular inter-arrival times in datasets) by analyzing feature distributions and class-specific importance (Fig. 7). Unlike wrapper methods, it is computationally lightweight and generalizable, enhancing feature relevance across datasets.
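The correlation-pruning step described above (dropping one member of each pair with \(|r| > 0.8\)) can be sketched as follows; the three-column toy frame and column names are assumptions for illustration:

```python
# Correlation-based redundancy pruning: compute the |Pearson r| matrix
# (Eq. 1) and drop one column from each highly correlated pair.
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
df = pd.DataFrame({"pkt_len": rng.normal(size=1000)})
df["bytes"] = df["pkt_len"] * 1000 + rng.normal(scale=0.1, size=1000)  # redundant
df["duration"] = rng.normal(size=1000)                                  # independent

corr = df.corr(method="pearson").abs()
# Keep only the upper triangle so each pair is inspected once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.8).any()]
print(to_drop)
```

Here `bytes` is flagged because it is a near-linear rescaling of `pkt_len`, while the uncorrelated `duration` survives.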
Each feature passes through this extensive validation process, and any feature that contributes nothing to model training, only increasing time and degrading performance, is removed; the focus is on enhancing the detection rate and decreasing the detection time66. Together, these methods reduced feature dimensionality by 10.64% in CICIOT2023, lowering the models' training time while achieving the best performance. Their simplicity and robustness outperformed compute-heavy alternatives, optimizing model performance across diverse IoT scenarios.
Model fitting and cross-validation
The framework implements a systematic approach to model fitting and validation across all three datasets, focusing on achieving optimal performance while preventing both underfitting (missed threats) and overfitting (false alarms). For model fitting analysis, we employ Random Forest and Logistic Regression models, shown in Eqs. (6) and (7), with threshold-based optimization:
$$\begin{aligned} RF_{score} = \frac{\sum \left( w_i \times T_i\right) }{N} \end{aligned}$$
(6)
where \(w_i\) is the weight of tree i, \(T_i\) is the tree prediction, and N is the number of trees.
$$\begin{aligned} P(y|X) = \frac{1}{1 + e^{-\theta ^T X}} \end{aligned}$$
(7)
where \(\theta\) represents model parameters and X represents input features.
This analysis shows that both models trained on BOT-IOT and CICIOT2023 produced well-behaved validation curves, shown in Fig. 8(a,b), while IOT23 exhibited underfitting in its validation curve (Fig. 8(c)).

Model fitting curves across datasets.
On the other hand, the cross-validation framework, shown in Eq. (8), utilizes k-fold cross-validation with performance metrics.
$$\begin{aligned} CV_{score} = \frac{1}{k} \sum (Performance_i) \end{aligned}$$
(8)
where k is the number of folds and \(Performance_i\) is the score for fold i.
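Equation (8) corresponds directly to scikit-learn's cross-validation utilities; the sketch below runs both classical models with 5-fold CV on a synthetic imbalanced stand-in for the IoT datasets (the class weights and F1-macro scoring here are illustrative assumptions):

```python
# k-fold cross-validation (Eq. 8) for the two classical models on a
# synthetic imbalanced multi-class dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0)

for name, model in [("RF", RandomForestClassifier(n_estimators=100, random_state=0)),
                    ("LR", LogisticRegression(max_iter=5000))]:
    scores = cross_val_score(model, X, y, cv=5, scoring="f1_macro")
    # CV_score = mean of the per-fold scores; the std deviation flags instability
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Low per-fold variance (as observed for BOT-IOT) indicates stable classification structure, while large variance or low macro scores signals the kind of difficulty seen on IOT23.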
The BOT-IOT dataset results show outstanding performance from both models, with Random Forest achieving near-perfect 99.98% accuracy. The high F1 Macro scores (99.43% for Random Forest), being close to the F1 Weighted scores, indicate excellent detection across all attack types, with consistency shown by minimal standard deviations. The gaps in Logistic Regression's scores in Table 7 reflect an imbalanced class distribution.
The Random Forest model outperforms the Logistic Regression model on all metrics and has lower standard deviations, showing that its results are more consistent. The large gap between the F1 Macro and F1 Weighted scores in Table 8 indicates a class imbalance problem in the dataset.
In the case of the IOT23 dataset, it was observed that Logistic Regression produced better results than Random Forest with an accuracy of 84.36% as compared to 68.79% by Random Forest. However, both models show relatively low F1 Macro scores (48.36% and 51.52% respectively), which indicates that there are still many challenges in dealing with class imbalance issues. The difference between F1 Macro and F1 Weighted metrics shows that both models have difficulty with minority attack classes in this dataset as presented in Table 9.
Our analysis highlights unique characteristics across datasets, offering valuable insights into their distinct behaviors. In the BOT-IOT dataset, classification patterns are consistent with minimal variance, showcasing stable performance. The CICIOT2023 dataset exhibits robust outcomes, characterized by well-defined decision boundaries. However, the IOT23 dataset reveals more intricate patterns, indicating the need for additional validation due to its complexity.
To assess the performance of the model, the proposed framework incorporates several important evaluations. These include identifying underfitted models by exploring training errors, recognizing overfitting problems by comparing validation errors with training errors, and doing class-wise performance analysis to deal with imbalances. The analysis of the models shows that there are distinct differences in the behavior of the models. The simple attack patterns are properly identified in the BOT-IOT and CICIOT2023 datasets, while the IOT23 dataset is quite challenging and requires some improvement. Also, the problem of class imbalance influences the performance of all the datasets, with the IOT23 dataset being most affected.
Mitigation of overfitting and underfitting
Our framework incorporates the following techniques to balance model complexity and generalization:
1. Deep Learning Models (CNN/BiLSTM):
- Dropout Layers: Added after each convolutional/LSTM layer (rate = 0.2) to prevent co-adaptation of neurons (Sect. “Bidirectional LSTM (BiLSTM)”–“Convolutional Neural Network (CNN)”).
- Batch Normalization: Stabilized training by normalizing layer inputs, reducing internal covariate shift (Eq. (14), Sect. “Bidirectional LSTM (BiLSTM)”).
- Early Stopping: Training halted if validation loss plateaued for 10 epochs (monitored during cross-validation).
2. Traditional Machine Learning Models (RF/LR):
- Elastic Net penalty \((\lambda =0.8, \alpha =0.2)\) combining L1 (sparsity) and L2 (smoothness) terms (Eq. 19, Sect. “Logistic Regression (LR)”).
3. Cross-Validation and Threshold Optimization (Sect. “Model fitting and cross-validation”).
4. Class Balancing (Sect. “Class balancing and performance optimization”).
Class balancing and performance optimization
To address class imbalance, the same framework is applied to all three datasets: SMOTE (Synthetic Minority Over-sampling Technique) with memory-efficient batch processing. Building on the approach from previous research62,67, the SMOTE implementation follows the mathematical foundation in Eq. (9):
$$\begin{aligned} X_{new} = X + \text {rand}(0,1) \cdot (X_{neighbor} – X) \end{aligned}$$
(9)
where \(X_{new}\) is the new instance to be synthesized, X is the real instance, and \(X_{neighbor}\) is any of the k nearest neighbors to X.
The process is tailored to each dataset's needs: for BOT-IOT it addresses severely unbalanced attack categories, for CICIOT2023 it likewise rebalances the attack classes, and IOT23 requires dataset-specific handling of its class balance. Performance is optimized through several techniques, including memory management, batch processing, proper data concatenation, and runtime monitoring, to ensure the efficiency and scalability of the system. This method is applied both to the full dataset, referred to as the balanced dataset, and to the PCA-transformed dataset, referred to as the PCA-balanced dataset.
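The interpolation rule of Eq. (9) can be shown in a few lines of NumPy; the real pipeline uses a library implementation with batch processing, so this is only the core synthesis step for one minority class, with illustrative sizes and a random-placeholder feature matrix:

```python
# Core SMOTE synthesis rule (Eq. 9): X_new = X + rand(0,1) * (X_neighbor - X),
# where X_neighbor is one of the k nearest minority-class neighbours of X.
import numpy as np

def smote_sample(X_min: np.ndarray, k: int, n_new: int, rng) -> np.ndarray:
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        x = X_min[i]
        # k nearest neighbours of x within the minority class (excluding x itself)
        d = np.linalg.norm(X_min - x, axis=1)
        nn = np.argsort(d)[1:k + 1]
        x_nb = X_min[rng.choice(nn)]
        synthetic.append(x + rng.random() * (x_nb - x))   # Eq. (9)
    return np.array(synthetic)

rng = np.random.default_rng(3)
X_min = rng.normal(size=(40, 5))            # under-represented attack class
X_new = smote_sample(X_min, k=5, n_new=60, rng=rng)
print(X_new.shape)
```

Because each synthetic point is a convex combination of two real minority samples, the new points stay inside the minority class's feature range, which is what preserves attack semantics while rebalancing the classes.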
Model evaluation and performance metrics
Bidirectional LSTM (BiLSTM)
The BiLSTM network is implemented specifically to identify temporal attack patterns in IoT network traffic. The architecture processes traffic sequences in both forward and backward directions, which is necessary for recognizing sophisticated attack signatures. The LSTM unit consists of six components: the input \(x_t\), the cell state \(C_t\), the hidden state \(h_t\), the forget gate \(f_t\), the memory (input) gate \(i_t\), and the output gate \(o_t\). Each gate serves a specific purpose:
$$\begin{aligned} f_t&= \sigma (W_f \cdot [h_{t-1}, x_t] + b_f) \end{aligned}$$
(10)
$$\begin{aligned} i_t&= \sigma (W_i \cdot [h_{t-1}, x_t] + b_i) \end{aligned}$$
(11)
$$\begin{aligned} o_t&= \sigma (W_o \cdot [h_{t-1}, x_t] + b_o) \end{aligned}$$
(12)
$$\begin{aligned} C_t&= f_t \odot C_{t-1} + i_t \odot \tanh (W_c \cdot [h_{t-1}, x_t] + b_c) \end{aligned}$$
(13)
$$\begin{aligned} h_t&= o_t \odot \tanh (C_t) \end{aligned}$$
(14)
Building upon this foundation, our implementation incorporates a two-layer BiLSTM architecture for enhanced performance. The first BiLSTM layer, with 128 units, identifies intricate attack patterns, while the second layer, with 64 units, captures higher-level features. To avoid overfitting, the model uses dropout (rate = 0.2) and BatchNormalization layers. Finally, dense layers with 128 and 64 neurons and ReLU activation categorize the attacks. This architecture integrates sequential processing with robust classification layers, making it well suited to IoT network security.
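To make Eqs. (10)-(14) concrete, the single-step LSTM recurrence can be written directly in NumPy; the weights below are random placeholders and the dimensions are arbitrary, purely to show how the gates combine (a BiLSTM runs this recurrence forward and backward over the sequence and concatenates both hidden states):

```python
# One LSTM time step implementing Eqs. (10)-(14) with placeholder weights.
import numpy as np

def lstm_step(x_t, h_prev, C_prev, W, b):
    z = np.concatenate([h_prev, x_t])                  # [h_{t-1}, x_t]
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    f_t = sigmoid(W["f"] @ z + b["f"])                 # forget gate, Eq. (10)
    i_t = sigmoid(W["i"] @ z + b["i"])                 # memory gate, Eq. (11)
    o_t = sigmoid(W["o"] @ z + b["o"])                 # output gate, Eq. (12)
    C_t = f_t * C_prev + i_t * np.tanh(W["c"] @ z + b["c"])  # cell state, Eq. (13)
    h_t = o_t * np.tanh(C_t)                           # hidden state, Eq. (14)
    return h_t, C_t

rng = np.random.default_rng(4)
n_in, n_hid = 8, 16
W = {k: rng.normal(scale=0.1, size=(n_hid, n_hid + n_in)) for k in "fioc"}
b = {k: np.zeros(n_hid) for k in "fioc"}

h, C = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):                 # a 5-step traffic sequence
    h, C = lstm_step(x_t, h, C, W, b)
print(h.shape)
```

Note that \(h_t = o_t \odot \tanh(C_t)\) bounds every hidden activation in \((-1, 1)\), which is part of what keeps the recurrence numerically stable over long sequences.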
Convolutional Neural Network (CNN)
The selected model is a 1D CNN architecture that efficiently extracts temporal patterns and hierarchical features from the input sequence, capturing both short- and long-range dependencies without requiring explicit feature engineering68. The architecture is designed to identify spatial patterns in network traffic and incorporates three convolutional layers with 64, 128, and 256 filters, which gradually increase the level of feature abstraction. Each convolutional layer applies filters across the input sequence using the operation defined in Eq. (15):
$$\begin{aligned} \text {Conv}(x) = \text {activation}\left( W * x + b\right) \end{aligned}$$
(15)
where * denotes the convolution operation. To improve performance and stability, each convolutional layer is followed by a BatchNormalization layer to maintain a stable training process and a MaxPooling1D layer to lower dimensionality. Furthermore, to prevent overfitting, dropout layers with a rate of 0.2 are added after each convolutional layer to regularize the model. Together, the convolutional, normalization, pooling, and regularization layers extract the features needed for accurate network traffic analysis.
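A single filter of Eq. (15) reduces to a sliding dot product plus bias, passed through an activation; the sketch below shows this with a toy signal and a hand-picked kernel (a framework `Conv1D` layer applies many such learned filters in parallel, and like most deep learning libraries it actually computes cross-correlation):

```python
# One 1D convolution filter with ReLU activation: activation(w * x + b), Eq. (15).
import numpy as np

def conv1d(x: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Valid-mode sliding dot product followed by ReLU."""
    k = len(w)
    out = np.array([np.dot(x[i:i + k], w) + b for i in range(len(x) - k + 1)])
    return np.maximum(out, 0.0)              # ReLU activation

x = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])   # toy traffic signal
w = np.array([-1.0, 0.0, 1.0])                       # rising-edge detector
y = conv1d(x, w, b=0.0)
print(y)  # responds where the signal is increasing
```

Stacking such layers with growing filter counts (64, 128, 256) is what lets the network build progressively more abstract traffic features.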
Random Forest Classifier (RF)
The Random Forest Classifier works on the principle of building multiple decision trees and combining their predictions through ensemble decision-making which makes it a very useful classifier in the identification of IoT botnet. For an input feature vector X, the classifier implements the majority voting mechanism as outlined in Eq. (16):
$$\begin{aligned} \text {RF}(X) = \text {mode}\{{\text {tree}_i(X)}\}, \quad \text {for } i \text { in } 1..n_{\text {trees}} \end{aligned}$$
(16)
In order to refine the predictions even more, the following implementation computes a weighted tree score as defined in Eq. (17):
$$\begin{aligned} \text {Tree\_Score} = \frac{\sum (w_i \cdot \text {prediction}_i)}{n_{\text {trees}}} \end{aligned}$$
(17)
where \(w_i\) represents individual tree weights.
Our implementation uses 1000 estimators with bootstrapping, which makes the model capable of dealing with imbalanced attack categories and supports stable ensemble learning. All the trees in the ensemble are tuned to emphasize feature selection for the detection of attacks, using the most significant features that define the attack signature. The final predictions are averaged with the majority voting in order to provide accurate and consistent attack categorization. Thus, this particular approach makes sure that the Random Forest Classifier is particularly good at dealing with the complex IoT botnet attacks.
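The majority-vote aggregation of Eq. (16) can be sketched with scikit-learn by inspecting the fitted forest's individual trees; the dataset is synthetic and the tree count is shrunk from 1000 to 50 purely for speed:

```python
# Majority voting across decision trees (Eq. 16) in a bootstrapped forest.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rf = RandomForestClassifier(n_estimators=50, bootstrap=True,
                            random_state=0).fit(X, y)

# Per-tree predictions for one sample; the mode is the majority vote
votes = np.array([tree.predict(X[:1]) for tree in rf.estimators_]).ravel()
majority = np.bincount(votes.astype(int)).argmax()
print(majority, rf.predict(X[:1])[0])
```

Note that scikit-learn's `predict` averages per-tree class probabilities rather than taking a strict hard-vote mode, but for well-separated samples the two aggregations agree; the weighted variant of Eq. (17) simply replaces the uniform vote with per-tree weights.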
Logistic Regression (LR)
Logistic Regression is used for multi-class attack classification, and with the help of elastic net regularization, both L1 and L2 penalties are used for feature selection and coefficient regularization. The model computes class probabilities by applying the softmax function as defined in Eq. (18):
$$\begin{aligned} P(y|X) = \text {softmax}(WX + b) \end{aligned}$$
(18)
and optimizes the loss function in Eq. (19):
$$\begin{aligned} L(\theta ) = -\sum y_{\text {true}} \log (P(y \mid X)) + \lambda [(1-\alpha )\Vert \theta \Vert _1 + \alpha \Vert \theta \Vert _2] \end{aligned}$$
(19)
where \(\lambda\) is the regularization strength (C=0.8) and \(\alpha\) is the L1 ratio (\(\alpha\)=0.2). Our implementation uses the LogisticRegression classifier, which is suitable for large datasets, with the qn solver to ensure fast convergence, particularly when dealing with a large number of features. The elastic net penalty achieves both sparsity and good generalization, while the multinomial setting allows more accurate classification of the different attack types. Other parameters, such as max_iter = 5000, ensure convergence during training. This tailored approach ensures that imbalanced attack classes are adequately handled, making the model appropriate for securing IoT networks.
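The elastic-net configuration can be sketched with scikit-learn; note that scikit-learn does not expose a `qn` solver (that name belongs to GPU libraries such as cuML), so this sketch substitutes `saga`, the scikit-learn solver that supports the elastic-net penalty, and uses a synthetic multi-class dataset:

```python
# Elastic-net logistic regression (Eq. 19) with C=0.8 and l1_ratio=0.2,
# mirroring the parameters quoted in the text.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, n_features=30, n_informative=6,
                           n_classes=3, random_state=0)
lr = LogisticRegression(penalty="elasticnet", solver="saga",
                        C=0.8, l1_ratio=0.2, max_iter=5000).fit(X, y)

# The L1 portion of the penalty can drive some coefficients exactly to zero
sparsity = np.mean(lr.coef_ == 0)
print(f"{lr.score(X, y):.3f} train accuracy, {sparsity:.0%} zero coefficients")
```

The `coef_` matrix has one row per class (the multinomial formulation of Eq. 18), and the L1/L2 mix trades sparsity against smooth shrinkage exactly as in Eq. (19).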
Ensemble model architecture
Our hybrid ensemble framework combines predictions from deep learning (CNN, BiLSTM) and traditional machine learning (Random Forest, Logistic Regression) models using a weighted voting mechanism. The architecture operates as follows:
1. Base Model Training: Each model (CNN, BiLSTM, RF, LR) is trained independently on the preprocessed, balanced datasets.
2. Prediction Aggregation: For a given input sample, predictions from all four models are aggregated using dynamically weighted voting, as shown in Eq. (20):
$$\begin{aligned} \text {Ensemble}_{\text {Score}} = \sum _{i=1}^{4} w_i \cdot P_i \end{aligned}$$
(20)
where \(w_i\) is the validation F1-score of model i and \(P_i\) is the predicted probability vector from model i.
3. Decision Thresholding: The weighted ensemble score is compared against an optimized decision threshold to produce the final class prediction.
4. Model Contribution Analysis: Deep learning models (CNN/BiLSTM) prioritize detecting novel attack patterns via temporal/spatial feature extraction, while traditional models (RF/LR) provide stability against overfitting and rapid inference.
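The weighted-voting step of Eq. (20) amounts to an F1-weighted sum of the four models' probability vectors; in this sketch the probability vectors, class names, and weight values are illustrative stand-ins for real model outputs:

```python
# Weighted ensemble voting (Eq. 20): sum of per-model probability vectors
# weighted by each model's validation F1-score.
import numpy as np

# Predicted class-probability vectors for one sample (classes: Normal, DDoS, Theft)
P = np.array([
    [0.10, 0.80, 0.10],   # CNN
    [0.05, 0.90, 0.05],   # BiLSTM
    [0.20, 0.60, 0.20],   # Random Forest
    [0.30, 0.50, 0.20],   # Logistic Regression
])
w = np.array([0.97, 0.98, 0.99, 0.90])   # validation F1-scores as weights

ensemble_score = (w[:, None] * P).sum(axis=0)   # Eq. (20)
ensemble_score /= ensemble_score.sum()          # renormalise to a distribution
print(ensemble_score.argmax())                  # index of the winning class
```

Because the weights come from validation F1-scores, models that generalized better on held-out data automatically pull the combined score toward their prediction.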

Weighted ensemble classification.
Hyperparameter tuning
Hyperparameters for the CNN, BiLSTM, Random Forest (RF), and Logistic Regression (LR) models, integrated into the ensemble framework (Sect. “Ensemble model architecture”), were optimized using grid search with 5-fold cross-validation across BOT-IOT, CICIOT2023, and IOT23 datasets to maximize accuracy and generalization. Key hyperparameters and their selection rationale are:
- CNN: Dropout rates (0.2 for convolutional layers, 0.3–0.5 for dense layers) were tuned over {0.1, 0.2, 0.3, 0.4, 0.5} to prevent overfitting while preserving spatial feature extraction, critical for the ensemble’s 100% accuracy on BOT-IOT (Table 10). Filter sizes (64, 128, 256) were selected from {32, 64, 128, 256} and the kernel size (3) from {3, 5} to optimize attack pattern detection.
- BiLSTM: Dropout rates (0.2 for LSTM layers, 0.2–0.3 for dense layers) were chosen from {0.1, 0.2, 0.3} to mitigate overfitting on temporal sequences, achieving 99% recall for CICIOT2023 (Table 12). LSTM units (128, 64) were tuned over {64, 128, 256} for efficiency and performance.
- Random Forest: The number of trees (1000) was selected from {100, 500, 1000}, with max_depth=15 from {10, 15, 20} and min_samples_leaf=5 from {1, 5, 10}, balancing stability and efficiency (training time: 68.32 s) for 100% accuracy on BOT-IOT (Table 11).
- Logistic Regression: The elastic net penalty (l1_ratio=0.2, C=0.8) was tuned over l1_ratio in {0, 0.2, 0.5, 0.8, 1} and C in {0.1, 0.5, 0.8, 1} to optimize for high-dimensional CICIOT2023 data (99.15% accuracy, Table 13). max_iter=5000 ensured convergence.
The tuned hyperparameters minimized validation loss (e.g., MSE: 0.0003 for CICIOT2023, Table 22) and supported the ensemble’s weighted voting mechanism (Sect. “Ensemble model architecture”), achieving superior generalization (e.g., +1.5% accuracy on IOT23, Table 22). Alternative values (e.g., higher dropout or fewer trees) reduced performance, particularly for complex attacks.
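The grid search with 5-fold cross-validation described above maps directly onto scikit-learn's `GridSearchCV`; the sketch below uses the Random Forest grid values quoted in the text (trimmed, on a small synthetic dataset, so it runs quickly):

```python
# Grid search with 5-fold CV over the Random Forest hyperparameters
# quoted in the text (subset of the grid, synthetic data, for speed).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=400, n_features=15, random_state=0)
param_grid = {
    "n_estimators": [100, 500],
    "max_depth": [10, 15],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy", n_jobs=-1)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Each candidate configuration is scored by its mean cross-validated accuracy, and `best_params_` records the winning combination, exactly the selection logic applied per dataset in the study.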
Performance evaluation framework and metrics
Our research work applies a systematic evaluation framework for the evaluation of intrusion detection performance on the basis of the BOT-IOT, CICIOT2023, and IOT23 datasets. The framework consists of several performance metrics, including classification and regression, to provide a more comprehensive assessment of detection capability where emphasis is on the varying attack patterns of each of the datasets.
For classification performance assessment, the following metrics generate detailed information about the effectiveness of the detection system. The most widely used measure of overall detection capability is accuracy, (TP + TN)/(TP + TN + FP + FN), while precision, defined as TP/(TP + FP), measures the model's ability to avoid false alarms, which is of utmost importance in real-life IoT security applications. Model performance is primarily evaluated through detection accuracy, while precision serves as the secondary evaluation metric69. Recall, TP/(TP + FN), defines a system's ability to capture all actual attacks and is paramount when dealing with sophisticated breach attempts. The F1-score, computed as 2 × (Precision × Recall)/(Precision + Recall), offers a balanced evaluation of the model's performance and is especially important in handling the class imbalance evident in our datasets.
The regression metrics framework provides further insight into the quantitative characteristics of our detection system. Mean Squared Error (MSE = \(\frac{1}{n} \sum (y_{\text {true}} – y_{\text {pred}})^2\)) measures prediction quality with higher sensitivity to large errors, which makes it useful in identifying severe attacks. Root Mean Squared Error (RMSE = \(\sqrt{\text {MSE}}\)) provides error measurements in the original scale of the features, which are easy to interpret, while Mean Absolute Error (MAE = \(\frac{1}{n} \sum |y_{\text {true}} – y_{\text {pred}}|\)) directly measures the differences between predicted and true values. These metrics are useful for assessing model performance across attack types and network conditions.
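All of the above metrics are available in scikit-learn; the short sketch below computes them for a toy binary prediction vector (the labels are illustrative, with 1 = attack):

```python
# Classification and regression metrics from the evaluation framework,
# computed on a toy prediction vector (1 = attack, 0 = benign).
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, mean_squared_error, mean_absolute_error)

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 1])   # one FN, one FP

acc = accuracy_score(y_true, y_pred)     # (TP+TN)/(TP+TN+FP+FN)
prec = precision_score(y_true, y_pred)   # TP/(TP+FP)
rec = recall_score(y_true, y_pred)       # TP/(TP+FN)
f1 = f1_score(y_true, y_pred)            # harmonic mean of precision and recall
mse = mean_squared_error(y_true, y_pred)
rmse, mae = np.sqrt(mse), mean_absolute_error(y_true, y_pred)
print(acc, prec, rec, f1, mse, rmse, mae)
```

With one false negative and one false positive out of ten samples, accuracy is 0.8 and precision equals recall at 5/6, showing how the metrics decompose the same confusion matrix differently.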
The evaluation framework also incorporates computational performance assessment, measuring training duration, inference time, and resource utilization. These metrics are important in determining the practical deployability of our detection system in various IoT settings. Training time measurements indicate the cost of model optimization, while inference time checks the system's capability to provide real-time threat detection. The resource utilization metrics enable assessment of the system's efficiency and flexibility for implementation under various deployment conditions.
