Advancements in cyberthreat intelligence through resource exhaustion attack detection using hybrid deep learning with heuristic search algorithms

Machine Learning


This paper proposes a novel CREA-HDLMOA methodology whose primary goal is to develop an effective method for DDoS attack detection using advanced optimization algorithms. The methodology involves four processes: data normalization, dimensionality reduction, hybrid classification, and parameter selection. Figure 2 shows the workflow of the CREA-HDLMOA model.

Fig. 2

Workflow of the CREA-HDLMOA method.

Stage I: data normalization

Initially, the data normalization stage employs LSN to convert the input data into a suitable format. LSN is chosen for its simplicity, computational efficiency, and ability to scale input features into a uniform range, typically [0,1], which helps accelerate model convergence during training. It ensures that no feature dominates the others because of scale differences, while preserving the relative relationships among data points. It is also well suited to time-series data and DL techniques, and, being a linear map, it preserves the shape of each feature's distribution. By keeping inputs numerically stable, it can also help mitigate vanishing-gradient issues in deep networks. Compared with more complex normalization techniques, LSN adds minimal overhead, making it well suited to real-time and resource-constrained environments and thus a natural choice for pre-processing in DDoS detection tasks.

LSN is an essential pre-processing stage in DDoS attack detection, ensuring that network traffic features are converted to a common scale, usually [0,1] or [-1,1]. This normalization prevents features with large numeric ranges from dominating those with small ranges, resulting in more balanced and precise ML models. In DDoS detection, where real-time analysis is essential, normalization improves classifier efficacy by ensuring that each feature contributes comparably to the decision-making process. It also makes anomaly-based detection models behave more predictably; note, however, that because the scaling bounds are set by each feature's minimum and maximum, extreme outliers can compress the remaining values into a narrow band. Overall, linear scaling normalization is essential for improving feature representation and the overall accuracy of DDoS attack detection.
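To make the scaling concrete, here is a minimal NumPy sketch of linear (min-max) scaling, assuming a flow-feature matrix with one row per traffic record. The function names, the toy features, and the choice to fit the minimum and maximum on training data and reuse them on new traffic are illustrative assumptions, not details taken from the CREA-HDLMOA implementation.

```python
import numpy as np

def fit_minmax(X):
    """Per-feature minimum and maximum, computed once on training traffic."""
    X = np.asarray(X, dtype=float)
    return X.min(axis=0), X.max(axis=0)

def transform_minmax(X, col_min, col_max, feature_range=(0.0, 1.0)):
    """Linearly map each feature to feature_range:
    x' = lo + (x - min) * (hi - lo) / (max - min).
    Constant features (max == min) are mapped to lo to avoid division by zero;
    values outside the training range are not clipped.
    """
    X = np.asarray(X, dtype=float)
    lo, hi = feature_range
    span = col_max - col_min
    safe_span = np.where(span == 0, 1.0, span)
    unit = np.where(span == 0, 0.0, (X - col_min) / safe_span)
    return lo + unit * (hi - lo)

# Hypothetical flow features: packets/s, mean packet size (bytes), SYN ratio
X_train = np.array([[100.0, 60.0, 0.1],
                    [5000.0, 1500.0, 0.9],
                    [2550.0, 780.0, 0.5]])
col_min, col_max = fit_minmax(X_train)
print(transform_minmax(X_train, col_min, col_max))              # range [0, 1]
print(transform_minmax(X_train, col_min, col_max, (-1.0, 1.0)))  # range [-1, 1]
```

Reusing the training-set bounds keeps the mapping fixed at detection time; new values beyond those bounds simply fall outside the target range, since they are not clipped here.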

Stage II: dimensionality reduction process

Next, the ROA performs the FS process to select the most relevant features from the data [31]. It is chosen for its robust global search capability and its efficiency in high-dimensional feature spaces. The technique mimics the natural growth of rime ice, which lets it explore and exploit feature subsets efficiently to identify the attributes most relevant to DDoS detection. Because it selects original features rather than transforming them, it preserves their semantic meaning, which is important for interpretability. It also adapts the selection process dynamically according to fitness values, steering the search toward feature subsets tailored to the classification task. This improves model accuracy, speeds up training, and reduces computational complexity.

RIME is an optimizer that mimics the natural formation of rime (frost). Frost forms mainly in two kinds: soft frost and hard frost. The algorithm uses a greedy selection mechanism to search repeatedly for the optimum solution, aiming at global optimization.

In RIME, each frost body is treated as an individual search agent, and the set of all frost bodies forms the population of the model. The frost body population \(R\) is first initialized, as given in Eq. (1).

$$\:R=\left[\begin{array}{llll}{x}_{11}&\:{x}_{12}&\:\dots\:&\:{x}_{1j}\\\:{x}_{21}&\:{x}_{22}&\:\dots\:&\:{x}_{2j}\\\:\vdots&\:\vdots&\:\ddots\:&\:\vdots\\\:{x}_{i1}&\:{x}_{i2}&\:\dots\:&\:{x}_{ij}\end{array}\right]$$

(1)

Here, \(R\) denotes the initial frost body population, and \(x_{ij}\) is the \(j\)-th frost particle of frost body \(i\). The fitness vector \(F(S)\) of the frost body agents is given in Eq. (2).

$$F\left(S\right)=\left[\begin{array}{c}f\left(\left[x_{11}\;x_{12}\;\dots\;x_{1j}\right]\right)\\ f\left(\left[x_{21}\;x_{22}\;\dots\;x_{2j}\right]\right)\\ \vdots\\ f\left(\left[x_{i1}\;x_{i2}\;\dots\;x_{ij}\right]\right)\end{array}\right]$$

(2)

Here, \(f\) denotes the fitness function evaluated on each frost body (each row of \(R\)).

Once the frost particles have condensed into soft frost, they move according to a particular pattern whose effectiveness depends on environmental factors. When a particle moves beyond the escape radius, it can no longer condense. The position update for this soft-frost search is given in Eq. (3).

$$R_{i,j}^{new}=R_{best,j}+r_1\cdot\cos\theta\cdot\beta\cdot\left(h\cdot\left(Ub_{ij}-Lb_{ij}\right)+Lb_{ij}\right),\quad r_2<E$$

(3)

Here, \(R_{i,j}^{new}\) is the updated position of the particle; \(R_{best,j}\) is the \(j\)-th particle of the best frost body in the population; \(r_1\) is a random number in \((-1,1)\); \(\theta\) is the direction of particle movement, which changes with the iterations according to Eq. (4); \(\beta\) is the environmental factor, which simulates changes in the external environment over the iterations to ensure convergence, following the step function in Eq. (5); \(h\) is a random number in \((0,1)\) that controls the distance between the centres of two particles; \(Ub_{ij}\) and \(Lb_{ij}\) are the upper and lower bounds of the escape space, which limit the particle movement range; \(r_2\) is a random number in \((0,1)\) that, together with the attachment coefficient \(E\), controls whether the particle position is updated; and \(E\) is the attachment coefficient, which increases with the number of iterations, as given in Eq. (6).

$$\:\theta\:=\pi\:\cdot\:\frac{t}{10\cdot\:T}$$

(4)

$$\beta=1-\frac{\left[\frac{w\cdot t}{T}\right]}{w}$$

(5)

$$\:E=\sqrt{\left(\frac{t}{T}\right)}$$

(6)

Here, \(t\) is the current iteration number; \(T\) is the maximum number of iterations; \(\left[\cdot\right]\) denotes rounding to the nearest integer; and \(w\) is the number of segments that controls the step function.
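The soft-frost phase can be sketched in NumPy as below. This is an illustrative reading of Eqs. (3) to (6), assuming per-coordinate random draws, a default of w = 5 segments, round-half-up for \([\cdot]\), and clipping of out-of-range positions to the bounds; none of these choices is stated in the text, and the function names are hypothetical.

```python
import numpy as np

def rime_coefficients(t, T, w=5):
    """Iteration-dependent coefficients of the soft-frost search.
    theta = pi * t / (10 T)          (Eq. 4)
    beta  = 1 - [w t / T] / w        (Eq. 5; [.] implemented as round half up)
    E     = sqrt(t / T)              (Eq. 6)
    """
    theta = np.pi * t / (10 * T)
    beta = 1 - np.floor(w * t / T + 0.5) / w
    E = np.sqrt(t / T)
    return theta, beta, E

def soft_rime_update(R, best, lb, ub, t, T, w=5, rng=None):
    """Soft-frost search (Eq. 3). R: (n_agents, dim) population; best: (dim,).
    Coordinate j of agent i is moved around best[j] when r2 < E, else kept."""
    rng = np.random.default_rng() if rng is None else rng
    theta, beta, E = rime_coefficients(t, T, w)
    n, d = R.shape
    r1 = rng.uniform(-1.0, 1.0, size=(n, d))
    h = rng.uniform(0.0, 1.0, size=(n, d))
    r2 = rng.uniform(0.0, 1.0, size=(n, d))
    proposal = best + r1 * np.cos(theta) * beta * (h * (ub - lb) + lb)
    R_new = np.where(r2 < E, proposal, R)
    return np.clip(R_new, lb, ub)  # assumption: out-of-range values are clipped

rng = np.random.default_rng(0)
pop = rng.uniform(0.0, 1.0, size=(6, 4))  # 6 agents, 4 features, positions in [0, 1]
best = pop[0]                             # pretend agent 0 is currently the best
pop = soft_rime_update(pop, best, lb=0.0, ub=1.0, t=10, T=50, rng=rng)
```

Early on, \(E\) is small, so few coordinates move but those that do can move far (\(\beta\) near 1); late in the run almost every coordinate is updated, but \(\beta\) shrinks the step toward zero, concentrating agents around the best one.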

As the soft-frost search range grows, its strong randomness and wide coverage allow the best solution to be identified quickly. A hard-frost puncture mechanism is introduced to exchange information between agents and stop them from getting stuck in local optima during optimization. It lets particles move between different local regions, improving convergence and helping to avoid local optima, as expressed in Eq. (7).

$$\:{R}_{ij}^{new}={R}_{best,j},\:{r}_{3}<{F}^{normr}$$

(7)

Here, \(F^{normr}\) is the normalized fitness value of the current agent, which gives the probability that the \(i\)-th frost particle is exchanged; \(r_3\) is a random number in \((-1,1)\). The fitness function (FF) used in the ROA balances the number of features selected in each solution (to be minimized) against the classification accuracy (to be maximized) obtained with those features. Equation (8) defines the FF used to assess solutions.

$$\:Fitness=\alpha\:{\gamma\:}_{R}\left(D\right)+\beta\:\frac{\left|R\right|}{\left|C\right|}$$

(8)

Here, \(\gamma_R(D)\) is the classification error rate of a given classifier, \(|R|\) is the cardinality of the selected feature subset, and \(|C|\) is the total number of features in the dataset. \(\alpha\) and \(\beta\) are two parameters weighting the importance of classification quality and subset length, respectively, with \(\alpha\in[0,1]\) and \(\beta=1-\alpha\) (this \(\beta\) is distinct from the environmental factor in Eq. (5)).
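The hard-frost puncture of Eq. (7) and the feature-selection fitness of Eq. (8) can be sketched as follows. The normalization of fitness values to unit Euclidean length, the 0.5 threshold used to turn a continuous position into a feature mask, and the default \(\alpha = 0.99\) are assumptions made for illustration; the text does not specify them, and the function names are hypothetical.

```python
import numpy as np

def hard_rime_puncture(R, best, fitness, rng=None):
    """Hard-frost puncture (Eq. 7): coordinate j of agent i is replaced by best[j]
    when r3 < F_norm[i]. Fitness values are scaled to unit Euclidean length
    (an assumed reading of 'normr'); with an error-based fitness, worse agents
    are more likely to be pulled onto the best agent."""
    rng = np.random.default_rng() if rng is None else rng
    f = np.asarray(fitness, dtype=float)
    norm = np.linalg.norm(f)
    f_norm = f / norm if norm > 0 else np.zeros_like(f)
    r3 = rng.uniform(-1.0, 1.0, size=R.shape)   # r3 in (-1, 1) as in the text
    return np.where(r3 < f_norm[:, None], best, R)

def to_mask(position, threshold=0.5):
    """Hypothetical binarization: feature j is selected when position[j] > threshold."""
    return np.asarray(position) > threshold

def fs_fitness(error_rate, mask, alpha=0.99):
    """Eq. (8): alpha * error rate + (1 - alpha) * |R| / |C|."""
    mask = np.asarray(mask, dtype=bool)
    return alpha * error_rate + (1.0 - alpha) * mask.sum() / mask.size

pos = np.array([0.9, 0.2, 0.7, 0.4])   # one agent's continuous position
mask = to_mask(pos)                    # selects features 0 and 2 (|R| = 2, |C| = 4)
print(fs_fitness(0.05, mask))          # 0.99 * 0.05 + 0.01 * 2 / 4
```

With \(\alpha\) close to 1, the error term dominates and the subset-size term mainly breaks ties between subsets of similar accuracy.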

Stage III: hybrid attack classification

In addition, the hybrid LSTM + BiGRU technique is used for the DDoS attack classification process [32]. This model was chosen because it captures long-range temporal dependencies and contextual information from both past and future parts of a sequence. BiGRU improves performance by processing data in both the forward and backward directions, enhancing context awareness, while LSTM retains critical patterns over time. Compared with conventional RNNs or standalone LSTM or GRU models, this hybrid gives higher accuracy and greater robustness to irregularities in sequential data. It effectively detects the subtle and evolving patterns seen in DDoS traffic, and its computational complexity, lower than that of deeper stacked models, keeps it efficient without compromising accuracy. Figure 3 shows the LSTM + BiGRU architecture.

Fig. 3

LSTM + BiGRU architecture.

LSTM, a variant of the RNN, was introduced to address the vanishing and exploding gradient problems that RNNs face on long sequences, and it is specifically designed to capture long-range dependencies. The hybrid method combines the strengths of the BiGRU and LSTM models, which makes it particularly effective for detecting and classifying sequential data. LSTM excels at capturing long-term dependencies in the data, whereas BiGRU offers an efficient way to capture both past and future context through its bidirectional structure. This combination enhances the model's ability to understand complex sequences, and by pairing the memory capacity of LSTM with the bidirectional view of BiGRU, the method can produce more precise classifications. While exploiting the strengths of both, the hybrid also aims to remain computationally efficient compared with using separate LSTM or GRU models.

LSTM performs better than a standard RNN on long sequences. The hidden layer (HL) of a vanilla RNN has only a single state, so it mainly depends on short-term input. LSTM has three inputs: the cell state at the previous time step \(C_{t-1}\), the current input \(x_t\), and the output at the previous time step \(h_{t-1}\). It has two outputs: the current cell state \(C_t\) and the current output \(h_t\). The forget gate decides how much of the previous cell state \(C_{t-1}\) is retained, and the output gate determines the final output \(h_t\). First, the forget-gate activation \(f_t\) at the current time \(t\) is computed:

$$f_t=\sigma\left(W_f\cdot\left[h_{t-1},x_t\right]+b_f\right)$$

(9)

In Eq. (9), \(W_f\) and \(b_f\) are the forget-gate weights and bias applied to the previous output \(h_{t-1}\) and the current input \(x_t\), \(\otimes\) denotes element-wise multiplication, and \(\sigma(\cdot)\) is the sigmoid function. Next, the input gate \(i_t\) and the candidate cell state \(\tilde{C}_t\) at time \(t\) are calculated:

$$i_t=\sigma\left(W_i\cdot\left[h_{t-1},x_t\right]+b_i\right)$$

(10)

$$\tilde{C}_t=\tanh\left(W_C\cdot\left[h_{t-1},x_t\right]+b_C\right)$$

(11)

The cell state at the current time \(t\) is then updated:

$$C_t=f_t\otimes C_{t-1}+i_t\otimes\tilde{C}_t$$

(12)

Finally, the output gate \(O_t\) and the output \(h_t\) at the current time \(t\) are computed from the updated cell state:

$$O_t=\sigma\left(W_o\cdot\left[h_{t-1},x_t\right]+b_o\right)$$

(13)

$$h_t=O_t\otimes\tanh\left(C_t\right)$$

(14)
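A single LSTM time step following Eqs. (9) to (14) can be written in NumPy as below. Here `@` is the matrix product applied to the concatenation of \(h_{t-1}\) and \(x_t\), and `*` is the element-wise product written \(\otimes\) in Eqs. (12) and (14). The dictionary-based weight layout, the initialization, and the sizes are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W[k]: (hidden, hidden + input), b[k]: (hidden,),
    for gate k in 'f' (forget), 'i' (input), 'c' (candidate), 'o' (output)."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W['f'] @ z + b['f'])       # forget gate, Eq. (9)
    i_t = sigmoid(W['i'] @ z + b['i'])       # input gate, Eq. (10)
    c_hat = np.tanh(W['c'] @ z + b['c'])     # candidate cell state, Eq. (11)
    c_t = f_t * c_prev + i_t * c_hat         # cell state update, Eq. (12)
    o_t = sigmoid(W['o'] @ z + b['o'])       # output gate, Eq. (13)
    h_t = o_t * np.tanh(c_t)                 # output, Eq. (14)
    return h_t, c_t

def lstm_forward(X, W, b, hidden):
    """Run the cell over a sequence X of shape (steps, input); return all h_t."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    outs = []
    for x_t in X:
        h, c = lstm_step(x_t, h, c, W, b)
        outs.append(h)
    return np.stack(outs)

rng = np.random.default_rng(0)
n_in, n_hid, steps = 4, 3, 5
W = {k: rng.normal(scale=0.1, size=(n_hid, n_hid + n_in)) for k in 'fico'}
b = {k: np.zeros(n_hid) for k in 'fico'}
H = lstm_forward(rng.normal(size=(steps, n_in)), W, b, n_hid)
print(H.shape)  # (5, 3): one hidden vector per time step
```

Because the cell state is updated additively in Eq. (12), gradients can flow across many steps when the forget gate stays close to 1, which is the property the text credits for handling long sequences.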

GRU was introduced as a simplification of LSTM, which itself effectively alleviates the vanishing gradient problem of the traditional RNN. However, the limitations of the LSTM unit, namely its difficult training and large number of parameters, gradually became apparent and restricted its application. Building on the gating idea, GRU restructures the LSTM unit to reduce training difficulty and computation time. Like LSTM, GRU computes its HL state \(h_t\) at time \(t\) step by step for the input sequence \(\left\{x_1,x_2,\dots,x_t,\dots,x_n\right\}\):

$$r_t=\sigma\left(W_r x_t+b_r+W_{hr}h_{t-1}+b_{hr}\right)$$

(15)

$$z_t=\sigma\left(W_z x_t+b_z+W_{hz}h_{t-1}+b_{hz}\right)$$

(16)

$$n_t=\tanh\left(W_n x_t+b_n+r_t\otimes\left(W_{hn}h_{t-1}+b_{hn}\right)\right)$$

(17)

$$h_t=\left(1-z_t\right)\otimes n_t+z_t\otimes h_{t-1}$$

(18)

Here, \(W\) and \(b\) denote the weight matrices and bias terms, \(h_{t-1}\) is the HL state at time \(t-1\), \(r_t\) and \(z_t\) are the reset and update gates at time \(t\), \(n_t\) is the candidate hidden state, and \(\sigma(\cdot)\) is the sigmoid function.

BiGRU combines two GRUs, one processing the time series forward and the other backward. The output at each time step concatenates the outputs of both GRUs.
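The GRU step of Eqs. (15) to (18) and the bidirectional wrapper described above can be sketched in NumPy as follows. The backward GRU reads the reversed sequence, and its outputs are flipped back so that both directions are aligned per time step before concatenation. Parameter names and initialization are hypothetical, and the sketch covers only the BiGRU part, not how the paper stacks it with the LSTM layer.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU step; p maps names like 'W_r', 'W_hr', 'b_r', 'b_hr' to arrays."""
    r_t = sigmoid(p['W_r'] @ x_t + p['b_r'] + p['W_hr'] @ h_prev + p['b_hr'])   # Eq. (15)
    z_t = sigmoid(p['W_z'] @ x_t + p['b_z'] + p['W_hz'] @ h_prev + p['b_hz'])   # Eq. (16)
    n_t = np.tanh(p['W_n'] @ x_t + p['b_n']
                  + r_t * (p['W_hn'] @ h_prev + p['b_hn']))                    # Eq. (17)
    return (1 - z_t) * n_t + z_t * h_prev                                       # Eq. (18)

def gru_forward(X, p, hidden):
    """Run one GRU over X of shape (steps, input); return all hidden states."""
    h = np.zeros(hidden)
    out = []
    for x_t in X:
        h = gru_step(x_t, h, p)
        out.append(h)
    return np.stack(out)

def bigru_forward(X, p_fwd, p_bwd, hidden):
    """Forward GRU on X, backward GRU on reversed X; concatenate per time step."""
    h_fwd = gru_forward(X, p_fwd, hidden)
    h_bwd = gru_forward(X[::-1], p_bwd, hidden)[::-1]  # re-align with time order
    return np.concatenate([h_fwd, h_bwd], axis=1)

def init_gru(n_in, n_hid, rng):
    """Hypothetical small random initialization with zero biases."""
    p = {}
    for g in 'rzn':
        p[f'W_{g}'] = rng.normal(scale=0.1, size=(n_hid, n_in))
        p[f'W_h{g}'] = rng.normal(scale=0.1, size=(n_hid, n_hid))
        p[f'b_{g}'] = np.zeros(n_hid)
        p[f'b_h{g}'] = np.zeros(n_hid)
    return p

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))  # toy sequence: 6 time steps, 4 features
Y = bigru_forward(X, init_gru(4, 3, rng), init_gru(4, 3, rng), hidden=3)
print(Y.shape)  # (6, 6): 3 forward + 3 backward units per step
```

At each step the first half of the output summarizes the past and the second half the future of the sequence, which is the context the text attributes to the bidirectional structure.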

Stage IV: parameter selection process

Finally, MPOA-based hyperparameter selection is performed to optimize the classification results of the LSTM + BiGRU method [33]. MPOA is chosen for its enhanced exploration and exploitation capabilities, which address the local optima and premature convergence that affect conventional optimizers. It adapts the search process to balance convergence speed and solution diversity, and its adaptive tuning makes the optimization more efficient with reduced computational overhead. Its robustness and precision make it appropriate for fine-tuning DL methods in complex tasks such as DDoS detection.

POA is a recent nature-inspired meta-heuristic for finding the best solution to an optimization problem. It imitates the self-defence mechanism of the pufferfish, which inflates itself when threatened in order to deter predators; POA mimics this behaviour to escape local optima and explore the search space more effectively. Pufferfish are small, have only four teeth, and cannot swim fast, so this defence is vital to their survival. To escape from predators, a pufferfish fills its elastic stomach with water until it becomes a spherical ball and its pointed spines stand out, after which predators can no longer attack it.

The algorithm has two stages: exploration and exploitation. In the exploration stage, the model expands, enlarging its search space to examine new candidate solutions. In the exploitation stage, when the newly explored area is not better, it deflates to focus on a smaller, more promising region. In either stage, the model moves toward any better solution it finds. By balancing exploration and exploitation, POA can identify promising solutions while steering away from suboptimal ones. It has been applied in various fields, including image processing, engineering, and ML.

The traditional POA handles complex high-dimensional problems and does not require any control parameters to be set. However, it can fail to reach the global optimum. An enhanced version, MPOA, is therefore developed, in which the random number is updated according to the fitness values. The updated random number used in MPOA is specified in Eq. (19).

$$\:C=\frac{RT}{(YT+KT)}$$

(19)

Here, \(YT\) denotes the mean fitness value, and \(RT\) and \(KT\) denote the current and worst fitness values, respectively. The value \(C\) replaces the random number in the position update of Eq. (21). By providing a better balance between the two stages, the modified method avoids premature convergence and reaches a global optimal solution. In this paper, a population size of 10 is used, and 50 iterations are run for each population. The current fitness is the fitness value obtained in the present iteration. The mean fitness value is obtained by adding the fitness values gained in each iteration and dividing by two. The worst fitness is the maximum fitness value attained at the end of each iteration. The standard POA can become trapped in local optima, which weakens its exploration capability; updating the random variable in this way increases the probability of exploring the problem space and decreases the likelihood of being trapped in local optima. Owing to Eq. (19), MPOA reflects the objective function more closely, and because the update relies on the objective values themselves, the optimization procedure becomes more flexible and effective.
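A minimal sketch of the Eq. (19) coefficient, computed for every agent at once. It assumes a minimization problem with non-negative fitness (such as an error rate) and takes \(YT\) as the population mean and \(KT\) as the largest fitness in the current iteration; this is one reading of the description above, and `mpoa_random_number` is a hypothetical name.

```python
import numpy as np

def mpoa_random_number(fitness, eps=1e-12):
    """Eq. (19): C = RT / (YT + KT), computed for every agent.

    Assumed reading for a minimization problem with non-negative fitness:
    RT = each agent's current fitness, YT = mean fitness of the population,
    KT = worst (largest) fitness in the current iteration. eps guards against
    division by zero when all fitness values are zero.
    """
    f = np.asarray(fitness, dtype=float)
    RT = f
    YT = f.mean()
    KT = f.max()
    return RT / (YT + KT + eps)

C = mpoa_random_number([1.0, 2.0, 3.0])  # mean 2, worst 3, so C is about f / 5
print(C)
```

With non-negative fitness, each agent's value never exceeds \(KT\), so \(C\) stays in [0, 1) and can stand in for a uniform random number; better (lower-fitness) agents receive smaller values and therefore take more conservative exploration steps.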

The approach models two phases: the attacking (exploration) phase and the defending (exploitation) phase.

In the initialization phase of POA, the positions of the fish are randomly initialized. The \(q\)-th member of the pufferfish population is \(P_q\). The lower and upper bounds of decision variable \(a\) are denoted \(NP_a\) and \(UP_a\), respectively. A random number in the range [0,1] is denoted \(c\), and the population size is \(F\). The objective function evaluated for the \(q\)-th member is denoted \(D_q\); these values measure the quality of the solution represented by each fish. Based on them, the best and worst fish are identified, and the position of the best member is updated in every iteration.

In the exploration stage, a predator attacks the pufferfish, and the attacked pufferfish changes position as it tries to escape. The set of candidate pufferfish for the \(q\)-th member is defined in Eq. (20).

$$M_q=\left\{P_b : D_b<D_q \text{ and } b\ne q\right\},\quad q=1,2,\dots,F,\; b\in\left\{1,2,\dots,F\right\}$$

(20)

Here, \(D_b\) denotes the objective value of fish \(b\), \(P_b\) is a fish with a better objective value than \(P_q\), and \(M_q\) is the set of candidate pufferfish. One of these candidates is randomly selected as the fish attacked by the predator. The new position of the pufferfish is computed using Eqs. (21) and (22), which are adapted from.

$$P_{q,t}^{POS1}=P_{q,t}+c_{q,t}\left(PS_{q,t}-R_{q,t}\cdot P_{q,t}\right)$$

(21)

$$P_q=\begin{cases}P_q^{POS1}, & D_q^{POS1}\le D_q;\\ P_q, & \text{else.}\end{cases}$$

(22)

Here, \(P_{q,t}^{POS1}\) is the new position of the fish; \(R_{q,t}\) and \(c_{q,t}\) are randomly generated numbers; and \(PS_{q,t}\) is the fish selected by the predator. In MPOA, the random number is updated through Eq. (19).

In the exploitation phase, the fish's position is updated according to its defensive strategy: once the predator sees the sharp spines of the inflated pufferfish, it abandons the attack, and the pufferfish then moves to a new position nearby. The new position of the pufferfish is computed using Eqs. (23) and (24).

$$\:{P}_{q,t}^{POS2}={P}_{q,t}+\left(1-2{c}_{q,t}\right)\frac{N{P}_{a}-U{P}_{a}}{t}$$

(23)

$$P_q=\begin{cases}P_q^{POS2}, & D_q^{POS2}\le D_q;\\ P_q, & \text{else.}\end{cases}$$

(24)

In Eq. (23), \(t\) denotes the iteration count. The new position and its objective value are denoted \(P_q^{POS2}\) and \(D_q^{POS2}\), respectively. In this paper, the classification error rate is used as the FF to be minimized, as given in Eq. (25).

$$\begin{aligned}fitness\left({x}_{i}\right)&=ClassifierErrorRate\left({x}_{i}\right)\\ & =\frac{no\:of\:misclassified\:samples}{Total\:no\:of\:samples}\times\:100\end{aligned}$$

(25)
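Putting the pieces together, the loop below is an illustrative MPOA sketch for a minimization problem: random initialization within the bounds, the exploration move of Eqs. (20) to (22) with the Eq. (19) coefficient, and the shrinking exploitation step of Eqs. (23) and (24), each followed by greedy acceptance. Drawing \(R_{q,t}\) from {1, 2}, clipping to the bounds, falling back to the best fish when \(M_q\) is empty, and using a sphere function as a stand-in objective are all assumptions; in the described method, the objective would be the Eq. (25) error rate of the LSTM + BiGRU model trained with the candidate hyperparameters. The population size of 10 and 50 iterations follow the values stated earlier; all function names are hypothetical.

```python
import numpy as np

def classifier_error_rate(y_true, y_pred):
    """Eq. (25): misclassified samples / total samples * 100."""
    return 100.0 * np.mean(np.asarray(y_true) != np.asarray(y_pred))

def mpoa_minimize(objective, lb, ub, pop_size=10, iters=50, seed=0):
    """Minimal MPOA sketch; assumes a non-negative objective to be minimized."""
    rng = np.random.default_rng(seed)
    lb = np.asarray(lb, dtype=float)
    ub = np.asarray(ub, dtype=float)
    d = lb.size
    P = lb + rng.random((pop_size, d)) * (ub - lb)   # random initialization
    D = np.array([objective(p) for p in P])          # objective values D_q
    for t in range(1, iters + 1):
        for q in range(pop_size):
            # Exploration (Eqs. 20-22): move relative to a randomly chosen better fish.
            better = np.flatnonzero(D < D[q])         # indices of candidate set M_q
            idx = rng.choice(better) if better.size else int(np.argmin(D))
            C = D[q] / (D.mean() + D.max() + 1e-12)  # Eq. (19) replaces c_{q,t}
            R = rng.integers(1, 3, size=d)            # assumption: R drawn from {1, 2}
            cand = np.clip(P[q] + C * (P[idx] - R * P[q]), lb, ub)
            f = objective(cand)
            if f <= D[q]:                             # greedy acceptance, Eq. (22)
                P[q], D[q] = cand, f
            # Exploitation (Eqs. 23-24): random local step that shrinks with t.
            c = rng.random(d)
            cand = np.clip(P[q] + (1 - 2 * c) * (lb - ub) / t, lb, ub)
            f = objective(cand)
            if f <= D[q]:                             # greedy acceptance, Eq. (24)
                P[q], D[q] = cand, f
    best = int(np.argmin(D))
    return P[best], D[best]

def sphere(x):
    """Stand-in objective instead of the LSTM + BiGRU validation error."""
    return float(np.sum(x ** 2))

x_best, f_best = mpoa_minimize(sphere, lb=[-5.0] * 3, ub=[5.0] * 3)
print(x_best, f_best)
```

Because every move is accepted only when it does not worsen the objective, the best fitness in the population can never increase from one iteration to the next; the \(1/t\) factor in the exploitation step gradually narrows the search around each fish.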


