Machine learning based multi-stage intrusion detection system and feature selection ensemble security in cloud assisted vehicular ad hoc networks

Intelligent transportation systems rely on cloud-assisted Vehicular Ad Hoc Networks (VANETs) to improve road safety and efficiency through communication between vehicles and infrastructure. Because of their dynamic and decentralized nature, VANETs are exposed to several security risks, including counterfeiting, denial-of-service (DoS), and data modification attacks. These threats endanger data integrity and vehicle safety, so robust security is required. Using ensemble models and ML-based feature selection, this work presents MLIDS-RFA, which uses the Random Forest Algorithm to mitigate various security vulnerabilities in VANETs.

Building an intruder detection system with multiple levels of protection

This research introduces a new multi-stage intrusion detection system (MLIDS-RFA) developed for use in cloud-assisted Vehicular Ad Hoc Networks (VANETs). The system is designed to detect network attacks in several phases. This improves detection efficacy and makes the system more resilient against various security threats.

Fig. 1. The structure of the suggested IDS model.

The primary objective of this study is to perform intrusion detection on a dataset with fewer attributes. Prior studies have shown that intrusion detection depends on only a subset of these characteristics. So, to construct a more effective classifier within a realistic timeframe, it is necessary to lower the dimensionality of the data set. There are essentially four stages to the suggested method. The first step uses a feature selection strategy to choose the relevant characteristics for every attack. The second step combines these per-attack characteristics into the best mix of features to defend against any attack. The third step feeds the combined features into the classification algorithm. The last step runs the model on a test dataset. The suggested methodology’s framework is shown in Fig. 1.

Because an attack on a network system involves a lot of raw data, feature selection is becoming an essential part of the process. Feature selection encompasses various approaches to removing superfluous or redundant characteristics. Model complexity is heavily influenced by the data set’s dimensionality, and excessive complexity leads to low classification precision, high computing cost, and lengthy processing times. These techniques share one goal, which is also the goal of optimizing model accuracy: choosing the best features. Techniques for feature selection may be broadly classified into two groups: filter methods and wrapper methods.
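The first two stages described above can be sketched with a simple filter method. The snippet below is only an illustration under stated assumptions, not the paper's implementation: the correlation-based ranking, the toy records, and the column meanings (`packet_rate`, `payload_delta`, `noise`) are all hypothetical. It ranks features separately for each attack type and then takes the union of the per-attack choices.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def select_for_attack(rows, labels, attack, k):
    """Stage 1: rank features by |correlation| with a one-vs-rest attack label."""
    target = [1.0 if lab == attack else 0.0 for lab in labels]
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, target)), j))
    scores.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by index
    return [j for _, j in scores[:k]]


def combine(per_attack):
    """Stage 2: merge the per-attack feature sets into one feature list."""
    return sorted(set().union(*per_attack.values()))


def project(rows, features):
    """Keep only the selected columns, ready for a classifier (stage 3)."""
    return [[row[j] for j in features] for row in rows]


# Hypothetical toy records: [packet_rate, payload_delta, noise]
rows = [
    [1.0, 0.0, 5.0],
    [1.2, 0.1, 3.0],
    [9.0, 0.0, 4.0],
    [9.5, 0.1, 5.0],
    [1.1, 7.0, 3.0],
    [0.9, 8.0, 4.0],
]
labels = ["normal", "normal", "dos", "dos", "modification", "modification"]

per_attack = {
    attack: select_for_attack(rows, labels, attack, k=1)
    for attack in ("dos", "modification")
}
combined = combine(per_attack)
reduced = project(rows, combined)
```

In this toy data the DoS records stand out on the first column and the modification records on the second, so the combined set keeps those two columns and drops the noise column. A wrapper method would instead score candidate feature subsets by training the classifier itself.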

$$Z_{d} = \frac{1}{w\left(m^{3}\right)} * Yv_{1} + Y_{z\left(m - n\right)}, \quad R_{z} = \frac{1}{n - bp}$$

(1)

In Eq. 1, the detection outcome \(Z_{d}\) is expressed in terms of the selected features \(Yv_{1}\), weighted by \(\frac{1}{w\left(m^{3}\right)}\). Detection efficiency is measured through the reaction time \(Y_{z\left(m-n\right)}\), and the system efficiency \(R_{z}\) is given by the time allocation \(\frac{1}{n-bp}\), which depends on the characteristics and the computational cost.

$$Z_{1} = \left\{ \left(k, p\right) : Y_{u - 1} + Z_{k - 1} = 0 \right\}, \quad E_{2} = \left( W_{f - 1} * er_{d - 1} \right)$$

(2)

In Eq. 2, the performance metric \(Z_{1}\) is defined over the error-rate pairs \(\left(k, p\right)\) for which the rate \(Y_{u-1}\) and the weight \(Z_{k-1}\) cancel during the initial phase. \(E_{2}\) depends on the detection errors \(W_{f-1}\) and their influence, weighted by the feature term \(er_{d-1}\).

$$M_{z - 1} = m^{v + wr} - \partial\left(d - hq\right) + zv^{n - 1}$$

(3)

In Eq. 3, \(M_{z-1}\) may represent a system state or performance measure that changes according to parameters such as \(m^{v+wr}\) (which may be associated with feature relevance and weight adjustments), while \(\partial\left(d-hq\right)\) accounts for adjustments based on discrepancies or detection errors. The expression \(zv^{n-1}\) may denote part of the output of the detection algorithm.

Combining feature selection based on ML

For VANET intrusion detection, the suggested solution uses an ML-based feature selection process to find the best features. This focused method drastically reduces processing overhead and reaction time, allowing for continuous monitoring and mitigation of risks in dynamic vehicle contexts, as shown in Fig. 2.

Fig. 2. Cyber security-related research idea summary and method for feature selection.

Feature extraction maps multi-dimensional characteristics to a lower-dimensional feature space to reduce the number of data attributes. This transformation can be thought of as a combination of linear and non-linear mappings of the original features, and it aims to keep their important information. The resulting feature space displays characteristics that are comparable to the initial features. Feature selection methods, by contrast, pick the pertinent characteristics out of a dataset. Both feature extraction and feature selection improve the efficiency and effectiveness of learning models’ computations, so both can be seen as effective approaches to feature engineering. Feature extraction shines at deriving features that may enhance the effectiveness of the learning system. It is important to remember, however, that feature extraction changes the features’ meaning, which might make further analysis more difficult. Feature selection, as opposed to feature extraction, maintains the features’ original physical significance: it takes the whole collection of characteristics and selects just the most important ones. Since they can improve learning models’ efficiency and interpretability, feature selection and extraction take center stage in feature engineering. Minimizing computing expenses enhances learning efficiency.

$$\forall_{v - 1} Z_{m\left(nk\right)} = \left( \sqrt{2 + \left(vb - c\right) m^{2}} \right) * er_{z - 1} + \left( \partial_{1} - Pq \right)$$

(4)

In Eq. 4, the term \(\sqrt{2+\left(vb-c\right)m^{2}}\) represents a scaling factor, influenced by the variables \(\forall_{v-1}\), that modifies the error rate \(er_{z-1}\) to increase accuracy. The performance indicator \(\partial_{1}-Pq\) is determined by the detection settings and adjusts the result so that threats are recognized with high accuracy and the system remains resilient.

$$L\left(mn, er^{z - 1}\right) = \frac{1}{2} \in Z\left(y, xw\right) + \frac{2}{4} * \left\| z_{jk} - Mp \right\| v^{2}$$

(5)

In Eq. 5, \(L\left(mn, er^{z-1}\right)\) probably reflects a baseline metric for feature selection and weighting. The term \(\frac{1}{2} \in Z\left(y, xw\right)\) accounts for the influence of feature variances on detection accuracy, while the term weighted by \(\frac{2}{4}\) quantifies the deviation, scaled by \(v^{2}\), of the detection results \(z_{jk}\) from a reference metric \(Mp\).

$$K = \frac{1}{2} * \left( \forall_{d - 1} + \left\| e_{r} - mkq^{-1} \right\| \right) - \left( M_{kp} - \partial_{\equiv 2} \right)$$

(6)

Eq. 6 combines the general error rate \(\forall_{d-1}\) and the deviation \(\left\| e_{r} - mkq^{-1} \right\|\), scaled by \(\frac{1}{2}\), into \(K\), which reflects modifications in error metrics and feature impact. Subtracting \(mkq^{-1}\) acts as a correction factor, and the term \(M_{kp} - \partial_{\equiv 2}\) balances the metric \(M_{kp}\) against \(\partial_{\equiv 2}\), which stands for the precision and dependability of the IDS in identifying and handling threats.

$$M_{vb}\left(z - 1\right) = D_{2} E\left(m + nkv^{-1 + mp}\right) * \left\langle M_{k - 1}, V_{b\left(wq - 1\right)} \right\rangle$$

(7)

In Eq. 7, the scaling factor \(M_{vb}\left(z-1\right)\) modifies the metric according to feature and error integration and is affected by factors such as \(D_{2}E\left(m+nkv^{-1+mp}\right)\). The angle-bracket expression \(\left\langle M_{k-1}, V_{b\left(wq-1\right)} \right\rangle\) probably represents a dot product, or an interaction, between the weighted factors \(M_{k-1}\) and the prior metrics \(V_{b\left(wq-1\right)}\).

$$G\left(x, cv^{p - k}\right) = \frac{1}{4}\left[ M^{2p} - Qw\left(1 - pk\right) \right] + \left\| z_{v} - e_{r - 1} \right\|$$

(8)

Eq. 8 defines a performance measure \(G\left(x, cv^{p-k}\right)\) for the MLIDS-RFA system. The squared-parameter term \(M^{2p}\) is balanced by the weighted adjustment \(Qw\left(1-pk\right)\), which represents the effect of these factors on system performance. The extra term \(\left\| z_{v} - e_{r-1} \right\|\) quantifies the difference between the error metrics.

This paper presents MLIDS-RFA. The system is designed to recognize particular network threats in multiple phases. This multi-stage technique provides comprehensive protection of VANETs by increasing detection accuracy and resistance against diverse security threats.

Fig. 3. System architecture for multi-layer vehicle intrusion detection systems.

Fig. 3 shows the design of a multi-layer vehicle intrusion detection system (VIDS). This technology makes vehicular networks more secure and dependable by identifying and countering possible dangers. The Data Collection Layer collects information from various vehicle data sources and communication methods, which guarantees thorough collection of data from the vehicle’s surroundings. The Preprocessing Layer then refines the data: data cleansing removes errors and outliers, feature extraction finds important qualities, and feature selection ranks the distinctive characteristics for additional investigation. This layer is vital because the detection methods depend on data quality.

The Detection Layer uses a variety of methods to spot possible intrusions and irregularities at various stages of the system. To enhance detection accuracy while decreasing false positives, it employs anomaly detection methods, signature-based approaches, behavioral modeling, and a comprehensive decision mechanism. The Response Layer kicks in when an intrusion is detected. Notifications, intrusion mitigation measures, and log and report maintenance are all under the purview of this layer, which ensures timely and suitable reactions to discovered threats. The Feedback & Learning Layer allows the system to improve continuously through learning and feedback loops. This layer enables the system to adjust to new threats by incorporating fresh data and learning from past reactions.
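One way to picture the Detection Layer's decision mechanism is a simple vote over the three detector types. The sketch below is a hypothetical illustration only: the `Beacon` fields, the payload signature, the thresholds, and the two-vote rule are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beacon:
    """Hypothetical summary of one vehicle's recent messages."""
    sender: str
    speed_kmh: float
    msgs_per_sec: float
    payload: str


# Assumed signature list; a real deployment would load known attack patterns.
KNOWN_BAD_PAYLOADS = {"FAKE_BRAKE_ALERT"}


def signature_detector(beacon):
    """Signature approach: match against known malicious payloads."""
    return beacon.payload in KNOWN_BAD_PAYLOADS


def anomaly_detector(beacon, max_rate=50.0):
    """Anomaly approach: flag message rates far above an assumed normal rate."""
    return beacon.msgs_per_sec > max_rate


def behaviour_detector(beacon, max_speed=250.0):
    """Behavioral model: flag physically implausible reported speeds."""
    return not (0.0 <= beacon.speed_kmh <= max_speed)


def is_intrusion(beacon, min_votes=2):
    """Decision mechanism: require agreement to limit false positives."""
    votes = [
        signature_detector(beacon),
        anomaly_detector(beacon),
        behaviour_detector(beacon),
    ]
    return sum(votes) >= min_votes


normal = Beacon("car-1", 62.0, 10.0, "POSITION_UPDATE")
flood = Beacon("car-9", 400.0, 500.0, "POSITION_UPDATE")
chatty = Beacon("car-3", 80.0, 55.0, "POSITION_UPDATE")
```

Requiring two agreeing detectors trades some sensitivity for fewer false alarms: `chatty` trips only the rate check and is not flagged, whereas `flood` trips both the rate and the plausibility checks.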

Fig. 4. A driverless vehicle’s peripherals and reference design.

An effective IDS may be built using ML and predictive modeling techniques. Training supervised learning algorithms on labeled data, where records are categorized as either benign or malicious, allows for real-time intrusion detection and classification. As shown in the highlighted part of Fig. 4, our study enhanced the architecture by strategically positioning the IDS to filter all critical communications, human interactions, and environmental interactions for intrusion. Clustering is a kind of unsupervised learning that may be used to spot out-of-the-ordinary patterns and uncover previously unknown dangers. For autonomous vehicle intrusion detection, it is important to evaluate several machine-learning models and compare performance measures such as F1-score, recall, accuracy, and precision. The specifics of these steps are given below.

Raw data is usually processed before it is used in a machine-learning model. This is the first and most important stage in developing a machine-learning model. The data preparation process in this study included cleaning all three datasets. Data cleaning included filtering out unwanted noise. Missing values were handled by deleting them or replacing them with an average value. To make the data usable, all outliers and useless characteristics were removed.

Where needed, the data were then normalized by scaling the characteristics to a common range. Data normalization is a data preparation technique that rescales the values of attributes in a dataset. Standardization rescales each attribute so that its mean is zero and its standard deviation is one, the same parameters as a standard normal distribution.
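The preparation steps above can be sketched for a single numeric column as follows. This is a minimal illustration, assuming that missing values are encoded as `None` and that the population standard deviation is used; the sample values are made up.

```python
from statistics import mean, pstdev


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Z-score scaling: the result has mean 0 and standard deviation 1."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # a constant column carries no information
    return [(v - mu) / sigma for v in column]


def min_max(column):
    """Normalization to the common range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


raw = [2.0, None, 4.0, 6.0]   # hypothetical attribute with one missing value
filled = impute_mean(raw)     # [2.0, 4.0, 4.0, 6.0]
z_scores = standardize(filled)
scaled = min_max(filled)      # [0.0, 0.5, 0.5, 1.0]
```

Note that standardization changes only the location and scale of a column. It does not change the shape of its distribution, so a skewed attribute remains skewed after scaling.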

$$\forall_{d - 1} = \left\{ \left( Y_{z - 1}, Mx\left(1 - qw\right) * \left( F\left(g - hk\left(1 + p\right)\right) \right) \right) \right\}$$

(9)

Eq. 9 defines \(\forall_{d-1}\) as a collection of variables comprising \(Y_{z-1}\) and a product term. The product combines the weighted factor \(Mx\left(1-qw\right)\) with a function \(F\left(g-hk\left(1+p\right)\right)\) that adapts to attack parameters and feature interactions.

$$\left\| n_{b1} - \left( u - \left( mk^{p - n} \right) \right) \right\| = j - \left( E^{f}\left(m - np\right) \right)$$

(10)

The left-hand side of Eq. 10 measures the difference between the actual values \(n_{b1}\) and a computed reference \(u-\left(mk^{p-n}\right)\). On the right-hand side, \(E^{f}\left(m-np\right)\) is an adjustment factor, subtracted from \(j\), that incorporates the error function \(E^{f}\) and the parameter modifications \(m-np\).

For VANET intrusion detection, the suggested solution uses an ML-based feature selection procedure to determine which characteristics are most important. This method reduces processing overhead and reaction time, which makes effective real-time monitoring possible. By concentrating on the vital features, the system supports timely threat identification and neutralization.

Improving detection capabilities with the application of ensemble models

The MLIDS-RFA improves detection accuracy by using ensemble models, namely the Random Forest Algorithm (RFA), to harness the capabilities of several weak classifiers. Ensemble methods help recognize complicated attack patterns and reduce false positives, which raises the security level.
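The idea of combining weak classifiers can be sketched in a few lines. The code below is a deliberately simplified stand-in for a Random Forest and not the paper's model: each weak learner is a one-level decision tree (a stump) trained on a bootstrap sample and one randomly chosen feature, and the ensemble predicts by majority vote. The toy records, the feature meanings, the tree count, and the seed are all assumptions.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature):
    """Fit a one-level decision tree on a single feature (labels are 0 or 1)."""
    values = sorted({row[feature] for row in rows})
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
    best = None
    for t in thresholds:
        for above in (0, 1):  # label predicted when the value exceeds t
            errors = sum(
                (above if row[feature] > t else 1 - above) != y
                for row, y in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, t, above)
    _, t, above = best
    return feature, t, above


def stump_predict(stump, row):
    feature, t, above = stump
    return above if row[feature] > t else 1 - above


def train_forest(rows, labels, n_trees=15, seed=0):
    """Bagging: each stump sees a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        while True:  # redraw until the sample contains both classes
            idx = [rng.randrange(n) for _ in range(n)]
            if len({labels[i] for i in idx}) > 1:
                break
        feature = rng.randrange(len(rows[0]))
        forest.append(
            train_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return forest


def forest_predict(forest, row):
    """Majority vote over all weak classifiers."""
    votes = Counter(stump_predict(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical records: [msgs_per_sec, position_jump_m]; 0 = benign, 1 = attack
rows = [[5, 1], [7, 2], [6, 1.5], [8, 2.5], [60, 20], [75, 25], [90, 30], [80, 22]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = train_forest(rows, labels)
```

A full Random Forest grows deeper trees and samples a feature subset at every split, but the two ingredients shown here, bootstrap sampling and voting, are what let many individually weak learners produce a more stable decision.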

Fig. 5. Exploit detection system for VANETs with multiple stages.

The VANET IDS strengthens the network defenses through data-mining-based feature selection and ensemble models. The first step, the Data Collection Stage, gathers data from different vehicle communication and network operations. The gathered information is used to detect any network security risks. In the Feature Identification Stage, the system analyzes the acquired data using the Random Forest Algorithm (RFA) to identify the most relevant characteristics. By focusing on the qualities most indicative of intrusions, this method streamlines the data and saves processing overhead. In the next step, the Composite Models Stage, the chosen characteristics are used to combine several ML models. This makes the system better at detecting and defending against different attacks. Finding and categorizing various network attacks is the job of the Attack Detection Stage. In the Intervention and Mitigation Stage, which follows the discovery of an attack, the system avoids network failures through real-time monitoring, identification of hostile nodes, and mitigation strategies. This all-encompassing method keeps detection rates high and false positives low, protecting intelligent transportation systems while ensuring VANETs are well-protected, as shown in Fig. 5.

$$Q_{w}\left(f - jp\right) = q\left( \frac{er}{\left(v - mn^{s - 1}\right)} \right) * \sin\forall\left(n - 1\right)$$

(11)

Eq. 11 defines a weighted adjustment \(Q_{w}\left(f-jp\right)\), where the parameters \(q\) and \(\frac{er}{\left(v-mn^{s-1}\right)}\) are associated with the operational context of the system. The error rate is adjusted by the term \(\sin\forall\left(n-1\right)\), which applies feature scaling for the detection accuracy analysis.

$$Q\left(z - 1\right) = \frac{1}{2} * \left( tan + 1\cos\left( \partial * \frac{v}{2\forall} \right) + B_{nm}\left(n - w\right) \right)$$

(12)

Eq. 12 integrates these terms into \(Q\left(z-1\right)\). To improve the accuracy and threat response of the IDS, the trigonometric term in \(\partial * \frac{v}{2\forall}\) is combined with a correction \(B_{nm}\left(n-w\right)\), and the sum is scaled by \(\frac{1}{2}\) for the computational efficiency analysis.

$$B_{jkp} = D_{f}\left(v - 1\right) * k\left( Prv_{j - 1} + rd_{\left(z - 1\right)} \right)$$

(13)

In Eq. 13, the metric \(B_{jkp}\) is scaled by \(D_{f}\left(v-1\right)\), which takes operational and feature impacts into account. To guarantee precise threat detection and system performance, the term \(k\left(Prv_{j-1} + rd_{\left(z-1\right)}\right)\) incorporates historical data and current detection metrics for the scalability analysis.

$$G\left(v, pk\right) = M\left(q - wk\right)_{v - 1} * L\left(m\left(n - kp\right)\right)$$

(14)

Eq. 14 defines a metric \(G\left(v, pk\right)\) that incorporates several components. To account for operational and feature-based impacts, the term \(M\left(q-wk\right)_{v-1}\) modifies the metric based on the parameter differences, and \(L\left(m\left(n-kp\right)\right)\) is used for the analysis of adaptability to network change.

$$Bz_{m - 1} = \forall_{d}\left(erp^{n - 2}\right) * \left( \left\| r_{sf} - p_{nk} \right\| \right)$$

(15)

In Eq. 15, \(Bz_{m-1}\) reflects the general error function \(\forall_{d}\left(erp^{n-2}\right)\), adjusted for feature and recognition parameters, and \(\left\| r_{sf} - p_{nk} \right\|\) measures the deviation between the computed results \(r_{sf}\) and the target values \(p_{nk}\) for the detection performance analysis.

With its multi-stage IDS, the proposed MLIDS-RFA can identify and counter various threats, making VANETs more secure. First, the approach uses an ML-based feature selection technique to find the most important information for intrusion detection, which speeds up reaction times and reduces processing overhead. The system then uses an ensemble model technique, namely the Random Forest Algorithm, to combine numerous weak classifiers and enhance detection accuracy. In this way, the detection system can identify intricate attack patterns with few false positives and good detection rates. Simulation results show that the system can secure VANETs, which is crucial for the next generation of transportation networks to have secure and dependable communication.
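The detection rate and false positive figures mentioned above are usually derived from a confusion matrix. The following sketch uses the standard definitions for a binary attack-versus-benign setting; the label vectors are invented for illustration and are not results from the paper.

```python
def detection_metrics(y_true, y_pred, positive=1):
    """Compute standard binary classification metrics from two label lists."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1

    def ratio(num, den):
        return num / den if den else 0.0  # avoid division by zero

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called the detection rate
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "false_positive_rate": ratio(fp, fp + tn),
    }


# Hypothetical labels: 1 = attack, 0 = benign
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
report = detection_metrics(y_true, y_pred)
```

For these vectors there are two true positives, one false negative, one false positive, and four true negatives, so accuracy is 0.75, precision and recall are both 2/3, and the false positive rate is 0.2.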

To improve detection accuracy, the MLIDS-RFA combines several weak classifiers using ensemble models, particularly the Random Forest Algorithm. This method enhances the system’s capacity to detect intricate patterns of attacks in VANETs. Consequently, it improves the network’s security in general and decreases the number of false positives.


