Effective deep-learning-aided vehicle classification approach using seismic data



Herein, we evaluate the performance of the proposed VC scheme. Table 2 details the dataset parameters used in the evaluation process, and Table 3 summarizes the hyperparameter settings of the compared methods (SVM, CNN, and LSTM) as well as our proposed CL approach. To promote a thorough assessment, the data was divided into three subsets: 20% for training, 60% for validation, and 20% for testing. This distribution guaranteed equitable representation and dependable analysis, and the testing set provided an impartial evaluation of the model's generalization ability. The performance evaluation used response time and VC accuracy as the primary metrics: response time assesses how quickly and effectively the system produces outcomes, whereas VC accuracy evaluates its ability to identify and categorize vehicles correctly. The results obtained with the self-supervised contrastive learning approach for vehicle seismic signal classification exhibit robustness and effectiveness, especially when combined with various data augmentation techniques.
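As a concrete illustration of the 20%/60%/20% split described above, the sketch below applies scikit-learn's `train_test_split` twice. The feature array, its dimensionality, and the four-class labels are hypothetical placeholders, since the paper's actual data-loading pipeline is not shown here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the seismic dataset: 4650 signal windows of 128
# features each, with hypothetical four-class vehicle labels.
X = np.random.randn(4650, 128)
y = np.random.randint(0, 4, size=4650)

# First split: 20% for training, 80% remaining.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.20, stratify=y, random_state=0)

# Second split: 0.25 of the remaining 80% is 20% of the total (testing),
# which leaves 60% of the total for validation.
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 930 / 2790 / 930 samples
```

Stratifying both splits keeps the per-class proportions comparable across the three subsets, which supports the equitable representation the evaluation aims for.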

Table 2 Parameters employed by default in the evaluation.
Table 3 Hyperparameter settings for investigated models.
Fig. 10

Accuracy comparison across different dataset sizes.

Figure 10 illustrates the model’s accuracy as the size of the training dataset increases. With a dataset of 145 samples, the model achieves 82.4% accuracy. As the dataset grows to 290 samples, accuracy improves to 89.8%. Further increasing the dataset to 580 samples leads to a significant improvement, reaching 93.6%, and with 1162 samples the accuracy reaches 95.8%. The model achieves high accuracies of 98.5% and 99.8% when the dataset size is increased to 2325 and 4650 samples, respectively. These results demonstrate the importance of a more extensive training dataset for higher model performance: as the dataset size increases, the model can learn more comprehensive representations and generalize better, improving accuracy.

Fig. 11

Accuracy comparison of different data augmentation methods on 580 samples.

Figure 11 compares the model’s accuracy under different data augmentation techniques on a dataset of 580 samples. Without any augmentation, the model achieves an accuracy of 70.6%. Applying the “Reversal” augmentation improves the accuracy to 80.8%, while the “Down” and “Up” augmentation methods yield 83.1% and 83.2%, respectively. The “Shifting” augmentation further boosts the accuracy to 86.3%. Combining multiple techniques, the “Hybrid” augmentation achieves the highest accuracy of 93.6%. These results demonstrate the effectiveness of data augmentation in improving the model’s performance, especially when the training dataset is relatively small; the “Hybrid” approach, which leverages multiple augmentation techniques, is the most beneficial in enhancing the model’s accuracy.
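The augmentations named above can be sketched for 1-D seismic traces as follows. This is a minimal NumPy illustration under our own assumptions about the transforms (time reversal, circular shifting, and naive linear-interpolation resampling for “Up”/“Down”); the paper's exact augmentation parameters are not reproduced here.

```python
import numpy as np

def reversal(sig):
    # "Reversal": play the seismic trace backwards in time
    return sig[::-1]

def shift(sig, n):
    # "Shifting": circularly shift the trace by n samples
    return np.roll(sig, n)

def resample(sig, factor):
    # "Up"/"Down": stretch or compress the trace by `factor` via linear
    # interpolation, then map it back to the original length
    idx = np.linspace(0, len(sig) - 1, int(len(sig) * factor))
    stretched = np.interp(idx, np.arange(len(sig)), sig)
    out_idx = np.linspace(0, len(stretched) - 1, len(sig))
    return np.interp(out_idx, np.arange(len(stretched)), stretched)

def hybrid(sig, rng):
    # "Hybrid": a random combination of the individual transforms
    sig = shift(sig, rng.integers(-50, 50))
    if rng.random() < 0.5:
        sig = reversal(sig)
    return resample(sig, rng.uniform(0.8, 1.2))

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 20, 1024))  # toy stand-in for a geophone window
print(hybrid(x, rng).shape)           # augmented views keep the input length
```

Every transform returns a trace of the original length, so augmented views can be fed to the encoder without changing the input layer.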

Figure 12 compares the accuracy of the proposed approach with the latest techniques on 580 samples. To ensure a fair comparison, we tested the same dataset using the previous algorithms with the identical hyperparameters they employed: convolutional neural networks (CNN)26, Naive Bayes (NB)27, and Long Short-Term Memory (LSTM)29. On the 580-sample dataset, the NB classifier27 achieves an accuracy of 61.2%, the CNN model26 achieves 80.1%, and the LSTM model29 achieves 82.4%, whereas the proposed approach achieves the highest accuracy of 93.6%. These results demonstrate that the proposed approach significantly outperforms the latest techniques, indicating that it can effectively utilize the available data and learn more comprehensive representations than the state-of-the-art models. The considerable gap between the accuracy of the proposed approach (93.6%) and the other techniques (61.2% for NB, 80.1% for CNN, and 82.4% for LSTM) highlights its effectiveness and robustness. This finding is particularly notable because it demonstrates the proposed approach’s ability to achieve high accuracy even with a dataset as small as 580 samples. The same trend holds across all dataset sizes, confirming the importance of data augmentation in improving the model’s ability to learn diverse and representative features: as the dataset size decreases (1162, 580, 290, and 145 samples), the performance of the individual augmentation techniques gradually diminishes, while the proposed hybrid approach remains more robust.

Figure 13 compares the inference time of the proposed approach with the latest techniques. The NB classifier has the fastest inference time at 15.9 ms. The proposed approach takes 16.1 ms, slightly higher than NB but significantly faster than the other techniques. The SVM and LSTM models take 31.1 and 26.3 ms, respectively, while the Logistic Regression (LR) and CNN models have the longest inference times of 20.2 and 41.7 s, respectively. These results demonstrate that the proposed approach strikes a good balance between accuracy and inference time, making it a practical and efficient solution for real-world applications; its relatively fast inference is especially desirable in scenarios that require quick decision-making.
The results presented in Figs. 10, 11, 12 and 13 provide a comprehensive evaluation of the proposed approach against the latest techniques, highlighting its advantages in accuracy, dataset efficiency, and inference time. They indicate that the self-supervised contrastive learning approach, especially with hybrid data augmentation, is highly effective for seismic signal classification: it consistently outperforms traditional classifiers and yields state-of-the-art accuracy across diverse dataset sizes. Combining data augmentation with self-supervised learning is a powerful strategy for extracting meaningful and discriminative features from seismic signals, and the resulting approach is particularly compelling in scenarios with limited labeled data.
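Per-sample inference times like those reported in Fig. 13 can be measured with a simple warm-up-then-average loop. The sketch below is illustrative only; the linear-projection "classifier" is a hypothetical stand-in, not any of the models compared in the paper.

```python
import time
import numpy as np

def measure_latency(predict, x, n_runs=100):
    """Average single-sample inference latency in milliseconds."""
    predict(x)  # warm-up call, excluded from timing
    start = time.perf_counter()
    for _ in range(n_runs):
        predict(x)
    return (time.perf_counter() - start) / n_runs * 1e3

# Hypothetical stand-in classifier: a fixed linear projection over
# 128 features mapped to 4 vehicle classes.
W = np.random.randn(128, 4)
x = np.random.randn(128)
latency_ms = measure_latency(lambda v: (v @ W).argmax(), x)
print(f"{latency_ms:.3f} ms per sample")
```

Averaging over many runs and discarding the first (warm-up) call reduces the influence of caching and one-time initialization, which matters when differences between models are only a few milliseconds.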

Table 4 Performance metrics for fivefold cross-validation of the self-supervised contrastive learning model for vehicle classification using seismic data.
Fig. 12

Accuracy comparison of the proposed approach and the latest techniques on 580 samples.

Fig. 13

Comparison of the proposed approach with the latest techniques in terms of time.

Fig. 14

Model accuracy versus dataset size with 95% CI.

Table 4 summarizes the performance metrics obtained from the fivefold cross-validation of our proposed approach. The model consistently demonstrated high accuracy, precision, recall, and F1 scores, with minimal variation across the folds. Contrastive loss remained low throughout, indicating the model’s efficiency in separating seismic signal data from different vehicle classes. The results from the fivefold cross-validation clearly illustrate the model’s effectiveness and reliability in vehicle classification using seismic data. In fivefold cross-validation, the dataset is divided into five equal parts, or folds; the model is trained on four folds and tested on the remaining one, and this process is repeated five times so that each fold acts as the test set once. This approach checks that the model performs well across different parts of the dataset, providing a more accurate assessment of its ability to generalize to unseen data. In this study, the model achieved accuracy scores ranging from 99.5 to 99.8% across all folds, highlighting its capability to classify different vehicle types from seismic signals regardless of which fold served as the test set. Such consistent accuracy suggests that the model generalizes well and does not overfit to specific parts of the data. The data augmentation techniques used to artificially expand the training dataset were key to this success, ensuring more diverse and robust learning from seismic signals; they helped prevent overfitting and enabled the model to perform well even with a limited dataset. The contrastive loss values remained low across all folds (between 0.010 and 0.015), a positive indicator of the model’s ability to differentiate between seismic signals from different vehicle classes. This is critical because lower loss values mean seismic signals from the same vehicle class are closely grouped,
while signals from other classes are more distinctly separated, leading to higher classification accuracy. The high precision (up to 99.4%) and recall (up to 99.6%) further confirm the model’s effectiveness: high recall means the model misses few vehicles, while high precision means it produces few incorrect classifications. This balance is essential for traffic monitoring and accident prevention applications, where misclassifications can have serious consequences.
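The fivefold protocol described above can be sketched with scikit-learn's `StratifiedKFold`. For a self-contained illustration, a logistic-regression classifier and random placeholder features stand in for the actual contrastive model and seismic dataset; only the cross-validation mechanics match the text.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

X = np.random.randn(580, 64)           # placeholder features
y = np.random.randint(0, 4, size=580)  # hypothetical vehicle-class labels

# Five folds: train on four parts, test on the held-out fifth, rotate.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in skf.split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    scores.append({
        "acc":  accuracy_score(y[test_idx], pred),
        "prec": precision_score(y[test_idx], pred, average="macro",
                                zero_division=0),
        "rec":  recall_score(y[test_idx], pred, average="macro",
                             zero_division=0),
        "f1":   f1_score(y[test_idx], pred, average="macro"),
    })
print(len(scores))  # one metric set per fold, as in Table 4
```

Macro-averaging the precision, recall, and F1 scores weights every vehicle class equally, which is the appropriate choice when the per-class error costs are comparable.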

Figure 14 illustrates the relationship between dataset size and model accuracy, with a 95% confidence interval (CI) for each model. The results clearly demonstrate that CL consistently outperforms CNN and LSTM across all dataset sizes. CL achieves higher accuracy, showcasing its robust feature extraction capabilities. Even with a limited dataset of 145 samples, CL significantly outperforms CNN and LSTM, proving its effectiveness in data-scarce environments. As the dataset size increases, CL maintains its advantage, reaching near 100% accuracy at 4650 samples. CNN and LSTM exhibit gradual performance improvements as the dataset size increases; however, their accuracy plateaus at lower levels than CL, suggesting that they require larger datasets to enhance classification performance effectively. Additionally, all models show a steep accuracy increase when the dataset size grows from 145 to 1162 samples, highlighting the importance of data availability. The confidence intervals indicate that CL exhibits higher stability (narrower CI), whereas CNN and LSTM have higher variance, particularly when working with smaller datasets. These findings reinforce the superiority of CL in seismic-based vehicle classification, demonstrating its strong generalization capabilities and making it an optimal choice for real-world deployment in ITSs.
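The 95% confidence intervals in Fig. 14 can be computed per model and dataset size from repeated runs using the t-distribution. The accuracy values below are hypothetical placeholders for one such group of runs; only the CI computation itself is the point of the sketch.

```python
import numpy as np
from scipy import stats

def mean_ci95(accuracies):
    """Mean accuracy with a two-sided 95% confidence interval."""
    a = np.asarray(accuracies, dtype=float)
    m = a.mean()
    # t-based half-width: t_{0.975, n-1} * s / sqrt(n)
    half = stats.t.ppf(0.975, len(a) - 1) * a.std(ddof=1) / np.sqrt(len(a))
    return m, (m - half, m + half)

# Hypothetical per-run accuracies for one model at one dataset size
runs = [0.934, 0.939, 0.931, 0.938, 0.936]
m, (lo, hi) = mean_ci95(runs)
print(f"mean = {m:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

A narrower interval, as reported for CL, means the run-to-run accuracy varies little, so the plotted mean is a reliable summary of the model's behavior.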

Table 5 Pairwise t-test results (CL vs. others).
Fig. 15

Model accuracy comparison using ANOVA and pairwise t-tests.

The results presented in the boxplot of Fig. 15 and the pairwise t-tests of Table 5 provide a comprehensive statistical evaluation of model performance differences in seismic-based vehicle classification. The ANOVA and boxplot analyses clearly indicate that CL achieves the highest accuracy compared to CNN, LSTM, and SVM. The boxplot visualization further highlights CL’s superior performance, demonstrating both higher accuracy and minimal variance, which confirms its stability across multiple runs. Additionally, the ANOVA test result (p value < 0.0001) validates that at least one model performs significantly differently from the others, reinforcing CL’s effectiveness. The pairwise t-test results, where all p values are < 0.0001, establish statistically significant performance differences among the models, and the high t-statistics further confirm that CL substantially outperforms CNN, LSTM, and SVM in classification accuracy. While SVM is computationally efficient, its lack of deep feature extraction capabilities results in inferior performance, making it unsuitable for complex seismic-based vehicle classification tasks. Overall, CL’s significant advantage in classification accuracy, validated by ANOVA and pairwise t-tests, underscores its ability to learn highly discriminative features. The boxplot visualization reaffirms CL’s stable and consistent performance, highlighting its robustness in feature extraction and computational efficiency. These findings establish CL as the optimal model for real-time deployment in Intelligent Transportation Systems, ensuring both accuracy and efficiency in seismic-based vehicle classification.
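The statistical tests above can be reproduced with SciPy's one-way ANOVA and independent-samples t-test. The per-run accuracy lists below are fabricated placeholders consistent in spirit with the reported means (the paper's raw per-run numbers are not published here); only the testing procedure is the point.

```python
from scipy import stats

# Hypothetical per-run accuracies for each model
acc = {
    "CL":   [0.995, 0.997, 0.996, 0.998, 0.995],
    "CNN":  [0.801, 0.795, 0.808, 0.799, 0.803],
    "LSTM": [0.824, 0.818, 0.829, 0.821, 0.826],
    "SVM":  [0.742, 0.737, 0.748, 0.740, 0.745],
}

# One-way ANOVA: does at least one model differ from the others?
f_stat, p_anova = stats.f_oneway(*acc.values())
print(f"ANOVA: F = {f_stat:.1f}, p = {p_anova:.2e}")

# Pairwise t-tests of CL against each baseline (as in Table 5)
for name in ("CNN", "LSTM", "SVM"):
    t_stat, p = stats.ttest_ind(acc["CL"], acc[name])
    print(f"CL vs {name}: t = {t_stat:.1f}, p = {p:.2e}")
```

With well-separated group means and small within-group variance, both the ANOVA and every pairwise comparison yield p values far below 0.0001, matching the pattern the table reports.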

The complexity analysis of the compared models, based on FLOPs and memory usage, is presented in Tables 6 and 7 and reveals significant differences in computational efficiency. SVM demonstrates the lowest computational cost, with only 0.005 FLOPs and minimal memory usage (18 MB), making it the most lightweight model; however, its simplicity may come at the expense of performance in complex tasks. In contrast, the CNN and LSTM models exhibit significantly higher FLOPs (2.3 and 4.7, respectively) and memory consumption (450 MB and 680 MB), indicating their substantial computational cost. LSTM in particular has the highest resource demand, which may limit its practical deployment in resource-constrained environments. Meanwhile, our proposed CL model achieves a balanced trade-off, requiring 1.9 FLOPs and 275 MB of memory. This optimized performance highlights its advantage in delivering competitive efficiency while maintaining a lower computational burden than CNN and LSTM, making it a more practical choice for real-world vehicle classification applications where accuracy, efficiency, and computational cost all matter, especially since this system might be deployed on low-cost computers such as a Raspberry Pi.
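FLOPs figures like those in Table 6 can be estimated analytically, layer by layer, using the standard multiply-accumulate counts. The two-layer encoder below is a hypothetical example with illustrative sizes, not the paper's actual architecture.

```python
def conv1d_flops(length_out, kernel, c_in, c_out):
    # One Conv1D layer: 2 * K * C_in * C_out * L_out multiply-adds
    return 2 * kernel * c_in * c_out * length_out

def dense_flops(n_in, n_out):
    # One fully connected layer: 2 * N_in * N_out multiply-adds
    return 2 * n_in * n_out

# Hypothetical small encoder: two Conv1D layers plus a classifier head
total = (conv1d_flops(1024, 7, 1, 32)    # conv1: 1 -> 32 channels
         + conv1d_flops(512, 5, 32, 64)  # conv2: 32 -> 64 channels, strided
         + dense_flops(64, 4))           # head: 64 features -> 4 classes
print(f"{total / 1e6:.2f} MFLOPs per forward pass")
```

Summing such per-layer estimates across a full network is how per-inference FLOPs totals are typically derived, and the same walk over the layers yields the parameter counts behind the memory figures in Table 7.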

Limitations

One limitation of this study is the absence of high-speed vehicle data. The experiments were conducted at Kyushu University’s ITO campus, where road regulations restrict vehicle speeds to 40 km/h. Therefore, the impact of high-speed vehicle transitions on seismic wave characteristics was not included and is left for future investigation. However, since vehicle combustion engines generate a portion of the seismic waves, their frequency components are relatively independent of vehicle speed41. Nonetheless, higher-speed vehicles may introduce additional complexities, such as increased wave energy, possible Doppler effects, and variations in wave propagation patterns. Future studies should investigate these factors by collecting seismic data from high-speed vehicle environments to evaluate their impact on classification performance and model generalization.

Furthermore, our system is mainly designed to work in specific areas, such as one-lane streets, toll collection points, and intersections. To accommodate multi-lane situations, the number of deployed geophones would need to be increased on a large scale. In addition, our approach faces hardware limitations, such as geophone sensitivity and environmental interference, which can affect data accuracy. Software limitations include the need for significant computational resources for training and challenges in generalizing the model to diverse real-world traffic conditions, necessitating further fine-tuning and validation. Moreover, extreme environmental perturbations, such as heavy precipitation, can significantly affect seismic sensors by attenuating surface and S waves, potentially leading to system failures. While such perturbations are inherently challenging to mitigate, we propose several procedures to minimize their impact, including improved sensor shielding, adaptive noise filtering techniques, and site selection strategies that enhance measurement stability.

Table 6 FLOPs comparison of the compared models.
Table 7 Memory usage comparison of the compared models.


