DC bias optimization in intelligent DCO-OFDM Li-Fi systems using hybrid machine learning with hardware validation

Machine Learning


Hybrid linear with KNN regression

The number of neighbors (n-neighbors) for KNN regression was first set to 1, and the number of features (k) was varied from 1 to 7. For each configuration, the RMSE, R2 score, and MAPE were calculated to evaluate model performance. The number of neighbors was then increased to 3 and finally to 5; for each value, the number of features was varied again and the same three metrics were computed so that all configurations were assessed consistently. The objective was to identify the configuration that yields the highest R2 score and the lowest RMSE and MAPE values.
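For reference, the three metrics are assumed here to follow their standard definitions. The symbols are not taken from the paper: $y_i$ denotes the actual DC bias of sample $i$, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the actual values, and $N$ the number of evaluated samples. The MAPE is written as a percentage, which appears consistent with the magnitudes reported in the tables.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2},
\qquad
\mathrm{MAPE} = \frac{100}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right|
```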

Algorithm 1

Hybrid linear regression with KNN model for DC bias prediction.
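The sweep over n-neighbors and the number of features can be sketched as follows. This is a minimal illustration and not the authors' Algorithm 1: it assumes that the hybrid model fits a linear regression first and then lets a KNN regressor correct the linear model's residuals, that the k features are chosen with a univariate F-test (`SelectKBest`), and that an 80/20 train/test split is used. The data are synthetic stand-ins for the MATLAB dataset, and all function names are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor


def make_synthetic_data(n_samples=250, n_features=7, seed=0):
    """Synthetic stand-in for the MATLAB dataset (NOT the paper's data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    weights = rng.uniform(0.1, 0.4, size=n_features)
    # Strictly positive target with a mild nonlinearity, so MAPE is well behaved.
    y = 5.0 + X @ weights + 0.3 * np.sin(2.0 * X[:, 0])
    y += rng.normal(scale=0.05, size=n_samples)
    return X, y


def evaluate_hybrid(X_train, X_test, y_train, y_test, k, n_neighbors):
    """Assumed hybrid: linear fit, then KNN fitted on the linear residuals."""
    selector = SelectKBest(score_func=f_regression, k=k).fit(X_train, y_train)
    X_tr, X_te = selector.transform(X_train), selector.transform(X_test)

    linear = LinearRegression().fit(X_tr, y_train)
    residuals = y_train - linear.predict(X_tr)
    knn = KNeighborsRegressor(n_neighbors=n_neighbors).fit(X_tr, residuals)

    y_pred = linear.predict(X_te) + knn.predict(X_te)
    return {
        "n_neighbors": n_neighbors,
        "k": k,
        "rmse": float(np.sqrt(np.mean((y_test - y_pred) ** 2))),
        "r2": float(r2_score(y_test, y_pred)),
        "mape": float(100.0 * np.mean(np.abs((y_test - y_pred) / y_test))),
    }


def sweep(X, y, neighbor_grid=(1, 3, 5), feature_grid=range(1, 8)):
    """Evaluate every (n_neighbors, k) pair on one fixed train/test split."""
    split = train_test_split(X, y, test_size=0.2, random_state=0)
    X_train, X_test, y_train, y_test = split
    return [
        evaluate_hybrid(X_train, X_test, y_train, y_test, k, n)
        for n in neighbor_grid
        for k in feature_grid
    ]


results = sweep(*make_synthetic_data())
best = min(results, key=lambda r: r["rmse"])
print(best)
```

Because the data are synthetic, the best configuration printed by this sketch says nothing about the values reported in Table 4; only the structure of the sweep is illustrated.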

Table 4 lists the three metrics (R2 score, RMSE, and MAPE) for each configuration of the number of neighbors (n-neighbors) and the number of features. Increasing the number of features generally improves the performance of the KNN regression model, and this holds for all values of n-neighbors. For example, with n-neighbors set to 1, the RMSE decreased from 0.3267 with one feature to 0.2026 with five features, the R2 score improved from 0.9071 to 0.9643, and the MAPE decreased from 18.0061 to 9.7031. Beyond five features, however, additional features provide diminishing returns.

Table 4 RMSE, R2-score, and MAPE for the hybrid linear with KNN regression on the MATLAB dataset.

The results also indicate that the number of neighbors significantly affects the performance metrics. For a fixed number of features, n-neighbors equal to 1 often yields the best performance. For instance, with five features, the RMSE is 0.2026 for n-neighbors equal to 1, compared with 0.2462 and 0.2537 for n-neighbors equal to 3 and 5, respectively. The R2 score and MAPE follow the same trend, with n-neighbors equal to 1 generally giving higher accuracy and lower error.

The experimental results show that the hybrid linear with KNN regression model performs best with five features and n-neighbors equal to 1, which provides the best balance of accuracy and error minimization, resulting in the lowest RMSE of 0.20263 and the highest R2 score of 0.96427. Increasing the number of features beyond five yields only marginal improvements, and using more than one neighbor generally results in higher RMSE and MAPE values and lower R2 scores.

Fig. 5 shows the predicted and actual DC bias values for the hybrid linear with KNN regression model, with n-neighbors equal to 1 and varying numbers of features. The actual and predicted DC bias values are well aligned, especially at k equal to 5. This close alignment indicates that the hybrid linear-KNN regression model accurately predicts the appropriate DC bias value for DCO-OFDM-based Li-Fi systems.

Fig. 5

Actual and predicted bias values for the hybrid linear with KNN regression from the MATLAB dataset.

Hybrid polynomial with KNN regression

For the hybrid polynomial with KNN regression, we compute the same metrics (R2 score, RMSE, and MAPE) while varying the polynomial degree (p), the number of neighbors (n-neighbors), and the number of features (k).

Algorithm 2

Hybrid polynomial regression with KNN model for DC bias prediction.
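A corresponding sketch for the polynomial variant is given below. As before, it is an illustration under stated assumptions and not the authors' Algorithm 2: the polynomial stage is assumed to be `PolynomialFeatures` of degree p followed by ordinary least squares, the KNN stage is assumed to correct the polynomial residuals, features are chosen with `SelectKBest`, and the data are synthetic. The helper name `hybrid_poly_knn_rmse` is hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (NOT the paper's dataset): 250 samples, 7 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(250, 7))
y = 5.0 + 0.3 * X[:, 0] + 0.2 * X[:, 1] ** 2 + 0.1 * X[:, 2] * X[:, 3]
y += rng.normal(scale=0.05, size=250)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)


def hybrid_poly_knn_rmse(degree, n_neighbors, k):
    """Test RMSE of the assumed hybrid: polynomial fit plus KNN on its residuals."""
    selector = SelectKBest(score_func=f_regression, k=k).fit(X_train, y_train)
    X_tr, X_te = selector.transform(X_train), selector.transform(X_test)

    poly = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    ).fit(X_tr, y_train)

    residuals = y_train - poly.predict(X_tr)
    knn = KNeighborsRegressor(n_neighbors=n_neighbors).fit(X_tr, residuals)

    y_pred = poly.predict(X_te) + knn.predict(X_te)
    return float(np.sqrt(np.mean((y_test - y_pred) ** 2)))


# Same grid as described in the text: p in 1..3, n-neighbors in {1, 3, 5}, k in 5..7.
grid = {
    (p, n, k): hybrid_poly_knn_rmse(p, n, k)
    for p in (1, 2, 3)
    for n in (1, 3, 5)
    for k in (5, 6, 7)
}
best_config = min(grid, key=grid.get)
print("best (p, n_neighbors, k):", best_config, "RMSE:", grid[best_config])
```

The grid has 27 configurations. Which one wins on the synthetic data is unrelated to the result reported in Table 5.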

Table 5 shows the results for the three metrics as the polynomial degree (p) varies from 1 to 3, with n-neighbors equal to 1, 3, and 5, and the number of features ranging from 5 to 7. As Table 5 shows, the hybrid polynomial with KNN regression model performs best with seven features, n-neighbors equal to 5, and a polynomial degree of 2, which provides the best balance of accuracy and error minimization, with the lowest RMSE of 0.18847 and the highest R2 score of 0.96908.

Table 5 RMSE, R2 score, and MAPE for the hybrid polynomial with KNN regression on the MATLAB dataset.

The predicted and actual DC bias values for the hybrid polynomial with KNN regression model are shown in Fig. 6. The predicted values are very close to the actual DC bias values, indicating that the model successfully predicted the DC bias using seven features with n-neighbors equal to 5 and degree 2.

Fig. 6

Actual and predicted bias values for the hybrid polynomial with KNN regression from the MATLAB dataset.

Error distribution analysis

The spread of prediction errors produced by the hybrid polynomial with KNN regression is illustrated in Fig. 7. This plot shows the differences between the model’s actual and predicted values. The prediction error is shown on the x-axis, with values from approximately -0.4 to 0.4, and the frequency of these errors is shown on the y-axis. Most prediction errors cluster near zero, indicating that the hybrid model predicts accurately in most cases.

Fig. 7

Error distribution for a hybrid polynomial with KNN regression.

The Kernel Density Estimation (KDE) curve overlaid on the histogram supports this observation by providing a smooth estimate of the probability density of the errors. The peak of the KDE curve is close to zero, indicating strong model performance with minimal errors. The error distribution also appears fairly balanced, suggesting that the model does not consistently overestimate or underestimate. This balance indicates that the hybrid method successfully combines the advantages of polynomial regression and KNN regression, resulting in enhanced predictive accuracy. Overall, the plot shows that the hybrid model’s predictions are accurate, with most errors small and evenly distributed around zero.
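The quantities read off such a plot (location of the KDE peak, balance between over- and underestimation) can also be computed numerically. The sketch below assumes the error is defined as predicted minus actual, so that positive values mean overestimation, and it uses synthetic errors in place of the model's real residuals. The function name `summarize_errors` is hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde, skew


def summarize_errors(y_true, y_pred, n_grid=512):
    """Summarize a prediction-error distribution (error = predicted - actual)."""
    errors = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)

    # Gaussian KDE evaluated on a regular grid spanning the observed errors.
    kde = gaussian_kde(errors)
    grid = np.linspace(errors.min(), errors.max(), n_grid)
    density = kde(grid)

    return {
        "mean_error": float(errors.mean()),              # systematic bias
        "kde_peak": float(grid[np.argmax(density)]),     # most likely error
        "skewness": float(skew(errors)),                 # asymmetry of the spread
        "over_fraction": float(np.mean(errors > 0.0)),   # share of overestimates
    }


# Synthetic example (NOT the paper's residuals): small zero-centered errors.
rng = np.random.default_rng(0)
y_true = rng.uniform(1.0, 5.0, size=250)
y_pred = y_true + rng.normal(loc=0.0, scale=0.12, size=250)

summary = summarize_errors(y_true, y_pred)
print(summary)
```

A KDE peak near zero together with an overestimate share near 0.5 corresponds to the balanced behavior described above; a peak far from zero or a strongly skewed distribution would indicate systematic bias.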

The error distribution of the hybrid linear with KNN regression is presented in Fig. 8. Unlike the tightly clustered error distribution in Fig. 7, the error range for this model is significantly wider, extending from -2 to 3. This broader spread suggests that the model produces larger errors, especially in the positive direction, where it appears to overestimate.

Fig. 8

Error distribution for hybrid linear with KNN regression.

The KDE curve in this case shows multiple peaks, indicating potential variations in model performance under different conditions. Overall, the performance of the hybrid linear with KNN regression is less consistent than that of the hybrid polynomial with KNN regression: the polynomial-based model (Fig. 7) is more accurate, with most errors clustering near zero, whereas the linear-based model (Fig. 8) shows a broader range of errors, suggesting lower precision.

Hardware implementation validation

To enhance the reliability and practical value of the proposed models, this section documents a newly added validation phase based on real-time hardware measurements. The motivation for this addition was to verify the accuracy of the machine learning models in real-world scenarios, compare their performance against data collected from a hardware implementation of a Li-Fi system, and assess their robustness beyond simulation environments.

To implement the real-time Li-Fi transmission system, a hardware setup was constructed using a high-brightness LED as the optical transmitter and a photodiode as the optical receiver at a distance of 0.3 m, as shown in Fig. 9. The LED was modulated with DCO-OFDM signals, while the photodiode captured the received optical signal and converted it into an electrical signal. The output was processed by an analog front-end circuit connected to an Arduino microcontroller, which sampled and recorded the signal data.

Fig. 9

Real-time hardware setup for DCO-OFDM-based Li-Fi system.

The Arduino collected the received signal parameters and exported them to a CSV file via the serial interface. This data was then analyzed and used as input to the same machine learning models used on the MATLAB dataset, with the same dataset size of 250 samples and eight parameters. The setup was designed to closely emulate the conditions of a Li-Fi communication system under practical constraints.

To evaluate the effectiveness of the proposed hybrid ML models under real-world conditions, the same trained models from the simulation phase were applied directly to the hardware-generated dataset. The hardware data was formatted to match the simulation feature structure, including parameters such as mean, minimum, maximum, standard deviation, bit error rate, and the applied DC bias. This ensured that the models could process the new data without any retraining or reconfiguration.
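How raw captured samples might be reduced to the statistical features named above can be sketched as follows. The CSV layout is an assumption made purely for illustration: the column names `window_id` and `voltage_v`, the grouping of samples into windows, and the use of the population standard deviation are not taken from the paper. The bit error rate and the applied DC bias are not derivable from the voltage samples alone and would have to be supplied from the experiment records.

```python
import csv
import io
from collections import defaultdict

import numpy as np

# Hypothetical serial export: one voltage sample per row, grouped by window.
SAMPLE_CSV = """window_id,voltage_v
0,1.0
0,2.0
0,3.0
1,2.0
1,2.0
1,4.0
"""


def window_features(csv_text):
    """Compute mean, min, max and standard deviation for each sample window."""
    windows = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        windows[int(row["window_id"])].append(float(row["voltage_v"]))

    features = {}
    for window_id, samples in sorted(windows.items()):
        v = np.asarray(samples)
        features[window_id] = {
            "mean": float(v.mean()),
            "min": float(v.min()),
            "max": float(v.max()),
            "std": float(v.std()),  # population standard deviation (ddof=0)
        }
    return features


features = window_features(SAMPLE_CSV)
for window_id, row in features.items():
    print(window_id, row)
```

Keeping the column order and scaling identical to the simulation dataset is what would allow trained models to be applied without retraining, as the text describes.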

The results showed that the hybrid polynomial regression with KNN continued to outperform the hybrid linear regression with KNN on the hardware data. It provided more accurate predictions, with lower errors and better correlation with the actual DC bias values. This demonstrates that the hybrid polynomial with KNN model is effective not only in simulation but also on real hardware measurements.

These findings confirm the generalizability of the proposed approach, validating that the trained models can accurately predict the optimal DC bias in DCO-OFDM-based Li-Fi systems using both simulated and real-time hardware data. Figure 10 shows the actual and predicted bias values for the two hybrid models on the real-time hardware data: (a) hybrid linear with KNN and (b) hybrid polynomial with KNN.

Fig. 10

The actual and predicted bias values for the two hybrid models of real-time hardware data.

A comparative analysis of the two proposed hybrid ML models for real-time hardware bias prediction in optical wireless communication systems is shown in Fig. 11. The performance metrics, RMSE and R2, are visualized in a grouped bar chart, which allows a direct comparison of the hybrid linear regression with KNN and the hybrid polynomial regression with KNN models. Both hybrid architectures achieve robust predictive performance: the hybrid linear with KNN and hybrid polynomial with KNN models yield RMSE values of 0.3130 and 0.2960, respectively, which corresponds to a 5.4% reduction in prediction error for the polynomial-based approach. Correspondingly, the R2 scores of 0.7916 and 0.8137 indicate that both models explain a substantial proportion of the variance in hardware bias, with the hybrid polynomial with KNN model capturing approximately 2.8% more variance than its linear counterpart in relative terms.
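The two percentages quoted above follow directly from the reported RMSE and R2 values, as the short check below shows. The variable names are illustrative. Note that the 2.8% figure is a relative gain in R2; expressed in absolute terms, the difference is about 2.2 percentage points.

```python
# Values reported for the hardware dataset.
rmse_linear, rmse_poly = 0.3130, 0.2960
r2_linear, r2_poly = 0.7916, 0.8137

# Relative RMSE reduction of the polynomial-based model, in percent.
rmse_reduction_pct = 100.0 * (rmse_linear - rmse_poly) / rmse_linear

# R2 gain, both relative (percent) and absolute (percentage points).
r2_gain_relative_pct = 100.0 * (r2_poly - r2_linear) / r2_linear
r2_gain_points = 100.0 * (r2_poly - r2_linear)

print(f"RMSE reduction: {rmse_reduction_pct:.1f}%")
print(f"R2 gain (relative): {r2_gain_relative_pct:.1f}%")
print(f"R2 gain (absolute): {r2_gain_points:.2f} percentage points")
```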

Fig. 11

Performance comparison of hybrid ML models for real-time hardware bias prediction.

To verify the feasibility of the proposed system over extended transmission distances, additional experiments were conducted using the same experimental setup while varying the transmitter–receiver separation. Figure 12 presents the experimental arrangement at different link distances, confirming that the system was physically tested beyond the nominal operating range considered in the main performance analysis.

Fig. 12

Experimental setup illustrating the proposed system, tested at different transmission distances.

Because AWGN in optical wireless communication systems increases with transmission distance, Fig. 13 provides a three-dimensional visualization of the observed noise. The vertical axis is the noise amplitude (V), the horizontal axes are the transmission distance (m) and time (µs), and the color intensity indicates the noise magnitude. A clearly visible threshold plane at 0.3 m separates the region in which operation is largely unaffected by noise from the region in which noise begins to degrade it. Based on this noise characterization:

  • Short-Range Optimization (≤ 0.3 m): Systems can be optimized for maximum spectral efficiency with minimal error protection overhead. Receiver designs can prioritize linearity over sensitivity.

  • Medium-Range Adaptation (0.3–1.0 m): Implementation of adaptive power control and modulation schemes becomes essential.

  • Long-Range Operation (> 1.0 m): Requires sophisticated noise mitigation, including advanced equalization techniques, diversity combining (spatial, temporal, or frequency), machine learning-based channel estimation, and adaptive filtering with noise prediction.
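The three regimes above amount to a simple lookup on the link distance, sketched below. The function name and the regime labels are hypothetical, and assigning a distance of exactly 1.0 m to the medium-range regime is an assumption, since the list does not state which regime owns that boundary.

```python
def classify_link_regime(distance_m: float) -> str:
    """Map a transmitter-receiver distance (in meters) to an operating regime."""
    if distance_m < 0.0:
        raise ValueError("distance must be non-negative")
    if distance_m <= 0.3:
        return "short-range"   # optimize for spectral efficiency
    if distance_m <= 1.0:
        return "medium-range"  # adaptive power control and modulation
    return "long-range"        # advanced noise mitigation required


for d in (0.1, 0.3, 0.5, 1.0, 1.5):
    print(f"{d:.1f} m -> {classify_link_regime(d)}")
```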

Fig. 13

Three-dimensional characterization of AWGN propagation in hardware design.

The 0.3-m threshold emerges as a key design parameter for practical Li-Fi system deployment, balancing performance objectives with implementation complexity.


