Intelligent system for portfolio optimization for novel volatility forecasting using machine learning

Data collection covered the annual share values of 40 countries from 1999 to 2018. This gives 20 years of observations per market and yields a large panel structure suitable for multivariate modelling. After collection, missing values were handled during preprocessing, and the dataset was restructured to match the structural requirements of the program. Table 2 gives an example of the calculated Sharpe, Sterling, and Calmar ratios (hereafter referred to as the Proposed Method) that are used in the modelling pipeline. Contemporary AI-driven portfolio models are treated as methodological reference baselines rather than as empirical implementations in the experimental setting, because their data requirements differ and because recent deep learning studies in this field are difficult to reproduce. The 40 countries were chosen to capture a heterogeneous mix of developed and emerging markets with uninterrupted data between 1999 and 2018. This selection criterion is intended to support robust cross-market analysis and to limit survivorship bias in performance estimation.

Missing country-years were handled as follows: gaps amounting to less than 5% of a country's 20-year series were filled by linear interpolation, whereas country series with gaps of 5% or more were excluded from training and flagged in the appropriate descriptive tables. To enhance reproducibility, both the raw and the cleaned datasets are available as supplementary materials or on request, depending on the policy of the journal.
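
The gap-handling rule can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `fill_small_gaps`, the use of `None` for a missing observation, and the choice to exclude series whose gap sits at the very start or end (where linear interpolation has no second anchor) are all assumptions.

```python
from typing import List, Optional


def fill_small_gaps(
    series: List[Optional[float]], max_missing_share: float = 0.05
) -> Optional[List[float]]:
    """Linearly interpolate gaps when the missing share is below the threshold.

    Returns None when the series should be excluded from training.
    """
    n = len(series)
    missing = sum(v is None for v in series)
    if n == 0 or missing / n >= max_missing_share:
        return None  # 5% or more missing: exclude the series

    known = [i for i, v in enumerate(series) if v is not None]
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None or right is None:
            return None  # assumption: edge gaps are not extrapolated
        weight = (i - left) / (right - left)
        filled[i] = series[left] + weight * (series[right] - series[left])
    return filled
```

For a hypothetical 40-point series with one missing value (2.5% missing), the gap is replaced by the value on the straight line between its two neighbours; with two missing values (5%), the function returns `None`.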

The preprocessing phase applied global z-score normalization to the entire dataset to obtain a common scale. Outliers were treated with interquartile-range (IQR) filtering to reduce the skew caused by extreme market shocks: any value lying more than 1.5 × IQR outside the quartiles was winsorized. Stratified sampling was used for the train-test split to preserve class balance.
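
A minimal sketch of the two numerical steps is given below. It assumes the conventional fences Q1 − 1.5·IQR and Q3 + 1.5·IQR and applies winsorization before normalization; the text does not state the order, the quartile method, or whether the population or sample standard deviation was used, so those choices and the function names are illustrative only.

```python
import statistics
from typing import List


def winsorize_iqr(values: List[float], k: float = 1.5) -> List[float]:
    """Clip values to the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def zscore(values: List[float]) -> List[float]:
    """Global z-score: subtract the mean, divide by the population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # constant input carries no scale
    return [(v - mean) / std for v in values]


# Hypothetical values with one extreme observation
raw = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
clean = zscore(winsorize_iqr(raw))
```

Winsorizing first keeps a single extreme shock from inflating the mean and standard deviation that the z-score then uses for every other observation.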

The pre-processed data were then organized chronologically by year and grouped by country in order to determine annual peaks and to compute the maximum drawdown (MDD) of each market. This organized dataset is the input to the subsequent ratio computations and to the classifier.
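
The peak and drawdown step for a single country can be sketched as below. The sketch assumes strictly positive annual index levels keyed by year, which is the usual setting for a relative drawdown; the function name and the example figures are hypothetical.

```python
from typing import Dict, Tuple


def max_drawdown_by_year(levels: Dict[int, float]) -> Tuple[float, int, int]:
    """Return (maximum drawdown, peak year, trough year) for one market.

    Drawdown is measured relative to the running peak: (peak - level) / peak.
    """
    years = sorted(levels)
    peak_year = years[0]
    best = (0.0, years[0], years[0])
    for year in years:
        if levels[year] > levels[peak_year]:
            peak_year = year  # new running peak
        drawdown = (levels[peak_year] - levels[year]) / levels[peak_year]
        if drawdown > best[0]:
            best = (drawdown, peak_year, year)
    return best


# Hypothetical index levels for one country
example = {1999: 100.0, 2000: 120.0, 2001: 90.0, 2002: 110.0, 2003: 80.0}
mdd, peak, trough = max_drawdown_by_year(example)
```

In this example the running peak is reached in 2000 and the deepest decline from it occurs in 2003, giving a drawdown of one third.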

Table 3 presents the normalized annual performance values of selected countries for 1999–2018. It includes countries with complete and with incomplete historical records; years without available market data are denoted by the symbol “.”.

Table 3 Normalized annual performance values (1999–2018) for selected countries.

The restructured preprocessed data is organized chronologically by year and grouped by country, aiming to identify the years in which each country attained its highest share values. This dataset serves the purpose of calculating the maximum drawdown for each individual country.

Table 4 Descriptive statistics of manipulated and normalized financial data (1999–2018).

Table 4 summarizes the manipulated data obtained after preprocessing and normalization of the raw financial data from the forty markets. The minimum, maximum, and mean values show how share returns vary across markets and years between 1999 and 2018. These descriptive statistics are needed to characterize the shape and range of the input data before feature extraction. Because normalization places all countries on the same scale, the proposed multi-SVM classifier can pick up meaningful patterns in portfolio behavior and volatility movements without being influenced by differences in numerical magnitude.

For classification, the Sharpe, Sterling, and Calmar ratios were calculated and binned into three classes (high, medium, and low) using percentile-based cut-offs (0–33%, 34–66%, and 67–100%). The same binning strategy was applied across all countries and years so that the SVM was trained on consistently defined classes. The data were split 70/15/15 into training, validation, and test sets, and all results were averaged across 5-fold cross-validation to reduce sampling variance and increase statistical reliability.
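
The percentile binning can be illustrated with a rank-based sketch. It assumes that the lowest third of ratio values maps to "low" and the highest third to "high", and that ties are broken by sort order; the text gives the percentile bands but not these details, and the function name is hypothetical.

```python
from typing import List


def tercile_labels(values: List[float]) -> List[str]:
    """Label each value low / medium / high by its rank-based tercile."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [""] * n
    for rank, idx in enumerate(order):
        # Integer comparisons avoid floating-point edge cases at the cut-offs.
        if 3 * rank < n:
            labels[idx] = "low"
        elif 3 * rank < 2 * n:
            labels[idx] = "medium"
        else:
            labels[idx] = "high"
    return labels


# Hypothetical Sharpe ratios for six country-years
labels = tercile_labels([0.1, 0.9, 0.5, 0.3, 0.7, 1.1])
```

Ranking over the pooled country-years, rather than within each country, is what keeps the class boundaries identical for every market.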

Table 5 shows the classifier results by country, whereas Fig. 3 shows these ratios for Australia specifically.

Table 5 Summary of Multi-SVM Classifier Output for 40 Countries.

Table 5 indicates that the classifier can distinguish the performance indicators using the normalized input features. These figures support the use of ratio computation in the model to derive and interpret risk-adjusted performance measures at the country level. A rising Sharpe ratio indicates higher returns per unit of total risk, while rising Sterling and Calmar ratios indicate stronger drawdown resistance and portfolio stability. Taken together, these results support the use of the classifier for modelling multidimensional volatility patterns and nonlinear dependence in international financial markets.
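
For readers who want to see how two of these ratios behave numerically, the sketch below uses common textbook forms: Sharpe as mean excess return over the sample standard deviation, and Calmar as annualized return over maximum drawdown. The exact definitions used for Table 5 are not restated in this section, so these formulas are assumptions; the Sterling ratio is left out because its drawdown adjustment differs between sources.

```python
import statistics
from typing import List


def sharpe_ratio(returns: List[float], risk_free: float = 0.0) -> float:
    """Mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    return statistics.fmean(excess) / statistics.stdev(excess)


def max_drawdown(wealth: List[float]) -> float:
    """Largest relative decline from a running peak of a wealth index."""
    peak, mdd = wealth[0], 0.0
    for w in wealth:
        peak = max(peak, w)
        mdd = max(mdd, (peak - w) / peak)
    return mdd


def calmar_ratio(returns: List[float]) -> float:
    """Annualized compound return divided by maximum drawdown."""
    wealth = [1.0]
    for r in returns:
        wealth.append(wealth[-1] * (1.0 + r))
    annualized = wealth[-1] ** (1.0 / len(returns)) - 1.0
    mdd = max_drawdown(wealth)
    return annualized / mdd if mdd > 0 else float("inf")


# Hypothetical annual returns for one market
annual_returns = [0.10, -0.20, 0.30]
print(sharpe_ratio(annual_returns), calmar_ratio(annual_returns))
```

The Sharpe ratio penalizes all variability, whereas the Calmar ratio penalizes only the single worst peak-to-trough loss, which is why the two can rank the same market differently.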

Figure 3 presents these computed ratios graphically for the case of Australia and shows a clear dependence among the three indicators. The plot illustrates how the proposed multi-SVM classifier relates the drawdown resistance of the portfolio (Calmar ratio) to its risk-return trade-off (Sharpe and Sterling ratios). This visual evidence supports the interpretability of the model and its risk-adjusted optimization across multiple performance dimensions.

Portfolio backtesting and financial performance evaluation

Although the main task of the MSVM classifier is to categorize the data by volatility, its practical value depends on how well it informs portfolio allocation. To assess this, a backtesting experiment was run over 1999–2018 with annual rebalancing based on the predicted ratio classes. Countries were assigned to the High (50% higher weight), Medium (50% neutral weight), and Low (50% less weight) classes. The benchmark strategies were the CAPM portfolio, an Equally Weighted (EQWT) portfolio, and the Three-Factor Model (3FM) of Fama and French. Cumulative returns and maximum drawdown were calculated for each portfolio.
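
One way to read the class-based weighting is as a tilt on an equal-weight base. The sketch below assumes multipliers of 1.5, 1.0, and 0.5 for the High, Medium, and Low classes, followed by renormalization so that the weights sum to one; neither the multipliers nor the renormalization step is spelled out in the text, and the function names are hypothetical.

```python
from typing import Dict

# Assumed multipliers: +50% for high, unchanged for medium, -50% for low.
CLASS_MULTIPLIER = {"high": 1.5, "medium": 1.0, "low": 0.5}


def class_tilted_weights(predicted_class: Dict[str, str]) -> Dict[str, float]:
    """Tilt an equal-weight portfolio by predicted class, then normalize."""
    raw = {c: CLASS_MULTIPLIER[k] for c, k in predicted_class.items()}
    total = sum(raw.values())
    return {c: w / total for c, w in raw.items()}


def portfolio_return(weights: Dict[str, float], returns: Dict[str, float]) -> float:
    """Weighted sum of the country returns for one rebalancing period."""
    return sum(weights[c] * returns[c] for c in weights)


# Hypothetical predictions and returns for one year
weights = class_tilted_weights({"A": "high", "B": "medium", "C": "low"})
year_return = portfolio_return(weights, {"A": 0.10, "B": 0.00, "C": -0.10})
```

Repeating these two calls once per year, with the classes predicted for that year, gives the annual return series from which cumulative return and maximum drawdown can be computed.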

Table 6 Backtesting performance summary (1999–2018).

As reported in Table 6, the proposed MSVM-guided portfolio has the highest cumulative returns and the smallest peak drawdown of all the models compared, which indicates that volatility-based classification increases portfolio resilience. Its Sharpe ratio of 0.75 reflects strong risk-adjusted performance and is consistent with the ability of the model to predict drawdown-sensitive volatility patterns better than mainstream financial models such as CAPM, EQWT, and the Three-Factor Model. These backtesting results demonstrate the economic relevance of the proposed system: the MSVM-driven allocation strategy provides both higher returns and better portfolio stability over the 1999–2018 period.

Figure 4 shows the cumulative returns of the MSVM-guided portfolio and the reference models from 1999 to 2018. The results in Fig. 4 are complemented by the maximum drawdown trajectories shown in Fig. 5. This visualization supports the claim that the proposed MSVM strategy maintained lower drawdowns throughout the period, consistent with the stability and resilience reported in Table 6.

Table 7 Comparison of Sharpe ratio of existing classical.

Table 7 compares the Sharpe ratios achieved by the Capital Asset Pricing Model (CAPM), the Equally Weighted Portfolio (EQWT), and the Three-Factor Model (3FM) with that of the proposed multi-SVM-based model. The proposed methodology has the highest Sharpe ratio, 0.75, indicating the best balance between portfolio risk and return among the models compared. These results highlight the improved predictive accuracy and optimization efficiency of the proposed system relative to the traditional models.

Figure 6 compares the Sharpe ratios of the proposed model and the existing conventional models^34,35 graphically. As the graph shows, the proposed approach achieved the highest Sharpe ratio, 0.75. EQWT followed with a ratio of 0.54, and 3FM had the lowest ratio, 0.20.

The effectiveness of the classifier was assessed by its accuracy, sensitivity, and specificity, which were 97.5%, 100%, and 65.78%, respectively. Table 8 compares the accuracy of the classifier with that of the existing SVM and naive Bayes methods.
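
The three measures follow directly from confusion-matrix counts, as the sketch below shows for a two-class case. The counts are hypothetical and are not the study's data; for a three-class classifier, sensitivity and specificity would be computed one class against the rest, which is an assumption about how the reported values were obtained.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity (recall), and specificity from raw counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of actual positives found
        "specificity": tn / (tn + fp),  # share of actual negatives found
    }


# Hypothetical counts for illustration only
metrics = binary_metrics(tp=40, fp=5, tn=45, fn=10)
```

A sensitivity of 100% alongside a much lower specificity, as reported above, means that no positive case was missed while a share of negative cases was labelled positive.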

Table 8 Accuracy obtained using proposed and existing method.

Additional classification performance metrics

Beyond accuracy, sensitivity, and specificity, the predictive ability of the classifier was further evaluated using the confusion matrix, precision, recall, F1-score, and the macro-averaged AUC. These measures give a more dependable and reproducible assessment, in line with standard machine-learning evaluation guidelines. Table 9 presents the confusion matrix of the proposed MSVM classifier, which shows the class-wise structure of its predictions.

Table 9 Confusion matrix of proposed multi-SVM classifier.

Table 10 Additional evaluation metrics.

These findings support the usefulness of the proposed system, which predicts all three volatility-based ratio classes well. The macro-averaged precision, recall, F1-score, and AUC are all above 0.90, as shown in Table 10, reflecting balanced and stable performance across the High, Medium, and Low categories. These supplementary measures give a fuller picture of classification quality than accuracy, sensitivity, and specificity alone.
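
Macro-averaging can be sketched from a multi-class confusion matrix as follows. The matrix is hypothetical (rows are true classes, columns are predicted classes) and does not reproduce Table 9; macro F1 is taken here as the mean of the per-class F1 scores, which is one of two conventions in use. AUC is not shown because it requires classifier scores rather than counts.

```python
from typing import Dict, List


def macro_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Macro-averaged precision, recall, and F1 from a square confusion matrix."""
    k = len(cm)
    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(k))  # column sum
        actual = sum(cm[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


# Hypothetical 3-class matrix in the order High, Medium, Low
scores = macro_metrics([[8, 1, 1], [2, 7, 1], [0, 2, 8]])
```

Because every class contributes equally to a macro average, a weak result on one class lowers the score even when that class is rare.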

Robustness and statistical significance analysis

A robustness analysis was carried out to confirm that the predictive capability of the proposed MSVM classifier is stable and not driven by arbitrary fluctuations arising from the data split. The model was evaluated with 5-fold cross-validation over 50 independent evaluations. Across these runs, the MSVM achieved a mean accuracy of 97.5%, a standard deviation of 0.82, and a 95% confidence interval of [96.9%, 98.2%], indicating that the results were highly stable and varied little around that level of performance.

To determine whether the observed improvement over the baseline classifiers is statistically significant, paired significance tests were conducted between the MSVM and the conventional SVM and Naive Bayes on the same 50 cross-validation folds. For MSVM vs. SVM, the paired t-test gave t(49) = 5.12, p < 0.001, and Cohen’s d = 0.72, a medium-to-large practical effect. For MSVM vs. Naive Bayes, the test gave t(49) = 8.47, p < 0.001, and Cohen’s d = 1.20, a very large effect size. All tests were two-tailed. These findings confirm that the advantage of the proposed MSVM framework does not depend on random sampling variation and is statistically significant relative to the established baseline models.
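
The paired statistics can be reproduced from per-fold accuracies with the sketch below. It assumes Cohen's d is computed on the paired differences (mean difference divided by the standard deviation of the differences), in which case t = d·√n. That assumption is consistent with the reported figures: 5.12/√50 ≈ 0.72 and 8.47/√50 ≈ 1.20. The function name and the example accuracies are hypothetical, and the p-value is omitted because it needs the t distribution, which the standard library does not provide.

```python
import math
import statistics
from typing import List, Tuple


def paired_t_and_cohens_d(a: List[float], b: List[float]) -> Tuple[float, float, int]:
    """Return (t statistic, Cohen's d on paired differences, degrees of freedom)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample std; zero would make d undefined
    d = mean_diff / sd_diff
    t = d * math.sqrt(n)
    return t, d, n - 1


# Hypothetical per-fold accuracies for two classifiers on the same folds
msvm = [0.98, 0.97, 0.99, 0.96]
svm = [0.95, 0.96, 0.97, 0.95]
t_stat, effect, dof = paired_t_and_cohens_d(msvm, svm)
```

Pairing by fold removes the variation that both classifiers share on an easy or hard split, so the test responds only to the difference between them.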

The consistent accuracy trends shown in Fig. 7 are also in line with the favorable performance of the MSVM classifier over the traditional SVM and Naive Bayes models in all repeated validation runs. The margin attained by the MSVM is attributed to its improved capacity to generalize across heterogeneous international markets, which comes from the multi-classifier fusion process, the nonlinear RBF kernel mapping, and the incorporation of volatility-sensitive features. Overall, the results confirm the robustness, consistency, and practical applicability of the proposed method to real-world portfolio risk prediction and risk optimization.