This study combines several metaheuristic optimization algorithms with a pre-trained and fine-tuned VGG-16 to enhance classification performance in the multi-class diagnosis of cervical cancer and lymphoma. The suggested framework improves detection precision without necessitating the development of new network architectures. To ensure the effectiveness and robustness of the methodology, extensive testing is conducted across various cancer image datasets. This section discusses the framework’s typical procedure, comprising three main steps: (i) data preprocessing, (ii) metaheuristic optimization and VGG-16 fine-tuning, and (iii) cancer classification.
Datasets
-
(I)
Datasets description
This study evaluates the proposed framework using two primary oncological datasets: Cervical Cancer (SIPaKMeD)30,32 and Lymphoma31,33, both sourced from the “Multi Cancer Dataset” repository.
-
Cervical cancer dataset: Originally consisting of 966 high-resolution images across five subclasses (Dyskeratotic, Koilocytotic, Metaplastic, Parabasal, and Superficial-Intermediate), the data was expanded via augmentation to 25,000 images to enhance model robustness30.
-
Lymphoma dataset: This set initially comprised 966 images across three subclasses (Chronic Lymphocytic Leukemia, Follicular Lymphoma, and Mantle Cell Lymphoma). Following identical augmentation procedures, the dataset was expanded to 15,000 images31.
Tables 3 and 4 summarize the technical specifications and the final distribution of the training, validation, and testing sets used in our experiments.
-
(a)
Cervical cancer dataset.
Cervical cancer is the fourth most common cancer among women worldwide, and detection of cervical cancer cells plays a very important role in clinical practice. The cervical cancer images were obtained from the Prahlad Mehandiratta dataset32. It consists of 966 images in 5 subclasses: 223 Dyskeratotic, 238 Koilocytotic, 271 Metaplastic, 108 Parabasal, and 126 Superficial-Intermediate. The images were then augmented using rotation up to 10 degrees, width and height shifts up to 10% of the image size, shearing and zooming with 10% variation, brightness adjustment ranging from 0.2 to 1.2 to simulate varying light conditions, and random horizontal flipping30. Samples of cervical cancer images are shown in Fig. 1.

Samples of cervical cancer images.
-
(b)
Lymphoma dataset
The Lymphoma dataset consists of 966 images in 3 subclasses: 113 Chronic Lymphocytic Leukemia, 139 Follicular Lymphoma, and 122 Mantle Cell Lymphoma33. The images were then augmented using rotation up to 10 degrees, width and height shifts up to 10% of the image size, shearing and zooming with 10% variation, brightness adjustment ranging from 0.2 to 1.2 to simulate varying light conditions, and random horizontal flipping31. Samples of lymphoma images are shown in Fig. 2.

Samples of Lymphoma images.
-
(II)
Datasets preprocessing
To prevent overfitting and improve generalization, we applied a standardized augmentation pipeline including 10° rotations, 10% width/height shifts, zooming, and brightness adjustments (0.2 to 1.2). Following augmentation, a two-step preprocessing pipeline was applied:
-
1.
Spatial Resizing: All images were resized to a uniform 224 × 224 pixels to meet the architectural requirements of the VGG-16 model.
-
2.
Normalization: Pixel intensities were rescaled from the standard [0, 255] range to [0, 1]. This step ensures numerical stability and accelerates gradient convergence during the optimization process.
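The two preprocessing steps can be illustrated with a minimal NumPy sketch (the function name and the nearest-neighbour resampling are illustrative choices; in practice a framework utility such as a Keras image loader would perform the resizing):

```python
import numpy as np

def preprocess_image(image, target_size=(224, 224)):
    """Resize to target_size via nearest-neighbour sampling, then
    rescale pixel intensities from [0, 255] to [0, 1]."""
    h, w = image.shape[:2]
    rows = np.arange(target_size[0]) * h // target_size[0]
    cols = np.arange(target_size[1]) * w // target_size[1]
    resized = image[rows][:, cols]             # spatial resizing
    return resized.astype(np.float32) / 255.0  # normalization

# Example: a synthetic 300 x 400 RGB image
img = np.random.randint(0, 256, (300, 400, 3), dtype=np.uint8)
out = preprocess_image(img)
print(out.shape)  # (224, 224, 3)
```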
Methodology for classification
This study employed transfer learning with both pre-trained and fine-tuned VGG-16 models to enhance performance and computational efficiency. Initially, the pre-trained model functioned as a feature extractor by freezing its convolutional layers. The original fully connected layers were then replaced with custom dense layers tailored to the target classification task. Furthermore, to ensure the model’s generalization and effective convergence, hyperparameter optimization was employed to determine the best learning rate, optimizer type, batch size, and dropout rate. Several optimizers, including Adam, RMSprop, and SGD, were systematically evaluated with different learning rate settings to ensure the stability of the training process. After this first stage, the upper convolutional layers of the network were unfrozen and retrained with the best hyperparameters found by the metaheuristic optimizers on the pre-trained VGG-16, further enhancing the performance of the proposed model. This combined method of feature extraction, hyperparameter optimization, and fine-tuning allowed the VGG-16 model to fit the target dataset well, making the training phase accurate, more generalized, and less costly. The VGG-16 architecture, developed by the Visual Geometry Group (VGG) at the University of Oxford, is a deep convolutional neural network known for its simplicity, consistent design, and strong performance across many computer vision tasks. As Fig. 3 shows, the VGG-16 architecture has 16 trainable layers: 13 convolutional layers and 3 fully connected layers. The entire network uses small (3 × 3) convolution kernels and (2 × 2) max-pooling layers throughout.
Because of this uniform structure34, VGG-16 can progressively extract abstract and discriminative features while keeping the architecture clear.

Structure of VGG-16 model.
Hyperparameter encoding and fitness function
To interface the metaheuristic algorithms with the VGG-16 architecture, each search agent (such as a whale in WOA or a particle in PSO) is assigned a position vector that represents a specific hyperparameter configuration. This configuration includes both continuous variables, such as the learning rate and dropout rate, and discrete variables, such as the batch size and the choice of optimizer (e.g., Adam, SGD, RMSprop). For discrete variables, the continuous output of the optimization algorithms is mapped using nearest-integer rounding to select the appropriate categorical value. To evaluate the quality of each search agent’s proposed configuration, a fitness evaluation mechanism is established. In this framework, the primary objective is to maximize the diagnostic capability of the VGG-16 model. Therefore, the fitness score assigned to each agent is the Validation Accuracy achieved after training the VGG-16 model for a defined number of epochs using that agent’s specific hyperparameter set. During each iteration, the metaheuristic optimizer records these fitness scores and updates the agents’ positions, systematically guiding the population toward the hyperparameter configuration that yields the highest validation accuracy without overfitting.
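A minimal sketch of this encoding scheme is shown below (the candidate batch sizes and the stand-in training routine are illustrative assumptions, not the study’s exact values):

```python
# Candidate discrete values (illustrative, not the study's exact grid)
BATCH_SIZES = [16, 32, 64]
OPTIMIZERS = ["Adam", "SGD", "RMSprop"]

def decode(position):
    """Map an agent's continuous position vector to a concrete
    hyperparameter configuration; discrete variables are selected
    via nearest-integer rounding."""
    lr, dropout, bs, opt = position
    return {
        "learning_rate": lr,
        "dropout_rate": dropout,
        "batch_size": BATCH_SIZES[int(round(bs))],
        "optimizer": OPTIMIZERS[int(round(opt))],
    }

def fitness(position, train_and_validate):
    """Fitness = validation accuracy of VGG-16 trained with this
    configuration (train_and_validate stands in for the real run)."""
    return train_and_validate(decode(position))

cfg = decode([1e-3, 0.3, 1.2, 0.4])
print(cfg["batch_size"], cfg["optimizer"])  # 32 Adam
```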
Optimization algorithms
Metaheuristic optimization techniques, including the Whale Optimization Algorithm (WOA), Grey Wolf Optimizer (GWO), Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Ant Colony Optimization (ACO), and Modified Particle Swarm Optimization (MPSO), are employed here to optimize the hyperparameter selection of the pre-trained and fine-tuned VGG-16 model. These techniques were chosen for their proven effectiveness in addressing complex, high-dimensional problems in many fields, including medical image analysis. PSO is recognized for its rapid convergence, making it highly suitable for swiftly identifying near-optimal solutions within extensive search spaces. GA’s genetic operators provide comprehensive search capabilities, enabling the exploration of large solution spaces characterized by intricate cancer features. GWO sustains an equilibrium between exploration and exploitation, which is essential for preventing convergence to local optima during the training process. ACO mimics pheromone signaling to map out optimal trajectories within the search space, while WOA is favored for its bubble-net technique, which robustly handles high-dimensional complexity. MPSO introduces structural modifications designed to sustain diversity and prevent the search process from stagnating prematurely. These algorithms systematically explore the solution space to identify the optimal configuration for the deep learning models, enhancing classification accuracy while reducing computational redundancy. The proposed system applies metaheuristic optimization with VGG-16 to the cervical cancer and lymphoma datasets, thereby obviating the need for manual parameter tuning or algorithm adjustments.
Within the realm of medical imaging and computer-aided diagnosis, optimization algorithms serve as an effective approach for refining the parameters of deep Convolutional Neural Networks (CNNs). Researchers aim to enhance the performance of the VGG-16 model by autonomously tuning essential hyperparameters to address the heterogeneity within cancer classes and the similarities across different classes. Table 5 summarizes the comparison of the metaheuristic optimization algorithms used in the proposed framework.
-
(I)
Whale Optimization Algorithm (WOA)
The Whale Optimization Algorithm (WOA) emulates the social behavior and foraging methodology of humpback whales, particularly their “bubble-net” feeding tactic of encircling prey and generating spiral bubble nets to capture it35. The algorithm begins by designating the current best candidate solution as the target prey, on the assumption that it represents the position closest to the true optimum36. The remaining search agents, represented as whales, then update their positions relative to the best search agent. The encircling behavior is mathematically represented by Eqs. (1–5)35,36:
$$D=\left| {~C~.~{X^*}\left( t \right) – X\left( t \right)} \right|$$
(1)
$$X\left( {t+1} \right)={X^*}\left( t \right) – A \cdot D$$
(2)
Where t is the iteration number, \({X^*}\) denotes the position vector of the best solution obtained so far, X is the search agent’s position vector, and A and C are the coefficient vectors calculated as in Eqs. (3–4):
$$A=2a \cdot r – a$$
(3)
$$C=2 \cdot r$$
(4)
Where a descends linearly from 2 to 0 across the iterations to balance exploration and exploitation, and r is a random vector in the range [0, 1]. The algorithm uses a spiral equation to model the helix-shaped path the whales follow as they close in on their prey during the bubble-net attack. This behavior is represented by Eq. (5).
$$X\left( {t+1} \right)=D^{\prime}.~{e^{bl}}.\cos \left( {2\pi l} \right)+{X^*}\left( t \right)$$
(5)
Where \(D^{\prime}=~\left| {~{X^*}\left( t \right) – X\left( t \right)} \right|\) denotes the distance between the whale and the prey, b characterizes the shape of the logarithmic spiral, and l is a random variable within the interval [-1, 1]. The algorithm presumes a 50% likelihood of alternating between the shrinking encircling mechanism and the spiral model to update the whales’ positions.
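The WOA position update can be sketched for a single scalar dimension as follows (the parameter default b = 1 and the helper name are illustrative, not the study’s implementation):

```python
import math
import random

def woa_update(X, X_best, t, T, b=1.0):
    """One WOA position update for a scalar dimension."""
    a = 2 * (1 - t / T)                 # a descends from 2 to 0
    A = 2 * a * random.random() - a     # coefficient A
    C = 2 * random.random()             # coefficient C
    if random.random() < 0.5:           # shrinking encircling, Eqs. (1-2)
        D = abs(C * X_best - X)
        return X_best - A * D
    l = random.uniform(-1.0, 1.0)       # spiral bubble-net attack, Eq. (5)
    return abs(X_best - X) * math.exp(b * l) * math.cos(2 * math.pi * l) + X_best

random.seed(0)
print(woa_update(0.5, 1.0, t=1, T=3))
```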
-
(II)
Grey Wolf Optimizer (GWO)
The Grey Wolf Optimizer (GWO) is a metaheuristic optimization algorithm founded on the social hierarchy and hunting strategies of grey wolves. It utilizes the leadership hierarchy of the pack to systematically search complex spaces for the best solutions. In the field of medical image classification, GWO is a promising way to improve the accuracy and efficiency of computer vision applications by optimizing their parameters and settings. The GWO has three main steps: initialization, updating, and selection. In the initialization phase, a random set of solutions is generated in the search space, each standing for one wolf in the pack. A grey wolf’s position (or solution) can be described mathematically as \({x_i}=\left( {{x_{i1}},{x_{i2}},{\text{~}}{x_{i3}}, \ldots ,{x_{iD}}} \right)\), where i = 1, 2, …, N (N is the total number of wolves, or population size) and D is the problem dimension37. The prey represents the best solution the algorithm seeks, where \(\alpha\) is the fittest solution, \(\beta\) the second-best solution, \(\delta\) the third-best solution, and \(\varvec{\omega}\) the remaining solutions. Eqs. (6–12)38 mathematically represent the social hierarchy of grey wolves. The first step occurs when the grey wolves encircle their prey38.
$$\vec {D}=\left| {\vec {C} \cdot \overrightarrow {{P_t}} \left( i \right) – ~\vec {P}\left( i \right)} \right|$$
(6)
$$\vec {P}\left( {i+1} \right)=\overrightarrow {{P_t}} \left( i \right) – \vec {D} \cdot \vec {A}$$
(7)
Where \(\vec {A}\) and \(\vec {C}\) are coefficient vectors, \({P_t}\) is the position of the target prey, \(\vec {P}\) is the grey wolf position vector, and i is the iteration number. \(\vec {C}\) and \(\vec {A}\) can be mathematically described as38:
$$\vec {A}=2\vec {a} \cdot \overrightarrow {{r_1}} – ~\vec {a}{\text{~}},{\text{~}}\vec {a}=2\left( {1 - \frac{i}{I}} \right)$$
(8)
$$\vec {C}=2 \cdot \overrightarrow {{r_2}}$$
(9)
Where \(\vec {a}\) is linearly reduced from 2 to 0 over the iterations, i is the iteration number, I is the maximum number of iterations, and \(\overrightarrow {{r_1}}\) and \(\overrightarrow {{r_2}}\) are random vectors within the range [0, 1].
$$\:\overrightarrow{{D}_{\alpha\:}}=\:\left|\overrightarrow{{C}_{1}}\cdot\:\overrightarrow{{P}_{\alpha\:}}-\:\overrightarrow{P}\right|,\:\overrightarrow{{D}_{\beta\:}}=\:\left|\overrightarrow{{C}_{2}}\cdot\:\overrightarrow{{P}_{\beta\:}}-\:\overrightarrow{P}\right|,\:\overrightarrow{{D}_{\delta\:}}=\:\left|\overrightarrow{{C}_{3}}\cdot\:\overrightarrow{{P}_{\delta\:}}-\:\overrightarrow{P}\right|$$
(10)
$$\:\overrightarrow{{P}_{1}}=\overrightarrow{{P}_{\alpha\:}}-\overrightarrow{{A}_{1}}\cdot\:\overrightarrow{{D}_{\alpha\:}}\:,\:\overrightarrow{{P}_{2}}=\overrightarrow{{P}_{\beta\:}}-\overrightarrow{{A}_{2}}\cdot\:\overrightarrow{{D}_{\beta\:}},\:\overrightarrow{{P}_{3}}=\overrightarrow{{P}_{\delta\:}}-\overrightarrow{{A}_{3}}\cdot\:\overrightarrow{{D}_{\delta\:}}$$
(11)
$$\:\overrightarrow{P}\left(i+1\right)=\:\frac{\overrightarrow{{P}_{1}}+\overrightarrow{{P}_{2}}+\overrightarrow{{P}_{3}}}{3}$$
(12)
Figure 4 represents the social hierarchy and hunting strategies of grey wolves.

The Grey Wolf’s behavior of hunting.
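Eqs. (6–12) can be condensed into a short sketch for one scalar dimension (the helper function is hypothetical, not the study’s implementation):

```python
import random

def gwo_update(P, alpha, beta, delta, i, I):
    """One GWO position update guided by the alpha, beta, and delta wolves."""
    a = 2 * (1 - i / I)                  # decays linearly from 2 to 0, Eq. (8)
    new_pos = 0.0
    for leader in (alpha, beta, delta):
        A = 2 * a * random.random() - a  # Eq. (8)
        C = 2 * random.random()          # Eq. (9)
        D = abs(C * leader - P)          # Eq. (10)
        new_pos += leader - A * D        # Eq. (11)
    return new_pos / 3.0                 # Eq. (12): average of the three pulls

# At the final iteration a = 0, so the wolf moves to the leaders' mean
print(gwo_update(0.0, 1.0, 1.0, 1.0, i=10, I=10))  # 1.0
```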
-
(III)
Particle Swarm Optimization (PSO)
Particle Swarm Optimization (PSO) is a popular metaheuristic optimization method inspired by the collective movement of bird flocks and fish schools. PSO utilizes a group of candidate solutions, called particles, each with its own position and velocity, to search a multidimensional space. The basic idea is that these particles gradually improve, based on their own experience and that of their neighbors, until they converge on the best solution39. In the mathematical model of PSO, Eq. (13) updates the velocity and Eq. (14) updates the position40.
$$\:{\vartheta\:}_{i,d}\left(t+1\right)=\:\omega\:\times\:{\vartheta\:}_{i,d}\left(t\right)+\:{c}_{1}\times\:{r}_{1}\times\:\left({{P}_{b}}_{i,d}-{P}_{i,d}\left(t\right)\right)+{c}_{2}\times\:{r}_{2}\times\:\left({{G}_{b}}_{d}-{P}_{i,d}\left(t\right)\right)$$
(13)
$$\:{P}_{i,d}\left(t+1\right)={P}_{i,d}\left(t\right)+\:{\vartheta\:}_{i,d}\left(t+1\right)\:$$
(14)
Where \(\:{c}_{1}\) and \(\:{c}_{2}\) are the acceleration constants, \(\:\omega\:\) is the inertia weight balancing local and global exploration, \(\:{\vartheta\:}_{i,d}\) is the velocity of particle i in dimension d, \(\:{r}_{1}\) and \(\:{r}_{2}\) are random numbers in the range (0, 1), \(\:{P}_{i,d}\) is the position of particle i in dimension d, \(\:{{P}_{b}}_{i,d}\) is the particle’s best-known solution, and \(\:{{G}_{b}}_{d}\) is the global best solution.
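Eqs. (13)–(14) translate directly into code; the sketch below updates one particle in one dimension (the default coefficient values are illustrative):

```python
import random

def pso_step(p, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5):
    """One PSO update: Eq. (13) for the velocity, Eq. (14) for the position."""
    r1, r2 = random.random(), random.random()
    v_new = w * v + c1 * r1 * (p_best - p) + c2 * r2 * (g_best - p)
    return p + v_new, v_new

# A particle already at both best positions just coasts on its inertia
print(pso_step(1.0, 2.0, p_best=1.0, g_best=1.0, w=0.5))  # (2.0, 1.0)
```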
-
(IV)
Genetic Algorithm (GA)
Genetic Algorithm (GA) is an effective method for identifying optimal solutions, based on genetics and the principle of natural selection. GA mimics the processes of evolution to search complex solution spaces quickly41. The algorithm starts with a set of candidate solutions and uses an objective function to evaluate their performance. The GA framework includes chromosome representation, selection, crossover, mutation, and fitness evaluation. The GA method starts by generating a population of n chromosomes. The fitness of each chromosome is evaluated, and based on the fitness scores two chromosomes are chosen. These chosen chromosomes are crossed over at one point to produce offspring, which then undergo a uniform mutation process; the resulting offspring are added to the population. The operations of selection, crossover, and mutation continue until the new population is complete. GA adapts its search based on the individuals it examines, which lets it find multiple optimal solutions, and it maintains population diversity by replacing the original schema with new variants. Eqs. (15–17) express the main GA operators mathematically42.
$$\:{S}_{p}\left(x\right)=\frac{f\left(x\right)}{\sum\:_{i=1}^{N}f\left({x}_{i}\right)}$$
(15)
Where \(\:{S}_{p}\) is the selection probability, \(\:N\) is the population size, \(\:f\left(x\right)\) is the fitness function, and \(\:x\) is the individual solution.
$$\:{x}_{offspring}=concat({x}_{\text{1,1}:k},\:{x}_{2,k+1:N})$$
(16)
Where \(\:k\) is the crossover point and \(\:N\) is the length of the solution.
$$\:{x}_{mutated}=\:\left\{\begin{array}{c}{x}_{offspring}\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:with\:probability\:1-{p}_{m}\\\:mutate\left({x}_{offspring}\right)\:\:\:\:\:\:\:\:\:\:\:\:\:\:\:with\:probability\:{p}_{m}\end{array}\right.$$
(17)
Where \(\:{x}_{offspring}\) is the offspring and \(\:{p}_{m}\) is the mutation probability, set before running the genetic algorithm.
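The three operators in Eqs. (15–17) can be sketched as follows (the gene bounds and mutation rate are illustrative defaults):

```python
import random

def select(population, fitnesses):
    """Roulette-wheel selection proportional to fitness, Eq. (15)."""
    pick = random.uniform(0, sum(fitnesses))
    acc = 0.0
    for individual, f in zip(population, fitnesses):
        acc += f
        if acc >= pick:
            return individual
    return population[-1]

def crossover(x1, x2, k):
    """One-point crossover at index k, Eq. (16)."""
    return x1[:k] + x2[k:]

def mutate(x, p_m=0.1, bounds=(0.0, 1.0)):
    """Uniform mutation: each gene is resampled with probability p_m, Eq. (17)."""
    return [random.uniform(*bounds) if random.random() < p_m else g for g in x]

print(crossover([1, 2, 3], [4, 5, 6], k=1))  # [1, 5, 6]
```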
-
(V)
Ant Colony Optimization (ACO)
The Ant Colony Optimization (ACO) algorithm is a probabilistic method based on the foraging behavior of ant colonies, in particular the way ants find the shortest path between their nest and a food source43. The method uses “pheromones”, chemical signals that ants release to guide other ants in the colony. In the computational model, a group of artificial ants moves through a graph representing the states of the problem to construct solutions. Two primary factors affect the probability of an ant (\(\:k\)) moving from node (n) to node (m): the pheromone concentration on the edge, \(\:{\tau\:}_{nm}\), and a heuristic value, \(\:{\aleph\:}_{nm}\), that determines the desirability of the move (often defined as the inverse of the distance). The transition probability is given by Eq. (18)43,44.
$$\:{P}_{nm}^{k}=\frac{{\left({\tau\:}_{nm}\right)}^{\alpha\:}{\left({\aleph\:}_{nm}\right)}^{\beta\:}}{\sum\:_{l\in\:{N}_{n}^{k}}{\left({\tau\:}_{nl}\right)}^{\alpha\:}{\left({\aleph\:}_{nl}\right)}^{\beta\:}}\:,\:\text{i}\text{f}\:m\in\:\:{N}_{n}^{k}\:$$
(18)
Here, \(\:{N}_{n}^{k}\) is the set of nodes that ant k can move to from node n. The parameters \(\:\alpha\:\) and \(\:\beta\:\) weight the importance of the pheromone trail relative to the heuristic information. After all the ants have constructed their solutions, the pheromone trails are updated to reflect the quality of those solutions. This update has two phases: evaporation, which stops pheromones from building up excessively, and deposition, in which ants leave new pheromones along the paths they have traveled. Equation (19) describes the global pheromone update rule43.
$$\:{\tau\:}_{nm}\left(t+1\right)=\left(1-\:\epsilon\:\:\right)\:\cdot\:\:{\tau\:}_{nm}\left(t\right)+\:\sum\:_{k=1}^{i}{\varDelta\:\tau\:}_{nm}^{k}\:$$
(19)
Where \(\:\epsilon\:\) denotes the evaporation rate (0 < \(\:\epsilon\:\) < 1), and \(\:{\varDelta\:\tau\:}_{nm}^{k}\) denotes the amount of pheromone deposited by the kth ant, typically proportional to the quality of the solution it constructed. This feedback mechanism ensures that, over time, the colony converges toward the optimal route.
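Eqs. (18)–(19) can be sketched as two small functions (the dictionary-based graph representation is an illustrative choice):

```python
def transition_probs(tau, eta, feasible, alpha=1.0, beta=2.0):
    """Transition probabilities from the current node, Eq. (18).
    tau / eta map each candidate node to its pheromone / heuristic value."""
    weights = {m: (tau[m] ** alpha) * (eta[m] ** beta) for m in feasible}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

def update_pheromone(tau, deposits, evaporation=0.1):
    """Global pheromone update, Eq. (19): evaporation, then deposition."""
    return {edge: (1 - evaporation) * tau[edge] + deposits.get(edge, 0.0)
            for edge in tau}

# Equal pheromone and heuristic values give equal probabilities
print(transition_probs({"a": 1.0, "b": 1.0}, {"a": 1.0, "b": 1.0}, ["a", "b"]))
```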
-
(VI)
Modified Particle Swarm Optimization (MPSO)
Modified Particle Swarm Optimization (MPSO) improves upon the standard PSO framework to mitigate premature convergence and stagnation at local optima, which are common challenges in high-dimensional optimization. Whereas conventional PSO uses static parameters, MPSO enhances search efficiency through an adaptive inertia weight \(\:\omega\:\) and an acceleration factor \(\:\alpha\:\). The adaptive inertia weight diminishes linearly over the run, shifting the emphasis from global exploration in the initial phases to local exploitation in the later stages. The acceleration factor actively pushes particles toward the best global position based on their own history45. This modification streamlines the search process and reduces the time needed to find the best hyperparameters for deep learning models. Equation (20) defines the update rule in MPSO45.
$$\:{\vartheta\:}_{i}^{t+1}=\:\omega\:\left(t\right)\cdot\:\:{\vartheta\:}_{i}^{t}+\:{c}_{1}\cdot\:{r}_{1}\cdot\:({P}_{best,i}-{x}_{i}^{t}{)+c}_{2}\cdot\:{r}_{2}\cdot\:\left({G}_{best}-{x}_{i}^{t}\right)-\:\alpha\:({G}_{best}-\:{P}_{best,i})$$
(20)
Where \(\:{c}_{1}\) and \(\:{c}_{2}\) are the acceleration coefficients, \(\:{r}_{1}\) and \(\:{r}_{2}\) are random vectors in the range [0, 1], \(\:\alpha\:\) is the acceleration factor, \(\:{G}_{best}\) is the global best solution, and \(\:{P}_{best,i}\) is the local best solution. The updated position \(\:{x}_{i}^{t+1}\) for particle i is given by Eq. (21).
$$\:{x}_{i}^{t+1}={x}_{i}^{t}+{\vartheta\:}_{i}^{t+1}$$
(21)
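Eqs. (20)–(21) can be sketched for a single dimension (the inertia bounds and coefficient defaults are illustrative assumptions):

```python
import random

def mpso_step(x, v, p_best, g_best, t, T,
              w_max=0.9, w_min=0.4, c1=1.5, c2=1.5, accel=0.5):
    """One MPSO update: linearly decaying inertia plus an acceleration
    term pulling toward the global best, Eqs. (20)-(21)."""
    w = w_max - (w_max - w_min) * t / T   # adaptive inertia weight
    r1, r2 = random.random(), random.random()
    v_new = (w * v
             + c1 * r1 * (p_best - x)
             + c2 * r2 * (g_best - x)
             - accel * (g_best - p_best))
    return x + v_new, v_new

# At t = 0 the full inertia w_max applies; a converged particle just coasts
print(mpso_step(1.0, 2.0, p_best=1.0, g_best=1.0, t=0, T=10))
```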
Integration workflow of the dual-strategy framework
The integration of VGG-16 with the metaheuristic algorithms operates through a systematic, nested loop architecture. The operational workflow proceeds as follows:
Step 1: Initialization: The target dataset (Cervical or Lymphoma) is preprocessed, augmented, and divided into training, validation, and test subsets.
Step 2: Population Setup: A population of N = 5 search agents is initialized randomly within the predefined hyperparameter bounds.
Step 3: Pre-trained Optimization Phase: For a maximum of T = 3 iterations, each agent’s hyperparameter set is injected into a VGG-16 model (with frozen convolutional bases). The model is trained on the training set, and the validation accuracy is returned to the optimizer as the fitness score.
Step 4: Metaheuristic Update: The chosen optimization algorithm (e.g., WOA, GWO) updates the agents’ positions based on the highest recorded fitness score.
Step 5: Fine-tuning Phase: Once the global best hyperparameter configuration is identified, the upper convolutional layers of the VGG-16 are unfrozen. The model is then fully retrained (fine-tuned) using these optimal hyperparameters to adapt the feature extraction process specifically to the oncological dataset.
Step 6: Final Evaluation: The final optimized model is evaluated against the unseen test dataset to generate the ultimate accuracy, precision, recall, and specificity metrics.
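Steps 2–4 above form a nested loop that can be sketched as follows (the toy fitness function stands in for a full VGG-16 training run, and the generic `update` hook stands in for any of the algorithms above):

```python
import random

def optimize(fitness, bounds, n_agents=5, n_iter=3, update=None):
    """Outer loop: metaheuristic iterations; inner loop: one fitness
    evaluation (a short VGG-16 training run) per search agent."""
    agents = [[random.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(n_agents)]                  # Step 2
    best, best_fit = None, float("-inf")
    for t in range(n_iter):                              # Step 3
        for agent in agents:
            f = fitness(agent)                           # validation accuracy
            if f > best_fit:
                best, best_fit = list(agent), f
        if update is not None:                           # Step 4
            agents = [update(a, best, t, n_iter) for a in agents]
    return best, best_fit

random.seed(0)
best, score = optimize(lambda a: -(a[0] - 0.5) ** 2, bounds=[(0.0, 1.0)])
print(round(best[0], 3))
```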
Computational complexity analysis
The overall computational complexity of the proposed framework is governed by two primary factors: the execution time of the metaheuristic algorithm and the evaluation time of the VGG-16 fitness function. Let N represent the population size (number of search agents), T denote the maximum number of iterations, and D represent the dimension of the search space, i.e., the number of hyperparameters being optimized (four in this study: learning rate, dropout rate, batch size, and optimizer type).
The computational time for initializing the population is O(N×D). During the optimization phase, updating the positions of the search agents across algorithms like WOA, GWO, and PSO requires O(T×N×D). However, evaluating the fitness of each agent requires training the pre-trained VGG-16 model for a set number of epochs. If the computational cost of training the CNN is denoted as O(CCNN), the total complexity of the proposed framework can be formulated as:
$${\text{O}}\left( {{\text{Total}}} \right)\,=\,{\text{O}}({\text{N}} \times {\text{D}})\,+\,{\text{O}}({\text{T}} \times {\text{N}} \times ({\text{D}}\,+\,{{\text{C}}_{{\text{CNN}}}}))$$
Because the time required to train a deep neural network is vastly greater than the time required to update the metaheuristic positions (CCNN ≫ D), the asymptotic computational complexity simplifies to O(T×N×CCNN). By deliberately restricting the population size (N = 5) and the number of iterations (T = 3) during the hyperparameter search phase, the proposed framework keeps the T×N multiplier small. This ensures that the computational overhead remains manageable while still identifying a strong hyperparameter configuration for the final fine-tuning phase.
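Under these settings, the number of costly training runs is easy to count:

```python
# Back-of-the-envelope count of VGG-16 training runs, per the analysis above
N, T = 5, 3                     # population size and iterations
search_runs = N * T             # short training runs during the search phase
total_runs = search_runs + 1    # plus one full fine-tuning run at the end
print(search_runs, total_runs)  # 15 16
```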
