Dataset, hardware and software
In this study, the Breast Ultrasound Images Dataset medical data sets (BUSI)38 is used, and it is publicly available for academic and research purposes through open-access repositories. The images were collected in 2018 from 600 female patients. The dataset holds benign 487 records, malignant 210 and normal 133, respectively. The images are in PNG format with a resolution of 500 × 500 pixels. The advantage of BUSI is that each image is followed by a binary mask of the lesion’s ground truth. In the proposed study, only benign and malignant classes were used because the HFs are derived from texture and binary mask. The AutoMLs were supplied a dataset splited into 70% (487 samples/images) for train, and 30% (210 samples/images) for testing.
The hardware environment had the following architecture: Processor Intel(R) Core (TM) i7-1065G7 CPU @1.30–1.50 GHz, RAM 16.0 GB, and Windows 11 operating system, 64-bit, × 64-based processor. The environments programming Matlab R2019 with Image processing toolbox for computing of features and Python (version 3.9), PyCaret (version 3.0.4), and TPOT (version 0.12) versions, respectively, were used.
Hybrid image features
Computed features are called hybrid features, because they are extracted both from texture and shape of the object in an image. Different textural, geometric, fractal, statistical, or other measures could be used to calculate the features.
Their particular characteristics depend on texture information and shape and are interpreted by the dictionary of BC type expressed in US images. Additionally, a reliable analysis of the lesion is ensured by the computation of textual information from pixel arrangements within the BC lesion. The HFs go through three primary stages: (i) extracting Haralick’ features from the BC lesions only; (ii) computing Hu moments from the mask of the lesions; and (iii) obtaining complex HF through integrating the first two features type in a polynomial regression; this approach was proposed because hybrid features are interpreted as dependent variables, in which the coefficients are given by Hu moments and independent variable by Haralick’ features, respectively.
Haralick’ features
An intuitive idea in distinguishing between breast lesions with Haralick’ features is that these show the pixel arrangements, which vary in the presence of microcalcifications, branching patterns, or smoothing breast lesion textures39,40.
In order to avoid redundant information, only the breast lesions were analysed; for this, the ground truth images (Fig. 1, column b) were overlapped in the original images (Fig. 1, column a), thus resulting the region of interest (Fig. 1, column c).
The texture features of a grayscale image are associated with Gray Level Co-Occurrence Matrix (GLCM)41. The important Haralick features as energy (EN), entropy (ENT) homogeneity (H), contrast (C), correlation (CR) and dissimilarity (D) in Eqs. (1–6) are expressed.
$$EN = \mathop \sum \limits_{i = 1}^{N – 1} \mathop \sum \limits_{j = 1}^{N – 1} p\left( {i, j} \right)^{2}$$
(1)
$$ENT = – \mathop \sum \limits_{i = 1}^{N – 1} \mathop \sum \limits_{j = 1}^{N – 1} p\left( {i,j} \right)\log p\left( {i,j} \right)$$
(2)
$${\text{H}} = \mathop \sum \limits_{i = 1}^{N – 1} \mathop \sum \limits_{j = 1}^{N – 1} \frac{{p\left( {i,j} \right)}}{{1 + \left( {i – j} \right)^{2} }}$$
(3)
$$C = \mathop \sum \limits_{i = 1}^{N – 1} \mathop \sum \limits_{j = 1}^{N – 1} \left( {i – j} \right)^{2} p\left( {i,j} \right)$$
(4)
$$CR = \mathop \sum \limits_{i = 1}^{N – 1} \mathop \sum \limits_{j = 1}^{N – 1} p\left( {i,j} \right)\left( {\frac{{1 – \mu_{x} }}{{\sigma_{x} }}} \right)\left( {\frac{{1 – \mu_{y} }}{{\sigma_{y} }}} \right)$$
(5)
$$D = \mathop \sum \limits_{i = 1}^{N – 1} \mathop \sum \limits_{j = 1}^{N – 1} \left| {i – j} \right|p\left( {i,j} \right)$$
(6)
N is the number of gray levels, μx, μy and σx, σy are the means and standard deviations in Oy and Oy directions, and \(p\left( {i,j} \right)\) is the probability density of the pixels located at established distance41,42. From features expressed by (1–6) equations, it was built the set \(x_{fi} \in \left\{ {EN, ENT, H, C, CR, D} \right\}\).
The mentioned texture features showed significant differences in BC lesions and presented some of the most crucial image texture properties39.These were computed from all the images for prediction43, monitoring44 and classification45 of BC.
Hu moments
Hu Moments are descriptors that are used to describe shape objects in an image. They begin with a shape’s regular moment in a binary image.
In the analysis of BC, a critical parameter is the shape of the lesion, because it suggests the type of cancer. Figure 1b1 and b2 display a sample from the BUSI dataset, which contains ground truth for each lesion. In computer vision, the invariant Hu moments are commonly used as global form descriptors36, so that their capacity in analysing shape is explored. The shape of benign BC is well circumscribed; meanwhile, malignant BC is ill-defined in borders and has a non-uniform shape36. The BC types have different characteristics of shapes; thus, the Hu moments are viewed as shape descriptors, because they are invariant to translation, scale, rotation, and reflection36,46. In this context, only first six moments were proposed as coefficients in a polynomial regression, and their mathematical approaches are shown below:
$$M_{ij} = \mathop \sum \limits_{x} \mathop \sum \limits_{y} x^{i} y^{j} I\left( {i,j} \right)$$
(7)
where \(I\left( {i,j} \right)\) is the pixel intensity value located at the coordinates \(\left( {i,j} \right)\).
The centroid of the shape, is also called the geometric centre as it is a point that fixes the centre of gravity of a particular shape.
$$\overline{x} = \frac{{M_{10} }}{{M_{00} }}\;{\text{and}}\;\overline{y} = \frac{{M_{01} }}{{M_{00} }}$$
(8)
The centroid shape coordinates can be used to calculate the relative moments in the following way:
$$\mu_{pq} = \mathop \sum \limits_{x} \mathop \sum \limits_{y} \left( {x – \overline{x}} \right)^{p} \left( {y – \overline{y}} \right)^{q} I\left( {i,j} \right)$$
(9)
Taking into account that seven separate invariant moments for shape discrimination can be built44, they can be expressed by equations as follows (10)-(16).
$$\eta_{1} = \mu_{20} + \mu_{02}$$
(10)
$$\eta_{2} = \left( {\mu_{20} – \mu_{02} } \right)^{2} + 4\mu_{11}^{2}$$
(11)
$$\eta_{3} = \left( {\mu_{30} – 3\mu_{12} } \right)^{2} + \left( {3\mu_{21} – \mu_{03} } \right)^{2}$$
(12)
$$\eta_{4} = \left( {\mu_{30} + \mu_{12} } \right)^{2} + \left( {\mu_{21} + \mu_{30} } \right)^{2}$$
(13)
$$\begin{aligned} \eta_{5} & = \left( {\mu_{30} – 3\mu_{12} } \right)\left( {\mu_{30} + \mu_{12} } \right)\left( {\left( {\mu_{30} + \mu_{12} } \right)^{2} – 3\left( {\mu_{21} + \mu_{03} } \right)^{2} } \right) \\ & \quad + \left( {3\mu_{21} – \mu_{03} } \right)\left( {\mu_{21} + \mu_{03} } \right)\left( {3\left( {\mu_{30} + \mu_{12} } \right)^{2} – \left( {\mu_{21} + \mu_{03} } \right)^{2} } \right) \\ \end{aligned}$$
(14)
$$\eta_{6} = \left( {\mu_{20} – \mu_{02} } \right)\left( {\left( {\mu_{30} + \mu_{12} } \right)^{2} – \left( {\mu_{21} + \mu_{03} } \right)^{2} } \right) + 4\mu_{11} \left( {\mu_{30} + 3\mu_{12} } \right)\left( {\mu_{21} + \mu_{03} } \right)$$
(15)
Polynomial regression and feature computing
A polynomial regression refers to a nonlinear combination of coefficients and independent variables. The proposed method adapts a polynomial regression of degree six, obtaining the \(y_{i}\) (i = 1,2,3…,6) HFs through a nonlinear combination of the polynomial regression’s coefficients \(\eta_{j}\)(j = 1,2,3…,6) (Hu moments) and the independent variable \(x_{fi} \in \left\{ {EN, ENT,H,C,CR,D} \right\}\) Haralick’s features.
The general form of a polynomial regression of degrees six is given by the following equation46,47.
$$y_{i} = \eta_{1} x_{fi}^{5} + \eta_{2} x_{fi}^{4} + \cdots + \eta_{5} x_{fi} + \eta_{6} \;\left( {{\text{i}} = {1},{2}, \ldots ,{6}} \right)$$
(16)
Equation (17), which expresses Eq. (16) in detail using matrices, shows hybrid features in the first column matrix, independent variables in the main matrix, and coefficients in the last column matrix.
$$\left[ {\begin{array}{*{20}c} {y_{1} } \\ {y_{2} } \\ {y_{3} } \\ {y_{4} } \\ {y_{5} } \\ {y_{6} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {x_{f1}^{5} } & {x_{f1}^{4} } & {x_{f1}^{3} } & {x_{f1}^{2} } & {x_{f1} } & 1 \\ {x_{f2}^{5} } & {x_{f2}^{4} } & {x_{f2}^{3} } & {x_{f2}^{2} } & {x_{f2} } & 1 \\ {x_{f3}^{5} } & {x_{f3}^{4} } & {x_{f3}^{3} } & {x_{f3}^{2} } & {x_{f3} } & 1 \\ {x_{f4}^{5} } & {x_{f3}^{4} } & {x_{f4}^{3} } & {x_{f4}^{2} } & {x_{f4} } & 1 \\ {x_{f5}^{5} } & {x_{f3}^{4} } & {x_{f5}^{3} } & {x_{f5}^{2} } & {x_{f5} } & 1 \\ {x_{f6}^{5} } & {x_{f3}^{4} } & {x_{f6}^{3} } & {x_{f6}^{2} } & {x_{f6} } & 1 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {\eta_{1} } \\ {\eta_{2} } \\ {\eta_{3} } \\ {\eta_{4} } \\ {\eta_{5} } \\ {\eta_{6} } \\ \end{array} } \right]$$
(17)
The polynomial regression was used to the detriment of linear regression because the last one is a linear combination between features, and in fact the HF can be interpreted as a product between two types of features; instead, the polynomial regression reinforces the contribution of Hu moments in the final value of HF, giving decreasing weights in order of importance of Hu moments. For example, the first moment is the most important moment because it expresses the area of the lesion36.
The Fig. 2 displays the processed image types. The original images for benign and malignant conditions are displayed in first column, the binary mask of ROIs is displayed in column two, and the overlap of the first two is displayed in column three. Because only evaluating the texture of the ROI is relevant, the images presented in column three were used to compute textural features. The binary image was used to calculate the Hu moment. The first line indicates a benign lesion, while the second line indicates a malignant lesion, images from BUSI dataset. The images shown in column (a) and (b) belong BUSI dataset and column (c) depicts the images obtained in a pre-processed process.
There is a strong association between the morphology of the breast tumour and the tumour diagnostic outcome48. Because segmented lesions include enough shape information to distinguish between benign and malignant tumours, it is important to consider these aspects49. Additionally, distinct types of tissue can be identified using textural properties50. Together form hybrid features \(y_{1} ,y_{2,} y_{3} ,y_{4,} y_{{5,{ }}}\) and \({ }y_{6}\) supplied the AutoMLs tools.
A pre-analysis of data was necessary, because the justification of classification is necessary. In this way, the probability distributions of the hybrid features \(y_{1} ,{ }y_{2} , y_{3} ,{ }y_{4} , y_{{5,{ }}}\) and \({ }y_{6}\) with violin plot was plotted.
AutoML and ensemble model learning
An ensemble strategy for classifying benign and malignant classes is presented in this work. This procedure has been divided into three phases: (i) collecting and annotating data for the AutoMLs; (ii) automated machine learning, or AutoML for short, is a new discipline where building ML models from data is done automatically. In the proposed study, two distinct AutoMLs as PyCaret and TPOT were suggested while (iii) validating the data was done by using ensemble bagging, boosting, and stacking techniques. This type of classifier is proposed because PyCaret optimizes model hyperparameters and boosts ML productivity. The GA, which the TPOT uses, gives answers for finding and optimizing issues in ML.
The Hfs had to be annotated because a binary classification was used, and the data was collected from the BUSI image database, which included both benign and malignant classes.
PyCaret is an open-source machine learning library in Python and this tool automates, evaluates and compares ML classifiers, having integrated the fourteen classifiers: LightGBM, GBC, KNN, ADA, LR, RF, SVM, VM, ET, DT, Ridge, LDA, QDA and NB. Moreover, it is a method for hyperparameter tuning. Model training, explainability, data preprocessing, and exploratory data analysis are all supported by PyCaret. Additionally, PyCaret offers data scaling and transformation methods to change the shape of the distribution and reduce the magnitude of variation of image classification23.
The pipeline optimization tool (TPOT) based on GA uses Python automated ML system. It offers the optimal pipeline that was discovered. Moreover, it allows modifying the pipeline deeper17. The top-performing models from each generation are provided by TPOT and combined to produce a new generation of models that produces the best results in a classification task18. The proposed TPOT version consisted of the following classifiers: BNB, MNB, DT, ET, RF, GBC, KNN, SVC, LR, XGB, and MLP.
An EML technique called bootstrap aggregation, often known as bagging, is used to overcome overfitting issues in regression and classification. Only the predicted set that the classifiers provide is used for BA; the training set is divided into subsets using bootstrap and cross-validation. The model’s base prediction is constructed by merging the forecast outcomes for every subset. In this work, the KNN classifier is used; typically, this technique is applied using DT, NB, or KNN classifiers51,52.
A significant benefit of utilizing the EML technique is its integrated k-fold cross-validation option. The results obtained are computed for both bagging and stacking using tenfold cross-validation. This method not only enhances the reliability of the model’s performance estimates but also aids in reducing overfitting by validating the model on multiple data subsets.
The logistic regression method is used for stacking in the classification process. In contrast to bagging, stacking increases the final forecast’s interpretability by confirming each prediction. The procedure may be divided into two parts: first, baseline models are used to forecast test dataset results, and then, a meta-classifier is used to create new predictions by using all of the baseline models’ predictions as input51,52,53.
An EML technique called “boosting” is used to evaluate the performance of a group of weak classifiers and turn them into strong classifiers. This approach was first created for classification when a predictor’s average accuracy was measured.
Boosting deals with the sequential learning of the predictors is used in the EML process where each classifier is trained using the entire data set, and subsequent classifiers are trained using the sets that performed the best in previous runs. To create a powerful classifier by linearly combining weak classifiers, the XGB classifier was chosen, which is recognized as the most popular boosting technique50.
Dataset collection and performance evaluation
Breast US images are in grayscale and the BUSI dataset, also it provides the ground truth of lesion considered binary mask, where the shape and the size of the breast lesion is given. Figure 1 illustrates samples of the original scanned images and the ROI and its binary mask. The features \({ }y_{1} ,y_{2,} y_{3} ,y_{4,} y_{{5,{ }}}\) and \({ }y_{6}\) are computed for the entire image database: benign 487, malignant 210 images. The BUSI dataset includes one, two, or three masks for the same ROI; in this instance, only the larger mask was retained. Figure 3 shows a graph for each HF obtained with Eq. (16). The resulting values for the benign and malignant classifications are interpreted using the violin graph. The data was grouped around the median value and this type of graph was selected because it shows the mean (black line) and median (red line) values for each grouping.

Violin plot for each HF; (a) y1, (b) y2, (c) y3, (d) y4, (e) y5, (f) y6.
The main objective of the binary classification method is to identify a model and the relationship between the input dataset, which includes HFs. Hence, at first, in this research, the confusion matrix is considered as one of the common performance indicators for determining the model prediction. In addition, in terms of accuracy, F1-score, Matthews correlation coefficient and area under the curve, the performance of classification was evaluated, and computed from the basic confusion matrices \(\left[ {\begin{array}{*{20}c} {TP} & {FP} \\ {FN} & {TN} \\ \end{array} } \right]\), where TP and TN are the numbers of samples whose prediction results in accordance with the observation class. FP and FN are the numbers of the samples predicted to be the opposite class.
The MCC is a standard measure for analyse of imbalanced data. It generates a value in the [− 1; + 1] a high value for all the four basic rates of the CM and low value otherwise. Imbalanced classes can lead to misleading accuracy metrics; using MCC, a more comprehensive view of how well a classifier is performing across both classes can be performed. ACC, AUC and F1-score can range from 0 (worst result) to 1 (perfect result), ACC and F1-score the most used metrics in evaluation, that express the ratio of TP and TN over all the elements and the mean of precision and recall. AUC gives information about precision or negative predictive value obtained by a classifier54.
$$ACC = \frac{TP + TN}{{TP + TN + FP + FN}}$$
(18)
$$AUC = 1 – \frac{1}{2}\left( {\frac{FP}{{FP + TN}} + \frac{FN}{{FN + TP}}} \right)$$
(19)
$$F1 – score = \frac{{2{*}precision{*}recall}}{precision + recall}$$
(20)
$$precision = \frac{TP}{{TP + FP}}$$
(21)
$$recall = \frac{TP}{{TP + FN}}$$
(22)
$$MCC = \frac{TP \cdot TN – FP \cdot FN}{{\sqrt {\left( {TP + FP} \right) \cdot \left( {TP + FN} \right) \cdot \left( {TN + FP} \right) \cdot \left( {TN + FN} \right)} }}$$
(23)
The BUSI free and public image database was used for this study. The ethics committee of the Baheya Hospital for Early Detection & Treatment of Women’s Cancer, Cairo, Egypt, approved all experimental protocols, and patient protection is discussed in chapter “2.4. Ethical Considerations” of the reference38. In this context, all methods were carried out in accordance with relevant guidelines and regulations, and all experimental protocols were approved.
