Machine learning approach for wheat variety identification using single-seed imaging

Machine Learning


Model performance

Tables 2 and 3 present the confusion matrices and evaluation metrics for the models examined in this study. As the results indicate, the proposed CNN-GAP model showed the most consistent and accurate classification performance among all evaluated architectures. Its confusion matrix exhibited strong diagonal dominance, indicating reliable separation of wheat varieties with minimal inter-class confusion. Quantitatively, it attained the highest accuracy (0.922) and F1-score (0.936), outperforming both EfficientNet-B4 (0.852) and Inception-ResNet-v2 (0.830). Its balanced precision (0.929) and recall (0.943) indicate effective generalization and robust feature extraction. Although the MLP-PCA model offered lower complexity and competitive accuracy (0.860), its slightly lower recall and F1-score reveal limited capability on difficult cases. Nonetheless, MLP-PCA outperformed both Inception-ResNet-v2 and EfficientNet-B4, showing that PCA-based feature reduction combined with a tuned neural network can yield efficient and accurate classification. Furthermore, the proposed CNN-GAP model achieved the narrowest confidence interval (0.922 ± 0.044), indicating stable performance across varieties. The MLP-PCA and EfficientNet-B4 models showed moderate stability, with intervals of 0.860 ± 0.059 and 0.860 ± 0.076, respectively. In contrast, Inception-ResNet-v2 exhibited the widest interval, reflecting greater variability and a higher tendency toward misclassification. These statistics indicate that the proposed CNN-GAP model not only achieves the highest accuracy but also shows superior consistency and generalization on the wheat dataset.

Table 2 Confusion matrix of the investigated models.
Table 3 Performance evaluation and comparison of different models on the test dataset.
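The metrics in Table 3 can be derived from each model's confusion matrix. The sketch below assumes that rows index true varieties and columns index predicted varieties, and it uses macro averaging (the unweighted mean over classes); the averaging convention is an assumption, since it is not specified here. The helper name `metrics_from_confusion` is hypothetical, and the 2 × 2 matrix is a toy example, not data from this study.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Accuracy plus macro-averaged precision, recall, and F1.

    Rows of `cm` are true classes; columns are predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)              # correctly classified samples per class
    predicted = cm.sum(axis=0)    # column sums: samples predicted as each class
    actual = cm.sum(axis=1)       # row sums: samples truly in each class
    zeros = np.zeros_like(tp)
    # Guard against classes that are never predicted or never present
    precision = np.divide(tp, predicted, out=zeros.copy(), where=predicted > 0)
    recall = np.divide(tp, actual, out=zeros.copy(), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),          # mean of per-class F1 scores
    }

# Toy two-class example (not results from this study)
m = metrics_from_confusion([[5, 0], [1, 4]])
print(m)
```

With label vectors instead of a matrix, scikit-learn's `precision_recall_fscore_support(y_true, y_pred, average="macro")` computes the same macro-averaged quantities.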

Although Inception-ResNet-v2 and EfficientNet-B4 are state-of-the-art deep architectures, their lower accuracy in this study can be attributed to several technical factors. Both models contain a very large number of parameters, so their performance depends heavily on the availability of large and diverse training datasets. When applied to a relatively small dataset, such deep networks are prone to overfitting. This is consistent with their confusion matrices, which show more frequent misclassifications, particularly among visually similar wheat varieties. In contrast, the proposed CNN-GAP model uses a lighter architecture that reduces overfitting and yields a confusion matrix with stronger diagonal dominance, indicating more reliable class separability.

Furthermore, the MLP-PCA model benefits from PCA-based dimensionality reduction, which transforms correlated features into uncorrelated components and discards low-variance directions that often carry noise and redundancy. The resulting compact and discriminative representation contributes to the model's higher precision and recall. Overall, the proposed CNN-GAP model and MLP-PCA showed the best performance and were therefore selected for further optimization and comparative evaluation.

MLP performance with dimensionality reduction

As illustrated in Fig. 4, principal component analysis (PCA) improved the accuracy of the MLP classifier. Using 27 principal components as inputs yielded an average validation accuracy of 90.55%, compared with 89.14% when all 58 extracted features were used. This suggests that dimensionality reduction enhanced classification by mitigating noise and multicollinearity. Comparable findings were reported by Asif et al.27, who achieved 92.3% accuracy in rice grain classification using PCA-reduced morphological features. Similarly, Gayathri et al.28 showed that feature selection improved MLP accuracy for rice disease classification from 86.6% to 90.2%. Together, these results indicate that dimensionality reduction can improve generalization and training efficiency in seed classification tasks.
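A minimal PCA sketch follows, reducing a 58-feature matrix to 27 components via singular value decomposition. Standardizing each feature before the decomposition is an assumption (it is common when features have different scales), random numbers stand in for the real seed features, and `pca_fit_transform` is a hypothetical helper name.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components.

    Returns the component scores and the explained-variance ratio of each
    retained component.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                  # leave constant features unscaled
    Z = (X - mean) / std                 # assumed: standardize each feature
    # Rows of Vt are principal axes, ordered by decreasing singular value
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    scores = Z @ components.T            # uncorrelated component scores
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]

# Placeholder data: 200 seeds x 58 features (random, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 58))
scores, explained = pca_fit_transform(X, n_components=27)
print(scores.shape, explained.sum())
```

The explained-variance ratios can also guide the choice of the number of components, for example by keeping the smallest number whose cumulative ratio exceeds a chosen threshold; that is one common rule, not necessarily the one used to arrive at 27.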

Fig. 4

Effect of dimensionality reduction on (a) accuracy and (b) loss of the MLP for both the training and validation datasets.

Optimization of MLP architecture

To optimize MLP performance, the numbers of neurons and hidden layers were varied systematically (Fig. 5). The first hidden layer ranged from 27 to 57 neurons and the second from 14 to 28 neurons. Training was repeated 10 times for each configuration, and the mean ± standard deviation of each performance metric was calculated. Although the number of neurons had a non-monotonic effect on performance, two hidden layers with 37 and 21 neurons, combined with a dropout rate of 0.3, achieved the best performance.
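The repeated-training protocol can be organized as a simple grid search that records the mean and standard deviation over repeats. In the sketch below, `evaluate` is a hypothetical callback that would train one MLP with the given layer sizes, dropout rate, and random seed and return its validation accuracy; `toy_evaluate` is an arbitrary synthetic surface that only exercises the loop, and the grid step sizes are assumptions.

```python
import itertools
import statistics

def grid_search(first_sizes, second_sizes, dropouts, evaluate, repeats=10):
    """Evaluate every (n1, n2, dropout) configuration `repeats` times.

    Returns (config, mean, std) tuples sorted by mean score, best first.
    `repeats` must be at least 2 for the standard deviation to be defined.
    """
    results = []
    for n1, n2, p in itertools.product(first_sizes, second_sizes, dropouts):
        scores = [evaluate(n1, n2, p, seed) for seed in range(repeats)]
        results.append(((n1, n2, p), statistics.mean(scores), statistics.stdev(scores)))
    results.sort(key=lambda r: r[1], reverse=True)
    return results

def toy_evaluate(n1, n2, p, seed):
    """Hypothetical stand-in for training; NOT a model of the real results."""
    return 0.9 - abs(n1 - 47) / 1000 - abs(n2 - 28) / 1000 - abs(p - 0.5) / 10 + 0.001 * (seed % 3)

results = grid_search(
    first_sizes=range(27, 58, 10),    # 27, 37, 47, 57
    second_sizes=range(14, 29, 7),    # 14, 21, 28
    dropouts=(0.0, 0.3, 0.5),
    evaluate=toy_evaluate,
)
best_config, best_mean, best_std = results[0]
print(best_config, round(best_mean, 4), round(best_std, 4))
```

Reporting mean ± standard deviation over repeats, rather than a single run, separates genuine configuration effects from seed-to-seed noise, which matters when differences between configurations are small.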

Fig. 5

Effect of the number of neurons and layers on the performance of the MLP for the training dataset. The 27 principal components were used as inputs to the MLP.

The dropout layer improved model generalization by preventing co-adaptation among neurons. With dropout, validation accuracy increased from 87.64% to 89.55% (Fig. 6), confirming its regularization effect29,30. However, a dropout ratio of 0.5 reduced accuracy, likely because too many neurons were deactivated during training. These outcomes align with prior studies reporting that moderate dropout rates (0.2–0.4) often provide the best regularization balance31,32.
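The mechanism can be shown in a few lines. The NumPy sketch below implements the common "inverted dropout" variant, in which surviving activations are rescaled by 1/(1 − rate) during training so that nothing changes at inference; Keras's `Dropout` layer behaves this way, but whether the study's implementation matches exactly is an assumption.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout.

    During training, each unit is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate), keeping the expected activation
    unchanged. At inference the input passes through untouched.
    """
    if not training or rate == 0.0:
        return x
    keep = 1.0 - rate
    mask = rng.random(x.shape) < keep   # True for units that survive
    return x * mask / keep

rng = np.random.default_rng(0)
x = np.ones((1000, 1000))
y = dropout(x, rate=0.3, training=True, rng=rng)
print(y.mean(), (y == 0).mean())        # mean stays near 1; about 30% zeros
```

Raising `rate` removes more units per step, which strengthens regularization but, past some point, starves the network of capacity; this is one plausible reading of the accuracy drop at 0.5.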

Fig. 6

Effect of dropout on the (a) accuracy and (b) loss of the MLP model for the training and validation datasets.

Effect of GAP and FCL on CNN performance

Figure 7 compares the convergence behavior of CNNs employing global average pooling (GAP) and fully connected layers (FCL). The CNN-GAP showed stable learning behavior, with training and validation curves converging smoothly, suggesting minimal overfitting. Conversely, the CNN-FCL reached higher training accuracy, but its training and validation loss curves diverged, an indication of overfitting.

Fig. 7

Trends in (a) accuracy and (b) loss of the proposed CNN-GAP and CNN-FCL models for the training and validation datasets.

The superior performance of the GAP-based network can be attributed to its parameter efficiency, which reduces the risk of overfitting and enhances generalization. By averaging each feature map to a single value, GAP removes the large weight matrix otherwise needed to connect flattened feature maps to the dense layer, lowering the computational burden and improving robustness, which is particularly valuable for small datasets. Similar trends were reported by Haseli Golzar et al.33 for cucumber classification and by Azadnia et al.20 for medicinal plant identification, where GAP-based CNNs improved both efficiency and accuracy compared with dense-layer designs.
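The parameter savings from GAP can be quantified with simple arithmetic. The sketch below compares a Flatten + Dense(128) head with a GAP + Dense(128) head; the 16 × 16 × 128 final feature-map shape is a hypothetical value chosen for illustration, since the actual shape depends on the input resolution and pooling layers.

```python
import numpy as np

def global_average_pool(feature_maps):
    """Average each channel over its spatial grid: (H, W, C) -> (C,)."""
    return feature_maps.mean(axis=(0, 1))

def dense_params(n_in, n_out):
    """Number of weights plus biases in a fully connected layer."""
    return n_in * n_out + n_out

# Hypothetical final feature-map shape and the 128-neuron hidden layer
H, W, C = 16, 16, 128
hidden = 128

fcl_head = dense_params(H * W * C, hidden)  # Flatten -> Dense(128)
gap_head = dense_params(C, hidden)          # GAP -> Dense(128)
print(fcl_head, gap_head)                   # prints 4194432 16512

# GAP on a tiny 2 x 2 x 3 example: one mean per channel
pooled = global_average_pool(np.arange(12, dtype=float).reshape(2, 2, 3))
print(pooled)
```

Under these assumptions the flattened head needs about 4.19 million weights and biases versus 16,512 for the GAP head, a reduction of more than 250-fold, and the GAP head's size does not grow with the spatial resolution of the feature maps.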

Statistical evaluation confirmed the robustness of the CNN-GAP performance: ANOVA tests (p < 0.05) showed a significant improvement over the FCL-based CNNs and MLP models. Across five independent training runs, the mean test accuracy of CNN-GAP was 92.2 ± 4.4%, highlighting its repeatability.
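A one-way ANOVA contrasts the spread between model means with the spread among repeated runs of each model. The pure-Python sketch below computes only the F statistic; `one_way_anova_f` is a hypothetical name, and the toy groups are placeholders, not results from this study.

```python
import statistics

def one_way_anova_f(*groups):
    """One-way ANOVA F statistic for two or more groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    k, n = len(groups), len(all_scores)
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of runs around their own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Placeholder groups, e.g. per-run scores of two models
f = one_way_anova_f([1, 2, 3], [4, 5, 6])
print(f)  # prints 13.5
```

If SciPy is available, `scipy.stats.f_oneway(*groups)` returns both the F statistic and its p-value, which is what a threshold such as p < 0.05 is compared against.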

Influence of structural parameters

The effect of architectural parameters on CNN performance is summarized in Table 4. Among the tested architectures, the best-performing configuration had three convolutional layers (32, 64, and 128 filters), one hidden layer with 128 neurons, and a dropout rate of 0.5. Adding a fourth convolutional layer slightly decreased validation accuracy, which was attributed to overfitting, and also lengthened convergence time.

Table 4 Impact of network architecture parameters—including convolutional layers, batch normalization, dropout rate, and hidden neurons—on the performance of the proposed CNN models.

Regarding the number of dense layers in the classification block, no significant difference was observed between one and two hidden layers, so a single hidden layer was used to minimize network complexity. Likewise, increasing the number of neurons in the hidden layer did not significantly improve classification accuracy; therefore, 128 neurons were selected for the hidden layer to limit the risk of overfitting.

Regarding the number of convolutional layers, the network with three convolutional layers (32, 64, and 128 filters) outperformed the variants with two or four layers. The dropout ratio also had a notable effect on accuracy: as shown in Table 4, accuracy decreased as the dropout rate was lowered, indicating that dropout is an important hyperparameter to tune for this CNN. Among the variations tested, the proposed CNN performed best with a dropout rate of 0.5, three convolutional layers, and one hidden layer in the classifier block.

Batch normalization is widely recognized for stabilizing the training of CNNs. However, it is not universally beneficial and can sometimes degrade model performance. In particular, it may contribute to overfitting, especially in networks trained on small datasets or with very small batch sizes, where mini-batch statistics are noisy estimates of the population statistics34,35. The results in Table 4 show that batch normalization reduced CNN performance in both the fully connected layer (FCL) and global average pooling (GAP) configurations.
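Why small batches matter can be seen from how batch normalization computes its statistics. The NumPy sketch below shows training-mode batch normalization and then measures how much the per-batch mean fluctuates for small versus larger batches; the helper names and the synthetic normal data (mean 5, standard deviation 2) are assumptions for illustration only.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization over the batch axis (axis 0).

    Each feature is standardized with the mean and variance of the current
    mini-batch, then scaled by gamma and shifted by beta.
    """
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(64, 8))
out = batch_norm_train(x, gamma=np.ones(8), beta=np.zeros(8))

def batch_mean_spread(batch_size, n_batches=2000):
    """Std. dev. of the per-batch mean: how noisy BN's statistics are."""
    samples = rng.normal(loc=5.0, scale=2.0, size=(n_batches, batch_size))
    return samples.mean(axis=1).std()

print(batch_mean_spread(4), batch_mean_spread(64))
```

For this synthetic data the spread of the batch mean should be close to 2/√B (about 1.0 for B = 4 and 0.25 for B = 64), so with small batches each sample is normalized by statistics that change noticeably from step to step.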

Visualization of discriminative features

Figure 8 displays the class saliency maps of the final CNN-GAP model, highlighting the image regions most influential in classification. High activations concentrated around seed edges and surface textures suggest that the network relied primarily on morphological and textural cues to differentiate varieties. The consistency of activation patterns across samples within each class suggests that the model captured robust, variety-specific discriminative features. This finding is consistent with the fact that wheat varieties exhibit subtle yet distinctive variations in shape and texture.

Fig. 8

The class saliency map visualization from the last convolutional layer of the proposed CNN-GAP model.
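The exact method used to compute the maps from the last convolutional layer is not specified here. One technique naturally tied to GAP networks is class activation mapping (CAM), which weights each final feature map by its contribution to the target class score. The NumPy sketch below assumes a linear class layer directly after GAP; because the proposed model has a 128-neuron hidden layer after GAP, plain CAM would not apply exactly, and Grad-CAM, which derives the channel weights from gradients, is the usual generalization. The function name and toy arrays are hypothetical.

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """CAM for a network whose GAP output feeds a linear class layer.

    feature_maps: (H, W, K) activations of the last convolutional layer.
    class_weights: (K,) weights linking each pooled channel to the target class.
    Returns an (H, W) map scaled to [0, 1].
    """
    cam = feature_maps @ class_weights   # sum over k of w_k * A_k
    cam = np.maximum(cam, 0.0)           # keep only positive evidence for the class
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy example: channel 0 fires top-left, channel 1 fires bottom-right
A = np.zeros((4, 4, 2))
A[0, 0, 0] = 1.0
A[3, 3, 1] = 1.0
w = np.array([2.0, -1.0])                # channel 1 counts against this class
cam = class_activation_map(A, w)
print(cam)
```

In practice the low-resolution map is upsampled to the input image size and overlaid on the seed image to produce visualizations such as Fig. 8.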

Table 5 compares the performance of the proposed method with several state-of-the-art approaches for seed classification. For adulteration recognition in bulk paddy samples, an accuracy of 93.31% was reported using an MLP model trained on PCA-reduced features36. Comparable findings have been reported in CNN-based studies on chickpea37 and corn38, where accuracies of 94% and 96.46%, respectively, were achieved, reflecting the benefits of deep feature extraction and optimized hyperparameters.

Table 5 Comparison of the proposed method with existing state-of-the-art approaches.

Across the literature, two factors consistently influence performance: the number of varieties examined and the degree of morphological similarity among them. The lower accuracy obtained in the present study, relative to results reported in previous studies39,40, can be attributed to the high similarity among the wheat varieties evaluated. Unlike rice or maize, which exhibit more pronounced visual distinctions, wheat varieties differ only subtly in shape, surface texture, and pigmentation, which reduces inter-class separability. In addition, the inherent intra-class variability of wheat seeds further complicates fine-grained classification. These characteristics make wheat-variety identification inherently challenging and help explain the slightly lower accuracy of the proposed CNN-GAP model.

Model generalization and practical considerations

Testing the CNN-GAP model on unseen chickpea samples yielded an average accuracy of 68.12%, revealing limited cross-domain generalization and an important limitation of the current study. Because the proposed model was optimized to capture the subtle differences among wheat varieties, its accuracy naturally decreases when applied to a morphologically distinct crop. Chickpea seeds differ substantially in shape, surface texture, size, and color distribution, and these domain shifts fall outside the feature space learned during training. Such a decline is therefore expected and underscores the need for species-specific training data for reliable cross-crop deployment. This result does not necessarily reflect a deficiency in the model architecture itself; rather, it illustrates the inherent sensitivity of deep-learning classifiers to domain mismatch when trained exclusively on a narrow, single-species dataset.

To mitigate these limitations and improve generalization in real-world applications, several strategies should be explored in future work: (1) constructing multi-species training datasets that incorporate diverse grain types; (2) applying domain-adaptive augmentation techniques, such as brightness variation, color jitter, random shadows, and heterogeneous illumination patterns; and (3) leveraging transfer learning from large-scale agricultural image datasets that encompass multiple seed categories. Together, these strategies offer a clear pathway toward improving the robustness, transferability, and practical applicability of the proposed model in broader agricultural contexts. Regarding computational cost, the CNN-GAP architecture achieved an average inference time of 13.6 ms per seed on a mid-range GPU, indicating feasibility for real-time sorting applications. Its lightweight structure (≈2.1 M parameters) makes it suitable for integration into low-cost embedded hardware for on-site agricultural deployment.
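As an illustration of strategy (2), the NumPy sketch below applies a random brightness shift, per-channel color jitter, and a random vertical shadow band to an RGB image with values in [0, 1]. The function name and all jitter ranges are hypothetical, illustrative defaults rather than tuned values, and heterogeneous illumination patterns are not modeled here.

```python
import numpy as np

def augment(image, rng, max_brightness=0.2, max_channel_gain=0.1):
    """Random brightness, color jitter, and shadow for an (H, W, 3) image in [0, 1].

    Returns a new array; the input image is not modified.
    """
    brightness = rng.uniform(-max_brightness, max_brightness)
    gains = rng.uniform(1 - max_channel_gain, 1 + max_channel_gain, size=3)
    out = image * gains + brightness     # per-channel gains broadcast over H and W
    # Random shadow: darken one randomly placed vertical band of columns
    _, w, _ = out.shape
    x0 = rng.integers(0, w)              # band start in [0, w - 1]
    x1 = rng.integers(x0, w) + 1         # band end in [x0 + 1, w]
    out[:, x0:x1] *= rng.uniform(0.5, 0.9)
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
img = np.full((32, 32, 3), 0.5)          # uniform gray placeholder image
aug = augment(img, rng)
print(aug.shape, aug.min(), aug.max())
```

Applied on the fly during training, such perturbations expose the classifier to lighting and color conditions absent from the original capture setup, which is the kind of domain shift the chickpea test revealed.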


