Automated weed and crop recognition and classification model using deep transfer learning with optimization algorithm

This work develops the AWRC-DLMLO method, whose main purpose is to detect and classify weeds and crops effectively. To accomplish this, the method comprises image preprocessing, segmentation, feature extraction using ShuffleNetV2, LOA-based parameter selection, and a CQN-based classification process. Figure 1 depicts the overall workflow of the AWRC-DLMLO method.

Image preprocessing

Initially, the AWRC-DLMLO technique applies Gaussian filtering (GF) as an image pre-processing step to eliminate unwanted noise. GF is a vital pre-processing stage in weed and crop recognition²³. It applies a Gaussian blur to smooth the image, reducing noise and improving the contrast between weeds and crops. By convolving the image with a Gaussian kernel, GF helps to retain significant edges while removing minor variations. This results in clearer, more distinct features, which increases the accuracy of the subsequent segmentation and classification steps. Overall, GF improves the robustness of crop and weed recognition by delivering cleaner and more reliable input data.
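The convolution with a Gaussian kernel described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the kernel size, the value of sigma, the edge-replication border handling, and the function names `gaussian_kernel` and `gaussian_filter` are all assumptions made for the example.

```python
import math

def gaussian_kernel(size=5, sigma=1.0):
    """Return a normalized size x size Gaussian kernel (size must be odd)."""
    half = size // 2
    kernel = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

def gaussian_filter(image, size=5, sigma=1.0):
    """Smooth a 2-D grayscale image (list of rows) with a Gaussian kernel.

    Borders are handled by clamping coordinates (edge replication),
    which is an assumption of this sketch.
    """
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    rr = min(max(r + dy, 0), h - 1)
                    cc = min(max(c + dx, 0), w - 1)
                    acc += image[rr][cc] * kernel[dy + half][dx + half]
            out[r][c] = acc
    return out

# A flat image with a single noisy pixel: the spike is damped and spread out.
noisy = [[10.0] * 7 for _ in range(7)]
noisy[3][3] = 100.0
smoothed = gaussian_filter(noisy, size=5, sigma=1.0)
```

In practice an optimized library routine would normally be used; the explicit loops here only make the weighting by the kernel visible.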

Segmentation process

Next, segmentation is performed using the RA-UNet model to generate segments. The U-Net architecture contains encoder and decoder blocks that are connected through a bridge at every level²⁴. These bridge connections merge the down-sampling and up-sampling paths in order to recover spatial information. However, the concatenation step can also pass on many irrelevant and weak feature representations from the encoder. The attention mechanism applied here builds on the U-Net architecture and has been reported to give promising results in medical imaging. This soft-attention mechanism retains and emphasizes the most descriptive features and improves the final segmentation results over the plain U-Net architecture. It highlights the significant features and suppresses activations in irrelevant regions. Consequently, model performance and sensitivity are modestly improved by the attention gate without a large additional computational cost. Figure 2 shows the structure of the RA-UNet model.

The soft-attention gate receives two inputs, $g$ and $x$. The input $x$ arrives through the concatenation bridges from the early levels of the encoder and contains rich spatial information. The input $g$, termed the gating signal, originates from the deepest levels of the network and carries contextual information and feature representations that identify the focus region and assign weights to the various parts of the image. The attention coefficient $\alpha \in [0,1]$ identifies and assigns the feature weights for the essential regions of the image. As training progresses, the attention mechanism learns pixel weights according to their significance, so that the most relevant parts of the image receive higher weights than the least relevant parts. Hence, by employing the learned weights during training, the model is trained to focus on the relevant part of the image. Multiplying the input feature maps $x^{l}$ by the attention coefficient $\alpha$ generates the attention gate output, where $\alpha$ is obtained as follows:

$$q_{att}^{l} = \psi^{T}\left(\sigma_{1}\left(W_{x}^{T} x_{i}^{l} + W_{g}^{T} g_{i} + b_{g}\right)\right) + b_{\psi},$$

(1)

$$\alpha_{i}^{l} = \sigma_{2}\left(q_{att}^{l}\left(x_{i}^{l}, g_{i}; \Theta_{att}\right)\right),$$

(2)

where $\sigma_{1}$ and $\sigma_{2}$ are the ReLU and sigmoid activation functions, respectively, and $\Theta_{att}$ denotes the set of parameters comprising the linear transformations $W_{x}$ and $W_{g}$ and the bias terms $b_{\psi}$ and $b_{g}$.
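To make Eqs. (1) and (2) concrete, the following sketch evaluates the attention gate for a single pixel $i$. It is a toy illustration under stated assumptions: the vectors and weight values are made up, the matrices are passed already transposed (`Wx_T`, `Wg_T`), and the helper names are hypothetical.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def attention_gate(x, g, Wx_T, Wg_T, b_g, psi, b_psi):
    """Return (alpha, gated_x) for one pixel.

    x      : skip-connection feature vector x_i^l
    g      : gating-signal vector g_i
    Wx_T   : W_x^T, shape (d_int, len(x))
    Wg_T   : W_g^T, shape (d_int, len(g))
    b_g    : bias of length d_int
    psi    : vector of length d_int
    b_psi  : scalar bias
    """
    pre = [a + b + c for a, b, c in zip(matvec(Wx_T, x), matvec(Wg_T, g), b_g)]
    hidden = [max(0.0, p) for p in pre]                       # sigma_1: ReLU
    q_att = sum(p * h for p, h in zip(psi, hidden)) + b_psi   # Eq. (1)
    alpha = 1.0 / (1.0 + math.exp(-q_att))                    # Eq. (2), sigma_2: sigmoid
    gated = [alpha * xi for xi in x]                          # alpha scales the feature map
    return alpha, gated

# Made-up values, purely for illustration.
x = [1.0, 2.0]
g = [0.5, -1.0, 0.0]
Wx_T = [[0.1, 0.2], [-0.3, 0.4]]
Wg_T = [[0.2, 0.0, 0.1], [0.0, -0.1, 0.3]]
alpha, gated = attention_gate(x, g, Wx_T, Wg_T, [0.0, 0.1], [1.0, -1.0], 0.0)
```

Because $\sigma_{2}$ is a sigmoid, `alpha` always lies strictly between 0 and 1, so the gate can only attenuate a feature, never amplify it.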

Deep neural networks (DNNs) perform well in challenging classification and segmentation tasks. The U-Net-based architecture used here has several convolutional blocks at every level. Each convolutional block applies a convolution and an activation function to its input to generate its output, and the output of one block becomes the input of the next. Therefore, as the network is made deeper, the gradient propagated from one block to another becomes smaller because of the vanishing gradient effect, and the accuracy of the trained model degrades rapidly instead of improving. The vanishing gradient problem thus hampers training and harms the generalizability of the model. To mitigate this problem, a residual mechanism is incorporated into the proposed solution so that the gradient values continue to be updated in every convolutional block, which enhances the performance of the trained model. The residual blocks use skip connections that bypass one or more layers, letting the gradient flow directly to earlier layers. By combining soft attention with the residual mechanism, the model learns weights for the meaningful image regions and suppresses the vanishing gradient problem during training.
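The effect of skip connections on the gradient can be illustrated with a deliberately simplified scalar model. The sketch below is an assumption-laden toy, not the RA-UNet itself: each "layer" is the scalar map $y = w x$, and its residual counterpart is $y = x + w x$; the function names and the values of `w` and `depth` are hypothetical.

```python
def plain_gradient(w, depth):
    """d(output)/d(input) for a stack of `depth` scalar layers y = w * x."""
    grad = 1.0
    for _ in range(depth):
        grad *= w            # chain rule: each layer multiplies the gradient by w
    return grad

def residual_gradient(w, depth):
    """Same stack, but each layer is wrapped as y = x + w * x (skip connection)."""
    grad = 1.0
    for _ in range(depth):
        grad *= (1.0 + w)    # the identity path contributes the constant 1
    return grad

# With a small per-layer derivative, the plain stack's gradient collapses
# toward zero, while the residual stack's gradient does not.
shallow_signal = plain_gradient(0.1, 20)
residual_signal = residual_gradient(0.1, 20)
```

The identity path guarantees that each layer's derivative stays near 1 even when the learned branch has a small derivative, which is the intuition behind using residual blocks in deep segmentation networks.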

ShuffleNetV2 feature extraction

For feature extraction, the AWRC-DLMLO method employs ShuffleNetV2 to derive the feature vector, using it as the backbone network²⁵. To reduce computational cost, convolutional neural networks (CNNs) often place a $1\times 1$ convolution (Conv) before a $3\times 3$ Conv, which simplifies the data flow through the network and reduces the data size. However, in higher-performance networks the $1\times 1$ Convs themselves demand extensive computational resources. ShuffleNetV1 addresses this with $1\times 1$ group Conv and channel shuffling, which together reduce the computation of the $1\times 1$ point-wise Convs. It is a very efficient and lightweight network. ShuffleNetV2 improves on ShuffleNetV1 in performance. It integrates a $1\times 1$ Conv layer that helps to combine features before global average pooling. The network is available at several complexity levels, namely $0.5\times$, $1\times$, $1.5\times$ and $2\times$. In this work, the ShuffleNetV2 $1\times$ variant was selected for its practical applicability. The network comprises an input layer, two convolutional layers, two pooling layers, one fully connected (FC) layer, and three stages. Each stage consists of basic units and down-sampling units, with strides of 1 and 2, respectively. Figure 3 illustrates the architecture of ShuffleNetV2.
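The channel shuffling operation mentioned above can be sketched as a pure reordering of channels. This is a minimal illustration in which each channel is represented by a label rather than a feature map; the function name `channel_shuffle` is an assumption of the example.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups.

    Conceptually: reshape the channel axis to (groups, channels_per_group),
    transpose to (channels_per_group, groups), then flatten again.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]

# Two groups of three channels: [A0 A1 A2 | B0 B1 B2]
shuffled = channel_shuffle(["A0", "A1", "A2", "B0", "B1", "B2"], groups=2)
# -> ['A0', 'B0', 'A1', 'B1', 'A2', 'B2']
```

After the shuffle, every group that the next group convolution sees contains channels from all of the previous groups, so information can flow between groups at no arithmetic cost.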

Parameter optimizer

Moreover, the LOA is applied to optimize the hyperparameters and fine-tune the DL technique, further enhancing its performance. The lemurs optimizer (LO) is a recent bio-inspired meta-heuristic designed to solve global optimization problems²⁶. It takes inspiration from the social behaviors observed in lemurs, a group of primates native to Madagascar and nearby islands, and models their distinct locomotion behaviors. Living in organized social groups known as troops, lemurs show two prominent locomotion behaviors: leaping and dance-hopping. The former involves long, agile jumps between trees that cover extensive distances in search of shelter and resources. Conversely, dance-hopping denotes a synchronized, collective movement performed by lemurs within their troops.

LO integrates these natural lemur behaviors into its optimization procedure, in which every candidate solution is represented by an agent called a lemur. The location of a lemur corresponds to a candidate solution in the space of decision variables. To begin, a population of lemurs is positioned at random within the variable limits, as stated in Eq. (3):

$$L_{i}^{j} = rand \times \left(ub_{j} - lb_{j}\right) + lb_{j}, \quad \forall i \in \left(1,2,\ldots,n\right),\ \forall j \in \left(1,2,\ldots,d\right)$$

(3)

Here, $rand$ denotes a random value drawn from the interval $[0,1]$, and $lb_{j}$ and $ub_{j}$ represent the lower and upper limits of the $j$th decision variable of the $i$th solution; $n$ is the number of lemurs and $d$ is the number of decision variables.

Then, the fitness of every lemur is evaluated using an objective function, which determines the global best ($gbest$) and the best neighbor ($nbest$).

In the exploration stage, imitating the behavior of leaping, lemurs perform long jumps as per Eq. (4):

$$L_{i+1}^{j} = L_{i}^{j} + abs\left(L_{i}^{j} - gbest^{j}\right) \times \left(rand - 0.5\right) \times 2 \quad \text{if } rand \ge FRR$$

(4)

In the exploitation stage, which mimics dance-hopping, lemurs interact with neighboring lemurs according to Eq. (5):

$$L_{i+1}^{j} = L_{i}^{j} + abs\left(L_{i}^{j} - nbest^{j}\right) \times \left(rand - 0.5\right) \times 2 \quad \text{if } rand < FRR$$

(5)

The risk parameter $FRR$ is adjusted using Eq. (6), which provides flexibility during the optimization procedure:

$$FRR = HRR - Crnt\_Iter \times \left(\frac{HRR - LRR}{Max\_Iter}\right)$$

(6)

Here, $Max\_Iter$ represents the maximum iteration count, $HRR$ and $LRR$ are pre-defined constants, and $Crnt\_Iter$ is the current iteration. Because $FRR$ moves linearly from $HRR$ toward $LRR$ as the iterations proceed, this gradual schedule helps LO navigate complex problem landscapes more efficiently.
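Equations (3) to (6) can be combined into a short search loop. The sketch below is a hedged illustration rather than the authors' code. Several details that the text leaves open are filled with labeled assumptions: separate random draws are used for the branch test and for the step, $nbest$ is taken as the next-better lemur in fitness order, positions are clamped to the bounds, a move is kept only if it improves fitness, and the default values of `hrr` and `lrr` are arbitrary.

```python
import random

def lemurs_optimizer(objective, lb, ub, n=20, max_iter=100,
                     hrr=0.7, lrr=0.1, seed=0):
    """Minimize `objective` over the box [lb, ub]; returns (best_x, best_f)."""
    rng = random.Random(seed)
    d = len(lb)
    # Eq. (3): random initialization inside the variable limits.
    pop = [[rng.random() * (ub[j] - lb[j]) + lb[j] for j in range(d)]
           for _ in range(n)]
    fit = [objective(p) for p in pop]
    for it in range(max_iter):
        frr = hrr - it * (hrr - lrr) / max_iter           # Eq. (6)
        order = sorted(range(n), key=fit.__getitem__)     # best lemur first
        rank = {idx: r for r, idx in enumerate(order)}
        gbest = pop[order[0]][:]
        for i in range(n):
            # Assumption: nbest is the next-better lemur in the ranking.
            nbest = pop[order[max(rank[i] - 1, 0)]]
            cand = pop[i][:]
            for j in range(d):
                if rng.random() < frr:                    # Eq. (5): dance-hopping
                    ref = nbest[j]
                else:                                     # Eq. (4): leaping
                    ref = gbest[j]
                step = abs(pop[i][j] - ref) * (rng.random() - 0.5) * 2
                # Assumption: clamp to the bounds.
                cand[j] = min(max(pop[i][j] + step, lb[j]), ub[j])
            f = objective(cand)
            if f < fit[i]:                                # Assumption: greedy acceptance
                pop[i], fit[i] = cand, f
    best = min(range(n), key=fit.__getitem__)
    return pop[best], fit[best]

def sphere(x):
    """Toy objective used only to exercise the loop."""
    return sum(v * v for v in x)

best_x, best_f = lemurs_optimizer(sphere, [-5.0] * 3, [5.0] * 3, seed=0)
```

For hyperparameter tuning, `objective` would wrap a train-and-validate run of the classifier and each decision variable would encode one hyperparameter; that wiring is omitted here.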

The LOA uses a fitness function (FF) to attain improved classifier performance. It returns a positive value that indicates the quality of a candidate solution. In this study, the classification error rate, which is to be minimized, is taken as the FF, as given in Eq. (7).

$$\begin{aligned} fitness\left(x_{i}\right) & = ClassifierErrorRate\left(x_{i}\right) \\ & = \frac{\text{number of misclassified samples}}{\text{total number of samples}} \times 100 \end{aligned}$$

(7)
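Equation (7) amounts to a percentage of mismatched labels. A minimal sketch, assuming the true and predicted labels are available as two equal-length sequences (the function name and the sample labels are hypothetical):

```python
def classifier_error_rate(y_true, y_pred):
    """Percentage of misclassified samples, as in Eq. (7)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    misclassified = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return misclassified / len(y_true) * 100.0

# One of four samples is wrong, so the error rate is 25 percent.
rate = classifier_error_rate(["crop", "weed", "weed", "crop"],
                             ["crop", "crop", "weed", "crop"])
```

A lower value means a better candidate, so the optimizer minimizes this quantity.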

Weed and crop recognition process

Finally, the CQN method is employed for the classification process. When the size of the recommendation list is $K$, the cascading Q-networks method contains $K$ Q-networks that are connected in a cascading manner and pick the $K$ optimum items for recommendation in order²⁷. It employs a Q-learning framework in which an optimum action-value function $Q^{*}(s, \mathcal{A})$ is learned that satisfies $Q^{*}(s_{t}, \mathcal{A}_{t}) = \mathbb{E}\left[r(s_{t}, \mathcal{A}_{t}) + \gamma \max_{\mathcal{A}_{t+1} \subset \mathcal{I}_{t+1}} Q^{*}(s_{t+1}, \mathcal{A}_{t+1})\right]$. Once the action-value function is learned, the optimum policy for recommendation is expressed as

$$\pi^{*}\left(s_{t}, \mathcal{A}_{t}\right) = \mathop{\arg\max}\limits_{\mathcal{A}_{t} \subset \mathcal{I}_{t}} Q^{*}\left(s_{t}, \mathcal{A}_{t}\right),$$

(8)

Here, $\mathcal{I}_{t} \subset \mathcal{I}$ denotes the set of items available at time $t$.

The cascading Q-networks method uses a set of $K$ related Q-functions to handle the vast combinatorial action space and to find the optimum $K$-API combination. The recommender action is denoted as $\mathcal{A} = \{a_{1:K}\} \subset \mathcal{I}$ and the optimum action as $\mathcal{A}^{*} = \{a_{1:K}^{*}\} = \arg\max_{\mathcal{A}} Q^{*}(s, \mathcal{A})$. It is motivated by the following key fact:

$$\max_{a_{1:K}} Q^{*}\left(s, a_{1:K}\right) = \max_{a_{1}} \left(\max_{a_{2:K}} Q^{*}\left(s, a_{1:K}\right)\right).$$

(9)

Based on this, a set of mutually consistent functions $Q^{1*}, \dots, Q^{K*}$ is defined to obtain every optimum atomic action $a_{k}^{*} \in \{a_{1:K}^{*}\}$ as:

$$\left\{ \begin{array}{l} a_{1}^{*} = \mathop{\arg\max}\limits_{a_{1}} \left\{ Q^{1*}\left(s, a_{1}\right) := \max\limits_{a_{2:K}} Q^{*}\left(s, a_{1:K}\right) \right\}, \\ a_{2}^{*} = \mathop{\arg\max}\limits_{a_{2}} \left\{ Q^{2*}\left(s, a_{1}^{*}, a_{2}\right) := \max\limits_{a_{3:K}} Q^{*}\left(s, a_{1:K}\right) \right\}, \\ \quad \vdots \\ a_{K}^{*} = \mathop{\arg\max}\limits_{a_{K}} \left\{ Q^{K*}\left(s, a_{1:K-1}^{*}, a_{K}\right) := Q^{*}\left(s, a_{1:K}\right) \right\} \end{array} \right.$$

(10)

Therefore, an optimum action $\mathcal{A}^{*}$ can be obtained in $O\left(K\left|\mathcal{I}\right|\right)$ computations by applying these functions in a cascading manner.
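The cascading selection in Eq. (10) can be sketched as $K$ successive argmax passes over the item set, which is where the $O(K|\mathcal{I}|)$ cost comes from. This is an illustrative sketch only: the scoring function `toy_q`, the `relevance` and `category` tables, and the rule that an already chosen item is skipped are all assumptions introduced for the example.

```python
def cascade_select(state, items, q_functions):
    """Pick one item per Q-function, each conditioned on the earlier picks."""
    chosen = []
    for q_k in q_functions:                    # K passes ...
        candidates = [a for a in items if a not in chosen]
        best = max(candidates,                 # ... each scanning the item set once
                   key=lambda a: q_k(state, tuple(chosen), a))
        chosen.append(best)
    return chosen

# Hypothetical stand-in for the learned Q^{k*}: prefers relevant items and
# penalizes items from a category that has already been picked.
relevance = {"a": 0.9, "b": 0.8, "c": 0.5, "d": 0.4}
category = {"a": 0, "b": 0, "c": 1, "d": 1}

def toy_q(state, previous, candidate):
    overlap = sum(1 for p in previous if category[p] == category[candidate])
    return relevance[candidate] - 0.5 * overlap

picked = cascade_select(state=None, items=["a", "b", "c", "d"],
                        q_functions=[toy_q, toy_q, toy_q])
# -> ['a', 'c', 'b']
```

In the actual method each position $k$ would have its own learned network; the same toy function is reused here only to keep the example short.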

Every $Q^{k*}$ function is parameterized by a multi-layer perceptron (MLP):

$$Q^{k*} = q_{k}^{T}\, \sigma\left(W_{k}\left[s \,\|\, i_{1}^{*} \,\|\, \cdots \,\|\, i_{k-1}^{*} \,\|\, i_{k}\right]^{T} + b_{k}\right), \quad \forall k,$$

(11)

Here, $\sigma$ denotes the sigmoid activation function, $s$ refers to the state embedding, $i_{j}^{*}$ $(1 \le j \le k-1)$ represents the API embedding corresponding to the optimum atomic action $a_{j}^{*}$, and $i_{k}$ signifies the embedding of the candidate API $i_{k} \in \mathcal{I}$. $W_{k} \in \mathbb{R}^{d_{n} \times (d_{s} + d_{i} \cdot k)}$, $q_{k} \in \mathbb{R}^{d_{n}}$ and $b_{k} \in \mathbb{R}^{d_{n}}$ form the parameter set $\Theta_{k}$, and $d_{n}$, $d_{s}$ and $d_{i}$ are the sizes of the MLP hidden layer, the state embedding, and the API embedding, respectively. The set of all parameters of the cascading Q-networks method is denoted as $\Theta_{\mathcal{Q}}$ (i.e., $\Theta_{\mathcal{Q}} = \{\Theta_{1}, \dots, \Theta_{K}\}$).
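Equation (11) is a single-hidden-layer MLP applied to the concatenated embeddings. The sketch below evaluates it with plain Python lists; the function name `q_value`, the embedding sizes and all numeric values are assumptions chosen for illustration.

```python
import math

def q_value(state, prev_items, candidate, W_k, b_k, q_k):
    """Evaluate Eq. (11): q_k^T sigmoid(W_k [s || i_1* || ... || i_k]^T + b_k).

    state      : state embedding, length d_s
    prev_items : embeddings of the k-1 items already chosen, each length d_i
    candidate  : embedding of the candidate item, length d_i
    W_k        : d_n rows, each of length d_s + d_i * k
    b_k, q_k   : vectors of length d_n
    """
    v = list(state)
    for item in prev_items:
        v.extend(item)               # concatenation of earlier picks
    v.extend(candidate)              # candidate embedding goes last
    if any(len(row) != len(v) for row in W_k):
        raise ValueError("each row of W_k must have length d_s + d_i * k")
    hidden = [1.0 / (1.0 + math.exp(-(sum(w * a for w, a in zip(row, v)) + b)))
              for row, b in zip(W_k, b_k)]
    return sum(q * h for q, h in zip(q_k, hidden))

# Toy sizes: d_s = 2, d_i = 2, k = 2, d_n = 3, so each row of W_k has 6 entries.
s = [0.2, -0.1]
i1_star = [1.0, 0.0]
i2 = [0.0, 1.0]
W = [[0.1] * 6, [-0.2] * 6, [0.0] * 6]
score = q_value(s, [i1_star], i2, W, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

The input width grows with $k$ because each later network also receives the embeddings of the items selected before it, which matches the $d_{s} + d_{i} \cdot k$ column count of $W_{k}$.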