Monitoring and predicting cotton leaf diseases using deep learning approaches and mathematical models



Formal modelling and verification

Formal modeling and verification, grounded in mathematics, logical construction, and formal languages, define system requirements and verify that an implementation conforms to its specification. They are essential in safety-critical systems, offering a systematic mathematical framework that ensures correctness and reliability.

Formal verification rigorously establishes the correctness and safety properties of a system. Formal methods support requirement verification and design by expressing correctness properties through states, transitions, proof obligations, theorem proving, and inductive logic31. A formal model provides a basis for improving and rigorously verifying a system and plays a significant role in safeguarding its correctness: given an input, the model is systematically checked to determine whether it preserves the required safety properties.

Formal modeling is the development of a mathematical description of a system's architecture and dynamics that captures the potential states and transitions the system may experience. The method eliminates ambiguities in specifications and ensures the system's functionality corresponds with its requirements. Verification then rigorously demonstrates that the model satisfies fundamental properties, such as safety and liveness, which guarantee that the system avoids undesirable situations and reliably performs its intended activities.

This work systematically verifies that these criteria are maintained across scenarios using formal verification techniques such as theorem proving and model checking. The model checker systematically inspects the model's state space, ensuring that designated properties, such as invariants or temporal properties, are satisfied. Formal modeling and verification can detect errors that conventional testing methods leave undetected.

Temporal logic of actions (TLA+)

Temporal logic of actions (TLA+) is a formal specification and modeling language grounded in temporal logic and mathematical foundations. Leslie Lamport created TLA+ as a software development tool to ensure the correctness of concurrent and distributed systems32. It uses mathematical logic to describe system behaviour and allows designers to formulate and verify their designs. Formal modeling and verification with TLA+ and the PlusCal language are effective measures for ensuring that concurrent and distributed systems are correct: TLA+ is used for modeling behaviour, while PlusCal offers a more accessible, algorithm-like notation that translates into TLA+. Formal verification via model checking ensures that a system satisfies its requirements32. Figure 2 presents the workflow for the temporal logic of actions.

Fig. 2. Workflow diagram for temporal logic of action.

Model checking

Model checking33,34,35,36,37,38,39,40,41 is one of the most practical formal approaches for systematic, automatic, and exhaustive verification. "It is a computer-assisted method for analyzing dynamical systems that can be modeled by state-transition systems"35. A mathematical model of the system is formed, and a comprehensive evaluation of the model is executed: every state and transition in the model is checked, i.e., the model is analyzed exhaustively. The result is an abstract demonstration of the system with all its possible states and transitions.

The correctness properties of a model are verified through model-checking methods. The model takes an input and is systematically checked to confirm that it safeguards the system's properties. Model checking is used to verify critical projects, e.g., software systems for spacecraft, nuclear reactors, aeroplanes, subway trains, and satellites. Its goal is to improve the reliability of verification by checking correctness properties.

"Model checking is an automated technique that, given a finite-state model of a system and a formal property, systematically checks whether this property holds for (a given state in) that model. Model-based verification techniques are based on models describing the possible system behaviour mathematically, precisely and unambiguously. The accurate modeling of systems leads to the identification of incompleteness, ambiguities, and inconsistencies in informal system specifications. A system model is accompanied by algorithms that systematically explore all states of the system model. It provides the basis for verification techniques ranging from an exhaustive exploration (i.e. model checking) to experiments with a restrictive set of scenarios in the model (i.e. simulation), or in reality (i.e. testing)"36.

Model checking uses temporal logic, i.e., linear-time temporal logic (LTL)44,45,46 and computation-tree logic (CTL)42,43, for stating, checking, and verifying behavioural properties. Model checking is the mathematical verification of a system, and its result rests on an exhaustive, systematic investigation of the mathematical model47.
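The exhaustive exploration described above can be illustrated with a minimal sketch: a breadth-first traversal of a toy transition system that checks an invariant (a safety property) in every reachable state. The transition system and invariant here are illustrative assumptions, not the model used in this work.

```python
from collections import deque

def model_check(initial_states, successors, invariant):
    """Exhaustively explore the reachable state space (breadth-first)
    and check that every reachable state satisfies the invariant.
    Returns (True, None) if the invariant holds everywhere, or
    (False, state) with the first violating state found."""
    seen = set(initial_states)
    frontier = deque(initial_states)
    while frontier:
        state = frontier.popleft()
        if not invariant(state):
            return False, state          # counterexample state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True, None

# Toy example: a counter that wraps around at 5. The safety invariant
# "the counter never exceeds 5" holds in every reachable state.
ok, bad = model_check(
    initial_states=[0],
    successors=lambda s: [(s + 1) % 6],
    invariant=lambda s: s <= 5,
)
print(ok)  # True
```

Real model checkers such as TLC (the TLA+ model checker) perform the same kind of exhaustive state-space search, with far more sophisticated state representation and property languages.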

Correctness property

The safety property is a correctness property, and correctness properties provide detailed system verification. A safety property is an invariant asserting that "something bad never happens, that an acceptable state of affairs is maintained." Calegari and Szasz48 have defined a safety property "S = {a1, a2, …, an} as a deterministic process that asserts that any trace, including actions in the alphabet of S, is accepted by S. ERROR conditions are like exceptions which state what is not required, as in complex systems we specify safety properties by directly stating what is required". According to49, "a safety property is a property that can be specified by a safety formula of the form □p (i.e. temporal operator □ meaning always). This formula states that the property p holds throughout the computation".

Justification for TLA+ model

TLA+ was selected for formal modeling and verification due to its strength in specifying and reasoning about system behaviours over time, particularly in concurrent or reactive systems such as automated disease detection pipelines. TLA+ allows us to rigorously define correctness properties (such as consistency, safety, and liveness) that are critical when deploying models in real agricultural environments. By applying model checking through TLA+, we systematically explore all possible states and transitions, ensuring that the detection process adheres to predefined rules under varying conditions. This approach enhances the reliability and robustness of our system by identifying potential design flaws before real-world deployment.

Convolutional neural network (CNN)

Zohar and Amir50 thoroughly studied the coefficients and other CNN structures used in computer vision. The CNN, a deep learning model, can solve problems involving grid-like data structures such as images and videos51. The essential parts of a CNN are kernels, strides, padding, pooling, and flattening.

The convolutional kernel (filter) is essential for extracting features from the input data. The convolution operation slides a kernel across the input, conducting element-wise multiplication and summing the products. The resulting feature maps capture fundamental patterns such as edges or textures, helping the network acquire hierarchical representations of the input. The stride parameter determines how far the kernel moves over the input at each step of the convolution. A larger stride down-samples the input spatially, decreasing the size of the output feature map and speeding up computation; a smaller stride preserves spatial detail, producing larger feature maps. The choice of stride therefore balances spatial resolution against computational workload, affecting the network's capacity to identify complex patterns in the input.

In CNNs, padding adds extra pixels, usually zeros, around the input data. Its main objective is to avoid losing spatial dimensions after convolution: it ensures that the convolution considers the entire input, which matters especially at the borders, where information could otherwise be lost. Padding is therefore critical for preserving spatial information and for the overall performance and effectiveness of the network.

Pooling layers reduce the size of the feature maps produced by convolutional layers. Max pooling and average pooling are the prevalent methods, selecting the maximum or average value from a cluster of adjacent pixels. Pooling reduces complexity while maintaining translation invariance and preserving the crucial information in the feature maps: less essential details are discarded so the network concentrates on the critical attributes, yielding more effective hierarchical feature learning.

After the convolutional and pooling stages, the output matrix is flattened into a one-dimensional vector, which is fed into fully connected layers in a familiar neural-network structure. The flattening layer is fundamental to CNN operation because it converts the acquired spatial features into a format suitable for further processing and decision-making. It links the convolutional and fully connected layers, ensuring that once the network has learned subtle patterns and correlations from the input, it can tackle complex tasks such as image classification.

The convolution operation is a fundamental aspect of Convolutional Neural Networks (CNNs) and is crucial in feature extraction. Mathematically, the equation represents the sliding of a filter (or kernel) over the input image or previous feature map. The kernel weights are multiplied by the corresponding pixel values in the receptive field, and the sum of these products is computed for each kernel position. This sum is then passed through an activation function, often a nonlinear function such as ReLU or Sigmoid, to introduce non-linearity into the network. The result of this operation is a feature map that captures spatial hierarchies of the input data, helping the network to detect patterns such as edges, textures, and shapes in images. Adding a bias term further enhances the model’s flexibility by shifting the activation, allowing the network to fit the data better.

$$y_{i,j}^{l}=\sigma\left(\sum_{m=0}^{k-1}\sum_{n=0}^{k-1} w_{m,n}^{l}\cdot x_{i+m,\,j+n}^{l-1}+b^{l}\right)\quad\text{(Convolution operation)}$$
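The convolution equation can be sketched directly in NumPy. This is a minimal single-channel illustration of the sliding-window sum-of-products with a bias and a ReLU activation; the input and kernel values are illustrative, not taken from the paper's model.

```python
import numpy as np

def conv2d(x, w, b, stride=1):
    """Valid 2-D convolution (cross-correlation form, as in the equation
    above) followed by a ReLU activation.
    x: (H, W) input, w: (k, k) kernel, b: scalar bias."""
    k = w.shape[0]
    out_h = (x.shape[0] - k) // stride + 1
    out_w = (x.shape[1] - k) // stride + 1
    y = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + k, j * stride:j * stride + k]
            y[i, j] = np.sum(w * patch) + b   # element-wise multiply, sum, bias
    return np.maximum(y, 0.0)                 # ReLU plays the role of sigma

x = np.arange(16, dtype=float).reshape(4, 4)   # illustrative 4x4 input
w = np.array([[1.0, 0.0], [0.0, -1.0]])        # simple difference kernel
y = conv2d(x, w, b=0.0)
print(y.shape)  # (3, 3)
```

With stride 1 and a 2×2 kernel, a 4×4 input yields a 3×3 feature map, matching the output-size formula (H − k)/s + 1.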

Batch normalization (BN) is a technique employed to accelerate the training of deep neural networks by mitigating internal covariate shift, which refers to the changes in the distribution of layer inputs during training. The equation for batch normalization normalizes the input data by subtracting the batch mean and dividing it by the batch standard deviation, effectively standardizing the activations within each mini-batch. This step ensures that the network learns with consistent data distribution, reducing the chances of vanishing or exploding gradients. After normalization, the data is scaled and shifted by learned parameters γ (gamma) and β (beta), allowing the model to recover any necessary shifts or scaling that might have been lost in the normalization process. This operation stabilizes the training process and enables faster convergence, often resulting in improved model performance.

$$\hat{x}=\frac{x-\mu}{\sqrt{\sigma^{2}+\epsilon}},\qquad y=\gamma\hat{x}+\beta\quad\text{(Batch normalization)}$$
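A minimal NumPy sketch of the batch-normalization equation, assuming per-feature statistics over a mini-batch; the sample batch and the scalar γ and β are illustrative.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature of a mini-batch to zero mean and unit
    variance (the x-hat step), then scale by gamma and shift by beta,
    the learned parameters."""
    mu = x.mean(axis=0)                    # batch mean per feature
    var = x.var(axis=0)                    # batch variance per feature
    x_hat = (x - mu) / np.sqrt(var + eps)  # standardized activations
    return gamma * x_hat + beta

batch = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = batch_norm(batch, gamma=1.0, beta=0.0)
print(np.round(y.mean(axis=0), 6))  # ~[0. 0.]
```

With γ = 1 and β = 0 the output columns have (approximately) zero mean and unit variance; during training γ and β let the network recover any scale or shift the normalization removed.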

Max pooling is a downsampling operation commonly used in CNNs to reduce the spatial dimensions of the feature maps while retaining the essential information. The equation for max pooling describes the process of selecting the maximum value from each patch of the feature map, typically within a fixed-size window (e.g., 2 × 2 or 3 × 3). The primary goal of max pooling is to introduce spatial invariance by reducing the impact of small translations, rotations, and distortions in the input data. This operation reduces the computational burden of the network and helps mitigate overfitting by forcing the network to focus on the most prominent features rather than being sensitive to noise or minor variations. The downsampling effect achieved by max pooling enables deeper networks by reducing the number of parameters, thereby improving the model’s efficiency.

$$y_{i,j}=\max_{m,n}\left(x_{i\cdot s+m,\,j\cdot s+n}\right)\quad\text{(Max pooling)}$$
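The max-pooling equation corresponds to the following sketch, where s is the stride and the window indices m, n range over the pooling region; the 4×4 input is illustrative.

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Select the maximum value in each size x size window, moving by
    `stride` (the s in the equation above)."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    y = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            y[i, j] = x[i * stride:i * stride + size,
                        j * stride:j * stride + size].max()
    return y

x = np.array([[1.0, 3.0, 2.0, 4.0],
              [5.0, 6.0, 7.0, 8.0],
              [9.0, 2.0, 1.0, 0.0],
              [3.0, 4.0, 5.0, 6.0]])
print(max_pool(x))  # [[6. 8.] [9. 6.]]
```

Each 2×2 window collapses to its maximum, halving both spatial dimensions while keeping the most prominent activation in each region.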

A neural network’s dense (or fully connected) layer aggregates information from all previous layers and maps it to the output. The equation for a dense layer involves multiplying the input vector by a set of weights and adding a bias term, which determines the activation level of each neuron in the layer. The output is then typically passed through an activation function, such as ReLU, Sigmoid, or Softmax, to introduce non-linearity and allow the model to learn complex patterns. In the context of a CNN, dense layers are typically used in the final stages of the network, where they aggregate the learned features from the convolutional and pooling layers to produce the final classification or regression output. The weights in the dense layer are learned through backpropagation during training, allowing the network to adjust and improve the mapping from input data to output predictions.

$$y_{j}=\sigma\left(\sum_{i=1}^{n_{\text{in}}} w_{ij}x_{i}+b_{j}\right)\quad\text{(Dense layer)}$$
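A minimal sketch of the dense-layer equation as a matrix-vector product plus bias; the weights, bias, and tanh activation here are illustrative stand-ins, not the parameters of the trained model.

```python
import numpy as np

def dense(x, W, b, activation=np.tanh):
    """Fully connected layer: every output neuron takes a weighted sum
    of all inputs plus a bias, passed through an activation."""
    return activation(W @ x + b)

x = np.array([0.5, -1.0, 2.0])   # e.g. a flattened feature vector
W = np.zeros((2, 3))             # illustrative (untrained) weights
b = np.array([0.0, 1.0])
print(dense(x, W, b))  # [0.         0.76159416]
```

With zero weights the output reduces to the activated biases, tanh(0) and tanh(1); in practice W and b are learned by backpropagation.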

The softmax function is used primarily in the output layer of neural networks for multi-class classification problems. It converts the raw output scores (logits) into probabilities, ensuring that the predicted probabilities sum to one. The softmax equation exponentiates each logit and normalizes it by the sum of the exponentiated logits, transforming the logits into a probability distribution in which each value represents the likelihood of a specific class. By applying softmax, the model provides a clearer interpretation of its output, enabling probabilistic decisions; the class with the highest probability is selected as the predicted label. Softmax is particularly advantageous in classification tasks with many possible categories, as it provides a coherent framework for identifying the most probable class among many alternatives.

$$\text{softmax}(z_{i})=\frac{e^{z_{i}}}{\sum_{j=1}^{K}e^{z_{j}}}\quad\text{(Softmax)}$$
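The softmax equation is usually implemented with a max-subtraction trick for numerical stability (subtracting a constant from all logits leaves the result unchanged but avoids overflow in the exponentials); the logits here are illustrative.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: shift by the max logit before
    exponentiating, then normalize so the outputs sum to one."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(np.isclose(probs.sum(), 1.0))  # True
print(int(np.argmax(probs)))         # 0 -> predicted class
```

The largest logit maps to the largest probability, so argmax over the probabilities gives the predicted label.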

Recurrent neural network (RNN)

Recurrent neural networks (RNNs) were developed to handle problems involving sequential input52. A vanilla RNN aims to learn from past input data but finds it hard to keep track of information over long periods. Long short-term memory (LSTM) networks use specific gating strategies to capture and hold long-term associations more efficiently. Bidirectional RNNs read the input in both forward and backward directions, allowing them to draw contextual information from both the preceding and following parts of a sequence at once. In echo state networks (ESNs) the recurrent connections follow the simplest framework, whereas hierarchical RNNs with numerous layers provide an opportunity to learn complex representations from the data. In clockwork RNNs temporal resolutions are divided among different neurons, and attention-based RNNs apply attention mechanisms over specific sequence segments. Each variant is intended to address a particular challenge, extending the ability of RNNs to perform tasks including natural language processing, speech recognition, and time-series analysis53.

$$\begin{gathered} \text{Hidden state update: } h_t=\tanh\left(W_{hh}\,h_{t-1}+W_{xh}\,x_t+b_h\right) \hfill \\ \text{Output: } y_t=W_{hy}\,h_t+b_y \hfill \\ \end{gathered}$$

Here, W_hh, W_xh, and W_hy are the weight matrices for the hidden-to-hidden, input-to-hidden, and hidden-to-output connections, respectively. b_h and b_y are the hidden state and output bias terms, respectively.
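The recurrence above can be sketched as a single step function applied along a sequence. The weight matrices below are random, illustrative placeholders (in practice they are learned by backpropagation through time).

```python
import numpy as np

def rnn_step(x_t, h_prev, W_hh, W_xh, W_hy, b_h, b_y):
    """One vanilla RNN step, matching the equations above."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)  # hidden state update
    y_t = W_hy @ h_t + b_y                            # output
    return h_t, y_t

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 4, 2
W_hh = rng.standard_normal((n_hid, n_hid)) * 0.1   # hidden-to-hidden
W_xh = rng.standard_normal((n_hid, n_in)) * 0.1    # input-to-hidden
W_hy = rng.standard_normal((n_out, n_hid)) * 0.1   # hidden-to-output
b_h, b_y = np.zeros(n_hid), np.zeros(n_out)

# Run a short illustrative input sequence through the recurrence.
h = np.zeros(n_hid)
for x_t in rng.standard_normal((5, n_in)):
    h, y = rnn_step(x_t, h, W_hh, W_xh, W_hy, b_h, b_y)
print(h.shape, y.shape)  # (4,) (2,)
```

The same hidden state h is threaded through every step, which is how the network carries information forward in time.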

Long short-term memory (LSTM)

Long short-term memory (LSTM) is a progressive version of the RNN, structured to overcome the difficulty of learning long-term patterns in sequential input57. LSTMs address the vanishing gradient problem that limits the ability of standard RNNs to sustain information over long intervals. LSTMs include memory cells and three gating mechanisms, i.e. input, forget, and output gates, enabling the network to selectively keep, discard, and produce information.

$$\begin{gathered} \text{Forget gate: } f_t=\text{sigmoid}\left(W_f\left[h_{t-1},x_t\right]+b_f\right) \hfill \\ \text{Input gate: } i_t=\text{sigmoid}\left(W_i\left[h_{t-1},x_t\right]+b_i\right) \hfill \\ \text{Cell state candidate: } \tilde{C}_t=\tanh\left(W_C\left[h_{t-1},x_t\right]+b_C\right) \hfill \\ \text{Update cell state: } C_t=f_t\cdot C_{t-1}+i_t\cdot\tilde{C}_t \hfill \\ \text{Output gate: } o_t=\text{sigmoid}\left(W_o\left[h_{t-1},x_t\right]+b_o\right) \hfill \\ \text{Update hidden state: } h_t=o_t\cdot\tanh\left(C_t\right) \hfill \\ \end{gathered}$$

Here, W_f, W_i, W_C, and W_o are the weight matrices for the forget gate, input gate, cell state candidate, and output gate. b_f, b_i, b_C, and b_o are the corresponding bias terms. LSTMs can acquire and preserve significant contextual information over long periods, which makes them efficient at tasks that include sequences. LSTMs have demonstrated their significance in capturing complex relationships within sequential data, making them a fundamental component in various deep-learning applications54.
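The six gate equations can be sketched as one step function. For brevity this sketch stacks W_f, W_i, W_C, and W_o into a single matrix acting on the concatenated [h_{t-1}, x_t] (a common implementation convenience); the random weights are illustrative, not the trained parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following the gate equations above. W maps the
    concatenated [h_prev, x_t] to all four gate pre-activations at once;
    b holds the four stacked bias vectors."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:n])                 # forget gate
    i = sigmoid(z[n:2 * n])             # input gate
    c_tilde = np.tanh(z[2 * n:3 * n])   # cell state candidate
    o = sigmoid(z[3 * n:4 * n])         # output gate
    c_t = f * c_prev + i * c_tilde      # update cell state
    h_t = o * np.tanh(c_t)              # update hidden state
    return h_t, c_t

n_in, n_hid = 3, 4
rng = np.random.default_rng(1)
W = rng.standard_normal((4 * n_hid, n_hid + n_in)) * 0.1
b = np.zeros(4 * n_hid)

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.standard_normal((5, n_in)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

The cell state c is updated additively (f·c + i·c̃), which is precisely what lets gradients flow over long intervals and mitigates the vanishing-gradient problem.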


