The tomato is an economically important horticultural crop that is widely grown around the world and is rich in nutrients such as lycopene and vitamin C. It can be grown in open-field systems or in controlled environments such as greenhouses. However, tomatoes ripen asynchronously, so fruits at different stages of ripeness often coexist on one plant, and they have a relatively short shelf life after harvest. Accurate and timely ripeness assessment is therefore critical to minimizing post-harvest losses and maintaining product quality throughout the supply chain.
According to statistics, average post-harvest fruit losses in China have reached 20%, with annual economic losses exceeding 100 billion yuan. In contrast, developed countries typically record losses of less than 5%, and some achieve losses as low as 1% to 2%. A main cause of this gap is the asynchronous ripening of the fruit, which often leads growers to misjudge the optimal harvest time and pick too early or too late. In greenhouse tomato production, labor costs alone account for more than 44.5% of net profit. Tomato harvesting and sorting currently rely primarily on manual labor, which suffers from low efficiency, high labor intensity, and rising costs.
At the same time, tomato ripeness detection in natural environments faces many challenges. Illumination fluctuations significantly reduce recognition accuracy, and dense leaf cover, together with a complex background of soil and weeds, often obscures fruit characteristics and leads to erroneous ripeness assessments. So how can tomato ripeness be detected accurately under such complex conditions while keeping the model lightweight enough for real-world deployment?
To address this problem, Professor Zhijie Fang and Associate Researcher Zijun Sun from the School of Electronic Engineering, Guangxi University of Science and Technology proposed a lightweight tomato ripeness detection model, YOLOv11-MHS. Built on YOLOv11n, the model incorporates three key improvements. First, a C3k2_MSCB module integrates multiscale convolutional blocks (MSCBs) that extract and fuse features across different scales to improve detection accuracy. Second, the model's neck is redesigned with a high-level screening-feature fusion pyramid (HS-FPN) structure, which fuses salient features to improve robustness in cluttered environments while also reducing model size. Third, a spatial and channel synergistic attention (SCSA) mechanism is introduced into the C2PSA module to strengthen the model's ability to handle complex scenes.
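To give a feel for how spatial and channel attention can cooperate, the minimal NumPy sketch below gates a feature map first per channel (using a global pooled descriptor) and then per spatial location (using a channel-averaged saliency map). This is an illustrative toy, not the paper's SCSA implementation: the function names (`scsa_like_block`, etc.), the pooling choices, and the channel-then-spatial ordering are all assumptions made for clarity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # feat: (C, H, W); pool spatial dims to one descriptor per channel,
    # then gate each channel by a value in (0, 1)
    desc = feat.mean(axis=(1, 2))                 # (C,)
    return feat * sigmoid(desc)[:, None, None]

def spatial_attention(feat):
    # collapse channels to one saliency value per spatial location,
    # then gate each pixel by a value in (0, 1)
    sal = feat.mean(axis=0)                       # (H, W)
    return feat * sigmoid(sal)[None, :, :]

def scsa_like_block(feat):
    # toy "synergy": channel gating followed by spatial gating
    # (real SCSA couples the two branches more tightly)
    return spatial_attention(channel_attention(feat))

feat = np.random.rand(8, 16, 16).astype(np.float32)
out = scsa_like_block(feat)
print(out.shape)  # (8, 16, 16)
```

Because both gates are sigmoids, the block rescales features without changing the tensor shape, which is why such attention modules can be dropped into an existing backbone like the C2PSA module with no architectural changes elsewhere.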
Experimental results show that, compared to the baseline model YOLOv11n, YOLOv11-MHS reduces parameters by 35.2% and model size by 32.7% while improving mAP0.5 by 1.7% and mAP0.5-0.95 by 2.9%. Against mainstream models such as Faster-RCNN, YOLOv7, and YOLOv8n, YOLOv11-MHS leads in precision, recall, and mean average precision (mAP), and shows a clear lightweight advantage, with GFLOPs, parameter count, and memory footprint all lower than those of the comparison models.
Complex scene tests: Under backlit conditions, only YOLOv11-MHS was able to find the unripe tomato on the far left. In dark environments, all the other models missed some fruits. Under leaf occlusion, YOLOv11-MHS achieved zero missed detections, demonstrating robustness on par with Faster-RCNN and an improved YOLOv5.
This work provides technical support for tomato ripeness detection, and its lightweight design facilitates deployment on resource-constrained devices. In the future, the model is expected to be integrated into autonomous harvesting robots and orchard monitoring systems, advancing precision agriculture by reducing labor costs and increasing the efficiency and intelligence of tomato production.
Source: Newswise
