Analysis of x-ray radiography images of pear fruit using deep learning networks

  • Beatriz Sousa Ferreira (Student)

Student thesis: Master's Thesis

Abstract

One of the major challenges facing the international pear production sector is the codling moth (Cydia pomonella), a pest that not only causes significant damage during production and harvest but also enters the international supply chain, leading to economic consequences such as import restrictions, consumer dissatisfaction, and potential health hazards. Current inspection methods are destructive and rely on random sampling, which makes them inefficient and labor-intensive. Because decisions are made at the batch level after testing only a limited number of samples, these methods also tend to increase food waste. To address this, this thesis explored the use of Vision Transformer (ViT) models. Three strategies based on pre-trained ViT architectures were used: first, fine-tuning the entire pre-trained model; second, training only the first and last layers while freezing the others; and third, simplifying the model by retaining only the initial transformer layers. Additionally, a custom ViT model was trained from scratch, with its hyperparameters tuned using the Optuna framework. These efforts aimed to improve pest detection using X-ray images of pears. Furthermore, to assess the impact of patch size on ViT performance, ViT models with patch sizes of 16 and 32 were compared across all methods. The best results were obtained with the pre-trained ViT-B/16 model in which all parameters were frozen except those of the first layers of the architecture and the last transformer layer. This model reached a balanced accuracy of 72.8%, a training Loss-SENS and LossGRAND-SENS of 0.0033 and 0.0030, and a validation Loss-SENS and LossGRAND-SENS of 0.0017 and 0.0013, respectively. Despite the success of ViTs in image classification tasks in other studies, they did not outperform a CNN-based model, EfficientNet6, on the pear dataset in this study. Factors such as differences in augmentation techniques, training splits, and the inherent complexity of ViT architectures likely influenced these results.
This reinforces the idea that ViTs typically require larger datasets and more precise tuning to reach optimal performance, highlighting their sensitivity to data quantity and model adjustments. This research identifies key challenges in pest detection. It addresses these issues by comparing the performance of ViTs and CNNs on small datasets, emphasizing the need for fine-tuning strategies tailored to specialized tasks. The thesis lays the groundwork for future advancements in pest detection, providing solutions to improve model robustness and accuracy in challenging real-world conditions.
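Two quantities mentioned in the abstract can be made concrete with a short sketch. Balanced accuracy, in its standard definition, is the mean of the per-class recalls; for a binary healthy/infested task it equals the average of sensitivity and specificity. The patch size sets how many tokens a ViT processes: an H×W image split into P×P patches yields (H/P)·(W/P) patch tokens. This is an illustrative sketch only. The function names, the 0/1 class labels, and the 224×224 input resolution are assumptions, not details taken from the thesis.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (standard balanced-accuracy definition)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def vit_num_patches(height, width, patch_size):
    """Number of patch tokens (excluding any class token) for a ViT."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch_size) * (width // patch_size)


if __name__ == "__main__":
    # Hypothetical labels: 0 = healthy pear, 1 = infested pear.
    y_true = [0, 0, 0, 0, 1, 1]
    y_pred = [0, 0, 0, 1, 1, 0]
    # recall(0) = 3/4, recall(1) = 1/2, so balanced accuracy = 0.625
    print(balanced_accuracy(y_true, y_pred))
    # Assumed 224x224 input: patch size 16 gives 196 tokens, 32 gives 49.
    print(vit_num_patches(224, 224, 16), vit_num_patches(224, 224, 32))
```

Unlike plain accuracy (4/6 in the toy example), balanced accuracy weights each class equally, so a classifier cannot score well simply by favoring the more frequent class. Halving the patch count per side (patch size 16 vs. 32) quadruples the number of tokens, which increases the cost of self-attention but gives the model finer spatial detail.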
Date of Award: 18 Dec 2024
Original language: English
Awarding Institution
  • Universidade Católica Portuguesa
Supervisor: Bart Maria Alfons Nicolaï

Keywords

  • X-ray CT
  • Codling moth
  • Vision transformer
  • Deep learning
  • Transfer learning

Designation

  • Mestrado em Engenharia Alimentar (Master's in Food Engineering)
