
September 2024 | Volume 45, Issue 3

Benchmarking ResNet50 for Image Classification on Diverse Hardware Platforms

Matthew Wilkerson

Fayetteville State University,
Fayetteville, North Carolina

Grace Vincent

North Carolina State University,
Raleigh, North Carolina

Zaki Hasnain

Jet Propulsion Laboratory,
California Institute of Technology,
Pasadena, California

Emily Dunkel

Jet Propulsion Laboratory,
California Institute of Technology,
Pasadena, California

Sambit Bhattacharya

Fayetteville State University,
Fayetteville, North Carolina

DOI: 10.61278/itea.45.3.1008

Abstract

In edge computing, optimizing deep neural networks within limited computational resources is essential. This study concentrates on improving the efficacy of the ResNet50 model via static quantization. The applications targeted in this investigation encompass robotic space exploration, specifically Martian terrain classification and wildfire detection scenarios. We conducted performance evaluations on multiple platforms, including a desktop PC, an Intel Next Unit of Computing mounted on a drone, and an Nvidia Jetson Nano integrated into a custom robot, to assess the impact of quantization on computational efficiency and model accuracy. Our findings demonstrate that quantization achieved a reduction in model size by approximately 73-74% and decreased average inference times by 56-68%, with minimal effect on accuracy. These results corroborate the utility of quantization as a viable approach for the deployment of complex neural networks in edge computing environments, ensuring the retention of high accuracy levels.

Keywords: Edge AI, ResNet50, Quantization, Quantization-Aware Training, Image Classification, Martian Terrain Classification, Wildfire Detection, Neural Networks

Introduction

Edge computing, an approach that processes data where it is generated rather than relying on centralized data centers, has become increasingly vital in the field of artificial intelligence (AI). Deploying deep neural networks (DNNs) directly on edge devices like drones, robots, and IoT devices is especially crucial in scenarios requiring rapid processing and decision-making, such as space exploration and environmental monitoring. However, these applications face significant challenges due to the limited computational resources available on edge devices.

As the capabilities of edge computing continue to advance, the need for efficient and timely data processing on-site has pushed the boundaries of how and where AI systems can be deployed. In this context, ResNet50, a model known for its robustness and high accuracy in image classification tasks, must be optimized to meet the stringent constraints of edge environments. Specifically, reducing the model’s computational load—including memory usage, processing power, and inference time—is essential for effective deployment.

This study addresses these challenges by implementing and benchmarking static quantization of ResNet50 to enhance its performance across diverse hardware platforms. The research is particularly relevant for critical applications such as Martian terrain classification and wildfire detection, where rapid, on-site processing can expedite response times, reduce communication overhead, and enable autonomous decision-making in remote and resource-constrained environments. By focusing on model quantization and computational efficiency, this study directly contributes to making sophisticated AI more accessible and effective in edge computing scenarios.

Literature Review

Recent advancements in AI have significantly enhanced the capability of edge devices, which are pivotal in domains requiring rapid processing and decision-making. Edge computing has become increasingly important in AI deployment, particularly for real-time applications. Reina (2024) explores the application of AI and robotics in precision agriculture, focusing on optimizing resource utilization across diverse edge devices such as central processing units (CPUs) and tensor processing units (TPUs). This work highlights the critical need for adaptive AI systems that manage resources efficiently while maintaining high performance under real-world conditions. The growing need for resource efficiency and real-time processing capabilities across various sectors underscores the strategic importance of AI optimization for edge computing.

DNNs are at the core of many modern AI applications due to their ability to model complex patterns in data. ResNet50, a deep convolutional neural network architecture, has been widely recognized for its robustness and efficacy in handling complex image recognition tasks. The name ‘ResNet’ stands for ‘Residual Network,’ which is renowned for its ability to train very deep networks without succumbing to the vanishing gradient problem (He et al., 2015). This makes it an ideal candidate for specialized applications such as Martian terrain classification and wildfire detection, which require high accuracy and reliability under challenging conditions. The architecture of ResNet50 includes 48 convolutional layers, each capable of extracting a rich set of features from input images. This deep architecture allows the model to discern subtle textural and color differences, essential for tasks like Martian terrain classification and wildfire detection, where the environment is highly variable, and the stakes are high.

Deploying DNNs directly on edge devices presents unique challenges due to the limited computational resources available on these platforms. However, the integration of DNNs with edge computing is crucial in scenarios that demand rapid processing and decision-making, such as space exploration and environmental monitoring. For instance, Ciraolo et al. (2024) have developed an optimized AI system for facial expression recognition tailored specifically for tele-rehabilitation on low-resource edge devices. This implementation demonstrates the potential of deploying emotionally intelligent systems in resource-constrained environments, highlighting how edge devices can support real-time, sensitive interactions in healthcare.

To address these resource constraints and enable the deployment of DNNs on edge devices, one effective strategy is quantization. By reducing the precision of the model’s parameters from 32-bit floating-point numbers to 8-bit integers, quantization can significantly decrease both memory footprint and computational complexity. This reduction is especially beneficial in edge computing scenarios where processing power is limited. Odili et al. (2024) examine the integration of advanced AI technologies in the oil and gas industry to optimize corrosion inspection and maintenance tasks. By processing data directly on edge devices, quantization techniques help minimize delays and enhance decision-making processes, which are crucial for maintaining the reliability and efficiency of operations in harsh environmental conditions.
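As a concrete illustration of this precision reduction, the short sketch below (ours, not the deployment code used in this study) uses PyTorch's per-tensor quantization to map a float32 tensor to signed 8-bit integers, cutting per-value storage from 4 bytes to 1 at the cost of a small rounding error:

```python
import torch

# Minimal sketch of the precision reduction behind int8 quantization:
# each float32 value (4 bytes) is mapped to a signed 8-bit integer (1 byte)
# via a scale and zero-point; dequantizing reveals the rounding error.
x = torch.randn(4, 4)                               # float32 "weights"
qx = torch.quantize_per_tensor(x, scale=0.05, zero_point=0,
                               dtype=torch.qint8)
print(x.element_size(), qx.element_size())          # 4 bytes vs. 1 byte per value
print((x - qx.dequantize()).abs().max())            # worst-case rounding error
```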

While standard quantization can yield significant performance improvements, it is not without challenges, particularly in maintaining model accuracy. To overcome these challenges and further enhance the effectiveness of quantization, quantization-aware training (QAT) has emerged as a powerful technique. QAT simulates the effects of quantization during the training process itself, allowing the model to adapt to reduced precision and thereby mitigating potential accuracy losses. This approach is preferable to post-training quantization, which can result in significant drops in performance due to the model being optimized under full precision conditions. By integrating quantization into the training loop, as discussed by Nagel et al. (2021), models like ResNet50 can be made more robust, maintaining higher accuracy levels when deployed on resource-constrained devices. This is particularly important for applications such as Martian terrain classification and wildfire detection, where precision is critical.
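Concretely, the "fake quantization" operations inserted during QAT simulate the int8 round trip with an affine mapping; the textbook formulation below is included for clarity and is not drawn from the paper:

```latex
% Fake quantization as simulated during QAT (affine int8 scheme):
q(x) = \operatorname{clip}\!\left(\operatorname{round}\!\left(\frac{x}{s}\right) + z,\; q_{\min},\; q_{\max}\right),
\qquad \hat{x} = s\,\bigl(q(x) - z\bigr)
```

Here s is a learned or observed scale, z is the zero-point, and [q_min, q_max] = [-128, 127] for signed 8-bit integers. Because training proceeds on the dequantized value, the network learns weights that tolerate the rounding error it will face after conversion.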

Despite the progress in AI optimization for edge computing, there is still a significant gap in understanding the full potential of deploying quantized DNNs on diverse edge platforms. Most existing studies focus on either DNN optimization or edge computing in isolation, with limited exploration of the combined effects of quantization on DNN performance across different edge devices. This study aims to bridge this gap by providing a comprehensive evaluation of ResNet50’s performance on various edge devices, including the Intel Next Unit of Computing (NUC) and Nvidia Jetson Nano, when subjected to quantization-aware training. By addressing this gap, the research contributes to the broader application of AI in critical, real-world scenarios where speed and efficiency are paramount.

Methodology

In this study, we evaluated the performance of the ResNet50 model on various edge computing platforms to understand how these devices handle computationally intensive DNNs under resource constraints. We employed the PyTorch framework, utilizing its Torchvision module to access the pre-trained, quantizable version of ResNet50 (“ResNet50 Model – Torchvision”). We selected two edge devices for our experiments: the Intel NUC and the Nvidia Jetson Nano. The Intel NUC, mounted on a quadcopter drone, was chosen for its compact size and versatility, making it representative of a class of devices that balance portability with moderate computational power. The Nvidia Jetson Nano, integrated into a customizable robot, was selected for its popularity in AI and robotics applications, particularly due to its energy efficiency and ability to handle complex deep learning tasks with its GPU capabilities. These platforms are representative of the types of hardware commonly used in edge computing scenarios, where resource efficiency is paramount.
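For reference, a minimal sketch of this setup is shown below; the weight-loading argument and the class count are our assumptions, matching the tasks described later in this section:

```python
import torch.nn as nn
from torchvision.models.quantization import resnet50

# Load the quantizable ResNet50 from Torchvision with pre-trained float
# weights (quantize=False keeps it in float32 for later fine-tuning and QAT),
# then swap the classifier head for the task at hand.
NUM_CLASSES = 19  # 19 geological classes for Martian terrain; 2 for wildfire
model = resnet50(weights="DEFAULT", quantize=False)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
```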

Given the limited computational resources on edge devices, we employed quantization to optimize the ResNet50 model for deployment. Specifically, we chose QAT over post-training quantization because QAT allows the model to adapt to reduced precision during the training process itself, thereby minimizing accuracy loss. This approach is especially critical when deploying models like ResNet50 on devices that rely heavily on CPU-based inference, such as the Intel NUC. By integrating quantization into the training loop, we ensured that the model maintained higher accuracy levels when deployed on resource-constrained devices, addressing the typical trade-offs associated with standard quantization methods.
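A condensed sketch of this workflow using PyTorch's eager-mode QAT API follows; the backend choice, fusion call, and training loop are illustrative assumptions rather than the authors' exact configuration:

```python
import torch
from torch.ao.quantization import convert, get_default_qat_qconfig, prepare_qat
from torchvision.models.quantization import resnet50

# Eager-mode quantization-aware training, sketched end to end.
model = resnet50(weights="DEFAULT", quantize=False)
model.train()
model.fuse_model(is_qat=True)                       # fuse conv+bn(+relu) blocks
model.qconfig = get_default_qat_qconfig("fbgemm")   # x86 backend, e.g. Intel NUC
model = prepare_qat(model)                          # insert fake-quant observers

# ... fine-tune as usual; fake-quant ops expose int8 rounding to the loss ...
# for images, labels in train_loader:
#     loss = criterion(model(images), labels)
#     loss.backward(); optimizer.step(); optimizer.zero_grad()

model.eval()
int8_model = convert(model)   # replace fake-quant stubs with real int8 kernels
```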

Our methodology involved fine-tuning the ResNet50 model for two distinct tasks, Martian terrain classification and wildfire detection, to assess the model’s adaptability to different environmental conditions. For the Martian terrain classification task, we utilized a dataset curated by Lu and Wagstaff (2020), comprising 6,820 images collected by the Mars Science Laboratory (MSL) Curiosity Rover. These images, captured by the Mast Camera (Mastcam) Left Eye, the Mast Camera Right Eye, and the Mars Hand Lens Imager (MAHLI), were chosen for their scientific relevance and the challenge they present in distinguishing between 19 distinct geological classes. The training set included 5,920 images (including augmented images) randomly sampled from sol (Martian day) range 1–948. The validation set consisted of 300 images randomly sampled from sol range 949–1920, and the testing set contained 600 images randomly sampled from sol range 1921–2224. The test images were used to evaluate the final model’s performance, providing insight into its ability to classify new, unseen Martian terrain accurately.


Figure 1 – Sample images from the MSL dataset. Left image classified as ‘wheel’ and right image classified as ‘distant landscape.’

In contrast, the wildfire detection task employed a dataset curated by El-Madafri et al. (2023), featuring images of forested areas categorized as ‘with fire’ and ‘without fire.’ This dataset was selected to evaluate the model’s performance in a high-stakes, real-world scenario where rapid detection is crucial. The images were sourced from various platforms, elevations, angles, and resolutions, providing a diverse set of data for robust model training. These two case studies were chosen to represent different types of edge computing applications, one focused on extraterrestrial exploration and the other on environmental monitoring. The training set comprised 1,888 images and served as the foundation for model learning. The validation set consisted of 402 images and was used during fine-tuning to adjust model parameters, ensuring that the model generalizes well to unseen data. The testing set contained 410 images and was reserved for the final evaluation of the model, providing a measure of its performance on data it had not encountered during training.


Figure 2 – Sample images from the wildfire dataset. Both images contain smoke and fire and are classified as ‘fire.’

To evaluate the performance of the fine-tuned and quantized ResNet50 models, we established baseline benchmarks on the selected edge devices. The key metrics considered in this study were average inference time and model size, specifically the memory footprint required to store each model. These metrics were chosen because they directly impact the feasibility of deploying deep learning models in resource-constrained environments. Inference time was measured in milliseconds, and model size was assessed based on the memory usage in megabytes.
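The sketch below illustrates how these two metrics can be measured in PyTorch; the warm-up count, number of runs, and input resolution are illustrative choices on our part, not the paper's exact measurement protocol:

```python
import os
import time
import torch

@torch.no_grad()
def avg_inference_ms(model, shape=(1, 3, 224, 224), runs=100, warmup=10):
    """Average per-image inference latency in milliseconds on the CPU."""
    model.eval()
    x = torch.randn(*shape)
    for _ in range(warmup):                  # exclude one-time setup costs
        model(x)
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    return (time.perf_counter() - start) / runs * 1000.0

def model_size_mb(model, path="tmp_weights.pt"):
    """Memory footprint of the serialized weights in megabytes."""
    torch.save(model.state_dict(), path)
    mb = os.path.getsize(path) / 1e6
    os.remove(path)
    return mb
```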

                          Intel NUC    Nvidia Jetson Nano
Wildfire-Baseline Model   316 ms       50 ms
Martian-Baseline Model    580 ms       62 ms

Table 1 – Baseline Inference Time Metrics for Wildfire and Martian Terrain Classification Models on Intel NUC and Nvidia Jetson Nano.
[This table presents the average inference time (in milliseconds) for the baseline ResNet50 models, trained on wildfire detection and Martian terrain classification tasks, when deployed on two edge devices: the Intel NUC and Nvidia Jetson Nano. The Intel NUC, which relies on CPU-based inference, exhibits longer inference times compared to the Nvidia Jetson Nano, which benefits from GPU acceleration.]

Model                Size (MB)
Wildfire-Baseline    90
Martian-Baseline     92

Table 2 – Baseline Model Size for Wildfire and Martian Terrain Classification Models.
[This table shows the memory footprint (in megabytes) of the baseline ResNet50 models used for wildfire detection and Martian terrain classification tasks. The Wildfire-Baseline model occupies 90 MB, while the Martian-Baseline model occupies 92 MB, indicating the storage requirements for deploying these models on edge devices.]

Despite employing the same ResNet50 architecture, the baseline models for Martian terrain classification and wildfire detection exhibited differences in inference time and model size, primarily due to the variation in the number of output classes. For instance, the wildfire detection model, with only two output classes, exhibited faster inference times compared to the Martian terrain model, which had 19 output classes. The Nvidia Jetson Nano, benefiting from GPU acceleration, outperformed the Intel NUC in terms of inference speed, particularly for the more complex Martian terrain task.

By applying static quantization, we further enhanced the models’ performance, particularly on the Intel NUC, which relies on CPU-based inference. Quantization reduced both the memory footprint and the computational complexity, thereby improving inference time without significant loss of accuracy. The performance variance across different devices and tasks was carefully assessed to validate the effectiveness of the quantization techniques used.

Results and Discussion

The application of QAT markedly improved the computational efficiency of the ResNet50 models, with minimal impact on accuracy. The results of this study are presented in terms of the reduction in memory footprint and inference time for both the Martian terrain classification and wildfire detection models.


Table 3 – Model Size Comparison Before and After Quantization.
[This bar chart compares the memory footprint (in megabytes) of the baseline and quantized ResNet50 models used for Martian terrain classification and wildfire detection tasks. The baseline models for both tasks are significantly larger, with the Martian-Baseline at 92 MB and the Wildfire-Baseline at 90 MB. After applying static quantization, the size of both models is reduced to 24 MB, demonstrating a substantial decrease in storage requirements while maintaining the model’s functionality on edge devices.]


Table 4 – Average Inference Time Comparison Before and After Quantization.
[This bar chart illustrates the average inference time (in milliseconds) for the ResNet50 models used in Martian terrain classification and wildfire detection tasks. The Martian-Baseline model has an inference time of 580 ms, and the Wildfire-Baseline model takes 316 ms. After applying static quantization, the inference times are significantly reduced, with the Martian-Quantized model at 186 ms and the Wildfire-Quantized model at 140 ms. This reduction highlights the efficiency gains achieved through quantization, particularly in time-sensitive edge computing scenarios.]

Case Study One: Martian Terrain Classification

Quantization significantly reduced the memory footprint of the Martian terrain model from 92 MB to 24 MB, representing a reduction of approximately 74% (see Table 3). This reduction is consistent with the expected outcome, as converting 32-bit numbers to 8-bit integers decreases the size of the model weights by roughly a factor of four. In terms of inference time, the quantized Martian model saw a reduction of approximately 68%, with inference time decreasing from 580 ms to 186 ms (see Table 4). Despite these improvements in computational efficiency, the quantization discrepancy, or the slight decrease in classification accuracy, was kept below 2%. The baseline classification accuracy for the Martian model was 82%, which decreased by less than 2% after quantization. This minor reduction in accuracy is considered acceptable given the substantial gains in performance, demonstrating that quantization is a viable approach for deploying sophisticated neural networks on edge devices without sacrificing critical accuracy.

Case Study Two: Wildfire Detection

Similarly, the wildfire detection model’s memory footprint was reduced from 90 MB to 24 MB, corresponding to a reduction of about 73% (see Table 3). Inference time was also significantly reduced by approximately 56%, from 316 ms to 140 ms (see Table 4). The baseline classification accuracy for the wildfire detection model was 94%, with a quantization discrepancy of less than 1%, indicating a minimal impact on the model’s ability to accurately detect wildfires. This finding underscores the effectiveness of QAT in maintaining high accuracy while optimizing model performance for deployment in resource-constrained environments.

Overall Implications and Future Work

The results from both case studies highlight the potential of QAT as a powerful technique for enhancing the efficiency of DNNs deployed on edge devices. The substantial reductions in both memory footprint and inference time across different tasks demonstrate the versatility and effectiveness of this approach. These findings are particularly significant for edge computing environments, where computational resources are limited, and the ability to maintain high accuracy with optimized performance is crucial. The minor quantization discrepancies observed suggest that QAT strikes an effective balance between efficiency and accuracy, making it a valuable tool for deploying AI models in real-world scenarios.

The success of applying QAT in this study provides a robust foundation for expanding research into additional edge computing environments. Future investigations will explore the applicability of these methodologies on other Nvidia Jetson devices, leveraging the advanced capabilities of Nvidia’s TensorRT framework. Additionally, we plan to broaden the hardware scope by incorporating the Snapdragon Automotive Development Board (SA8155P) as a new platform for our experiments, thus diversifying the contexts in which our findings can be validated.

However, while QAT has proven effective in the context of Martian terrain classification and wildfire detection, it is important to acknowledge potential challenges when applying this technique to other image processing applications. For example, tasks requiring high precision, such as medical imaging or fine-grained object recognition, might struggle with the reduced numerical precision inherent in quantization. In such cases, the trade-off between performance and accuracy needs to be carefully evaluated to determine if QAT is suitable. Future research will focus on identifying specific application domains where QAT might be less effective and exploring potential mitigations, such as mixed-precision approaches or selective quantization, to balance these trade-offs.

Building on the success of model compression, our forthcoming research will also explore dynamic computational techniques aimed at further optimizing real-time model performance. Specifically, we intend to implement dynamic architectures, such as SkipNet (Wang et al., 2018) and BranchyNet (Teerapittayanon et al., 2017), to experiment with dynamic backbones. These techniques, combined with quantization, will be assessed for their potential to enhance efficiency and performance in constrained computational environments typical of edge devices. By exploring these advanced methods, we aim to establish new benchmarks for deploying sophisticated neural networks in scenarios where computational resources are limited, further pushing the boundaries of what is possible with edge computing.
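To make the early-exit idea concrete, the hedged sketch below shows a BranchyNet-style module: a cheap side classifier after the early layers returns immediately when its softmax confidence clears a threshold, and the full backbone runs otherwise. The module boundaries and the 0.9 threshold are illustrative choices, not results from this paper:

```python
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    """BranchyNet-style early exit (after Teerapittayanon et al., 2017)."""

    def __init__(self, stem: nn.Module, branch: nn.Module, tail: nn.Module,
                 threshold: float = 0.9):
        super().__init__()
        self.stem, self.branch, self.tail = stem, branch, tail
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.stem(x)                          # early backbone layers
        early = self.branch(feats)                    # cheap side classifier
        conf = torch.softmax(early, dim=1).amax(dim=1)
        if bool((conf > self.threshold).all()):       # whole batch confident
            return early                              # skip the rest of the net
        return self.tail(feats)                       # run the full backbone
```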

Conclusion

This study has demonstrated the efficacy of QAT in optimizing the ResNet50 model for edge AI applications. By fine-tuning the ResNet50 model for Martian terrain classification and wildfire detection, and subsequently applying QAT, we achieved significant reductions in model size—approximately 73-74%—and inference times—56-68%—while maintaining high accuracy levels. The quantization process effectively minimized the typical accuracy loss associated with such techniques, with discrepancies of less than 2% for the Martian model and less than 1% for the wildfire detection model. These findings highlight the viability of deploying sophisticated neural networks within resource-constrained edge computing environments, where performance and efficiency are critical.

The results from this study underscore the potential of QAT to balance computational efficiency with accuracy, making it a valuable tool for a wide range of edge computing applications. The substantial improvements observed in both case studies suggest that QAT can be applied to other platforms and tasks, further advancing the capabilities of edge computing.

Looking ahead, future research will explore the extension of these techniques to additional platforms, such as other Nvidia Jetson devices and the Snapdragon Automotive Development Board, to validate and expand upon our findings. Additionally, our research will delve into dynamic computational techniques, including SkipNet and BranchyNet, to further enhance real-time model efficiency without compromising performance. The integration of these advanced strategies aims to establish new benchmarks for deploying sophisticated neural networks in scenarios with limited computational resources, such as robotic space exploration and real-time environmental monitoring.

In conclusion, this research not only underscores the importance of model optimization for edge computing but also sets the stage for broader applications of AI technologies in critical, real-world situations where speed and efficiency are paramount. The methodologies and findings presented here contribute to the ongoing advancement of AI in edge environments, paving the way for more intelligent and efficient systems in the future.

Acknowledgements

The research and results outlined in this paper are based upon work supported by NASA under award Nos. 80NSSC21M0312 and 80NSSC23M0054.

References

Ciraolo, Davide, Maria Fazio, Rocco Calabrò, Massimo Villari, and Antonio Celesti. “Facial Expression Recognition Based on Emotional Artificial Intelligence for Tele-Rehabilitation.” Biomedical Signal Processing and Control (2024). https://www.sciencedirect.com/science/article/pii/S174680942400154X.

El-Madafri, Ismail, Marta Peña, and Noelia Olmedo-Torre. “The Wildfire Dataset: Enhancing Deep Learning-Based Forest Fire Detection with a Diverse Evolving Open-Source Dataset Focused on Data Representativeness and a Novel Multi-Task Learning Approach.” Forests 14, no. 9 (2023): 1697. https://doi.org/10.3390/f14091697.

He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep Residual Learning for Image Recognition.” 2015. arXiv:1512.03385 [cs.CV].

Lu, Steven, and Kiri Wagstaff. “MSL Curiosity Rover Images with Science and Engineering Classes (Version 2.1.0) [Data set].” Zenodo, 2020. https://doi.org/10.5281/zenodo.1049137.

Nagel, Markus, Marios Fournarakis, Rana Ali Amjad, Yelysei Bondarenko, Mart van Baalen, and Tijmen Blankevoort. “A White Paper on Neural Network Quantization.” 2021. arXiv:2106.08295 [cs.LG].

Odili, Patrick, Cosmas Daudu, Adedayo Adefemi, Ifeanyi Onyedika, and Gloria Usiagu. “Integrating Advanced Technologies in Corrosion and Inspection Management for Oil and Gas Operations.” Engineering Science & Technology Journal (2024). https://doi.org/10.51594/estj.v5i2.835.

Reina, Giulio. “Robotics and AI for Precision Agriculture.” Robotics 13, no. 4 (2024). https://www.mdpi.com/2218-6581/13/4/64/pdf.

“ResNet50 Model – Torchvision.” PyTorch. Accessed January 9, 2024. https://pytorch.org/vision/main/models/generated/torchvision.models.resnet50.html.

Teerapittayanon, Surat, Bradley McDanel, and H. T. Kung. “BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks.” 2017. arXiv:1709.01686 [cs.NE].

Wang, Xin, Fisher Yu, Zi-Yi Dou, Trevor Darrell, and Joseph E. Gonzalez. “SkipNet: Learning Dynamic Routing in Convolutional Networks.” 2018. arXiv:1711.09485 [cs.CV].

Author Biographies

Matthew Wilkerson is an undergraduate student at Fayetteville State University, majoring in computer science. He currently serves as an undergraduate research assistant at the university’s Intelligent Systems Laboratory, where he is assigned to two NASA-funded research projects involving deep neural networks. Prior to attending Fayetteville State University, Matthew served in the U.S. Army for over 22 years.

Grace Vincent is an Electrical Engineering Ph.D. student within the NC Plant Sciences Initiative at North Carolina State University. She received a B.S. in both Computer Science and Mathematics in 2022 from Fayetteville State University. She continues as a Graduate Research Assistant at the Intelligent Systems Lab at Fayetteville State University. Her research focuses on computer vision, AI, and Earth science.

Dr. Zaki Hasnain is a data scientist in NASA JPL’s Systems Engineering Division, where he participates in and leads research and development tasks for space exploration. His research interests include physics-informed machine learning and system health management for autonomous systems. He has experience developing data-driven, game-theoretic, probabilistic, physics-based, and machine learning models and algorithms for space, cancer, and autonomous systems applications. He received a B.S. in engineering science and mechanics at Virginia Polytechnic Institute and State University. He received M.S. and Ph.D. degrees in mechanical engineering, and an M.S. in computer science, at the University of Southern California.

Dr. Emily Dunkel is a Data Scientist in the Science Data Modeling and Computing Group, and the Astronomy and Physics Technology Demonstrations Office at the Jet Propulsion Laboratory, California Institute of Technology. She works on projects involving machine learning, computer vision, and physics-based modeling. Prior to joining JPL, Emily worked in the defense industry, developing physics-based models and deep learning algorithms, as well as at TrueCar, where she was a Statistician. Emily has a Ph.D. in Physics from Harvard University, where she developed methods to model quantum systems. She has bachelor’s degrees in chemical engineering and physics from UCLA.

Dr. Sambit Bhattacharya is a tenured Full Professor of Computer Science at Fayetteville State University, North Carolina, USA. In 2023 he was honored with the University of North Carolina (UNC) Board of Governors Award for Teaching Excellence. Dr. Bhattacharya is experienced in developing and executing innovative, use-inspired research in Artificial Intelligence and Machine Learning (AIML) across a broad range of techniques and applications; he works with multidisciplinary teams and leads externally funded projects. He directs the Intelligent Systems Lab at Fayetteville State University, which hosts research and houses resources such as robotics equipment and high-performance computing for AIML research. Dr. Bhattacharya has served as a faculty research fellow in research labs of the US Department of Defense, and since 2023 he has been a part-time Visiting Scientist at the National Geospatial-Intelligence Agency.
