MARCH 2025 | Volume 46, Issue 1
Real-Time Inference for Unmanned Ground Vehicles Using Lossy Compression and Deep Learning
School of Mathematical and Statistical Sciences, Clemson University, Clemson, SC, USA
Holcombe Department of Electrical and Computer Engineering, Clemson University, Clemson, SC, USA
Autonomous vehicles rely on on-board perception systems for safe terrain navigation, which becomes especially important in rural areas. The aim of this study is to explore the effect compressed training images have on the performance of deep learning segmentation architectures and to determine whether lossy compression is a practical means of providing real-time transfer speeds for autonomous vehicle perception systems. To test the effect of compression on deep learning, we apply ZFP, JPEG, and SZ3 to EfficientViT and U-Net and rank test accuracy. This study found that JPEG achieves the highest compression ratio, 144.49× at JPEG quality level 0, while also achieving the fastest transfer speed of the compressors tested on the Nvidia Xavier edge device. Furthermore, JPEG achieved higher mIoU accuracy than SZ3 and ZFP for both architectures tested. Of the two deep learning architectures, EfficientViT outperforms U-Net for all lossy compressors at all levels of compression: EfficientViT achieves a peak mIoU of 95.5% at JPEG quality level 70, while U-Net peaks at an mIoU of 90.683% at JPEG quality level 40.
This study advances autonomous vehicle development in two ways. First, it demonstrates that JPEG compression outperforms specialized scientific compressors (SZ3/ZFP) for off-road RGB perception systems. Second, it validates EfficientViT’s effectiveness for resource-constrained autonomous navigation. These findings benefit autonomous vehicle engineers implementing perception systems, computer vision researchers working on embedded applications, and industry teams deploying off-road autonomous navigation solutions.
Keywords: Compression, Machine Learning, Computer Vision, Off-Road, Semantic Segmentation
– ∩: The intersection of two sets contains all elements that are common to both sets
– ∪: The union of two sets contains all elements that appear in either set (or both)
– C1: Constant to stabilize division when the denominator is small
– C2: Constant to stabilize division when the denominator is small
– µx: Mean value of pixel values for some image x
– µy: Mean value of pixel values for some image y
– σ²x: Variance of pixel values for some image x
– σ²y: Variance of pixel values for some image y
The mobile nature of Unmanned Ground Vehicles (UGVs) imposes restricted physical space and limited energy consumption, especially in off-road environments. UGVs are often deployed on rural terrain, making it crucial for them to use multiple perception systems such as Light Detection and Ranging (LiDAR), Radio Detection and Ranging (RADAR), and Red, Green, Blue (RGB) cameras [1]. RADAR detects only one object at a time, while LiDAR provides point-cloud distance data [2].
UGVs lack the computational power necessary to perform real-time (near-zero-second) inference using inefficient semantic segmentation architectures [3]. UGVs often have computational power similar to or less than edge devices such as the Jetson Xavier, which offers up to 21 Tera Operations Per Second (TOPS). Semantic segmentation architectures calculate classifications for every pixel of an image and use 10 TOPS or more depending on the image resolution and the amount of feature extraction [44]. In scenarios where semantic segmentation architectures cannot be run on-board due to physical space limitations, power restrictions, or computational constraints, image data can be transferred to a cloud-based system or external computer for processing.
Latency refers to the time it takes to transfer data from a host to a target [4]. By compressing images from a camera feed, the overall latency of data transfer is reduced; compression techniques such as JPEG achieve a 1.14× speedup for medium-sized datasets [5], [6]. In rural settings, network latency is much higher (100-220 ms) than in urban settings (70-80 ms) [7]. This degraded connectivity greatly hinders the ability of off-road UGVs to send and receive data quickly for deep learning inference. The total time required for the UGV to compress and send the data is the pipeline-latency.
As the bandwidth decreases, the need for data compression rises. Data compression is performed on-board with lossless and lossy compressors. Lossless compression algorithms reduce the byte-size of information while maintaining the ability to reconstruct the original data byte for byte [8]. Lossy compression algorithms reduce the original data size far more than their lossless counterparts; however, the compressed data cannot be reconstructed into the original image without introducing distortion [9]. Distortion in lossy compressors can be traded against compression: increased compression yields smaller sizes at the expense of more distortion. Smaller data sizes significantly reduce pipeline-latency when transferred through a low-bandwidth network. The reduced transfer time of lossless- and lossy-compressed images helps achieve near real-time autonomy on UGVs.
UGVs can utilize semantic segmentation deep learning architectures to assign each pixel of an image to a class, producing a map of the image segmented into clusters of classes called a mask [10]. Accuracy refers to the overlap between the predicted masks and the ground truth; to produce accurate predictions, semantic segmentation architectures need accurate images. Increased distortion caused by lossy compression typically decreases an architecture’s accuracy due to the loss of information about the original data. At smaller bandwidths, a balance of compact data and low distortion must be met to achieve fast, but still accurate, predictions.
This study contributes the following: (1) evidence that JPEG compression outperforms the specialized scientific compressors SZ3 and ZFP for off-road RGB perception, and (2) validation of EfficientViT’s effectiveness for resource-constrained autonomous navigation.
Multiple studies [5], [6], [11] attempt to reduce inference latency using compression [12]. M. H. Faykus [11] focuses solely on JPEG compression applied to edge-device testing for autonomous off-road perception systems and excludes compression algorithms such as SZ3 and ZFP, which this study includes. M. Rahman [5] applies error-bounded lossy video compression to CNN-based pedestrian detection architectures but does not study performance on off-road terrain data; similarly, [6] focuses on applying lossy video compression to pedestrian identification using the YOLO CNN-based vision model. Our study differs from the previous literature by applying error-bounded lossy compression to off-road images taken in low-bandwidth rural environments in order to train and test semantic ViT and CNN architectures for autonomous off-road perception, with lossless compressors as a baseline for comparison.
In lossless compression, the data after decompression is byte-for-byte identical to the original data compressed. For images, this means the raw image after decompression is pixel-for-pixel identical (no artifacts) to the original image. Lossless compression is preferred when possible because the compression introduces no distortion into the data.
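As a minimal sketch of a lossless round-trip, the following uses Python’s built-in zlib at the paper’s maximum lossless compression level 9; "frame.png" is a hypothetical input file standing in for one CaT image:

```python
import zlib

raw = open("frame.png", "rb").read()
packed = zlib.compress(raw, level=9)
assert zlib.decompress(packed) == raw  # byte-for-byte identical after decompression
print(f"compression ratio: {len(raw) / len(packed):.2f}x")
```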
The following lossless compressors are considered in this study:
Error-bounded lossy compression algorithms compress data while allowing the user to control the maximum allowable difference between the original data and the data reconstructed after compression and decompression. Despite the loss of information, lossy compressors work to retain maximal visual information as data is compressed and decompressed. Lossless compression is ideal, but the large memory footprint of images makes lossy compression the better performer due to its higher compression ratios. In our case, images from a perception system such as an RGB camera are compressed only once, making lossy compression preferable for sending image data from the UGV to a cloud-based processing system.
The following lossy compressors are considered in this study: SZ3, ZFP, and JPEG.
Transmission Control Protocol (TCP) is used as the data transfer protocol. TCP is a core part of the Internet Protocol suite and ensures a reliable connection between devices [26]. TCP splits image data into smaller "packets" of data, which are sent to a recipient; the recipient is guaranteed to receive the packets and sends back acknowledgment packets to confirm the successful delivery of each packet.
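A minimal TCP sketch using Python’s standard socket module is shown below; the host and port are placeholders, not values from the paper. TCP transparently splits the byte stream into packets and handles acknowledgments and retransmission beneath this interface:

```python
import socket

payload = b"\x00" * 4096  # stand-in for one compressed image

with socket.create_connection(("192.0.2.10", 9000)) as sock:  # placeholder peer
    sock.sendall(payload)  # packetization and ACKs happen below this call
    reply = sock.recv(64)  # optional application-level confirmation
    print(f"received {len(reply)} bytes in reply")
```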
These architectures were selected based on their demonstrated performance in semantic segmentation tasks: U-Net has shown good generalizability from small datasets [27], while EfficientViT offers a good speed-accuracy balance [30].
Fig. 1: U-Net Architecture [27]
Fig. 2: EfficientViT Architecture [30]
The CAVS Traversability (CaT) dataset [31] was developed specifically for off-road autonomous navigation research. It contains 1,812 color images collected from diverse off-road environments, including forests, fields, and trails, across different seasons and lighting conditions. The testing set (30%) was chosen to ensure representativeness of the data by including general and specific terrain challenges. CaT was chosen for this study because it represents the current standard for off-road UGV perception evaluation, providing comprehensive ground-truth segmentation labels for traversable and non-traversable terrain. While other datasets exist for urban autonomous driving, CaT uniquely addresses the specific challenges of unstructured off-road environments.
We tested five lossless and three lossy data compression algorithms using defined test metrics to try to reduce data transfer times. The algorithms are tested on the CaT [31] dataset, and the results are modeled using a series of increasing transfer speeds to illustrate when compression is needed.
After the CaT dataset has undergone compression, EfficientViT and U-Net are trained on the compressed RGB images for each quality level of JPEG compression and each error bound of SZ3 and ZFP, as defined later. Training occurs independently, and accuracy and variation with different compressors and compression levels are measured. Only the RGB images are compressed; the masks (used for validating the accuracy of the model and for model training) are left uncompressed to ensure the model is not training on distorted masks. Distorted training masks degrade the model’s ability to provide accurate and realistic segmentation predictions.
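The pairing of compressed inputs with uncompressed labels can be sketched as a PyTorch-style dataset; the paths and file layout are hypothetical, and this is not the authors’ training code:

```python
import numpy as np
from PIL import Image
from torch.utils.data import Dataset

class CompressedCaT(Dataset):
    def __init__(self, image_paths, mask_paths):
        self.image_paths = image_paths  # lossy-compressed, then decompressed RGB images
        self.mask_paths = mask_paths    # ground-truth masks are never compressed

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = np.asarray(Image.open(self.image_paths[idx]).convert("RGB"))
        mask = np.asarray(Image.open(self.mask_paths[idx]))  # undistorted labels
        return image, mask
```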
After the CaT dataset has undergone compression, data transfer latency is calculated at varying bandwidth levels for each compressor at its maximum compression level.
Hardware & Software | Details |
Xavier Make and Model | P2972, Jetson AGX Xavier |
EfficientViT Training Hardware | 1xV100, 24 CPU cores, 40 GB RAM |
U-Net Hardware | 1xV100, 1 CPU core, 16 GB RAM |
cudnn | 8.7.0.84 |
CUDA | 11.8.0 |
PyTorch Version | 2.1.2 |
Libpressio Version | 0.99.5 |
scikit-image Version | 0.24.0 |
Pillow Version | 10.4.0 |
SZ3 Version | 3.2.0 |
ZFP Version | 1.01 |
TABLE I: Hardware and software specifications for Xavier, EfficientViT, and U-Net. The code for this is available at https://github.com/CUFCTL/DL-for-UGVs-using-Compression
For the error-bounded lossy compressors SZ3 and ZFP, the image data is normalized to floating-point values from 0 to 1 to improve the compressors’ performance. This normalization ensures that the error bounds have a consistent meaning relative to the pixel data range. The normalized data is then compressed at seven different error bounds from 10^-1 to 10^-7, referred to for simplicity as 1E-1 to 1E-7. JPEG compression ranges from quality level 0 (Q0) to quality level 100 (Q100) in increments of 10, where quality level 0 indicates maximum lossy compression and quality level 100 indicates minimal lossy compression. For the lossless compressors, only compression level 9 is used from the available range of 0 to 9, where 0 indicates no compression and 9 indicates maximum compression. For SZ3 and ZFP, Libpressio [32] is used to implement the compression algorithms, while the Python Pillow image library is used for JPEG compression. We designed this experiment for high throughput: all compressed data types at all levels were treated equally and run on the Clemson Palmetto II supercomputer cluster using parallelization.
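A minimal sketch of this compression step is shown below, assuming libpressio’s Python bindings and Pillow (the libraries named in Table I). The option key "pressio:abs" is libpressio’s generic absolute-error-bound setting and may differ across versions; "frame.png" is a hypothetical input file:

```python
import io
import numpy as np
from PIL import Image
from libpressio import PressioCompressor

rgb = np.asarray(Image.open("frame.png").convert("RGB"), dtype=np.float32)
rgb /= 255.0  # normalize to [0, 1] so error bounds are relative to the pixel range

# Error-bounded compression at 1E-3 (swap "sz3" for "zfp" to use ZFP)
compressor = PressioCompressor.from_config({
    "compressor_id": "sz3",
    "compressor_config": {"pressio:abs": 1e-3},
})
compressed = compressor.encode(rgb)
decompressed = compressor.decode(compressed, np.empty_like(rgb))

# JPEG at quality level Q40 via Pillow
buf = io.BytesIO()
Image.open("frame.png").convert("RGB").save(buf, format="JPEG", quality=40)
print(len(compressed), buf.getbuffer().nbytes)
```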
The following metrics are used to evaluate compression algorithms:
1) Compression Ratio (CR)
CR determines the efficiency of compression algorithms by comparing the original data size to its compressed size.
CR shows the efficiency of the relative reduction in data size: the higher the CR value, the greater the relative reduction achieved.
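Expressed as a formula:

\[ \mathrm{CR} = \frac{\text{original size}}{\text{compressed size}} \]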
2) Compression Bandwidth (cBW)
cBW is the speed at which the compressor reduces the size of a fixed amount of input data. As real-time applications are the focus here, the timing does not include the time needed to load from the disk as the data is resident in memory or read from an input buffer of a sensor device.
3) Decompression Bandwidth (dBW)
dBW is the speed at which the compressor decompresses the data. This includes the time for any additional post-processing after decompression necessary to return it to its correct form.
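In symbols, with the convention (an assumption here) of measuring both bandwidths against the uncompressed data size:

\[ \mathrm{cBW} = \frac{\text{uncompressed size}}{t_{\mathrm{compress}}}, \qquad \mathrm{dBW} = \frac{\text{uncompressed size}}{t_{\mathrm{decompress}} + t_{\mathrm{post}}} \]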
4) SSIM
SSIM (Structural Similarity Index Measure) is used to measure the perceptible distortion of visual media such as images or videos [34]. SSIM requires an original image and a second image to compare against it; in this case, the original image is the original RGB terrain image and the second image is its lossy-decompressed version. The SSIM of two images x and y is calculated by the following equation:

\[ \mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \]
Here, µx and µy are the mean pixel values of x and y respectively, σ²x and σ²y are the variances of x and y respectively, σxy is the covariance of x and y, and C1, C2 are constants to stabilize division with small denominators. To calculate this value in practice, the Scikit-Image Python [33] library’s structural similarity function was used on the input image data and the decompressed image data with a data range of 255.
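A minimal sketch of this computation, assuming the scikit-image API from Table I; the arrays below are placeholders for a real image and its decompressed counterpart:

```python
import numpy as np
from skimage.metrics import structural_similarity

# Placeholder arrays standing in for an original RGB image and its
# lossy-decompressed counterpart (H x W x 3, 8-bit).
original = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
decompressed = original.copy()

# data_range=255 matches the 8-bit pixel range; channel_axis=2 marks the
# RGB axis (scikit-image >= 0.19).
score = structural_similarity(original, decompressed,
                              data_range=255, channel_axis=2)
print(f"SSIM: {score:.4f}")
```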
For our studies, SSIM is used for off-road terrain segmentation tasks, which require preserving the structural integrity of the off-road image data. For example, parts of the image such as the sky may be greatly distorted while having little effect on segmentation accuracy, since the sky is much more distinguishable than something like a fallen tree.
To determine when compression reduces the latency from the camera feed to the processing unit, we measure the entire compression/decompression time along with the data transfer time. To benefit from image compression, the time to send compressed images, T_comp_send, must be less than the time to send uncompressed images, T_send. T_send is defined as the data size over the transfer bandwidth (BW):

\[ T_{\mathrm{send}} = \frac{\mathrm{Size}}{\mathrm{BW}} \]

T_comp_send is defined as the sum of the compression time, the time to send the reduced-size data (Comp_Size), and the decompression time:

\[ T_{\mathrm{comp\_send}} = T_{\mathrm{comp}} + \frac{\mathrm{Comp\_Size}}{\mathrm{BW}} + T_{\mathrm{decomp}} \]
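A minimal sketch of this break-even relation follows; every number below is a hypothetical placeholder, not a measurement from this study:

```python
def send_time_s(size_bytes: float, bw_mbps: float) -> float:
    """Seconds to transmit size_bytes over a bw_mbps megabit/s link."""
    return size_bytes * 8 / (bw_mbps * 1e6)

raw_size = 1.5e6                  # one uncompressed image, bytes (hypothetical)
comp_size = raw_size / 50         # assume a 50x compression ratio
t_comp, t_decomp = 0.010, 0.005   # compress/decompress times, seconds (hypothetical)

for bw in (5, 100, 1000):         # Mb/s
    t_send = send_time_s(raw_size, bw)
    t_comp_send = t_comp + send_time_s(comp_size, bw) + t_decomp
    winner = "compressed" if t_comp_send < t_send else "uncompressed"
    print(f"{bw:>4} Mb/s: T_send={t_send:.3f}s  T_comp_send={t_comp_send:.3f}s -> {winner}")
```

Consistent with the results reported later, compression wins at low bandwidths, while at high bandwidths the fixed compression/decompression time dominates and sending uncompressed data is faster.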
Semantic segmentation architectures such as EfficientViT and U-Net accurately predict drivable terrain on uncompressed images [35] from the CaT dataset [31], as illustrated in Figure 9. To determine whether EfficientViT and U-Net maintain high accuracy on compressed data, they are tested on the lossy compressors SZ3, ZFP, and JPEG across multiple error levels.
When training EfficientViT and U-Net on CaT, a 70%/30% train/test split is used, as provided in the dataset by default. The dataset contains 1,812 images, and the testing set is chosen to ensure variability, including general and specific terrain challenges [31]. This split is uniform across all error levels for all compressors.
Fig. 3: IoU Equation [39]
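Figure 3’s equation is the standard IoU, using the ∩ and ∪ notation defined earlier; mIoU then averages IoU over the N classes:

\[ \mathrm{IoU} = \frac{|P \cap G|}{|P \cup G|}, \qquad \mathrm{mIoU} = \frac{1}{N} \sum_{c=1}^{N} \mathrm{IoU}_c \]

where P is the set of pixels predicted for a class and G is the corresponding ground-truth set.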
A cloud-based system is a viable solution when an off-road system cannot efficiently process segmentation on-board. However, this requires transferring data from the off-road system to the cloud-based system, and depending on the bandwidth between them, this communication time may vary. Because the data transfer is on the critical path of the processing pipeline (Figure 4), low bandwidth makes the transfer the performance bottleneck. The data are compressed to make the transfer more efficient at lower bandwidths, at the cost of additional time to compress the images.
Although applying data compression requires time for compression and decompression before and after transmitting the data to the cloud-based system, sending the same information in fewer bytes can make the overall transfer faster than transmitting uncompressed images.
Fig. 4: Data Transfer Pipeline
TCP headers range from 20 bytes to 60 bytes per packet [26], which is a significant amount of overhead for a heavily compressed image. The distortion caused by data loss during lossy compression, compounded with image data that could be lost through dropped packets, could result in segmentation malfunctions and erroneous model decisions.
The HTTP POST requests were made via the Python requests.post command. Each image is transferred using TCP to a Flask-hosted Python server from the Xavier device, which is connected to the network via Ethernet with an upload speed of approximately 303.93 Mb/s. The server runs on a laptop with a 13th Gen Intel(R) Core(TM) i7-13700HX CPU, an Intel(R) WiFi 6E AX211 160MHz WiFi adapter, and 32.0 GB of DDR5 4800 MT/s on-board RAM, hosted on the same network with a download speed of approximately 458.33 Mb/s. Each HTTP POST request to the server carries binary information for 1, 15, 30, 45, or 60 images. The uncompressed image processing times are calculated with a compression time of 0. The SZ3 and ZFP lossy compressors are grouped by error bounds, ranging from 1E-1 (highest compression) to 1E-7 (lowest compression). Similarly, the JPEG compressor is organized by quality levels from Q0 to Q100.
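A minimal sketch of both ends of this transfer, assuming a Flask endpoint and requests.post as described above; the route name, host, and port are hypothetical:

```python
# --- server side (laptop): a minimal Flask receiver ---
from flask import Flask, request

app = Flask(__name__)

@app.route("/upload", methods=["POST"])
def upload():
    blob = request.get_data()  # raw binary body: 1-60 compressed images
    return {"received_bytes": len(blob)}

# app.run(host="0.0.0.0", port=5000)

# --- client side (Xavier): posting compressed image bytes over TCP ---
import requests

with open("frame.jpg", "rb") as f:  # hypothetical compressed image file
    payload = f.read()

resp = requests.post("http://192.0.2.10:5000/upload", data=payload,
                     headers={"Content-Type": "application/octet-stream"})
print(resp.status_code, resp.json())
```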
Next, we discuss the data transfer results and evaluation metrics, with total transfer time and mIoU modeled. We evaluate the metrics to determine which of the tested lossy or lossless data compression algorithms is most suitable for sending real-time off-road image data between a UGV and a cloud system for semantic segmentation inference while maintaining accuracy.
Assuming a set bandwidth for the transfer speed from the UGV to the cloud-based system, the reduction in data is modeled in Figure 7 and shows improvement to the system’s overall pipeline-latency.
When bandwidth is low, there is a need for compression. For example, the time to transfer the CaT dataset was reduced from 379.70 seconds to 53.30 seconds with ZLIB compression, the fastest of the lossless compressors tested. As bandwidth increases, the need for compression decreases: at 1000 Mb/s, the quickest transfer time was for the uncompressed data, because the time to compress and decompress the data becomes the bottleneck in the transfer process.
These results show that when UGVs are deployed in low-bandwidth environments, data compression makes cloud-based processing a viable option for reduced pipeline-latency. When UGVs are in urban areas, the need for data compression is lessened by the improved bandwidth.
Fig. 5: SZ3 and ZFP compression over error bounds
Fig. 6: JPEG compression over quality levels
Fig. 7: Transfer time of the CaT training data, in seconds, in relation to bandwidth and the compression ratios of the lossless and lossy compressors
Although lossless compression shows improved transfer speeds at lower bandwidths, the inability of lossless compressors to discard less vital information leaves room for improvement in reaching real-time pipeline-latency. Figures 5 and 6 illustrate the impact of increasing error bounds (or decreasing quality levels) on image quality for the three lossy compression algorithms. Lossy compressors achieve higher compression ratios with faster transfer speeds than lossless compression (Figure 7). When bandwidth is below 1000 Mb/s, the bottleneck is the generated data size exceeding the system’s transfer capabilities; above 1000 Mb/s, the bottleneck is the added compression time, and sending lossy-compressed data takes longer than sending uncompressed image data.

Comparing the three lossy compressors, JPEG comfortably outperforms ZFP and SZ3 when all are set at their highest tested compression level or bound. Comparing ZFP and SZ3, SZ3 achieves roughly 53× the compression ratio of ZFP at the same error bound of 1E-1 (Figure 8). This difference is most apparent at a bandwidth of 5 Mb/s, where ZFP takes approximately three times as long as SZ3 to send the CaT images. Additionally, comparing SZ3 to the lossless compressors in Figure 7, SZ3 achieves a much higher compression ratio than any of them at the cost of a high compression time; this is due to a configurable internal lossless stage inside SZ3, which in this case is tuned to optimize for compression ratio. ZFP, in comparison, performs rather poorly on the data compressed in this study, taking the longest to transfer at 100 Mb/s; this may be because ZFP is not well suited to this kind of data and might perform better on very large floating-point data. Finally, in the SSIM vs. compression ratio plot (Figure 8), JPEG provides a higher SSIM score at similar compression ratios for all quality levels compared to the SZ3 and ZFP lossy compressors on the CaT dataset.
Fig. 8: Lossy compressors SSIM vs Compression Ratio Graph
Tables II, III, IV show the average performance of the SZ3, ZFP, and JPEG lossy compressors. Each cell entry corresponds to the pipeline-latency: sum of the average compression time, the binary conversion time, and the time until server response for the information of all images in the HTTP POST request.
Number of Images per POST | |||||
Error Bound | 1 | 15 | 30 | 45 | 60 |
1E-1 | 0.0831 | 0.9060 | 1.6948 | 2.8403 | 3.3126 |
1E-2 | 0.0935 | 0.9375 | 1.8179 | 2.8803 | 3.6589 |
1E-3 | 0.0927 | 1.0544 | 2.0068 | 3.1536 | 3.9589 |
1E-4 | 0.0987 | 1.1036 | 2.2203 | 3.3758 | 4.3075 |
1E-5 | 0.1080 | 1.2099 | 2.3917 | 3.6528 | 4.5420 |
1E-6 | 0.1107 | 1.3283 | 2.6460 | 4.1315 | 5.2813 |
1E-7 | 0.1158 | 1.4604 | 2.8972 | 4.5281 | 5.6620 |
Original | 0.1552 | 2.5250 | 5.1065 | 6.7330 | 11.5417 |
TABLE II: SZ3 compression and pipeline-latency (seconds).
Number of Images per POST | |||||
Error Bound | 1 | 15 | 30 | 45 | 60 |
1E-1 | 0.0683 | 0.8329 | 1.5947 | 2.2406 | 3.1137 |
1E-2 | 0.0747 | 0.8508 | 1.7024 | 2.5850 | 3.5521 |
1E-3 | 0.0826 | 1.1571 | 2.2353 | 3.4467 | 4.7076 |
1E-4 | 0.0873 | 1.2860 | 2.5480 | 3.7240 | 5.3518 |
1E-5 | 0.0962 | 1.5247 | 2.7777 | 4.6903 | 5.8239 |
1E-6 | 0.1054 | 1.6874 | 3.0794 | 6.4083 | 5.9869 |
1E-7 | 0.1172 | 1.8434 | 3.6109 | 7.2395 | 7.2397 |
Original | 0.1552 | 2.5250 | 5.1065 | 6.7330 | 11.5417 |
TABLE III: ZFP compression and pipeline-latency (seconds).
Number of Images per POST | |||||
Quality Level | 1 | 15 | 30 | 45 | 60 |
Q0 | 0.0284 | 0.0949 | 0.1472 | 0.2164 | 0.2632 |
Q10 | 0.0217 | 0.0907 | 0.1358 | 0.2317 | 0.3256 |
Q20 | 0.0235 | 0.1310 | 0.1604 | 0.2427 | 0.4002 |
Q30 | 0.0259 | 0.1189 | 0.1593 | 0.3362 | 0.3892 |
Q40 | 0.0229 | 0.1098 | 0.2021 | 0.3206 | 0.4851 |
Q50 | 0.0247 | 0.1124 | 0.2248 | 0.3338 | 0.6007 |
Q60 | 0.0268 | 0.1121 | 0.2140 | 0.2475 | 0.3907 |
Q70 | 0.0255 | 0.1379 | 0.2137 | 0.3342 | 0.6539 |
Q80 | 0.0284 | 0.2211 | 0.2657 | 0.3583 | 0.5012 |
Q90 | 0.0296 | 0.1883 | 0.3968 | 0.5238 | 0.8507 |
Q100 | 0.0416 | 0.3976 | 0.8189 | 1.2218 | 1.9571 |
Original | 0.1552 | 2.5250 | 5.1065 | 6.7330 | 11.5417 |
TABLE IV: JPEG compression and pipeline-latency (seconds).
When evaluating the performance of U-Net and EfficientViT we compare their accuracy on the compressed data (Figures 9 and 10) against the uncompressed data (Figures 11 and 12). Lossless compression algorithms are designed to reduce data size without sacrificing any original information, ensuring that decompressed data is identical to the original. Therefore, when architectures like U-Net and EfficientViT are evaluated on losslessly compressed data, the accuracy remains unchanged when compared to the uncompressed data.
Similar results are seen with the JPEG compressor, where in Figure 10 accuracy holds above 95% until it drops to 94% at quality level Q0.
EfficientViT performs well with data from all three compressors, JPEG, SZ3, and ZFP (Figure 10).
Fig. 9: U-Net MIoU vs Compression Ratio Graph
Fig. 10: EfficientViT MIoU vs Compression Ratio Graph
Fig. 11: Uncompressed/Lossless EfficientViT Accuracy
Fig. 12: Uncompressed/Lossless U-Net Accuracy
Data compression enables the processing of large amounts of image data on larger systems and faster transfer speeds in low-bandwidth environments. Lossless and lossy data compression algorithms are viable means of connecting a UGV to a cloud-based system. Lossless compressors do not allow fast enough transfer speeds to reach real-time pipeline-latency, but lossy compressors do lower latency without compromising mIoU performance (1% variation) on U-Net and EfficientViT. EfficientViT outperforms U-Net on compressed image data, allowing higher levels of compression without greatly diminishing accuracy. Among the lossy compressors tested, JPEG provides the best transfer speed on an edge device, degrading model accuracy slightly more than ZFP and slightly less than SZ3. JPEG has the highest compression ratio of the three lossy compressors tested and a higher SSIM value than the two error-bounded compressors at every comparable compression ratio, as demonstrated in Figure 8. ZFP degrades model performance the least on both U-Net and EfficientViT and has a compression speed between those of SZ3 and JPEG; however, ZFP achieves the lowest compression ratio of all the lossy and lossless compressors at its largest error bound. In conclusion, carefully designed and evaluated data compression maintains UGV performance in lower-bandwidth environments while preserving strong semantic segmentation performance for advanced convolutional architectures like U-Net and advanced vision transformer architectures like EfficientViT.
A. Limitations and Assumptions
Our study operated under environmental and temporal constraints that affect the generalizability of the results. Testing was done in clear weather conditions, and system performance may degrade in heavy rain, snow, or fog; training additional architectures for other sensors may be needed to circumvent this. The data spanned daylight hours, implying that night operation would require additional lighting systems for navigation to rely on the RGB perception sensors. The transfer speed results assume a continuous connection at set bandwidths; in real-world scenarios, bandwidth will fluctuate and may drop entirely at times, so this study maps a range of bandwidths to account for low-bandwidth scenarios.
B. Significance
The findings from this study have significant implications for autonomous vehicle development. Our results demonstrate that JPEG compression outperforms newer scientific compressors SZ3 and ZFP for off-road RGB perception tasks. This unexpected finding challenges assumptions about specialized compression techniques and suggests opportunities for simplified deployment architectures in autonomous systems. Additionally, our validation of EfficientViT’s performance on resource-constrained hardware advances practical autonomous navigation capabilities. This work bridges the gap between theoretical architectures and deployable solutions. Our analysis highlights the trade-offs between compression, transfer speeds and deep learning in real-world applications.
C. Implications
These results are particularly relevant for three groups: autonomous vehicle engineers can use our compression findings to optimize their perception pipelines, reducing latency and improving real-time decision making; computer vision researchers gain new insights into compression impacts on embedded vision systems, enabling them to refine their architectures for better performance; and industry teams can accelerate off-road perception system development by integrating our findings into their workflows.
The benefits of JPEG compression and EfficientViT provide a path to more efficient and practical autonomous systems. By establishing a baseline for understanding the relationship between lossy compression, transfer speed, and deep learning, we provide a foundation for further research and optimization. Additionally, we offer practical insights to autonomous vehicle engineers on the performance of state-of-the-art error-bounded compressors, equipping them to make informed decisions about balancing compression efficiency and perception accuracy in real-world deployments.
D. Future Work
UGVs typically deploy a wide variety of perception systems beyond color imagery, such as LiDAR and RADAR. As LiDAR and RADAR data are typically large, they present a promising avenue, since architectures trained on compressed versions of large datasets have been seen to retain high accuracy. This would also bring challenges: LiDAR may introduce 3D data that is less optimal for error-bounded compressors like SZ3 or ZFP, while at the same time the floating-point nature of LiDAR data may greatly improve SZ3 and ZFP’s compression performance. Additionally, expanding to new compression algorithms could provide better compression ratios or faster compression and decompression times. For example, video compression may achieve a superior compression ratio on images that resemble the frames of a video feed, rather than compressing each image individually. As another example, SZ3’s modular design allows its wavelet transform stage to be swapped for the discrete cosine transform (similar to JPEG) to help increase compression speed. Furthermore, a different transfer protocol could provide a potential speed boost: UDP or web-sockets could remove the time TCP spends sending an acknowledgment for each packet, enabling even faster transfer speeds. Finally, the use of lossy compression as preprocessing to boost accuracy may be studied further, as in this study the accuracy at certain compression levels improved compared to other compression levels.
Clemson University is acknowledged for their generous allotment of compute time on the Palmetto Cluster. This work is sponsored by the United States Defense Advanced Research Projects Agency under agreement HR00112320008. The content of the information does not necessarily reflect the position or the policy of the US Government, and no official endorsement should be inferred. This material is based upon work supported by the National Science Foundation under Grant No. SHF-1943114 and SHF-2312616. We acknowledge Adam Pickeral for his mentorship and contributions to this paper.
Ethan Marquez is an undergraduate student at Clemson University majoring in Computer Science (BS) with minors in Physics and A.I. His current research is with Dr. Melissa Smith’s Machine Learning and Big Data Lab, specializing in deep learning applications. He is an author of a journal paper on efficient deep learning vision architectures for off-road perception accepted to Scientific Research Publishing.
Adam Niemczura is an undergraduate at Clemson University majoring in Computer Science (BS) and Mathematics (BS). His current area of research is computer vision for autonomous off-road vehicle navigation. He is a member of Dr. Melissa Smith’s Machine Learning and Big Data Lab and contributed to a journal paper on efficient deep learning vision architectures for off-road perception, accepted to Scientific Research Publishing. He has also presented this work at the Clemson Creative Inquiry Forum.
Cooper Taylor is an undergraduate at Clemson University pursuing a BS in Mathematics and Computer Science. His research interests are in abstract math and machine learning. He is a member of the Clemson Watt AI Creative Inquiry and served as a research assistant for the group during the summer of 2024, when his work entailed using machine learning to combat political misinformation on social media platforms.
Max Faykus III is a Ph.D. student in Computer Engineering at Clemson University. His research interests include machine learning on varying levels of data compression. Max has experience with data fusion using LiDAR and camera data for pixel-wise classification, lossy and lossless OCT compression, and exploring how lossy compression distortion affects ML architectures. As a graduate student, he has taken several HPC-related courses, including fault tolerance, parallel systems, GPUs, parallel computing, data compression, embedded computing, computer vision, parallel architecture, and tracking systems. He has also taught at the undergraduate level on topics including big data and machine learning.
Melissa C. Smith is a professor of Electrical and Computer Engineering at Clemson University with over 25 years of experience developing and implementing scientific workloads and machine learning applications across multiple domains, including 12 years as a research associate at ORNL. Her current research focuses on performance analysis and optimization with emerging heterogeneous computing architectures (GPGPU- and FPGA-based systems) for various application domains, including machine learning, high-performance or real-time embedded applications, and image processing. Her group collaborates with researchers in other fields to develop new approaches to the application/architecture interface, providing interdisciplinary solutions that enable new scientific advancements and capabilities.
Jon C. Calhoun directs the Future Technologies in Heterogeneous and Parallel Computing (FTHPC) Laboratory. The FTHPC Laboratory has broad interest in the advancement of high-performance computing systems and the development of large-scale scientific software. Currently, the group’s primary efforts are dedicated to the development and integration of lossy and lossless data compression algorithms inside scientific workflows to remove key data generation, movement, and storage bottlenecks. His research group has been nominated for and won best paper and poster awards at prominent venues such as IEEE Cluster and the ACM Student Research Competition.