Real-Time Inference for Unmanned Ground Vehicles Using Lossy Compression and Deep Learning

MARCH 2025 | Volume 46, Issue 1


Ethan Marquez

School of Computing – Clemson University
Clemson, SC, USA

Adam Niemczura

School of Mathematical and Statistical Sciences
Clemson University, Clemson, SC, USA

Cooper Taylor

School of Mathematical and Statistical Sciences
Clemson University, Clemson, SC, USA

Max Faykus III

Holcombe Department of Electrical and Computer Engineering
Clemson University, Clemson, SC, USA

Melissa C. Smith

Holcombe Department of Electrical and Computer Engineering
Clemson University, Clemson, SC, USA

Jon C. Calhoun

Holcombe Department of Electrical and Computer Engineering
Clemson University, Clemson, SC, USA

DOI: 10.61278/itea.46.1.1004

Abstract

Autonomous vehicles rely on on-board perception systems for safe terrain navigation, which becomes especially important in rural areas. This study explores the effect compressed training images have on the performance of deep learning segmentation architectures and determines whether lossy compression is a practical way to provide real-time transfer speeds for autonomous vehicle perception systems. To test the effect of compression on deep learning, we apply ZFP, JPEG, and SZ3 to EfficientViT and U-Net and rank test accuracy. JPEG achieved the highest compression ratio, 144.49× at quality level 0, while also achieving the fastest transfer speed of the compressors tested on the NVIDIA Xavier edge device. Furthermore, JPEG achieved the highest mIoU accuracy for both architectures in comparison to SZ3 and ZFP. Of the two deep learning architectures tested, EfficientViT outperforms U-Net for all lossy compressors at all levels of compression: EfficientViT achieves a peak mIoU of 95.5% at JPEG quality level 70, while U-Net peaks at an mIoU of 90.683% at JPEG quality 40.

This study advances autonomous vehicle development in two ways. First, it demonstrates that JPEG compression outperforms specialized scientific compressors (SZ3/ZFP) for off-road RGB perception systems. Second, it validates EfficientViT’s effectiveness for resource-constrained autonomous navigation. These findings benefit autonomous vehicle engineers implementing perception systems, computer vision researchers working on embedded applications, and industry teams deploying off-road autonomous navigation solutions.

Keywords: Compression, Machine Learning, Computer Vision, Off-Road, Semantic Segmentation

Symbol Key:

– ∩: The intersection of two sets contains all elements that are common to both sets
– ∪: The union of two sets contains all elements that appear in either set (or both)
– C1: Constant to stabilize division when the denominator is small
– C2: Constant to stabilize division when the denominator is small
– µx: Mean value of pixel values for some image x
– µy: Mean value of pixel values for some image y
– σx: Standard deviation of pixel values for some image x (σx² denotes the variance)
– σy: Standard deviation of pixel values for some image y (σy² denotes the variance)

I. Introduction

The mobile nature of Unmanned Ground Vehicles (UGVs) imposes tight constraints on physical space and energy consumption, especially in off-road environments. UGVs are often deployed on rural terrain, making it crucial for them to use multiple perception systems such as Light Detection and Ranging (LiDAR), Radio Detection and Ranging (RADAR), and Red, Green, Blue (RGB) cameras [1]. RADAR detects only one object at a time, while LiDAR provides point-cloud distance data [2].

UGVs lack the computational power necessary to perform real-time (near-zero-second) inference with inefficient semantic segmentation architectures [3]. UGVs often have computational power similar to or less than edge devices such as the Jetson Xavier, which offers up to 21 Tera Operations Per Second (TOPS). Semantic segmentation architectures compute a classification for every pixel of an image and can require 10 TOPS or more depending on image resolution and the amount of feature extraction [44]. In scenarios where semantic segmentation architectures cannot run on-board, due to physical space limitations, power restrictions, or computational constraints, image data can instead be transferred to a cloud-based system or external computer for processing.

Latency refers to the time it takes to transfer data from a host to a target [4]. Compressing images from a camera feed reduces the overall latency of the data transfer; compression techniques such as JPEG achieve a 1.14× speedup for medium datasets [5], [6]. In rural settings, network latency is much higher (100-220 ms) than in urban settings (70-80 ms) [7]. This reduced effective bandwidth greatly hinders the ability of off-road UGVs to quickly exchange data with deep learning architectures for inference. The total time required for the UGV to compress and send the data is the pipeline-latency.

As bandwidth decreases, the need for data compression rises. Data compression is performed on-board with lossless and lossy compressors. Lossless compression algorithms reduce the byte-size of information while maintaining the ability to reconstruct the original data byte for byte [8]. Lossy compression algorithms reduce the original data size far more than their lossless counterparts; however, the compressed data cannot be reconstructed into the original image without introducing distortion [9]. Distortion in lossy compressors can be traded against compression ratio: more aggressive compression yields smaller sizes at the expense of more distortion. Smaller data sizes significantly reduce pipeline-latency when transferred through a low-bandwidth network. The reduced transfer time of losslessly and lossily compressed images helps achieve near real-time autonomy on UGVs.

UGVs can utilize semantic segmentation deep learning architectures to assign each pixel of an image to a class, producing a map of the image segmented into clusters of classes called a mask [10]. Accuracy refers to the overlap between the predicted masks and the ground truth; to produce accurate predictions, semantic segmentation architectures need accurate images. Increased distortion caused by lossy compression typically decreases the architecture’s accuracy due to the loss of information about the original data. At smaller bandwidths, a balance of compact data and low distortion must be struck in order to achieve fast, but still accurate, predictions.

This study contributes the following:

  • Segmentation accuracy evaluation of a state-of-the-art Vision Transformer (ViT) architecture against Convolutional Neural Network (CNN) architectures on varying levels of compressed images of off-road environments.
  • Analysis of the relationship between artifacts caused by lossy compression and UGV pipeline-latency reduction.
  • Analysis of the relationship between Structural Similarity Index Measure (SSIM) and Compression Ratio for JPEG, SZ3, and ZFP for off-road image data.

II. Background and Related Works

Multiple studies [5], [6], [11] try to reduce data-transfer latency using compression [12]. M. H. Faykus [11] focuses solely on JPEG compression applied to edge-device testing for autonomous off-road perception systems and excludes compression algorithms such as SZ3 and ZFP, which this study includes. M. Rahman [5] applies error-bounded lossy video compression to CNN-based pedestrian detection architectures but does not study performance on off-road terrain data; similarly, [6] applies lossy video compression to pedestrian identification using the YOLO CNN-based vision model. Our study differs from the previous literature by applying error-bounded lossy compression to off-road images taken in low-bandwidth rural environments in order to train and test semantic segmentation ViT and CNN architectures for autonomous off-road perception, with lossless compressors as a baseline for comparison.

A. Compression

In lossless compression, the data after decompression is byte-for-byte the same as the original data compressed. For images, this means that the raw image after decompression is pixel-for-pixel exact (no artifacts) to the original image. Lossless compression is preferred because no distortion is introduced into the data due to the compression.

  1. Huffman Encoding [42] is a method of lossless data compression utilizing variable-length prefix codes. It assigns shorter binary representations to values appearing more frequently in the dataset, in an attempt to achieve the lowest possible average code length (a construction sketch follows this list).
  2. Finite-State Entropy [43] lossless compression achieves compression ratios close to the theoretical entropy limit, similar to arithmetic coding, but is computationally faster due to using simple operations such as addition and bit shifts rather than multiplication.
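
To make the variable-length prefix idea in item 1 concrete, below is a minimal sketch of Huffman code construction using Python's standard heapq module; the input string is purely illustrative.

    import heapq
    from collections import Counter

    def huffman_codes(data: bytes) -> dict:
        """Build a Huffman code table: frequent symbols get shorter codes."""
        freq = Counter(data)
        # Heap entries are (frequency, tie-breaker, tree), where a tree is
        # either a symbol (leaf) or a (left, right) pair (internal node).
        heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)   # two least-frequent subtrees
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, counter, (left, right)))
            counter += 1
        codes = {}
        def walk(node, prefix):
            if isinstance(node, tuple):         # internal node: recurse
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:                               # leaf: assign the code
                codes[node] = prefix or "0"
        walk(heap[0][2], "")
        return codes

    print(huffman_codes(b"aaaabbbcc"))  # 'a' (most frequent) gets the shortest code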

The following lossless compressors are considered in this study (a brief usage sketch follows the list):

  • Block-Oriented Shuffling Compressor Lempel-Ziv (BLOSCLZ): BLOSCLZ [13] is a compressor based heavily on FastLZ [14]. FastLZ implements the Lempel-Ziv 77 (LZ77) lossless data compression algorithm [15], which encodes upcoming data segments as maximum-length copies from a buffer of past output; the code word consists of the buffer address.
  • Zstandard (ZSTD): ZSTD [16] is a combination of dictionary matching LZ77 [15] with a large search and entropy-coding stage. It uses Huffman coding [17] and Finite-State Entropy. In this scheme, the set load of buffer and information is contained in the code words.
  • LZ4: LZ4 [18] has two APIs: LZ4 and LZ4HC [19]. The LZ4 compression algorithm breaks data into multiple groups, each beginning with a one-byte token split into two 4-bit fields: the first gives the number of bytes copied literally to the output, and the second gives the number of bytes to copy from the decoded output buffer. LZ4HC expands upon LZ4 by compressing data to a smaller size at the cost of longer compression times; the HC suffix abbreviates "High Compression".
  • ZLIB: The ZLIB [20] compression method uses a variant of LZ77 [15] called deflation. Deflation emits compressed data as a series of blocks, and the compressor has three modes: 1) no compression, used when the data has already been compressed, in which case deflation simply stores it; 2) compression with LZ77 followed by Huffman coding; and 3) compression with LZ77 followed by Huffman coding using compressor-generated trees that are stored with the compressed data [20].
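
As a brief usage illustration for this family, the following sketch measures the compression ratio of Python's standard-library zlib (the ZLIB format above) at its maximum level 9, matching the lossless configuration used in the Methods section; the input buffer is a placeholder, and LZ4 and ZSTD have analogous third-party Python packages.

    import zlib

    def compression_ratio_zlib(raw: bytes, level: int = 9) -> float:
        """Compression ratio = original size / compressed size."""
        compressed = zlib.compress(raw, level)
        return len(raw) / len(compressed)

    # Placeholder for a raw RGB buffer (e.g., from PIL's Image.tobytes()).
    raw = bytes(range(256)) * 4096
    print(f"CR at level 9: {compression_ratio_zlib(raw):.2f}x")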

Error-bounded lossy compression algorithms compress data while allowing the user to control the maximum allowable difference between the original data and the data reconstructed after compression and decompression. Despite the loss of information, lossy compressors work to retain as much of the visual information as possible through compression and decompression. Lossless compression is ideal, but the large memory footprint of images makes lossy compression perform better due to its much higher compression ratios. In our case, images from a perceptual system such as an RGB camera are compressed only once, making lossy compression preferable for sending image data from the UGV to a cloud-based processing system.

The following lossy compressors are considered in this study:

  • SZ3: SZ3 [21] is a lossy compressor designed for large High Performance Computing floating-point data. SZ3 consists of four main steps, with the error controlled during the linear-scale quantization step. The error metric used is the absolute difference between the predicted pixel value (after normalization to the range [0, 1]) and the actual normalized pixel value.
    1. Split the data into fixed-size chunks and select the most suitable prediction function for each chunk to predict upcoming values.
    2. Apply linear-scale quantization, where a specified error bound quantifies the difference between the predicted and actual values in each chunk.
    3. Encode the resulting quantization indices using Huffman encoding.
    4. Apply lossless compression to further compress the Huffman-encoded data.
  • ZFP: ZFP [22] is a lossy compressor that takes 3D double-precision data and divides it into small fixed-size blocks of 4 × 4 × 4 values, each encoded with a user-specified number of bits. The error metric is the absolute difference between the original normalized pixel value (in the range [0, 1]) and the reconstructed pixel value after decompression. To compress these blocks, ZFP undergoes five stages:
    1. Align the values in the block to a common exponent.
    2. Convert the floating-point values to fixed-point values.
    3. Apply an orthogonal block transform to map each block value to a transform coefficient, decorrelating the values.
    4. Sort the transform coefficients by their expected magnitudes.
    5. Encode the transform coefficients one bit plane at a time [23].
    ZFP has three operational modes: fixed accuracy (within a set absolute error tolerance), fixed precision (a fixed number of bit planes, hence a variable number of bits per value), and fixed rate (a set number of bits per block) [24].
  • JPEG: JPEG [25] is a lossy image compression standard that reduces size by exploiting spatial redundancy in images. The JPEG algorithm transforms image data into frequency components using the Discrete Cosine Transform (DCT); error control is achieved through quantization of the DCT coefficients. JPEG then quantizes these components and entropy-codes them with either Huffman or arithmetic coding. JPEG significantly reduces the amount of data required to represent an image, at the cost of losing some image information at higher levels of compression (see the sketch below).
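
Since this study implements JPEG through the Python Pillow library (see Methods), the sketch below shows how quality levels trade file size for fidelity; the file name is hypothetical.

    import io
    from PIL import Image

    def jpeg_bytes(img: Image.Image, quality: int) -> bytes:
        """Encode a PIL image as JPEG at the given quality level (0-100)."""
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=quality)
        return buf.getvalue()

    img = Image.open("terrain.png").convert("RGB")  # hypothetical CaT frame
    raw_size = len(img.tobytes())                   # uncompressed RGB byte count
    for q in range(0, 101, 10):                     # Q0 ... Q100, as in this study
        cr = raw_size / len(jpeg_bytes(img, q))
        print(f"Q{q}: CR = {cr:.2f}x")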

B. Data Transfer

Transmission Control Protocol (TCP) is used as the data transfer protocol. TCP is a core part of the Internet Protocol suite and ensures a reliable connection between devices [26]. TCP splits image data into smaller "packets" that are sent to a recipient; the recipient is guaranteed to receive the packets and sends back acknowledgment packets to confirm the successful delivery of each one.
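
For illustration, below is a minimal sketch of the reliable byte stream TCP provides, using Python's standard socket module; the host, port, and file name are placeholders.

    import socket

    # TCP handles packetization, ordering, retransmission, and acknowledgments
    # internally; the application simply writes to a reliable byte stream.
    with socket.create_connection(("192.168.0.20", 9000)) as sock:  # placeholder peer
        with open("frame.jpg", "rb") as f:  # hypothetical compressed frame
            sock.sendall(f.read())          # returns once all bytes are handed to TCP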

C. Architectures

These architectures were selected based on their demonstrated performance in semantic segmentation tasks: U-Net has shown good generalizability from small training datasets [27], while EfficientViT offers a good speed-accuracy balance [30].

  • U-Net: The U-Net architecture [27] is a CNN-based architecture. It uses convolutional layers in tandem with convolutional transpose layers to form a 'U' shape. Another defining element of U-Net is its use of skip connections. The architecture is specifically designed for semantic segmentation tasks; for this study, U-Net is pre-trained on the ImageNet [28] dataset with ResNet [29] as the encoder. A visual of the U-Net architecture is provided in Figure 1.
  • EfficientViT: EfficientViT combines CNN and Vision transformer (ViT) architectures [30]. This combination allows for high accuracy and quick inference speed which is necessary for UGVs to traverse terrain by camera data. EfficientViT follows a standard encoder decoder structure for segmentation neural networks where the encoder is pre-trained on ImageNet [28] for classification. The full EfficientViT architecture is shown in Figure 2.

Fig. 1: U-Net Architecture [27]

 

Fig. 2: EfficientViT Architecture [30]

D. Datasets

The CAVS Traversability (CaT) dataset [31] was developed specifically for off-road autonomous navigation research. It contains 1,812 color images collected from diverse off-road environments, including forests, fields, and trails, across different seasons and lighting conditions. Moreover, the testing set (30%) was chosen to ensure representativeness of the data by including both general and specific terrain challenges. CaT was chosen for this study because it represents the current standard for off-road UGV perception evaluation, providing comprehensive ground-truth segmentation labels for traversable and non-traversable terrain. While other datasets exist for urban autonomous driving, CaT uniquely addresses the specific challenges of unstructured off-road environments.

III. Methods

We tested five lossless and three lossy data compression algorithms using defined test metrics to try to reduce data transfer times. The algorithms are tested on the CaT [31] dataset, and the results are modeled using a series of increasing transfer speeds to illustrate when compression is needed.

After the CaT dataset has undergone compression, EfficientViT and U-Net are trained on the compressed RGB images for each quality level of JPEG compression and each error bound of SZ3 and ZFP, as defined later. Training occurs independently, and accuracy and variation with different compressors and compression levels are measured. Only the RGB images are compressed; the masks (used for validating the accuracy of the model and for model training) are left uncompressed to ensure the model is not training on distorted masks. Distorted training masks degrade the model’s ability to provide accurate and realistic segmentation predictions.
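
As a sketch of this arrangement, the PyTorch dataset below pairs lossy-compressed RGB frames with untouched ground-truth masks; the directory layout and mask file extension are assumptions for illustration, not the actual CaT structure.

    import os
    from PIL import Image
    from torch.utils.data import Dataset
    import torchvision.transforms.functional as TF

    class CompressedSegmentationDataset(Dataset):
        """Pairs compressed RGB frames with *uncompressed* masks so the model
        never trains against distorted ground truth."""
        def __init__(self, rgb_dir: str, mask_dir: str):
            self.rgb_dir, self.mask_dir = rgb_dir, mask_dir
            self.names = sorted(os.listdir(rgb_dir))

        def __len__(self):
            return len(self.names)

        def __getitem__(self, i):
            name = self.names[i]
            rgb = Image.open(os.path.join(self.rgb_dir, name)).convert("RGB")
            stem = os.path.splitext(name)[0]
            mask = Image.open(os.path.join(self.mask_dir, stem + ".png"))
            return TF.to_tensor(rgb), TF.pil_to_tensor(mask).squeeze(0).long()

    # e.g., train_ds = CompressedSegmentationDataset("train/rgb_jpeg_q40", "train/masks")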

After the CaT dataset has undergone compression, data transfer latency is calculated at varying bandwidth levels for each compressor at their maximum compression level respectively.

Hardware & Software              Details
Xavier Make and Model            P2972, Jetson AGX Xavier
EfficientViT Training Hardware   1x V100 GPU, 24 CPU cores, 40 GB RAM
U-Net Training Hardware          1x V100 GPU, 1 CPU core, 16 GB RAM
cuDNN Version                    8.7.0.84
CUDA Version                     11.8.0
PyTorch Version                  2.1.2
Libpressio Version               0.99.5
scikit-image Version             0.24.0
Pillow Version                   10.4.0
SZ3 Version                      3.2.0
ZFP Version                      1.01

TABLE I: Hardware and software specifications for Xavier, EfficientViT, and U-Net. The code for this is available at https://github.com/CUFCTL/DL-for-UGVs-using-Compression

A. Compression

For the error-bounded lossy compressors SZ3 and ZFP, the image data is normalized to floating-point values from 0 to 1 to improve the compressors’ performance; this normalization ensures that the error bounds have a consistent meaning relative to the pixel data range. The normalized data is then compressed at seven different error bounds, from 10⁻¹ to 10⁻⁷, referred to here as 1E-1 to 1E-7. JPEG compression ranges from quality level 0 (Q0) to quality level 100 (Q100) in increments of 10, where quality level 0 indicates maximum lossy compression and quality level 100 indicates minimal lossy compression. For the lossless compressors, only compression level 9 is used, from a range of 0 (no compression) to 9 (maximum compression). For SZ3 and ZFP, Libpressio [32] is used to implement the compression algorithms, while the Python Pillow imaging library is used for JPEG compression (a sketch of the SZ3/ZFP path follows below). We designed this experiment for high throughput: all compressed data types at all levels were treated equally and run in parallel on the Clemson Palmetto II supercomputer cluster.
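
A minimal sketch of the normalization and error-bounded compression step, assuming Libpressio's Python bindings expose PressioCompressor.from_config with encode/decode as in its documented examples; the configuration keys and the synthetic frame are illustrative and should be checked against the Libpressio documentation.

    import numpy as np
    from libpressio import PressioCompressor  # assumed Python-bindings API

    # Synthetic stand-in for an RGB frame; real data would come from the camera.
    rgb = np.random.randint(0, 256, (512, 512, 3)).astype(np.float64)
    data = rgb / 255.0   # normalize to [0, 1] so the error bound is consistent

    comp = PressioCompressor.from_config({
        "compressor_id": "sz3",                      # or "zfp"
        "compressor_config": {"pressio:abs": 1e-3},  # absolute error bound 1E-3
    })
    compressed = comp.encode(data)
    restored = comp.decode(compressed, np.empty_like(data))  # output template
    print(f"CR = {data.nbytes / len(compressed):.2f}x")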

The following metrics are used to evaluate compression algorithms:

1) Compression Ratio (CR)

CR measures the efficiency of a compression algorithm by comparing the original data size to its compressed size:

CR = Original_Size / Compressed_Size

The higher the CR, the greater the relative reduction in data size.

2) Compression Bandwidth (cBW)

cBW is the speed at which the compressor reduces the size of a fixed amount of input data. As real-time applications are the focus here, the timing does not include the time needed to load from disk, since the data is resident in memory or read from a sensor's input buffer.

3) Decompression Bandwidth (dBW)

dBW is the speed at which the compressor decompresses the data. This includes the time for any post-processing needed after decompression to return the data to its correct form.

4) SSIM

SSIM (Structural Similarity Index Measure) is used to measure the perceptible distortion of visual media like images or videos [34]. SSIM requires an original image and a second image to compare against it; here, the original is the original RGB terrain image and the second image is its lossy decompressed version. The SSIM of two images x and y is calculated by the following equation:

SSIM(x, y) = [(2µxµy + C1)(2σxy + C2)] / [(µx² + µy² + C1)(σx² + σy² + C2)]

Here, µx and µy are the mean pixel values of x and y respectively, σx² and σy² are the variances of x and y respectively, σxy is the covariance of x and y, and C1 and C2 are constants that stabilize division with small denominators. In practice, this value is computed with the Scikit-Image Python [33] library's structural_similarity function, applied to the input image data and the decompressed image data with a data range of 255.

For our studies, SSIM suits off-road terrain segmentation tasks, which require preserving the structural integrity of the image data. For example, parts of off-road image data like the sky may be greatly distorted with little effect on segmentation accuracy, since the sky is much more distinguishable than something like a fallen tree.
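
The sketch below mirrors the described scikit-image usage; the perturbed image stands in for a lossy-decompressed frame.

    import numpy as np
    from skimage.metrics import structural_similarity

    rng = np.random.default_rng(0)
    original = rng.integers(0, 256, (256, 256, 3), dtype=np.uint8)
    noise = rng.integers(-8, 9, original.shape)  # stand-in for compression artifacts
    distorted = np.clip(original.astype(int) + noise, 0, 255).astype(np.uint8)

    # data_range=255 matches the configuration above; channel_axis=-1 averages
    # the SSIM over the three color channels.
    score = structural_similarity(original, distorted, data_range=255, channel_axis=-1)
    print(f"SSIM = {score:.4f}")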

To determine when compression reduces the latency from the camera feed to the processing unit, we measure the entire compression/decompression time along with the data transfer time. To benefit from image compression, the time to send compressed images, T_comp_send, must be less than the time to send uncompressed images, T_send. T_send is the data size divided by the transfer bandwidth (BW):

T_send = Data_Size / BW

T_comp_send is the sum of the compression time, the time to send the reduced-size data of Comp_Size bytes, and the decompression time:

T_comp_send = T_comp + Comp_Size / BW + T_decomp
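
The following sketch evaluates these two expressions to show when compression pays off; the frame size, compression ratio, and compressor timings are illustrative placeholders rather than measured values.

    def t_send(size_bytes: float, bw_bytes_s: float) -> float:
        """Time to send uncompressed data: Data_Size / BW."""
        return size_bytes / bw_bytes_s

    def t_comp_send(size_bytes: float, cr: float, bw_bytes_s: float,
                    t_comp: float, t_decomp: float) -> float:
        """T_comp + Comp_Size / BW + T_decomp, with Comp_Size = Data_Size / CR."""
        return t_comp + (size_bytes / cr) / bw_bytes_s + t_decomp

    bw = 5e6 / 8    # a 5 Mb/s rural link, in bytes per second
    frame = 1.9e6   # one raw frame of roughly 1.9 MB (placeholder)
    print(f"uncompressed: {t_send(frame, bw):.2f} s")
    print(f"compressed:   {t_comp_send(frame, 50.0, bw, 0.05, 0.02):.2f} s")
    # Compression is beneficial whenever the second time is smaller than the first.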

B. Architectures

Semantic segmentation architectures such as EfficientViT and U-Net accurately predict drivable terrain on uncompressed images [35] from the CaT dataset [31], as illustrated in Figure 9. To determine whether EfficientViT and U-Net maintain high accuracy on compressed data, they are tested on the lossy compressors SZ3, ZFP, and JPEG across multiple error levels.

  • U-Net: The U-Net model is trained over the course of 120 epochs with a constant learning rate of 0.001. The Adam optimizer [36] is used with categorical cross-entropy as the loss function. Data augmentation is kept simple, with color jitter and random erasing [37]. This configuration, along with all other hyperparameters, is kept consistent across training on all compression levels for all three compressors SZ3, ZFP, and JPEG, and is the same configuration used for training on the original uncompressed image data [35].
  • EfficientViT: EfficientViT training starts with 20 warmup epochs during which the learning rate increases from 0.0 to 0.001. For the rest of training, the learning rate follows a cosine learning-rate scheduler [38]. Random flipping, hue changing, random cropping, and random erasing of image data are used for pre-processing [37]. The pre-processing, learning-rate schedule, number of epochs, and all other hyperparameters are uniform across all error levels of all three compressors SZ3, ZFP, and JPEG. This uniform configuration is also used for training on the uncompressed data to keep the model results consistent.
  • Evaluation: The main metric for accuracy in semantic segmentation is Intersection over Union (IoU), shown in Figure 3. IoU is defined as the True Positives divided by the sum of the True Positives, False Positives, and False Negatives: IoU = TP / (TP + FP + FN). IoU is used instead of a raw accuracy metric so that the model is assessed not only on how well it predicts the correct distribution of classes but also on how well it predicts their locations. The mean Intersection over Union (mIoU) is the average of the individual per-class IoU scores (a computation sketch follows this list).
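
Below is a minimal NumPy sketch of the per-class IoU and mIoU defined above; the prediction and ground-truth masks are assumed to be integer class maps of equal shape.

    import numpy as np

    def mean_iou(pred: np.ndarray, truth: np.ndarray, n_classes: int) -> float:
        """mIoU: average over classes of TP / (TP + FP + FN)."""
        ious = []
        for c in range(n_classes):
            intersection = np.logical_and(pred == c, truth == c).sum()  # TP
            union = np.logical_or(pred == c, truth == c).sum()  # TP + FP + FN
            if union:               # skip classes absent from both masks
                ious.append(intersection / union)
        return float(np.mean(ious))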

When training EfficientViT and U-Net on CaT, a 70%/30% train/test split is used, as provided in the dataset by default. The data contains 1,812 images, and the testing set is chosen to ensure variability, including general and specific terrain challenges [31]. This split is uniform across all error levels for all compressors.


Fig. 3: IoU Equation [39]

C. Data-to-pipeline Transfer

A cloud-based system is a viable solution when an off-road system cannot efficiently process segmentation on-board. However, this requires transferring data from the off-road system to the cloud-based system, and depending on the bandwidth between them, the communication time may vary. Because the data transfer is on the critical path of the processing pipeline (Figure 4), low bandwidth makes the transfer the performance bottleneck. The data are compressed to make the transfer more efficient at lower bandwidths, at the cost of additional time to compress the images.

Although applying data compression requires time for compression and decompression before and after transmitting the data to the cloud-based system, sending the same information in fewer bytes makes the overall effective bandwidth higher than when transmitting uncompressed images.

Fig. 4: Data Transfer Pipeline

TCP headers range from 20 to 60 bytes [26], a significant overhead for a heavily compressed image. The distortion caused by data loss during lossy compression, compounded with image data lost to dropped packets, could result in segmentation malfunctions and erroneous model decisions.

The HTTP POST requests are made via the Python requests.post command. Each image is transferred using TCP to a Flask-hosted Python server from the Xavier device, which is connected to the network via Ethernet with an upload speed of approximately 303.93 Mb/s. The server runs on a laptop with a 13th Gen Intel(R) Core(TM) i7-13700HX CPU, an Intel(R) WiFi 6E AX211 160 MHz WiFi adapter, and 32.0 GB of DDR5-4800 RAM; it is hosted on the same network, with a download speed of approximately 458.33 Mb/s. Each HTTP POST request to the server carries binary information for 1, 15, 30, 45, or 60 images. The uncompressed image processing times are calculated with a compression time of 0. The SZ3 and ZFP results are grouped by error bound, ranging from 1E-1 (highest compression) to 1E-7 (lowest compression). Similarly, JPEG results are organized by quality level, from Q0 to Q100.
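
A sketch of the transfer measurement under these settings follows; the endpoint URL and byte framing are hypothetical, with requests.post and a minimal Flask route standing in for the setup described above.

    import time
    import requests

    URL = "http://192.168.0.20:5000/upload"  # hypothetical server address

    def post_batch(payloads: list) -> float:
        """POST one batch of compressed images; return the round-trip time."""
        body = b"".join(payloads)            # simplistic framing, for illustration
        t0 = time.perf_counter()
        resp = requests.post(URL, data=body,
                             headers={"Content-Type": "application/octet-stream"})
        resp.raise_for_status()
        return time.perf_counter() - t0

    # Server side (run on the laptop), a minimal Flask counterpart:
    #   from flask import Flask, request
    #   app = Flask(__name__)
    #   @app.post("/upload")
    #   def upload():
    #       _ = request.get_data()           # receive the image bytes
    #       return "ok"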

 

IV. Results

Next, we discuss the data transfer results and evaluation metrics, modeling the total transfer time and mIoU. We evaluate the metrics to determine which of the tested lossy or lossless compression algorithms is most suitable for sending real-time off-road image data between a UGV and a cloud system for semantic segmentation inference while maintaining accuracy.

A. Lossless Data Transfer

Assuming a set bandwidth for the transfer from the UGV to the cloud-based system, the reduction in data size is modeled in Figure 7, showing the improvement to the system's overall pipeline-latency.

When bandwidth is low, there is a need for compression: for example, the time to transfer the CaT dataset was reduced from 379.70 seconds to 53.30 seconds with ZLIB compression, the fastest of the lossless compressors tested. As bandwidth increases, the need for compression decreases. At 1000 Mb/s, the quickest transfer time was for the uncompressed data, because the time to compress and decompress the data becomes the bottleneck in the transfer process.

These results show that when UGVs are deployed in low bandwidth environments, data compression makes cloud-based processing a viable option for reduced pipeline-latency. When UGVs are in urban areas, the need for the data compression is lessened by the improved bandwidth.

Fig. 5: SZ3 and ZFP compression over error bounds

 

Fig. 6: JPEG compression over quality levels

 

Fig. 7: Time in seconds in relation to Bandwidths and compression ratio of lossless and lossy compressors to transfer the CaT training data

B. Lossy Data Transfer

Although lossless compression improves transfer speeds at lower bandwidths, the inability of lossless compressors to discard less vital information limits how close they can come to real-time pipeline-latency. Figures 5 and 6 illustrate the impact of increasing error bounds on image quality for the three lossy compression algorithms. Lossy compressors achieve higher compression ratios and faster transfer speeds than lossless compression (Figure 7). When bandwidth is below 1000 Mb/s, the bottleneck is the generated data size exceeding the system's transfer capability; above 1000 Mb/s, the bottleneck is the added compression time, so sending lossy-compressed data at high bandwidths takes longer than sending uncompressed image data. Comparing the three lossy compressors, JPEG comfortably outperforms ZFP and SZ3 when all are set at the highest compression level or bound tested. Comparing ZFP and SZ3, SZ3 achieves roughly 53× the compression ratio of ZFP at the same error bound of 1E-1 (Figure 8). This large difference is most apparent at a bandwidth of 5 Mb/s, where ZFP takes approximately three times as long as SZ3 to send the CaT images. Additionally, comparing SZ3 to the lossless compressors in Figure 7, SZ3 achieves a much higher compression ratio than any of them at the cost of a high compression time; this is partly due to a configurable internal lossless stage inside SZ3, which here is tuned to optimize for compression ratio. ZFP, in comparison, performs rather poorly on the data compressed in this study, taking the longest to transfer at 100 Mb/s; ZFP may simply be less suited to this kind of data and might perform better on very large floating-point datasets. Finally, the SSIM versus compression ratio plot in Figure 8 shows that JPEG provides a higher SSIM score than SZ3 and ZFP at similar compression ratios for all quality levels on the CaT dataset.

Fig. 8: Lossy compressors SSIM vs Compression Ratio Graph

Tables II, III, and IV show the average performance of the SZ3, ZFP, and JPEG lossy compressors. Each cell entry corresponds to the pipeline-latency: the sum of the average compression time, the binary conversion time, and the time until server response for all images in the HTTP POST request.

    Number of Images per POST
Error Bound 1 15 30 45 60
1E-1 0.0831 0.9060 1.6948 2.8403 3.3126
1E-2 0.0935 0.9375 1.8179 2.8803 3.6589
1E-3 0.0927 1.0544 2.0068 3.1536 3.9589
1E-4 0.0987 1.1036 2.2203 3.3758 4.3075
1E-5 0.1080 1.2099 2.3917 3.6528 4.5420
1E-6 0.1107 1.3283 2.6460 4.1315 5.2813
1E-7 0.1158 1.4604 2.8972 4.5281 5.6620
Original 0.1552 2.5250 5.1065 6.7330 11.5417

TABLE II: SZ3 compression and pipeline-latency (seconds).

 

    Number of Images per POST
Error Bound 1 15 30 45 60
1E-1 0.0683 0.8329 1.5947 2.2406 3.1137
1E-2 0.0747 0.8508 1.7024 2.5850 3.5521
1E-3 0.0826 1.1571 2.2353 3.4467 4.7076
1E-4 0.0873 1.2860 2.5480 3.7240 5.3518
1E-5 0.0962 1.5247 2.7777 4.6903 5.8239
1E-6 0.1054 1.6874 3.0794 6.4083 5.9869
1E-7 0.1172 1.8434 3.6109 7.2395 7.2397
Original 0.1552 2.5250 5.1065 6.7330 11.5417

TABLE III: ZFP compression and pipeline-latency (seconds).

 

  Number of Images per POST
Quality Level 1 15 30 45 60
Q0 0.0284 0.0949 0.1472 0.2164 0.2632
Q10 0.0217 0.0907 0.1358 0.2317 0.3256
Q20 0.0235 0.1310 0.1604 0.2427 0.4002
Q30 0.0259 0.1189 0.1593 0.3362 0.3892
Q40 0.0229 0.1098 0.2021 0.3206 0.4851
Q50 0.0247 0.1124 0.2248 0.3338 0.6007
Q60 0.0268 0.1121 0.2140 0.2475 0.3907
Q70 0.0255 0.1379 0.2137 0.3342 0.6539
Q80 0.0284 0.2211 0.2657 0.3583 0.5012
Q90 0.0296 0.1883 0.3968 0.5238 0.8507
Q100 0.0416 0.3976 0.8189 1.2218 1.9571
Original 0.1552 2.5250 5.1065 6.7330 11.5417

TABLE IV: JPEG compression and pipeline-latency (seconds).

C. Model Performance

When evaluating the performance of U-Net and EfficientViT, we compare their accuracy on the compressed data (Figures 9 and 10) against the uncompressed data (Figures 11 and 12). Lossless compression algorithms are designed to reduce data size without sacrificing any original information, ensuring that decompressed data is identical to the original. Therefore, when architectures like U-Net and EfficientViT are evaluated on losslessly compressed data, accuracy remains unchanged compared to the uncompressed data.

  1) U-Net Performance: U-Net shows strong results when trained on the lossy compressed data. U-Net's mIoU on ZFP remained in the 90% range (±1%) (Fig. 9), a 3% (±1%) drop from the uncompressed/lossless accuracy (Fig. 12). With SZ3, the model's mIoU shifts more over the error bounds but, like ZFP, largely preserves its uncompressed mIoU; only at the 1E-1 error bound is a 2.3% drop in mIoU seen, and the mIoU remains accurate enough for practical scenarios. JPEG compression shows similarly positive results, maintaining stable mIoU scores across quality levels, with each mIoU remaining at approximately 89% or higher (Fig. 9). Interestingly, the architectures' mIoU scores appear to improve with some lossy compression, as seen at quality level Q40 in the JPEG graph (Fig. 9) or at error bound 1E-5 for both ZFP and SZ3 (Fig. 9). This phenomenon is possibly attributable to the lossy compression serving as another form of data augmentation, strengthening the model's ability to generalize across images; a novel finding from this research. These findings are similarly reflected in the comparison between U-Net mIoU and compression ratio in Figure 9.
  2) EfficientViT Performance: The results show a negligible decrease in accuracy across compression levels: even for the most heavily compressed data (1E-1 with SZ3 and ZFP), accuracy drops only to 94% (Figure 10), just 1% lower than the uncompressed/lossless accuracy (Figure 11). ZFP shows a slight edge over SZ3 in maintaining steady accuracy on more heavily compressed data.

Similar results are seen with the JPEG compressor: in Figure 10, accuracy stays above 95% until it drops to 94% at quality level Q0.

EfficientViT performs well with data from all three compressors, JPEG, SZ3, and ZFP (Figure 10).

Fig. 9: U-Net MIoU vs Compression Ratio Graph

 

Fig. 10: EfficientViT MIoU vs Compression Ratio Graph

 

Fig. 11: Uncompressed/Lossless EfficientViT Accuracy

 

Fig. 12: Uncompressed/Lossless U-Net Accuracy

V. Conclusion

Data compression allows large amounts of image data to be processed on larger systems and transferred faster in low-bandwidth environments. Lossless and lossy data compression algorithms are both viable means of connecting a UGV to a cloud-based system. Lossless compressors do not allow transfer speeds fast enough to reach real-time pipeline-latency, but lossy compressors do lower latency without compromising mIoU performance (1% variation) on U-Net and EfficientViT. EfficientViT outperforms U-Net on compressed image data, allowing higher levels of compression without a large loss of accuracy. Among the lossy compressors tested, JPEG provides the best transfer speed on an edge device, and its accuracy degrades slightly more than ZFP's but slightly less than SZ3's. JPEG has the highest compression ratio of the three lossy compressors tested and a higher SSIM than the two error-bounded compressors at every comparable compression ratio, as shown in Figure 8. ZFP degrades the least in performance on both U-Net and EfficientViT and has a compression speed between SZ3's and JPEG's; however, ZFP achieves the lowest compression ratio of all the lossy and lossless compressors at its largest error bound. In conclusion, carefully designed and evaluated data compression maintains UGV performance in low-bandwidth environments while preserving strong semantic segmentation performance for advanced convolutional architectures like U-Net and advanced Vision Transformer architectures like EfficientViT.

VI. Discussion

A. Limitations and Assumptions

Our study operated under environmental and temporal constraints that affect the generalizability of the results. Testing was done in clear weather conditions, and system performance may degrade in heavy rain, snow, or fog; training additional architectures for other sensors may be needed to work around this. The data spanned daylight hours, implying that night operation would require additional lighting systems for the RGB perception sensors. The transfer-speed results assume a continuous connection at set bandwidths; in real-world scenarios bandwidth fluctuates and can drop out entirely, which is why this study maps a range of bandwidths to account for low-bandwidth scenarios.

B. Significance

The findings from this study have significant implications for autonomous vehicle development. Our results demonstrate that JPEG compression outperforms the newer scientific compressors SZ3 and ZFP for off-road RGB perception tasks. This unexpected finding challenges assumptions about specialized compression techniques and suggests opportunities for simplified deployment architectures in autonomous systems. Additionally, our validation of EfficientViT's performance on resource-constrained hardware advances practical autonomous navigation capabilities. This work bridges the gap between theoretical architectures and deployable solutions. Our analysis highlights the trade-offs between compression, transfer speed, and deep learning accuracy in real-world applications.

C. Implications

These results are particularly relevant for three groups: autonomous vehicle engineers, who can use our compression findings to optimize their perception pipelines, reducing latency and improving real-time decision-making; computer vision researchers, who gain new insight into the impact of compression on embedded vision systems, enabling them to refine their architectures for better performance; and industry teams, who can accelerate off-road perception system development by integrating our findings into their workflows.

The benefits of JPEG compression and EfficientViT provide a path to more efficient and realistic autonomous systems. By establishing a baseline for understanding the relationship between lossy compression, transfer speed, and deep learning, we provide a foundation for further research and optimization. Additionally, we offer autonomous vehicle engineers practical insights into the performance of state-of-the-art error-bounded compressors, equipping them to make informed decisions about balancing compression efficiency and perception accuracy in real-world deployments.

D. Future Work

UGVs typically deploy a wide variety of perceptual systems beyond color imagery, such as LiDAR and RADAR. Because LiDAR and RADAR data are typically large, they present a promising avenue, as architectures trained on compressed data are seen to achieve higher accuracy when large amounts of data are available. However, this also brings challenges: LiDAR introduces 3D data that may be less suited to error-bounded compressors like SZ3 and ZFP, although the floating-point nature of LiDAR data may also greatly improve their compression performance. Additionally, expanding to new compression algorithms could provide better compression ratios or faster compression and decompression times. For example, video compression may achieve a superior compression ratio on images that resemble frames of a video feed, rather than compressing each image individually. As another example, SZ3's modular design allows its wavelet transform stage to be swapped for a discrete cosine transform (similar to JPEG) to help increase compression speed. Furthermore, a different transfer protocol could provide a speed boost: UDP or WebSockets could remove the per-packet acknowledgments TCP requires, enabling even faster transfers. Finally, the use of lossy compression as pre-processing to boost accuracy merits further study, since in this work accuracy at certain compression levels improved relative to other compression levels.

VII. Acknowledgments

Clemson University is acknowledged for their generous allotment of compute time on the Palmetto Cluster. This work is sponsored by the United States Defense Advanced Research Projects Agency under agreement HR00112320008. The content of the information does not necessarily reflect the position or the policy of the US Government, and no official endorsement should be inferred. This material is based upon work supported by the National Science Foundation under Grant No. SHF-1943114 and SHF-2312616. We acknowledge Adam Pickeral for his mentorship and contributions to this paper.

References

 

  1. Beycimen, D. Ignatyev, and A. Zolotas, “A comprehensive survey of unmanned ground vehicle terrain traversability for unstructured environments and sensor technology insights,” Engineering Science and Technology, an International Journal, vol. 47, p. 101457, 2023.
  2. Pavitha, K. B. Rekha, and S. Safinaz, “Perception system in autonomous vehicle: A study on contemporary and forthcoming technologies for object detection in autonomous vehicles,” in 2021 International Conference on Forensics, Analytics, Big Data, Security (FABS), vol. 1, pp. 1–6, 2021.
  3. Sandu and I. Susnea, “Edge computing for autonomous vehicles: a scoping review,” in 2021 20th RoEduNet Conference: Networking in Education and Research (RoEduNet), pp. 1–5, 2021.
  4. J.-D. Decotignie, “Ethernet-based real-time and industrial communications,” Proceedings of the IEEE, vol. 93, no. 6, pp. 1102–1117, 2005.
  5. M. Rahman, M. Islam, J. Calhoun, and M. Chowdhury, “Real-time pedestrian detection approach with an efficient data communication bandwidth strategy,” Transportation Research Record, vol. 2673, no. 6, pp. 129–139, 2019.
  6. M. Rahman, M. Islam, C. Holt, J. Calhoun, and M. Chowdhury, “Dynamic error-bounded lossy compression to reduce the bandwidth requirement for real-time vision-based pedestrian safety applications,” Journal of Real-Time Image Processing, vol. 19, no. 1, pp. 117–131, 2022.
  7. Zhang, D. J. Love, J. V. Krogmeier, C. R. Anderson, R. W. Heath, and D. R. Buckmaster, “Challenges and opportunities of future rural wireless communications,” IEEE Communications Magazine, vol. 59, pp. 16–22, December 2021.
  8. Bhat, “Evaluation of lossless compression technique,” Apr. 2015.
  9. S. Di et al., “A survey on error-bounded lossy compression for scientific datasets,” arXiv preprint arXiv:2404.02840, 2024.
  10. J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3431–3440, 2015.
  11. M. H. Faykus, B. Selee, J. C. Calhoun, and M. C. Smith, “Lossy compression to reduce latency of local image transfer for autonomous off-road perception systems,” in 2022 IEEE International Conference on Big Data (Big Data), pp. 3146–3152, IEEE, 2022.
  12. D. Marpe, T. Wiegand, and G. J. Sullivan, “The H.264/MPEG4 advanced video coding standard and its applications,” IEEE Communications Magazine, vol. 44, no. 8, pp. 134–143, 2006.
  13. F. Alted, “Blosc, an extremely fast, multi-threaded, meta-compressor library,” 2017.
  14. A. Hidayat, “FastLZ, free, open-source, portable real-time compression library,” http://www.fastlz.org, 2019.
  15. J. Ziv and A. Lempel, “A universal algorithm for sequential data compression,” IEEE Transactions on Information Theory, vol. 23, no. 3, pp. 337–343, 1977.
  16. Y. Collet and M. Kucherawy, “Zstandard compression and the application/zstd media type,” tech. rep., 2018.
  17. D. A. Huffman, “A method for the construction of minimum-redundancy codes,” Proceedings of the IRE, vol. 40, no. 9, pp. 1098–1101, 1952.
  18. Y. Collet, “LZ4 lossless compression algorithm.” https://lz4.org/
  19. Zhang and B. Bockelman, “Exploring compression techniques for ROOT IO,” in Journal of Physics: Conference Series, vol. 898, p. 072043, IOP Publishing, 2017.
  20. P. Deutsch and J.-L. Gailly, “ZLIB compressed data format specification version 3.3,” tech. rep., 1996.
  21. X. Liang, K. Zhao, S. Di, S. Li, R. Underwood, A. M. Gok, J. Tian, J. Deng, J. Calhoun, D. Tao, and F. Cappello, “SZ3: A modular framework for composing prediction-based error-bounded lossy compressors,” IEEE Transactions on Big Data, 2021.
  22. P. Lindstrom, J. Hittinger, J. Diffenderfer, A. Fox, D. Osei-Kuffuor, and J. Banks, “ZFP: A compressed array representation for numerical computations,” The International Journal of High Performance Computing Applications, 2024.
  23. P. Lindstrom, “Fixed-rate compressed floating-point arrays,” IEEE Transactions on Visualization and Computer Graphics, vol. 20, no. 12, pp. 2674–2683, 2014.
  24. Grosset, C. M. Biwer, J. Pulido, A. T. Mohan, A. Biswas, J. Patchett, T. L. Turton, D. H. Rogers, D. Livescu, and J. Ahrens, “Foresight: Analysis that matters for data reduction,” in SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–15, 2020.
  25. G. K. Wallace, “The JPEG still picture compression standard,” IEEE Transactions on Consumer Electronics, vol. 38, no. 1, pp. xviii–xxxiv, 1992.
  26. V. G. Cerf and R. E. Kahn, “A protocol for packet network intercommunication,” IEEE Transactions on Communications, vol. 22, no. 5, pp. 637–648, May 1974.
  27. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” Lecture Notes in Computer Science, vol. 9351, pp. 234–241, 2015, doi: 10.1007/978-3-319-24574-4_28.
  28. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. S. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet large scale visual recognition challenge,” CoRR, vol. abs/1409.0575, 2014.
  29. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
  30. H. Cai, J. Li, M. Hu, C. Gan, and S. Han, “EfficientViT: Multi-scale linear attention for high-resolution dense prediction,” arXiv preprint arXiv:2205.14756, May 2022.
  31. Sharma, L. Dabbiru, T. Hannis, G. Mason, D. W. Carruth, M. Doude, C. Goodin, C. Hudson, S. Ozier, J. E. Ball, and B. Tang, “CaT: CAVS traversability dataset for off-road autonomous driving,” IEEE Access, vol. 10, pp. 24759–24768, 2022.
  32. R. Underwood, V. Malvoso, J. C. Calhoun, S. Di, and F. Cappello, “Productive and performant generic lossy data compression with LibPressio,” in 2021 7th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD-7), pp. 1–10, IEEE, 2021.
  33. S. van der Walt, J. L. Schönberger, J. Nunez-Iglesias, F. Boulogne, J. D. Warner, N. Yager, E. Gouillart, T. Yu, and the scikit-image contributors, “scikit-image: image processing in Python,” PeerJ, vol. 2, p. e453, June 2014.
  34. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
  35. A. S. Pickeral, “Using efficient vision transformers to improve perception systems in autonomous off-road vehicles,” Master’s thesis, Clemson University, 2024.
  36. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2017.
  37. Yang, W. Xiao, M. Zhang, S. Guo, J. Zhao, and F. Shen, “Image data augmentation for deep learning: A survey,” 2023.
  38. I. Loshchilov and F. Hutter, “SGDR: Stochastic gradient descent with warm restarts,” in International Conference on Learning Representations, 2017.
  39. A. Rosebrock, “Intersection over Union (IoU) for object detection,” PyImageSearch, 2016.
  40. J. Postel, “User Datagram Protocol,” Request for Comments 768, Aug. 1980.
  41. J. Ziv and A. Lempel, “A universal algorithm for sequential data compression,” IEEE Transactions on Information Theory, vol. 23, no. 3, pp. 337–343, May 1977, doi: 10.1109/TIT.1977.1055714.
  42. D. A. Huffman, “A method for the construction of minimum-redundancy codes,” Proceedings of the IRE, vol. 40, no. 9, pp. 1098–1101, 1952.
  43. J. Duda, “Asymmetric numeral systems: entropy coding combining speed of Huffman coding with compression rate of arithmetic coding,” arXiv preprint arXiv:1311.2540, 2014.
  44. D. Mehta et al., “Simple and efficient architectures for semantic segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

Author Biographies

Ethan Marquez is an undergraduate student at Clemson University majoring in BS Computer Science with minors in Physics and A.I. His current research is with Dr. Melissa Smith's Machine Learning and Big Data Lab, specializing in deep learning applications. He is an author of a journal paper on efficient deep learning vision architectures for off-road perception accepted to Scientific Research Publishing.

Adam Niemczura is an undergraduate at Clemson University majoring in BS Computer Science and BS Mathematics. His current area of research is computer vision for autonomous off-road vehicle navigation, and he is a member of Dr. Melissa Smith's Machine Learning and Big Data Lab. He contributed to a journal paper on efficient deep learning vision architectures for off-road perception, which was accepted to Scientific Research Publishing, and has presented this work at the Clemson Creative Inquiry Forum.

Cooper Taylor is an undergraduate at Clemson University. He is currently pursuing a BS in Mathematics and Computer Science. His research interests are in abstract math and machine learning. He is a member of the Clemson Watt AI Creative Inquiry and served as a research assistant for the group during the summer of 2024. During the summer his work entailed using machine learning to combat political misinformation on social media platforms.

Max Faykus III is a Ph.D. student in Computer Engineering at Clemson University. His research interests include machine learning on varying levels of data compression. Max has experience with data fusion using LiDAR and camera data for pixel-wise classification, lossy and lossless OCT compression, and exploring how lossy compression distortion affects ML architectures. As a graduate student, he has taken several HPC-related courses covering fault tolerance, parallel systems, GPUs, parallel computing, data compression, embedded computing, computer vision, parallel architecture, and tracking systems. He has also taught at the undergraduate level on topics including big data and machine learning.

Melissa C. Smith is a professor of Electrical and Computer Engineering at Clemson University with over 25 years of experience developing and implementing scientific workloads and machine learning applications across multiple domains, including 12 years as a research associate at ORNL. Her current research focuses on performance analysis and optimization with emerging heterogeneous computing architectures (GPGPU- and FPGA-based systems) for various application domains, including machine learning, high-performance or real-time embedded applications, and image processing. Her group collaborates with researchers in other fields to develop new approaches to the application/architecture interface, providing interdisciplinary solutions that enable new scientific advancements and capabilities.

Jon C. Calhoun directs the Future Technologies in Heterogeneous and Parallel Computing (FTHPC) Laboratory. The FTHPC Laboratory has broad interest in the advancement of high-performance computing systems and development of large-scale scientific software. Currently, the group’s primary efforts are dedicated to the development and integration of lossy and lossless data compression algorithms inside scientific workflows to remove key data generation, movement, and storage bottlenecks. His research group has been nominated and won the best paper and poster awards at prominent venues such as IEEE Cluster and the ACM Student Research Competition.
