MARCH 2025 | Volume 46, Issue 1

An AI model performance benchmarking harness for reproducible performance evaluation

Jakob Adams

Principal Software Engineer
Computer Science & Data Analyst

Dr. Venkat R. Dasari

Generalized Neural Network
DEVCOM Army Research Laboratory

Dr. Manuel M. Vindiola

Cognitive Scientist at DEVCOM
Army Research Laboratory

 

DOI: 10.61278/itea.46.1.1005

Abstract

Artificial intelligence (AI) models are complex and are often designed to solve domain-specific tasks on resource-constrained platforms. The resource constraints of edge devices, such as available memory, disk space, and processing power, require models to be optimized before deployment. Optimizations such as quantization and pruning can effectively reduce model size or latency, but often at the cost of accuracy. A well-designed, adaptive, and scalable AI benchmark harness is needed to test models before and after optimizations are applied to establish whether they maintain acceptable performance. In this paper, we design and develop a comprehensive and generalized benchmark harness and test its functionality against optimized AI models, measuring several performance metrics.

Keywords: Optimization, Model Performance, Compilers, Benchmarking, Inference Accelerators

1. Introduction

Artificial intelligence (AI) has made significant advances in the last few years, with deep learning models finding use in computer vision, navigation, biology, medicine, language understanding, and game playing. These AI models are complex, some are domain specific, and their performance is often limited by computational resources. Knowing how well a model performs on a given task is therefore essential before it is deployed on a target platform. However, most AI model performance evaluations today cover only a narrow subset of performance metrics, omitting the broader range of measures needed for a full performance assessment.

An AI benchmark tool can close this gap by providing a single interface for evaluating diverse AI models across many tasks, datasets, and methods, and for measuring their performance against hardware and computational constraints. By enforcing a common inference methodology, such a tool can compare different AI designs, how they learn, and how they behave in real-world settings. In this paper, we propose a comprehensive and generalized AI benchmark harness that is compatible with convolutional neural networks (CNNs) as well as large language models (LLMs).

2. Related Work

A comprehensive benchmark harness is essential for evaluating the performance of a model before it is deployed to its target platform. Designing a benchmarking tool that scales to a variety of models, including both CNNs and LLMs, is challenging because of their architectural differences. This literature review examines recent efforts toward developing AI model benchmark tools.

The AIIA DNN Benchmark Overview [1] provides a comprehensive benchmark for evaluating the performance of deep neural networks (DNNs) across various hardware and software configurations. It focuses on measuring inference speed, accuracy, and resource utilization for different DNN models. The benchmark suite enables comparisons between frameworks such as TensorFlow, PyTorch, and ONNX, which makes it particularly useful for developers and researchers optimizing AI workloads. By standardizing evaluation metrics, it promotes transparency in performance analysis.

The Deep Learning Inference Frameworks Benchmark by Pochelu [2] compares popular deep learning frameworks such as TensorFlow Lite, Core ML, and NCNN. The study evaluates their performance on mobile devices, emphasizing latency, memory usage, and model accuracy, and it highlights the trade-offs between speed and resource consumption across platforms. The results suggest that TensorFlow Lite often outperforms its competitors in speed but may consume more memory. This benchmark helps practitioners choose the most suitable framework for their use case.

Benchmarking Simulation-Based Inference by Lueckmann et al. [3] introduces a systematic approach to evaluating simulation-based inference methods, such as likelihood-free inference. The study proposes metrics such as computational efficiency, accuracy, and robustness to noise, and it compares techniques such as approximate Bayesian computation (ABC) and neural posterior estimation. Results demonstrate that neural methods often outperform traditional ABC in high-dimensional settings. This work establishes best practices for benchmarking in simulation-based inference.

The AI/ML Systems Engineering Workbench Framework of Nyarko et al. [4] presents a modular platform for designing, testing, and deploying AI systems. The framework integrates tools for data preprocessing, model training, and performance evaluation, and it supports collaborative development through version control and experiment tracking. Case studies demonstrate its effectiveness in reducing development time and improving model quality. This work bridges the gap between research and production environments.

Performance Evaluation and Analysis of Deep Learning Frameworks by Xie et al. [5] analyzes frameworks such as Caffe, TensorFlow, and MXNet on GPU and CPU platforms. The study measures training time, memory usage, and scalability for large datasets. Findings reveal that TensorFlow and PyTorch generally outperform others in multi-GPU settings, whereas MXNet shows better single-GPU efficiency. This benchmark provides insights for selecting a framework based on hardware constraints.

The Machine Learning-Enabled Performance Model of Wu et al. [6] predicts the performance of DNN applications on AI accelerators such as GPUs and TPUs. The model leverages historical data to forecast execution time, power consumption, and throughput, and experimental results show that its predictions closely match actual measurements. This approach enables efficient resource allocation and workload scheduling and is particularly valuable for cloud providers optimizing AI workloads.

The Benchmarking Analysis of CNN Architectures by Jha et al. [7] evaluates popular CNN models such as VGG, ResNet, and MobileNet on diverse datasets and hardware platforms. The study focuses on metrics such as inference time, memory usage, and energy consumption. The results indicate that MobileNet achieves the best trade-off between accuracy and efficiency for mobile devices, whereas ResNet excels in scenarios that require high accuracy at the cost of computational overhead. This analysis helps practitioners select architectures based on deployment constraints.

3. Need for uniform AI model performance metrics

A large study that applies a standardized set of benchmarking tests to diverse AI models built for different target platforms and constraints would allow us to compare their performance objectively. A standardized set of metrics also helps researchers interpret the impact of model optimizations such as pruning and quantization. It further provides a deeper understanding of how models use resources during both training and inference, which lets researchers choose optimization approaches that meet resource constraints while maintaining performance thresholds. Moreover, standardized benchmarking tests make it easier for researchers to adapt their findings to different target platforms or constraints, so that organizations can deploy these models across various environments and use cases.

We define an inference session of an AI model as the activity of preprocessing input data into a model-specific format, generating predictions via an inference engine, and then scoring the predictions against either a set of labels provided with the input data or some additional metric. Input data, such as images, audio files, or structured tabular data, are preprocessed from their raw format, such as PNG, MP3, or CSV, into the format the model defines for ingestion. This may include reducing the height and width of an image, removing columns from structured tabular data, or clipping an audio file to a shorter length. Once preprocessed, the input data can be accepted by the input layer of the model. The model then processes the data through the layers of its network and produces an output, or predictions. Depending on the type of model, the predictions may be a single decimal value, as in a regression model; a list of probabilities, as when classifying the object contained within an image; or a list of pixel coordinates, such as a bounding box locating an object within an image. Once the predictions are obtained, they can be scored against any labels provided with the input data.
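The three stages of an inference session can be sketched in a few lines of Python. This is a minimal illustration, not the PBH's actual interface: `run_inference_session` and the toy preprocessing, model, and scoring functions are hypothetical stand-ins. The toy "audio" inputs are clipped to four samples before a sign-based classifier runs, which shows why preprocessing must match what the model expects.

```python
from typing import Callable, Sequence


def run_inference_session(
    raw_inputs: Sequence,
    labels: Sequence,
    preprocess: Callable,
    predict: Callable,
    score: Callable,
) -> float:
    """Preprocess raw inputs, generate predictions, then score them against labels."""
    model_inputs = [preprocess(x) for x in raw_inputs]  # raw format -> model format
    predictions = [predict(x) for x in model_inputs]    # inference engine step
    return score(predictions, labels)                   # compare with the labels


# Hypothetical stand-ins: clip a 1-D "audio" signal to 4 samples,
# classify it by the sign of its sum, and score with accuracy.
def clip_to_four(signal):
    return signal[:4]


def sign_classifier(signal):
    return 1 if sum(signal) > 0 else 0


def accuracy(predictions, labels):
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


acc = run_inference_session(
    raw_inputs=[[1, 2, 3, 4, -100], [-1, -2, 0, 0, 50], [0.5, -0.1, 0, 0]],
    labels=[1, 0, 1],
    preprocess=clip_to_four,
    predict=sign_classifier,
    score=accuracy,
)
print(acc)  # 1.0
```

Without the clipping step, the first input would sum to a negative value and be misclassified, so the preprocessing stage directly affects the score.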

During the prediction generation phase of the inference session, an inference engine performs the model's computation. Base AI frameworks, such as PyTorch and TensorFlow, provide their own inference engines. These engines may support specialized hardware, such as an Nvidia graphics processing unit (GPU), but may not deliver optimal performance on that hardware. Specialized inference engines have therefore been developed to improve key metrics beyond what the base engines offer, such as increasing throughput (the number of predictions generated in a given time) or reducing the model's memory footprint. These specialized engines, such as Apache TVM and Nvidia's TensorRT, compile the model from its base format into a hardware-optimized format. Additional optimizations, such as layer fusion, which combines two layers into one to reduce overall computation, may also be performed during compilation. While Nvidia designed TensorRT specifically to accelerate models on its GPUs, Apache TVM is a more general tool that provides specialized tuning for a variety of CPUs and GPUs.
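Layer fusion can be illustrated with a common case: folding an inference-mode batch normalization into the preceding layer, so that two layers become one. The sketch below uses a fully connected layer and plain Python lists for brevity; the function names are hypothetical, and it shows only the arithmetic of the idea, not how TVM or TensorRT implement fusion internally.

```python
import math


def linear(W, b, x):
    """y = W x + b for a small dense layer stored as nested lists."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i for row, b_i in zip(W, b)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch normalization with fixed running statistics."""
    return [g * (y_i - m) / math.sqrt(v + eps) + bt
            for y_i, g, bt, m, v in zip(y, gamma, beta, mean, var)]


def fuse_linear_bn(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold batch normalization into the linear layer's weights and bias."""
    W_f, b_f = [], []
    for row, b_i, g, bt, m, v in zip(W, b, gamma, beta, mean, var):
        s = g / math.sqrt(v + eps)          # per-output-channel scale
        W_f.append([s * w for w in row])    # scale each weight row
        b_f.append(s * (b_i - m) + bt)      # shift and scale the bias
    return W_f, b_f


W = [[0.5, -1.0], [2.0, 0.25]]
b = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.0, 0.3]
mean, var = [0.2, -0.1], [0.9, 1.1]
x = [1.0, 2.0]

two_layers = batch_norm(linear(W, b, x), gamma, beta, mean, var)
W_f, b_f = fuse_linear_bn(W, b, gamma, beta, mean, var)
one_layer = linear(W_f, b_f, x)
print(all(math.isclose(a, c) for a, c in zip(two_layers, one_layer)))  # True
```

The fused layer produces the same outputs with one matrix-vector product instead of a product followed by a separate normalization pass.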

4. Technical Approach

We develop a Performance Benchmark Harness (PBH) that uses a unified inference loop to allow consistent comparison of model performance across model frameworks and architectures. Figure 1 illustrates the key components of the PBH architecture. The PBH is designed to accept AI models developed with various base frameworks, such as PyTorch and TensorFlow; it converts or compiles the base model to the format required by the chosen inference engine and then performs inference. Finally, it generates a performance profile with details of the inference session. Algorithm 1 details the overall process.

Our PBH provides a unified inference loop that can easily be extended with additional inference acceleration engines to compare how different tools accelerate optimized models. We also track multiple performance metrics, such as accuracy, throughput, and memory usage. These metrics are documented in a Performance Profile so that optimized model performance can be compared regardless of the base format and inference tool. We calculate the number of floating point operations (FLOP) required for a model to infer a single input and compare it to the theoretical peak floating point operations per second (FLOPS) of the target hardware, which provides a baseline of theoretical performance.
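The FLOP-versus-FLOPS comparison reduces to simple arithmetic. The sketch below assumes the common convention of counting one multiply-add as 2 FLOP and ignores bias additions and activations; the toy model and the 10 TFLOPS peak are hypothetical values, not measurements from this study.

```python
def conv2d_flop(c_in, c_out, k_h, k_w, h_out, w_out):
    # Each output element needs c_in * k_h * k_w multiply-adds (2 FLOP each).
    return 2 * c_in * k_h * k_w * c_out * h_out * w_out


def linear_flop(in_features, out_features):
    # One multiply-add per weight.
    return 2 * in_features * out_features


def theoretical_max_throughput(flop_per_input, peak_flops):
    """Upper bound on inputs per second if the hardware sustained its peak FLOPS."""
    return peak_flops / flop_per_input


# Hypothetical toy model: a 3x3 conv (3 -> 16 channels, 32x32 output)
# followed by a linear layer from the flattened 16*32*32 features to 10 classes.
flop = conv2d_flop(3, 16, 3, 3, 32, 32) + linear_flop(16 * 32 * 32, 10)
peak = 10e12  # assumed 10 TFLOPS device
tp = theoretical_max_throughput(flop, peak)
print(flop)  # 1212416
print(f"{tp:,.0f} inputs/s upper bound")
```

Measured throughput is typically well below this bound because of memory traffic, kernel launch overhead, and data transfer, which is exactly the gap a baseline of theoretical performance helps expose.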

Figure 1: AI Performance Benchmark Harness (PBH) Architecture

4.1 Input

Input to the PBH consists of an AI model and a test dataset. To provide state-of-the-art performance, inference engines compile base models from AI frameworks, often using ONNX as an intermediate format. Based on the specified inference engine, the input model is converted and/or compiled to the format that engine requires. The dataset is loaded into a format compatible with the chosen inference engine, and any required preprocessing is performed.
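Choosing how to convert the input model can be framed as finding a path through a graph of model formats. The sketch below is a hypothetical planner: the `ENGINE_FORMAT` and `CONVERSIONS` tables are assumptions modeled on the export route described in Section 5 (PyTorch to ONNX, then ONNX to TVM or TensorRT), and it only plans the steps rather than calling any real converter.

```python
from collections import deque

# Hypothetical mapping from engine name to the model format it consumes.
ENGINE_FORMAT = {
    "pytorch": "pytorch",
    "torchscript": "torchscript",
    "onnxruntime": "onnx",
    "tvm": "tvm",
    "tensorrt": "tensorrt",
}

# Hypothetical conversion edges: which formats each format can be converted to.
CONVERSIONS = {
    "pytorch": ["onnx", "torchscript"],
    "onnx": ["tvm", "tensorrt"],
}


def conversion_path(source_format, engine):
    """Breadth-first search for the shortest chain of conversions to the engine's format."""
    target = ENGINE_FORMAT[engine]
    queue = deque([[source_format]])
    seen = {source_format}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in CONVERSIONS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    raise ValueError(f"no conversion path from {source_format!r} to {engine!r}")


print(conversion_path("pytorch", "tensorrt"))  # ['pytorch', 'onnx', 'tensorrt']
```

A model already in the engine's native format yields a one-element path, meaning no conversion is needed.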

4.2 Benchmark Harness

4.2.1 Warmup

The primary component of the PBH is the Inference Pipeline. The Inference Pipeline can be executed multiple times, and summary statistics, such as the mean and standard deviation, are generated for the performance results. Each execution begins with a series of warmup batches so that one-time costs, such as lazy initialization, memory allocation, and cache population, do not distort the measured performance.
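A warmup phase followed by repeated timed executions might look like the sketch below. The function name, the default of 3 warmup batches, and the 5 repeats are hypothetical choices, and `run_batch` stands in for whatever prediction call the selected engine provides.

```python
import statistics
import time


def benchmark_pipeline(run_batch, batches, warmup_batches=3, repeats=5):
    """Run untimed warmup batches, then time `repeats` full passes over `batches`."""
    for batch in batches[:warmup_batches]:
        run_batch(batch)  # results discarded; absorbs one-time setup costs
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        for batch in batches:
            run_batch(batch)
        durations.append(time.perf_counter() - start)
    return {
        "runs": durations,
        "mean_s": statistics.mean(durations),
        "stdev_s": statistics.stdev(durations) if repeats > 1 else 0.0,
    }


batches = [list(range(1000)) for _ in range(10)]
result = benchmark_pipeline(lambda b: sum(x * x for x in b), batches)
print(f"{result['mean_s']:.6f} s ± {result['stdev_s']:.6f} s")
```

Reporting the per-run durations alongside the mean and standard deviation makes it possible to spot outlier runs rather than hiding them in an average.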

4.2.2 Performance Tracker

Once the warmup cycle is complete, the Performance Tracker (PT) is started. The PT tracks and controls the metrics required for measuring performance, such as the start and stop times of the inference session. Inference is then performed by iterating over the evaluation dataset. Once all data in the evaluation dataset have been processed, the PT is stopped and a Performance Profile is generated.
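A minimal tracker along these lines could accumulate wall-clock time and processed samples, with pause/resume so that untracked work is excluded. The class below is a hypothetical sketch, not the PBH's Performance Tracker; it measures only elapsed time and sample counts, whereas memory usage would need additional instrumentation.

```python
import time


class PerformanceTracker:
    """Hypothetical tracker: accumulates wall-clock time and sample count, with pause/resume."""

    def __init__(self):
        self._elapsed = 0.0
        self._started_at = None
        self.samples = 0

    def start(self):
        self._started_at = time.perf_counter()

    def pause(self):
        # Add the time since the last start/resume, then stop the clock.
        self._elapsed += time.perf_counter() - self._started_at
        self._started_at = None

    def resume(self):
        self.start()

    def stop(self):
        if self._started_at is not None:
            self.pause()

    def record(self, batch_size):
        self.samples += batch_size

    def results(self):
        throughput = self.samples / self._elapsed if self._elapsed > 0 else 0.0
        return {"elapsed_s": self._elapsed, "samples": self.samples,
                "throughput_per_s": throughput}


pt = PerformanceTracker()
pt.start()
for batch in [[1, 2, 3, 4]] * 5:
    _ = [x * 2 for x in batch]  # stand-in for predict()
    pt.record(len(batch))
pt.pause()
time.sleep(0.05)                # untracked work, e.g. model compilation
pt.resume()
pt.stop()
print(pt.results()["samples"])  # 20
```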

4.2.3 Inference Session

To measure these performance metrics, we integrate multiple inference acceleration engines. Some engines take advantage of specific hardware, such as Intel's OpenVINO for Intel hardware, or of specific optimization techniques, such as the 2:4 structured sparsity supported by Nvidia's Ampere GPUs. Supporting multiple inference engines lets the PBH search a more diverse problem space, which helps identify the best tool for deploying a given AI capability.
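One simple way to keep engine choices tied to hardware is a small registry that records which devices each engine supports. The sketch below is hypothetical (`EngineInfo` and `engines_for` are illustrative names); the support flags are copied from Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineInfo:
    name: str
    cpu: bool
    gpu: bool


# Hardware support as listed in Table 2.
ENGINES = [
    EngineInfo("ONNX Runtime", cpu=True, gpu=True),
    EngineInfo("PyTorch", cpu=True, gpu=True),
    EngineInfo("TensorRT", cpu=False, gpu=True),
    EngineInfo("TorchScript", cpu=True, gpu=True),
    EngineInfo("TVM", cpu=True, gpu=True),
]


def engines_for(device):
    """Return the names of engines that can run on `device` ("cpu" or "gpu")."""
    if device not in ("cpu", "gpu"):
        raise ValueError(f"unknown device: {device!r}")
    return [engine.name for engine in ENGINES if getattr(engine, device)]


print(engines_for("cpu"))  # ['ONNX Runtime', 'PyTorch', 'TorchScript', 'TVM']
```

Filtering by device before a sweep avoids scheduling combinations, such as TensorRT on a CPU-only node, that cannot run.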

For the actual inference of the model, we separate the prediction generation and scoring methods from the inference loop itself. This allows evaluation data to be converted from one type to another, such as from a PyTorch DataLoader to NumPy arrays, to support the various integrated inference acceleration engines. It also allows us to support multiple scoring techniques for a single model. For example, Faster R-CNN outputs class labels for detected objects, bounding boxes for those objects, and confidence scores for the detections, and each output can be scored with a different technique, such as classification accuracy for the labels or intersection over union (IoU) for the bounding boxes.
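Decoupling scoring from prediction means several scorers can be applied to the same detector output. The sketch below assumes a hypothetical output format (dicts of `labels`, `boxes`, and `scores`, with boxes as `(x1, y1, x2, y2)`) and matches predictions to targets by index for simplicity; real detection metrics such as mean average precision instead match boxes by IoU threshold and rank them by confidence.

```python
def iou(box_a, box_b):
    """Intersection over union for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_accuracy(pred, target):
    hits = sum(p == t for p, t in zip(pred["labels"], target["labels"]))
    return hits / len(target["labels"])


def mean_box_iou(pred, target):
    pairs = list(zip(pred["boxes"], target["boxes"]))
    return sum(iou(p, t) for p, t in pairs) / len(pairs)


# Independent scorers applied to the same prediction (hypothetical format).
SCORERS = {"label_accuracy": label_accuracy, "mean_box_iou": mean_box_iou}

prediction = {"labels": [1, 3], "boxes": [(0, 0, 2, 2), (5, 5, 7, 7)], "scores": [0.9, 0.6]}
target = {"labels": [1, 2], "boxes": [(1, 1, 3, 3), (5, 5, 7, 7)]}
scores = {name: fn(prediction, target) for name, fn in SCORERS.items()}
print(scores)
```

Adding a new metric only requires registering another function in `SCORERS`; the prediction code is untouched.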

We determine the performance impact of the optimizations applied to a model by generating a Performance Profile for each inference session. By using the dense version of the model, run on its framework's base inference engine, as a baseline, we can quantify how much the applied optimizations change performance. Coupled with various inference engines, this ensures that diverse deployment environments are included in the benchmarking space.
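Comparing an optimized model against the dense baseline comes down to a few ratios and differences. The function and field names below are hypothetical, and the numbers are illustrative only, not results from this study.

```python
def compare_to_baseline(baseline, optimized):
    """Relative change of an optimized model's metrics vs. the dense baseline."""
    return {
        "throughput_speedup": optimized["throughput"] / baseline["throughput"],
        "latency_ratio": optimized["latency_ms"] / baseline["latency_ms"],
        "accuracy_drop_pts": 100 * (baseline["accuracy"] - optimized["accuracy"]),
        "memory_ratio": optimized["memory_mb"] / baseline["memory_mb"],
    }


# Illustrative numbers only.
dense = {"throughput": 400.0, "latency_ms": 20.0, "accuracy": 0.76, "memory_mb": 100.0}
pruned = {"throughput": 480.0, "latency_ms": 16.0, "accuracy": 0.70, "memory_mb": 100.0}
print(compare_to_baseline(dense, pruned))
```

Expressing accuracy loss in percentage points and speed as a ratio keeps results comparable across models whose absolute throughput differs by orders of magnitude.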

Where supported, we measure not only the model's overall performance but also per-layer performance metrics. For example, PyTorch forward hooks are used to gather runtime metrics for each layer. This allows potential bottlenecks in a model to be profiled and identified for further optimization.
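Per-layer timing with PyTorch hooks can be sketched as follows, assuming PyTorch is installed. `register_forward_pre_hook` and `register_forward_hook` are standard `nn.Module` methods; `attach_layer_timers` and `_sync` are hypothetical helpers, and the timings include a small hook overhead.

```python
import time
from collections import defaultdict

import torch
import torch.nn as nn


def _sync():
    # CUDA kernels run asynchronously; wait for them so timestamps are meaningful.
    if torch.cuda.is_available():
        torch.cuda.synchronize()


def attach_layer_timers(model: nn.Module):
    """Attach forward pre-/post-hooks to every leaf module to time it and record its output shape."""
    timings = defaultdict(list)
    shapes = {}
    starts = {}
    handles = []
    for name, module in model.named_modules():
        if next(module.children(), None) is not None:
            continue  # skip containers; only time leaf layers (Linear, Conv2d, ReLU, ...)

        def pre_hook(mod, args, name=name):
            _sync()
            starts[name] = time.perf_counter()

        def post_hook(mod, args, output, name=name):
            _sync()
            timings[name].append(time.perf_counter() - starts[name])
            shapes[name] = (type(mod).__name__, tuple(output.shape))

        handles.append(module.register_forward_pre_hook(pre_hook))
        handles.append(module.register_forward_hook(post_hook))
    return timings, shapes, handles


model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).eval()
timings, shapes, handles = attach_layer_timers(model)
with torch.no_grad():
    model(torch.randn(8, 16))
for handle in handles:
    handle.remove()  # detach the hooks so later runs are not instrumented
print(shapes)
```

Recording the layer type and output shape in the same hook also supplies the per-layer metadata that a performance profile needs.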

4.3 Performance Profile

We document the performance of the model and its layers in a Performance Profile, hereafter referred to as the Profile. The primary contents of the Profile are the performance metrics tracked during inference by the Performance Tracker, such as throughput, latency, and memory usage. Because the PBH can execute multiple inference sessions during a given experiment, the Profile records not only the individual results for each target performance metric but also their mean, standard deviation, and variance. Beyond the performance metrics generated by the PBH, the Profile contains information about the model and its layers, such as the model framework (PyTorch, TensorFlow, etc.), model family (MobileNet, ResNet, etc.), layer input shape, layer output shape, and layer type (Linear, Conv2d, etc.). We also include hardware information, such as CPU/GPU type and manufacturer, number of cores, and clock speed, and software information, such as Python version, inference engine version, OS type, and OS version. Finally, the Profile includes any available information about optimization techniques applied to the model, such as sparsity level or quantization data type.
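A Profile of this kind maps naturally onto a nested dictionary that can be serialized to JSON. The sketch below is an assumption about layout, not the PBH's actual schema: the field names are hypothetical, system details come from Python's standard `platform` and `os` modules, and GPU details would need a framework-specific query that is omitted here.

```python
import json
import os
import platform
import statistics


def summarize(values):
    """Individual results plus mean, standard deviation, and variance (sample statistics)."""
    multi = len(values) > 1
    return {
        "runs": list(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values) if multi else 0.0,
        "variance": statistics.variance(values) if multi else 0.0,
    }


def build_profile(run_metrics, model_info, optimization_info):
    """Aggregate per-run metrics and attach model, system, and optimization metadata."""
    metric_names = run_metrics[0].keys()
    return {
        "metrics": {k: summarize([run[k] for run in run_metrics]) for k in metric_names},
        "model": model_info,
        "optimization": optimization_info,
        "system": {
            "machine": platform.machine(),
            "processor": platform.processor(),  # may be an empty string on some OSes
            "cpu_count": os.cpu_count(),
            "os": platform.system(),
            "os_version": platform.release(),
            "python": platform.python_version(),
        },
    }


runs = [{"throughput": 410.0, "latency_ms": 19.5}, {"throughput": 390.0, "latency_ms": 20.5}]
profile = build_profile(
    runs,
    model_info={"framework": "PyTorch", "family": "ResNet"},
    optimization_info={"sparsity_type": "unstructured", "target_sparsity": 0.25},
)
print(json.dumps(profile["metrics"]["throughput"], indent=2))
```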

Algorithm 1 AI Model Benchmark Algorithm

Input:
  Post-Optimized AI Model, P
  Dataset, D
  Inference Acceleration Engine, E
  PerformanceTracker, PT
Output: Performance Profile, perfProf

if NOT compat(E, P) then
  P = compileModel(P, E)
end if

batchScores = []
PT.start()

for batch in D do
  data = batch[data]
  labels = batch[labels]
  predictions = predict(P, data, E)
  batchScore = score(predictions, labels)
  batchScores.append(batchScore)
end for

PT.stop()

perfProf = procRes(PT.results(), batchScores)

return perfProf
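Algorithm 1 can be rendered as runnable Python by passing each of its helper operations in as a callable. This is a sketch under that assumption: `compat`, `compile_model`, `predict`, `score`, and `proc_res` are hypothetical stand-ins for the engine-specific implementations, and the toy usage below aggregates the mean of per-batch scores.

```python
import time
from typing import Any, Callable, Iterable, Mapping


def benchmark(
    model: Any,
    dataset: Iterable[Mapping[str, Any]],
    engine: str,
    tracker: Any,  # must provide start(), stop(), and results()
    compat: Callable[[str, Any], bool],
    compile_model: Callable[[Any, str], Any],
    predict: Callable[[Any, Any, str], Any],
    score: Callable[[Any, Any], float],
    proc_res: Callable[[dict, list], dict],
) -> dict:
    """Python rendering of Algorithm 1; every helper is supplied by the caller."""
    if not compat(engine, model):
        model = compile_model(model, engine)  # convert/compile for the engine
    batch_scores = []
    tracker.start()
    for batch in dataset:
        predictions = predict(model, batch["data"], engine)
        batch_scores.append(score(predictions, batch["labels"]))
    tracker.stop()
    return proc_res(tracker.results(), batch_scores)


class WallClock:
    """Minimal tracker used only for this example."""

    def start(self):
        self.t0 = time.perf_counter()

    def stop(self):
        self.t1 = time.perf_counter()

    def results(self):
        return {"elapsed_s": self.t1 - self.t0}


dataset = [{"data": [1, -2, 3], "labels": [1, 0, 1]}, {"data": [-4, 5], "labels": [0, 0]}]
profile = benchmark(
    model="threshold-model", dataset=dataset, engine="toy", tracker=WallClock(),
    compat=lambda e, m: m.endswith(e),
    compile_model=lambda m, e: f"{m}-compiled-for-{e}",
    predict=lambda m, xs, e: [1 if x > 0 else 0 for x in xs],
    score=lambda preds, ys: sum(p == y for p, y in zip(preds, ys)) / len(ys),
    proc_res=lambda res, scores: {**res, "accuracy": sum(scores) / len(scores)},
)
print(profile["accuracy"])  # 0.75 (mean of per-batch accuracies 1.0 and 0.5)
```

Note that averaging per-batch scores weights a short final batch as heavily as a full one; summing correct predictions across batches would instead give 0.8 here.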

5. Experimentation

We demonstrate the effectiveness of the PBH by performing a series of inference experiments on three models: MobileNet v2, ResNet 50, and the Vision Transformer (ViT). All three models were trained on ImageNet-1k [8], and the model weights are the default ImageNet-1k weights from PyTorch [9]. For each model, we evaluate the dense version along with several sparse versions. We use Optimal Brain Compression (OBC) [10] to perform unstructured and N:M pruning in a post-training setting. For unstructured pruning, we prune to 25% and 75% sparsity. For N:M pruning, we prune to 1:2, 2:4, and 4:8, each resulting in 50% sparsity. Because OBC targets only specific layer types, primarily linear and convolutional layers, the overall sparsity of each model can differ from the target sparsity; Table 1 shows the overall sparsity of each model tested.
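The gap between target and overall sparsity in Table 1 follows from a parameter-weighted average: layers that the pruner does not target stay dense and pull the overall figure down. The layer split below is hypothetical, chosen only to illustrate the calculation.

```python
def overall_sparsity(layers):
    """Fraction of zero weights across all layers, pruned or not.

    `layers` maps a layer group name to (num_params, sparsity_of_that_group).
    """
    total = sum(n for n, _ in layers.values())
    zeros = sum(n * s for n, s in layers.values())
    return zeros / total


# Hypothetical model: only conv/linear weights are pruned to 75%; the rest stay dense.
layers = {
    "conv_and_linear": (9_000_000, 0.75),  # targeted by the pruner
    "other_params": (4_000_000, 0.0),      # parameters the pruner does not target
}
print(f"{overall_sparsity(layers):.1%}")  # 51.9%
```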

To measure key performance metrics, we integrate multiple inference acceleration engines: PyTorch, TorchScript, ONNX Runtime, TVM, and TensorRT. Table 2 details the hardware support for each inference acceleration engine. Models were exported to the ONNX format using PyTorch's ONNX export API. The ONNX-formatted models were then compiled to the respective formats required by TVM and TensorRT; no additional tuning or optimization was specified for either TVM or TensorRT. Because TVM and TensorRT require the input batch size when compiling the model, compilation is performed inside the inference loop, and the tracking of all performance metrics is paused during compilation. Inference was performed on a high-performance computing (HPC) platform, using a single compute node with 8 CPU cores; GPU sessions used a single Nvidia GPU. Table 3 details the CPU/GPU specifications.
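Compiling inside the loop while keeping compile time out of the inference measurements can be handled with a cache keyed by batch size. The class below is a hypothetical sketch; `fake_compile` stands in for a TVM or TensorRT build step. A smaller final batch triggers a second compilation, which is one reason compile time should be reported separately.

```python
import time


class CompiledModelCache:
    """Hypothetical helper: compile once per batch size and time compilation separately."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn
        self._cache = {}
        self.compile_seconds = 0.0  # reported separately from inference time

    def get(self, model, batch_size):
        if batch_size not in self._cache:
            t0 = time.perf_counter()
            self._cache[batch_size] = self._compile_fn(model, batch_size)
            self.compile_seconds += time.perf_counter() - t0
        return self._cache[batch_size]


compile_calls = []


def fake_compile(model, batch_size):
    compile_calls.append(batch_size)
    time.sleep(0.02)  # simulated build cost
    return lambda batch: [x + 1 for x in batch]


cache = CompiledModelCache(fake_compile)
inference_seconds = 0.0
for batch in ([1, 2, 3, 4], [5, 6, 7, 8], [9, 10]):  # last batch is smaller
    compiled = cache.get("model", len(batch))          # excluded from inference timing
    t0 = time.perf_counter()
    compiled(batch)
    inference_seconds += time.perf_counter() - t0
print(compile_calls)  # [4, 2]
```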

 

Model          Target Sparsity   Sparsity Type   Actual Sparsity
MobileNet v2   25%               Unstructured    25%
MobileNet v2   75%               Unstructured    74%
MobileNet v2   50%               N:M             48%
ResNet 50      25%               Unstructured    25%
ResNet 50      75%               Unstructured    75%
ResNet 50      50%               N:M             50%
ViT            25%               Unstructured    17%
ViT            75%               Unstructured    52%
ViT            50%               N:M             35%

Table 1: Overall Model Sparsity

Inference Engine   CPU Support   GPU Support
ONNX Runtime       Yes           Yes
PyTorch            Yes           Yes
TensorRT           No            Yes
TorchScript        Yes           Yes
TVM                Yes           Yes

Table 2: Inference Engine Hardware Support

Hardware   Cores   Type
CPU        8       Intel Xeon Cascade Lake
GPU        -       Nvidia A100 (40 GB)

Table 3: HPC System Specifications

6. Results

Figures 2-4 show how sparsity affects throughput, the number of input data points the model can process per second. The x-axis denotes the sparsity of the model, from the original dense model through a model with 75% sparsity. The y-axis denotes throughput, averaged over batch processing of the entire dataset. The three models are MobileNet v2 (Figure 2), a small, optimized convolution-based image classifier; ResNet 50 (Figure 3), a widely used convolution-based image classifier; and ViT (Figure 4), a Transformer-based image classifier. For each sparse model, we also compare three inference engine setups: PyTorch on CPU, PyTorch on GPU, and TensorRT. Using a GPU yields a 2x increase in throughput across all models and sparsity levels. TensorRT offers little improvement over PyTorch on GPU and even performs worse for some sparsity levels of MobileNet v2, despite compiling the model to a GPU-optimized format. We also observe that sparsity does not affect the model's throughput, even though sparser models require fewer nonzero computations than denser ones. A likely reason is that the engines execute the pruned layers with dense kernels, which still compute the zeroed weights; no sparsity-specific tuning was enabled for TVM or TensorRT in these experiments.
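One plausible reason pruning alone does not raise throughput is that a dense kernel performs the same number of multiply-accumulates (MACs) whether or not weights are zero; only a sparsity-aware format and kernel skip them. The sketch below counts operations for a dense matrix-vector product and for one using compressed sparse row (CSR) storage. It counts work rather than measuring time, and the function names are hypothetical.

```python
def dense_matvec(W, x):
    """Dense kernel: multiplies every weight, zero or not."""
    macs = 0
    y = []
    for row in W:
        acc = 0.0
        for w, xi in zip(row, x):
            acc += w * xi
            macs += 1
        y.append(acc)
    return y, macs


def to_csr(W):
    """Compressed sparse row: keep only nonzero weights, their columns, and row offsets."""
    values, cols, row_ptr = [], [], [0]
    for row in W:
        for j, w in enumerate(row):
            if w != 0:
                values.append(w)
                cols.append(j)
        row_ptr.append(len(values))
    return values, cols, row_ptr


def csr_matvec(csr, x):
    """Sparse kernel: visits only the stored nonzero weights."""
    values, cols, row_ptr = csr
    macs = 0
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[cols[k]]
            macs += 1
        y.append(acc)
    return y, macs


W = [[0.0, 1.5, 0.0, 0.0], [2.0, 0.0, 0.0, -1.0], [0.0, 0.0, 0.0, 3.0]]
x = [1.0, 2.0, 3.0, 4.0]
dense_y, dense_macs = dense_matvec(W, x)
sparse_y, sparse_macs = csr_matvec(to_csr(W), x)
print(dense_macs, sparse_macs)  # 12 4
```

Both kernels return the same result, but only the sparse one benefits from the zeros; in practice, sparse kernels also carry indexing overhead, so gains typically appear only at high sparsity or with hardware-supported patterns such as 2:4.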

Figure 2: MobileNet v2 - Throughput vs Sparsity

Figures 5-7 show how accuracy degrades as the sparsity level increases. MobileNet v2 (Figure 5) and ResNet 50 (Figure 6) follow similar patterns. 25% sparsity has little effect on accuracy and even gives ResNet 50 a slight improvement. For the three levels of N:M pruning, the larger the N and M values, the smaller the accuracy loss. Considering more weights (a larger M) when deciding which weights to prune gives the weights that contribute most to a layer's output a better chance of being retained. At 75% sparsity, ResNet 50 suffers an accuracy loss of approximately 10%, while MobileNet v2, an already small, optimized model, loses nearly all of its predictive power.
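The effect of a larger M can be illustrated with simple magnitude-based N:M pruning. This is a deliberately simpler technique than OBC, which selects weights using second-order information and updates the remaining weights; it is used here only to show the combinatorial point. Any 1:2 pattern is also a valid 2:4 pattern, and any 2:4 pattern is a valid 4:8 pattern, so the best achievable retained magnitude can only stay the same or grow as M increases. The weights below are hypothetical.

```python
def nm_prune(weights, n, m):
    """Magnitude-based N:M pruning: keep the n largest-|w| weights in each group of m."""
    if len(weights) % m:
        raise ValueError("number of weights must be a multiple of m")
    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        keep = set(sorted(range(m), key=lambda i: abs(group[i]), reverse=True)[:n])
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


weights = [0.9, 0.8, 0.7, 0.05, 0.1, 0.02, 0.3, 0.2]
retained = {}
for n, m in [(1, 2), (2, 4), (4, 8)]:
    kept = nm_prune(weights, n, m)
    retained[(n, m)] = sum(abs(w) for w in kept)
    print(f"{n}:{m} keeps {retained[(n, m)]:.2f} of {sum(abs(w) for w in weights):.2f}")
```

All three patterns zero out exactly half of the weights, yet the larger groups retain more of the total weight magnitude because large weights clustered in one group no longer compete for a single slot.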

For the Vision Transformer (Figure 7), we see a much larger accuracy loss at low sparsity and an essentially complete loss of predictive ability at higher sparsity levels. OBC's unstructured and N:M pruning specifically target convolutional and linear layers, which are present in all three models. ViT differs from the other two models in that its primary building block is the Transformer encoder rather than convolution. Although the Transformer encoder contains a multi-layer perceptron with linear layers, the core of its MultiHeadAttention component is not targeted by the pruning. This partial pruning of the Transformer is most likely the reason that ViT does not exhibit the same, or any, resilience to pruning at high sparsity levels.

Figure 3: ResNet 50 - Throughput vs Sparsity

Figure 4: Vision Transformer - Throughput vs Sparsity

Figure 5: MobileNet v2 - Accuracy vs Sparsity

Figure 6: ResNet 50 - Accuracy vs Sparsity

Figure 7: Vision Transformer - Accuracy vs Sparsity

7. Discussion

AI optimization methods, such as pruning and quantization, can be used to increase throughput, decrease latency, or decrease model memory size. While optimizations can help a model reach its target performance, they often come at a cost in model accuracy. Using the experimental setup in Section 5, we highlight two common performance metrics generated by our PBH: throughput and accuracy. Together, these two metrics let one find the optimal balance between optimization level and accuracy degradation. Beyond relating throughput to accuracy, the Performance Tracker can be extended with additional performance metrics, enabling a multi-dimensional trade-off evaluation of an optimized model. In addition to monitoring multiple performance metrics during inference, the PBH provides granular control over how the Inference Pipeline is executed. These controls are given as inputs to the PBH Python run script, and the PBH can be scripted to perform multiple executions while varying the input parameters between executions. While benchmarking against a large evaluation dataset is helpful for determining a model's performance, real-time deployment scenarios may process data only in small or single-input batches. To ensure the PBH can properly benchmark these scenarios, we provide the ability to set the batch size, the number of data samples to evaluate, and the number of times to execute the Inference Pipeline.
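A run script exposing these controls, and a sweep over them, could be built with Python's standard `argparse` and `itertools` modules. The flag names and defaults below are hypothetical and do not describe the PBH's actual command-line interface.

```python
import argparse
import itertools


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Run the benchmark Inference Pipeline (sketch).")
    parser.add_argument("--batch-size", type=int, nargs="+", default=[1, 32],
                        help="one or more batch sizes to sweep")
    parser.add_argument("--num-samples", type=int, default=1000,
                        help="number of evaluation samples per run")
    parser.add_argument("--num-runs", type=int, default=5,
                        help="times to execute the Inference Pipeline per configuration")
    parser.add_argument("--engine", nargs="+", default=["pytorch"],
                        help="inference engines to compare")
    return parser.parse_args(argv)


def configurations(args):
    """Expand the command line into one configuration per (engine, batch size) pair."""
    for engine, batch_size in itertools.product(args.engine, args.batch_size):
        yield {"engine": engine, "batch_size": batch_size,
               "num_samples": args.num_samples, "num_runs": args.num_runs}


args = parse_args(["--batch-size", "1", "8", "--engine", "pytorch", "tensorrt"])
for config in configurations(args):
    print(config)
```

Including batch size 1 in a sweep covers the single-input, real-time scenario alongside larger batches in one invocation.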

A benefit of the PBH is the ability to monitor a model's accuracy degradation as optimizations are applied to it. Figures 2-4 show the measured throughput against the sparsity level for each model; for each sparsity level and type, we evaluated three different inference engine setups. These results allow one not only to monitor how a model's performance is affected by optimizations but also to identify which inference engine is best for a given target hardware or model architecture. While increasing the performance of AI models is an important consideration for deployment, maintaining a model's accuracy is critical. Figures 5-7 show each model's accuracy in our PBH experiments, illustrating how various optimizations affect model accuracy. We also consider the related tools mentioned in Section 2. Previous work is closely related to our PBH and even provided inspiration for it. However, our benchmark harness is designed as a highly generalized architecture capable of profiling diverse AI models, including Transformers and large language models. Our tool is also aware of target hardware constraints, such as computational capacity, which can quickly be added to the Performance Tracker, and it can adjust its configuration to profile models for their target hardware platforms. It also accounts for the computational graph compatibility of heterogeneous target hardware, taking those constraints into consideration when generating a performance profile of a given model. Researchers and developers can use our AI PBH to quickly evaluate and compare the performance metrics of a variety of AI models while planning their deployment strategy. By modularizing the inference session into prediction generation and prediction scoring components, we can integrate new model types and inference acceleration engines.

Our Performance Tracker can also easily be extended to track additional performance metrics related to either the model itself or the target inference environment. This helps ensure that the PBH remains valid today and can continue to be supported in the future.

8. Conclusions

We developed an architecture for a unified performance benchmark harness. We generated initial results by comparing the performance of various model architectures, optimization methods in the form of pruning, and inference acceleration engines. These results show how including various performance metrics can help researchers determine how a model can be optimized for a given target hardware platform while meeting runtime constraints. To expand the generality of our PBH, we identify the following areas for future enhancement. First, add support for model types beyond image classifiers, specifically LLMs. Second, integrate TensorFlow support as a base AI framework. Third, investigate how inference acceleration engines can further accelerate sparse and quantized models, and continue integrating additional engines. Finally, integrate additional performance metrics, such as data transfer latency and network latency. The key feature of the PBH is its versatility, which makes it a valuable tool for validating AI model performance across diverse domains. It is scalable, compatible with a variety of AI models, and supports research on AI model optimization.

Acknowledgment

This research is supported by DEVCOM Army Research Laboratory.

References

[1] A. I. I. Alliance, “AIIA DNN benchmark overview,” https://github.com/AIIABenchmark/AIIA-DNN-benchmark, 2021.

[2] P. Pochelu, “Deep learning inference frameworks benchmark,” 2022.

[3] J.-M. Lueckmann, J. Boelts, D. S. Greenberg, P. J. Gonçalves, and J. H. Macke, “Benchmarking simulation-based inference,” 2021.

[4] K. Nyarko, P. Taiwo, C. Duru, and E. Masa-Ibi, “AI/ML systems engineering workbench framework,” in 2023 57th Annual Conference on Information Sciences and Systems (CISS), 2023, pp. 1–5.

[5] X. Xie, W. He, Y. Zhu, and H. Xu, “Performance evaluation and analysis of deep learning frameworks,” in Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition, ser. AIPR ’22. New York, NY, USA: Association for Computing Machinery, 2023, pp. 38–44. [Online]. Available: https://doi.org/10.1145/3573942.3573948

[6] R. Wu, M. Li, H. Li, T. Chen, X. Tian, X. Xu, B. Zhou, J. Chen, and H. An, “Machine learning-enabled performance model for DNN applications and AI accelerator,” in 2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), 2022, pp. 25–34.

[7] N. Jha, P. Rawat, and A. Tiwari, “Benchmarking analysis of CNN architectures for artificial intelligence platforms,” in Proceedings of Emerging Trends and Technologies on Intelligent Systems: ETTIS 2021. Springer, 2022, pp. 61–76.

[8] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009, pp. 248–255.

[9] J. Ansel, E. Yang, H. He, N. Gimelshein, A. Jain, M. Voznesensky, B. Bao, P. Bell, D. Berard, E. Burovski, G. Chauhan, A. Chourdia, W. Constable, A. Desmaison, Z. DeVito, E. Ellison, W. Feng, J. Gong, M. Gschwind, B. Hirsh, S. Huang, K. Kalambarkar, L. Kirsch, M. Lazos, M. Lezcano, Y. Liang, J. Liang, Y. Lu, C. K. Luk, B. Maher, Y. Pan, C. Puhrsch, M. Reso, M. Saroufim, M. Y. Siraichi, H. Suk, S. Zhang, M. Suo, P. Tillet, X. Zhao, E. Wang, K. Zhou, R. Zou, X. Wang, A. Mathews, W. Wen, G. Chanan, P. Wu, and S. Chintala, “PyTorch 2: Faster machine learning through dynamic Python bytecode transformation and graph compilation,” in Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, ser. ASPLOS ’24. New York, NY, USA: Association for Computing Machinery, 2024, pp. 929–947. [Online]. Available: https://doi.org/10.1145/3620665.3640366

[10] E. Frantar, S. P. Singh, and D. Alistarh, “Optimal brain compression: A framework for accurate post-training quantization and pruning,” 2023.

For additional information, contact Venkat Dasari, venkateswara.r.dasari.civ@army.mil

Author Biographies

Jakob Adams is a principal software engineer with over 10 years of experience. His expertise includes machine learning pipeline design and development, computer vision, neural architecture search, and model optimization. He also has experience in the development of automated testing systems, network automation, and high performance computing. He holds a B.S. in Mathematics and Computer Science from The University of Virginia’s College at Wise and an M.S. in Data Analytics from The University of Maryland University College.

Dr. Venkat Dasari is the project lead for the Generalized Neural Network model optimization project at DEVCOM Army Research Laboratory (ARL), where he primarily conducts research on model- and platform-agnostic neural network inference acceleration algorithms and architectures for resource-constrained edge computing platforms. He received his Ph.D. in Immunology from Osmania University, India, and a Master’s degree in Computer Sciences from Temple University, Philadelphia.

Manuel M. Vindiola is a Cognitive Scientist at DEVCOM Army Research Laboratory in the High Performance Computing Center where he provides computational science support for projects in reinforcement learning, machine learning, and neuromorphic computing.

ISSN: 1054-0229, ISSN-L: 1054-0229
Dewey Classification: L 681 12
