MARCH 2025 | Volume 46, Issue 1

Editorial – ITEA Journal – March 2025

Dr Keith Joiner

CSC, CPPD, CPEng,
Group Captain (retired)

TEST BEDS AND MACHINE LEARNING

Introduction

This edition is serendipitously all about test beds and artificial intelligence and machine learning (AI ML). The first ITEA event of 2025, in Washington on 19 March, concerns testing of AI ML; it is followed quickly by three events, all with AI ML testing coverage:

  • Test Instrumentation Workshop – Las Vegas – 15-17 April;
  • Defense and Aerospace Test and Analysis Workshop (DATAWorks) – Potomac Yard, Alexandria – 22-24 April;
  • Accelerating the Pace of T&E – Portsmouth, United Kingdom – 20-21 May.

Design engineers and testers know that fidelity and optimisation in any capability domain are inherently multi-factor and multi-output. The days of assuming there were four dominant factors and testing only for their influence to create a crude look-up table are long over. Designing and testing for twenty to thirty influencing factors gives enormous advantages and is enabled through sparse test designs like combinatorial high-throughput testing. Optimising and interpreting the outcomes of such test designs with regression-based predictions has been the norm for several decades but is now being replaced by deep neural networks (DNNs). DNNs use many computational layers to create algorithms that predict performance, adjusting quickly for all the perceivable factors and the effects of noise. While AI ML is often superior in prediction, much of the assurance of the traditional systematic analysis is lost or is much harder to discern.

Certainly, you can examine the factor representativeness of the ML training [1-3]; however, discerning the priority of factors the neural network uses to decide test cases requires explainable AI (XAI) approaches, in effect another AI interpreting the first. A great example of XAI in safety-critical domains is the article just published by Nanyonga et al. [4]. They first used a Variational Autoencoder (VAE) model to augment under-represented class instances and address the class imbalance of the training distribution, similar in concept to the synthetic data augmentation covered in our last edition [5]. Second, they used two XAI techniques to explain and thus help assure the algorithm; namely, SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME). For future editions, our journal is interested in any instances of testers comparing XAI analysis to prior factorial knowledge to assure AI-enabled systems.
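
To make the XAI idea concrete, here is a minimal sketch, not the pipeline of Nanyonga et al. [4], of using SHAP to rank the factors a trained model actually relies on, so that ranking can be compared against prior factorial knowledge. The synthetic data, six-factor setup and random-forest surrogate model are illustrative assumptions only; the sketch presumes the shap and scikit-learn packages are available.

    import numpy as np
    import shap                                          # assumes the shap package is installed
    from sklearn.ensemble import RandomForestRegressor

    # Synthetic stand-in data: 500 test cases, six influencing factors, with the
    # measured performance driven mainly by factors 0 and 2 plus noise.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 6))
    y = X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=500)

    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

    explainer = shap.TreeExplainer(model)                # model-specific explainer for tree ensembles
    shap_values = explainer.shap_values(X)               # per-case attribution of each factor

    # Mean absolute SHAP value per factor gives a global importance ranking that a
    # tester can compare against prior factorial knowledge of the capability.
    importance = np.abs(shap_values).mean(axis=0)
    for idx in np.argsort(importance)[::-1]:
        print(f"factor_{idx}: {importance[idx]:.3f}")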

Technical Articles

To warm up, our edition first looks at test bed creation at an organisational and thus system-of-systems (SoS) level, rather than capability-by-capability and thus incrementally outsourced, contractor after contractor. The article by Emily Pozniak et al. from Johns Hopkins University concerns the development of such a test bed by the Department of Homeland Security (DHS), aptly named the Independent Automated Verification and Validation (INOVVATION) testbed. As we know from ITEA events, DHS has been developing increasing T&E maturity across the last decade, and investing in such digital test infrastructure should fundamentally help them address the emergent behaviour of the SoS phenomenon caused by integrating multi-proprietary, multi-generational and multi-security capabilities [6-8]. Accepting this integration test responsibility at the final level of integration (DHS) addresses the heterarchical nature of such systems-of-systems, removing surprises caused by over-dependence on individual contractors guessing at poorly documented upper layers of capability integration.

For the second article we chose an application of AI ML to test capability by Lucy Green, Rebecca Jones and Dr Alexander Milroy at QinetiQ in the UK. Their article concerns developing ML to examine defects in solid-propellant rocket motors (SRMs), commonly used in spaceflight and by the military for missile defence systems. Following non-destructive testing of SRMs, industrial computed tomography (ICT) imagery is used to support manual defect analysis. Their paper proposes a two-step ML solution for the automatic detection and characterisation of selected defects. Those interested in seeing a presentation on this capability should attend the UK ITEA event in May.

The third technical article begins to examine another assurance concern of AI ML: whether the wherewithal for modified training can be deployed and whether the trained algorithms can be used efficiently on 'edge computing'. The article is by our current ITEA Chairman, Dr Michael Barton, together with Thomas Kendall, Jamie Stack and their associates at the Army Research Laboratory (ARL), and concerns lessons learned from a first-of-a-kind mobile, deployable, turnkey high-performance computing (HPC) system known as SuperComputing OUTpost (SCOUT).

Continuing the theme of assuring efficient use of ML, our fourth article is by Ethan Marquez, Adam Niemczura, Cooper Taylor, Max Faykus and others from Clemson University in South Carolina. Their research explores the effects that compressed training images have on the performance of deep learning segmentation architectures. Their testing finds lossy compression is a practical solution for providing real-time transfer speeds for autonomous vehicle perception systems. Reading their article and many others since Christmas, it is obvious modern testers must learn the basics of ML performance metrics to readily interpret outcomes like theirs (a short sketch of the mIoU calculation follows their quoted finding below):

“Of the two deep learning architectures tested, EfficientViT outperforms U-Net for all lossy compressors at all levels of compression. EfficientViT achieves a peak mIoU of 95.5% at a JPEG quality level of 70. While U-Net peaks with an mIoU of 90.683% at a JPEG quality of 40.”
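
For readers new to segmentation metrics: the intersection-over-union (IoU) for a class is the overlap between predicted and ground-truth pixels of that class divided by their union, and mIoU is the mean IoU across classes. The sketch below shows the calculation on toy label masks; the masks and class count are illustrative, and averaging conventions (per image versus whole dataset) vary between papers.

    import numpy as np

    def mean_iou(pred, target, num_classes):
        """Mean Intersection-over-Union over all classes for integer label masks."""
        ious = []
        for c in range(num_classes):
            pred_c = (pred == c)
            target_c = (target == c)
            intersection = np.logical_and(pred_c, target_c).sum()
            union = np.logical_or(pred_c, target_c).sum()
            if union > 0:                      # skip classes absent from both masks
                ious.append(intersection / union)
        return float(np.mean(ious))

    # Toy 4x4 masks with three classes; real use would pass whole segmentation maps.
    pred = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 1, 1], [2, 2, 2, 2]])
    target = np.array([[0, 0, 1, 1], [0, 2, 1, 1], [2, 2, 2, 1], [2, 2, 2, 2]])
    print(f"mIoU = {mean_iou(pred, target, num_classes=3):.3f}")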

The fifth and final technical article is by Dr Jakob Adams, Dr Venkat Dasari and Dr Manuel Vindiola. It threads together themes from the other articles and develops a test bed for AI ML, focused primarily on ensuring that optimisations such as quantization and pruning, applied so that models can operate with the reduced computational capacity of edge computing, do not unacceptably reduce performance rigour. They design and develop a comprehensive, generalized benchmark harness and test its functionality against optimized AI models, measuring several performance metrics; a generic illustrative sketch of the kind of measurement such a harness automates follows.
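
This is not the authors' harness; it is a minimal, framework-agnostic sketch that times an arbitrary model callable and reports accuracy and latency percentiles, with a placeholder model, data and metric choices assumed for illustration only.

    import time
    import numpy as np

    def benchmark(model_fn, inputs, labels, warmup=10):
        """Time a model callable over a dataset and report simple accuracy/latency metrics."""
        for x in inputs[:warmup]:              # warm-up runs excluded from timing
            model_fn(x)
        latencies, correct = [], 0
        for x, y in zip(inputs, labels):
            start = time.perf_counter()
            pred = model_fn(x)
            latencies.append(time.perf_counter() - start)
            correct += int(pred == y)
        lat_ms = np.array(latencies) * 1000.0
        return {
            "accuracy": correct / len(labels),
            "latency_ms_p50": float(np.percentile(lat_ms, 50)),
            "latency_ms_p95": float(np.percentile(lat_ms, 95)),
        }

    # Placeholder model: a thresholded sum standing in for a full (or optimised) network.
    inputs = [np.random.rand(8) for _ in range(200)]
    labels = [int(x.sum() > 4.0) for x in inputs]
    print(benchmark(lambda x: int(x.sum() > 4.0), inputs, labels))
    # The same harness would then be run against the optimised (e.g. quantized or
    # pruned) variant and the metric deltas compared against acceptance thresholds.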

Conversations with Experts

For those testers reading about assuring AI-enabled systems for the first time, you may still be reeling from the complexities of cybersecurity test processes. Take heart from our interview this edition with Dr Bill D'Amico. Our esteemed tester began in 1968 designing largely mechanical recording instrumentation to measure projectile spin, yet he is now defining frameworks and test benches for AI ML as part of consulting on autonomous systems behaviour and performance. His significant adaptations to numerous technologies across more than fifty years will encourage you to persevere, and he identifies several universal themes.

Conclusion

Faced with integrating multi-proprietary, multi-generational and multi-security capabilities that exhibit the system-of-systems phenomenon [7], some organisations have still not invested in the all-important integrating test infrastructure to shift-left the risks from each of their individually acquired systems and upgrades [9]. Positive examples and outcomes are emerging for those who embraced digital engineering and digital acquisition early with underpinning test infrastructure [10, 11]. Put simply, Network Integration Centres (NICs) are critical to informed shift-left testing to integrate and sustain any portfolio of interdependent capabilities. The old adage 'not testing is not knowing' could become 'if you don't have a test bed at the top layer of integration, you will be surprised by something new every day and will have to discard a significant proportion of all that you buy'. Given that the system-of-systems phenomenon is now occurring at the software level due to multi-proprietary, multi-generational software code reuse [7] and APIs [12], the acquisition of software-intensive systems will be a wicked problem without adaptable and pervasive acquisition and supporting test [13]. The articles in this edition highlight a progressive evolution for a network integration test centre, from coordinating agile software testing on a common test bed (Article 1) through to ML assurance (Article 5), all of course increasingly accredited and focused on cyber-secure operations. AI-enabled systems will likely further erode the ability of middle-level companies and countries to deliver stable and predictable capabilities with any sovereignty wherever they have not invested in basic network integration centres as deep test and certification hubs. Put another way, the decade of 2010-2020 was challenged by cybersecurity; the decade of 2020-2030 is being challenged by AI-enabled systems; the answer to both is evolving and knowledgeable test infrastructure.

I commend the authors of the articles in this edition, and the companies, laboratories and agencies who worked with the listed universities to invest in the better test infrastructure needed to evolve. Organismic theory of complex systems and their governance predicts these agencies and universities will inherit a better future [8].

References

[1] E. Lanus, L. J. Freeman, D. R. Kuhn, and R. N. Kacker, “Combinatorial testing metrics for machine learning,” in IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), Online, April 2021: IEEE, pp. 81-84.

[2] T. Cody, E. Lanus, D. D. Doyle, and L. Freeman, “Systematic training and testing for machine learning using combinatorial interaction testing,” in IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), Online, 4-13 April 2022: IEEE, pp. 102-109.

[3] J. Chandrasekaran, T. Cody, N. McCarthy, E. Lanus, L. Freeman, and K. Alexander, “Testing Machine Learning: Best Practices for the Life Cycle,” Naval Engineers Journal, vol. 136, no. 1-2, pp. 249-263, 2024.

[4] A. Nanyonga, H. Wasswa, K. F. Joiner, U. Turhan, and G. Wild, “Explainable Supervised Learning Models for Aviation Predictions in Australia,” Aerospace, vol. 12, no. 3, p. 223, 2025. [Online]. Available: http://dx.doi.org/10.3390/aerospace12030223.

[5] M. Felter, J. Sustaita, and J. Starling, “Synthetic Data for Target Acquisition,” ITEA Journal, vol. 45, no. 4, 2024. [Online]. Available: https://itea.org/journals/volume-45-4/synthetic-data-for-target-acquisition/.

[6] J. Dahmann and D. DeLaurentis, “Unique Challenges in System of Systems Analysis, Architecting, and Engineering,” in Systems Engineering for the Digital Age: Practitioner Perspectives, D. Verma Ed.: John Wiley & Sons, Inc., 2023, ch. 28.

[7] F. H. Ferreira, E. Y. Nakagawa, and R. P. dos Santos, “Reliability in Software-intensive Systems: Challenges, Solutions, and Future Perspectives,” in 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), September 2021: IEEE, pp. 54-61.

[8] P. F. Katina, “System Acquisition Pathology: A comprehensive characterization of system failure modes and effects,” Int. J. of Critical Infrastructures, in press, 2020.

[9] K. Joiner and M. Tutty, “A tale of two allied defence departments: new assurance initiatives for managing increasing system complexity, interconnectedness and vulnerability,” Australian Journal of Multi-Disciplinary Engineering, vol. 14, no. 1, pp. 4-25, 2018.

[10] E. B. Rogers and S. W. Mitchell, “MBSE Delivers Significant Return on Investment in Evolutionary Development of Complex SoS,” Systems Engineering, vol. 24, no. 6, pp. 385–408, 2021. [Online]. Available: https://doi.org/10.1002/sys.21592.

[11] D. DeLaurentis, A. Raz, and C. Guariniello, “MBSE for System-of-Systems,” in Handbook of Model-Based Systems Engineering, A. M. Madni Ed. Cham: Springer International Publishing, 2023, pp. 987–1015.

[12] A. Gomez and A. Vesey, “On the Design, Development, and Testing of Modern APIs,” Software Engineering Institute, Online, 2024. [Online]. Available: https://insights.sei.cmu.edu/library/on-the-design-development-and-testing-of-modern-apis/.

[13] J. Weiss and D. Patt, “Software Defines Tactics: Structuring Military Software Acquisitions for Adaptability and Advantage in a Competitive Era,” Hudson Institute, Online, 2022. [Online]. Available: https://www.hudson.org/national-security-defense/software-defines-tactics-structuring-military-software-acquisitions.
