Test and Evaluation of AI-Enabled Systems

This edition of the ITEA Journal is my first as your Editor in Chief, replacing Research Professor Laura Freeman. Over several years, Laura did an enormous job digitizing the journal and making it open-access. Her journal reforms will benefit the T&E Community for decades. She was thorough in providing a handover for me. I must also acknowledge Danielle Kauffman, Kathi Swagerty, and our production team for their efforts in training up a new Editor in Chief and pulling this transitional edition together.

I’ve moved the Editor in Chief’s opening to be more ‘editorial’ and not just covering the former ‘Issue at a Glance.’ The editorial will still introduce our articles and contributions in each edition; however, where possible, I’ll add some informed commentary about where the research fronts in these articles need to go or what the other implications are for the T&E community. My other change has been to create new categories for submissions in our ‘2024 Call for Papers’ at this site:

The ITEA Journal of Test and Evaluation

The new submission categories are ‘Extended Presentations’ and ‘Extended Posters’. These submissions aim to lower the barrier for members of the test community to be published. We always have many outstanding presentations at our symposia that blend advances in theory, research, and practice. Still, most of these presentations are never formalized into articles, especially for contributors in industry or Government. The artistry of most presentation slides and their accompanying transcripts is outstanding and, if captured, can have the clarity, originality, and impact of more formal articles. If you have an excellent presentation on T&E from a symposium, please make the effort to put the slides into an MS Word document, transcribe your words about each slide below it, add your biography and any references, and submit it to us. Similarly, if you have a great poster on T&E from a symposium, break out the poster into segments (e.g., quarters or panels) and put these into an MS Word document with your descriptive transcript below each segment picture.

I also remind our members that they are most welcome to be T&E reporters at our symposia. For several decades, Mr Dave Duma delivered a summarizing critique at the annual symposium, which should have been published for the benefit of all members. Please consider whether you would like to be a reporter for any of the symposia you attend. If we publish your coverage, we hope to offer a bonus registration at a symposium of your choice in the following year; I am seeking approval for this from the Board. Early and late-career professionals often take away different perspectives from symposia, so we may publish more than one report on a symposium.

The theme of our March Edition is the ‘T&E of AI-enabled systems’, and the Publication Committee and editorial staff have delivered a jam-packed edition.

We start with our book review. Dr Mark London reviews the book ‘Systems Engineering for the Digital Age: Practitioner Perspectives’ [1]. Digitization and digitalization [2] underpin all aspects of any AI-enabled system [3]. In testing and evaluation, digitization can be functionally mapped and tested; however, digitalization requires usability testing and the wherewithal for humans to exploit the capabilities, including malicious use as a cyber threat [4]. ITEA regularly runs events on the impact of T&E on digital engineering practices. I’d encourage any of our community still transitioning to the ‘Digital Age’ to read Dr London’s book review and consider buying this tome.

Our first great research article covers the need for various levels of fidelity when assuring systems using AI/machine learning. Purdue University academics researched the article with assistance from the prolific Dr Kristen Alexander in the U.S. DoD, DOT&E, and our former Editor in Chief (Virginia Tech). As with many research articles in this new and challenging area (e.g., [3, 5]), they adapt quintessential skills and processes for software assurance to account for the evolving intelligence in systems using machine learning. They identify limitations in the usual acquisition processes to achieve the levels of fidelity needed to build and sustain factorial representativeness in machine learning curricula and the trustworthiness of operators. They conclude with five principles to transform T&E management to better deal with AI-enabled systems.

The second of our research articles nicely dovetails with the technical assurance of AI-enabled systems by defining trustworthiness and what measures are needed. This article by Dr Yosef Razin (IDA) and Dr Kristen Alexander (U.S. DoD, DOT&E) ‘highlights trust’s relational and context dependence and how this gives rise to varying testing requirements for different stakeholders, including users, regulators, testers, and the general public.’ Dr Razin and Dr Alexander follow many researchers in the ethics of AI (e.g., [6]) to observe that ‘trustworthiness and trust cannot be tested separately from their users and other stakeholders; nor can they be assessed just once, but require continuous assessment.’ Their codification of trustworthiness and recommendations to educate the test community are essential to achieving human-autonomy teaming for modern AI-enabled robotic and autonomous systems [7].

Like intelligent humans, AI-enabled systems need to continue learning, which is increasingly enabled and assured by digital twins [3, 8-10]. The third article reports on an ITEA Workshop for defining digital twins. Erwin Sabile worked diligently and persistently with many contributors from industry and government to better define what a digital twin is and how digital twins enable T&E to assure modern systems through life. The illustrations from this workshop convey as much as the formal words. In future editions, we hope to report how such twins are established in test-intensive industries like shipyards [11] and for specialist test domains like cybersecurity [12].

Our fourth article is a challenging research article from Stuart Harshbarger, Dr Rosa Heckle, and Michael Collins. They take the three concepts above and refer to such capabilities as complex adaptive AI-enabled systems (CA2IS); put another way, they see intelligent systems that learn as ‘organismic’ and thus as having the true complexity needed for emergent behavior [13]. They go further and are concerned with evaluating autonomous multi-agent systems (AMAS) [14]. They state that the ‘underlying statistical basis of AI/ML-enabled systems necessitate the adoption of novel approaches to system T&E’ and propose using ‘Consensus-enabled Distributed Ledger Technology (C-DLT)’ for in-situ testing of such systems. To conceptualize what Stuart, Rosa, and Michael suggest, I relied on my limited understanding of blockchain technology and its assurance in the cybersecurity of business and logistics [15, 16]. However, I appreciate that their proposal extends such accountable recording and attribution into AI-enabled systems and the realm of consensus in human-autonomy teaming, where human speed and machine speed must logically blend. Their article proposes new research to investigate this potential approach for assurance in AI-enabled systems.

The fifth article in this Edition comes from Dr Suzanne Beers (MITRE) and concerns improving capability life cycles to achieve better decision-making. Here, the Test Community’s development of a single, integrated framework to guide data collection from live, virtual, constructive (LVC) test and modeling and simulation (M&S) events (the Integrated Decision Support Key) has been extended beyond ‘the strict confines of a system acquisition to the full capability lifecycle’ using a ‘Decision Support Evaluation Framework.’ This initiative takes the simple concept of being ‘test-led’ and pushes that approach into through-life decision-making within the acronym- and process-heavy U.S. DoD. Dr Beers is commended for carefully explaining a critical initiative to help simplify innovation in the U.S. DoD. There are real generational challenges in innovating towards AI-enabled systems [17], and having tailorable test-led processes such as those Dr Beers articulates should help harness and team ‘Boomers’ through to ‘Gen Alpha’ [18].

Our sixth and final article this Edition is on developing the future T&E workforce. This inspiring article captures how to inculcate and motivate new generations into the nexus of innovation and testing: test-led development. Ginny To kindly curated an overview of the Summer Student Team research experiences at the Army Research Laboratory (ARL)-Maryland Robotics Center. Students describe the challenge of safely deploying uncrewed ground vehicles (UGVs) from uncrewed aerial vehicles (UAVs) to team with soldiers and maneuver into superior fighting positions. The challenges this student team overcame and the lessons learned are given in an exciting article, a reassurance that Gen Z has enormous value to bring to T&E. Any organization recruiting the next generation of testers, be it Gen Z or, from 2028, Gen Alpha, should read this article and reflect on the leadership approaches articulated in [18].

Our next edition will be themed on the ‘Values of T&E.’ Organizations increasingly need to articulate their values to attract and retain staff, as illustrated in Figure 1. As many of those with long careers in T&E know, test agencies hold strong values, such as independence (objectivity), that they instill in their testers, or vice versa. If you would like to learn more about the potential values of the T&E Community, please take the following survey:

https://unsw.au1.qualtrics.com/jfe/form/SV_80oH9W83USB1Vqe

Figure 1: Example Principles in T&E with inherent values

If you would like to participate in defining ITEA’s T&E values and later consider our principles and a code of practice for T&E, please get in touch with Dr Malcolm Tutty (m.tutty@unsw.edu.au), who is undertaking this valuable work for the ITEA Board and Community.

References

[1] Systems Engineering for the Digital Age: Practitioner Perspectives. John Wiley & Sons, 2024.

[2] R. A. Teubner and J. Stockhinger, “Literature review: Understanding information systems strategy in the digital age,” The Journal of Strategic Information Systems, vol. 29, no. 4, p. 101642, 2020.

[3] A. Tolk, P. Barry, and S. C. Doskey, “Using modeling and simulation and artificial intelligence to improve complex adaptive systems engineering,” International Journal of Modeling, Simulation, and Scientific Computing, vol. 13, no. 2, pp. 2241004.1-19, 2022.

[4] K. F. Joiner, A. Ghildyal, N. Devine, A. Laing, A. Coull, and E. Sitnikova, “Four testing types core to informed ICT governance for cyber-resilient systems,” International Journal of Advances in Security, vol. 11, 2018.

[5] J. Weiss and D. Patt, “Software Defines Tactics: Structuring Military Software Acquisitions for Adaptability and Advantage in a Competitive Era,” Hudson Institute, Online, 2022. Available: https://www.hudson.org/national-security-defense/software-defines-tactics-structuring-military-software-acquisitions.

[6] A. Jobin, M. Ienca, and E. Vayena, “The global landscape of AI ethics guidelines,” Nature Machine Intelligence, vol. 1, pp. 389–399, 2019.

[7] K. J. Yaxley, K. F. Joiner, H. Abbass, and J. Bogais, “Life-learning of smart autonomous systems for meaningful human-autonomy teaming,” in A Framework for Human System Engineering: Applications and Case Studies, H. Handley and A. Tolk, Eds.: IEEE Wiley, 2020, pp. 43–61.

[8] E. VanDerHorn and S. Mahadevan, “Digital Twin: Generalization, characterization and implementation,” Decision Support Systems, vol. 145, p. 113524, 2021.

[9] A. Fuller, Z. Fan, C. Day, and C. Barlow, “Digital twin: Enabling technologies, challenges and open research,” IEEE Access, vol. 8, pp. 108952-108971, 2020.

[10] A. Rasheed, O. San, and T. Kvamsdal, “Digital twin: Values, challenges and enablers from a modeling perspective,” IEEE Access, vol. 8, pp. 21980-22012, 2020.

[11] T. Y. Pang, J. D. Pelaez Restrepo, C.-T. Cheng, A. Yasin, H. Lim, and M. Miletic, “Developing a digital twin and digital thread framework for an ‘industry 4.0’ shipyard,” Applied Sciences, vol. 11, no. 3, pp. 1-23, 2021.

[12] F. Flammini, “Digital twins as run-time predictive models for the resilience of cyber-physical systems: a conceptual framework,” Phil. Trans. R. Soc. A, vol. 379, p. 20200369, 2021.

[13] C. B. Keating and P. F. Katina, “Complex system governance: Concept, utility, and challenges,” Systems Research and Behavioral Science, vol. 36, no. 5, pp. 687-705, 2019.

[14] (2022). No. 11, Report on Applied Research Directions and Future Opportunities for Swarm Systems in Defence. Available: https://researchcentre.army.gov.au/library/occasional-papers/swarming-and-counterswarming

[15] X. Liang, S. Shetty, D. K. Tosh, J. Zhao, D. Li, and J. Liu, “A reliable data provenance and privacy preservation architecture for business-driven cyber-physical systems using blockchain,” International Journal of Information Security and Privacy (IJISP), vol. 12, no. 4, pp. 68-81, 2018.

[16] L. Alevizos, M. H. Eiza, V. T. Ta, Q. Shi, and J. Read, “Blockchain-Enabled Intrusion Detection and Prevention System of APTs Within Zero Trust Architecture,” IEEE Access, vol. 10, pp. 89270–89288, 2022.

[17] C. S. Kayser and R. Cadigan, “The future of AI: Generational tendencies related to decision processing,” Journal of AI, Robotics & Workplace Automation, vol. 1, no. 2, pp. 157–172, 2021.

[18] V. Ramirez-Herrero, M. Ortiz-de-Urbina-Criado, and J.-A. Medina-Merodio, “Intergenerational Leadership: A Leadership Style Proposal for Managing Diversity and New Technologies,” Systems (Basel), vol. 12, no. 50, 2024.