
DECEMBER 2024 | Volume 45, Issue 4

Best 2024 Research Articles in Test and Evaluation

Dr Keith Joiner

Chief Editor, International T&E Association’s Journal of T&E

Introduction

Each year ITEA grants its Publication Award to an outstanding publication advancing test and evaluation research or practices. This year the well-read members of ITEA’s Publications Committee considered the many articles from the 12-month qualification period and sought nominations from leading contenders. According to Google Scholar, about 3,045 articles containing the full term ‘test and evaluation’ are published each year (i.e., 6,090 since 2023, as at 2 December 2024). The leading contenders were invited to nominate, and open nominations were also accepted. The nominated publications were evaluated equally against these four criteria:

  • Research Rigour
  • Importance of Topic for T&E
  • Logic and Language
  • Educational Value for T&E Community

This article overviews the winning publication by Chandrasekaran et al. [1], two equal runner-up publications by Lardieri et al. [2] and Graebener [3], and the third-placed publication by Gomez and Vesey [4]. These overviews, including the author biographies, are drawn directly from the nominations without disclosing the nominators. Sincere thanks to the nominators.

These articles are clear exemplars among the more than three thousand produced every year concerning T&E. ITEA encourages all T&E professionals and organisations to read these influential articles and to promote the growth of knowledge and practices throughout all organisations that research or practice T&E. Importantly, please promote your best research and practices through publication and nominations in 2025.

Winner – Testing Machine Learning: Best Practices for the Life Cycle

Dr Jaganmohan Chandrasekaran, Dr Tyler Cody, Dr Nicola McCarthy, and Dr Erin Lanus from the Virginia Tech National Security Institute co-authored the article, ‘Testing Machine Learning: Best Practices for the Life Cycle,’ published in the Naval Engineers Journal [1]. The article provides a timely, extensive, and rigorous overview of test and evaluation (T&E) best practices for machine learning for the defense community. Perhaps its most distinguishing characteristic is its framing of best practices using the machine learning lifecycle, which includes operational T&E, operations, and sustainment. This point of view was not present in the literature before, making the team’s contribution especially impactful for test engineering. The team’s work represents the culmination of a concerted effort to address gaps in T&E frameworks and methods for machine learning from the point of view of the defense acquisition process. Together, the team has developed a track record of contributions to the T&E literature, especially in the development of combinatorial interaction testing and coverage metrics for machine learning. The team’s recent article provides a distillation of those insights in light of a rigorous review of existing best practices.
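The combinatorial coverage ideas referenced above can be illustrated with a brief sketch. The following minimal Python example, a hypothetical illustration rather than the authors’ implementation, measures two-way (pairwise) value-combination coverage over the categorical metadata of an ML test dataset, one of the coverage notions that combinatorial interaction testing brings to ML T&E; the factors, levels, and records are invented for illustration.

    from itertools import combinations, product

    # Hypothetical categorical metadata describing an ML test dataset.
    # Each record is one test sample; all factor values are illustrative only.
    records = [
        {"sensor": "EO", "weather": "clear", "range": "short"},
        {"sensor": "EO", "weather": "rain",  "range": "long"},
        {"sensor": "IR", "weather": "clear", "range": "long"},
        {"sensor": "IR", "weather": "fog",   "range": "short"},
    ]

    # Possible values for each factor (the test space we want to cover).
    factor_levels = {
        "sensor":  ["EO", "IR"],
        "weather": ["clear", "rain", "fog"],
        "range":   ["short", "long"],
    }

    def pairwise_coverage(records, factor_levels):
        """Fraction of all 2-way factor-value combinations present in the data."""
        covered, total = 0, 0
        for f1, f2 in combinations(sorted(factor_levels), 2):
            all_pairs = set(product(factor_levels[f1], factor_levels[f2]))
            seen_pairs = {(r[f1], r[f2]) for r in records}
            covered += len(all_pairs & seen_pairs)
            total += len(all_pairs)
        return covered / total

    print(f"2-way coverage: {pairwise_coverage(records, factor_levels):.0%}")

Coverage gaps revealed this way (for example, no infrared samples under rain in the toy data above) indicate where additional test data or test events would be needed.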

Jaganmohan Chandrasekaran, Ph.D., is a Research Assistant Professor at the Sanghani Center for AI and Data Analytics, Virginia Tech. His research is at the intersection of Software Engineering and Artificial Intelligence, focusing on the reliability and trustworthiness of AI-enabled software systems. His current research involves developing approaches to test and evaluate ML-enabled systems across their lifecycle. He received his Ph.D. in Computer Science from the University of Texas at Arlington.

Tyler Cody, Ph.D., is a Research Assistant Professor at the Virginia Tech National Security Institute and an Affiliate Faculty member in Industrial and Systems Engineering. His research interest is developing principles and best practices for the systems and test engineering of machine learning and artificial intelligence. His research has been applied broadly to engineered systems, including hydraulic actuators, industrial compressors, rotorcraft, and communication networks. He received his Ph.D. in systems engineering from the University of Virginia for his work on a systems theory of transfer learning.

Nicola McCarthy, Ph.D., is a Research Assistant Professor at the Virginia Tech National Security Institute and Affiliate Faculty in the Statistics Department. Dr. McCarthy’s current research interests include modeling complex adaptive systems, data and advanced analytics applications (remote patient monitoring), machine learning, explainable AI, and AI transformation. She has worked in the private sector at the intersection of AI, enterprise transformation, and healthcare, and received her Ph.D. from the Stevens Institute of Technology, where she has served as a Distinguished Research Fellow.

Erin Lanus, Ph.D., is a Research Assistant Professor at the Virginia Tech National Security Institute and Affiliate Faculty in the Computer Science Department. Her research leverages combinatorial interaction testing and data coverage as well as metric and algorithm design for the test and evaluation of AI/ML with a focus on data assurance. She received her Ph.D. in Computer Science from Arizona State University.

Equal 2nd – A Statistical Review of the Cyber Test Process

Patrick Lardieri, David Harrison, Sharif Hassan, Govindra Ganesh, and Michael Hankins developed their research article, ‘A Statistical Review of the Cyber Test Process’ [2]. The article presents multiple options for cyber test teams to measure, track, and communicate success. Using the statistical principles of test coverage and test power, the team developed metrics for the performance of cyber tabletop teams, penetration testing, and more. Examples from the work of the Lockheed Martin Red Team on multiple defense contracts provided data showing how the metrics can be applied, while also surfacing ideas for future research. The proposed metrics can help the T&E community plan cyber testing scope and duration based on statistical measures, enabling more effective use of resources.

There remains a real struggle to estimate the amount of cybersecurity testing required a priori and to decide which metrics to use. Unfortunately, much of industry does not share these approaches, and both government and academia struggle to stay at the forefront of such processes. Lockheed Martin’s article “A Statistical Review of the Cyber Test Process” is a rare exception intended to bring better rigor and standards to this important test science. The article opens its review of multiple cyber test processes (tabletop, penetration testing, and more) with the fundamental aspects of Scientific Test and Analysis Techniques (STAT), including test power and coverage. The architecture of the article allows for a creative review of each aspect in an orderly fashion.

After the theoretical review, the team gathered data from actual defense department cyber test programs and put the proposed metrics to the test. A critical evaluation of how long a penetration test effort should last (addressing the test power of the effort) showed that a linear approximation was a better model than the originally proposed exponential model. Two other examples came from the data: an analysis of test team size versus efficiency, and a rule of thumb for test effort per machine in the cyber system. These provide original research for problems that all cyber test teams face.
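The published data are not reproduced here, but the kind of model comparison described above can be sketched as follows: fit a straight line and a saturating exponential to cumulative findings versus test effort and compare residuals. The data values, model forms, and variable names below are illustrative assumptions, not the Lockheed Martin results.

    import numpy as np

    # Hypothetical data: cumulative findings observed after each week of penetration testing.
    weeks = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
    findings = np.array([3, 5, 8, 10, 13, 15, 18, 20], dtype=float)

    # Linear model: findings ~ a*week + b (ordinary least squares).
    a, b = np.polyfit(weeks, findings, 1)
    linear_pred = a * weeks + b
    sse_linear = float(np.sum((findings - linear_pred) ** 2))

    # Saturating-exponential alternative: findings ~ c*(1 - exp(-k*week)),
    # fitted by a small grid search over k to avoid extra dependencies.
    best = None
    for k in np.linspace(0.05, 1.5, 300):
        basis = 1.0 - np.exp(-k * weeks)
        c = float(np.dot(basis, findings) / np.dot(basis, basis))  # least-squares scale
        sse = float(np.sum((findings - c * basis) ** 2))
        if best is None or sse < best[0]:
            best = (sse, k, c)

    print(f"Linear SSE:      {sse_linear:.2f}")
    print(f"Exponential SSE: {best[0]:.2f} (k={best[1]:.2f}, c={best[2]:.1f})")
    # The model with the smaller residual sum of squares fits these data better.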

The critical success of this article is its novel approach to evaluating cyber testing at a time when cyber testing has matured to the point where such a review is applicable (five or ten years ago the cyber test process was not as accepted as it is today). The addition of actual data grounds the initially theoretical paper in reality.

Patrick Lardieri is a Senior Fellow at Lockheed Martin Rotary and Mission Systems and was the co-creator of the DoD Cyber Table Top process. He was the technical lead for the LM National Cyber Range and has been active in the cyber T&E community for over 10 years. He has a Master of Science in Electrical Engineering from the University of Pennsylvania.

David Harrison is an Associate Fellow at Lockheed Martin Space specializing in the application of Scientific Test and Analysis Techniques (STAT) and uncertainty quantification to engineering and business problems.  He has a Bachelor of Science in Mechanical Engineering from Kansas State University, a Master of Science in Materials Science from the Colorado School of Mines, and a Master of Engineering in Engineering Management from the University of Colorado at Boulder.  He is also a Certified Test and Evaluation Professional and a Lean/Six-Sigma Master Black Belt.

Sharif Hassan is the Senior Manager and founding member of the Lockheed Martin Red Team.  Sharif’s leadership has strengthened cyber defenses by uncovering numerous vectors of attack, ultimately reducing the risk posture for diverse advanced technologies, enterprise ecosystems, and national security weapon systems including DoD platforms such as the F-35.  Sharif received both his Ph.D. in Computer Science and a B.S. in Management Information Systems at UCF and holds an M.S. in Computer Science from the Florida Institute of Technology.

Michael Hankins is a Lockheed Martin Fellow specializing in space cybersecurity, cyber risk assessment, and integration of cybersecurity into business capture processes.  He has 19 years of experience as a cyber architect, cyber intel analyst, information assurance engineer, system administrator, and circuit board assembler and tester.  He has a Bachelor of Science in Computer Science from the University of Colorado-Colorado Springs.

Govindra Ganesh is an Associate Fellow on the Lockheed Martin Red Team with 15+ years of experience executing penetration testing and leading teams performing cyber testing. His work led to the creation of a framework that helps Red Teams contextually rate security findings from testing. He has a Bachelor of Science in Information Systems Technology from the University of Central Florida and a Master of Science in Computer Information Systems from Florida Tech.

Equal 2nd – Formal Methods for T&E: Reasoning over Tests, Automated Test Synthesis, & System Diagnostics

In her doctoral thesis, Dr. Josefine Graebener [3] introduces new approaches for testing autonomous systems, leveraging formal methods to address the unique challenges posed by these emerging technologies. Her work presents new methodologies designed to enhance the efficiency and effectiveness of testing processes, offering innovative solutions that have significant potential for advancing safety and reliability in this new era of autonomous systems.

Dr. Graebener developed a framework using assume-guarantee contracts to structure and specify tests for autonomous systems. This framework allows for the creation of detailed test structures, enabling the combination, splitting, and comparison of tests. It also characterizes conditions under which tests can be combined and identifies when temporal constraints are required. The practical application of this framework includes strategies such as winning sets and Monte Carlo tree search to optimize test agent strategies. Another significant aspect of her research involves synthesizing test environments that incorporate static and reactive obstacles along with dynamic test agents. Dr. Graebener’s approach uses linear temporal logic to define the desired test behavior and objectives, ensuring that tests are feasible for correct systems to pass. The framework constructs virtual product graphs and system graphs to represent possible test executions and the system’s perspective. The routing problem for these test environments is formulated as a network flow optimization problem, represented as a mixed integer linear program. A counterexample-guided search approach using GR(1) synthesis is proposed to determine test agent strategies. This framework has been validated through various simulations and hardware tests using a pair of quadrupedal robots.
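As a rough illustration of the kind of formulation described above, and not the thesis’s exact program, routing flow from a source state s to a test objective t over a graph G = (V, E) can be written as a mixed integer linear program with continuous flow variables f_e and binary routing decisions d_e (a generic network-flow sketch in LaTeX):

    \begin{align*}
    \max_{f,\, d} \quad & \sum_{e \in \delta^{+}(s)} f_e
      && \text{(flow pushed from source $s$ toward objective $t$)}\\
    \text{s.t.} \quad & \sum_{e \in \delta^{+}(v)} f_e \;=\; \sum_{e \in \delta^{-}(v)} f_e
      && \forall v \in V \setminus \{s, t\} \quad \text{(flow conservation)}\\
    & 0 \;\le\; f_e \;\le\; c_e\, d_e
      && \forall e \in E \quad \text{(capacity only on edges the test may use)}\\
    & d_e \in \{0, 1\}
      && \forall e \in E \quad \text{(binary routing/obstacle decisions)}
    \end{align*}

In Dr. Graebener’s framework the actual constraints come from the virtual product and system graphs and the temporal logic test objectives described above; the sketch only conveys the general shape of such a program.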

Dr. Graebener also developed a framework for diagnosing system-level faults by identifying the responsible components. This framework uses assume-guarantee contracts and Pacti, a tool for compositional system analysis, to create a diagnostics map that traces system-level guarantees to potential causes. The approach reduces the number of statements that need to be checked, improving the efficiency of the diagnostics process. This method has been illustrated with abstract examples and real-world inspired case studies, demonstrating its practical utility in fault diagnosis.
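For readers unfamiliar with the formalism, an assume-guarantee contract can be written as a pair of assumptions on the environment and guarantees by the component; the standard composition and refinement rules from contract-based design (shown below in their usual textbook form, not quoted from the thesis or from Pacti’s documentation) are what allow system-level guarantees to be traced back through component contracts:

    % A contract pairs environment assumptions A with component guarantees G.
    \mathcal{C} = (A,\, G)
    % Composition of two saturated contracts:
    \mathcal{C}_1 \otimes \mathcal{C}_2 \;=\; \bigl( (A_1 \wedge A_2) \vee \neg (G_1 \wedge G_2),\;\; G_1 \wedge G_2 \bigr)
    % Refinement \mathcal{C}' \preceq \mathcal{C} holds when A \Rightarrow A' and G' \Rightarrow G.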

This research has opened up new questions and areas of exploration in the formal testing and certification of autonomous systems. By introducing novel methodologies, Dr. Graebener’s work sets the foundation for the development of new tools and approaches, contributing to ongoing efforts to enhance the safety and reliability of these technologies.

Dr. Josefine Graebener is a Postdoctoral Researcher at the California Institute of Technology’s Department of Computing and Mathematical Sciences, where she specializes in the testing of autonomous systems through network flow methodologies. Her research aims to ensure the reliability and safety of autonomous systems, a critical area as these technologies become increasingly integrated into various sectors. Dr. Graebener holds a Ph.D. in Space Engineering, with a minor in Computer Science, from the Graduate Aerospace Laboratories at Caltech (GALCIT). Her doctoral research centered on the application of formal methods to enhance the testing process of autonomous systems, setting the stage for her current focus on automated testing of high-level decision-making modules. Her contributions during this time were recognized with the Ernest E. Sechler Memorial Award in Aeronautics, an honor awarded to students who have made significant contributions to teaching and research at GALCIT.

Prior to her doctoral work, Dr. Graebener earned a Master of Science in Space Engineering from Caltech in 2019. She was awarded a Fulbright Fellowship to support her studies. She also received a Bachelor of Engineering in Aerospace Engineering from Aachen University of Applied Sciences in Germany, graduating at the top of the department. Her undergraduate thesis, which focused on designing and predicting a test for a novel approach to temperature stabilization for geostationary weather satellites, received an award from the German Society for Aeronautics and Astronautics. During her doctoral studies, Dr. Graebener also co-led a team to organize the Caltech Space Challenge 2022, an international student competition where 32 students from around the world were invited to design a space mission with guidance from experts in academia and industry. This experience allowed her to contribute to the broader aerospace community and mentor the next generation of engineers and scientists.

3rd – On the Design, Development, & Testing of Modern APIs

Dr Alejandro Gomez and Mr Alex Vesey developed their White Paper for the Software Engineering Institute, ‘On the Design, Development, and Testing of Modern APIs’ [4]. This outstanding paper explores current research and best practices in the design, development, and testing of modern Application Programming Interfaces (APIs) within software-based systems, encompassing both web APIs and broader software APIs. APIs serve as crucial components in systems that facilitate code reuse, composition of programs, and abstraction of implementation details. Although API calls comprise over 50% of all internet communications (Cloudflare), they are an enticing and often poorly defended attack surface. Since APIs expose system functionality, they must be thoroughly tested and evaluated to ensure they operate as expected and are trustworthy to their users.

To accomplish this objective, the paper discusses considerations of API design based on architectural characteristics, findings from usability research, and methodologies for creating quality software, as well as modern industry practices for incorporating evolvability and interoperability into APIs. Moving beyond design considerations, the paper discusses the secure API development process using DevSecOps, refined through methodologies developed at the Software Engineering Institute. The testing of APIs is broken down into different models for testing and evaluating software, looking at the most common API vulnerabilities and providing solutions based on the attributes of confidentiality, integrity, authentication and authorization, and non-repudiation. In addition to the API software itself, the paper also highlights the need to properly test and evaluate the supporting architecture deployed alongside APIs, such as load balancers, proxies, and Content Delivery Networks, as these are additional avenues of attack. The paper draws on research into Zero Trust architectures to provide developers with a framework for testing that their APIs conform to Zero Trust attributes, as well as the latest findings in Large Language Model testing. By aggregating the latest research, U.S. government guidelines, and industry best practices for APIs, the authors hope to provide a holistic understanding of APIs for modern developers and architects, along with an extensive resource list for additional reading.
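As a small illustration of the kind of API security testing the paper surveys, the following Python sketch (with a hypothetical endpoint and token, not drawn from the white paper) checks that an endpoint rejects unauthenticated calls and honours authenticated ones:

    import requests

    # Hypothetical endpoint and credential used purely for illustration.
    BASE_URL = "https://api.example.com/v1"
    VALID_TOKEN = "replace-with-a-real-test-token"

    def test_rejects_unauthenticated_request():
        """Without credentials the API should deny access (401/403), not leak data."""
        resp = requests.get(f"{BASE_URL}/orders", timeout=10)
        assert resp.status_code in (401, 403)

    def test_accepts_authenticated_request():
        """With a valid bearer token the same call should succeed with well-formed JSON."""
        resp = requests.get(
            f"{BASE_URL}/orders",
            headers={"Authorization": f"Bearer {VALID_TOKEN}"},
            timeout=10,
        )
        assert resp.status_code == 200
        assert isinstance(resp.json(), list)  # schema expectation for this hypothetical endpoint

    if __name__ == "__main__":
        test_rejects_unauthenticated_request()
        test_accepts_authenticated_request()
        print("Basic authentication/authorization checks passed.")

Checks like these would form only a small part of the broader evaluation of API vulnerabilities and supporting infrastructure that the paper describes.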

This team’s white paper covers the following topics:

  • What is an API?
  • What factors drive secure API design?
  • What qualities do good APIs exhibit?
  • How are APIs tested, from the systems and software security patterns point of view?
  • What cybersecurity and other best practices apply to APIs?
  • How can we verify that the APIs we build operate as expected?

The team studied and researched the API T&E space to inform this comprehensive body of work on API T&E, with a focus on security T&E for APIs, helping to draw attention to a gap in T&E.

Alejandro Gomez (https://insights.sei.cmu.edu/authors/alejandro-gomez/) is a software engineer at Carnegie Mellon University’s Software Engineering Institute. He has served as a tech lead on multiple Department of Defense projects, bringing technical excellence, bridging communication between management and software teams, and teaching and mentoring other developers. Prior to joining CMU, Alejandro worked as a software engineer at Vanguard and IBM. He has an MS in Software Engineering from Villanova University and a double Bachelors in English and Economics from the University of Miami, FL. He lives in Pittsburgh with his wife and two children.

Alex Vesey (https://insights.sei.cmu.edu/authors/alex-vesey/) is a software engineer with more than six years of experience developing systems for the DoD, both at the SEI and formerly at Lockheed Martin. His undergraduate studies included degrees in information science and in security and risk analysis from Penn State, and he holds a master’s degree in Systems Engineering from Worcester Polytechnic Institute. His research interests include modeling, simulation, and operations analysis.

References

[1] J. Chandrasekaran, T. Cody, N. McCarthy, E. Lanus, L. Freeman, and K. Alexander, “Testing Machine Learning: Best Practices for the Life Cycle,” Naval Engineers Journal, vol. 136, no. 1-2, pp. 249-263, 2024.

[2] P. Lardieri, D. Harrison, M. Hankins, S. Hassan, and G. Ganesh, “A Statistical Review of the Cyber Test Process,” The ITEA Journal of Test and Evaluation, vol. 45, no. 2, 2024.

[3] J. B. M. Graebener, “Formal Methods for Test and Evaluation: Reasoning over Tests, Automated Test Synthesis, and System Diagnostics,” Ph.D. thesis, California Institute of Technology, 2024.

[4] A. Gomez and A. Vesey, “On the Design, Development, and Testing of Modern APIs,” Software Engineering Institute, white paper, 30 July 2024. Available: https://insights.sei.cmu.edu/library/on-the-design-development-and-testing-of-modern-apis/

