AGENDA & BIOS
Agenda link here.
Bios link here.
OVERVIEW
This forum will focus on:
- AI T&E Policies and Guidance: Sharing the latest information.
- AI-Enabled Systems: Focusing on new technology being integrated into military capabilities.
- Integrating and Validating: Focusing on primary challenges and goals of the T&E community: how to safely and effectively incorporate these complex systems into existing platforms and ensure they work correctly.
- Digital Engineering (DE): DE and Model-Based Systems Engineering (MBSE). Using virtual proving grounds, digital twins, and authoritative data sources is seen as the key to accelerating processes and reducing costs.
- Decision Advantage: Connecting the technical work of T&E and DE to the ultimate strategic goal for the warfighter: faster, better-informed decisions on the battlefield, which is a core tenet of the DoD’s AI adoption strategy.
SPEAKERS
Congresswoman Jen Kiggans, Virginia Second District, US House of Representatives – Invited
Dr. Amy E. Henninger, Senior Science Advisor for Advanced Computing, Science and Technology Directorate, U.S. Department of Homeland Security, “Adversarial and Counter AI: Why it Matters Now”
Dr. James Sharp, Defence Science and Technology Laboratory (Dstl), Ministry of Defence, UK
Matt Maroofi, Senior Director of Product Development, Shield AI
Dr. Sandeep (Sandy) Patel, KBR, AI/ML & Space Enterprise Manager and Deputy Program Manager for DIA/DOT&E/TETRA Contract
Dr. Jeremy S. Werner, Defense Tech Architect & Ambassador, Cadence Design Systems, “Crossing the Valley of Death: Shifting Left using AI and Hardware-Accurate Digital Twins to Accelerate Acquisition”
Dr. Kerianne Hobbs, Senior Engineering Specialist, Vehicle Autonomy & System Trust, The Aerospace Corporation
Abstract: The rapid evolution of AI-enabled autonomy is reshaping operations across multiple domains. This talk presents an emerging integrated framework combining guardrails, watchdogs, live-virtual-constructive (LVC) testing, human-autonomy teaming (HAT), traditional processor/hardware-in-the-loop (PIL/HIL) methods, and unique approaches to test case generation to accelerate the responsible deployment of AI-enabled autonomy without sacrificing safety. As autonomous systems undertake critical decision-making, it is essential to establish clear behavioral boundaries, manage risk with structured representative testing at scale, and integrate human oversight with machine decision-making.
PANEL DISCUSSIONS
Overview: T&E Transformation
Moderated by Daria Stafford, Technical Director, Director, Operational Test and Evaluation
Digital Engineering (DE), Artificial Intelligence (AI), and Acquisition Transformation necessitate a fundamental rethink of our traditional processes. Led by Daria Stafford, Technical Director for DOT&E, this panel brings together leaders to explore how Test and Evaluation is shifting from a late-stage “final gate” to a continuous engine of discovery and learning. The panel will cover how T&E can provide decision advantage: the ability to make faster, better-informed choices in a digitally competitive world. It will discuss moving T&E from a series of discrete events at the end of the “V” to a mission-engineering continuum, and the potential for leveraging Digital Engineering and MBSE across LVC environments to move toward an authoritative data environment that validates complex AI-enabled systems earlier in the lifecycle. While DE and MBSE offer a path toward more agile validation, the panel will also address the significant technical and cultural hurdles of creating truly useful digital test environments. This includes the difficult work of integrating authoritative data sources with operational test characteristics, such as representative users and realistic combat environments, to ensure digital models provide a high-fidelity reflection of the battlefield. The discussion will also tackle the unique challenges of AI-enabled systems. Panelists will share insights into the use of guardrails and Bayesian models to build confidence in AI performance across the acquisition lifecycle. Finally, the panel will discuss policy and guidance updates that align with these technology advances. Ultimately, this session challenges T&E professionals to refine their role in ensuring a lethal, effective, and AI-ready force through more integrated, data-driven outcomes.
Panelists
Dr. Laura Freeman, Deputy Director, Virginia Tech National Security Institute, Assistant Dean for Research, College of Science, and ITEA Fellow
To Be Announced, Chief Digital and Artificial Intelligence Office
To Be Announced, Military Service Representative
Testing and Evaluation Strategies with and for AI in Complex Systems
Moderated by John Frederick, Director, Innovation and Testing Strategies, Veracity Engineering
This panel will explore how T&E and organizational cultures must adapt to assess complex, nondeterministic, AI- and ML-enabled safety-critical and security-critical systems. Panelists will discuss technical challenges, system characteristics, and metrics for integrating and testing AI, focusing on how verification and validation (V&V) evidence builds decision confidence, supports certification, and ensures operational suitability. The panel will examine the design and validation of data ontologies, decision support models, and data governance as essential enablers of AI in complex environments. The role of digital engineering, including the integral relationship between digital twins and AI/ML capabilities, will be highlighted. Finally, panelists will explore how AI and ML methods can enhance the effectiveness, efficiency, and coverage of V&V.
Panelists
Dr. Ian Levitt, Distinguished Board Member, National Aerospace Research & Technology Park
Eman Kawas, Independent Advisor, Decision Assurance using AI-Enabled Digital Twins
Dr. Antonios Kontsos, Henry M. Rowan Foundation Professor, Director of the Digital Engineering Hub
Jonathan Dziok, Systems Engineer, Veracity Engineering
Artificial Intelligence: An Industry Perspective
Moderated by Bryan Vandrovec, Chief Technologist, Autonomous and AI Systems, Booz Allen Hamilton
This panel examines how a Digital Proving Ground overcomes the limitations of traditional physical testing for complex AI-enabled systems. Industry experts will discuss leveraging a generative AI-powered knowledge assistant for automated test planning, high-fidelity test range reconstruction, physics-based digital twins to generate synthetic data, and interpretable runtime guardrails to assess machine reasoning. Together, these capabilities accelerate evaluation processes, realize significant cost savings, and establish the calibrated trust required for modern autonomous systems development.
Panelists
Judy Brown Stoer, Autonomy Test Team Lead, Weather Gage Technologies
Johannes Waldstein, Founder & CEO, PiLogic Inc.
Dr. Policarpio Soberanis, Synopsys Inc.
Nelson Santini, Senior Vice President, Edge Case Defense
AUDIENCE PARTICIPATION
Sara Jordan, Institute for Defense Analyses (IDA)
The audience will have an opportunity to review and comment on the latest version of the Practical Strategies for Design and Execution of Test and Evaluation of AI Enabled Systems (AIES).
TRACK: Academic & Government Voices at the Forefront of AI T&E
This technical track features independent research perspectives on adversarial AI assurance, mission-centric evaluation design, digital twin readiness, and auditable AI reasoning, advancing the science behind trusted, responsible AI for the warfighter.
Steve Robert Crews II, PhD., Georgia Tech Research Institute
A Digital Twin Maturity Model for Digital Engineering and Test of AI-Enabled Space Systems
Digital twins and virtual proving grounds are rapidly becoming the backbone of digital engineering and model-based systems engineering for complex, AI-enabled systems. In practice, however, “digital twin” still covers everything from a simple playback tool to a high-fidelity, closed-loop mission twin wired into a digital thread. Test and evaluation teams have no consistent way to describe how mature a given twin is for specific uses across the lifecycle, or to decide when it is ready to live inside a government digital environment.
This presentation introduces a Digital Twin Maturity Model (DTMM) aimed at defense space systems and their supporting kill chain. The DTMM defines six dimensions, each with five maturity levels, that characterize how a twin behaves as a digital-engineering asset rather than a one-off model: (1) Twin Functional Capability; (2) Data & Connectivity Integration; (3) Model Fidelity & Realism; (4) Lifecycle Integration & Operations; (5) Verification, Validation, & Trust; and (6) Interoperability with Government-Controlled Environments. Each cell is defined in observable, DE-friendly terms such as sim-ready deliverables, authoritative data sources, participation in virtual ranges, VV&A artifacts, and links to MBSE models.
The model is anchored in current DoD and USSF digital-engineering and M&S guidance on digital twins, authoritative data, and VV&A expectations, and informed by external work such as Digital Twin Consortium capability frameworks and industry virtual-proving-ground practices. It is designed to align with the forum’s Digital Engineering (DE) and MBSE focus on using digital twins, high-fidelity models, virtual environments, and trusted data sources to accelerate testing, improve quality, and reduce lifecycle costs. The talk will (1) walk through the six dimensions and five levels per dimension, emphasizing how they map to common DE/MBSE artifacts—system models, environment and threat models, digital threads, and test data pipelines; (2) illustrate scoring examples for AI-enabled orbital warfare scenarios, showing what “Level 2 vs. Level 4” looks like in practice for autonomy-on-orbit, AI-assisted C2, and synthetic-data generation; and (3) demonstrate how the DTMM can be used pragmatically in T&E planning and acquisition language—for example, to set minimum maturity thresholds for using a twin in AI-in-the-loop testing, to prioritize model improvements that unlock reuse across design, test, and training, and to compare alternative solutions on a common, transparent scale. By treating maturity as a structured, multi-dimensional property of digital twins and their digital-engineering context, this work offers the community a practical tool for deciding when and how to rely on twins in the AI-intensive test campaigns that DE and MBSE are enabling.
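As a rough illustration of the threshold idea described above, the following minimal Python sketch scores a twin on the six named dimensions and checks it against minimum levels for one intended use. Only the dimension names come from the abstract. The 1 to 5 numbering, the `TwinAssessment` class, the `shortfalls` helper, and every score and threshold are hypothetical and are not part of the DTMM as presented.

```python
from dataclasses import dataclass

# Dimension names are taken from the abstract; the 1-5 numbering, the class,
# and the score/threshold values below are illustrative assumptions.
DIMENSIONS = (
    "Twin Functional Capability",
    "Data & Connectivity Integration",
    "Model Fidelity & Realism",
    "Lifecycle Integration & Operations",
    "Verification, Validation, & Trust",
    "Interoperability with Government-Controlled Environments",
)


@dataclass(frozen=True)
class TwinAssessment:
    """Hypothetical record of one twin's maturity level per dimension."""
    name: str
    levels: dict  # dimension name -> maturity level (assumed 1..5)

    def __post_init__(self):
        if set(self.levels) != set(DIMENSIONS):
            raise ValueError("levels must score exactly the six dimensions")
        for dim, level in self.levels.items():
            if not 1 <= level <= 5:
                raise ValueError(f"{dim}: level {level} is outside 1..5")


def shortfalls(assessment, minimums):
    """Return {dimension: (actual, required)} for each dimension below its minimum."""
    return {
        dim: (assessment.levels[dim], required)
        for dim, required in minimums.items()
        if assessment.levels[dim] < required
    }


# Invented scores for a hypothetical twin, listed in DIMENSIONS order.
twin = TwinAssessment("orbital-demo-twin", dict(zip(DIMENSIONS, (3, 2, 4, 2, 3, 1))))

# Invented minimums for a hypothetical AI-in-the-loop test use.
ai_in_the_loop_minimums = {
    "Model Fidelity & Realism": 4,
    "Verification, Validation, & Trust": 4,
}

gaps = shortfalls(twin, ai_in_the_loop_minimums)
for dim, (actual, required) in gaps.items():
    print(f"{dim}: level {actual}, needs {required}")
```

Keeping the per-dimension levels separate, rather than collapsing them into one number, mirrors the abstract's point that maturity is a multi-dimensional property: a twin can be ready on fidelity and still fall short on trust.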
Dr. Rachel Brower-Sinning, Carnegie Mellon Software Engineering Institute
Using MLTE to Support Integrated T&E for ML-Enabled Systems
Delays in fielding systems in the DoW are a known issue. Problems found during developmental test (DT) and operational test (OT) are noted as causes, and the integration of machine learning (ML) capabilities into DoW systems is expected to increase these delays further. Current practice for testing ML capabilities during development is largely limited to testing model properties, such as model performance, without consideration of mission and system requirements. This can lead to failures in model integration, deployment, and operations. Discovering problems attributed to ML capabilities in a system context during OT is problematic because fixing them might require additional data collection and retraining, further delaying fielding. Delays may be exacerbated because test and evaluation (T&E) organizations may be segregated: OT organizations work independently from DT organizations, which can lead to uninformed and inefficient testing, and model developers doing contractor testing (CT) may not have access to mission and system requirements and therefore fail to adequately address the real-world operational environment. Integrated T&E strives to bring DT and OT earlier in the testing process, ensuring that mission and system requirements are considered during development, to minimize costly fixes and delays.
MLTE (ML Test and Evaluation) is a process and tool that enables negotiation, specification, and testing of an ML component’s functional and non-functional requirements. Designed to support integrated T&E efforts, MLTE produces evidence of testing activities that can be shared with model acquirers and integrators to inform integration and T&E activities from CT to DT to OT, thus enabling traceability of requirements, data, and test results throughout the T&E process.
MLTE integrates a quality model that defines ML component quality through a set of characteristics that correspond to quality attributes (QAs), which are measurable or testable properties of a component that are used to indicate how well the component satisfies its system-derived requirements beyond the basic function of the component. The quality model serves as a guide for requirements elicitation and negotiation, providing a common vocabulary to specify system-derived requirements and focus testing efforts. Evaluations of MLTE in practice show the value of artifacts generated and maintained during the model development and testing process: (1) the Negotiation Card identifies a larger number of relevant model requirements early in the development process; (2) the Test Catalog supports the development and reuse of test code for validating these requirements; and (3) the Test Code and MLTE Report provide evidence of testing, which increases the trustworthiness of ML models. MLTE is open-source and available at https://github.com/mlte-team/mlte.
Kelli Esser, PhD., Chief Strategy Officer, Virginia Tech National Security Institute (VTNSI)
A Mission-Centric Approach to AI T&E: Extending Mission Engineering for AI-Enabled Systems
Co-Author: James D. Moreland Jr., PhD., Owner / Principal Engineer, MEI Innovative Solutions, Inc
As artificial intelligence (AI) becomes increasingly embedded in defense systems, confidence in AI-enabled capabilities can no longer be established through component- or model-level testing alone. Current acquisition, test, and governance practices often assess AI performance in isolation, disconnected from the mission context, system-of-systems (SoS) integration effects, and operational uncertainty that ultimately determine mission success. This gap presents a growing challenge for test and evaluation (T&E) organizations, program managers, and senior leaders responsible for ensuring the safe, responsible, and effective fielding of AI-enabled systems. This presentation introduces a Mission Engineering & Integration for AI-Enabled Systems (MEI-AIES) Framework that extends established Department of Defense (DoD) mission engineering principles to address the unique characteristics of AI-enabled capabilities. MEI-AIES re-centers AI T&E and assurance on mission outcomes rather than isolated technical metrics, providing a structured approach to defining, measuring, and governing AI performance across the lifecycle. The framework is aligned with emerging DoD policies and guidance that emphasize SoS thinking, continuous assurance, and mission-focused evaluation for advanced and autonomous systems.
At the core of the MEI-AIES framework is a distinction between (1) stable, mission-anchored definitions of performance and (2) cross-cutting measures used to evaluate performance as AI-enabled capabilities evolve in autonomy, integration, and operational complexity. Stable performance definitions ensure that mission intent remains constant over time, even as AI models adapt, are retrained, or are deployed in new contexts. Cross-cutting measures address how performance, behavior, dependencies, and uncertainty are assessed as capabilities mature and interact with other systems and human decision-makers.
The presentation illustrates the MEI-AIES Framework using a representative intelligence, surveillance, and reconnaissance (ISR) use case structured across three illustrative tiers: passive single-domain analytics, passive multi-domain integration, and active cross-domain autonomy. These tiers demonstrate how AI-enabled capabilities deliver increasing mission value – such as improved situational awareness, accelerated decision timelines, and adaptive tasking – while simultaneously introducing new challenges for traceability, assurance, and uncertainty management. Importantly, the framework highlights how uncertainty evolves in character, not just magnitude, as AI capabilities move from assistive roles to autonomous mission execution under human supervision.
A central contribution of the MEI-AIES Framework is its explicit linkage of Measures of Performance (MoP), Measures of Effectiveness (MoE), and Measures of Success (MoS) across tactical, operational, and strategic levels of warfare. This traceability enables T&E practitioners and decision-makers to understand how local variations in AI performance propagate to mission outcomes and strategic risk, supporting more informed decisions about deployment, integration, and governance. The framework also reinforces the need for continuous, lifecycle-based assurance rather than one-time certification events, consistent with best practices emerging across DoD AI policy and guidance. By providing a disciplined, mission-centric analytic structure, MEI-AIES offers a practical path forward for integrating AI T&E with broader DoD mission engineering, acquisition, and governance processes. The framework is intended to support early pilot applications, experimentation, and policy-aligned implementation, helping organizations move beyond technology-centric evaluation toward trusted, mission-effective employment of AI-enabled systems.
Josef B. Schaff, DSc., Chief Scientist, Cyber Dominance Group (A4J), Non-Kinetic Warfare Branch, Johns Hopkins Applied Physics Lab
Current System-of-Systems (SoS) are increasing in complexity, requiring advances in testing methodology. Some SoS adapt dynamically to environmental changes, which requires either algorithms that are not fully predictable or nonlinear control feedback loops. Testing such SoS requires test systems that can adapt to, or predict, their upcoming states. Some of these rely on Machine Learning (ML) to effectively “learn” the behaviors of the systems under test, making the testing faster, better (more comprehensive), and cheaper. This talk discusses a suite of algorithms developed for predicting system destabilizations, and their utility both in testing edge cases and constraint parameters and in discovering the overall system’s performance.
POSTER PRESENTATIONS
We received many outstanding abstracts through our Call for Papers. With space available for only one track of presentations, we invited several highly qualified authors to consider presenting their work as poster papers. We are delighted with the strong response and can confidently say these poster submissions represent exceptional technical quality.
We encourage you to review the poster abstracts in advance and make a point to visit with the authors during the event. Their work reflects significant expertise and innovation, and your engagement will make these sessions even more valuable for everyone involved.
Assuring Frontier AI Under Operational Constraints: A Behavior-Based Approach to Scalable AI Evaluation
Author:
Misty Blowers, Ph.D., Datalytica
The rapid operational adoption of frontier generative and agentic AI systems, driven by the U.S. Department of Defense’s commercial-first strategy under the Chief Digital and Artificial Intelligence Office (CDAO), introduces a critical assurance challenge: how to evaluate the reliability and mission suitability of advanced AI capabilities deployed under constrained-access conditions. As commercially developed AI models are integrated into operational, intelligence, and enterprise workflows, traditional evaluation approaches—largely focused on nominal performance or static compliance testing—are insufficient to characterize system behavior under realistic operational stress and adversarial interaction.
This paper presents REVEAL, an interaction-driven AI assurance and red-teaming framework designed to assess AI-enabled and agentic systems without reliance on internal model access or proprietary details. The approach emphasizes structured interaction with deployed systems to characterize externally observable behavior under a range of adversarial and operational conditions, producing reusable behavioral artifacts that support comparative assessment and benchmarking. Rather than attempting exhaustive verification or internal inspection, the framework is designed to scale alongside commercial AI adoption and evolving tradecraft, while remaining compatible with bounded assurance techniques where formal reasoning is appropriate. Aligned with CDAO’s objective to accelerate AI adoption at the speed of operational necessity, this work outlines a practical assurance paradigm that enables early, repeatable evaluation of frontier AI systems while preserving intellectual property, supporting governance, and maintaining operational relevance.
UNIFIED: A Multi Regime AI Framework for Predictive C5ISRT; Unified Nonlinear Inference Framework for Integrated Evaluation and Dynamics
Author:
William Alexander Reed, Astrion
Modern defense systems increasingly rely on AI‑enabled architectures to interpret heterogeneous data, maintain situational awareness, and predict threat behavior across multiple physical regimes. Yet most Test and Evaluation (T&E) frameworks still treat atmospheric flight, orbital motion, re‑entry, and cis‑lunar dynamics as isolated domains, limiting the ability of AI systems to reason coherently across regime boundaries. This work presents UNIFIED, an AI‑enabled systems framework that uses a generalized inference engine to integrate multi‑regime dynamics, multi‑INT sensing, and maneuver‑centric uncertainty into a single predictive architecture for C5ISRT applications.
At the core of the framework is a regime‑agnostic inference engine that maintains a unified state and parameter representation for maneuvering objects. Instead of relying on separate models for each domain, the system uses a shared state structure augmented with higher‑order dynamics to capture thrust ramping, jerk‑limited maneuvers, and uncertain mass depletion. Latent physical parameters—such as drag, lift, reference area, specific impulse, thrust limits, and maneuver budget—are treated as uncertain variables with bounded priors. This allows the inference engine to reason jointly about platform capability, intent, and feasible future trajectories, even when observations are sparse or intermittent.
A multi‑layer sensing model provides a consistent measurement interface for EO/IR, radar, RF/SIGINT, and space‑based sensors. By mapping all sensor outputs into a common likelihood structure, the framework enables AI‑enabled systems to fuse heterogeneous data streams without regime‑specific logic. Regime‑dependent dynamics models—including atmospheric flight equations, orbital mechanics, re‑entry drag models, and CR3BP formulations—are encapsulated behind the inference engine, allowing AI algorithms to maintain continuity as objects transition between regimes.
To support T&E activities, the framework introduces a maneuver‑centric initialization strategy based on three canonical epochs (pre‑maneuver, maneuver, post‑maneuver). This structure enables evaluation of AI inference systems under uncertain maneuver timing, incomplete observability, and variable sensing conditions. A predictive reachable‑set module further supports threat envelope analysis, maneuver feasibility assessment, and cross‑regime intent estimation.
The primary contribution of this work is a non‑commercial, technically rigorous AI‑enabled systems framework that unifies sensing, dynamics, and inference for multi‑regime C5ISRT. By providing a consistent mathematical structure, a regime‑agnostic inference engine, and a maneuver‑aware initialization strategy, UNIFIED enables government, industry, and academic practitioners to conduct more realistic, repeatable, and analytically grounded Test and Evaluation of AI‑enabled sensing, tracking, and prediction systems.
Decision Assurance for AI-Enabled Mission Systems: From Test Evidence to Operational Authority
Author:
Eman Kawas, Independent Advisor
AI-enabled systems are increasingly integrated into safety-critical and security-critical mission environments. Yet, traditional verification, validation, and test and evaluation (T&E) approaches often struggle to produce decision-relevant assurance for complex and adaptive behaviors. In many programs, testing generates model performance metrics or system demonstrations but does not meaningfully constrain or inform the operational decisions that govern certification, authority, and deployment. For AI-enabled complex systems, assurance must evolve beyond retrospective validation toward faster learning cycles, shifting V&V left, building robust feedback loops, and establishing smarter decision gates so deployments are risk-aware, value-driven, and executed with “eyes wide open.”
This paper proposes a decision assurance framework for integrating and validating AI-enabled mission systems across the lifecycle. The approach focuses on aligning V&V and T&E evidence to the specific decisions that matter in complex systems: entry and exit criteria for autonomy, confidence thresholds for mission use, escalation conditions for human override, and governance boundaries for operational authority. The methodology combines three elements. First, decision-centered test design defines test objectives in terms of operational decisions rather than model outputs alone. Second, lifecycle confidence thresholds are established to quantify “sufficient confidence” relative to mission risk, uncertainty, and consequence. Third, digital engineering infrastructure, including high-fidelity digital twins, simulation environments, and learning-based operational scenarios, is used to accelerate coverage while preserving traceability between test evidence, system behavior, and mission context.
The paper further addresses the tension between speed and assurance in AI fielding. As programs push toward rapid iteration, decision assurance provides structured guardrails that enable decision velocity without eroding lifecycle confidence. Rather than treating testing as retrospective validation, the framework positions T&E as an active mechanism for shaping and governing AI-enabled capability insertion. The expected contribution is a practical, non-commercial structure for connecting AI T&E outcomes to operational authority and mission impact. By translating test evidence into decision confidence, the approach supports more rigorous integration, clearer certification pathways, and sustained decision advantage for the warfighter in increasingly complex AI-enabled systems.
Systems Engineering Command Center (SECC): An AI-Powered Assurance Agent for Validating Complex Defense System Integrations
Author:
Mehran Irdmousa, MZI Aviation
Modern defense acquisition programs face unprecedented integration complexity. When multiple vendors simultaneously develop interdependent systems such as radars, communications networks, automation platforms, and command-and-control infrastructure, the number of potential interface conflicts grows exponentially. Traditional systems engineering verification methods, relying on manual document reviews that typically cover only 10-15% of artifacts, cannot scale to meet compressed timelines demanded by initiatives like the Department of Defense’s Combined Joint All-Domain Command and Control (CJADC2) or rapid acquisition authorities. This paper introduces the Systems Engineering Command Center (SECC), an AI-powered assurance agent that continuously validates requirements, architectures, and interface specifications across distributed vendor ecosystems before they inform digital twin simulations or production decisions. Unlike conventional requirements management tools that validate within a single source, SECC operates at the system-of-systems level, ingesting artifacts from multiple vendors, formats, and tools, including legacy PDFs, Model-Based Systems Engineering (MBSE) models, Interface Control Documents (ICDs), and requirements databases, while validating them collectively for consistency, completeness, and compliance.
Methodology: SECC integrates three core AI technologies: (1) vector databases enabling semantic search across requirement statements, (2) transformer-based language models for conflict detection and ambiguity identification using Retrieval-Augmented Generation (RAG), and (3) a Multi-Attribute Utility Theory (MAUT) scoring framework that computes system health based on issue severity and category. The architecture classifies findings across four categories: syntax, traceability, semantic conflicts, and cybersecurity gaps, with severity-weighted scoring that provides engineers with actionable, risk-prioritized insights through an interactive dashboard.
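To make the severity-weighted scoring idea concrete, here is a minimal Python sketch of an additive multi-attribute utility score over the four finding categories named above. It is not SECC's actual scoring model: the category weights, severity penalties, penalty budget, and function names are all hypothetical values chosen for illustration.

```python
# Additive MAUT sketch: health = sum over categories of weight * utility.
# All numbers below are invented for illustration, not taken from SECC.
CATEGORY_WEIGHTS = {
    "syntax": 0.10,
    "traceability": 0.30,
    "semantic_conflict": 0.35,
    "cybersecurity_gap": 0.25,
}
SEVERITY_PENALTY = {"low": 1.0, "medium": 3.0, "high": 7.0}
PENALTY_BUDGET = 20.0  # accumulated penalty at which a category's utility hits 0


def category_utility(severities):
    """Map a category's findings to a utility in [0, 1]; no findings gives 1."""
    penalty = sum(SEVERITY_PENALTY[s] for s in severities)
    return max(0.0, 1.0 - penalty / PENALTY_BUDGET)


def health_score(findings):
    """Compute a 0..1 health score from (category, severity) pairs."""
    by_category = {category: [] for category in CATEGORY_WEIGHTS}
    for category, severity in findings:
        by_category[category].append(severity)
    return sum(
        weight * category_utility(by_category[category])
        for category, weight in CATEGORY_WEIGHTS.items()
    )


# Hypothetical findings from a review of one artifact set.
example_findings = [
    ("semantic_conflict", "high"),
    ("semantic_conflict", "high"),
    ("syntax", "low"),
]
print(f"health: {health_score(example_findings):.2f}")
```

The additive form assumes the categories contribute independently to overall health. A real scoring framework might instead use elicited utility functions or interaction terms.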
Validation and Results: SECC was developed and validated through a George Mason University SEOR graduate capstone project (Fall 2025) sponsored by MZI Aviation. The research team built a functional prototype and tested it against synthetic systems engineering artifact sets spanning the aviation, drone, and smart home domains, each seeded with known inconsistencies across ConOps, requirements specifications, and design documents. Three LLM backends were evaluated (Llama 3.2, Qwen3, GPT-5 mini), with GPT-5 mini demonstrating superior detection of semantic conflicts and traceability gaps. Monte Carlo simulation modeling projected an 89% cost reduction compared with manual review processes. The prototype demonstrated cross-document conflict detection, automated severity classification, and real-time health scoring, capabilities directly applicable to defense programs managing hundreds of interface documents across distributed vendor teams.
Expected Contribution: SECC addresses a critical gap in digital engineering pipelines: ensuring that digital twins and simulations are built on verified, consistent data rather than propagating hidden conflicts at simulation speed. By positioning SECC as a validation gateway within the digital thread, defense programs can achieve higher confidence in integration outcomes and reduce costly late-stage discoveries. The methodology applies across DoD, NASA, and federal civilian agencies executing complex multi-vendor modernization efforts.
Human Oversight in AI-Generated Test Artifacts: Implications for Independent Verification and Validation
Author:
Andrew Pollner, American Software Testing Qualifications Board (ASTQB)
Generative AI is rapidly reshaping software engineering, and software testing is no exception. Tools now generate test conditions, test cases, and test data directly from user stories, requirements, APIs, and even source code. Organizations report dramatic productivity gains: faster draft creation, improved formatting consistency, and the ability to scale test artifact production across large backlogs. However, a critical question remains: does AI-generated testing inherently improve software quality, or merely accelerate the testing process? Generative AI operates through probabilistic pattern recognition rather than contextual reasoning or risk awareness. While it can efficiently produce plausible test cases, it does not inherently understand domain constraints, regulatory implications, architectural dependencies, or the nuanced tradeoffs between coverage depth and mission risk. As a result, AI-generated artifacts often emphasize straightforward “happy path” scenarios while underrepresenting boundary conditions, negative testing, state transitions, and non-functional risks unless explicitly prompted. This shift reframes the tester’s role. Instead of primarily creating test cases, testers increasingly review, evaluate, and refine AI-generated outputs.
They must:
- Identify gaps in coverage
- Detect ambiguity or logical inconsistency
- Apply structured test design techniques
- Align testing depth with risk exposure
- Ensure traceability and meaningful coverage metrics
In this new paradigm, foundational testing knowledge becomes more, not less, important. Techniques such as boundary value analysis, equivalence partitioning, decision tables, state transition modeling, and risk-based testing are essential for both crafting effective prompts and validating AI results. Without these competencies, teams risk deploying superficially complete test suites that provide false confidence.
The central argument presented here is that AI does not replace testing expertise; it amplifies the need for it. Organizations that treat AI as a shortcut may experience increased velocity without corresponding improvements in quality. In contrast, organizations that combine generative AI with disciplined testing knowledge can significantly enhance both efficiency and effectiveness.
The future of testing is not “AI vs. testers.” It is AI-augmented testers who possess deep professional skills and can strategically govern intelligent tooling. The competitive advantage will belong to teams that understand how to integrate AI into a robust quality engineering framework rather than simply automating artifact creation.
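As an illustration of one technique named in this abstract, boundary value analysis, the sketch below derives the classic boundary inputs for a closed integer range. It is not taken from the presentation; the function names `boundary_values` and `is_valid` and the example range 1 to 100 are hypothetical, chosen only to show the kind of check a reviewer might run against an AI-generated test suite.

```python
def boundary_values(lo: int, hi: int) -> list[int]:
    """Return boundary-value test inputs for the closed range [lo, hi].

    Covers the values just outside, on, and just inside each boundary.
    """
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    candidates = {lo - 1, lo, lo + 1, hi - 1, hi, hi + 1}
    # A set removes duplicates that arise for very narrow ranges; sort for stable output.
    return sorted(candidates)


def is_valid(value: int, lo: int, hi: int) -> bool:
    """Hypothetical system under test: accepts values inside [lo, hi]."""
    return lo <= value <= hi


# Pair each boundary input with the expected accept/reject outcome.
cases = [(v, is_valid(v, 1, 100)) for v in boundary_values(1, 100)]
for value, expected in cases:
    print(value, expected)
```

A reviewer could compare a generated suite against this list: if the suite contains only mid-range inputs such as 50, the out-of-range inputs 0 and 101 are uncovered.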
Generation, Verification and Validation of Synthetic Data for Aviation Security Equipment
Author:
Duane Karns, Ph.D., Computational Physics and Synthetic Data Lead, US Department of Homeland Security
The Transportation Security Laboratory in the Department of Homeland Security plays a critical role in national aviation transportation security by enhancing and validating solutions for detecting and mitigating concealed threats. This mission guides the development of detection technologies to effectively meet the requirements of government stakeholders in support of the homeland security enterprise.
Vendors of detection technologies are increasingly utilizing artificial intelligence models to replace traditional engineered automated target detection algorithms. These automated target recognition artificial intelligence models are steadily more complex and require substantial amounts of data for training, testing, and certification. Meeting these data demands with real data alone is becoming more challenging, leading to the use of synthetic data to supplement real data. The ability to create synthetic data in support of training, testing and certification is essential to the Test and Evaluation mission at the Transportation Security Laboratory, and the mandates outlined in the Aviation and Transportation Security Act.
In the current age of big data, high performance computational tools are commercially available and necessary for organizations to gain valuable insight into their data of interest, in this case the Test and Evaluation data generated by the Transportation Security Laboratory. Powerful hardware, optimized for processing huge volumes of information and for enabling synthetic data generation and multi-physics modeling, is essential both to advancing the state of the art of Governmental Test and Evaluation techniques and to meeting the needs of the Transportation Security Laboratory’s stakeholders.
The development of synthetic data for both X-ray CT and AIT systems is a priority for the Transportation Security Laboratory to meet the challenges of developing and testing artificial intelligence based automated target recognition models. This presentation describes the current work at the Transportation Security Laboratory to develop synthetic data for both X-ray CT and AIT systems. Synthetic data generation techniques include full physics models combined with threat insertion techniques and dual energy decomposition augmentation. Verification and validation methodologies will be discussed, as well. These will include the use of autoencoders for synthetic data verification, and the probability of detection and false alarms for validation.
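The validation metrics mentioned above, probability of detection and probability of false alarm, can be computed from labeled outcomes as in the minimal sketch below. It uses the generic textbook definitions and is not the Transportation Security Laboratory's methodology; the per-item counting, the `detection_rates` function, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionRates:
    pd: float   # probability of detection: alarms on threat items / threat items
    pfa: float  # probability of false alarm: alarms on benign items / benign items


def detection_rates(has_threat: list[bool], alarmed: list[bool]) -> DetectionRates:
    """Compute Pd and Pfa from per-item ground truth and alarm decisions."""
    if len(has_threat) != len(alarmed):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(has_threat, alarmed))
    threats = sum(1 for t, _ in pairs if t)
    benign = len(pairs) - threats
    if threats == 0 or benign == 0:
        raise ValueError("need at least one threat item and one benign item")
    hits = sum(1 for t, a in pairs if t and a)
    false_alarms = sum(1 for t, a in pairs if not t and a)
    return DetectionRates(pd=hits / threats, pfa=false_alarms / benign)


# Hypothetical outcomes for eight scanned items (four threat, four benign).
truth = [True, True, True, True, False, False, False, False]
alarm = [True, True, True, False, True, False, False, False]
rates = detection_rates(truth, alarm)
print(rates)
```

One plausible use, stated here as an assumption rather than the presenter's procedure, is to compute these rates for the same model on real and on synthetic test sets and check whether the two agree within a chosen tolerance.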
An AI-Enabled Framework for Advancing Test and Validation Across the Systems Engineering Lifecycle
Author:
Muhammad F. Islam, Ph.D. MITRE and George Washington University; Co-Authors:
Tomi Esho, Ph.D. and
Jyotirmay Gadewadikar, Ph.D. MITRE
Modern systems engineering efforts often have to deliver detailed test and validation outcomes under strict timelines with limited resources. This can result in insufficient testing efforts, release delays, production issues or even critical failures after launch. This research proposes an AI-enabled framework that automates requirements-based test case generation while keeping human test engineers in the decision loop. First, a set of requirements is provided to a large language model (LLM)-based AI framework and to test engineering subject matter experts, each of whom generates test cases independently. The resulting test cases are then evaluated blindly using a common set of criteria: traceability to requirements, reproducibility, objectivity of expected results, precision of test data, and robustness under boundary and non-standard conditions. This enables a comprehensive assessment of how AI-generated tests perform relative to human-developed tests. The proposed framework aims to support test engineers performing rapid test and validation activities, while also providing empirical evidence on where it is effective and where it can be improved. This research seeks to reduce manual testing efforts, enable rapid delivery, and minimize failed test efforts for mission-critical systems in defense, aerospace, and other high-assurance domains.
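The blind comparison described above can be sketched as a small scoring script. This is an illustrative assumption, not the authors' implementation: the 1-to-5 scale, the anonymized case IDs, the `summarize` function, and the sample ratings are hypothetical. Reviewers score cases by ID only, and the ID-to-source key is applied afterwards.

```python
from collections import defaultdict
from statistics import fmean

# Evaluation criteria named in the abstract.
CRITERIA = ("traceability", "reproducibility", "objectivity", "precision", "robustness")


def summarize(ratings: dict[str, dict[str, int]],
              key: dict[str, str]) -> dict[str, dict[str, float]]:
    """Average blind ratings per criterion for each test case source."""
    buckets: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
    for case_id, scores in ratings.items():
        source = key[case_id]  # unblinding happens only here, after scoring
        for criterion in CRITERIA:
            buckets[source][criterion].append(scores[criterion])
    return {src: {c: fmean(vals) for c, vals in by_c.items()}
            for src, by_c in buckets.items()}


# Hypothetical blind ratings on a 1-5 scale, keyed by anonymized case ID.
ratings = {
    "T1": dict(traceability=5, reproducibility=4, objectivity=4, precision=3, robustness=2),
    "T2": dict(traceability=4, reproducibility=4, objectivity=5, precision=4, robustness=4),
    "T3": dict(traceability=5, reproducibility=4, objectivity=4, precision=3, robustness=4),
    "T4": dict(traceability=4, reproducibility=4, objectivity=3, precision=4, robustness=4),
}
# Key held separately from the reviewers: which source produced each case.
key = {"T1": "llm", "T2": "sme", "T3": "llm", "T4": "sme"}

summary = summarize(ratings, key)
for source, scores in summary.items():
    print(source, scores)
```

Keeping the key separate from the ratings is the design point: reviewers cannot tell which cases came from the LLM, so per-criterion differences reflect the test cases rather than reviewer expectations.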
Integrating AI-Based Computer Vision into Military Test & Evaluation
Author:
Steve Seiden, Acquired Data Solutions Inc.
The Department of War (DOW) is increasingly evaluating artificial intelligence as an enabling capability within complex weapon, vehicle, and sustainment systems. In the field of vehicle maintenance, most technicians must perform manual visual inspection to identify and diagnose instances of wear, corrosion, or damage. Manually identifying issues requires both time and experience on the part of the maintainer, and diagnostic errors or repair delays may affect equipment availability and readiness. Augmenting historically manual inspections with AI-based computer vision represents a means to enhance the accuracy, reliability, and repeatability of vehicle wear analysis and damage detection, while increasing maintainer productivity by reducing the time required to identify and diagnose issues.
This abstract presents a technical approach for incorporating AI-enabled perception systems into military Test & Evaluation (T&E) environments, using an AI-based computer vision application for ground vehicle tread wear analysis as an illustrative example (through a Cooperative Research and Development Agreement (CRADA) with DEVCOM at Picatinny Arsenal). While the system discussed focuses on track pad analysis, the same AI-based computer vision approach may be applied to a wide range of visual assessment driven maintenance processes, including powertrain wear analysis, corrosion detection, and damage assessment.
This presentation will discuss a repeatable framework for developing, integrating, and evaluating AI-enabled computer vision systems within military T&E pipelines. By treating AI capabilities as measurable, instrumented components of a test architecture, the approach supports defensible assessment of autonomy-adjacent technologies, AI-assisted maintenance, and future human-machine teaming concepts. This framework provides a pathway for transitioning AI from experimental capability to testable, certifiable military systems.
Audience Benefit: Attendees will gain practical insight into how AI-enabled computer vision can be structured, measured, and validated within existing military T&E processes. The presentation equips T&E professionals, engineers, and program stakeholders with a defensible framework for evaluating AI-assisted maintenance technologies, reducing risk in AI adoption, improving test repeatability, and accelerating the transition of AI capabilities from prototype to operationally relevant systems that directly impact readiness and sustainment outcomes.
Toward an Integrated T&E Framework for AI-enabled Systems: A Conceptual Model
Author:
Karen O’Brien, Modern Technology Solutions, Inc.
The classic DoW T&E paradigm (Operational Effectiveness-Suitability-Survivability-Safety) benefits from 40 years of formalism and refinement and has produced numerous specialized testing disciplines (e.g., the -ilities), governing regulations, and rigorous analysis procedures. T&E of AI-enabled systems is still new, and the laboratory of ideas is a constant source of new testing options, the majority of which focus on performance. The classic gap between measures of performance and measures of effectiveness is very much present in AI-enabled systems, and the overemphasis on performance testing means we might miss the (effectiveness) forest for the (performance) trees. Stepping back from confusion matrix performance metrics reveals a much larger landscape of evaluation issues that derive from various policy sources, issues such as safety, transparency, and robustness that are at risk of being overlooked.
Borrowing from the classic “integrated survivability onion” conceptual model that made T&E within Systems of Systems tractable, we propose a set of integrated and nested evaluation questions for AI-enabled systems that covers the full range of classic T&E considerations, plus a few that are unique to AI technologies within the military operational environment and implied by the DoW Responsible AI Guidance. All requirements for rigorous analytical and statistical techniques are preserved and new opportunities to apply test science are identified. We hope to prompt an exchange of ideas that moves the community toward filling significant T&E capability gaps, especially the gaps between performance and effectiveness, and advancing test activities across the nested hierarchy.
Bridging MBSE and Test Execution Using Digital Twins for Early Test and Evaluation Performance Assurance
Author:
Emily Mills, Ph.D., Deputy Director RDT&E, Design Interactive, and Co-Authors
Dillon Gilbert and
Lane Odom, et al., Intuitive Research and Technology Corporation
Modern test and evaluation (T&E) environments face mounting challenges as autonomous, AI-enabled systems increase in complexity and operational risk. Traditional test processes cannot provide the timely, actionable insights needed to validate system performance and assure mission readiness before deployment. Despite mandates for Department of War organizations to adopt Model-Based Systems Engineering (MBSE), practitioners struggle to transform MBSE artifacts into operational T&E value. Models often remain static documentation rather than living, authoritative sources that drive simulation, test planning, and risk-informed decision-making. This disconnect between model creation and downstream utility limits digital engineering impact and leaves T&E communities unable to validate expected behaviors or provide early performance assurance across operational contexts.
A new digital pipeline is presented which demonstrates the art of the possible with respect to MBSE models, digital twin assets, and virtual range operations. The framework transforms MBSE artifacts into executable simulation assets, allowing modelers, planners, analysts, and safety engineers to validate system designs, verify requirements, and assure operational readiness before committing physical resources. By integrating model-driven scenario construction with virtual range environments, the ecosystem provides repeatable validation workflows that identify performance gaps, assess mission-critical vulnerabilities, and assure compliance with safety and operational constraints throughout the testing lifecycle.
The framework comprises several integrated components: (1) automated MBSE model ingestion that translates standardized system representations into validated simulation assets, (2) scenario development and orchestration tools that support both MBSE-driven and human-authored scenario definitions, (3) a data integration and analytics layer capable of synchronizing geospatial, environmental, sensor, and test-relevant data, (4) a simulation integration layer (e.g., connecting to AFSIM, JSE, Ansys), and (5) a visualization environment for executing, analyzing, and reviewing simulated and real test outcomes. Embedded analytics provide interpretable assessments of risk, coverage, and cost, while establishing a foundation for advanced capabilities such as AI-enabled anomaly detection, behavior deviation scoring, optimization, explainability indicators, and verification of decision pathways.
The technical implementation addresses critical operational considerations including data provenance for validation traceability, data schema alignment for consistent performance metrics, automated verification workflows, and enterprise integration patterns. Advanced capabilities will enable AI-driven anomaly detection for validating edge-case behaviors, deviation scoring to assure alignment with expected performance envelopes, and explainability indicators to build trust.
This framework demonstrates how MBSE can evolve from documentation to become the authoritative foundation for scalable, continuous validation and performance assurance. By connecting digital engineering artifacts directly to executable test environments, organizations gain earlier verification of system behaviors, quantifiable assurance evidence, and reduced risk in operational deployment. The open architecture foundation supports integration of AI-enabled validation tools while maintaining human-centered assurance workflows, positioning T&E communities to confidently validate and assure increasingly complex autonomous systems before mission-critical deployment.
VENUE & HOTEL ACCOMMODATIONS
Location
HELIX, Booz Allen’s Center of Innovation
901 15th St NW, Washington, DC 20005
The program will be held on the 1st floor.
Parking
We strongly encourage you to avoid parking at The Helix. Parking at the office building is limited and extremely difficult to navigate. We urge you to use Uber, Lyft, or a taxi, or to take the Metro (The Helix is 1 block from McPherson Square station, and Farragut North station is a short 0.4-mile walk). If you decide to drive, please note that the IMPARK Garage is located at the Helix address, charges a daily fee, and closes at 7 PM.
Hotel Accommodations
Hotel accommodations should be made on your own to fit your budget.
Timeline – Guide
We expect the following: 8:00 AM Registration Opens | 9:00 AM Program Begins | Tuesday Networking Reception | Wednesday Program Concludes at 5:00 PM – Subject to change
SPONSOR
Sponsorship opportunities are available to highlight your organization before and during the event. From small businesses with 10 or fewer employees to our biggest industry leaders, our price points and return on investment will be just what you need to gain visibility within the T&E community.
Levels of Sponsorships
$2,500 | $1,000 | $500
Lunch and Reception $3,000
Breaks $1,500
Application and Benefits
Questions? Contact Jenna Reza [jenna@itea.org]
REGISTRATION PRICING
ITEA Member and Full-time Government and Active-Duty Military $350
ITEA Non-Member $450
Category: Speaker/Presenter/Participant $250
Early Career Professional (< 5 years T&E) $350
Full-time Student $95