DECEMBER 2025 | Volume 46, Issue 4


Advancing DOD Test & Evaluation Through a System Profile

Jake H. Kurzhals

Captain, USAF. Flight Test Engineer, 28 TES, Eglin AFB, FL

John M. Colombi, Ph.D.

Professor of Systems Engineering, Air Force Institute of Technology, Wright-Patterson AFB, OH

David R. Jacques, Ph.D.

Professor of Systems Engineering, Air Force Institute of Technology, Wright-Patterson AFB, OH

Jordan L. Stern

Major, USSF. Assistant Professor of Systems Engineering, Air Force Institute of Technology, Wright-Patterson AFB, OH

DOI: 10.61278/itea.46.4.1003

Abstract

The Department of Defense (DoD) relies on rigorous Test and Evaluation (T&E) processes to ensure that defense systems meet operational requirements. However, traditional T&E remains largely document-centric, making it difficult to trace technical measures, system requirements, test plans, and test results across Developmental Test & Evaluation (DT&E) and Operational Test & Evaluation (OT&E). While Model-Based Systems Engineering (MBSE) has been applied to system design, its use in integrated testing remains limited. This research addresses this gap by developing a parsimonious T&E profile and schema. The work identified common T&E elements from key DoD test documents and policy, then incorporated them into a Systems Modeling Language (SysML) profile. The profile was applied to an academic Unmanned Air System (UAS) flight test program to assess its effectiveness in capturing, organizing, and analyzing these digital test models. Results demonstrated that MBSE can provide a centralized, authoritative source of truth, enhance traceability between system requirements and test results, and support all identified use cases for test planning and execution.

Keywords: Digital Engineering (DE), Test Planning, SysML, T&E Profile, Model-based Testing

Introduction

Testing and evaluation (T&E) is a critical part of the acquisition process, ensuring that systems meet performance, suitability, and mission requirements. Testing includes the verification and validation of systems throughout the lifecycle, from early developmental testing to later operational testing. The fundamental purpose of T&E is to enable the DoD to acquire systems that support the warfighter in accomplishing their mission (OSD, 2020b).

Model-Based Systems Engineering (MBSE), a core practice of Digital Engineering (DE), is the formalized application of modeling to support system requirements, design, analysis, verification, and validation activities beginning in the conceptual design phase and continuing throughout development and later life cycle phases (INCOSE, 2023). Compared to traditional document-based approaches, MBSE purports to improve communication, collaboration, and decision-making throughout the system development lifecycle (Henderson & Salado, 2020). A model can store, organize, trace, and communicate stakeholder needs and system requirements. Additionally, MBSE can be used to conduct functional decomposition of behaviors and structure, identify physical components, and run trade studies and other analytics, all tracing back to requirements. For testing, MBSE can aid in requirements management and traceability, scenario development, test planning and execution, data analysis and trade studies, and more (Friedenthal et al., 2015a).

The Department of Defense (DoD) has been transitioning to digital engineering since 2018, a shift reflected more recently in acquisition instructions DoDI 5000.88 and DoDI 5000.97 (OSD 2018; OSD 2020a; OSD 2023). Digital Materiel Management (DMM) has further broadened DE beyond the use of models for engineering to all acquisition functions, including test (DAF 2023). Recent applications of digital techniques have focused on creating a model-based Test and Evaluation Master Plan (TEMP), using reference architectures and ontologies, and integrating existing profiles. To further address this transition, this article focuses on a reduced-order profile to capture, integrate, and trace test planning and execution for specific use cases.

Background – Integrated Testing across the Lifecycle

Many DoD programs experience timeline and cost increases because technical risks are not addressed early through prototyping. As a result, early operational testing often identifies issues too late in the process. Long program development cycles frequently result from the pursuit of highly ambitious technical capabilities combined with a program management framework lacking appropriate mechanisms for identifying and reducing technical risk (Van Atta, 2013). Test and Evaluation activities are designed to provide information that reduces uncertainty about system performance, effectiveness, and suitability, which in turn reduces technical risk (Bjorkman et al., 2013). However, T&E across the DoD faces several challenges in accomplishing this mission as increasingly complex weapon systems require testing.

These challenges include the increasing pace of T&E, which demands greater integrated testing; managing multiple stakeholders’ requirements for integrated testing; increasing the use of modeling and simulation to inform system development and test plans; and managing the growing amount of data collected during test (Pool, 2021). The DoD must change its T&E approach to respond to these challenges. The Defense Science Board (2024) identified digital engineering as a key enabler of this much-needed change.

The DoD executes DT&E to manage and reduce risks during development, verify that products are compliant with contractual and technical requirements, prepare for OT, and inform decision-makers throughout the program life cycle (OSD 2020b). DT&E results verify exit criteria to ensure adequate progress is made before further investment commitments or the initiation of the next phase. OT&E, on the other hand, tests systems and equipment under realistic conditions to assess their ability to meet user needs (USC 2024). Since OT&E occurs late in the lifecycle, the OT&E community must often demonstrate that its test events and test points trace to mission requirements and do not duplicate related DT&E activities. There has been a concerted effort to save time through integrated (DT&E/OT&E) testing.

The latest concept of integrated testing is termed Test and Evaluation as a Continuum (TEaaC); it strives to involve OT&E earlier in the system lifecycle by combining mission engineering (ME), systems engineering (SE), and test and evaluation (T&E) into parallel, collaborative, and combined efforts. The idea is that increasing collaboration among these three fields will increase common understanding, especially for the design iterations and testing activities needed to validate mission requirements. For example, mission threads and mission engineering threads identified during early ME could be captured in the system model and used for DT and OT planning (OUSD, 2023). The intent is that this process will lead to early and consistent focus on achieving the required mission performance, enabled through digital system models (Collins & Senechal, 2023).

Research applying or creating models for verification, validation, and T&E has been conducted for several years (Bjorkman, Sarkani & Mazzuchi, 2013). Cook & Schindel (2017) applied system patterns for verification, while Morkevicius et al. (2023) used SysML alone without any test-related extensions. Walker & Borky (2020) incorporated MBSE into the test process to promote the organization and structure of test artifacts. Odom et al. (2021) created a profile extending the SysML language to capture relevant test artifacts and technical measures using DoD terminology, but built on top of the UML Testing Profile 2 (UTP 2) (OMG, 2020). Recent developments by Arndt et al. (2023, 2025) have focused on a model-based Test and Evaluation Master Plan (TEMP) reference architecture (MB-TEMP-RA) to guide program-specific test modeling. The reference architecture addresses a variety of stakeholders and their needs, incorporating several related test modeling efforts. Their approach is guided by “compatibility with as many of these efforts as reasonably possible,” including existing meta-models, ontological representations, and compatibility with UTP 2 and the Unified Architecture Framework (UAF) (Arndt et al., 2025; Gregory & Salado, 2024). This type of encompassing meta-model can address many more scenarios for larger, more complex programs of record; UTP 2.1 alone has 126 elements and 79 relationships. A gap remains in the research: demonstrating both real and simulated test results, using DoD test planning terms, and applying a simple model-based test planning schema. Thus, this article focuses on a small set of test-related use cases (scenarios) with minimal SysML extensions using DoD terminology.

Methodology – A T&E Profile

This section describes how the T&E profile and schema were created in SysML to support and enhance typical DoD Test and Evaluation products, processes, and strategies. First, a use case diagram and specifications were generated to outline what any MBSE effort needs to achieve to provide value to users. Figure 1 displays the use case diagram. The T&E Model is intended to integrate with (extend or use) a System Model. As indicated by its name, a System Model is developed to design the system, capture requirements, system behavior and functionality, structure, interfaces, and possibly parametric performance estimates. The T&E model leverages this design information to focus test planning during the design phase.

A profile extends a reference metamodel with new stereotypes (Friedenthal, Moore, & Steiner, 2015b). The schema provides a simple framework for utilizing the T&E profile elements and enables users to best execute their test-related use scenarios. To develop this testing profile and schema, previously generated profiles were examined, including the UML Testing Profile (UTP 2) by the Object Management Group (OMG), the DoD testing profile (UTP-D) by Odom et al. (2023), and the Deloitte Test Framework. This T&E profile aims to be a simplified version of these other digital meta-models, which can be complex to understand and implement. Furthermore, this test profile employs the same terminology used in test policy and instruction, facilitating easier adoption by the community.

Figure 1. DOD T&E Use Cases

DOD Modeling Elements

The Test and Evaluation Master Plan (TEMP) is a document that describes the overall structure and objectives of the T&E program and articulates the necessary resources to accomplish each phase (OSD, 2020b). The common elements from the TEMP and testing community that are captured in the profile are: Critical Operational Issues (COIs), Objectives, Measures of Effectiveness (MOEs), Measures of Performance (MOPs), Measures of Suitability (MOSs), Technical Performance Measurements (TPMs), Key Performance Parameters (KPPs), Key Systems Attributes (KSAs), Critical Technical Parameters (CTPs), Test Requirements, Test Events, Test Cards, and Test Points.

DOD T&E Profile Overview

The T&E profile has two focus areas – Technical Measures, and Test Planning and Design. The Technical Measures section contains the stereotypes to define, assess, and satisfy all measures that guide the test activity. Figure 2 displays the custom stereotypes. Several of these Measures have the same attribute structure. Attributes are the properties (related data) that are stored with each Measure.

The attributes are:

1. Rating
2. Objective
3. Threshold
4. CurrentValue
5. DateUpdated

The attributes are inspired by Odom’s test profile (Odom et al., 2023), which has similar attributes for KPPs. The Threshold and Objective attributes express the minimum acceptable level of performance (threshold) and the desired level of performance (objective), respectively.

The Technical Measures section also contains enumerations of the ratings for each technical measure, as shown at the bottom of Figure 2. The rating options for COI, Objective, MOE, MOP, and MOS are sourced from an Air Force Test and Evaluation Squadron (TES) report scoring standard and use the same color scheme when the legends are applied to diagrams and tables. The TPM Rating was created as an enumerated list containing the following options: Threshold Met, Objective Met, and Threshold Not Met.
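
To make these attributes and ratings concrete, the following minimal Python sketch (illustrative only, outside any SysML tool; class, field, and helper names are notional rather than profile elements) shows how a measure carrying Rating, Objective, Threshold, CurrentValue, and DateUpdated attributes could be scored against the TPM rating enumeration. The example objective value and date are assumed; only the 15-minute Endurance threshold comes from the article.

    from dataclasses import dataclass
    from datetime import date
    from enum import Enum

    class TPMRating(Enum):
        """Enumerated TPM ratings from the profile."""
        THRESHOLD_MET = "Threshold Met"
        OBJECTIVE_MET = "Objective Met"
        THRESHOLD_NOT_MET = "Threshold Not Met"

    @dataclass
    class TechnicalMeasure:
        """Illustrative stand-in for a profile Measure (e.g., a TPM)."""
        name: str
        objective: float                 # desired level of performance
        threshold: float                 # minimum acceptable level of performance
        current_value: float | None = None
        date_updated: date | None = None
        rating: TPMRating | None = None

        def update(self, value: float, when: date, higher_is_better: bool = True) -> TPMRating:
            """Record a new CurrentValue/DateUpdated and derive the Rating."""
            self.current_value, self.date_updated = value, when
            meets = (lambda v, lim: v >= lim) if higher_is_better else (lambda v, lim: v <= lim)
            if meets(value, self.objective):
                self.rating = TPMRating.OBJECTIVE_MET
            elif meets(value, self.threshold):
                self.rating = TPMRating.THRESHOLD_MET
            else:
                self.rating = TPMRating.THRESHOLD_NOT_MET
            return self.rating

    # Example: the Endurance TPM-5 15-minute threshold; the 20-minute objective
    # and the date are assumed values for illustration.
    tpm5 = TechnicalMeasure("Endurance TPM-5", objective=20.0, threshold=15.0)
    print(tpm5.update(14.0, date(2024, 5, 1)))   # TPMRating.THRESHOLD_NOT_MET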

The Test Planning and Design section of the T&E profile contains stereotypes (Figure 3) that can capture one or more Test Plans, Test Events, Test Cards, Test Points, Test Observations, Test Requirements, and the Design of Experiments statistical (STAT) methodology. This section, like the first, uses attributes and defines a few enumerations for Status, Result, and Priority.

Figure 2. DoD T&E Profile, Set of Technical Measures and Ratings

Figure 3. DOD T&E Profile, Set of Test Planning and Design Elements

The creation of the Test Plan stereotype allows all relevant planning to be traced to it through the schema. The Test Event stereotype has attributes that capture aspects of an event’s schedule, location, and status, which are essential for planning. The Test Card has attributes that capture the objective of the test, its status, who ran it (test director), comments on the test, which Test Event it is part of, and which Test Points should be run. The Test Point contains key information such as the date, the configuration of the test article, the status, the result, the response(s), notes, and any test observations. The Test Observation captures information about an issue observed during testing: the configuration of the test article, who is responsible for fixing the issue, the priority level for the fix, the solution, the root cause, the fixed configuration, and the date closed. Tables 2, 3, 4, and 5 provide examples and interrelations of Test Events, Cards, Points, and Observations.
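
As a complement to Figure 3, the sketch below is a hedged Python analogy (not the profile itself; all names are notional) of how the Test Event, Test Card, Test Point, and Test Observation stereotypes and their attributes nest, including the Status, Result, and Priority enumerations with the defaults described later in this paper.

    from dataclasses import dataclass, field
    from enum import Enum

    class Status(Enum):
        NOT_STARTED = "Not Started"
        IN_PROGRESS = "In Progress"
        COMPLETE = "Complete"
        CANCELED = "Canceled"

    class Result(Enum):
        NOT_RECORDED = "Not Recorded"
        PASS = "Pass"
        FAIL = "Fail"
        INCONCLUSIVE = "Inconclusive"

    class Priority(Enum):
        HIGH = "High"
        MEDIUM = "Medium"
        LOW = "Low"

    @dataclass
    class TestObservation:
        """Issue identified during testing, traced to a Test Point."""
        description: str
        priority: Priority
        responsible: str
        configuration: str                       # test-article configuration when observed
        root_cause: str | None = None
        solution: str | None = None
        fixed_configuration: str | None = None

    @dataclass
    class TestPoint:
        """Single executable condition; defaults mirror the profile's enumerations."""
        name: str
        configuration: str
        response: str | None = None
        result: Result = Result.NOT_RECORDED
        status: Status = Status.NOT_STARTED
        observations: list[TestObservation] = field(default_factory=list)

    @dataclass
    class TestCard:
        """Behavior that is performed/executed; verifies measures and requirements."""
        objective: str
        test_director: str
        points: list[TestPoint] = field(default_factory=list)
        status: Status = Status.NOT_STARTED

    @dataclass
    class TestEvent:
        """Central planning element aggregating cards, schedule, and resources."""
        name: str
        location: str
        planned_start: str
        cards: list[TestCard] = field(default_factory=list)
        resources: list[str] = field(default_factory=list)
        status: Status = Status.NOT_STARTED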

Many DT&E and OT&E organizations use the term Test Requirement. This element can capture constraints on how a test shall be conducted, or various safety, interoperability, or cyber constraints on the test. Finally, the STAT Methodology stereotype captures the statistical techniques used to sample the test space efficiently. Design of Experiments (DOE) is a statistical methodology for planning, conducting, and analyzing a test that is commonly used by the DoD test community. DOE helps optimize test resources while maximizing the quality of insights gained from testing. DOE is modeled as an element, shown in Figure 3, with the following attributes:

  • Coded Factor,
  • Level, and
  • Block.

To use this modeling artifact, the Design of Experiments (DOE) stereotype is applied to the relevant Test Points in the containment tree, and then its attributes are filled in accordingly. Thus, Test Points are related and justified by the factors and levels using DOE.
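
As one hedged illustration of how the DOE attributes (coded factor, level, block) can justify a set of Test Points, the short Python sketch below enumerates a full-factorial design over the two factors used later in this article (Altitude at 8 m and 10 m, Velocity at 1 m/s and 2 m/s). The -1/+1 coding convention, the block assignment, and the helper names are assumptions for illustration, not prescribed by the profile.

    from itertools import product

    # Factors and levels drawn from the AFIT UAS example; block assignment is illustrative.
    factors = {"Altitude_m": [8, 10], "Velocity_mps": [1, 2]}

    def full_factorial(factors: dict[str, list[float]], block: str = "Block 1"):
        """Yield one DOE-justified test point per factor/level combination,
        with conventional -1/+1 coded factors for a two-level design."""
        names = list(factors)
        for combo in product(*factors.values()):
            coded = {n: (-1 if v == min(factors[n]) else 1) for n, v in zip(names, combo)}
            yield {"levels": dict(zip(names, combo)), "coded": coded, "block": block}

    for i, tp in enumerate(full_factorial(factors), start=1):
        print(f"TP-{i}: {tp['levels']}  coded={tp['coded']}  {tp['block']}")
    # Four combinations result, each traceable to the DOE element that justifies it.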

T&E Schema Overview

The T&E schema was developed to convey the implementation of the DoD Test Profile. The schema enables the user to satisfy the profile’s use cases and displays some of the benefits of using MBSE for test and evaluation. Figure 4 displays the schema relationships. The schema relationships enable traceability from the technical measures in the test plan to the Tests and then to system requirements. We rely heavily on custom Tables that follow schema relationships.

In practice, the responsible test engineer generates the necessary technical measures and relationships within the model. Test planning follows the creation of technical measures. All technical measures and the system requirements are added to the Test Planning Requirements Traceability V&V Matrix (RTVM). Next, the test engineer captures the Test Events information in a Test Event Traceability table. A Test Event becomes a central element, which aggregates much of the other planning information, such as Test Cards and Test Resources. Test Resources will vary per Test Event; general categories of Test Resources are provided in the profile, as recommended by Test and Evaluation Master Plan (TEMP) policy. Naturally, the size and complexity of the program determine the number of Test Events needed to adequately assess the system.
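
This workflow can be pictured as a set of traceability relationships maintained in the model. The following illustrative Python sketch (element names are hypothetical except TC-4 and Endurance TPM-5, which appear later in the article) shows the essence of an RTVM: a mapping from each technical measure or system requirement to the Test Cards that verify it, from which a matrix view can be generated.

    from collections import defaultdict

    # «verify» relationships: (test card, verified measure or requirement).
    verify_relations = [
        ("TC-1 Raster Pattern Mission", "Endurance TPM-5"),
        ("TC-1 Raster Pattern Mission", "SR-12 Rooftop survey requirement"),  # assumed requirement name
        ("TC-4 M&S Endurance", "Endurance TPM-5"),
    ]

    def build_rtvm(relations: list[tuple[str, str]]) -> dict[str, list[str]]:
        """Invert the «verify» relations into measure/requirement -> verifying Test Cards."""
        rtvm: dict[str, list[str]] = defaultdict(list)
        for card, target in relations:
            rtvm[target].append(card)
        return dict(rtvm)

    for target, cards in build_rtvm(verify_relations).items():
        print(f"{target}: verified by {', '.join(cards)}")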

Figure 4. T&E Schema

Test Cards are easily created and edited in a custom Table. Test Cards typically contain information such as overall test objective, preconditions, postconditions, an abbreviated sequence of steps, expected results, and test points. The Test Cards are related to the objectives of the test plans. In the T&E profile, Test Card is modeled as a type of behavior, i.e., something that is performed or executed. It also includes a test objective, set of preconditions, inputs, and expected results. A Test Card verifies the technical measures and system requirements. The «verify» relation already exists in SysML. A «validate» relationship could easily be created if more of a distinction is desired by the user but was not necessary for this original work.

Factor(s) can be added as additional attributes of a Test Card and can be traced to the appropriate DOE element. Such factors are not shown in Figure 3 since not all Test Cards are driven by DOE. Added attributes can also be used for any independent test variables. Test Points were modeled as instances of Test Cards; therefore, any attributes added to Test Cards as factors are now attributes of each Test Point. Filling out these attributes sets the level of Factor(s) to be collected for the test. Test Points can then be shown on a custom Test Point Matrix. The use of Tables and Matrices to show large amounts of data and traceability is common in MBSE, as opposed to showing lots of complex test data graphically on diagrams. Custom tables capture elements and attributes, and follow relationships between elements using simple or meta-chain navigation. The Test Point Matrix is a custom table designed to capture key information on Test Points and related data. It is expected that many tables used today in documents become live views in the model.

The Response column captures the numerical value(s) and/or qualitative response(s) of the Test Points. The Configuration column is meant to record the instance of the system model used for the Test Point and allows for configuration management of test articles. Tracking configurations of test articles is a crucial element for minimizing duplication of effort between DT&E and OT&E. Since OT&E is obligated to use data from production-representative equipment, this table tracks test configurations, and allows the model to refer to a specific system instance. Ideally the system model has versions, configurations or variation captured, which is typical for physical structures. Note: configuration and variability are more explicit in the SysML 2.0 standard, with the ability to capture variations, variants/variation points, and associated constraints. Tracking test article configuration supports decisions on valid use of data across integrated DT/OT.
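
A hedged sketch of the configuration-tracking idea: because each Test Point records the system-model instance used, one can query which DT results were collected on a production-representative configuration and are therefore candidates for OT credit. The instance names, the production_representative flag, and the test point identifiers below are assumptions for illustration.

    # Hypothetical registry of system-model instances (test-article configurations).
    configurations = {
        "UAS v1.0": {"production_representative": False},
        "UAS v1.1": {"production_representative": True},
    }

    # (test point, configuration instance) pairs as recorded in the Test Point Matrix.
    executed = [("TP-1", "UAS v1.0"), ("TP-2", "UAS v1.0"), ("TP-7", "UAS v1.1")]

    # Which DT results could be offered for OT credit based on configuration?
    ot_candidates = [tp for tp, cfg in executed
                     if configurations[cfg]["production_representative"]]
    print(ot_candidates)   # ['TP-7']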

The Result column reflects the qualitative assessment of the Test Points and can be one of the following enumerated options: Pass, Fail, Inconclusive, and Not Recorded. The Status column records the state of Test Point execution and can be one of the following enumerated options: Complete, In Progress, Not Started, and Canceled. By default, the Result and Status columns are set to Not Recorded and Not Started, respectively.
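
Putting these columns together, the short Python sketch below renders a Test Point Matrix view the way a modeling tool would render a table by walking Test Point instances and their attributes. The Result and Response entries echo the Raster Pattern Mission narrative later in this article; the factor-level assignments, configuration names, and formatting helper are assumptions for illustration.

    # Illustrative rows for a Test Point Matrix.
    test_points = [
        {"Test Point": "TP-1", "Altitude_m": 8, "Velocity_mps": 1, "Configuration": "UAS v1.0",
         "Response": "n/a", "Result": "Inconclusive", "Status": "Complete"},
        {"Test Point": "TP-2", "Altitude_m": 8, "Velocity_mps": 2, "Configuration": "UAS v1.0",
         "Response": "14 min", "Result": "Fail", "Status": "Complete"},
        {"Test Point": "TP-3", "Altitude_m": 10, "Velocity_mps": 1, "Configuration": "UAS v1.0",
         "Response": "15+ min", "Result": "Pass", "Status": "Complete"},
    ]

    def render_matrix(rows: list[dict]) -> str:
        """Render the matrix as fixed-width text, column widths sized to the data."""
        cols = list(rows[0])
        widths = {c: max(len(c), *(len(str(r[c])) for r in rows)) for c in cols}
        header = "  ".join(c.ljust(widths[c]) for c in cols)
        lines = ["  ".join(str(r[c]).ljust(widths[c]) for c in cols) for r in rows]
        return "\n".join([header, *lines])

    print(render_matrix(test_points))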

Finally, any Test Observation(s) trace to a Test Point. Test Observations are issues identified during testing. This creates a prioritized list of items that need to be addressed and tracked and assigns responsibility for issue resolution. Tracking the Fixed Configuration provides a record of which configuration resolved the issue, in case later configurations exhibit similar problems. Fixes identified during DT, and really any changes to configuration, should be recorded in the system model as an updated configuration – the authoritative source of truth. The Test Observations Tracker table displays progress made to resolve issues throughout testing. The next section of this paper describes the application of the proposed T&E profile to a real-world flight-test planning and execution cycle.

Applying the DOD T&E Profile

The T&E profile’s utility was validated by applying it to a design and flight test project at the Air Force Institute of Technology (AFIT). During a three-course sequence, students designed, built, and flight tested a small unmanned aerial system (UAS) for building inspection. The developed solution was a COTS-based hexrotor with a camera and gimbal providing the resolution and accuracy to identify and geolocate structural flaws and/or damage. Images from the camera were stitched together using photogrammetry and integrated into an ArcGIS database that captures base infrastructure. The course curriculum already included tasks to develop a SysML model based on a small UAS Reference Architecture.

The first step in using the proposed T&E schema is to develop technical measures for the test program. Figure 5 displays the technical measures for the AFIT UAS prototype program. This small UAS prototype program had 1 COI, 2 MOEs, 2 MOPs, and 8 TPMs. The Technical Measures BDD is able to represent a typical test report in model form, though most system modeling tools have powerful report generation to produce highly formatted documents.

Figure 5. AFIT UAS Prototype Program Technical Measures

The technical measures that were created, together with the relevant system requirements already in the system model, are added to the Test Planning Requirements Traceability V&V Matrix (RTVM). Note that the system requirements added to the RTVM are verified/validated (V&V) through test, analysis, or demonstration. See Table 1 for an extract of the full matrix. The RTVM captures which test card(s) verify each technical measure and system requirement. It can also trace which test points support the rating of each technical measure, allowing stakeholders to make informed decisions. The RTVM was adapted to display Rating, Objective, Threshold, CurrentValue, and DateUpdated for the appropriate technical measures. The table format provides a quick and accurate way to display information already in the model, a view that is not readily replicated in paper-based test planning.

The next step is to create and model Test Events, seen in Table 2. The Test Event Traceability table contains three Test Events. The first two rows (the AFIT Flight Test and Model and Simulation 1 Test Events) are more DT&E focused, while the Rooftop Flight Test is the final, culminating OT&E Test Event.

Table 1. Example of Requirements Traceability Verification Matrix (RTVM)

Table 2. Exemplar of Test Event Traceability Table

Note that the Model and Simulation 1 Test Event captures runs of the system model’s Parametrics. Parametric diagrams and Constraints are defined in SysML to allow simulations and execution of governing equations integrated with system values. Most modern MBSE tools allow execution of mathematics, coding languages, or integration with external simulations.

Stakeholders can view this table and see if all the test events have started relative to the planned start date. This exemplar suggests all the testing is going according to the planned schedule for the AFIT UAS prototype program. Additionally, the Test Resources required for each Test Event are captured in this table.

The next table and set of information in the test planning process captures Test Cards. See Table 3, which contains five Test Cards. The Test Cards capture essential testing details that are typically recorded on the document-based AFIT Form 5028, such as the test objectives, the factor(s) being tested, and the test points. This is another example of how the model can represent test planning documentation that is already in use, but more effectively traced to other information. For example, the Test Cards are the link that verifies the technical measures and relevant system requirements against the planned/executed testing and test point results. This traceability benefits the stakeholders’ decision-making process since the requirements and results are no longer in separate documents but contained in the model with the appropriate relationships.

Table 3. Example Test Card Table

The Test Points are also captured in a Test Point Matrix. The exemplar matrix for the AFIT UAS prototype program contained 17 Test Points; Table 4 shows a subset of these. The Test Points support test planning by capturing the two factors (Altitude and Velocity) at two levels each. In this case, the Altitude levels were 8 meters and 10 meters, while the Velocity levels were 1 m/s and 2 m/s. The Raster Pattern Mission involved flying a pathway similar to the one shown in Figure 6.

The configuration of the test article is captured as an instance of the representative system model at the time of test, allowing for configuration management. Once the experiment is executed, the test result, status, response(s), and notes are captured for the applicable Test Point. For example, the first Test Point for the Raster Pattern Mission was determined to have an “Inconclusive” result; the Notes explain this was the first flight, so a conservative battery voltage minimum was used for the time to initiate recovery. The first flight built confidence in the system integration, and a more realistic minimum battery voltage was used for the later Test Points. The second flight was determined to fail since the 14-minute flight time was less than the Endurance TPM-5 threshold of 15 minutes. On the other hand, the third Test Point for the Raster Pattern Mission has a “Pass” result since it did meet the 15-minute threshold of Endurance TPM-5. As seen, the Test Point captures critical information from the executed test, providing insight into the system under test’s ability to meet technical measures and system requirements.

Table 4. Example Test Point Table

Figure 6. Example Raster Pattern

Results can also be captured from simulations/parametric diagrams in the Test Point Matrix. For example, the Test Points for the M&S Endurance Test Card can be seen in Table 5. From the M&S Endurance Test Card, the simulation is executed from the M&S Endurance Block. The M&S Endurance Block has the Endurance SysML parametric diagram as an inner element, shown in Figure 7. The Test Points show that the factor is the number of batteries and the two levels are 2 batteries and 3 batteries. Note this test was done before the Raster Pattern Mission as part of the process to determine how many batteries should be in the actual design. Based on the results recorded from the Endurance Parametric and captured in the Test Point Matrix, the 2-battery configuration was selected. The decision was based on the early parametric estimate of a 19.8-minute flight time, which met both the Endurance TPM-5 threshold and the system performance Endurance requirement. Although the 3-battery configuration had a longer flight time, factoring in the Low Cost TPM-8, Low Mass TPM-7, and System Procurement Cost requirement, the 2-battery configuration could meet the requirements at a more desirable cost and mass. While the number of batteries was a factor captured in the Endurance Test Card TC-4 and corresponding Test Points, it was treated as a simple independent variable. However, it could also have been one factor in a DOE analysis capturing other factors and levels, randomization, coverage, and power of the test.

The model captures planning and results both for tests executed outside the model and for parametric diagrams within the system model. It is important to note that the parametric diagram overestimated the 2-battery configuration endurance by several minutes compared to the physical flight testing. This shows that the model can capture data supporting validation of parametric diagram results against physical test results. In this case, the error in the Endurance parametric diagram results is captured in the model as a Test Observation. In practice, flight test data could then be used to calibrate the parametric constraints. While not explicit in the schema, the constraint blocks and constraints could be updated and «traced» to these Test Points and Test Observations. Alternatively, the tests could «refine» the constraints.
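
As a hedged illustration of the kind of constraint the Endurance parametric diagram evaluates, and of how flight data could calibrate it, the Python sketch below uses a simple capacity-over-current endurance relation. The constraint form and every numeric parameter are assumptions chosen only so the nominal 2-battery prediction lands near the 19.8-minute estimate reported above; this is not the actual AFIT parametric model.

    def endurance_minutes(n_batteries: int,
                          capacity_mAh: float = 5200.0,   # assumed per-battery capacity (mAh)
                          avg_current_A: float = 31.5) -> float:  # assumed average current draw (A)
        """Simple endurance constraint: usable charge divided by average current draw."""
        usable_Ah = n_batteries * capacity_mAh / 1000.0
        return usable_Ah / avg_current_A * 60.0

    predicted = endurance_minutes(2)    # ~19.8 min, close to the parametric estimate
    measured = 14.0                     # flight-test result for the 2-battery article
    calibration = measured / predicted  # correction factor a calibrated constraint could apply

    print(f"predicted={predicted:.1f} min, measured={measured:.1f} min, "
          f"calibrated 3-battery estimate={endurance_minutes(3) * calibration:.1f} min")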

Table 5. Example Test Point Table for the M&S Endurance Test Card

Figure 7. Endurance SysML Parametric Diagram

Test Observations are any issues identified during testing. They are captured in the Test Observations Tracker, as seen in Table 6. This tracker creates a prioritized list of items to be addressed and tracked and captures the fixes. Test Observations also record the configuration of the test article when observations are made and when the issues are fixed. The first row of Table 6 (the TO-1 Gimbal mode error Test Observation) provides all the information needed to track which test points had the issue (several in this example), what the issue was, the date reported, and the configuration when the issue was discovered. To manage the fix, the Test Observation captures the priority level, who observed the issue (in case more details are required to understand the observation), and who is responsible for fixing it. Finally, the solution can be captured in the notes section along with the root cause of the problem and the fixed configuration (instance) for tracking purposes. The solution and the fixed configuration should be captured in the system model for reference in case future test observations with a similar root cause appear. This is not necessarily a test profile addition; rather, it is a natural iteration and maturation of the design. The system model should be updated (with a new configuration), and future test cards and/or rerun test points will provide results on the fixed system. As seen in TO-2 (Error in M&S Endurance time), the solution for this issue has not yet been developed, and the lack of progress can be attributed to its low priority.
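
The tracker view can be thought of as a filtered, sorted query over Test Observation elements. The minimal Python sketch below shows that query; the TO-1 and TO-2 issue names and their relative priorities follow the narrative above, while the responsible parties, status values, and configuration names are assumptions for illustration.

    # Observations loosely modeled on Table 6; TO-1 is assumed closed, TO-2 remains open at low priority.
    observations = [
        {"id": "TO-1", "issue": "Gimbal mode error", "priority": "High",
         "responsible": "Flight test team", "status": "Closed", "fixed_configuration": "UAS v1.1"},
        {"id": "TO-2", "issue": "Error in M&S Endurance time", "priority": "Low",
         "responsible": "M&S lead", "status": "Open", "fixed_configuration": None},
    ]

    priority_order = {"High": 0, "Medium": 1, "Low": 2}

    # Prioritized list of open items needing resolution, as the tracker table would display.
    open_items = sorted((o for o in observations if o["status"] == "Open"),
                        key=lambda o: priority_order[o["priority"]])
    for o in open_items:
        print(f'{o["id"]} [{o["priority"]}] {o["issue"]} -> {o["responsible"]}')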

Table 6. Example Test Observations Table

After applying the T&E profile and schema, validation was conducted to ensure the proper relationships were developed in the model and can be traced. Figure 8 shows:

  • Technical measures and system requirements that are verified by a Test Card,
  • The Test Card’s relation to a Test Event,
  • Test Resources traced to the Test Card and required for the testing,
  • Specific Test Points executed from the Test Card, and
  • Test Observations made during test execution.

The profile and schema were able to generate all of these test planning and design elements and their relationships. Figure 8 shows one Test Card and all related information.
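
A final hedged sketch: the validation shown in Figure 8 amounts to traversing every relationship that radiates from one Test Card and confirming nothing is dangling. In Python terms (relationship labels mirror the bullets above; the specific element names, other than the AFIT Flight Test Event and TO-1, are assumed), it is a simple graph walk plus a completeness check.

    # A toy relationship store: (source, relation, target) triples as the schema defines them.
    relations = [
        ("TC-1 Raster Pattern Mission", "verifies", "Endurance TPM-5"),
        ("TC-1 Raster Pattern Mission", "verifies", "SR-12 Endurance requirement"),   # assumed requirement
        ("TC-1 Raster Pattern Mission", "related to", "AFIT Flight Test Event"),
        ("Test Resource: UAS test article", "traced to", "TC-1 Raster Pattern Mission"),
        ("TP-2", "executed from", "TC-1 Raster Pattern Mission"),
        ("TO-1 Gimbal mode error", "observed during", "TP-2"),
    ]

    def related(element: str) -> dict[str, list[str]]:
        """Collect everything one hop away from an element, in either direction."""
        out: dict[str, list[str]] = {}
        for src, rel, dst in relations:
            if src == element:
                out.setdefault(rel, []).append(dst)
            elif dst == element:
                out.setdefault(f"{rel} (incoming)", []).append(src)
        return out

    card_view = related("TC-1 Raster Pattern Mission")
    assert "verifies" in card_view and "related to" in card_view, "schema relationships missing"
    print(card_view)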

Figure 8. Example Model Schema Validation

Conclusion

The complexity of modern defense systems necessitates rigorous Test and Evaluation (T&E) processes to ensure operational effectiveness and mission readiness. However, traditional T&E practices in the Department of Defense (DoD) remain document-centric, leading to challenges in traceability, data integration, and collaboration between Developmental Test & Evaluation (DT&E) and Operational Test & Evaluation (OT&E) communities. While Model-Based Systems Engineering (MBSE) has gained traction in various engineering domains, its application to DoD T&E remains fragmented.

This paper addresses this gap by developing a simple T&E profile and schema, specifically tailored to a limited set of use cases. The profile provides a structured way to model and integrate test planning, execution, and results within a SysML framework, enabling traceability between technical measures, system requirements, and test artifacts (test requirements, test events, test cards, and test plans). The profile was applied to an academic unmanned aerial system (UAS) development effort that used MBSE throughout the prototyping cycle.

By transitioning T&E data from static separate documents to a dynamic model, this research highlights the potential of MBSE to improve test planning efficiency, collaboration, decision-making, and data consistency. The findings contribute to advancing the role of MBSE in DoD testing, and bridging the gap between DT&E and OT&E. This work proposed a simple modeling approach, demonstrating the feasibility of adding another profile (for test) to an existing UAS reference architecture (for design).

Application of model-based testing should proceed immediately, as several mature models exist that should only improve with use. New test profiles and standards will continue to be developed and updated; UTP 2.3 is in beta. SysML 2.0 was published in June 2025 and will drive big changes to model-based testing. The Application Programming Interface (API) and Services specification accompanying SysML 2.0 enables tools to interoperate, so model-based test planning could read, write, and query system models directly.

References

Arndt, C., Anyanhun, A., & Werner, J. S. (2023). Shifting Left: Opportunities to Reduce Defense Acquisition Cycle Time by Fully Integrating Test and Evaluation in Model Based Systems Engineering. Acquisition Research Program.

Arndt, C., Kerr, G., Anyanhun, A., Saunders, R., Borton, D., & Werner, J. (2025). Model Based Test and Evaluation Master Plan Technical Introduction. The Journal of Test & Evaluation, 46(2). https://doi.org/10.61278/itea.46.2.1002

Bjorkman, E. A., Sarkani, S., & Mazzuchi, T. A. (2013). Using model-based systems engineering as a framework for improving test and evaluation activities. Systems Engineering, 16(3), 346–362. https://doi.org/10.1002/sys.21241

Collins, C., & Senechal, K. (2023). Test and Evaluation as a Continuum. The Journal of Test & Evaluation, 44(1). www.itea.org

Odom, T., Colombi, J., & Connell, W. (2023). Developing a DOD Testing Profile: Model-Based Systems Engineering (MBSE) for Test and Evaluation Strategy. IEEE Systems Conference.

Cook, D., & Schindel, W. D. (2017). Utilizing MBSE patterns to accelerate system verification. INSIGHT, 20(1), 32–41. https://doi.org/10.1002/inst.12142

Defense Science Board. (2024). Test and Evaluation. https://dsb.cto.mil/

Office of the Under Secretary of Defense for Research and Engineering (OUSD). (2023). Mission Engineering Guide, Version 2.0.

Department of the Air Force (DAF). (2023). Digital Product Support Vision.

Friedenthal, S., Moore, A., & Steiner, R. (2015). A Practical Guide to SysML: The Systems Modeling Language (3rd ed.). The MK/OMG Press, Elsevier.

Gregory, J. & Salado, A. (2024). An ontology-based digital test and evaluation master plan (dTEMP) compliant with DoD policy. Systems Engineering, 27(6), 1012-1026.

Henderson, K., & Salado, A. (2020). Value and benefits of model-based systems engineering (MBSE): Evidence from the literature. Systems Engineering, 24(1), 51-66.

International Council on Systems Engineering (INCOSE). (2023). INCOSE systems engineering handbook: A guide for system life cycle processes and activities (5th ed.). John Wiley & Sons, Inc.

Morkevicius, A., Aleksandraviciene, A., & Strolia, Z. (2023). System Verification and Validation Approach using the MagicGrid Framework. Insight by INCOSE, 26(1), 51–59.

Object Management Group, Inc. (OMG). (2020). UML Testing Profile 2 (UTP 2).  https://www.omg.org/spec/UTP2/2.1/PDF

Office of the Secretary of Defense (OSD). (2018). Digital Engineering Strategy. https://ac.cto.mil/wp-content/uploads/2019/06/2018-Digital-Engineering-Strategy_Approved_PrintVersion.pdf

Office of the Secretary of Defense (OSD). (2020a). DoDI 5000.88, “Engineering of Defense Systems.” https://www.esd.whs.mil/DD/

Office of the Secretary of Defense (OSD). (2020b). DoDI 5000.89, “Test and Evaluation.” https://www.esd.whs.mil/DD/

Office of the Secretary of Defense (OSD). (2023). DoDI 5000.97, “Digital Engineering.” https://www.esd.whs.mil/DD/

Office of the Secretary of Defense (OSD). (2024). DoDI 5000.98, “Operational Test & Evaluation and Live Fire Test & Evaluation.”

Pool, R. (2021). Key Challenges for Effective Testing and Evaluation Across Department of Defense Ranges: Proceedings of a Workshop. https://doi.org/10.17226/26150

Title 10, U.S. Code § 139: Director of Operational Test and Evaluation (2024).

Van Atta, R. H. (2013). Understanding Acquisition Cycle Time: Focusing the Research Problem. https://apps.dtic.mil/sti/pdfs/ADA606685.pdf

Walker, J., & Borky, J. M. (2020). Test Planning, Documentation, and Impact Analysis with SysML. ITEA, 41, 258–266.

Disclaimer

The views expressed in this document are those of the authors and do not reflect the official policy or position of the United States Air Force, the United States Department of Defense, or the United States Government. The document has been reviewed for release and publication by the 88th Air Base Wing Public Affairs Office.

Author Biographies

JAKE KURZHALS Captain, USAF. Jake is a recent graduate of the Air Force Institute of Technology (AFIT), Wright Patterson AFB, Ohio. He earned his Master’s degree in Systems Engineering with specializations in Test and Evaluation/ DOE, Model-Based Systems Engineering (MBSE), and Autonomous systems. He serves in the United States Air Force as a developmental and test engineer within the 28th Test and Evaluation Squadron, Eglin AFB.

JOHN M. COLOMBI, Ph.D. John Colombi (Lt Col, USAF-Ret) is a Professor within the Department of Systems Engineering and Management at the Air Force Institute of Technology (AFIT), Wright-Patterson AFB, Ohio. He served 21 years of active duty in the US Air Force as a developmental engineer, with a variety of research, engineering, and management assignments. His research interests include Model-based Systems Engineering (MBSE) and simulation, system architecture, trade space exploration, and acquisition research. He is a member of INCOSE and a senior member of IEEE.

DAVID R. JACQUES, Ph.D. Dr. David Jacques (LtCol, USAF-Ret) is a Professor of Systems Engineering at the Air Force Institute of Technology (AFIT). During his 42 years of combined military and civil service he has had assignments spanning tactical missile intelligence analysis, ballistic missile test and evaluation, and research and development of advanced munition concepts. His research interests include Model Based Systems Engineering, architecture based evaluation, multi-objective and/or constrained optimal design, and cooperative behavior and control of autonomous vehicles. Dr. Jacques is a member of INCOSE, and an Associate Fellow of the American Institute of Aeronautics and Astronautics (AIAA).

JORDAN L. STERN, Major, USSF, Ph.D. Jordan Stern is a Program Manager at the Space Rapid Capabilities Office, Kirtland AFB, NM. He previously served as an Assistant Professor within the Department of Systems Engineering and Management at the Air Force Institute of Technology (AFIT). He holds a master’s degree in systems engineering from AFIT and a Ph.D. in systems engineering from the Stevens Institute of Technology, Hoboken, NJ.

ISSN: 1054-0229, ISSN-L: 1054-0229
Dewey Classification: L 681 12
