DECEMBER 2025 | Volume 46, Issue 4


Resource Implications and Benefits of Model-Based Acquisition Planning

Jose L. Alvarado Jr.

AFOTEC Detachment 5, Edwards AFB, California

Dr. Thomas H. Bradley

Colorado State University, Ft. Collins, Colorado

DOI: 10.61278/itea.46.4.1001

Abstract

The Department of Defense (DoD) Test and Evaluation (T&E) community is advancing digital engineering by adopting and developing model-based testing methodologies within the flight test community. This article expands upon existing grey box model-driven test design (MDTD) approaches and incorporates flight test information into Systems Modeling Language (SysML) models. These models leverage model-based systems engineering (MBSE) artifacts to generate flight test plans. Additionally, this article introduces a methodology developed by the Air Force Operational Test and Evaluation Center (AFOTEC) to guide the operational test and evaluation (OT&E) test planning process. This methodology uses MBSE to create, develop, and maintain the information required for test plans within a digital model construct. Two case studies demonstrate the methodology using SysML implemented through an MDTD process and showcase its applicability to different systems under test (SUT). The results include a set of System Usability Survey (SUS) metrics that assess the method’s usability and effectiveness in generating flight test plans from SysML model elements. The paper also discusses the method’s applicability to various scenarios, the benefits of model-based testing, and its relevance in the context of operational flight testing.

Keywords: AFOTEC, Flight Test, MDTD, MBSE, SUS, SysML, T&E, Test Planning.

This article was prepared by Jose Alvarado in his personal capacity and as a student attending Colorado State University. The opinions expressed in this article are the author’s own and do not reflect the views of the Air Force, the Department of Defense, or the United States government.

Previous Works

Over the course of several studies published in this journal, we have progressively advanced the application of model-based test and evaluation (MBT&E) from foundational concepts to practical implementation through three hypothetical case studies. In our initial contribution, “Developing Model-Based Flight Test Scenarios” [1], we established a theoretical framework and developed a structured process by which test scenarios can be systematically derived from system models. This work was subsequently extended in “A Case Study-based Assessment of a Model-driven Testing Methodology for Applicability and Metrics of Model Reuse” [2], which investigated model reuse, demonstrating the feasibility of reapplying model components across different contexts and proposing an approach to quantify their relative value. The present article builds upon this framework by examining the usability of model-based concepts in flight-test planning and assessing their acceptance within the operational test community through a targeted survey.

Building on these prior developments, this study shifts the emphasis from methodological advancement to community engagement and validation. We present and analyze the results of a comprehensive survey conducted among operational testers, aimed at capturing practitioner perspectives on the acceptability and perceived value of MBT&E practices in flight-test applications. By situating these findings within the broader context of our earlier work, this article provides an integrated view of both technical progression and stakeholder perception in the evolving practice of model-based flight-test planning.

This paper is organized as follows. The Background section describes the MBSE framework adopted for this research and integrates relevant findings from prior case studies to establish the technical context for the current investigation. The Methods and Assessment sections delineate the survey design and methodology, including participant demographics, question development, and data collection and analysis procedures. The Discussion section examines the survey outcomes, identifying key trends and insights related to the adoption, applicability, and perceived utility of MBSE methodologies in flight-test planning. The Lessons Learned section characterizes the principal implementation challenges, including initiation of the MBSE process, development of the model architecture, and management of the learning curve associated with model generation and integration. Finally, the Conclusion summarizes the primary findings and offers recommendations to guide future research and the continued maturation of MBT&E practices within the aerospace domain.

Introduction

Flight test plans are developed to “communicate technical details and logistics for executing and reporting the outcomes of flight, ground, and laboratory tests on air vehicles, subsystems, and components” [3]. They establish a connection between test points and specified requirements, ensuring that the SUT aligns with system specifications and delivers the intended performance in the proposed operational environment. These plans cover all aspects of conducting tests, collecting data, and performing post-flight analysis. Test plans capture and communicate a broad set of derived requirements necessary for successfully executing flight tests and are an integral aspect of the DoD acquisition process [4]. The conclusions drawn from the post-flight data analysis are then incorporated into reports to communicate and validate the SUT’s flight test performance [5].

Within the DoD digital engineering-driven acquisition process, this research seeks to develop a model-based testing (MBT) methodology using MBSE concepts. This methodology enables the integration of flight test planning into models created using the Dassault Systemes Cameo Modeler™ software. This article expands on existing model-based grey box testing processes [1] by incorporating flight test information into SysML models. The results of these processes create a set of flight-test-specific digital engineering models that represent the product of a DoD MBSE-enabled product development process. These models are scrutinized to assess their adequacy in providing the information necessary for test planning. The MBT process identifies missing parts or information by reviewing the native digital template and the resulting draft test plan.

The proposed MBT process seeks to allow for the re-population and re-generation of a new version of the test plan when new information, updates, or technology is inserted into the model. The costs and benefits of this process are assessed by testing the MBT methodology’s updateability, consistency, and adaptability across acquisition programs [1]. The flight test plans created and developed using this process contain the information necessary to conduct a flight test evaluation of the SUT. That information populates the three major sections of the test plan: the front matter, the system description and details, and the appendix:

  • The front matter contains all the program information and mission-critical information.
  • The system description describes the overall operations of the system and the significant components that make up the SUT.
  • The appendix contains all the critical operating issues (COI), measures of effectiveness (MOE), measures of performance (MOP), and measures of suitability (MOS) pertinent to the SUT.

Background

The existing flight test planning procedures used by the acquisition process are meticulous and heavily focused on developing documentation, requiring extensive configuration management to keep essential documents up to date with the SUT. Flight testing considerations typically emerge late in the acquisition process, often leading to program delays due to the time needed for test planning, instrumentation, test operations, and analysis. Generally, flight test planning relies on consensus and expertise-driven decision-making, consuming significant man-hours and incurring high costs. Innovative frameworks such as MBSE and its subsets, MBT and MDTD, are designed to improve the state of the art in flight testing, relative to the default document-centric OT&E procedures.

The MBSE methodology comprises a set of interconnected processes, methods, and tools operating within a “model-based” or “model-driven” framework [6]. This approach generates digital models that serve as abstract representations of reality [7], recalling and relating associated data through computer models. These models’ elements can describe various aspects of a physical SUT’s desired structure and behavior, business processes, and requirements. In practice, the set of interconnected state machine diagrams, use-case diagrams, sequence diagrams, and mathematical models allow testers to create independent test cases, scenarios, and plans detached from specific design implementations [8] [9].

Under the MBT paradigm, a subset of the emerging MBSE domain, the model can precisely define test parameters and automatically generate detailed test plans for specific test cases. This approach facilitates testing, investigation, and optimization, reducing the reliance on physical assets. An MBSE testing model enables the representation, communication, and parametric optimization of various test parameters, allowing for reuse without the need to rewrite the entire test plan. Additionally, MBSE improves the management of system complexity by enabling the system model to be examined from multiple viewpoints and the impact of changes to be analyzed. It also ensures product quality by providing a system model that can be assessed for consistency, correctness, and completeness [10].

A component of MBT is MDTD, which seeks to develop a comprehensive understanding of the relationships and dependencies that govern test design, thereby facilitating a thorough assessment of the testing implications of changes made at any stage of the product’s design cycle [11]. MDTD is asserted to significantly improve the overall test development process [11]. The MDTD process allows a subset of experienced testers to handle the high-level, statistical, and defensibility aspects of test design and development, while other testers plan the details of Verification and Validation (V&V) criteria, testing operations, and safety. This abstraction allows for the early involvement of the designers responsible for the test planning process so that they can incorporate testing requirements into the system design [12]. Ideally, MDTD empowers the operators and test team members to influence and contribute expertise in shaping and refining different aspects of a physical SUT’s intended structure and behavior.

Methods

This section introduces the two case models used to create and develop the survey test plan and to evaluate the subsequent survey data. It also provides an overview of the SUS metric applied to evaluate the survey data.

Case Studies

Two case studies were developed and produced using SysML implemented in MBSE with the Dassault Systemes’ Cameo Systems Modeler™ software package to create, develop, and assess test plans using the AFOTEC Methodology process. The author created the first case study model, the Combat Search and Rescue Locator (CSARL), from a system design and development point of view. The second model, the E-X airborne multi-sensor platform, is a revision and update of an example in the textbook “Effective Model-Based Systems Engineering” [13]. Details of the two case studies are presented in the following paragraphs.¹

The first case study focuses on AFOTEC’s CSARL system. CSARL is a fictional system developed as a training case for AFOTEC flight test personnel learning to create test plans. It is well-documented, well-accepted in the testing community, and has a thorough test plan “answer key.” The CSARL system is notionally a portable hand-held Global Positioning System (GPS) receiver system offering enhanced digital moving maps and real-time navigational capabilities for downed air crew and search and rescue (SAR) forces during combat and peacetime survival scenarios. Its purpose is to complement current GPS-integrated survival radios, enhancing existing survival and rescue navigation capabilities without relying on traditional map-and-compass techniques [14]. CSARL is an example system demonstrating the proposed methods for flight-test-relevant test plan generation. Since the system is well-documented and all AFOTEC personnel are familiar with the application and model, it is a good benchmark for applying the grey box-inspired testing technique along with the AFOTEC Methodology process implemented in an MBSE environment to create a flight test plan without direct subject matter expert (SME) intervention. Moreover, it provides a means to compare outputs between the document-centric and model-centric processes.

The second case study uses a system described in the Effective Model-Based Systems Engineering textbook [13]. The textbook introduces the “E-X Airborne Sensor Platform” as a hypothetical system used to exemplify the development of various diagrams for defining and describing software-intensive aerospace systems using SysML. The E-X airborne multi-sensor platform gathers intelligence, surveillance, and reconnaissance (ISR) data. Operating in collaboration with ground stations and affiliated organizations, it facilitates the collection and dissemination of ISR products. E-X executes ISR missions and performs aerial imaging and sensing functions [13]. The E-X case study is a valuable example of a moderately complex system, utilizing SysML diagrams readily accessible in the textbook [13] and easily replicated. Like the CSARL case study, the E-X system includes comprehensive documentation, making it an ideal standard for applying grey box-inspired testing techniques and the AFOTEC Methodology process within an MBSE environment. Additionally, it offers another platform for comparing outcomes between document-centric and model-centric processes.

Auto-Generated E-X Airborne Sensor Platform Test Plan

The E-X case study model, the hypothetical E-X Airborne Sensor Platform system [13], was used to generate the Cameo auto-generated (CAG) draft test plan used to conduct the survey and collect the usability data. Its selection for demonstrating flight-test-relevant test plan generation methods was intentional, leveraging readily available SysML diagrams that could be easily implemented and reproduced. Once the pertinent diagrams from the E-X system model were identified and reproduced using the Cameo Enterprise Architecture software, the AFOTEC methodology test structure was incorporated and modified to complete the model for use in the case study. The elements within the structure were assessed for content to ensure that each MOE, MOP, and MOS was linked to its associated COI and was populated with all the essential details to fit the model and meet the AFOTEC test planning guide [5] requirements, as illustrated in Figure 1.

¹ A third case study was created and developed to assess the reusability of these models, but it was not used to evaluate the usability of test plan creation.

Figure 1. E-X Model Layout Structure Incorporates the AFOTEC Test Model in a Folder Labeled AFOTEC Methodology.
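To make this COI-to-measure traceability concrete, the following minimal sketch represents the test structure as plain data records with a simple completeness check. It is an illustration only: the class names, fields, and the `unlinked` helper are assumptions for exposition, not the actual Cameo/SysML element types or APIs.

```python
from dataclasses import dataclass, field

# Illustrative stand-ins for the test-structure elements described above;
# these names are assumptions, not actual Cameo/SysML element types.
@dataclass
class Measure:
    kind: str   # "MOE", "MOP", or "MOS"
    ident: str  # e.g., "MOE-1.1"
    text: str   # the measure statement

@dataclass
class COI:
    ident: str     # e.g., "COI-1"
    question: str  # the critical operational issue
    measures: list = field(default_factory=list)  # linked MOE/MOP/MOS records

def unlinked(cois, all_measures):
    """Content check mirroring the assessment above: flag any measure
    that is not traced to an associated COI."""
    linked = {m.ident for c in cois for m in c.measures}
    return [m for m in all_measures if m.ident not in linked]
```

An empty result from `unlinked` corresponds to the content check illustrated in Figure 1, in which every MOE, MOP, and MOS is linked to its associated COI.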

The test plan document generation process took advantage of the Model Development Kit (MDK) for model-generated artifacts, contained within the Government Issued Framework (GIF), which uses the weapons government reference architecture (weapons GRA) developed by the Air Force Life Cycle Management Center (AFLCMC). The MDK was modified using the Velocity Template Language (VTL) to create and develop the template used to produce the draft test plan.

The author manually tested the modified VTL template from the MDK. This template generated the document for the baseline CSARL model, and the author verified its accuracy by comparing the VTL template’s output with the original CSARL test plan, confirming that all information was correctly extracted and the output was complete. The modified VTL template was then applied to the E-X test case model to generate a draft test plan.

The E-X model was linked to the MDK as a “project usage” model since the E-X test case model contains all the information required for each COI, MOE, MOP, and MOS associated with the E-X model. The MDK structure was then modified to reflect the sections needed for the AFOTEC test planning guide [5] and other relevant sections. The draft test plan was then developed by running the VTL script, which created the draft test plan document as an output.
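As a rough, hypothetical analogue of this generation step (reusing the illustrative `COI` and `Measure` records from the sketch above), the function below walks the linked structure and emits the measures appendix of a draft plan as plain text. The real process runs a VTL template through the MDK inside Cameo; this is only a conceptual stand-in.

```python
def render_measures_appendix(title, cois):
    """Walk the COI -> MOE/MOP/MOS structure and emit a plain-text
    appendix, standing in for the VTL template's document output."""
    lines = [f"DRAFT TEST PLAN: {title}", "", "Appendix: COIs and Measures"]
    for coi in cois:
        lines.append(f"{coi.ident}. {coi.question}")
        for m in coi.measures:
            lines.append(f"  {m.kind} {m.ident}: {m.text}")
    return "\n".join(lines)

# Notional example (the COI and measure text are invented for illustration):
print(render_measures_appendix(
    "E-X Airborne Sensor Platform",
    [COI("COI-1", "Can the E-X collect and disseminate ISR products?",
         [Measure("MOE", "MOE-1.1", "Probability of successful ISR collection")])]))
```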

System Usability Survey

The System Usability Survey (SUS), developed in 1986, is a questionnaire designed to assess users’ perceptions of a system’s usability [15]. The survey uses a set of objective measures to evaluate the impact of specific changes made to a product or service [15]. “The international standard ISO 9241 breaks the measurement of usability down into three separate components that must be defined relative to the context of use” [15]:

  • Effectiveness: Whether people can complete their tasks and achieve their goals.
  • Efficiency: The extent to which they expend resources to achieve their goals.
  • Satisfaction: The level of comfort the experience provides in achieving their goals.

Thus, the objective of the SUS is to provide a measure of a subject’s perceptions of the usability of a system or product in a short period during the evaluation [15].

The SUS consists of ten statements, alternating between five positively worded and five negatively worded items to minimize response bias [15]. The statements are rated on a scale from 1 to 5, with a score of 1 for “strongly disagree” and a score of 5 for “strongly agree” [15]. For each positively worded statement, 1 is subtracted from the response; for each negatively worded statement, the response is subtracted from 5. The two sets of adjusted scores are summed, and the total is multiplied by 2.5 to obtain the user’s SUS score [15].
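As a concrete illustration of this arithmetic, the sketch below computes a single participant’s score. It assumes the standard SUS ordering in which the odd-numbered statements are the positively worded ones.

```python
def sus_score(responses):
    """Compute a participant's SUS score from ten 1-5 Likert ratings.
    Positively worded (odd) items contribute (rating - 1); negatively
    worded (even) items contribute (5 - rating); the sum is scaled by 2.5."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten ratings between 1 and 5")
    positive = sum(r - 1 for r in responses[0::2])  # items 1, 3, 5, 7, 9
    negative = sum(5 - r for r in responses[1::2])  # items 2, 4, 6, 8, 10
    return 2.5 * (positive + negative)

# A respondent who fully agrees with every positive statement and fully
# disagrees with every negative one scores the maximum of 100:
assert sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]) == 100.0
```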

Interpretation of the System Usability Survey Score

The SUS score ranges from 0 to 100. Bangor, Kortum, and Miller created a scale in 2009 (figure 4 in reference [16]) that correlates users’ SUS scores with specific adjectives and ratings to provide a grading score, with higher scores indicating better usability. A SUS score of 68 corresponds to the 50th percentile. Scores above 70 are typically considered good, with 85 indicating excellent usability, while scores below 50 suggest a poor user experience [16]. Research has shown that the SUS can effectively measure a system’s perceived usability with a small sample size (around 8-12 users), providing a reliable assessment of how users perceive a system or product [15]. AFOTEC uses acceptability ranges and adjective ratings to evaluate a SUT’s usability, effectiveness, and suitability. User comments and surveys conducted during the execution of a SUT program are collected to gather survey data.

Assessment Method

Rating the Cameo Auto-Generated Draft Test Plan

To evaluate the effectiveness, suitability, and usability of the CAG draft test plan, current AFOTEC members from Detachment 5 were issued a survey. These participants are test and evaluation professionals with varying operations and maintenance backgrounds, comprising both officer and enlisted personnel. The survey, which included 21 participants, aimed to evaluate the usability, effectiveness, and suitability (measured as value and quality) of test plan development and reuse using the MDTD process. The results were compared to the current AFOTEC guidance. The survey was circulated among AFOTEC members with varying knowledge and experience in creating and reviewing test plans. Participants were asked to rate their expertise in test plan creation and their involvement in reviewing test plans. Demographic data collected during the survey provided insights into the participants’ self-assessed levels of knowledge and participation.

First, the survey asked participants to self-assess their proficiency in “coordinating the test planning process” and “reviewing test plans” by selecting one of the following: beginner, competent, proficient, or expert. Based on the answers to these two questions, each participant was assigned an overall experience level used in the survey analysis. Table 1 shows the stratification of the determined experience levels and the number of participants in each category.

Table 1. Determination of Experience Level Based on Participant’s Self-Evaluation.

Proficiency in          Proficiency in Coordinating   Overall             Number of
Reviewing Test Plans    the Test Planning Process     Experience Level    Participants
Beginner                Beginner                      Beginner            4
Beginner                Competent
Beginner                Proficient
Beginner                Expert
Competent               Beginner                      Competent           5
Competent               Competent
Competent               Proficient
Competent               Expert
Proficient              Beginner                      Proficient          6
Proficient              Competent
Proficient              Proficient
Proficient              Expert
Expert                  Beginner                      Expert              6
Expert                  Competent
Expert                  Proficient
Expert                  Expert

The usability assessment section of the survey included the ten standard questions from the SUS, as outlined by Aaron Bangor, Philip T. Kortum, and James T. Miller [16]. Additionally, the survey featured four questions specific to the CAG draft test plan. The first question asked participants to evaluate the value of the plan in developing test plans, while the remaining questions assessed participants’ impressions of the plan’s quality by addressing the test plan’s content.

To conclude the survey, the participants were asked three final questions to measure their views on the usability of the CAG draft test plan and their impressions on whether the CAG test plan can reduce the time needed to prepare a final test plan. First, participants were asked if they believed the CAG draft test plan could shorten the time required to create a test plan. Second, they were asked to estimate the percentage of time savings it would provide. Lastly, participants were asked to give their overall recommendation on using the CAG draft test plan.

Appendix A contains a sample version of the survey. The survey participants were given two weeks to respond to the emailed survey document, and the responses were collated and analyzed. The Institutional Review Board (IRB) at Colorado State University reviewed and approved this research under protocol #5680. It was classified as exempt from full review under the category of exempt human subjects’ research.

Results

Results of System Usability Survey

The ten standard questions used in the SUS were analyzed and organized based on the participants’ experience levels. The participants’ SUS scores were assessed using the adjective rating scale developed by Bangor, Kortum, and Miller [16]. Table 2 illustrates this stratification while Table 3 shows the distribution of the scores by experience level. These data are plotted in Figure 2 using a bar chart to illustrate the distribution and their relationship. These findings suggest that the CAG draft test plan has an overall adjective rating of “Good” or better.

Table 2. Classification of SUS Scores Based on Adjective Rating.

SUS Score (Low–High)    Adjective Rating
85–100                  Best Imaginable
74–84                   Excellent
54–73                   Good
38–53                   Okay
26–37                   Poor
0–25                    Worst Imaginable

Table 3. Distribution of Participant SUS Scores Based on Adjective Rating and Participant Experience Level.

Rating             Beginner   Competent   Proficient   Expert
Best Imaginable    0          4           1            3
Excellent          1          0           0            0
Good               2          0           2            3
Okay               1          1           2            0
Poor               0          0           0            0
Worst Imaginable   0          0           1            0

Figure 2. Bar Graph Illustrating the Distribution of the Participants’ SUS Scores Based on Adjective Rating and Experience Level.

The SUS results were also evaluated using the acceptability rating scale commonly used by AFOTEC to assess the usability of SUTs. The participants’ SUS scores were ranked against the acceptability rating criteria in Table 4, with Table 5 showing the distribution of scores by experience level. These data are presented in Figure 3 using a bar chart to illustrate the distribution and their relationships. The CAG draft test plan received mixed acceptability ratings (four “not acceptable,” four “low marginal,” and one “high marginal”). However, most ratings from expert experience level participants were acceptable.

Table 4. Stratification of SUS Scores Based on Acceptability Rating.

SUS Score (Low–High)    Acceptability Rating
71–100                  Acceptable
61–70                   High Marginal
51–60                   Low Marginal
0–50                    Not Acceptable

Table 5. Distribution of Participant SUS Scores Based on Acceptability Rating and Experience Level.

Rating           Beginner   Competent   Proficient   Expert
Acceptable       1          4           2            5
High Marginal    1          0           0            0
Low Marginal     1          0           2            1
Not Acceptable   1          1           2            0

Figure 3. Bar Graph Illustrating the Distribution of the Participants’ SUS Scores Based on Acceptability Rating and Experience Level.
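The score bands in Tables 2 and 4 can be encoded directly. The sketch below is a minimal mapping that assumes integer breakpoints exactly as tabulated; for example, the study’s average score of 67 (reported below) maps to “Good” and “High Marginal,” consistent with the Discussion.

```python
def adjective_rating(score):
    """Map a SUS score to the adjective ratings of Table 2."""
    for low, label in [(85, "Best Imaginable"), (74, "Excellent"),
                       (54, "Good"), (38, "Okay"), (26, "Poor")]:
        if score >= low:
            return label
    return "Worst Imaginable"

def acceptability_rating(score):
    """Map a SUS score to the acceptability ratings of Table 4."""
    for low, label in [(71, "Acceptable"), (61, "High Marginal"),
                       (51, "Low Marginal")]:
        if score >= low:
            return label
    return "Not Acceptable"

assert adjective_rating(67) == "Good"
assert acceptability_rating(67) == "High Marginal"
```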

The SUS results were evaluated based on the average of all participants’ scores, resulting in an average SUS score of 67. Figure 4 shows the distribution of these scores. Since a SUS score of 68 is considered average, a score of 67 suggests that the CAG draft test plan has slightly below-average usability.

Figure 4. Box Plot Showing the Distribution of the SUS Scores (n=21).

Analysis of Effectiveness, Value, and Quality

The following discussion presents survey results regarding the CAG draft test plan’s effectiveness, value, and quality. The information in the CAG draft test plan is a practical starting point for developing complete test plans that meet AFOTEC’s test planning criteria [17].

Table 6. Effectiveness Data Collected During the Survey and Sorted Based on Responses.

Question: “Rate the effectiveness of the CAG test plan as a starting point for developing a completed test plan.”

Rating                   Number of Respondents
Completely Ineffective   0
Somewhat Ineffective     0
Borderline               2
Somewhat Effective       10
Completely Effective     9

Table 6 shows how the survey data is categorized based on participants’ opinions about the effectiveness of the CAG test plan as a practical starting point for creating a complete test plan. Of the 21 survey respondents, nine rated the CAG draft test plan as “completely effective,” ten rated it “somewhat effective,” and only two gave a “borderline” rating. These responses provide evidence that the CAG draft test plan is an effective starting point for developing a complete test plan.

Table 7. Value Data Collected During the Survey and Ordered Based on Responses.

Q1: “CAG test plan capabilities meet my requirements.”
Q2: “Using the CAG test plan is a frustrating experience.”
Q3: “The CAG test plan is easy to use.”
Q4: “I must spend too much time correcting things with the CAG test plan.”

Response                     Q1   Q2   Q3   Q4
Strongly Disagree            0    6    1    3
Disagree                     0    4    3    4
Slightly Disagree            3    4    3    3
Neither Agree nor Disagree   8    2    3    5
Slightly Agree               4    5    5    6
Agree                        5    0    6    0
Strongly Agree               1    0    0    0

Table 7 categorizes the survey data based on participants’ opinions about the value of the CAG test plan as a tool for developing a complete test plan. The value was assessed through responses to four questions measuring participants’ attitudes toward using the CAG draft test plan. The first question asked if the “CAG test plan capabilities meet my requirements.” Of 21 respondents, one “strongly agreed,” five “agreed,” four “slightly agreed,” eight “neither agreed nor disagreed,” and three “slightly disagreed.” The second question asked if “using the CAG test plan is a frustrating experience.” Five “slightly agreed,” two “neither agreed nor disagreed,” four “slightly disagreed,” four “disagreed,” and six “strongly disagreed.” The third question asked if “the CAG test plan is easy to use.” Six “agreed,” five “slightly agreed,” three “neither agreed nor disagreed,” three “slightly disagreed,” three “disagreed,” and one “strongly disagreed.” The final question asked, “I must spend too much time correcting things with the CAG test plan.” Six “slightly agreed,” five “neither agreed nor disagreed,” three “slightly disagreed,” four “disagreed,” and three “strongly disagreed.” These responses provide evidence that the CAG draft test plan is valuable as a tool for developing a complete test plan. Figure 5 illustrates the distribution of the survey participants’ responses to the four questions to assess the CAG draft test plan’s value as a tool.

Figure 5. Bar Graph Illustrating the Results of the Value of the CAG Draft Test Plan.

Table 8. Quality Data Collected During the Survey and Ordered Based on Responses.

Response                  Section    Charts, Figures,   Main Body   Measures Section   Overall Acceptability
                          Headings   Diagrams, etc.     Contents    Content            of the CAG Format
Completely Unacceptable   1          0                  0           1                  2
Slightly Unacceptable     2          2                  2           6                  1
Borderline                3          5                  3           0                  5
Somewhat Acceptable       7          5                  10          8                  7
Completely Acceptable     8          9                  6           6                  6

Table 8 and the bar chart in Figure 6 categorize the survey data based on participants’ opinions of the CAG test plan’s quality. The quality of the CAG draft test plan was evaluated through feedback on five specific sections. These sections focused on participants’ views of the plan’s section headings, the presentation of charts, figures, and diagrams, and the contents of the test plan’s main body and measures section. The final feedback area concerned the overall acceptability of the CAG draft test plan’s format.

The first area where participants were asked for their opinions was the section headings. Of the 21 participants, eight rated them as “completely acceptable,” seven as “somewhat acceptable,” three gave a “borderline” response, two found them “slightly unacceptable,” and one rated them “completely unacceptable.” The second area focused on participants’ views of the presentation of charts, figures, and diagrams. Of the 21 participants, nine found this section “completely acceptable,” five “somewhat acceptable,” five were “borderline,” and two found it “slightly unacceptable.” Participants shared their impressions of the main body contents in the third area: six rated it “completely acceptable,” ten “somewhat acceptable,” three were “borderline,” and two found it “slightly unacceptable.” The fourth area asked participants about their impressions of the measures section contents. Six rated it “completely acceptable,” eight “somewhat acceptable,” six “slightly unacceptable,” and one “completely unacceptable.” Finally, participants provided their overall impression of the CAG draft test plan’s format. Of the 21 participants, six found it “completely acceptable,” seven “somewhat acceptable,” five were “borderline,” one found it “slightly unacceptable,” and two “completely unacceptable.” These responses indicate that the overall quality of the CAG draft test plan is acceptable while highlighting areas where the format meets expectations and where improvements are needed.

Figure 6. Bar Graph Illustrating the Results of the CAG Draft Test Plan Quality.

Results of Time Reduction and Recommendation of the CAG Draft Test Plan

Table 9 presents the responses to whether using the CAG draft test plan reduces the time required to write a final test plan. All participants except one agreed that the CAG draft test plan effectively shortens the time needed to develop a final test plan. When asked to estimate the percentage of time savings the CAG draft test plan could provide, the average response was 42%. Table 10 decomposes the respondents’ estimated time savings by experience level. These results suggest that participants believe the CAG draft test plan can significantly streamline the test plan preparation process.

Table 9. Reduction of Time Achieved by using the CAG Draft Test Plan.

Question: “Utilizing the CAG example test plan, do you foresee a reduction of time in writing a final test plan?”

Response   Number of Respondents
YES        20
NO         1

Table 10. Reduction of Time Achieved by Using the CAG Draft Test Plan, Decomposed by Experience Level.

Experience Level   Average Percent Time Reduction
Beginner 46%
Competent 49%
Proficient 41%
Expert 30%
Overall 42%

Table 11 displays the responses to the question about recommending the CAG draft test plan for use by a test team in writing a final test plan. When asked whether they would recommend the CAG draft test plan as a practical starting point for creating a complete test plan, only two participants did not recommend its use. Once again, the findings suggest that the CAG draft test plan can be a valuable tool that significantly simplifies preparing a final test plan.

Table 11. Opinion on Recommending the CAG Draft Test Plan for Use by a Test Team in Writing a Final Test Plan.

Question: “Would you recommend that a test team use the CAG test plan?”

Response   Number of Respondents
YES        19
NO         2

Discussion

Benefits of an Auto-Generated Draft Test Plan

Survey responses from testing professionals indicate that the CAG draft test plan represents a practical and recommended framework for the creation and development of test plans. Although the average SUS score placed its usability as “slightly below average” according to the Bangor, Kortum, and Miller scale (figure 4 in reference [16]), respondents nonetheless regarded the CAG draft test plan as both effective and valuable, recommending its integration into the broader test planning process. Furthermore, despite receiving an adjective rating of only “good” and an acceptability rating of “high marginal” on the same scale, participants consistently emphasized the overall quality and utility of the CAG draft test plan. Their feedback suggests that the tool provides meaningful support in structuring and improving test planning activities. Accordingly, AFOTEC personnel may employ the CAG draft test plan to develop robust and context-appropriate test plans for executing tests on a given SUT. Overall, the CAG draft test plan constitutes a valuable and constructive addition to the formal test planning process.

The survey data indicated that participants believe the CAG draft test plan can reduce the time it takes a test team to develop a complete test plan by an average of 42%. Although this estimate is based on the opinions of various AFOTEC members, these professionals believe the tool can provide significant time savings. Given that test plans typically take three to nine months to create and develop, an average of six months (roughly 180 days) implies a saving of about 75 days (0.42 × 180 ≈ 76 days). In the survey participants’ perception, then, the CAG test plan offers significant time savings, highlighting the potential efficiency gains it provides. These perceptions suggest the tool can meaningfully streamline the test plan development process.

Costs of an Auto-Generated Draft Test Plan

The final step in the test planning process is to refine the CAG draft test plan through multiple review sessions, during which the test team finalizes the plan to secure stakeholder approval. Once the draft is complete, it undergoes technical editing to ensure compliance with AFOTEC guidelines and formatting standards before receiving final approval and signatures. The more refined the draft, the quicker this process will be, saving time and resources. Therefore, the closer the auto-generated draft is to a final product, the fewer resources will be required, and the faster the test plan can be approved.

The participants provided valuable feedback through comments and suggestions beyond the survey questions. Although the CAG test plan from this research showed significant potential, the participants identified several issues with the measures section, document formatting, and overall readability. The confusing layout and the unnecessary complexity of modeling MOEs in a Block Definition Diagram (BDD) add to these challenges. In its current state, the tool may reduce “copy and paste” errors, but the time required to fix output and formatting issues might outweigh the benefits.

To address these concerns, the auto-generated test plan template should be reprogrammed with a stronger focus on formatting and structure to better align with AFOTEC standards. Improvements in readability and layout are also needed. The measures section will require careful attention to organizing and formatting tables for better clarity and usability. Once these issues are resolved, participants recommended additional stakeholder engagement and surveying to assess the CAG’s capabilities.

While the auto-generated test plan shows the potential to realize the benefits of MBSE with AFOTEC’s test planning approach, improvements will be required for it to become a fully functional tool. Addressing these issues, especially the confusing layout and unnecessary complexity, will enhance the tool’s efficiency. By refining the template to resolve these problems, the next generation of auto-generated test plans can help streamline the process, reduce the need for extensive revisions, and save time and resources.

Costs and Benefits of Implementing an MBSE Approach

The CAG draft test plan was developed using the MDTD approach within an MBSE framework, demonstrating that MBT combined with MBSE is a viable process for creating and developing test plans. The results in this paper indicate that the CAG draft test plan is beneficial and can be effectively implemented using an MBSE approach. Once a model is created with these processes, AFOTEC members can use the CAG draft test plan to prepare suitable test plans for executing tests on a given SUT. Thus, the CAG draft test plan provides a reliable and efficient method for test planning.

By implementing an MBSE approach, acquisition programs can reduce the time needed to create test plans, significantly lowering the effort required to develop test plans for future systems under test. This approach aligns with the goals of the DoD’s digital transformation process. Additionally, it offers tangible benefits in terms of saving effort. Adopting MBSE streamlines test plan development and enhances efficiency, contributing to more effective and timely testing processes.

Implementing the new MBT process within an MBSE framework requires a significant paradigm shift in testers’ mindsets. Training personnel is essential to ensure everyone can interpret diagrams effectively. Despite the initial effort, the benefits are substantial. The main advantage is the ability to reuse components, and several secondary benefits follow. Overall, this innovative approach promises enhanced efficiency and effectiveness in test planning.

Lessons Learned

Implementation of the model-based AFOTEC MDTD methodology within the E-X case study revealed several practical considerations relevant to the broader adoption of MBSE in test and evaluation environments.

Initiating the MBSE Process: Establishing the initial modeling framework required clearly defining the objectives, their interrelationships, and the associated process interfaces. Clarifying the model’s scope and its connection to existing documentation processes underscored the importance of alignment and a well-articulated modeling intent.

Developing Model Architecture: Translating the AFOTEC MDTD methodology into a structured, reusable model architecture presented technical challenges related to traceability and consistency. Ensuring alignment between test objectives, requirements, and model elements demanded iterative refinement and disciplined configuration management to preserve model integrity throughout development.

Managing the Learning Curve: Transitioning from traditional document-based practices to an MBSE-driven approach introduced both a learning curve and a paradigm shift for users. Familiarity with modeling tools, structures, and visualization techniques developed gradually, resulting in measurable improvements in efficiency and standardization. Structured training and continued support were essential to maintaining model quality and user proficiency.

Overall, the lessons learned emphasize that while the initial adoption of MBSE requires a significant investment in process definition, model architecture, and workforce development, these efforts yield enduring benefits in efficiency, consistency, and knowledge retention. The insights gained from this effort provide a foundation for advancing model-based test and evaluation practices across future AFOTEC and DoD test programs.

Conclusion

The study of an auto-generated draft test plan demonstrates that the model-based AFOTEC MDTD methodology can be applied to complex case studies, such as the E-X model. The study aimed to create a model that could utilize the AFOTEC MDTD methodology and generate a test plan. The auto-generated test plan was evaluated through a survey of AFOTEC personnel, and the survey participants’ perceptions indicated that the auto-generated test plan was beneficial and could be used to prepare suitable test plans for executing tests on a given SUT. This demonstrates the practical advantages of integrating MBT and MBSE into test plan development.

The primary advantage of the current AFOTEC planning process lies in its well-established and accepted nature. However, this process also incurs several costs. These include the requirement to gather and analyze numerous documents to formulate a test plan, the inherent subjectivity in interpreting written materials, and the challenge for testers of determining testing approaches from incomplete information. Consequently, this often results in frequent revisions, wasting time and resources.

Using the AFOTEC methodology with a model-based approach keeps test planning documentation up-to-date and reduces the time needed to create a test plan. This approach ensures consistency and standardization by applying the same process each time a new test plan is developed. As a result, the time and effort required for test plan creation are minimized. The reuse of the process streamlines documentation and enhances efficiency.

References

[1] J. L. Alvarado and T. H. Bradley, “Developing Model-Based Flight Test Scenarios,” The ITEA Journal of Test and Evaluation, vol. 44, no. 4, 2023.

[2] J. L. Alvarado and T. H. Bradley, “A Case Study-based Assessment of a Model-driven Testing Methodology for Applicability and Metrics of Model Reuse,” The ITEA Journal of Test and Evaluation, vol. 45, no. 4, 2024.

[3] G. VanPeteghem, C. J. Liebmann, S. S. Mailen, J. D. Martin and K. L. Peck, “Test Plan Author’s Guide,” Engineering Directorate, 412th Test Wing, Edwards AFB, 2021.

[4] Air Education and Training Command, “Air Force Instruction 99-103 – Capabilities-Based Test and Evaluation,” Department of Air Force, Washington D.C., 11 August 2020.

[5] AFOTEC A-2/9, “Air Force Operational Test and Evaluation Center Test Design Guide,” AFOTEC, Albuquerque, 20 July 2018.

[6] K. Henderson, T. McDermott, E. Van Aken and A. Salado, “Towards Developing Metrics to Evaluate Digital Engineering,” Systems Engineering, vol. 26, no. 1, pp. 3-31, 2023.

[7] S. Friedenthal, A. Moore, and R. Steiner, A Practical Guide to SysML: The Systems Modeling Language, Morgan Kaufmann, 2014.

[8] J. Gregory, L. Berthoud, T. Tryfonas, A. Rossignol, and L. Faure, “The long and winding road: MBSE adoption for functional avionics of spacecraft,” Journal of Systems and Software, vol. 160, p. 110453, 2020.

[9] Y. Wang and M. Zheng, “Test Case Generation from UML Models,” in 45th Annual Midwest Instruction and Computing Symposium, Cedar Falls, 2012.

[10] INCOSE, INCOSE Systems Engineering Handbook: A Guide for System Life Cycle Processes and Activities, New Jersey: Wiley, 2015.

[11] M. Mussa, S. Ouchani, W. Al Sammane, and A. Hamou-Lhadj, “A Survey of Model-driven Testing Techniques,” in 2009 Ninth International Conference on Quality Software, 2009.

[12] S. Beydeda, M. Book, and V. Gruhn, Eds., Model-Driven Software Development, Springer, Germany, 2005.

[13] J. M. Borky and T. H. Bradley, Effective model-based systems engineering, Springer, 2018.

[14] Director of Operational Capability Requirements, Capability Development Document (CDD) for Combat Search and Rescue Locator (CSARL), Washington D.C.: Department of Defense, 2018.

[15] J. Brooke, “SUS: A retrospective,” Journal of Usability Studies, vol. 8, no. 2, 2013.

[16] A. Bangor, P. Kortum, and J. Miller, “Determining what individual SUS scores mean: Adding an adjective rating scale,” Journal of Usability Studies, vol. 4, no. 3, pp. 114-123, 2009.

[17] AFOTEC A-2/9, “Air Force Operational Test and Evaluation Center Test Design Guide,” AFOTEC, Albuquerque, 20 July 2018.

[18] D. C. Montgomery, Design and Analysis of Experiments, Hoboken: Wiley, 2020.

Author Biographies

JOSE ALVARADO, Ph.D., is a senior test engineer and technical advisor for AFOTEC Detachment 5 at Edwards AFB, California, with over 34 years of developmental and operational test and evaluation experience. His research focuses on improving flight test engineering by applying Model-Based Systems Engineering (MBSE) concepts and implementing Model-Based Test and Evaluation (MBTE) to refine test processes. Dr. Alvarado holds a BS in Electrical Engineering from California State University, Fresno (1991), an MS in Electrical Engineering from California State University, Northridge (2002), and a PhD in Systems Engineering from Colorado State University (2024). He serves as an adjunct faculty member for the engineering and technical education departments at Antelope Valley College. He is a member of the ITEA, Antelope Valley Chapter and the INCOSE Colorado State University Chapter.

THOMAS H. BRADLEY, Ph.D., serves as the Woodward Foundation Professor and Department Head for the Department of Systems Engineering at Colorado State University. He conducts research and teaches various courses in system engineering, multidisciplinary optimization, and design. Dr. Bradley’s research interests are focused on applications in Automotive and Aerospace System Design, Energy System Management, and Lifecycle Assessment. Bradley earned a BS and MS in Mechanical Engineering at the University of California – Davis (2000, 2003) and a PhD in Mechanical Engineering at Georgia Institute of Technology (2008). He is a member of INCOSE, SAE, ASME, IEEE, and AIAA.
