INNOVATION Independent Automated Verification and Validation Testbed for Test and Evaluation

MARCH 2025 | Volume 46, Issue 1


Emily Pozniak


The Johns Hopkins University
Laurel, MD, USA

David Warren


The Johns Hopkins University
Laurel, MD, USA

Christopher Rouff


The Johns Hopkins University
Laurel, MD, USA

Lien Duong

The Johns Hopkins University
Laurel, MD, USA

Kevin Medina Santiago

The Johns Hopkins University
Laurel, MD, USA

Alexander Lee

The Johns Hopkins University
Laurel, MD, USA

Kurt Seidling

The Department of Homeland Security
Washington, DC, USA

 

DOI: 10.61278/itea.46.1.1001

Abstract

The Johns Hopkins University Applied Physics Laboratory (JHU/APL), in collaboration with the Department of Homeland Security (DHS) Continuous Diagnostics and Mitigation (CDM) program office, has developed an automated testing infrastructure and testbed to verify the CDM solutions deployed at United States federal departments and agencies. This testing capability is secure, hosted on trusted Amazon Web Services (AWS) infrastructure, and fully customizable to adapt to a variety of use cases. It utilizes open-source tools such as Selenium and Jenkins to automate CDM testing, while synthetic data is generated to independently validate test requirements. It effectively replaces manual testing events, increasing testing efficiency and reducing risk throughout the CDM program, and allows testing to keep pace with Agile development. This experience paper details the process of creating this capability and adapting it to the unique use case of complex government system test and evaluation.

CCS Concepts: • General and reference → Verification; Validation; Reliability; • Software and its engineering → Software verification and validation.

Keywords: Automated Testing, Independent Verification and Validation, Selenium, Amazon Web Services, Jenkins, Elastic Stack

1. Introduction

Automated software testing has become critical for quickly developing high-quality, reliable software and is now an integral part of many software development methodologies and DevOps processes. This paper provides an overview of an automated testing infrastructure and testbed for the Department of Homeland Security (DHS) Continuous Diagnostics and Mitigation (CDM) program to verify the layered DHS CDM software system, henceforth referred to as the “CDM solution”. Once tested, the CDM solution is used by United States federal departments and agencies (D/As) to manage and protect their networks and data. The Independent Automated Verification and Validation (INNOVATION) testbed, henceforth referred to as “the Testbed”, was developed to verify that the CDM solution at a federal agency is capable of:

(1) Collecting all required data from an agency network,
(2) Transmitting and formatting data across each of the CDM layers, and
(3) Displaying the transmitted data on a dashboard as intended in defined acceptance criteria.

The CDM program has used manual testing methods for independent verification and validation (IV&V), but these methods can struggle to keep up with the pace of Agile development and adapt to the increasing complexity of systems. Automated testing has become an industry standard in software development, and tools like Selenium have been used since the early 2000s to automate testing. This work aggregates existing tools to create an adaptable environment that automates testing for the unique use case of test and evaluation of a complex government system.

The CDM program handles data for government agencies related to cybersecurity risks, including hardware and software asset information, hardware and software configurations, asset vulnerabilities, accounts, and privileged users. This data is highly sensitive and important for the cybersecurity of government agencies, so the Testbed must be secure, robust in its data handling, and highly accurate. This paper describes the experience of creating the Testbed and applying it to different use cases across the CDM program.

The remainder of this paper is structured as follows: Section 2 provides background information about automated testing, Agile software development methodology, and DevOps. It also discusses current industry trends in the field of automated testing. Section 3 describes the CDM program, including its purpose, architecture, and types of data handled. Section 4 describes the state of the CDM program prior to the implementation of the Testbed, and the motivations for creating the Testbed. It describes two use cases, dashboard testing and data validation testing, in which the Testbed was applied. Section 5 describes the components of the Testbed and how they work together to perform successful automated testing. It also describes the tools used to create the Testbed and how it was implemented for the two use cases described in Section 4. Finally, Section 6 describes the results and benefits of the Testbed, as well as lessons learned and opportunities for future work.

2. Background

DevOps is a software development methodology where software development teams and operations teams collaborate to deliver software in a way that is agile, scalable, and cost effective [16, 22]. The backbone of the DevOps methodology is the continuous integration/continuous deployment (CI/CD) pipeline [16, 19, 22]. The CI/CD pipeline is the process of building, testing, and deploying code, and should be automated to allow for faster testing and development [19].

Agile refers to the model of software development and deployment that focuses on quick turnarounds to provide high-quality software on a consistent basis [14]. In the Agile framework, work is broken down into short, time-boxed iterations called “sprints”, during which the team develops a functionality or capability and tests against defined acceptance criteria. This method is very effective because it allows for more adaptability and a quicker response to changes, and it is typically found to be more productive than traditional waterfall software development and deployment methods. One major part of the Agile methodology is the continuous testing of software, and test automation is critical to maintaining the fast pace of Agile development.

Software testing is built on test cases, which include acceptance criteria to determine whether the system under test is functioning as expected [14]. Manual testing is the process of testing software by having users run through test cases by hand to look for flaws. This requires a large amount of time and effort, and the results are typically not reusable. Manual testing is also prone to human error.

Software test automation is used to test software with minimal or no human intervention [18]. In Agile DevOps, test execution is triggered by a software development milestone, such as a sprint completion or a code freeze. The test results are validated against expected test results using baseline acceptance criteria, requirements, or assertions. The main advantages of test automation are time saved in testing large requirements sets, and improved software quality resulting from more frequent testing. Automated tests find flaws in the software much more quickly, which allows for the flaws to be fixed earlier in the development process, saving time and resources. Automated testing also increases the reliability of the software and ensures that it satisfies all design requirements [14]. Automated tests can be run repeatedly with different inputs to verify expected behavior of the software in a variety of use cases. Test automation is a key part of delivering useful software products, and it is a best practice to run automated tests as often as possible in a CI/CD pipeline [23].

The first practical implementation of software testing surfaced in the 1970s, and script-based tools for test automation have been around since the early 2000s [20]. Selenium WebDriver, a tool used in the Testbed, is a popular open-source, script-based automation tool. It tests a web browser graphical user interface (GUI) by automating user control of, and interaction with, the GUI, and by verifying whether the GUI’s responses to user actions are the expected responses. It does this testing by means of test automation functions that can be written in a variety of different coding languages [14].
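To illustrate the pattern, the following is a minimal, hypothetical sketch of a Selenium WebDriver check written in Python; the URL, element IDs, and credentials are placeholders for illustration and are not taken from any CDM dashboard.

```python
# Minimal, hypothetical Selenium WebDriver sketch: drive the GUI as a user would
# and verify the expected response. All identifiers below are placeholders.
from selenium import webdriver
from selenium.webdriver.common.by import By

DASHBOARD_URL = "https://dashboard.example.gov"  # placeholder URL

driver = webdriver.Chrome()  # open an automated browser session
try:
    driver.get(DASHBOARD_URL)                                     # navigate to the GUI under test
    driver.find_element(By.ID, "username").send_keys("test_user")
    driver.find_element(By.ID, "password").send_keys("test_password")
    driver.find_element(By.ID, "login-button").click()            # simulate the user action

    # Verify that the GUI responded to the action as expected.
    header = driver.find_element(By.TAG_NAME, "h1").text
    assert "Dashboard" in header, f"Unexpected page header: {header}"
finally:
    driver.quit()
```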

Recently, no-code and low-code automation tools have risen in popularity because they allow entire teams to contribute to automated testing, regardless of an individual’s coding knowledge [23]. As such, these tools allow Agile development and testing teams to work together seamlessly to improve the speed and quality of automated testing [23]. Artificial Intelligence (AI) and Machine Learning (ML) are also at the forefront of automated testing innovation and can be used for purposes such as bug prediction, automatic test case generation, and test case prioritization [10].

The development of an automated testbed for the CDM program is unique because of its specialized application. This differs from other software engineering processes, as the testing spans six CDM task orders, each with its own architecture comprising distinct software deployments. The CDM solution to be tested by the Testbed therefore has a different architecture for each task order. Furthermore, the sensitive nature of the data adds the challenge of keeping data secure throughout the automated testing process. The automated test routines verify that the CDM solution under test:

• Correctly senses data on a simulated agency network
• Integrates and normalizes the sensed data
• Presents the normalized data to an emulated agency dashboard
• Summarizes the agency dashboard data to an emulated federal dashboard

The remainder of this paper details the experience creating and developing the Testbed to automate CDM testing while ensuring government data in the CDM solution under test remains intact and protected.

3. Continuous Diagnostics and Mitigation Program

The CDM Program is overseen by DHS to provide a government-wide monitoring capability with objectives to reduce attack surfaces, increase visibility into the government’s cybersecurity posture, improve responses to threats, and streamline reporting [7–9, 11, 13]. The program was established to provide federal agencies with tools and capabilities to identify, prioritize, and mitigate cybersecurity risks. The program is structured around four key capabilities described below [7]:

  • Asset management: Provides an inventory of all devices and software applications, their configurations, and their dependencies.
  • Identity and Access Management: Manages account access, privileges, credentials, and authentication; manages security-related training.
  • Network security management: Provides network traffic monitoring, detects and responds to security incidents, and protects against unauthorized access and data exfiltration.
  • Data protection management: Provides data safeguards such as confidentiality, integrity, and availability. Includes encryption, data loss prevention, and secure data handling.

Finally, a dashboard receives, aggregates, and displays information from CDM tools at the agency and federal levels. The CDM capabilities above are continually improved over time and are updated with new cybersecurity technologies as they become available.

3.1 CDM Layered Architecture

Figure 1 shows the CDM system architecture [7]. The architecture includes four distinct layers: Layers A, B, C, and D. Layer A consists of tools and sensors that detect and retrieve data from the agency network. Layer-A data includes data for asset management, identity and access management, network security management, and data protection management. Examples of Layer-A data include:

• Hardware and software asset information
• Hardware and software configurations
• Asset vulnerabilities
• Mobile device management information
• Accounts and privileged users
• System Boundary information
• Remediation status of detected hardware and software vulnerabilities

Layer-A tools send their retrieved data to integration tools that aggregate and normalize it into object-level data. The integration tools comprise Layer B. The Layer-B tools send their object-level data to presentation tools that display the received data in a user-friendly format. The presentation tools comprise Layer C and typically consist of the agency dashboard and the database used by the dashboard to store its data. The agency dashboard calculates and displays an Agency-Wide Adaptive Risk Enumeration (AWARE) score, which gives agencies situational awareness of their current cybersecurity risk. Finally, the agency dashboard data is sent to the federal dashboard (Layer D), where it is further visualized and analyzed, and a federal AWARE risk indicator score is calculated. This allows CDM federal dashboard users to identify areas where cybersecurity can be improved across the federal civilian government. Data flows bidirectionally between the agency and federal dashboards, with the federal dashboard sending threat intelligence and requests for information, and the agency sending agency dashboard summaries with calculated AWARE scores [6, 7].

4. Problem Statement

Integration of CDM components is done by System Integrators (SIs). They are also responsible for the deployment, testing, and maintenance of the CDM System deployed within their assigned agencies. This includes the Layer A tools and sensors, the Layer B integration collection system, and the agency dashboard. The majority of integration tests are performed manually by the SIs and observed by the CDM test team.

The CDM dashboard software developers utilize the Scaled Agile Framework (SAFe) test and development methods to develop and build the agency and federal dashboards [12]. Dashboard testing is conducted over 2-week sprints during the development cycle, and regression testing is performed at the end of development cycles. This in turn requires a quick turnaround for the SIs to conduct integration testing, which, again, is largely performed manually.

Fig. 1. CDM Systems Architecture


To address the fast turnaround time utilized by the dashboard developers, and data quality issues that arose with the integration of multiple tools and sensors, the Testbed was developed as an IV&V resource to support functional testing of the agency and federal dashboards during the development process. During development, the dashboards must be tested to ensure that functional requirements are met prior to release to the SIs for integration at the D/As. This testing is referred to as dashboard testing.

Before the introduction of the Testbed, the test team had been manually executing the developer-provided dashboard test cases in a sandbox test environment, and/or the developer demonstrated the test cases under test team observation. Manual tests were not able to keep pace with the Agile development cycle.

As a first method of automated testing, the test team attempted to use the dashboard developers’ automated test processes for IV&V purposes. While attempting to use these processes, the test team encountered issues that did not support the IV&V process. The most critical issues included:

  • There were no test scripts (test cases and steps) to verify proper procedures during testing,
  • Some automated test results did not provide sufficient detail for IV&V of test steps, test cases, user stories, and associated acceptance criteria,
  • The sandbox environment was also used by users outside of the test team, who would modify datasets for other purposes (e.g., training, unrelated testing, etc.), and
  • Low-fidelity data did not support adequate verification of the acceptance criteria under test.

The Testbed was developed in response to the above and other issues to provide IV&V testing of agency and federal dashboard functionality. The Testbed has capabilities to independently develop automated test scripts, generate the necessary test data, execute test scripts, generate test artifacts for verification, and output standardized IV&V technical reports of test results with sufficient detail for verification. Robust data generated by the Testbed was leveraged to successfully conduct functional testing of the agency and federal dashboards.

Given the success of using the Testbed for testing dashboard functionality, the test team decided to leverage the technology to conduct data validation testing. Data validation testing occurs when SIs compare the data in the integration layer (Layer B) to the data in the agency dashboard (Layer C). Each SI utilizes different software and tools to achieve this, but the main focus is assuring data quality and accuracy between the two layers.

Data validation testing is currently a fully manual process. It involves the SIs developing test cases to validate each requirement and sending the test cases with test steps to the test team for review. The test team reviewers must ensure that each test case validates the appropriate requirements, a process that takes approximately 5 weeks. Afterwards, the SI must prepare their test environment and ensure that all necessary data is in place, which takes an additional 3-5 weeks. A Test Readiness Review (TRR) then brings the test team and the SI together to confirm that the tests and test environment are ready to proceed. The tests are then carried out manually during a test event by the SI and observed by the test team, which typically takes 2-5 days. Finally, the test team generates an IV&V report that documents the results and observations of the test event. Overall, manually conducting data validation testing takes 8-10 weeks.

The Testbed’s technology has been successfully applied to the use case of data validation testing in the form of a proof of concept. Adopting the Testbed’s technology for data validation testing would eliminate the need for test case reviews. It would also allow testing to be done automatically and independently by the test team, eliminating the need for SI-run test events and for the SIs to prepare test environments. This would reduce data validation test and evaluation processing time from 8-10 weeks to 1-2 days, increasing the efficiency, repeatability, and accuracy of tests.

5. Methods

This work presents an independent and automated dashboard test environment to verify requirements more efficiently, thus keeping pace with Agile development. The Testbed is an IV&V testing process in direct support of the CDM program’s integrated test and evaluation strategy. It is a full software solution for test and evaluation, relying entirely on autonomous, code-driven systems to verify and validate CDM dashboards. The Testbed’s capabilities support the demands of the Agile development and test environment as well as significantly improve test processes.

The Testbed utilizes Selenium automation, the Jenkins CI/CD tool, the Elastic Stack, and Python code to run test cases without the need for human intervention. The tests are run against emulated Kibana dashboards, which are set up to look exactly like the real agency and federal dashboards. Due to the sensitive nature of the data, the dashboards are populated with test data from the SI or custom-generated data that is formatted to look and behave like the actual normalized agency object-level data. Test scripts are written for each test case and executed by the Jenkins tool. When each test script is run through the system, an automated browser opens and executes each step of the test script. When the test has finished running, the Testbed outputs a test report that includes screenshots of each step, any other necessary saved information from the step, and a score of pass or fail. Depending on the findings, these reports inform test script modifications to address bugs or support further IV&V (in which case more tests are run), are sent to the developer to drive improvements to the software, or are reported as a deliverable to DHS. The Testbed has been applied to dashboard testing successfully and has been applied to data validation testing in a proof of concept.

5.1 Testbed Architecture

Figure 2 shows the architecture of the Testbed. The Testbed consists of an automated test driver (referred to as the Test Automator) powered by Selenium using Python code; emulated federal and agency Kibana dashboards, which are hosted on AWS instances; and a Python module (referred to as the Data Generator) that generates synthetic data that looks and behaves like actual normalized agency object-level data. A PostgreSQL intermediate database hosts data as it is generated. This data is sent to Elasticsearch and visualized in the federal and agency Kibana dashboards using the Elasticsearch Python client.

In the case of data validation testing, Layer B data from the SI is also sent to Elasticsearch and visualized in the agency and federal dashboards. The Layer B data is stored in the Testbed for comparison during tests.

The Testbed ingests test cases, which include all the necessary steps to validate a requirement. The test cases are turned into test scripts, which map each step to a Python function in the Test Automator. A wide variety of functions already exist from previous test deployments, and new functions are written as needed. The Test Automator simulates the browser clicks and UI actions to carry out the test case in the agency and federal dashboards. It compares dashboard values to the previously mentioned Layer B data as needed for validation. Finally, the Testbed outputs automated test reports, which include screenshots of each step and a score of pass or fail, along with other metrics and information related to the test.

Fig. 2. Testbed Architecture Diagram with Tools Utilized

5.2 Tools Utilized

Various open-source tools were utilized to create the Testbed. Figure 2 shows where each tool or programming language is used in the architecture of the Testbed. Selenium WebDriver was chosen because it is one of the most popular open-source tools for automated testing, and it allows reusable functions that automate user interface (UI) actions to be written in Python. The seamless integration with Python makes it easy to carry out actions in code and compare them to the results of actions in the browser. Python was chosen as the main coding language for this project due to its simplicity and flexibility, allowing many different developers to work together seamlessly.

Jenkins is another open-source tool; it was chosen for its flexibility, which allows each component of the Testbed to be automated. The Elastic Stack is a combination of Elasticsearch, Logstash, Kibana, and Beats that can store, search, and visualize data [1, 3–5]. It was chosen for the Testbed because it is used in the agency and federal systems under test. The agency and federal Kibana dashboards are hosted in the Amazon Web Services (AWS) cloud environment, which allows test environments to be quickly stood up, torn down, and stored. Finally, a PostgreSQL database was chosen to store data as it is generated. PostgreSQL is a well-known, highly advanced open-source database [15]. These are safe, secure, and dependable tools that are widely used in industry and allow for maximum automation of test processes. These tools may be added or updated based on the dashboard configurations being emulated in the Testbed environment. In future implementations, new tools could replace these tools while maintaining the same framework.

5.3 AWS Architecture

The Testbed is hosted in the AWS cloud environment. A key benefit of cloud computing is the ability to rapidly stand up, tear down, and store test environments, which supports meeting Agile software test and development timelines. A cloud environment was also the most cost-effective and efficient option given the computational resources needed and the duration of utilization.

The Testbed is deployed inside the secure AWS GovCloud network, which is restricted to U.S. Government D/As, as well as contractors and other U.S. customers that support them. GovCloud is a Federal Risk and Authorization Management Program (FedRAMP) offering certified at the High baseline and meets Department of Defense (DoD) Cloud Computing Security Requirements Guide Impact Levels 2, 4, and 5 [21]. All data traffic is also encrypted at the network layer.

5.4 Elastic Stack and Kibana Dashboards

The Elastic Stack refers to a stack composed of Elasticsearch, Logstash, Kibana, and Beats. The Testbed utilizes Elasticsearch for data storage and search and Kibana for data visualization. Elasticsearch is a search and analytics engine, and Kibana is a data visualization and exploration tool [1]. Elasticsearch provides an official Python client, which is used in this work to send data to Elasticsearch using Python.
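As a simple illustration of this data path, the sketch below indexes a single document with the official Python client (assuming the 8.x client API); the host, credentials, index name, and fields are placeholders rather than the program's actual schema.

```python
# Hypothetical sketch of sending one record to Elasticsearch with the official
# Python client; host, credentials, index, and fields are placeholders.
from elasticsearch import Elasticsearch

es = Elasticsearch(
    "https://elastic.example.gov:9200",          # placeholder Elasticsearch endpoint
    basic_auth=("test_user", "test_password"),
    verify_certs=True,
)

document = {"asset_id": "HW-0001", "hostname": "workstation-01", "cve_count": 3}
es.index(index="emulated-agency-data", id=document["asset_id"], document=document)
# Kibana visualizations built on the "emulated-agency-data" index can then display
# the newly indexed record.
```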

Most testing that occurs as part of the CDM program involves testing data in Kibana dashboards. These Kibana dashboards emulate the agency and federal dashboards and visualize test data stored in Elasticsearch. The dashboards are kept in sync with the actual agency and federal dashboards through periodic updates, referred to as releases. Dashboard testing occurs with each release to ensure functionality of the dashboards. When conducting dashboard testing, the dashboards are populated with synthetic data generated to look like actual normalized agency object-level data. When conducting data validation testing, the dashboards are populated with test data sent from the SIs and supplemented with synthetic data. The Kibana dashboards are hosted on Amazon Elastic Compute Cloud (EC2) instances in the AWS cloud environment.

5.5 Test Automator

The Test Automator is the component of the Testbed responsible for parsing test cases into test scripts, executing test scripts, and outputting a test report. It consists of Selenium functions written in Python, which are mapped to all necessary actions a user would carry out to complete a test, including actions to log in to Elasticsearch, perform queries, read results, assert that one variable is equal to another, and compare data found in Elasticsearch to outside data. It also contains Python code to calculate metrics, including whether each step passed or failed based on expected test results or requirements. These metrics, along with screenshots and any other saved text, are included in the automatically generated test reports, which provide thorough documentation of the test process.

As the Test Automator executes a test script, it opens an automated web browser, navigates and logs in to the federal or agency dashboard, and executes each action to complete the test. In the case of data validation testing, the test automator uses a file previously sent by the SI to compare to the data in Elasticsearch.
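The sketch below illustrates the general shape of one such step function, with a hypothetical function name, element locator, and report structure; it is an illustration of the approach rather than the Testbed's actual code.

```python
# Hypothetical Test Automator step function: capture before/after screenshots,
# perform one UI action, and record pass/fail for the report.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def click_discover_tab(driver, report_steps, step_number):
    """Navigate to the Discover page and record the outcome of the step."""
    before = f"step_{step_number:02d}_before.png"
    after = f"step_{step_number:02d}_after.png"
    driver.save_screenshot(before)                 # evidence: state before the action
    try:
        WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.LINK_TEXT, "Discover"))
        ).click()
        passed = True
    except Exception:
        passed = False
    driver.save_screenshot(after)                  # evidence: state after the action
    report_steps.append({
        "step": step_number,
        "action": "click Discover",
        "screenshot_before": before,
        "screenshot_after": after,
        "result": "pass" if passed else "fail",
    })
    return passed
```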

Fig. 3. Data Generation Process

5.6 Data Generator and Acquisition of Layer B Data

The Data Generator is a Python module that generates synthetic data to populate the dashboard. It utilizes the Faker Python package to generate data based on the known schema of the actual normalized agency object-level data. This data looks and behaves in a realistic manner but is entirely synthetic. When the data is generated, it is stored in a PostgreSQL intermediate database before being sent to the dashboard via Python code utilizing the Elasticsearch Python client and visualized in the agency and federal dashboards. Figure 3 shows the process of generating data, storing it in the PostgreSQL database, and then sending it to the dashboard.
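A minimal sketch of this generation pattern, using Faker and psycopg2, is shown below; the record schema, table name, and connection settings are illustrative assumptions and not the actual normalized object-level schema.

```python
# Hypothetical synthetic-data generation sketch: Faker produces realistic-looking
# records that are staged in a PostgreSQL table. Schema and credentials are placeholders.
from faker import Faker
import psycopg2

fake = Faker()

def make_asset_record():
    """Generate one synthetic hardware-asset record."""
    return {
        "asset_id": fake.uuid4(),
        "hostname": fake.hostname(),
        "ip_address": fake.ipv4_private(),
        "os": fake.random_element(["Windows 11", "RHEL 9", "Ubuntu 22.04"]),
        "last_seen": fake.date_time_this_year().isoformat(),
    }

conn = psycopg2.connect(dbname="testbed", user="generator",
                        password="test_password", host="localhost")
with conn, conn.cursor() as cur:
    for record in (make_asset_record() for _ in range(1000)):
        cur.execute(
            "INSERT INTO synthetic_assets (asset_id, hostname, ip_address, os, last_seen) "
            "VALUES (%(asset_id)s, %(hostname)s, %(ip_address)s, %(os)s, %(last_seen)s)",
            record,
        )
conn.close()
```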

For data validation testing, data from the SI is required to ensure that the data coming from the Layer B tool matches the data in the dashboards. In this case, the SI sends the Layer B data via a secure channel, and the Layer B data is mapped to the appropriate location in the dashboard using a previously sent Requirements Traceability Matrix (RTM). The RTM includes the expected mappings of Layer B data to the dashboard data field targets. For the prototype data validation test implementation, the SI sent a spreadsheet with the data in various columns, and the team used the RTM to determine where each column of data belonged in the dashboard. Once the data is mapped, it is sent to the dashboard using the Elasticsearch Python client. The Layer B data is then used as the baseline against which the dashboard data is compared to perform the data validation test. For the prototype implementation, the original spreadsheet containing Layer B data was saved and used for comparison to the data output in the dashboard. This process is explained in further detail in Section 5.10.
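The sketch below illustrates this RTM-driven mapping with pandas and the Elasticsearch bulk helper; the column names, RTM entries, file names, and index are hypothetical stand-ins for the real artifacts.

```python
# Hypothetical sketch of mapping Layer B spreadsheet columns to dashboard fields
# using a simplified RTM, saving a CSV baseline, and bulk-indexing the result.
import pandas as pd
from elasticsearch import Elasticsearch, helpers

# Simplified stand-in for the Requirements Traceability Matrix column mappings.
RTM_COLUMN_MAP = {
    "Device Name": "hostname",
    "IP": "ip_address",
    "CVE Count": "vulnerability_count",
}

layer_b = pd.read_excel("layer_b_export.xlsx").rename(columns=RTM_COLUMN_MAP)
layer_b.to_csv("layer_b_baseline.csv", index=False)   # saved for later comparison

es = Elasticsearch("https://elastic.example.gov:9200",
                   basic_auth=("test_user", "test_password"))
actions = ({"_index": "agency-dashboard-data", "_source": row._asdict()}
           for row in layer_b.itertuples(index=False))
helpers.bulk(es, actions)                              # populate the emulated dashboard index
```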

5.7 Jenkins

Jenkins is a CI/CD tool that can execute a pipeline consisting of one or many jobs either automatically or manually. It is easy to install and configure, and can distribute work across multiple platforms [14]. It also supports many plugins, allowing for automating all parts of the build, integration, and deployment process [22].

For this work, Jenkins is used to run multiple test scripts through the Test Automator without the need for human intervention. It is also used to automatically parse test cases into test scripts.
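The sketch below shows the kind of Python batch driver a Jenkins job might invoke to execute every prepared test script without intervention; the run_test_script.py entry point and the test_scripts directory are assumptions for illustration, not the Testbed's actual interfaces.

```python
# Hypothetical batch driver for a Jenkins stage: run each prepared test script
# through the Test Automator as a subprocess and fail the build if any script fails.
import subprocess
import sys
from pathlib import Path

def run_all(script_dir="test_scripts"):
    failures = []
    for script in sorted(Path(script_dir).glob("*.json")):
        # Each invocation opens the automated browser, executes the script's steps,
        # and writes its own report; a non-zero return code indicates failure.
        result = subprocess.run([sys.executable, "run_test_script.py", str(script)])
        if result.returncode != 0:
            failures.append(script.name)
    return failures

if __name__ == "__main__":
    failed = run_all()
    print(f"{len(failed)} failing test script(s): {failed}" if failed else "All test scripts passed")
    sys.exit(1 if failed else 0)   # a non-zero exit code marks the Jenkins build as failed
```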

5.8 Test Case Writing Tool

The test cases are written by the SI or dashboard developer and sent to the test team. These test cases include test steps written as instructions for a human to execute the test manually. To automate these test cases, each step must be mapped to a Python function in the Test Automator. For example, if a test step says “Click the button that says ‘Discover’ to navigate to the Discover page of the dashboard,” it must be mapped to the function in the Test Automator that clicks the Discover button. This mapping takes the form of a test script, which is a file containing a set of instructions for the Test Automator.
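A minimal sketch of this step-to-function mapping is shown below; the step phrasing and function names are illustrative assumptions.

```python
# Hypothetical mapping of human-readable test steps to Test Automator functions;
# an unmapped step is flagged so a new function can be written.
STEP_TO_FUNCTION = {
    "Log in": "log_in",
    "Click the button that says 'Discover' to navigate to the Discover page of the dashboard": "click_discover_tab",
}

def build_test_script(test_case_steps):
    """Translate a list of human-readable steps into Test Automator instructions."""
    script = []
    for step in test_case_steps:
        function_name = STEP_TO_FUNCTION.get(step)
        if function_name is None:
            raise ValueError(f"No Test Automator function mapped for step: {step!r}")
        script.append({"step": step, "function": function_name})
    return script
```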

So far, this mapping has been done manually. The test team created various tools that leverage Natural Language Processing (NLP) to automatically parse test cases and generate test scripts for execution. Large Language Models (LLMs) were also investigated as an option for this functionality, but were not chosen because LLMs can be unreliable due to hallucinations, instances in which the model generates inaccurate or fictitious information [17]. The process of converting test cases to test scripts must be completely free of errors.

The test case writing tool is a prototype which assists the test case author to write test cases in a format that enables automation of test script generation. The tool leverages existing functions from the current code-base to standardize test case nomenclature to enable automated parsing of test cases.

The prototype tool is a user interface that allows the test case author to pick from a drop-down menu of steps to expedite the writing of a test case. Depending on the step, additional drop-down menus appear to add required information. For example, a step could be “Log in”, or it could be “Change to the { } Dashboard”, in which case another drop-down would appear to choose Dashboard 1, Dashboard 2, or Dashboard 3. This standardization allows for simple, accurate automation of test cases into test scripts, because they will all have matching language.

If the test case author needs to add a step that is not in the existing knowledge-base of known functionality, there is an option to add a custom step.

(1) In this case, additional Testbed development would be necessary to support a new function.

(2) As Testbed cycles accumulate, the functional knowledge-base grows. Test script generation can eventually become a fully automated process once all possible functionality is defined in the Testbed knowledge-base.

(3) Hands-on Testbed development would then only be needed for emerging new functionality, minimizing the amount of human-in-the-loop activity in the midst of automated Testbed processes.

With the continued development and eventual adoption of this tool, individuals who have no coding experience will be able to write automated tests, keeping pace with the current industry trend toward no/low-code test automation. This will also allow new test cases to be written quickly as new requirements are developed in response to new functionality or known issues.

5.9 Use Case 1: Dashboard Testing

The Testbed was used successfully to verify and validate functionality of the dashboard releases in their entirety, effectively replacing manual tests. Figure 4 shows the data flow architecture for dashboard testing. Dashboard testing must occur every time a new dashboard update is released by the dashboard developer. The updated dashboard comes to the test team without any data, alongside test cases with steps to ensure that the functional requirements are met.

Prior to the existence of the Testbed, the team had been manually executing the test cases to validate functional requirements using low-fidelity test datasets provided by the developer. In the case of dashboard testing, the Testbed is used to independently develop automated test scripts, generate the necessary test data, execute test scripts, generate test artifacts for verification, and output standardized IV&V technical reports of test results with sufficient detail for verification.

Fig. 4. Data Flow Architecture

The process first begins with setting up the Testbed environment to emulate the current dashboard configuration under test. Then, synthetic data and test scripts are generated. The Data Generator generates random synthetic data in the format of emulated agency data, then pushes the data to the dashboard to populate it.

Next, the test cases are parsed to create test scripts. This requires that each human-readable step be mapped to a function in the Test Automator. Currently, this process is manual, but it could be replaced by the Test Case Writing Tool outlined previously, allowing authors to move directly from standardized test cases to test scripts. Once the test scripts are prepared, they are each input to the Test Automator using Jenkins. The Test Automator opens a browser, pulls up the dashboard, and automatically carries out each step. After each step, it takes a screenshot for the final report.

Once the test is complete, the Test Automator outputs a report with a description of each step, a screenshot of the browser before and after carrying out the step, and a score of pass or fail. A report is generated for each test script, and these reports are read by the test team to evaluate the performance of the dashboard. Often, this process resulted in the team making suggestions to improve the test cases and achieve more robust verification and validation overall.
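The sketch below shows one simple way such a per-script report could be assembled from recorded step results; the report layout and field names are assumptions rather than the Testbed's actual report format.

```python
# Hypothetical report writer: turn recorded step results (see the step-function
# sketch above) into a simple Markdown report with screenshots and pass/fail scores.
from pathlib import Path

def write_report(script_name, steps, report_dir="reports"):
    """Write a per-script report and return its path."""
    lines = [f"# Test Report: {script_name}"]
    overall = "PASS"
    for step in steps:
        lines += [
            f"## Step {step['step']}: {step['action']}",
            f"Result: **{step['result'].upper()}**",
            f"![before]({step['screenshot_before']})",
            f"![after]({step['screenshot_after']})",
            "",
        ]
        if step["result"] != "pass":
            overall = "FAIL"
    lines.insert(1, f"Overall result: **{overall}**")
    out = Path(report_dir) / f"{script_name}.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("\n".join(lines))
    return out
```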

Dashboard testing was completed once for each updated version of the dashboard, with a regression test at the end of the development cycle to re-run all previous test cases. After the regression test, a final report with outcomes from all the tests is generated. Testing often during the development cycle is key for complex systems, so the goal for future testbed development is to test during sprint development cycles to further inform development of any issues, reduce risk of critical errors, and expedite the development process to final deployment.

Fig. 5. Data validation testing process

This approach saved time by completely eliminating manual test events. The process of generating custom data solved the issue of corrupted and low-fidelity data encountered while using the developers’ built-in processes. It also eliminated the issue of unavailable or outdated test scripts, because custom test scripts could be written using the provided test cases. Finally, test reports can be customized to IV&V needs and report granular details with the accuracy required by the test team. All of this allowed testing to align with Agile development.

5.10 Use Case 2: Data Validation Testing

Data validation testing ensures data quality and accuracy after SIs integrate a dashboard deployment into their existing CDM solutions at the agency. In this testing, SIs compare the data in the integration layer (Layer B) to the data in the dashboard (Layer C). Each SI has different tools to achieve this, but the main focus is the data quality and accuracy between the two layers. Figure 5 shows the process of data validation testing. Layer B data is sent to Layer C to be displayed in the dashboard. The red line indicates the data validation test, which ensures that the data in the dashboard visualizations matches the original Layer B data.

Given the success of using the Testbed for dashboard testing, the test team decided to leverage the same technology to conduct data validation testing. Data validation testing is currently a completely manual process, so the usage of the Testbed to conduct it is only a proof of concept; however, it shows potential for greatly increasing the efficiency and quality of tests.

The current manual process of data validation testing takes approximately 8-10 weeks to complete. It involves the SIs developing test cases to validate each requirement and sending the test cases to the test team for review. The test team reviewers must ensure that the test cases validate the appropriate requirements, a process that takes approximately 5 weeks. Next, the SI must prepare their test environment and add all necessary data. Finally, there is a TRR with the test team and the SI to ensure that the test environment is correct and the tests are ready to proceed. Once all of this is completed, the manual tests are carried out with the test team observing, followed by the test team writing an IV&V report.

The process of data validation testing with the Testbed is much simpler and can be accomplished in two days rather than ten weeks. First, the SI provides the test cases to the test team. The test team must then parse the test cases to create test scripts. Currently, this process is manual, but it could be replaced by the Test Case Writing Tool, which would further simplify the process by having the SIs generate standardized test scripts directly.

Next, the SI provides the test team with the Layer B data from their individual Layer B tool. The data transfer is done via a secure external share drive or S3 bucket. Next, the RTM is used to map the Layer B data to the Kibana dashboard. This mapped data is then sent to Elasticsearch using the Elasticsearch Python client and visualized in the Kibana dashboard. The Layer B data is also saved as a CSV file. Additionally, the Data Generator can be used to supplement the Layer B data with additional synthetic data in the dashboard.

Finally, the test scripts are each input into the Test Automator using Jenkins. Each test is carried out in the same manner as dashboard testing, and the reports are generated with screenshots and information about whether each step passed or failed.

Carrying out this proof-of-concept implementation required only minor changes to the existing Testbed, which demonstrates its ability to be used for diverse use cases without major modifications. The main additions were the functionality to upload custom Layer B data to the dashboard and the ability to compare data in the dashboard to data from Layer B tools stored in an external CSV file. Functionality was also implemented to print information about the data in the test report, to ensure that all data necessary for validation is included in the report.
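For illustration, the sketch below shows the comparison pattern in miniature: dashboard data is queried from Elasticsearch and checked against the saved Layer B CSV baseline. The index, field, and key names are hypothetical.

```python
# Hypothetical data validation check: compare each Layer B baseline record (CSV)
# against the corresponding document indexed for the dashboard.
import csv
from elasticsearch import Elasticsearch

es = Elasticsearch("https://elastic.example.gov:9200",
                   basic_auth=("test_user", "test_password"))

with open("layer_b_baseline.csv", newline="") as f:
    baseline = {row["asset_id"]: row for row in csv.DictReader(f)}

mismatches = []
for asset_id, expected in baseline.items():
    resp = es.search(index="agency-dashboard-data",
                     query={"term": {"asset_id": asset_id}}, size=1)
    hits = resp["hits"]["hits"]
    if not hits:
        mismatches.append((asset_id, "missing from dashboard index"))
        continue
    actual = hits[0]["_source"]
    if actual.get("hostname") != expected["hostname"]:
        mismatches.append((asset_id, f"hostname {actual.get('hostname')!r} != {expected['hostname']!r}"))

print("PASS" if not mismatches else f"FAIL: {len(mismatches)} mismatch(es)")
```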

6. Results and Discussion

The integration of automated dashboard testing into the CDM test process resulted in the following benefits:

  • Risk reduction via repeatable testing and higher fidelity test results: The tests were run repeatedly, as opposed to only once, which was the case with manual tests.
  • Developer level-of-effort reduction: Manual test events with the dashboard developer were made unnecessary, and repeated testing could occur without developer involvement.
  • Risk reduction via data quality assurance: The Testbed generated synthetic data to robustly verify and validate user story functionality. The test results have higher fidelity as a result of vastly more automated test runs and robust data, and potentially earlier detection and assessment of defective functionality.
  • Continuous process improvement to streamline test processes: The team continued to make improvements to the Testbed to improve the speed of testing. In the second iteration of automated dashboard testing, the existing user stories had to be retested along with the additional and updated user stories in the updated dashboard. The first iteration of automated testing took 12 weeks to verify 44 user stories, while the second iteration took only 9 weeks to verify 112 user stories, roughly 3.4 times the verification rate of the first iteration. This pace of testing will continue to improve as the knowledge base increases and the test script generation process becomes automated.

The data validation testing functionality of the Testbed currently exists as a proof of concept; however, there are major anticipated benefits to the SIs through the adoption of this tool and test methodology. Some of the anticipated benefits of automated data validation testing are:

  • Reduction in SI level-of-effort: The Testbed would empower SIs to focus on test case creation and generating Layer B data without having to be present to demonstrate test cases in a formal IV&V event.
  • Streamlined testing: Testing is streamlined as a real-time CI/CD pipeline so that test cases can be executed without SI presence as soon as they are developed.
  • Time savings:
    – Streamlined automated testing shortens test events (reduced schedule), replacing long and repetitive manual test events.
    – Time spent on test artifact reviews is reduced as test scripts and test data mature with the process (i.e., reuse of test data and test case knowledge).
  • Risk reduction: Risk would be reduced as automation introduces vastly more test runs than a singular test event. The earlier a program is able to identify risks, the less costly they are to resolve (cost savings).

It is anticipated that over time, returns on investment will grow each time the Testbed is used. Test execution will be completed faster and with higher fidelity as more test cases are automated and more functions are written. The standardized test approach and streamlined testing should ensure more complete verification and validation of data target requirements, using repeatable test methods.

6.1 Lessons Learned

The development of this Testbed was an iterative effort, and there were lessons learned along the way that led
to higher efficiency and better performance.

Design First Approach

It proved beneficial to map out test cases before writing code. Originally, the team wrote Test Automator functions before knowing all the steps in the test cases. This created redundant or outdated code which became more difficult to maintain as new dashboard versions were released. Designing first and refactoring as necessary creates more efficient and accurate testing while reducing developer workload.

Create Concise Tests

It was also very beneficial to break down larger test cases into smaller test cases, as well as document any steps that needed to be completed manually prior to automation, such as making accounts or putting files in the right locations. This makes tests easier to follow for an inexperienced tester and more logical to follow for an experienced tester.

Avoid Reliance on Specific Data Values

In cases where the test does not validate a specific data value, it is best to write the test scripts so that they are generic enough to be re-used with different data sets. This avoids the extra work of re-writing the test later when the data changes and allows the test to be re-used in different situations. For example, rather than testing that dashboard table A has data value X, it is more efficient to test that the value in table A matches the value in table B.
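A minimal Selenium sketch of this data-independent pattern is shown below; the locators are hypothetical.

```python
# Hypothetical data-independent assertion: rather than hardcoding an expected value,
# verify that two dashboard tables agree with each other.
from selenium.webdriver.common.by import By

def assert_tables_match(driver):
    """Verify that table A and table B display the same value, whatever that value is."""
    value_a = driver.find_element(By.CSS_SELECTOR, "#table-a td.total").text
    value_b = driver.find_element(By.CSS_SELECTOR, "#table-b td.total").text
    assert value_a == value_b, f"Table A shows {value_a!r} but table B shows {value_b!r}"
```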

Test Case Standardization

Before creating the test case writing tool, various tools were developed to parse the existing test cases and automatically generate test scripts based on previous test case/test script pairs. Although this was possible, it was prone to error and would not recognize that two sentences with the same meaning but different wording or synonyms should be mapped to the same function. Although this improved with natural language processing measures like sentence similarity, there was still a high chance of error or only partial mapping of test cases, which required manual effort to review and correct. The potential to use LLMs was also explored to address this problem, but LLMs are also prone to error, and manual review would be required to ensure that there were no hallucinations or inaccuracies in the model output [17]. Ultimately, due to the need for high accuracy in test script writing to ensure confidence in test cases, it was determined to be more beneficial to standardize the writing of test cases with automation in mind prior to delivery to the test team for testing. Therefore, the previously mentioned test case writing tool was developed to standardize test case writing and allow for accurate, streamlined test script generation in the Testbed process.
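For illustration, the sketch below shows a standard-library sentence-similarity fallback of the kind described, with an assumed threshold and known-step list; any match below the threshold is deferred to manual review.

```python
# Illustrative sentence-similarity mapping using only the standard library; the
# known steps and threshold are assumptions for demonstration.
from difflib import SequenceMatcher

KNOWN_STEPS = {
    "Log in to the dashboard": "log_in",
    "Navigate to the Discover page": "click_discover_tab",
}

def best_match(step_text, threshold=0.8):
    """Return the mapped function for the most similar known step, if close enough."""
    scored = [(SequenceMatcher(None, step_text.lower(), known.lower()).ratio(), func)
              for known, func in KNOWN_STEPS.items()]
    score, func = max(scored)
    return func if score >= threshold else None   # None defers the step to manual review

print(best_match("Log into the dashboard"))   # likely maps to 'log_in'
print(best_match("Export results to CSV"))    # likely None, requiring manual review
```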

Identify Gaps in Test Coverage

The team learned to identify testing gaps while automating tests, which led to refinement of the original test cases to perform more accurate and reliable tests.

7. Conclusion

The DHS CDM test team has successfully created an automated solution for dashboard and data validation testing for the CDM program. The use case is applicable to integrated systems with large amounts of highly sensitive data moving across architectures with an array of tools, integration layers, and dashboards. The Testbed introduced automation to a use case that was previously relying on manual testing, providing a number of benefits. This capability has replaced manual testing in the case of dashboard testing, and may replace data validation testing as well.

This Testbed provided a solution to many test issues via time savings, repeatability, risk reduction, and reduction in SI level of effort. The implementation of the Testbed has resulted in a decreased risk of human error and increased confidence in tests, as tests are run more often in an Agile development cycle. It also resulted in higher fidelity test reporting with more detail, and it allows bugs to be caught earlier in the development cycle, saving time and money. The Testbed has allowed the test cycle to be reduced from 8-10 weeks to only a few days. In addition, the Testbed keeps pace with the industry trend toward low/no-code test automation by providing a tool that standardizes test case writing and makes parsing of test cases into test scripts immediate.

The Testbed provides immense benefit to the CDM program as a whole. It can be used to verify CDM solutions being deployed in various agency production environments and automate testing across the program. It has been used for developmental, functional, and data validation testing but it can also support operational testing utilizing the same framework and datasets. Operational testing is currently conducted separately from this effort. Additionally, the data being generated for CDM testing can be leveraged as an enriched data source for other potential applications of testing throughout the broad CDM domain.

7.1 Next Steps

A priority for future development of this tool would be the incorporation of Machine Learning (ML). There are
various pieces of the Testbed that could benefit from the incorporation of ML algorithms. Most notably, an ML
algorithm could be used to generate test cases to increase test coverage and reduce the human effort of writing test cases, or simply assist the human with writing test cases to save time. ML algorithms could also be used for test case prioritization, and to determine which test cases are most important and should be run earlier in the automated pipeline [10].

Another future priority would be end-to-end testing, and fully automating the testing pipeline of a system of systems (SoS). End-to-end operational testing is essential for complex systems, such as the CDM system [24]. End-to-end testing validates the data quality, timely transition, and accurate reporting of lower level data (Layer A for CDM) at higher levels in a SoS architecture (Layer C and Layer D for CDM). End-to-end testing supports verifying key performance parameters (KPPs) in test scenarios supported by generated test data. KPPs are critical performance metrics to assess the success of a system in an operational assessment. Currently, the Testbed only tests data at the data integration layer (Layer B), so this would involve emulating lower layer data or routing lower layer data to the Testbed to conduct this testing.

Finally, the Testbed could be used to test new tools within an architecture and AI applications. As the CDM program implements new technologies, there will be a need for additional automated testing for these technologies, each with their own challenges. The Testbed is versatile enough that it can be easily adapted to new use cases.

8. Acknowledgments

We would like to express sincere gratitude to the CDM Program Management Office (PMO) leadership for their support of this work: Program Director Matt House and Deputy Program Director Richard Grabowski. We would also like to thank Deanne Harwood, the dashboard government test lead, for facilitating support and collaboration with the dashboard developers, and JHU/APL Program Manager Hitesh Patel for his support, guidance, and oversight from conception to current development. In addition, we acknowledge Randall Fuller, Radha Kowtha, Andrew Liu, Lynne Ambuel-Donaldson, Suzanne Hassell, Mihir Joshi, Borge Nodland, Tammy Parsons, and Daniel Ryan for their authorship of the original whitepaper for the Testbed and their contributions to developing the Testbed, and Monica Waters for authorship of the SI Testbed whitepaper. Finally, we would like to acknowledge Dillon Prendergast, Gabel Wright, Yasemin Oldac, Rohan Marangoly, Adrian Cheung, and Alex Carney for their work on the development of the Testbed.

References

[1] Amazon Web Services. 2024. What is an ELK Stack. Amazon Web Services. Retrieved April 16, 2024 from https://aws.amazon.com/what-is/elk-stack/.

[2] Ivan Andrianto, M. M. Inggriani Liem, and Yudistira Dwi Wardhana Asnar. 2017. Web application fuzz testing. In 2017 International Conference on Data and Software Engineering (ICoDSE). 1–6. https://doi.org/10.1109/ICODSE.2017.8285893

[3] Asjad Athick and Shay Banon. 2022. Getting Started with Elastic Stack 8.0: Run powerful and scalable data platforms to search, observe, and secure your organization. Packt Publishing Ltd.

[4] Saurabh Chhajed. 2015. Learning ELK stack. Packt Publishing Ltd.

[5] Vladimir Ciric, Marija Milosevic, Luka Mladenovic, and Ivan Milentijevic. 2023. Clustering and Visualization of Network Security-Related Data using Elastic Stack. In 2023 10th International Conference on Electrical, Electronic and Computing Engineering (IcETRAN). IEEE, 1–5. https://doi.org/10.1109/IcETRAN59631.2023.10192106

[6] CISA. [n. d.]. Continuous Diagnostics and Mitigation Program Dashboard Ecosystem Fact Sheet. DHS CISA. Retrieved March 12, 2024 from https://www.cisa.gov/resources-tools/resources/cdm-dashboard-ecosystem-fact-sheet.

[7] CISA. 2020. CDM Program Overview Fact Sheet. CISA. Retrieved March 9, 2024 from https://www.cisa.gov/resources-tools/resources/cdm-program-overview-fact-sheet.

[8] Kevin Cox and Mark Kneidinger. 2017. Protecting the crown jewels of the government through infrastructure resilience and the DHS Continuous Diagnostics and Mitigation programme. Cyber Security: A Peer-Reviewed Journal 1, 2 (September 2017), 147–155. https://ideas.repec.org/a/aza/csj000/y2017v1i2p147-155.html

[9] Daniel M. Gerstein. 2015. Strategies for Defending U.S. Government Networks in Cyberspace. RAND Corporation, Santa Monica, CA. https://doi.org/10.7249/CT436

[10] Mahmudul Islam, Farhan Khan, Sabrina Alam, and Mahady Hasan. 2023. Artificial Intelligence in Software Testing: A Systematic Review. In TENCON 2023-2023 IEEE Region 10 Conference (TENCON). IEEE. https://doi.org/10.1109/TENCON58879.2023.10322349

[11] Jeff Kaibjian. 2014. Enhancing Security in Telemetry Post-Processing Environments with Continuous Diagnostics and Mitigation (CDM). http://hdl.handle.net/10150/577520

[12] Marcelo Marinho, Rafael Camara, and Suzana Sampaio. 2021. Toward Unveiling How SAFe Framework Supports Agile in Global Software Development. IEEE Access 9 (2021), 109671–109692. https://doi.org/10.1109/ACCESS.2021.3101963

[13] Matthew E. Morin. 2016. Protecting networks via Automated Defense of Cyber Systems. https://hdl.handle.net/10945/50600

[14] Visnu Sangar N, Lalit Mohan Saini, and Harish Mohan. 2022. Cloud Computing in Automation Testing. In 2022 International Conference on Edge Computing and Applications (ICECAA). IEEE, 31–36. https://doi.org/10.1109/ICECAA55415.2022.9936462

[15] Vallarapu Naga Avinash Kumar. 2021. PostgreSQL 13 Cookbook: Over 120 recipes to build high-performance and fault-tolerant PostgreSQL database solutions. Packt Publishing Ltd.

[16] Akshit Raj Patel and Sulabh Tyagi. 2022. The State of Test Automation in DevOps: A Systematic Literature Review. In Proceedings of the 2022 Fourteenth International Conference on Contemporary Computing (Noida, India) (IC3-2022). Association for Computing Machinery, New York, NY, USA, 689–695. https://doi.org/10.1145/3549206.3549321

[17] Gabrijela Perković, Antun Drobnjak, and Ivica Botički. 2024. Hallucinations in LLMs: Understanding and Addressing Challenges. In 2024 47th MIPRO ICT and Electronics Convention (MIPRO). 2084–2088. https://doi.org/10.1109/MIPRO60963.2024.10569238

[18] Dudekula Mohammad Rafi, Katam Reddy Kiran Moses, Kai Petersen, and Mika V Mäntylä. 2012. Benefits and limitations of automated software testing: Systematic literature review and practitioner survey. In 2012 7th International Workshop on Automation of Software Test (AST). IEEE, 36–42.

[19] Red Hat. 2022. What is a CI/CD Pipeline. Red Hat. Retrieved March 12, 2024 from https://www.redhat.com/en/topics/devops/what-cicd-pipeline.

[20] Roman Balakin. 2023. The History of Test Automation. TestRigor. Retrieved March 12, 2024 from https://testrigor.com/blog/the-history-of-test-automation/.

[21] Ron Rice. 2018. Department of Defense Cloud Computing Security Requirements Guide. DISA. Retrieved April 16, 2024 from https://disa.mil/-/media/Files/DISA/News/Events/Symposium/Cloud-Computing-Security-Requirements-Guide.ashx.

[22] Jay Shah, Dushyant Dubaria, and John Widhalm. 2018. A Survey of DevOps tools for Networking. In 2018 9th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON). IEEE, 185–188. https://doi.org/10.1109/UEMCON.2018.8796814

[23] Stephen Feloney. 2023. State of Test Automation: Trends and Priorities for 2023. DevOps Digest. Retrieved March 12, 2024 from https://www.devopsdigest.com/state-of-test-automation-2023.

[24] Douglas Wickert. 2019. Test in the Age of Agile: Rising to the Challenge of Agile Software Development. The ITEA Journal of Test and Evaluation (2019).

Author Biographies

Emily Pozniak is a researcher at the Johns Hopkins University Applied Physics Laboratory (JHU/APL), specializing in cybersecurity for critical infrastructure systems. She has spent the past two years on the DHS Continuous Diagnostics and Mitigation (CDM) Test and Evaluation team conducting research in automated testing to enhance efficiency. She is pursuing a Master’s degree in Electrical Engineering at Johns Hopkins University and holds a Bachelor’s degree in Applied Mathematics from the College of William and Mary.

David Warren is a distinguished technical leader in the field of cyber test and evaluation, with over a decade of experience delivering innovative solutions to complex technical challenges. Currently serving at the Johns Hopkins University Applied Physics Laboratory (APL), he specializes in Independent Verification and Validation (IV&V) assessments, ensuring the integrity and reliability of mission-critical systems.

David has led high-impact projects for the U.S. Department of Homeland Security (DHS), including the Continuous Diagnostics and Mitigation (CDM) Program under the Cybersecurity and Infrastructure Security Agency (CISA) and the national Exit Lanes project for the Transportation Security Administration (TSA). His efforts have advanced national security by strengthening cybersecurity frameworks and enhancing operational resilience.

As an independent test agent and consultant, David has provided technical expertise to executive teams at over 20 major airports and 66 federal departments and agencies, optimizing testing processes and systems engineering practices for large-scale programs via automation and process innovation.

David’s technical acumen is grounded in his background as an aerospace and systems engineer, which has shaped his approach to solving complex technical challenges. Originally from Washington, D.C., he balances his professional pursuits with family life, raising his daughter in Maryland and continuing to drive advancements in the T&E community.

Christopher Rouff is a member of the Senior Professional Staff at the Johns Hopkins University Applied Physics Laboratory and a lecturer at Johns Hopkins University. He received a Ph.D. in Computer Science from the University of Southern California and a M.S. in Cybersecurity from Johns Hopkins University.

He is currently working on test and evaluation for the DHS Continuous Diagnostics and Mitigation (CDM) program. His research interests include software engineering, cybersecurity, autonomic systems, artificial intelligence, emergent behavior and memetics. He is a member of ITEA, ACM, and IEEE.
