MARCH 2025 | Volume 46, Issue 1
Testing Without Being a Tester: A Conversation with Dr. Bill D’Amico
Interviewed by J. Michael Barton, Ph.D., Parsons Corporation
The “person of interest” of this interview is probably not widely known to most of the Test and Evaluation (T&E) community, but Dr. Bill D’Amico has conducted many tests of gun-launched projectiles and unmanned aerial vehicles (UAVs) at a multitude of Major Range and Test Facility Bases (MRTFBs), has authored numerous ITEA Journal articles with presentations at ITEA symposia, has been the technical advisor for Central Test & Evaluation Investment Program (CTEIP) and Test & Evaluation/Science & Technology (T&E/S&T) projects, and has formed working relationships with T&E managers and test range personnel. Bill’s science and technology (S&T) background and expertise were combined with his testing experiences over a career of almost 40 years. Today he consults with private industry on the testing of autonomous systems. Bill’s educational experience includes Bachelor and Master of Science degrees in Mechanical Engineering from Santa Clara University (1966/1968) and a doctoral degree in Applied Science from the University of Delaware (1977).
As a Reserve Officers’ Training Corps (ROTC) second lieutenant in the Ordnance Corps, he came to Aberdeen Proving Ground (APG) in 1968 and was reassigned to the US Army Ballistic Research Laboratory (BRL), predecessor of the current Army Research Laboratory (ARL). After completing his ROTC commitment at BRL, he was hired as a GS-12 mechanical engineer and found his way into the Free Flight Aerodynamics Branch, which he would eventually lead and transition into ARL. Bill and his wife Pat recently celebrated their 50th wedding anniversary after raising three sons, who reside up and down the East Coast with their eight grandchildren. While at APG, Bill was a T-ball coach (meeting Dave Brown and Jim Stewart, both of TECOM, at that time), a basketball coach in the city of Aberdeen, and a soccer program coordinator and coach in Churchville. He was also a soccer referee in Harford County alongside two of his sons.
Since Bill is not so well known in the T&E community, we have included references to some of his work at the end of the interview. Here is Bill’s T&E story.
Q: What were your APG testing experiences?
A: In the early 1970s, significant flight stability problems existed with liquid-filled, spin-stabilized projectiles, namely white phosphorus (at high temperatures) and binary chemical rounds (both 155mm and 8-inch concepts). I conducted laboratory experiments and used singular perturbation (analytical) methods to explain nonlinear and unstable flight behavior between resonant frequencies of the liquid payloads and the nutation frequency of the spinning projectile.1 Of course, theory is just that – theory. Hence, flight tests were needed. Specially instrumented fuses (yawsondes – see Figures 1 and 2) were built and flight tested at test ranges where unstable flights could be accommodated – NASA Wallops Island, VA, and Dugway Proving Ground, UT.2 For this work in rotating liquids, I was cited by the National Society of Professional Engineers as an Engineer of the Year in 1986. That research was later expanded to consider “loose internal parts” common to nuclear projectiles.3 I understood that S&T and T&E must work together – theory, ground tests, and flight tests.
Figure 1. Various yawsonde configurations for spin-stabilized projectile testing
At APG, I had operational responsibility for the Transonic Range Facility, which is located within the test range industrial area. This aerodynamic and terminal ballistic range facility used standard Army cannons and specially modified 175mm gun systems to produce unique launch and flight conditions. My experiences there exposed me to the undeserved opinion that test ranges, through overly conservative test practices, were always responsible for schedule problems and program development delays. That fallacy also guided me to merge S&T and T&E.
Figure 2. Additional yawsonde technology description
Q: When did you first develop technologies specifically for the T&E community?
A: At BRL/ARL we were involved in developing special instrumentation for gun-launched projectiles, primarily using masked solar detectors that tracked the projectile’s yawing and spinning motions using the sun. That worked well, but it required clear skies and a sun position favorable to the trajectory. The need for sunny days and long-distance test ranges forced most testing to Yuma Proving Ground (YPG), AZ. The advent of automotive air bag accelerometers, developed primarily under the Defense Advanced Research Projects Agency (DARPA) Microelectromechanical Systems (MEMS) programs, offered the potential of miniature, rugged, and inexpensive sensors for in-flight measurements. Ground-based shock and flight tests demonstrated that standard automotive MEMS could survive the harsh gun launch and recover to measure projectile axial force (drag).4
Miniature magnetic sensors could also be used to measure spin at launch, which, when combined with the gun twist, yielded a measurement of muzzle velocity. However, miniature and rugged telemetry and battery technologies were key components that were not yet available. ARL teamed with the YPG Technology Directorate to submit a proposal under CTEIP to develop and demonstrate a new generation of test technologies, the Hardened Subminiature Telemetry and Sensor System (HSTSS). I acted as the technical lead for this program, which successfully qualified a cellular phone chip for telemetry, flexible lithium batteries, MEMS accelerometers, magnetometers, and projectile antennas. HSTSS was completed but not fully transitioned to the T&E community since “fabrication/integration” of these components into unique configurations was still difficult, as pictured in Figure 3.5
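As a rough illustration of that spin-to-velocity relationship, consider this minimal sketch (the numbers are illustrative, not data from any HSTSS test): a rifled barrel imparts one revolution of spin for each fixed distance of projectile travel, so a magnetometer-measured spin rate at the muzzle implies the muzzle velocity.

```python
# A minimal sketch (illustrative values only): inferring muzzle velocity
# from a magnetometer-measured spin rate and the known barrel twist.

def muzzle_velocity(spin_rev_per_s: float, twist_calibers: float,
                    caliber_m: float) -> float:
    """Muzzle velocity (m/s) from spin rate and barrel twist.

    twist_calibers: rifling twist expressed as one turn per N calibers.
    """
    travel_per_rev_m = twist_calibers * caliber_m  # projectile travel per full turn
    return spin_rev_per_s * travel_per_rev_m

if __name__ == "__main__":
    # e.g., a 155 mm projectile, 1-turn-in-20-calibers rifling,
    # measured spinning at 266 rev/s as it exits the muzzle
    v = muzzle_velocity(spin_rev_per_s=266.0, twist_calibers=20.0, caliber_m=0.155)
    print(f"Inferred muzzle velocity: {v:.0f} m/s")  # about 825 m/s
```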
Figure 3. Typical HSTSS-based instrumentation for a kinetic penetrator projectile
Given the support of HSTSS, ARL continued to track DARPA MEMS developments for angular rate sensors. Over the same timeframe, it became clear that a “course-correcting fuse” using a MEMS-based inertial measurement unit (IMU) could be transitioned into a tactical device. My group at ARL took the obvious path and demonstrated a one-dimensional (1D) course-correcting fuse using a single accelerometer and a magnetometer, since fully 3D inertial instrument sets were still bulky and immature. The 1D fuse could reduce the very large range errors by producing a nearly circular range/deflection error pattern.6,7 The 1D fuse was perhaps a unique occurrence in that sensors developed for T&E were also used in a tactical solution. Today there are many types of MEMS-based/GPS-aided guidance systems for iron bombs and projectiles. T&E technology developments should always be examined for tactical uses.
Q: With new sensors and instrumentation, what came next?
A: While at ARL, we realized that more accurate indirect fire was still not highly effective; better accuracy wasn’t the full solution. Target location errors dominated artillery error budgets. Furthermore, battle damage assessment was highly suspect since “eyes on the target” were not the usual case. This resulted in targets being re-shot, consuming precious time and assets. Initially, back-packed unmanned aerial vehicle (UAV) concepts were fielded. We knew that the Army and Marines had a plethora of tubes that could be used to launch UAVs into the target area. Under a contract I set up, AeroVironment authored a design study for tube-launched UAVs. AeroVironment then went on to initiate the Switchblade concept.8 Today the UAV story is front and center in the Russia-Ukraine conflict, addressing the target location error and battle damage assessment issues in many ways. In 2000, I retired from Federal service and moved to the Johns Hopkins University Applied Physics Laboratory (APL) in Laurel, MD. At APL, I found a small community of researchers developing model-based reasoning autonomy to expand the mission sets of small UAVs. APL scientists and engineers conducted many demonstrations, but observers rightly asked, “how do we know this will really work?” The users were correctly stating that unless autonomous systems pass through rigorous T&E, they are just toys. A three-person team (David Scheidt, Robert Lutz, and I) wrote a proposal that outlined the basic needs and pathways for the T&E of autonomous systems. That proposal, originally known as STACIE, was ready for submission when the BAA sponsors produced a last-minute requirement that submissions could only have a four-letter abbreviation – TACE was born: Testing of Autonomy in Complex Environments.9,10,11 TACE went through a 5-year development and demonstration effort, but further transition to the MRTFBs was still needed.
Q: When did the specter of autonomy first arise for you?
A: While at APL, I was asked by a government sponsor to oversee a third-party development of a small parafoil system that would deliver payloads to remote and sensitive locations without operator control. The project description essentially requested an autonomous delivery. At that time, large palletized parafoil systems had been developed, tested, and fielded, but this was not the case for small parafoils, which have significantly different aerodynamic characteristics and behaviors. During the process of testing the third-party design, it was shown that the control laws were not working properly. I suggested a change in program direction, which was granted. The new team re-oriented the effort and successfully completed the project.12 This was my first exposure to an autonomous system design, and it needed only a simple test sequence. The lessons of the small parafoil program showed that the control laws must reflect a correct physical model fed by a reliable aerodynamic database. The physics-based guidance algorithm needed the appropriate data to work effectively – proper training data were needed. The model predictive control approach may be a simple artificial intelligence (AI) approach by today’s standards, but it worked! During testing, the model was corrected for unusual atmospheric conditions and showed overall improved performance. Imagine the testing that is needed for a highly complex AI system with multiple platforms of different modalities: air, land, sea surface, and/or underwater. What is their collective view of the world around them, and how does AI testing progress?
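To make “correcting the model in flight” concrete, here is a minimal sketch assuming a simple exponential-smoothing wind update inside a physics-based guidance loop; the gains and values are hypothetical and this is not the parafoil program’s actual algorithm.

```python
# A hedged illustration of in-flight model correction: blend the residual
# between predicted and measured ground velocity into a wind estimate.
# Gains and numbers are assumptions for illustration only.
import numpy as np

class WindEstimator:
    def __init__(self, initial_wind_mps: np.ndarray, gain: float = 0.2):
        self.wind = initial_wind_mps.astype(float)  # [east, north] in m/s
        self.gain = gain                            # low-pass blending gain

    def update(self, predicted_vel: np.ndarray, measured_vel: np.ndarray) -> np.ndarray:
        """Fold the unmodeled drift into the wind estimate each guidance cycle."""
        residual = measured_vel - predicted_vel  # drift the model did not predict
        self.wind += self.gain * residual
        return self.wind

# Usage: each cycle, compare GPS-derived ground velocity with the
# model-predicted velocity; the corrected wind feeds the next prediction.
est = WindEstimator(np.array([2.0, 0.0]))
print(est.update(predicted_vel=np.array([5.0, 1.0]),
                 measured_vel=np.array([6.5, 0.5])))  # -> [2.3, -0.1]
```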
Q: What challenges do AI systems pose for the test and user communities?
A: Any new technology presents challenges to the test and user communities. Even though TACE was demonstrated, its Technology Readiness Level (TRL) still had to be increased and its use accepted by the test and user communities. When I retired from APL several years ago, the transition of TACE was underway. Personnel at Edwards Air Force Base (EAFB) have initiated TACE-based testing and have documented some of their efforts within the Skyborg and the AFWERX Autonomy Prime programs.13,14,15,16,17
The starting point for testing AI systems, for both the developmental test (DT) and operational test (OT) communities, would be to establish a test and evaluation master plan (TEMP). What’s new and different for a TEMP when AI is present? The testers must understand what should be added to the standard TEMP. There is no need to start from scratch since the TEMP format is very general. What are the necessary additions to an AI-infused TEMP?
Figure 4 is a pictorial of how AI works and why AI is different – these systems will make decisions and produce actions without operator oversight.
Figure 4. The world state to situational awareness to an action/decision view
The world state will be derived from sensor inputs (onboard and offboard) and AI-processed into a situational awareness that supports the mission (decision/action). Will this sequence be orderly, correct, or just plain wrong? That is what testing must determine. A continuous testing process is needed from DT to OT, with integrated user tests that flow into a two-way process, not just left to right. The test community will need help in the development and integration of these new test techniques and technologies to efficiently accomplish this task. Will our adversaries take such care in AI testing? I think not. “In March 2022, the Pentagon reported that Russian missiles had failure rates of 20–60%, with cruise missiles having the lowest kill rates. Some missiles reportedly failed to explode even when they hit their targets.”18 If poorly tested AI is added to these existing systems, performance will probably get worse. We expect our adversaries to rapidly employ AI systems, but we need to test our AI systems comprehensively and with some urgency.
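To make the Figure 4 sequence concrete, the schematic sketch below walks a fused world state through situational awareness to a self-directed action; the names, fields, and confidence threshold are hypothetical, chosen only to illustrate the flow that testing must exercise.

```python
# A schematic, hypothetical sketch of world state -> situational awareness
# -> decision/action. Real systems would fuse many onboard/offboard sensors.
from dataclasses import dataclass

@dataclass
class Track:
    target_id: str
    position: tuple[float, float]  # e.g., local east/north in meters
    confidence: float              # 0..1 from sensor fusion

def situational_awareness(world_state: list[Track], min_conf: float = 0.7) -> list[Track]:
    """Reduce the raw sensor-derived world state to mission-relevant tracks."""
    return [t for t in world_state if t.confidence >= min_conf]

def decide(awareness: list[Track]) -> str:
    """Produce an action with no operator in the loop -- the step testing
    must show to be orderly, correct, or just plain wrong."""
    if not awareness:
        return "continue_search"
    best = max(awareness, key=lambda t: t.confidence)
    return f"track:{best.target_id}"

# Usage: a low-confidence track is filtered out; the vehicle keeps searching.
print(decide(situational_awareness([Track("T1", (120.0, 45.0), 0.4)])))
```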
There is also the AI category of “assured autonomy.”19 There will be many situations during testing where unsafe behaviors will occur, and for flight testing this means that test range safety conditions and boundaries may be violated. If the system under test is executing autonomous behavior, how does the range safety officer regain control? The TACE program utilized runtime monitoring and guardrails as an automated override system to regain control and to allow for a repetition of the test.11 If the test is intricate, this override capability may require a separate and highly robust/secure radio link.
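As a minimal sketch of that idea (the state fields, limits, and override action below are assumptions for illustration, not the TACE interface), a runtime monitor evaluates guardrail conditions every cycle and preempts the autonomy’s command when a violation occurs:

```python
# A hypothetical runtime monitor with guardrails: an independent check that
# can override autonomous behavior when a safety condition is violated.
from dataclasses import dataclass

@dataclass
class VehicleState:
    altitude_m: float
    inside_test_boundary: bool

class RuntimeMonitor:
    def __init__(self, altitude_floor_m: float):
        self.altitude_floor_m = altitude_floor_m

    def violated(self, state: VehicleState) -> bool:
        """Guardrail check run every cycle, independent of the autonomy stack."""
        return (state.altitude_m < self.altitude_floor_m
                or not state.inside_test_boundary)

def control_cycle(state: VehicleState, monitor: RuntimeMonitor, autonomy_cmd: str) -> str:
    # In practice the override would ride on its own robust/secure link so
    # the range safety officer can always regain control and rerun the test.
    if monitor.violated(state):
        return "OVERRIDE: fly to safe-recovery waypoint"  # guardrail preempts
    return autonomy_cmd                                   # autonomy continues

# Usage: an altitude-floor violation triggers the automated override.
monitor = RuntimeMonitor(altitude_floor_m=120.0)
print(control_cycle(VehicleState(95.0, True), monitor, "continue_search"))
```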
With several complex components in play, assurance of higher-level autonomous behavior must be built from top to bottom, eventually achieving trust. Actions by individual agents must be understood by the users within complex environments, including human interactions. While system designers may seek specific behaviors at the component level, the challenge is to demonstrate full operation at the mission level. The T&E approach must include real-time runtime monitoring/guardrails to provide assurance that the AI system is within the expected range of behaviors and parameters.
We will not accept poor AI performance, and we will require that AI systems embody trust. Trust in new technologies is not automatic. As an example, consider the advent of GPS, in which military users did not (and should not) have implicit trust. It is well documented that military users relied on commercial GPS implementations until military-approved units were finally fielded. “Although each U.S. Army unit had at least one GPS receiver for maneuvering, the demand for receivers was so great that special approval from the Pentagon was obtained for the Army to acquire commercial units. Once a waiver was approved, the GPS Joint Program Office at SMC, managed by Col. Marty Runkle with Lt. Loralee Ryan as GPS user equipment project manager, awarded contracts for over 8,000 more receivers from commercial providers.”20
A similar situation may be the case for military AI systems, especially since AI is taking the commercial world by storm with many different and novel implementations. It is unrealistic to assume that government agencies will be able to independently establish all the practices and procedures without the help of the private sector. Vendors of all sizes and types, especially those with “hands-on” AI experience, should be leveraged. AI may require other technologies that are not in common use at the MRTFBs, e.g., novel communication protocols.
To date, most military-style demonstrations of autonomous teams have used wireless mobile ad hoc network (MANET) technologies for intra-team communications. A MANET has no master node or centralized hub, and it naturally suits the rapid and unplanned nature of AI-based decisions and operations. Most military communications use time division multiple access (TDMA) protocols that are not naturally suited to AI-based systems. These protocols often have master nodes with pre-planned allocations of data slots and bandwidth. Networks of this type could retard the AI decision-making process and inhibit AI-driven improvements in mission performance.
Care must be exercised to separate the performance of command-and-control methods from that of the autonomous system. We now realize that several wireless networks may be needed: (1) an intra-AI-team tactical network that can (and will) suffer real battlefield problems (poor line of sight, low bandwidth allocations, adversarial jamming, etc.), (2) a T&E network that overwatches the AI team with near-perfect wireless performance to record all “world state” information (who knew what/when/where), (3) a standard test data link to record platform performance, and (4) an autonomy assurance link for runtime monitoring and guardrails. If generous connectivity, bandwidth, and response times are available, then perhaps some of these links can be combined, but the testing of AI will be an RF challenge. This multiplicity of links must be coordinated with other test range wireless needs prior to test planning and execution. Considerations of cost, non-fixed sites, and ease of implementation must also be addressed if AI testing is to be available at multiple MRTFB locations.
Figure 5. Yuma Proving Ground wireless data acquisition network (circa 2007)
Figure 5 is an example of a wireless network designed for low cost, non-fixed sites, and ease of implementation. The US Army initially planned to field the Crusader “shoot & scoot” system in 2008. YPG developed a wireless data acquisition network to support the concept of operations of this mobile howitzer. Highly portable wireless nodes (portable meaning no environmental impact statement is needed) were designed and built.21,22 These nodes used implementations of the emerging 802.11 standard combined with omni-directional antennas and solar panels. The network was interfaced with the existing YPG wired network infrastructure for data backhaul and archiving. Typically, six portable nodes were strategically placed on hilltops to provide universal coverage. Coverage by these six nodes was relatively long distance and suffered few nulls due to shadowing by the mountainous terrain. The network’s Media Access Control (MAC) layer was modified to provide primarily one-way, uplink data flows; two-way data flows/instructions were not a priority at that early date. The network was retired because the bridge hardware could not be secured to meet emerging standards. However, this design philosophy could be re-examined using today’s commercially available MANET radios.
Today the 802.11 wireless specification is very mature and has been the basis of several commercial products that could support the wireless needs of testing autonomous systems. Studies are available on this topic and provide additional background on newer developments. A 2014 Canadian study with oversight from the Test Resource Management Center (TRMC) lists several suppliers that today have mature MANET radios.23 It also may be prudent to examine whether non-MANET, standard tactical links can be integrated into small platforms so that offboard data feeds from larger platforms can be leveraged. That may drive an interesting tradeoff in terms of AI algorithms that could work effectively in a TDMA environment.
Q: What comes next if AI-based systems are to revolutionize military systems?
A: We need to adapt and understand how to best use the technologies that are driving the AI world. One of the original TACE developers started a small private company that specializes in autonomous intelligent systems technologies. I was asked to consult on a part-time basis – I said yes – and it’s been great. Experiences with self-driving car routines are interesting, but that type of drive-then-fix-after-the-fact approach is not sufficient for military-style AI.
In addition, a very important research topic known as “intelligent fault management” is key.24 This area of interest is intended to mitigate real-time problems with hardware and software while the AI system is operating in the field. These methods will allow complex machines to self-diagnose and self-heal, thus retaining some capabilities of the initial force and contributing to the completion of the assigned mission. The development of T&E products and procedures must include the assessment of fault management procedures for autonomous systems.
AI is a buzzword, and according to press releases AI will be inside of everything, but there are different types of AI implementations. One very serious challenge is the case of “mission autonomy,” where the AI-enabled entity or team loses connection to the human operator and its acquired world state/database. For mission autonomy, success depends on the system’s self-determined world state and the associated self-directed decisions. Are those decisions rational, safe, and ethical in terms of accomplishing the assigned mission? The TACE program was conceived as an extension of existing test infrastructures for the purpose of safely evaluating autonomous system behavior and performance. Sponsored by the TRMC’s T&E/S&T program under the Unmanned and Autonomous System Test (UAST) Test Technology Area Broad Agency Announcement (BAA), TACE provided a means to detect unsafe autonomous behaviors during live test events and to apply simple guardrails to mitigate the effects of those behaviors. Demonstrations with small unmanned aerial system (UAS) aircraft were made at Phillips Army Airfield (PAAF)/APG. Today UAST has a new name: Autonomy and Artificial Intelligence Test (AAIT).
TACE development did not focus on testing a specific AI algorithm; rather, it was intended to provide an initial overall framework for AI testing. This is just a beginning, given that today’s AI algorithms will be the centerpiece of testing and will probably use large language models (LLMs) and machine learning (ML) with huge datasets for training. AI algorithms could be model-based or generative in nature, but they all can be like children – they go anywhere and do strange things. Hence, the guidelines of assured autonomy and runtime monitoring with guardrails are necessary.
Simple examples of runtime monitoring and guardrails were demonstrated within the TACE program at PAAF/APG.8 In later TACE phases, more extensive demonstrations (emergency stop, altitude floor violation, loss of link, proximity to other entities (real and synthetic), and collaborative autonomy) were made using LVC methods (live – using real people and real systems; virtual – using real people operating simulated systems; and constructive – using simulated people operating simulated systems). An illustration of one vignette, at TRL 6, is provided in Figure 6.11
Figure 6. A collaborative LVC demonstration of runtime monitoring/guardrails
The EAFB test range topography was used, with the test area outlined in purple as a notional search area within the outer perimeter of the EAFB North Base small UAS Work Area. In this vignette, two UASs were released from Loiter Point A and began a cooperative search and track mission. The target was a synthetic ground vehicle moving along a racetrack pattern, modeled by the Advanced Framework for Simulation, Integration and Modeling (AFSIM) Synthetic Force Generator. When one of the two UASs detected the ground target, it communicated the target’s position to the other UAS, which then flew toward the target location. That UAS approached the moving ground target, got too close to the other live UAS, and triggered a preset runtime monitoring/guardrail restriction requiring a safe separation distance – a violation had occurred. As the defined remediation, each UAS discontinued the autonomous tracking task and transited immediately to a different loiter point.
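A hedged reconstruction of that proximity guardrail in code (the standoff distance, names, and remediation are illustrative assumptions, not the demonstrated TACE configuration):

```python
# A hypothetical sketch of the Figure 6 vignette's proximity guardrail:
# when two live UASs close within a preset standoff, tracking is abandoned
# and each transits to its own loiter point. Values are illustrative only.
import math

SAFE_STANDOFF_M = 150.0  # assumed preset separation limit

def distance_m(a: tuple[float, float], b: tuple[float, float]) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])

def proximity_guardrail(uas1_pos, uas2_pos, loiter1, loiter2):
    """Return each UAS's commanded goal; remediation preempts the track task."""
    if distance_m(uas1_pos, uas2_pos) < SAFE_STANDOFF_M:
        # Violation: both aircraft discontinue the autonomous tracking task
        # and transit immediately to different loiter points.
        return {"uas1": ("loiter", loiter1), "uas2": ("loiter", loiter2)}
    return {"uas1": ("track_target", None), "uas2": ("track_target", None)}

# Usage: 100 m separation violates the 150 m standoff, so both UASs loiter.
print(proximity_guardrail((0.0, 0.0), (100.0, 0.0),
                          loiter1=(-500.0, 0.0), loiter2=(500.0, 0.0)))
```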
It could be asked, “why do all of this – can’t the test director just override the autonomous behavior and manually enforce safe practices?” These examples used only two slowly moving UASs executing a simple, singular AI-driven mission. For more complex test scenarios, however, runtime monitoring/guardrails are an absolute necessity, e.g., if the event is non-line-of-sight, if there are many targets, if there is a swarm of fast-moving AI entities, if the test link is dropped, etc. When the AI decisions are multiple and rapid, an automatic override is needed. The test engineer will want to halt the test and restart the sequence to determine whether the same AI behavior will occur and whether the runtime monitor/guardrail will provide the proper oversight and control. Many questions must be asked and documented. Did the AI algorithm fail, was the world state highly accurate or totally inaccurate, and/or did the automatic safety overrides work properly? The multiplicity of runtime assurance/guardrails can initially be developed using simulation methods, but they must be matured with the addition of live hardware in an overall LVC approach.
Q: Does testing of AI ever end?
A: Mission autonomy is a particular kind of AI that presents a serious T&E challenge, and the community must be ready to certify these types of military AI systems. And not just initially, but continuously, as hardware and software updates are installed. Even what may be considered minor or routine software updates must be rigorously tested to avoid a “CrowdStrike” outage; this is where trust is so important.25 In addition, the entire field of human-machine teaming is a challenge for both the DT and OT communities. New technologies must use simulation-based testing and LVC methods. Trust, however, can only be established by actual testing and use by operational personnel. So, the answer is NO – testing will never end. That’s how it should be. I look forward to being a small part of this new generation of test technology developers as AI floods into our world.
Q: What advice do you have for people just entering the T&E career field?
A: Through a series of coincidences, I wandered into the testing world. I was responsible for a gun range, which essentially was a test range. That cemented my belief that test data are critical to program development. Simulation technologies were not highly advanced at that time, but computational sciences have brought simulation-based testing and LVC into the forefront. Clearly, AI-infused military applications will be a major challenge today, and, given that testing is so expensive, these simulation methodologies are imperative in terms of identifying what tests should be planned/costed and what technologies are needed to conduct those AI tests. Furthermore, the simulation results can then be compared with the real data to better understand the fidelity of the simulation. Since exhaustive testing is not affordable or possible, updated simulations from this “comparison loop” will assist in the establishment of assured autonomy and the progression to trust. And as in the case of the technology for the 1D fuse, do these T&E methods provide the basis for tactical use? Yes.
Q: Do you have any closing remarks or observations?
A: The lines between DT and OT will blend with continuous testing that is not just a left-to-right progression. Be prepared to build and test early with realistic prototypes. I was once encouraged to push technology frontiers by safely and securely doing 50/50 experiments (but let’s call them tests). The aim was to learn enough in the near timeframe, but then to return and to make corrections. We must learn from our “controlled and calculated” mistakes. Remember the failed calculus or chemistry test that turned out to be a key learning experience. In today’s world where success is almost always required, what I’m suggesting may be a difficult path. That path contains high fidelity simulation and realistic/affordable physical testing, but it’s a path that leads to success.
J. MICHAEL BARTON, Ph.D., Parsons Fellow and Chairman of the ITEA Board of Directors, has worked on Aberdeen Proving Ground since 2001, ten years supporting the Army Test and Evaluation Command and the last nine with the Army Research Laboratory, working in large-scale data analytics, high-performance computing, and outreach to test and evaluation and other stakeholders. Dr. Barton’s career is in physics-based modeling and simulation, with six years as an aerospace consultant; twelve years as a contractor supporting the Air Force at the Arnold Engineering Development Complex in Tennessee and the National Aeronautics and Space Administration Glenn Research Center in Ohio; and the first four years of his career with The Boeing Company in Seattle. He received Bachelor of Science and Ph.D. degrees in engineering science and mechanics from The University of Tennessee-Knoxville and a Master of Engineering degree in Aeronautics and Astronautics from the University of Washington.
WILLIAM D’AMICO, Ph.D., is retired from the US Army Research Laboratory at Aberdeen Proving Ground and from the Johns Hopkins University Applied Physics Laboratory. Presently, on a part-time basis, he consults on the testing of autonomous systems in private industry. Dr. D’Amico’s career included research, development, and program management in flight dynamics, miniature sensor systems, communication architectures/use, unmanned aerial vehicles, and test technologies. At one point, he had operational responsibility for a large caliber gun experimental range, which introduced him to many of the difficulties that test ranges endure. He typically used analytical and numerical results, ground-based tests, and flight testing as pathways to solutions for complex and emerging problems, many of which were for the developmental test community. He received Bachelor and Master of Science degrees in Mechanical Engineering from Santa Clara University and a Ph.D. in Applied Science from the University of Delaware.