DECEMBER 2024 | Volume 45, Issue 4
Fuzz Testing for System Vulnerabilities | ITEA Journal

Keith F. Joiner
Adjunct Associate Professor, Old Dominion University (Fall 2024), Norfolk, Virginia
Senior Lecturer Test and Evaluation, Capability Systems Centre, University of New South Wales, Canberra, Australia
Fuzzing is an important new test and evaluation (T&E) approach for finding information technology vulnerabilities, one that is undergoing rapid research development and improving utility. However, fuzz testing has limited awareness in the broader test community. This article reviews a technical track held during the 2024 Cybersecurity Workshop of the International Test and Evaluation Association (ITEA) and relates its presentations to the research literature on fuzz test techniques. The track was chaired by Dr. Mike Shields and titled ‘Fuzzing to Find Unknown Vulnerabilities’, with four presentations spanning the evolution of fuzzing tools from the Vader Modular Fuzzer (VMF), through the G-QEMU (GQ) fuzzing engine, to modern hybrid fuzzing such as the Multi-Arm Bandit fuzzing engine. The final presentation was on work sponsored by the Test Resource Management Center (TRMC) to measure and compare fuzzing engine performance, building on significant research development of fuzz test benches. In the discussion of fuzz testing research trends, a new AI-enabled literature analysis tool known as LitMaps® is used to examine what such approaches offer to those characterizing trends in a fast-paced research area like fuzz testing. This review hopes to encourage further submissions by fuzz testers on best practices in detecting vulnerabilities to build digital sovereignty through better cyber resilience.
Keywords: Fuzz Testing, Vader Modular Fuzzing, QEMU Fuzzing, Hybrid Fuzzing, Multi-Arm Bandit Fuzzing, Fuzz Test Benches
Fuzz testing stimulates a system, or an emulation of one, with semi-valid inputs in order to uncover vulnerabilities. Fuzzing is an important new T&E approach in industry and Government to find software vulnerabilities. According to a recent major literature review of fuzzing techniques by Zhao, et al. [1]:
Fuzzing is an important technique in software and security testing that involves continuously generating a large number of test cases against target programs to discover unexpected behaviors such as bugs, crashes, and vulnerabilities. Recently, fuzzing has advanced considerably owing to the emergence of new methods and corresponding tools. However, it still suffers from low coverage, ineffective detection of specific vulnerabilities, and difficulty in deploying complex applications.
This research article is a review of a presentation track from ITEA’s Cybersecurity Workshop in Orlando, Florida, in September 2024, augmented with helpful explanation and related to the current research literature. The track was chaired by Dr. Mike Shields and titled ‘Fuzzing to Find Unknown Vulnerabilities’, with four presentations aligned roughly with the chronology of fuzzing engine development since about 2016: the Vader Modular Fuzzer (VMF) [2], the G-QEMU (GQ) fuzzing engine [3], hybrid fuzzing including a Multi-Arm Bandit engine [4], and measuring fuzzer performance [5].
Presenters highlighted that TRMC is creating a configurable, modular fuzzer capable of applying many different fuzzing strategies to meet the unique needs of the DoD. By providing the wherewithal for advanced fuzzing to all providers of weapon systems and critical infrastructure, TRMC can shift the High Assurance (HA) vulnerability and cost risk equation [6] to be commensurate with the need for greater digital sovereignty and cyber resilience against the increasing risk of advanced persistent threats [7, 8]. This research review hopes to encourage further submissions by fuzz testers on best practices in detecting vulnerabilities.
The review is based on the accepted and published presentation abstracts and the recollection of the author, who attended. A short background is given before overviewing each of the four presentations and their topics, relating each presentation area to the research literature. The research article concludes with a discussion of fuzz testing research trends, informed by a new literature analysis tool known as LitMaps® that is partly AI-based. This literature analysis illustrates what such AI-enabled approaches offer to those characterizing trends in a fast-paced research area like fuzz testing.
Fuzzing is defined by Microsoft as ‘a program analysis technique that looks for inputs causing error conditions that have a high chance of being exploitable, such as buffer overflows, memory access violations and null pointer dereferences.’ They go on to characterize fuzz testing as being in the following three categories:
Table 1: Comparison of fuzz test types
| | Black-box (BB) | Grey-box (GB) | White-box (WB) |
| --- | --- | --- | --- |
| Goal | Mimic external cyber-attack | Mimic insider threats | Mimic threats with privileged access |
| Access & Information | Zero | Some | Complete |
| Advantage | Realistic start | More efficient than BB | More comprehensive than GB |
| Disadvantage | May miss vulnerabilities & resource-intensive | Limited assessment of penetration resistance | Requires extensive release at cost |
Much of the use of fuzz testing to detect vulnerabilities in commercial practice is proprietary, while in weapon systems and critical infrastructure it is often classified. However, following the tradition of cybersecurity [9], if we turn to road vehicles, where public safety is paramount, we can illustrate the effects fuzz testing is having. For example, the US Government in 2024 announced plans to “ban certain hardware and software made in China and Russia from cars, trucks and buses in the US due to security risks”, while US vehicle manufacturers are investing in more cyber-resilient systems using fuzz testing. Similar evidence is emerging in the development of public IoT devices.
Fundamentally, fuzzing shifts the focus from functional test cases (i.e., what systems must do) to include the effects of non-functional factors (i.e., incidental characteristics of systems, invalid values, or rarely used values). It does so by systematically mixing structured experimentation (i.e., design-of-experiments or combinatorial test designs) with randomization. Such randomization can be applied within runs to the uncontrolled aspects. Finally, fuzzing often systematically and concurrently explores variations in hardware and software. Put another way by Mallissery and Wu [10]:
‘Fuzzing is a vulnerability discovery solution that resonates with random-mutation, feedback-driven, coverage-guided, constraint-guided, seed-scheduling, and target-oriented strategies. … Most topline companies and organizations utilize fuzzing to ensure quality control and cybersecurity operations. For example, Google uses fuzzing to verify and ensure that the millions of Lines of Code (LOC) in Google Chrome are bug-free [67]. It was challenging to admit that Google could find 20K vulnerabilities in Chrome using fuzz testing [67]. The dominant software from Microsoft has to pass the fuzz test stage in the software development cycle to ensure no code vulnerabilities and confirm stability [136]. The DoD Enterprise DevSecOps Reference Design document [40] from the United States has mentioned that continuous testing across the software development cycle is necessary for the test tools support. Therefore, it is essential to use fuzzing to discover Distributed Denial of Service attacks and malware exploit possibilities, validate system security, and reduce the risk of system degradation [40].’ [pp. 71:1-2]
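To make the earlier point about mixing structured experimentation with randomization concrete, the following minimal Python sketch (not drawn from any of the cited tools; the JSON-like seed and factor values are hypothetical) sweeps a few controlled factors combinatorially while randomly flipping bytes in the rest of a valid seed, yielding the kind of semi-valid inputs a fuzzer feeds to its target.

```python
# Illustrative only: combine a structured (combinatorial) sweep of controlled
# factors with randomized mutation of the remaining bytes of a valid seed.
import itertools
import random

SEED = b'{"id": 1, "mode": "normal", "payload": "AAAA"}'  # a known-valid input

# Controlled factors explored systematically (design-of-experiments style).
modes = [b'"normal"', b'"debug"', b'"maintenance"']
ids = [b"0", b"1", b"4294967295"]  # include boundary values

def mutate(data: bytes, n_flips: int = 3) -> bytes:
    """Randomize the uncontrolled aspects by flipping a few random bytes."""
    buf = bytearray(data)
    for _ in range(n_flips):
        i = random.randrange(len(buf))
        buf[i] ^= random.randrange(1, 256)
    return bytes(buf)

def semi_valid_inputs():
    """Yield semi-valid inputs: structured factor settings plus random noise."""
    for mode, ident in itertools.product(modes, ids):
        base = SEED.replace(b'"normal"', mode).replace(b"1", ident, 1)
        yield mutate(base)

for case in semi_valid_inputs():
    pass  # feed each case to the system (or its emulation) under test
```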
Different fuzzing engines employ different methods to look for vulnerabilities, and there is a growing repertoire of types and names depending on their target systems and the languages they operate on. According to Mallissery and Wu [10], there are six fundamental steps in a fuzzing engine, as illustrated in Figure 1:

Figure 1: Basic Fuzz Test Steps
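The loop behind Figure 1 can be appreciated with a minimal mutation-based sketch. This is illustrative only (the target binary path and the seed corpus below are hypothetical, and it is not Mallissery and Wu’s exact six-step decomposition), but it shows the common core: seed selection, test-case mutation, execution, crash monitoring, and triage.

```python
# Minimal mutation-based fuzzing loop (hypothetical target and corpus).
import random
import subprocess

corpus = [b"GET / HTTP/1.1\r\n\r\n"]   # initial seed inputs
crashes = {}                           # triaged findings, keyed by crash signal

def mutate(seed: bytes) -> bytes:
    buf = bytearray(seed)
    for _ in range(random.randint(1, 8)):
        buf[random.randrange(len(buf))] = random.randrange(256)
    return bytes(buf)

for _ in range(10_000):
    seed = random.choice(corpus)                  # seed selection / scheduling
    case = mutate(seed)                           # test-case generation (mutation)
    try:
        proc = subprocess.run(["./target_under_test"],   # hypothetical binary
                              input=case, capture_output=True, timeout=5)
    except subprocess.TimeoutExpired:
        continue                                  # hangs handled separately
    if proc.returncode < 0:                       # monitoring: killed by a signal
        crashes.setdefault(proc.returncode, case) # triage: keep one input per signal

print("unique crash signals:", sorted(crashes))
```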
Fuzzing engines are usually rated according to how many software bugs are attributable to the toolset (Antonio and Blázquez [11]), sometimes known in gaming parlance as ‘trophies.’
According to Weader, et al. [2] and as illustrated in Figure 2:
‘VMF is a reusable, composable, modular fuzzing framework developed under funding from TRMC (Test Resource Management Center). Its value exceeds that of a “single better” fuzzing method by integrating, joining, and adapting multiple fuzzing capabilities all within one common user tool set. VMF enables quickly changing and combining fuzz strategies and applying known-good strategies to similar targets. The VMF framework and many core modules are open-source software, with easy run-time integration of user-controlled (e.g. access-restricted and private) modules. VMF is easier to use than many comparable tools which increases the fuzzing effectiveness of non-specialists. Its modularity and clearly defined interfaces simplify capability integration and new feature development.’

Figure 2: VMF illustration from VMF v4.1 documentation
VMF incorporates technologies that are derived from published fuzzing research, and capabilities compatible with existing open-source fuzzers. A large portion of research is built on top of the original fuzzing engine called American Fuzzy Lop (AFL); hence, much of the academic literature refers to AFL, such as the review by Fioraldi, et al. [12]. However, because VMF is open-source, there is now some coverage of its development and utility in the literature. For example, an interesting development from VMF is Pipe-Cleaner by Naaktgeboren, et al. [13], which states:
‘Fuzzing has proven to be very effective for discovering certain classes of software flaws, but less effective in helping developers process these discoveries. Conventional crash-based fuzzers lack enough information about failures to determine their root causes, or to differentiate between new or known crashes, forcing developers to manually process long, repetitious lists of crash reports. Also, conventional fuzzers typically cannot be configured to detect the variety of bugs developers care about, many of which are not easily converted into crashes. To address these limitations, we propose Pipe-Cleaner, a system for detecting and analyzing C code vulnerabilities using a refined fuzzing approach.’
The current TRMC version of VMF is 4.1 and supports Linux targets in C/C++, shifting soon to Windows and binary-only targets [2]. It can be applied in black-, white-, or grey-box testing, on a coverage-guided, directed, or generation/mutation basis.
Quick EMUlator (QEMU) is described in the recent text by Antonio and Blázquez [11] and illustrated in Figure 3 from the public site:
QEMU is a piece of software that aims to provide users with a tool where they can emulate different [operating] systems, as well as some system peripherals. QEMU uses an intermediate representation (IR) to represent these operations, and through binary translation, it will transform the instructions of the given system or binary into the IR and compile those instructions into the current architecture-supported instructions (just-in-time mode, faster), or it will interpret those IR instructions on its own interpreter (interpreter mode, slower).
QEMU therefore provides rapid virtualization and simulation of cyber-physical system targets, with dynamic binary instrumentation and instruction-set semantics. According to the QEMU public site:
Modern fuzz testing intelligently selects random inputs such that new code paths are explored and previously-tested code paths are not tested repeatedly. This is called coverage-guided fuzzing and involves an instrumented program executable so the fuzzer can detect the code paths that are taken for a given input.
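A minimal sketch of that coverage-guided feedback loop follows; the get_coverage() hook is a hypothetical stand-in for the instrumentation described in the quote, and the mutation step is deliberately simplistic.

```python
# Coverage-guided input selection: keep only mutants that reach new code.
import random

def get_coverage(data: bytes) -> frozenset:
    """Hypothetical hook: run the instrumented target and return the covered edges."""
    raise NotImplementedError("wire this to real instrumentation, e.g. edge counters")

def mutate(seed: bytes) -> bytes:
    buf = bytearray(seed)
    buf[random.randrange(len(buf))] ^= random.randrange(1, 256)
    return bytes(buf)

def coverage_guided_fuzz(initial_seed: bytes, budget: int = 100_000):
    corpus = [initial_seed]
    seen_edges = set(get_coverage(initial_seed))
    for _ in range(budget):
        case = mutate(random.choice(corpus))
        edges = get_coverage(case)
        if not edges <= seen_edges:   # this input reached previously unseen code
            corpus.append(case)       # keep it; future mutations start from it
            seen_edges |= edges
    return corpus
```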
The term G-QEMU or GQ refers to a Government Off-The-Shelf (GOTS) version of QEMU developed for TRMC by Draper. According to Weader and Owen [3] from the subject conference:
‘G-QEMU (GQ) is a set of enhanced QEMU (Quick Emulator) extensions and related capabilities developed under funding from TRMC. It enables rapid virtualization, simulation, and introspective testing of cyber physical real-time system targets. Motivating use cases for GQ include fuzzing, reverse engineering, firmware analysis, and software debugging. GQ minimally augments QEMU with isolated, flexible, generalized capabilities and provides a broader catalog of separable features.’

Figure 3: Fuzzing in an emulated environment
GQ has capability augmentations that facilitate enhanced emulation without breaking the underlying QEMU emulation, such as shared memory and an embedded Python interpreter for customizable blocks [3]. Some of the background to QEMU is in a new text by Antonio and Blázquez [11], and its role in fuzzing can be further appreciated in the following quote from Gatla, et al. [14]:
Similarly, fuzzing-based tools have proven to be effective for kernel bug detection [24, 63, 66, 99, 101]. For example, Syzkaller [24] is a kernel fuzzer that executes kernel code paths by randomizing inputs for various system calls and has been the foundation for building other fuzzers; Razzer [63] combines fuzzing with static analysis and detects data races in multiple kernel subsystems (e.g., “driver,” “fs,” and “mm”), which could potentially be extended to cover a large portion of concurrency PM bugs in our dataset. Since Syzkaller, Razzer, and similar fuzzers heavily rely on virtualized (e.g., QEMU [33]) or simplified (e.g., LKL [15]) environments to achieve high efficiency for kernel fuzzing, one common challenge and opportunity for extending them is to emulate PM devices and interfaces precisely to ensure the fidelity. Also, Linux kernel developers have incorporated tools such as Kernel Address Sanitizer [25], Undefined Behavior Sanitizer [27], and memory leak detectors (Kmemcheck) [13] within the kernel code to detect various memory bugs (e.g., null pointers, use-after-free, resource leak). These sanitizers instrument the kernel code during compilation and examine bug patterns at runtime. Similarly to other dynamic tools, these tools can only detect issues on the executed code paths. In other words, their effectiveness heavily depends on the quality of the inputs. [p. 36:17]
Hybrid fuzzing merges fuzzing, regarded as ‘concrete’ because it executes actual tests, with ‘symbolic’ execution, which reasons over code-based paths; the portmanteau of concrete and symbolic execution is ‘concolic’, and concolic execution supplies the symbolic side of hybrid fuzzing. The two techniques are complementary. One of the limits of earlier fuzzing is that the generational or mutation algorithms can constrain solutions to previously detected vulnerabilities, whereas symbolic execution combined with fuzzing can force unusual paths to discover additional vulnerabilities. Hybrid fuzzing is illustrated in Figure 4.

Figure 4: Hybrid Fuzzing illustration (adapted)
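The control flow of Figure 4 can be sketched as follows. Both hooks below are hypothetical placeholders rather than any particular tool’s interface: cheap concrete fuzzing runs until coverage stalls, and only then is the expensive concolic step invoked to solve a path constraint that random mutation cannot satisfy.

```python
# Hybrid (concrete + concolic) campaign skeleton with hypothetical hooks.
def random_fuzz(corpus, budget):
    """Hypothetical hook: mutate corpus entries for `budget` executions and
    return (new_interesting_inputs, made_coverage_progress)."""
    raise NotImplementedError

def concolic_solve(seed):
    """Hypothetical hook: symbolically execute along the seed's concrete path,
    negate an unexplored branch condition, and solve for a new concrete input
    (or return None if no constraint could be solved)."""
    raise NotImplementedError

def hybrid_campaign(corpus, rounds=100):
    for _ in range(rounds):
        new_inputs, progress = random_fuzz(corpus, budget=10_000)  # cheap concrete fuzzing
        corpus.extend(new_inputs)
        if not progress:                       # fuzzing has stalled on hard branches
            for seed in list(corpus):
                solved = concolic_solve(seed)  # expensive symbolic step forces a new path
                if solved is not None:
                    corpus.append(solved)      # concrete witness of the new path
                    break
    return corpus
```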
From the presentation by Senator and Allen [4], a fuzzing engine known as ‘Driller’ was used from around 2016 to 2019 to escape compartments and access code used infrequently (i.e., Zhang, et al. [15, p. 4-16]). Major developments since 2020 that were highlighted in the ITEA workshop were ‘Fuzzolic’ [16], ‘Path Aware Taint Analysis (PATA)’ [17], and Speedy-Automatic-Vulnerability-Incentivized Oracle (SAVIOR) [18]. Each of these three techniques is briefly explained:
Fuzzolic. Borzacchiello, et al. [16] summarize Fuzzolic in the prestigious ‘Computers and Security’ journal as follows:
On one side, we devise a novel concolic executor that can analyze complex binary programs while running under QEMU and efficiently produce symbolic queries, which could generate valuable program inputs when solved. On the other side, we investigate whether techniques borrowed from the fuzzing domain can be applied to solve the symbolic queries generated by concolic execution, providing a viable alternative to accurate but expensive SMT [satisfiability modular theories] solving techniques.
PATA. Liang, et al. [17] summarize PATA as follows:
Taint analysis assists fuzzers in solving complex fuzzing constraints by inferring the influencing input bytes. Execution paths in real-world programs often reach loops, where constraints in these loops can be visited and recorded multiple times. Conventional taint analysis techniques experience difficulties when distinguishing between multiple occurrences of the same constraint. In this paper, we propose PATA, a fuzzer that implements path-aware taint analysis, i.e. one that distinguishes between multiple occurrences of the same variable based on the execution path information.
SAVIOR. Chen, et al. [18] summarize SAVIOR as follows (see also [19]):
Recently, hybrid testing has seen significant advancement. However, its code coverage-centric design is inefficient in vulnerability detection. First, it blindly selects seeds for concolic execution and aims to explore new code continuously. However, as statistics show, a large portion of the explored code is often bug-free. Therefore, giving equal attention to every part of the code during hybrid testing is a non-optimal strategy.… Unlike the existing hybrid testing tools, SAVIOR prioritizes the concolic execution of the seeds that are likely to uncover more vulnerabilities. Moreover, SAVIOR verifies all vulnerable program locations along the executing program path. By modeling faulty situations using SMT constraints, SAVIOR reasons the feasibility of vulnerabilities and generates concrete test cases as proofs.
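The final sentence of that summary, modeling faulty situations as SMT constraints, can be illustrated with a small sketch using the z3-solver Python bindings. The buffer size, index computation, and input-validation constraints below are hypothetical; the point is that a satisfiable model is a concrete input proving the vulnerable location is reachable.

```python
# Illustrative only: ask an SMT solver whether an out-of-bounds index is feasible.
from z3 import BitVec, Solver, ULT, UGE, sat

a = BitVec("a", 8)               # two attacker-controlled input bytes
b = BitVec("b", 8)
idx = BitVec("idx", 8)

s = Solver()
s.add(idx == a + b)              # the program computes a buffer index from input
s.add(ULT(a, 100), ULT(b, 100))  # input validation the program actually performs
s.add(UGE(idx, 16))              # faulty situation: index escapes a 16-byte buffer

if s.check() == sat:
    m = s.model()
    # The model is a concrete witness: an input that triggers the vulnerability.
    print("overflow feasible, e.g. a =", m[a], "b =", m[b])
else:
    print("location proven safe under the modeled constraints")
```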
Multi-Arm Bandit. Fuzzing research pioneers new techniques quickly and often; however, each new technique largely does not subsume the performance of previous fuzzing engines, so a ‘toolbox’ of multiple fuzzing engines becomes necessary for coverage of the latest vulnerabilities in any system under test. As introduced already, though, the full toolbox may be computationally prohibitive, and it becomes a challenge to determine which engines to use, to what extent, and in what sequence. As systems vary, so too do the vulnerabilities. The subject presentation [4] and fuzzing research therefore turn to a concept known as the ‘multi-arm bandit’ ‘that switches between standard fuzzing and other concolic analysis techniques when the campaign stops making meaningful progress.’ The principle is illustrated in Figure 5.

Figure 5: Multi-Arm Bandit illustration (adapted)
Senator and Allen [4] summarize this multi-arm bandit progression as follows from the subject ITEA workshop:
Since each fuzzing technique has relative strengths and weaknesses, it can be hard to know at the outset which fuzzing technique will favor a given piece of software; the consequence of this only magnifies when the software is large and complex. A given fuzzer might not be applicable to a piece of software without modification; it may be hard to compare two fuzzing techniques to see which is more successful without a large amount of testing; it is difficult to modify software to a state where fuzzing is useful, and it is costly to apply fuzzing techniques to software that has never been fuzzed before.
Vader Modular Fuzzer (VMF) is a tool for fuzzing that provides customizability, via a rich module system, without the need to write any new code. It gives access to multiple state-of-the-art fuzzing tools within a framework that enables easier benchmarking between those tools. While this helps lower the bar for implementers to leverage additional techniques, doing so still requires ending and beginning a campaign. GTRI plans to improve the above issues by integrating additional tools and capabilities directly into VMF. GTRI is working on a Contextual, Multi-Armed Bandit approach to automatically balance the execution of a large set of predefined fuzzers against a given target within the VMF. The Multi-Armed Bandit will integrate directly with VMF as a new module and use the Linear Upper Confidence Bound (LinUCB) machine learning algorithm to select a fuzzer with the highest likelihood of finding interesting behavior while balancing short-term and long-term rewards. To minimize regret when choosing which arm to execute next, the algorithm will select an arm based off of two factors: past success of the arm and less frequently chosen arms to balance exploitation and exploration.
The concept and term of a multi-arm bandit come into fuzzing from research such as [20], which uses the machine learning (ML) multi-armed bandit scheduling model for adversarial rewards (AR-SMAB) [21] to ‘model the seed scheduling and energy allocation processes in gray-box fuzz testing.’
LinUCB ML Fuzzing. Another recent and clear description of the LinUCB ML approach to fuzzing is by Su, et al. [22], as follows:
The mutation-based greybox fuzz testing technique is one of the widely used dynamic vulnerability detection techniques. It generates testcases for testing by mutating input seeds. In the process of fuzz testing, the seed scheduling strategy and energy scheduling strategy impact the test results and efficiency. Existing seed scheduling strategies, however, only consider a few specific seed attributes and ignore contextual information during seed execution. This oversight makes it challenging to prioritize the selection of suitable seeds based on historical fuzz test results. Meanwhile, current methods for calculating coverage lack evaluation of software paths, which makes it easy to waste time on testing high-frequency and low-risk paths. This article proposes a new greybox fuzzing scheme, LinFuzz, which transforms the seed scheduling problem into a contextual multi-armed bandit machine model. It utilizes the LinUCB algorithm to assess the value of seeds for scheduling by considering their historical execution information. At the same time, LinFuzz improves the calculation method for fuzz testing path rewards and the seed energy scheduling algorithm. It allocates more energy for testing low-frequency paths in the testing program, thereby enhancing the efficiency of exploration and the path coverage ability of the testing tool.
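A minimal sketch of the LinUCB selection and update rules described above follows. The arm names, context features, and reward signal are hypothetical placeholders for whatever a VMF- or LinFuzz-style integration would actually measure (e.g., normalized new coverage or crashes per time slice).

```python
# Minimal LinUCB sketch for choosing which fuzzer ("arm") to run next.
import numpy as np

class LinUCB:
    def __init__(self, arms, dim, alpha=1.0):
        self.arms = list(arms)
        self.alpha = alpha                        # exploration weight
        self.A = {a: np.eye(dim) for a in arms}   # per-arm ridge-regression state
        self.b = {a: np.zeros(dim) for a in arms}

    def select(self, x):
        """Pick the arm with the highest upper confidence bound for context x."""
        best, best_p = None, -np.inf
        for a in self.arms:
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]             # estimated reward model
            p = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
            if p > best_p:
                best, best_p = a, p
        return best

    def update(self, arm, x, reward):
        """Fold the observed reward (e.g., new coverage or crashes) back in."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Usage sketch: context = [recent coverage growth, crash rate, time since last gain]
bandit = LinUCB(arms=["afl_mutation", "concolic", "directed"], dim=3)
context = np.array([0.02, 0.0, 0.7])
arm = bandit.select(context)              # run the chosen fuzzer for a time slice...
bandit.update(arm, context, reward=0.15)  # ...then report its normalized reward
```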
Pioneered by work like Wang, et al. [23], the research this year by Su, et al. [22] finds that ‘under the same testing time budget, LinFuzz outperforms other tools in terms of vulnerability discovery quantity and code coverage ability’ such as ‘AFL, AFLFast, FairFuzz, Neuzz etc.’ This independent research confirms GTRI’s hybrid fuzzing research approach. The presentation by Senator and Allen [4] concluded their overview by noting:
In order to fully realize this benefit, standalone concolic techniques need to be adapted to function within VMF. GTRI plans to implement SAVIOR with the VMF as it brings bug-driven prioritization and bug-driven verification that improves coverage, bug discovery, and formalizes the vulnerability of code areas using SMT constraints.
Other researchers exploring such hybrid fuzzing approaches include Zhao, et al. [24].
The fourth presentation in the ITEA workshop by Koch [5] concerned TRMC’s efforts to develop standardized testing of how effectively fuzzing engines find the different types of vulnerabilities present in typical Defense systems. This test infrastructure effort leverages the following four academic efforts to develop such fuzz test benches:
Magma Benchmark. Hazimeh, et al. [25] (2020) summarize the Magma benchmark for fuzzing methods as follows:
High scalability and low running costs have made fuzz testing the de facto standard for discovering software bugs. Fuzzing techniques are constantly being improved in a race to build the ultimate bug-finding tool. However, while fuzzing excels at finding bugs in the wild, evaluating and comparing fuzzer performance is challenging due to the lack of metrics and benchmarks. For example, crash count—perhaps the most commonly used performance metric—is inaccurate due to imperfections in deduplication techniques. … We tackle these problems by developing Magma, a ground-truth fuzzing benchmark that enables uniform fuzzer evaluation and comparison. By introducing real bugs into real software, Magma allows for the realistic evaluation of fuzzers against a broad set of targets. By instrumenting these bugs, Magma also enables the collection of bug-centric performance metrics independent of the fuzzer. … Based on the number of bugs reached, triggered, and detected, we draw conclusions about the fuzzers’ exploration and detection capabilities … highlighting the importance of ground truth in performing more accurate and meaningful evaluations.
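The bug-centric metrics described in that summary reduce to straightforward bookkeeping. The sketch below uses hypothetical fuzzer and bug identifiers purely to show how bugs reached, triggered, and detected would be tallied per fuzzer from ground-truth trial data.

```python
# Illustrative only: aggregating Magma-style ground-truth results per fuzzer.
from collections import defaultdict

trials = [  # (fuzzer, bug_id, reached, triggered, detected) - hypothetical data
    ("fuzzer_A", "PNG001", True,  True,  True),
    ("fuzzer_A", "PNG002", True,  False, False),
    ("fuzzer_B", "PNG001", True,  True,  False),
    ("fuzzer_B", "PNG002", False, False, False),
]

summary = defaultdict(lambda: {"reached": set(), "triggered": set(), "detected": set()})
for fuzzer, bug, reached, triggered, detected in trials:
    if reached:
        summary[fuzzer]["reached"].add(bug)
    if triggered:
        summary[fuzzer]["triggered"].add(bug)
    if detected:
        summary[fuzzer]["detected"].add(bug)

# Compare fuzzers on bug counts rather than raw (deduplicated) crash counts.
for fuzzer, stats in summary.items():
    print(fuzzer, {metric: len(bugs) for metric, bugs in stats.items()})
```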
Unifuzz Test Bench. Li, et al. [26] (2020) summarize the Unifuzz test bench effort as follows:
A flurry of fuzzing tools (fuzzers) have been proposed in the literature, aiming at detecting software vulnerabilities effectively and efficiently. To date, it is however still challenging to compare fuzzers due to the inconsistency of the benchmarks, performance metrics, and/or environments for evaluation, which buries the useful insights and thus impedes the discovery of promising fuzzing primitives. In this paper, we design and develop UNIFUZZ, an open-source and metrics-driven platform for assessing fuzzers in a comprehensive and quantitative manner. Specifically, UNIFUZZ to date has incorporated 35 usable fuzzers, a benchmark of 20 real-world programs, and six categories of performance metrics.
Fuzzbench. Metzman, et al. [27] (2021) summarize the reasoning behind the development of Fuzzbench as follows:
In 2020 alone, over 120 papers were published on the topic of improving, developing, and evaluating fuzzers and fuzzing techniques. Yet, proper evaluation of fuzzing techniques remains elusive. The community has struggled to converge on methodology and standard tools for fuzzer evaluation. To address this problem, we introduce FuzzBench as an opensource turnkey platform and free service for evaluating fuzzers. It aims to be easy to use, fast, reliable, and provides reproducible experiments. Since its release in March 2020, FuzzBench has been widely used both in industry and academia, carrying out more than 150 experiments for external users.
IOT Fuzzbench. Cheng, et al. [28] (2023) summarize the reasoning behind the development of a fuzzing engine test bench specific to cyber-physical systems:
High scalability and low operating cost make black-box protocol fuzzing a vital tool for discovering vulnerabilities in the firmware of IoT smart devices. … In this paper, we design and implement IoTFuzzBench, a scalable, modular, metric-driven automation framework for evaluating black-box protocol fuzzers for IoT smart devices … We deployed IoTFuzzBench and evaluated 7 popular black-box protocol fuzzers on all benchmark firmware images and benchmark vulnerabilities. The experimental results show that IoTFuzzBench can not only provide fast, reliable, and reproducible experiments, but also effectively evaluate the ability of each fuzzer to find vulnerabilities and the differential performance on different performance metrics.
These research efforts confirm the maturity of fuzzing engine test benches for TRMC to invest in delivering the capability to Defense testers. Research on such fuzz test benches continues apace, with 18 research articles in 2024 and one already for 2025 (i.e., Blackwell, et al. [29]).
New AI-based literature analysis tools such as LitMaps® enable the research literature to be analysed for threads and major influences. Key literature on fuzz testing was categorized according to the tags in Figure 6 with the colour coding being common to all the analysis figures that follow. These tags were created from keywords in previously reviewed research projects in this article.

Figure 6: Legend for all Fuzz Testing Literature Mapping in this article (LitMaps®)
The literature was then graphed according to whether it had been categorized as: 1) literature surveys on fuzz testing, 2) fuzz test methods, or 3) benchmarking of fuzzing engines.
Each of these three analysis sections is overviewed and discussed below, noting that some references mapped to more than one section and that, for brevity, some references are not formally listed wherever they did not feature earlier.
Literature Surveys on Fuzz Testing. The main literature surveys for fuzz testing from 2018 onwards are shown in Figure 7 and are conveniently listed in Table 2 according to the LitMaps® package. The legend for these LitMaps® figures is given in Figure 6. This analysis shows that the most influential survey work for fuzz testing, based on citation tracking, was by the authors at the right-hand front of the figure, such as Klees, et al. [30] (2018) and Schloegel, et al. [31] (2024).

Figure 7: Prominent Survey Articles for Fuzz Testing (LitMaps®) from 2018 (Legend Fig. 6)
Table 2: Key Literature Surveys on Fuzz Testing published since 2018
| DOI | Title | Authors | Year | Cited |
| --- | --- | --- | --- | --- |
| 10.1109/SP54263.2024.00137 | SoK: Prudent Evaluation Practices for Fuzzing | Schloegel, Bars, Schiller, Bernhard, Scharnowski, et al. | 2024 | 6 |
| 10.1145/3623375 | Demystify the Fuzzing Methods: A Comprehensive Survey | S. Mallissery, Yu-Sung Wu | 2023 | 12 |
| 10.1186/S42400-018-0002-Y | Fuzzing: a survey | Jun Li, Bodong Zhao, Chaomo Zhang | 2018 | 235 |
| 10.1109/CECIT53797.2021.00035 | A systematic review of fuzzy testing for information systems and applications | Shen, Wen, Zhang, Wang, Shen, Cheng | 2021 | 4 |
| 10.1007/S00500-023-09306-2 | A systematic review of fuzzing | Zhao, Qu, Jianliang Xu, Li, Lv, Wang | 2023 | 5 |
| 10.1109/ISSSR61934.2024.00024 | A Comprehensive Review of Learning-based Fuzz Testing Techniques | Cheng, Li, Zhao, Li, Wong | 2024 | 0 |
| 10.1145/3512345 | Fuzzing: A Survey for Roadmap | Zhu, Wen, Camtepe, Yang Xiang | 2022 | 139 |
| 10.1109/ACCESS.2023.3347652 | Machine Learning-Based Fuzz Testing Techniques: A Survey | Zhang, Zhang, Xu, Wang, Li | 2024 | 1 |
| 10.48550/ARXIV.2402.00350 | Large Language Models Based Fuzzing Techniques: A Survey | Huang, Zhao, Chen, Ma | 2024 | 8 |
| 10.1109/ICSIP61881.2024.10671554 | A Review of Fuzz Testing for Configuration-Sensitive Software | Chu, Huang, Li, Nie | 2024 | 0 |
| 10.47857/IRJMS.2024.V05I04.01451 | A Systematic Review of AI Based Software Test Case Optimization | Padmanabhan | 2024 | 0 |
| 10.1145/3243734.3243804 | Evaluating Fuzz Testing | Klees, Ruef, Cooper, Wei, Hicks | 2018 | 673 |
| 10.1016/J.JSS.2022.111423 | A systematic literature review on benchmarks for evaluating debugging approaches | Hirsch, Hofer | 2022 | 10 |
| 10.1145/3649476.3658697 | The Fuzz Odyssey: A Survey on Hardware Fuzzing Frameworks for Hardware Design Verification | Saravanan, Dinakarrao | 2024 | 0 |
Fuzz Test Methods. Consistent with earlier coverage, Figure 8 shows the clear influences on the fuzz test methods by Chen, et al. [18] (2020) with SAVIOR and Liang, et al. [17] (2022) with PATA. However, the analysis also disclosed three influential research efforts and methods not covered earlier: prefix-guided execution to accelerate fuzzing by Li and Su [32]; an evaluation and improvement of hybrid fuzzing by Jiang, et al. [33]; and a survey of large-language-model-based fuzzing techniques by Huang, et al. [34]. The LitMaps® software was able to find these publications where our prior manual review did not; a distinct advantage in tracking fast-moving research trends.
Fuzzing researchers have exploited several major disciplines, through approaches such as hybrid fuzzing, to create a myriad of highly effective software vulnerability detection tools at an impressive rate. Furthermore, the 2024 literature survey by Shiri Harzevili, et al. [35] documents how advances in ML are also now being exploited to improve fuzzing heuristics.

Figure 8: Selected Literature on Fuzz Methods 2018+ (LitMaps®) (Legend: see Fig. 6)

Figure 9: Selected Literature on Fuzz Test Benchmarking 2018+ (LitMaps®) (Legend: see Fig. 6)
Benchmarking Fuzzing Engines. Figure 9 shows the influence of the Klees, et al. [30] (2018) survey in driving benchmarking, leading to the development of FuzzBench by Metzman, et al. [27] (2021). In 2022, Böhme, et al. [36] produced another influential article on fuzz test benchmarking concerning coverage-based fuzzing engines. More recently, Jiang, et al. [33] document the use of clustering-based ML to benchmark fuzz test methods. Fuzz test benches and benchmarks have rapidly developed in recent years, making them more useful and available to testers. The effort by TRMC to confirm and extend such toolsets should improve the ability of Defense testers to meet the demands for greater digital sovereignty and cyber resilience documented by Kaloudis [7].
Fuzz Test Education. Insufficient effort has been put into fuzzing in cybersecurity courses, with only one example found in the literature. Hall [37] outlines learning ‘activities in Criminal Investigations that teach how to use Address Sanitizer and American Fuzzy Lop (AFL) to identify vulnerabilities in IoT firmware and writing advanced exploits for ARM and x86 that can bypass memory protections’. The fuzzing skillset will need a substantial educational effort because university research is obviously outpacing curriculum development. TRMC has established a Fuzzing Working Group with the point of contact being Min Kim (Jeong.kim@us.af.mil). Testers should leverage this Working Group and, where possible, publish on best-practice fuzz testing in T&E journals.
1 https://itea.org/wp-content/uploads/2024/02/ALL-TRACK-ABSTRACTS-8-28.pdf
2 https://www.microsoft.com/en-us/research/blog/neural-fuzzing/
3 24 Sep 2024, https://www.bbc.com/news/articles/cwyegl8q80do
8 https://github.com/draperlaboratory/VaderModularFuzzer/blob/main/docs/design.md
9 Draper Laboratory. 2024. VMF: Vader Modular Fuzzer. Draper Laboratory, Cambridge, MA, USA. https://github.com/draperlaboratory/VaderModularFuzzer
10 https://github.com/draperlaboratory/VaderModularFuzzer/blob/main/docs/intro_to_fuzzing.md
11 https://core-research-team.github.io/2020-10-01/hybrid-fuzzing-1315f7575836434785ab9885c31279b7
12 https://www.attemptspace.in/2022/11/reinforcement-learning-multi-arm-bandits.html
[1] X. Zhao, H. Qu, J. Xu, X. Li, W. Lv, and G. G. Wang, “A systematic review of fuzzing,” Soft Computing, vol. 28, no. 6, pp. 5493-5522, 2024.
[2] J. Weader, A. Owen, and E. Braunstein, “Vader Modular Fuzzer (VMF): Now and in the Future,” presented at the Exploring Cyber Test Ranges: Past Present and Future Perspectives, Orlando, Florida, 18 September, 2024. Available: https://itea.org/event/2024-cybersecurity-workshop/
[3] J. Weader and A. Owen, “G-QEMU (GQ): Now and in the future,” presented at the Exploring Cyber Test Ranges: Past Present and Future Perspectives, Orlando Florida, 2024. Available: https://itea.org/event/2024-cybersecurity-workshop/
[4] K. Senator and S. Allen, “Hybrid Fuzzing,” presented at the Exploring Cyber Test Ranges: Past Present and Future Perspectives, Orlando, Florida, 18 September, 2024. Available: https://itea.org/event/2024-cybersecurity-workshop/
[5] D. Koch, “Measuring Fuzzer Performance,” presented at the Exploring Cyber Test Ranges: Past Present and Future Perspectives, Orlando, Florida, 18 September, 2024. Available: https://itea.org/event/2024-cybersecurity-workshop/
[6] K. F. Joiner, A. Ghildyal, N. Devine, A. Laing, A. Coull, and E. Sitnikova, “Four testing types core to informed ICT governance for cyber-resilient systems,” International Journal of Advances in Security, vol. 11, 2018.
[7] M. Kaloudis, “Digital Sovereignty as a Weapon of Diplomacy in Cyber Warfare in Democracies,” in National Security in the Digital and Information Age, S. Burt, Ed. London, United Kingdom: Incitech Open, 2024, pp. 17-36.
[8] C. Dolan, “AUKUS Pillar 2 – Technology, Interoperability, and Advanced Capabilities in the Evolving Trilateral Security Partnership,” in National Security in the Digital and Information Age, S. Burt, Ed. London, United Kingdom: Incitech Open, 2024, pp. 121-140.
[9] K. Koscher et al., “Experimental Security Analysis of a Modern Automobile,” in IEEE Symposium on Security and Privacy, 2010, May, pp. 447-462.
[10] S. Mallissery and Y.-S. Wu, “Demystify the Fuzzing Methods: A Comprehensive Survey,” ACM computing surveys, vol. 56, no. 3, 2024.
[11] N. Antonio and E. Blázquez, Fuzzing against the Machine : Automate Vulnerability Research with Emulated IoT Devices on Qemu, 1 ed. Birmingham, United Kingdom: Packt Publishing, 2023.
[12] A. Fioraldi, A. Mantovani, D. Maier, and D. Balzarotti, “Dissecting American Fuzzy Lop – A FuzzBench Evaluation – RCR Report,” ACM transactions on software engineering and methodology, vol. 32, no. 2, 2023.
[13] A. Naaktgeboren, S. N. Anderson, A. Tolmach, and G. Sullivan, “Pipe-Cleaner: Flexible Fuzzing Using Security Policies,” arXiv preprint, 2024.
[14] O. R. Gatla, D. Zhang, W. Xu, and M. Zheng, “Understanding Persistent-memory-related Issues in the Linux Kernel.,” ACM Transactions on Storage, vol. 19, no. 4, pp. 1-28, 2023.
[15] B. Zhang, J. Ye, X. Bi, C. Feng, and C. Tang, “FFUZZ: Towards Full System High Coverage Fuzz Testing on Binary Executables,” PloS one, vol. 13, no. 5, p. e0196733, 2018.
[16] L. Borzacchiello, E. Coppa, and C. Demetrescu, “FUZZOLIC: Mixing fuzzing and concolic execution,” Computers & Security, vol. 108, p. 102368, 2021.
[17] J. Liang et al., “PATA: fuzzing with path aware taint analysis,” in IEEE symposium on security and privacy, 2022, pp. 1–17.
[18] Y. Chen et al., “SAVIOR: Towards Bug-Driven Hybrid Testing,” in 2020 IEEE Symposium on Security and Privacy (SP) 2020, pp. 1580-1596.
[19] P. H. Lin, Z. Hong, Y. H. Li, and L. F. Wu, “A priority based path searching method for improving hybrid fuzzing,” Computers & Security, vol. 105, no. 102242, 2021.
[20] H. Yu, Y. Ma, S. Cheng, S. Chen, Q. Zheng, and S. Chen, “Differential Fuzz Testing of TLS Implementations Based on Multi-Armed Bandit Variant,” in 9th International Conference on Computer and Communications (ICCC), 2023, pp. 2521-2526.
[21] R. Kleinberg, A. Niculescu-Mizil, and Y. Sharma, “Regret bounds for sleeping experts and bandits,” Machine learning, vol. 80, no. 2-3, pp. 245-272, 2010.
[22] Y. Su, D. Xiong, Y. Wan, C. Shi, and Q. Zeng, “LinFuzz: Program-Sensitive Seed Scheduling Greybox Fuzzing Based on LinUCB Algorithm,” IEEE Access, 2024.
[23] X. Wang, C. Hu, R. Ma, D. Tian, and J. He, “CMFuzz: context-aware adaptive mutation for fuzzers,” Empirical Software Engineering, vol. 26, pp. 1-34, 2021.
[24] Y. Zhao, L. Gao, Q. Wei, and L. Zhao, “Towards Tightly-coupled Hybrid Fuzzing via Excavating Input Specifications,” IEEE Transactions on Dependable and Secure Computing, vol. 21, no. 5, 2024.
[25] A. Hazimeh, A. Herrera, and M. Payer, “Magma: A ground-truth fuzzing benchmark,” in Measurement and Analysis of Computing Systems, 2020, vol. 4, no. 3, pp. 1-29: ACM.
[26] Y. Li et al., “{UNIFUZZ}: A holistic and pragmatic {Metrics-Driven} platform for evaluating fuzzers,” in 30th USENIX Security Symposium (USENIX Security 21), 2021, pp. 2777-2794.
[27] J. Metzman, L. Szekeres, L. Simon, R. Sprabery, and A. Arya, “Fuzzbench: an open fuzzer benchmarking platform and service,” in 29th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering, 2021, pp. 1393-1403.
[28] Y. Cheng, W. Chen, W. Fan, W. Huang, G. Yu, and W. Liu, “IoTFuzzBench: A Pragmatic Benchmarking Framework for Evaluating IoT Black-Box Protocol Fuzzers,” Electronics (Basel), vol. 12, no. 14, p. 3010, 2023.
[29] D. Blackwell, I. Becker, and D. Clark, “Hyperfuzzing: Black-Box Security Hypertesting with a Grey-Box Fuzzer,” Empirical software engineering : an international journal, vol. 30, no. 1, p. 22, 2025.
[30] G. Klees, A. Ruef, B. Cooper, S. Wei, and M. Hicks, “Evaluating fuzz testing,” in SIGSAC conference on computer and communications security, 2018, pp. 2123-2138: ACM.
[31] M. Schloegel et al., “SoK: Prudent evaluation practices for fuzzing,” in IEEE Symposium on Security and Privacy (SP), 2024, pp. 1974-1993: IEEE.
[32] S. Li and Z. Su, “Accelerating fuzzing through prefix-guided execution,” in Proceedings of the ACM on Programming Languages, 2023, vol. 7, pp. 1-27.
[33] L. Jiang, H. Yuan, M. Wu, L. Zhang, and Y. Zhang, “Evaluating and improving hybrid fuzzing,” in 45th International Conference on Software Engineering (ICSE), 2023, pp. 410-422: IEEE and ACM.
[34] L. Huang, P. Zhao, H. Chen, and L. Ma, “Large language models based fuzzing techniques: A survey,” arXiv preprint, vol. 2402.00350, 2024.
[35] N. Shiri Harzevili, A. Boaye Belle, J. Wang, S. Wang, Z. M. Jiang, and N. Nagappan, “A Systematic Literature Review on Automated Software Vulnerability Detection Using Machine Learning,” ACM Computing Surveys, vol. 57, no. 3, pp. 1-36, 2024.
[36] M. Böhme, L. Szekeres, and J. Metzman, “On the reliability of coverage-based fuzzer benchmarking,” in 44th International Conference on Software Engineering, 2022, pp. 1621-1633.
[37] J. G. Hall, “Criminal Investigations: An Interactive Experience to Improve Student Engagement and Achievement in Cybersecurity Courses,” presented at the 53rd ACM Technical Symposium on Computer Science Education, 2022.
Dr Keith Joiner CSC, PhD, MMgt, MSc(Aerosystems), BEng(Aero), CPEng, CPPD, was an Air Force aeronautical engineer, project manager and teacher for 30 years before joining the University of New South Wales to teach and research test and evaluation. As Defence’s Director-General of Test and Evaluation for four years, he was awarded a Conspicuous Service Cross, and for developing drawdown plans for the Multi-National Force in Iraq, he was awarded a U.S. Meritorious Service Medal. He has testified four times to Australia’s Senate, twice in uniform and twice since. He has over 100 published research articles on assuring engineered systems, with research interests in: 1) assuring cyberworthiness, AI-enabled systems, and robotic autonomous systems, 2) using high-throughput test design and complex systems governance, and 3) developing air-sea hybrid vehicles and the electrification of aircraft. Dr Joiner currently serves as the Chief Editor of the ITEA Journal of T&E and is a member of the ITEA Board of Directors.