FAQ

What's fuzzing?

Fuzz testing is a dynamic testing method for identifying bugs and vulnerabilities early in software development. It enables developers to ensure secure and stable software deployment by executing programs with invalid or random inputs. With these inputs, fuzzers uncover potential crashes and provide detailed feedback on code coverage. The process not only detects bugs but also produces the inputs that reproduce each Finding, and it maximizes code coverage without reporting false positives.

Memory unsafe languages

Fuzzing is particularly effective when applied to memory unsafe languages like C and C++. These languages are more prone to memory-related vulnerabilities, such as buffer overflows and uninitialized memory usage. Fuzzing can identify these issues early in the development cycle, enabling developers to fix them before they can be exploited by attackers. Furthermore, the process of fuzzing can reveal complex code paths or edge cases that other testing methods might miss.

Memory safe languages

While memory safe languages like Java, JavaScript, Python, or Rust have built-in mechanisms to prevent many common memory-related vulnerabilities, fuzzing can identify logic errors, incorrect assumptions, or other application-specific vulnerabilities that might not be related to memory management. Additionally, fuzzing can uncover issues related to third-party libraries or dependencies, which could introduce security risks. For applications developed in memory safe languages, fuzzing contributes to increased reliability, performance, and security by discovering and allowing developers to address these potential weaknesses.

What goals can fuzzing achieve?

Fuzzing is a versatile and effective testing technique that can be applied to a wide range of programming languages and applications. Regardless of whether a language is memory safe or memory unsafe, fuzzing aids in achieving key goals such as strengthening security, enhancing stability, and ensuring robustness. By incorporating fuzzing into the software development process, developers can proactively identify and resolve vulnerabilities, leading to more reliable and secure applications.

What results can you expect from fuzzing?

Fuzzing provides valuable feedback and output in the form of uncovered software vulnerabilities and inconsistencies. As you input a variety of unexpected or random data into your system, the system responds, often in unforeseen ways. These responses can reveal hidden bugs, exceptions, memory leaks, or even system crashes. Some fuzzing tools offer detailed reports on each fuzzing session, including information on input data that led to unexpected behavior, system status, and potential vulnerabilities discovered.

It's essential to remember that fuzzing doesn't assure you of the absence of bugs but instead reveals their presence. Its strength lies in its capacity to expose unforeseen edge cases that could lead to potential security breaches or system crashes. Interpreting the output from fuzzing requires a solid understanding of the SUT and an analytical mindset to determine the potential impact of the identified issues.

How long should a fuzz test run?

The ideal duration for a fuzz test depends on various factors such as the complexity of the target application, the resources available, and the specific goals of the testing process. As a general rule of thumb, the longer you fuzz, the higher the chance of discovering bugs and vulnerabilities. Considering some key factors can help guide you in deciding how long to run your fuzz tests:

  • Complexity of the target application: Larger and more complex applications typically require longer fuzzing sessions to effectively explore various code paths and potential vulnerabilities. A more comprehensive testing process increases the likelihood of discovering hidden issues and ensuring the application’s robustness.
  • Available resources: The amount of time you can allocate to fuzzing depends on your available resources, such as computing power and personnel. If resources are limited, you need to prioritize certain components of the application or focus on specific vulnerabilities. You can also consider using distributed fuzzing to scale your testing efforts and reduce the time required.
  • Goals and priorities: Your fuzz testing objectives influence the duration of the test. If you’re aiming to meet specific security requirements or achieve a certain level of code coverage, you may need to run fuzz tests for a longer duration to achieve those goals. On the other hand, if you’re running fuzz tests as part of a continuous integration process, shorter, more frequent fuzzing sessions may be more appropriate.

Guidelines for fuzzing duration

As a starting point, consider running individual fuzz tests for at least 24 hours. This often provides sufficient time to discover many common issues. However, for more critical or complex applications, it’s not uncommon to run fuzz tests for several weeks or even months. Continuously monitoring the progress of your fuzz testing and analyzing the results can help you determine if additional time is needed.

For ongoing development projects, integrating fuzz testing into your CI/CD pipeline can help ensure that newly introduced code is regularly tested. By continuously fuzzing your application, you can proactively identify and fix vulnerabilities before they make their way into production.

To make the most of your fuzz testing efforts, consider starting with a baseline duration and then adjusting as needed based on your Findings and priorities. Fuzz testing isn't a one-shot solution; continuous fuzz testing helps ensure the ongoing security and stability of your application.

The metric “time since the last new branch” provides a good indicator of the saturation of a fuzz test. If a fuzz test has been running for three days and no new path has been found in the last two and a half days, it's unlikely that another path will be found. This could be because all paths have already been discovered, or because the fuzz test can't reach the remaining paths. You can review the coverage of the fuzz test to determine the cause and improve the fuzz test.

How often should you run a fuzz test?

Several factors play a role in deciding how often to fuzz:

  • Magnitude and complexity of the software: The bigger and more complex the software, the more often it should be fuzzed. With each new fuzz run, the coverage can be increased and the likelihood of finding hidden bugs increases.

  • New or old software: New software should be fuzzed often and continuously, as it changes and expands frequently. For older software, it makes sense to fuzz whenever something has changed.

  • Feature frequency: Software projects that deliver many changes and/or new features in a short period of time should be fuzzed in step with those changes. Bugs should be found and fixed immediately after a feature has been created. The later they're found, the more effort is required to fix them.

  • Project resources and budget: More fuzzing likely leads to finding bugs earlier. Finding and fixing bugs late in the project is expensive and can lead to additional cost as resources and budgets need to be reallocated.

Generally, fuzzing can be done at the unit test level, at the system test level, or at least before a major release.

At the unit test level, fuzzing can be integrated into the CI/CD pipeline, for example triggering a fuzzing run with each pull request.

At the system test level, fuzzing can be performed less frequently, but it exercises the entire system. Fuzzing runs are significantly longer at this level, and fixing a bug found this late can be expensive.

Fuzzing can be automated to a great extent, so that it can be used as a regression test in many cases.

What's a good coverage percent for a project?

There is no absolute target percentage as it depends on the requirements of the project.

Higher code coverage generally indicates that the fuzzer is going through more paths and therefore has a higher likelihood of discovering bugs.

For example, if the coverage target is 80% for a project with 1,000 lines of code, this is only 800 lines of code, which isn't difficult to achieve. For a project with 1,000,000 lines of code, it would be 800,000 lines of code, which is very optimistic and not easily achievable. Therefore, for a project with over a million lines of code, even a small percentage of coverage is good coverage.

While code coverage is a useful metric to measure the effectiveness of fuzzing, it shouldn't be the main criterion. Improving the corpus and the fuzz test with more real-world scenarios can be much more effective at finding critical bugs.

How to improve code coverage?

To improve code coverage, it's helpful to generate a source line coverage report and investigate the report in detail. Usually the report format can be specified to enable you to view it in a browser or your IDE. You can create a coverage report in CI Fuzz with the following command:

cifuzz coverage <fuzz_test> 
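For example, assuming your cifuzz version supports the --format flag (check cifuzz coverage --help), you can generate an lcov report for your IDE instead of the default browser-based report:

cifuzz coverage --format=lcov <fuzz_test>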

When reviewing code coverage, analyze why a specific code part isn't covered:

  • The executed fuzz test is theoretically unable to reach the code. Write more tests or adjust the existing tests to reach new code parts.

  • The fuzzing time was too short. Fuzzing isn't deterministic and the fuzzing engine might have focused on another part of the code during that specific run. Check the fuzzer statistics and when the last new path was found. If coverage during the run is still increasing, that's a good indicator to allow the fuzzer to continue to run until coverage plateaus.

  • The SUT expects complex input data and edge cases are never covered by the fuzzer. Value profiling can help the fuzzer get more precise feedback from the application. If you want to quickly increase the code coverage, add a seed corpus and a dictionary for the fuzz test to help the fuzzer generate syntactically and semantically valid inputs.

  • The fuzzer could regularly run into a specific blocker. This can be an issue in the tested code that the fuzzer rediscovers over and over and that needs to be fixed to unblock the fuzzer. Integrity checks such as checksums are a common blocker; you can aid the fuzzer by computing valid checksums for the mutated inputs inside the fuzz test, as sketched below.
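A minimal sketch of the checksum-patching technique, assuming a hypothetical parse_packet function that rejects inputs whose trailing four bytes don't match the CRC32 of the payload (the checksum is computed here with zlib's crc32):

my_fuzz_test.cpp (sketch)
#include <cifuzz/cifuzz.h>
#include <zlib.h>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical SUT function that verifies a trailing CRC32 before parsing.
extern void parse_packet(const uint8_t *data, size_t size);

FUZZ_TEST_SETUP() {}

FUZZ_TEST(const uint8_t *data, size_t size) {
  if (size < 4) return;
  std::vector<uint8_t> input(data, data + size);
  // Patch the trailing checksum so mutated inputs pass the integrity
  // check and the fuzzer can reach the actual parsing logic.
  uint32_t crc = crc32(0L, input.data(), input.size() - 4);
  memcpy(input.data() + input.size() - 4, &crc, sizeof(crc));
  parse_packet(input.data(), input.size());
}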

How to determine a good fuzz test?

  • Determinism: A good fuzz test must show the same behavior given the same input and shouldn't use any additional source of randomness (a sketch follows below).

  • Speed: A good fuzz test should be efficient since fuzzing requires many iterations.

  • Memory consumption: For CPU-efficient fuzzing, the fuzz test should consume less RAM per CPU core than is available on the given machine.

  • Coverage discoverability: It's important to ensure that the fuzz test can discover a large subset of reachable control flow edges without using the seed corpus. If a fuzz test without a seed corpus doesn't achieve coverage comparable to its coverage with a seed corpus, consider splitting it into smaller tests and using dictionaries or structure-aware fuzzing.

  • I/O: A good fuzz test shouldn't use I/O because it can introduce non-deterministic behavior and make Findings harder to reproduce. Avoid debugging output to stderr or stdout, as it slows down fuzzing. Avoid writing to disk in general, and avoid reading from disk other than during initialization.

See the Google fuzzing guide for further information.
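As an illustration of the determinism point: instead of calling rand() inside the fuzz test, derive any needed "random" values from the fuzzer input itself, for example via the FuzzedDataProvider described later in this FAQ (exploreMe is a hypothetical target function):

my_fuzz_test.cpp (sketch)
#include <cifuzz/cifuzz.h>
#include <fuzzer/FuzzedDataProvider.h>

// Hypothetical SUT function.
extern void exploreMe(int value);

FUZZ_TEST_SETUP() {}

FUZZ_TEST(const uint8_t *data, size_t size) {
  FuzzedDataProvider fuzzed_data(data, size);
  // Deterministic: the "random" value is derived from the input,
  // so the same input always reproduces the same behavior.
  int value = fuzzed_data.ConsumeIntegralInRange<int>(0, 100);
  exploreMe(value);
}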

What types of bugs does the fuzzer catch automatically?

Each fuzzer typically has a set of sanitizers or bug-detectors to automatically detect specific types of bugs.

LibFuzzer

LibFuzzer works in conjunction with several sanitizers.

AddressSanitizer (ASan) is capable of finding various memory related bugs:

  • Out-of-bounds accesses to heap, stack, and globals
  • Use-after-free, use-after-return, use-after-scope
  • Double-free, invalid free
  • Memory leaks (experimental)
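For illustration, here's a minimal, self-contained fuzz target (not tied to any particular project) containing a heap out-of-bounds write that ASan reports as a heap-buffer-overflow as soon as the fuzzer produces a sufficiently large input:

heap_overflow_example.cpp (sketch)
#include <cstdint>
#include <cstring>

// Build with: clang++ -fsanitize=address,fuzzer heap_overflow_example.cpp
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  if (size < 8) return 0;
  char *buf = new char[4];
  // Out-of-bounds write: copies 8 bytes into a 4-byte allocation.
  memcpy(buf, data, 8);
  delete[] buf;
  return 0;
}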

UndefinedBehaviorSanitizer (UBSan) is used to detect several types of undefined behavior at runtime:

  • Array subscript out of bounds, where the bounds can be statically determined
  • Bitwise shifts that are out of bounds for their data type
  • Dereferencing misaligned or null pointers
  • Signed integer overflow
  • Conversion to, from, or between floating-point types which would overflow the destination
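Similarly, a minimal sketch of a bug UBSan flags, here a signed integer overflow:

signed_overflow_example.cpp (sketch)
#include <cstdint>
#include <cstring>

// Build with: clang++ -fsanitize=undefined,fuzzer signed_overflow_example.cpp
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  if (size < sizeof(int)) return 0;
  int value;
  memcpy(&value, data, sizeof(value));
  // Undefined behavior when value is INT_MAX; UBSan reports
  // "signed integer overflow" at runtime.
  volatile int incremented = value + 1;
  (void)incremented;
  return 0;
}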

Other sanitizers:

  • ThreadSanitizer (data races)
  • MemorySanitizer (uninitialized reads)
  • LeakSanitizer (memory leaks)
  • DataFlowSanitizer (data flow analysis, no bugs)

Jazzer

Since Java is a memory safe language, there are no sanitizers focusing on memory corruption bugs, but Jazzer can detect uncaught exceptions, memory leaks, and infinite loops that can lead to denial-of-service (DoS) attacks.

It can also detect a variety of bug classes common to the Java software ecosystem:

  • Deserialization - unsafe deserialization that leads to attacker-controlled method calls
  • Expression Language Injection - injectable inputs to an expression language interpreter
  • LDAP Injection - LDAP DN and search filter injections
  • Naming Context Lookup - JNDI lookups such as log4j
  • Command Injection - unsafe execution of OS commands using ProcessBuilder
  • Reflective Call - unsafe calls that lead to attacker-controlled class loading
  • Regular Expression Injection - regular expression based injection that leads to OOM
  • Script Engine Injection - insecure user input in script engine invocation
  • Server Side Request Forgery - unsafe network connection attempts
  • SQL Injection - SQL injections
  • XPath Injection - XPath injections

Jazzer.js

JavaScript is also memory safe, so, similar to Jazzer, Jazzer.js focuses on detecting uncaught exceptions, memory leaks, and infinite loops. Through its sanitizers, Jazzer.js can also detect a variety of bug classes common to the JavaScript software ecosystem:

  • Command Injection - unsafe execution of OS commands
  • Path Traversal - inputs triggering the access of files and directories outside the current one

How to assess Findings?

You can list all local Findings CI Fuzz detected with the following command:

cifuzz findings

You can review detailed information about a specific Finding, including stack trace, sanitizer details, severity, description & mitigation, with the following command:

cifuzz finding <name>

In general, all Findings discovered by a fuzzer can be assessed by analyzing the generated stack trace and the crashing input. For every Finding, the fuzzer generates a crashing input that reproduces the issue. This input can be used as a regression test after a fix is implemented.

A static assessment of the Finding details is often insufficient, and additional dynamic analysis is required. Attach a debugger while running the fuzz test with the crashing input of the Finding. The stack traces are valuable in the debugging process for identifying interesting code positions at which to place breakpoints and investigate the memory state.

How to continue fuzzing after a Finding is triggered?

Java

--keep_going=<number of crashes allowed>

You can also set this in cifuzz.yaml instead of adding the flag to the command. It then applies to every local fuzz test.

engine-args:
- --keep_going=10

C/C++

For C/C++, use the following environment variable:

ASAN_OPTIONS=detect_leaks=0:halt_on_error=0

Due to technical limitations, certain bugs can still cause the fuzzing engine to stop, or can affect the overall behavior and cause false positives.

Which code to instrument and which code to ignore?

When it comes to fuzzing, deciding which code to instrument and which code to ignore is critical to maximizing the performance and effectiveness of your fuzzing campaign. Proper instrumentation can lead to better code coverage, more efficient execution, and improved vulnerability discovery.

Code coverage vs. bug detection instrumentation

Instrumentation has two primary goals: maximizing code coverage and detecting bugs. These goals are distinct, and each requires careful consideration.

Code coverage

Instrumentation for code coverage guides the fuzzer in exploring as many code paths as possible within a target application. The fuzzer collects information about code execution paths and uses it to generate new inputs that trigger previously unexplored paths. To maximize code coverage, it's essential to instrument the critical and complex parts of an application, as well as any custom libraries or components.

Depending on the fuzzer and its configuration, there are different granularities of code coverage metrics, for example function-, basic block-, or instruction-level, all of which impact performance. To minimize this impact, prioritize instrumenting the most critical parts of your application first. Avoid instrumenting widely tested third-party libraries or system libraries to reduce overhead and improve the overall performance of a fuzzing campaign. If you suspect vulnerabilities in third-party libraries, you can choose to instrument them selectively on top of your own application.
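As a sketch of selective instrumentation with clang (assuming a layout where your own code and a vendored third-party library are compiled separately):

# Instrument your own code with coverage feedback and ASan
clang++ -fsanitize=fuzzer-no-link,address -c src/app.cpp -o app.o
# Build the widely tested third-party library without instrumentation
clang++ -c third_party/lib.cpp -o lib.o
# Link the fuzz test; -fsanitize=fuzzer pulls in the libFuzzer driver
clang++ -fsanitize=fuzzer,address my_fuzz_test.cpp app.o lib.o -o my_fuzz_test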

Bug detection

To maximize the effectiveness of a fuzzing campaign in discovering vulnerabilities, you can add additional code instrumentation. This is typically worth the significant performance impact.

An example for C/C++ applications is AddressSanitizer, which introduces on average a 2x slowdown on its own. When using bug detection tools, it's advisable to primarily instrument code that's more likely to contain memory-related vulnerabilities, such as custom memory management implementations or components that handle complex data structures.

If partial instrumentation isn't possible, avoid stacking multiple bug detectors in a single fuzzing campaign. Instead, run multiple fuzzing campaigns, either in parallel or in sequence, each with a different bug detector instrumentation.

Deciding which code to instrument and which code to ignore during a fuzzing campaign is crucial to optimizing performance and vulnerability discovery. By focusing on instrumenting critical components, security-sensitive code, and areas prone to memory-related issues, you can maximize the effectiveness of your fuzzing efforts. Balancing the goals of code coverage and bug detection, while staying mindful of the impact on execution performance, helps ensure a successful fuzzing campaign. Code coverage instrumentation should generally take priority over specific bug detection capabilities: without coverage feedback, the fuzzer is unlikely to reach deep code paths, so the code areas instrumented with a bug detector may never be reached at all.

What's a dictionary?

If the code expects certain keywords, formats, or syntax as input, you can provide these to the fuzzing engine in a dictionary file. This can improve the speed at which the fuzzing engine generates new inputs that trigger new code paths during execution. You can find examples of dictionaries for various input types and formats in this GitHub repository.
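For example, a libFuzzer-style dictionary for a target that parses HTTP-like input might look like this (entry names are optional, values are quoted, and non-printable bytes can be escaped with \x; for libFuzzer, pass the file with -dict=<file>):

http.dict (example)
# Comment lines start with '#'
kw1="GET"
kw2="POST"
header="Content-Length: "
crlf="\x0d\x0a"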

What's a corpus?

A corpus is a set of test input values that trigger code paths in the target code. These come in two primary forms:

  • Seed corpus: A small corpus, hand-crafted or generated, that triggers initial code paths during the first execution of the fuzz test. If you're fuzzing a well-known protocol, any sample data from real-world examples can help the fuzzer quickly generate syntactically and semantically valid input data. You can find a number of corpora for various data formats in this GitHub repository.

  • Test or fuzzing corpus: In addition to the seed corpus, whenever the fuzzing engine generates a new input that leads to new code paths, it saves this input for further mutation and uses it in future runs. At the beginning of every fuzz test, the engine re-evaluates the test corpus to check which inputs still lead to unique code paths and only takes those into account for mutations in that run.

An incorrectly selected or poorly tuned corpus can slow down the fuzzing process: if inputs from the corpus don't trigger unique code paths, those executions are “wasted”. The process of tuning a corpus to only include relevant inputs that lead to unique paths is called “corpus minimization”.
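With libFuzzer-based fuzz tests, for instance, corpus minimization can be performed with the -merge flag, which copies only those inputs that add coverage into a fresh directory:

./my_fuzz_test -merge=1 minimized_corpus/ full_corpus/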

How to mock/replace functions?

When fuzz testing projects, it may be necessary to mock or replace functions, similar to unit testing. There are two main reasons for mocking functions:

  1. The SUT depends on external functions from other software that are outside the scope of the fuzzing setup
  2. The SUT contains functions that can't or shouldn't be executed in their original form when fuzzing, for example functions with specific hardware dependencies that aren't satisfied on the machine running the fuzz tests.

Creating mocks for fuzz testing

Starting with a simple approach and improving it over time is often the best way to create mocks for fuzz testing. You can create a mock by defining a function that always returns a default value, such as 0. If this doesn't achieve the desired goals, you can adjust the mock to return data taken from the input generated by the fuzzer instead of a fixed value, as in the sketch below.
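A minimal C++ sketch of both stages, assuming a hypothetical hardware-dependent read_sensor function that the SUT calls (the global FuzzedDataProvider pointer is set by the fuzz test before it calls into the SUT):

mock_sensor.cpp (sketch)
#include <fuzzer/FuzzedDataProvider.h>

// Set by the fuzz test before calling into the SUT.
FuzzedDataProvider *g_fuzz_data = nullptr;

// Simple first version: always return a default value.
//   int read_sensor(void) { return 0; }

// Improved version: return data derived from the fuzzer input so the
// fuzzer can explore code paths that depend on sensor values.
int read_sensor(void) {
  if (g_fuzz_data == nullptr || g_fuzz_data->remaining_bytes() == 0)
    return 0;
  return g_fuzz_data->ConsumeIntegral<int>();
}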

Replacing functions in C/C++ projects

You can replace already existing function definitions without compiler or linker errors with the following techniques. The appropriate approach depends on the type of the function you want to replace.

  • Wrapping functions
    • Works well for statically linked functions called from other files
    • To wrap a function, add linker flags to the project (see the sketch below)
    • Name mangling in C++ can make this confusing
    • Allows calling the original function
  • Overwriting functions
    • Works well for statically linked functions, even if called from within the same file
    • Requires the symbols of the functions being replaced to be weak (check with tools like objdump)
    • The original function can't be called
  • LD_PRELOAD
    • Works well for dynamically linked functions
    • Requires the LD_PRELOAD environment variable to be set when executing the fuzz test
    • Allows calling the original function

If none of the preceding techniques are applicable, as a last resort, change the code of the SUT to support fuzzing.
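For example, the wrapping approach with the GNU linker: link the fuzz test with -Wl,--wrap=read_sensor and define __wrap_read_sensor; the linker then redirects all calls to read_sensor to the wrapper, while __real_read_sensor still refers to the original (read_sensor is again a hypothetical C function):

wrap_sensor.cpp (sketch)
// Link with: -Wl,--wrap=read_sensor
extern "C" int __real_read_sensor(void); // the original implementation

extern "C" int __wrap_read_sensor(void) {
  // Replace the hardware access with a fixed value; the original
  // could still be called here via __real_read_sensor() if needed.
  return 42;
}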

How to use the FuzzedDataProvider?

By default, the fuzzer provides an array of raw bytes to the fuzz test, and it's up to the fuzz test creator to allocate and convert the byte array as needed. The FuzzedDataProvider (FDP) is a class that makes it easy to split the input data generated by the fuzzer. This is useful whenever you need to generate different data types (string, int, float, etc.) as part of your fuzz test. An FDP implementation is provided for multiple languages and fuzzers.

FDP data types

All implementations of the FDP provide similar convenience functions for the most common data types (string, int, float, etc.). You can use a variety of methods:

  • If you need a single value, you can use methods like ConsumeBool, ConsumeIntegral or ConsumeFloatingPoint. Many of these methods allow you to specify a range of valid values as well.

  • If you need a sequence of bytes, you can use methods like ConsumeBytes or ConsumeBytesAsString.

  • There are also methods that consume all remaining input. ConsumeRemainingBytesAsString, for example, returns a string built from the bytes remaining in the fuzzing input buffer. Call these methods as the last FDP methods in the fuzz test, because the buffer is empty after the call.

  • If you need to pick a random value from an array, you can use PickValueInArray methods.

C++

The FDP for C++ is a single-header library that's included as part of LLVM. You can include it in your fuzz test with: #include <fuzzer/FuzzedDataProvider.h>. To create the FDP object, pass it the raw fuzzer input buffer and size. Below is an example of including, creating, and using an FDP object:

my_fuzz_test.cpp
#include "src/explore_me.h"
#include <cifuzz/cifuzz.h>
#include <fuzzer/FuzzedDataProvider.h>
#include <stdio.h>

FUZZ_TEST_SETUP() {}

FUZZ_TEST(const uint8_t *data, size_t size) {
  // Create a FuzzedDataProvider object from the fuzzer input buffer and size
  FuzzedDataProvider fuzzed_data(data, size);
  // Create an int from the FDP
  int a = fuzzed_data.ConsumeIntegral<int>();
  // Create a second int from the FDP
  int b = fuzzed_data.ConsumeIntegral<int>();
  // Generate a string from the remaining bytes in the input buffer
  std::string c = fuzzed_data.ConsumeRemainingBytesAsString();

  exploreMe(a, b, c);
}

Java

The FDP for Java is part of Jazzer. You can find the Javadocs for FDP here. To use the FDP, just import it in your fuzz test and create a method that accepts an FDP as an argument:

FuzzTestCase.java
package com.example;

import com.code_intelligence.jazzer.api.FuzzedDataProvider;
import com.code_intelligence.jazzer.junit.FuzzTest;

public class FuzzTestCase {
  @FuzzTest
  void myFuzzTest(FuzzedDataProvider data) {
    // Create an int from the FDP
    int a = data.consumeInt();
    // Create a second int from the FDP
    int b = data.consumeInt();
    // Generate a string from the remaining bytes in the input buffer
    String c = data.consumeRemainingAsString();

    ExploreMe ex = new ExploreMe(a);
    ex.exploreMe(b, c);
  }
}

JavaScript

Jazzer.js provides an implementation of FuzzedDataProvider.

To use the FDP, just import it, create a function that accepts the raw fuzzer input, and construct a FuzzedDataProvider object from it:

FuzzTestCase.fuzz.js
const { FuzzedDataProvider } = require("@jazzer.js/core");

/**
 * @param { Buffer } fuzzerInputData
 */
module.exports.fuzz = function (fuzzerInputData) {
  const data = new FuzzedDataProvider(fuzzerInputData);
  // Create a utf-8 string with a maximum length between 10 and 15
  const s1 = data.consumeString(data.consumeIntegralInRange(10, 15), "utf-8");
  // Consume 1 byte to create an unsigned integer
  const i1 = data.consumeIntegral(1);
  // Consume 2 bytes to create an unsigned integer
  const i2 = data.consumeIntegral(2);
  // Consume 4 bytes to create an unsigned integer
  let i3 = data.consumeIntegral(4);

  if (i3 === 1000) {
    if (s1 === "Hello World!") {
      if (i1 === 3) {
        if (i2 === 3) {
          throw new Error("Crash!");
        }
      }
    }
  }
};