# Roadmap to learn fuzzing #


## Index ##

1. Sanitizers
2. Intro-to-fuzzing
3. libFuzzerTutorial repo
4. ClusterFuzz
5. Oss-Fuzz
6. Fuzzing-survey
7. Libfuzzer-workshop
8. AFL++
9. antonio-morales workshops and tutorials on fuzzing binaries in real case scenarios
10. Fuzzing Python stuff
11. Fuzzing native C/C++
12. Fuzzing network protocols
13. Angora fuzzer
14. Fuzzer software dev
15. Extra

  1. Sanitizers
  • paper: "AddressSanitizer: A Fast Address Sanity Checker" link
  • summary link
  • usage examples link

  2. Read the documents listed here: intro-to-fuzzing

 - Fuzz testing is a process of testing APIs with generated data. The most common forms are:
    * Mutation based fuzzing which mutates existing data samples (aka the test corpus) to create test data;
    * Generation based fuzzing which produces new test data based on models of the input.

 - Guided fuzzing is an important extension to mutation based fuzzing. Guided fuzzers employ a feedback loop when testing newly mutated inputs. 
   If an input results in a new signal (such as increased code coverage), it is permanently added to the test corpus. 
   The corpus grows over time, therefore increasing the test coverage of the target program.
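
 - As a miniature illustration of this loop, here is a self-contained toy sketch (not any real engine's code): the `RunTarget` "edge IDs" stand in for the coverage signal that real engines such as libFuzzer or AFL obtain via compiler instrumentation.

```
// Toy sketch of a coverage-guided mutation loop. Everything here is
// illustrative: real engines get edge coverage from instrumentation.
#include <cstdint>
#include <cstdlib>
#include <set>
#include <vector>

using Input = std::vector<uint8_t>;

// Toy target: reports which "edges" the input exercised.
std::set<uint64_t> RunTarget(const Input& in) {
  std::set<uint64_t> edges = {0};
  if (in.size() > 0 && in[0] == 'F') edges.insert(1);
  if (in.size() > 1 && in[1] == 'U') edges.insert(2);
  if (in.size() > 2 && in[2] == 'Z') edges.insert(3);
  return edges;
}

// Mutation step: flip one random bit of an existing corpus element.
Input Mutate(Input in) {
  if (in.empty()) return {0};
  in[rand() % in.size()] ^= 1 << (rand() % 8);
  return in;
}

int main() {
  std::vector<Input> corpus = {{'A', 'A', 'A'}};  // seed corpus
  std::set<uint64_t> seen;
  for (int i = 0; i < 100000; ++i) {
    Input candidate = Mutate(corpus[rand() % corpus.size()]);
    bool interesting = false;
    for (uint64_t e : RunTarget(candidate))
      if (seen.insert(e).second) interesting = true;
    if (interesting) corpus.push_back(candidate);  // new signal => keep it
  }
  return 0;
}
```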

 - Fuzzing is typically used to find the following kinds of bugs:
    * Bugs specific to C/C++ that require the sanitizers to catch:
      > Use-after-free, buffer overflows
      > Uses of uninitialized memory
      > Memory leaks

    * Arithmetic bugs:
      > Div-by-zero, int/float overflows, invalid bitwise shifts

    * Plain crashes:
      > NULL dereferences, Uncaught exceptions

    * Concurrency bugs:
      > Data races, Deadlocks

    * Resource usage bugs:
      > Memory exhaustion, hangs or infinite loops, infinite recursion (stack overflows)

    * Logical bugs:
      > Discrepancies between two implementations of the same protocol (example)
      > Round-trip consistency bugs (e.g. compress the input, decompress it back, and compare with the original; see the sketch after this list)
      > Assertion failures
 - Most of these are exactly the kinds of bugs that attackers use to produce exploits, from denial-of-service through to full remote code execution.
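
 - For example, a round-trip consistency check can be expressed directly as a fuzz target. The sketch below assumes a hypothetical `Compress`/`Decompress` pair; the assertion failure is exactly the kind of signal a fuzzing engine reports as a crash.

```
// Sketch of a round-trip consistency fuzz target. Compress/Decompress
// are hypothetical stand-ins for the API under test.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<uint8_t> Compress(const uint8_t *data, size_t size);    // assumed
std::vector<uint8_t> Decompress(const std::vector<uint8_t> &blob);  // assumed

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  std::vector<uint8_t> restored = Decompress(Compress(Data, Size));
  // Round-trip consistency: decompress(compress(x)) must equal x.
  assert(restored == std::vector<uint8_t>(Data, Data + Size));
  return 0;
}
```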

### Potential Fuzzing Targets ###
 - Types of projects where fuzzing has been useful:

   * Anything that consumes untrusted or complicated inputs:
    - Parsers of any kind (xml, pdf, truetype, ...)
    - Media codecs (audio, video, raster and vector images, etc)
    - Network protocols, RPC libraries (gRPC)
    - Network scanners (pmon)
    - Crypto (boringssl, openssl)
    - Compression (zip, gzip, bzip2, brotli, …)
    - Compilers and interpreters (PHP, Perl, Python, Go, Clang, …)
    - Services/libraries that consume protobuffers
    - Regular expression matchers (PCRE, RE2, libc)
    - Text/UTF processing (icu)
    - Databases (SQLite)
    - Browsers (all)
    - Text editors/processors (vim, OpenOffice)
   * OS Kernels (Linux), drivers, supervisors and VMs
   * UI (Chrome UI)

### Fuzzing Successes ###
 - Historically, fuzzing has been an extremely effective technique for finding long-standing bugs in code bases that fall into the target categories above. 
   Some trophy list examples (with a total of tens of thousands of bugs found inside and outside of Google):
   * [AFL bugs](http://lcamtuf.coredump.cx/afl/#bugs)
   * [libFuzzer bugs](http://llvm.org/docs/LibFuzzer.html#trophies)
   * [syzkaller bugs](https://github.com/google/syzkaller/blob/master/docs/found_bugs.md)
   * [go-fuzz bugs](https://github.com/dvyukov/go-fuzz#trophies)
   * [Honggfuzz bugs](https://github.com/google/honggfuzz#trophies)
   * [ClusterFuzz bugs in Chrome](https://bugs.chromium.org/p/chromium/issues/list?can=1&q=label%3AClusterFuzz+-status%3AWontFix%2CDuplicate&sort=-id&colspec=ID+Pri+M+Stars+ReleaseBlock+Cr+Status+Owner+Summary+OS+Modified&x=m&y=releaseblock&cells=tiles)
   * [OSS-Fuzz bugs](https://bugs.chromium.org/p/oss-fuzz/issues/list?q=label%3AClusterFuzz+-status%3AWontFix%2CDuplicate&can=1)
   * [Facebook’s Sapienz (UI fuzzing)](https://engineering.fb.com/developer-tools/sapienz-intelligent-automated-software-testing-at-scale/)

---

 - The basic things to remember about a fuzz target:
    * The fuzzing engine will execute it many times with different inputs in the same process.
    * It must tolerate any kind of input (empty, huge, malformed, etc).
    * It must not exit() or abort() on any input (if it does, it's a bug).
    * It may use threads but ideally all threads should be joined at the end of the function.
    * It must be as deterministic as possible. Non-determinism (e.g. random decisions not based on the input bytes) will make fuzzing inefficient.
    * It must be fast. Try avoiding cubic or greater complexity, logging, or excessive memory consumption.
    * Ideally, it should not modify any global state (although that's not a strict requirement).
    * Usually, the narrower the target the better. E.g. if your target can parse several data formats, split it into several targets, one per format.
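
 - A minimal sketch of a target that follows the rules above (the `InitParser`/`ParseFoo` names are hypothetical stand-ins for the API under test):

```
// Sketch of a well-behaved fuzz target following the rules above.
#include <cstddef>
#include <cstdint>

bool InitParser();                                // assumed one-time setup
void ParseFoo(const uint8_t *data, size_t size);  // assumed API under test

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  // One-time initialization, kept out of the per-input hot path.
  static bool initialized = InitParser();
  (void)initialized;
  // Must tolerate any input and never exit() or abort() on malformed data.
  ParseFoo(Data, Size);
  return 0;  // must return 0; other values are reserved
}
```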

### Determinism ###
 - A fuzz target needs to be deterministic, i.e. given the same input it should have the same behavior. 
   This means, for example, that a fuzz target should not use rand() or any other source of randomness.  
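
 - If the code under test genuinely needs "random" choices, one option (a sketch, not a universal rule) is to derive them deterministically from the input bytes:

```
// Sketch: derive a deterministic seed from the input instead of rand().
#include <cstddef>
#include <cstdint>
#include <cstring>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  if (Size < sizeof(uint32_t)) return 0;
  uint32_t seed;
  memcpy(&seed, Data, sizeof(seed));  // same input => same seed => same run
  // ... pass `seed` to the code under test instead of calling rand() ...
  (void)seed;
  return 0;
}
```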

### Speed ###
 - Fuzzing is a search algorithm that requires many iterations, so a good fuzz target should be very fast.
    * A typical good fuzz target runs on the order of 1000 executions per second per CPU core (exec/s) or more.
    * For lightweight targets, 10000 exec/s or more.
 - If your fuzz target runs at less than 10 exec/s you are probably doing something wrong.
    * We recommend profiling fuzz targets and eliminating any obvious hot spots.

### Memory consumption ###
 - For CPU-efficient fuzzing, a good fuzz target should consume less RAM than is available on a given (virtual) machine per CPU core.
    * There is no one-size-fits-all RAM threshold, but as of 2019 a typical good fuzz target would consume less than 1.5 GB.

### Timeouts, OOMs, shallow bugs ###
 - A good fuzz target should not have any
    * timeouts (inputs that take too long to process),
    * OOMs (inputs that cause the fuzz target to consume too much RAM),
    * shallow (easily discoverable) bugs; otherwise fuzzing will stall quickly.

### Seed corpus ###
 - In most cases a good fuzz target should be accompanied with a seed corpus, which is a set of representative inputs for the fuzz target. 
 - These inputs combined should cover large portions of the API under test, ideally achieving 100% coverage (different coverage metrics can be applied, 
   e.g. block coverage or edge coverage, depending on a specific case).
    * Avoid large seed inputs when smaller inputs are sufficient for providing the same coverage.

 - A seed corpus is stored as a directory where every individual file represents one input; subdirectories are allowed.
 - When fixing a bug or adding new functionality to the API, don't forget to extend the seed corpus. 
    * Monitor the code coverage achieved by the corpus and try to keep it close to 100%.

### Coverage discoverability ###
 - It is often insufficient to have a seed corpus with good code coverage to claim good fuzzability, 
    * i.e. the ability of a fuzzing engine to discover many code paths in the API under test.

 - For example, imagine we are fuzzing an API that consumes an encrypted input, and we have a comprehensive seed corpus with such encrypted inputs. 
   This seed corpus will provide good code coverage, but any mutation of the inputs will be rejected early as broken.
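
 - One common fix, sketched below with a hypothetical `ParsePlaintext` name, is to split the target and fuzz the post-decryption parser directly, so mutations are not rejected by the decryption layer:

```
// Sketch: bypass the Decrypt() layer and fuzz the inner parser directly.
// ParsePlaintext is a hypothetical name for the post-decryption API.
#include <cstddef>
#include <cstdint>

void ParsePlaintext(const uint8_t *data, size_t size);  // assumed inner API

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  ParsePlaintext(Data, Size);  // mutations now reach the parsing logic
  return 0;
}
```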

 - So, it is important to ensure that the fuzz target can discover a large subset of reachable control flow edges without using the seed corpus. 
    * Tools such as Clang's source-based code coverage can assist with this process.

 - If fuzzing a given target without a seed corpus for, say, a billion iterations does not provide coverage comparable to a good seed corpus, consider:
    * Splitting the target (see Large APIs)
    * Using dictionaries (see the example after this list)
    * Using Structure-Aware Fuzzing
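
 - A dictionary is a plain-text file (passed to libFuzzer via `-dict=`) listing interesting tokens in the AFL-compatible format; this small example follows the format shown in the libFuzzer documentation:

```
# Lines starting with '#' and empty lines are ignored.
kw1="blah"
# Escapes work as in C string literals.
kw2="\xF7\xF8"
# The name before '=' is optional.
"foo\x0Abar"
```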

 - If your API consumes inputs only of specific sizes, the best way is to express that in the fuzz target, like this:
```
// fuzz_target.cc
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  // Reject inputs outside the size range the API accepts.
  if (Size > kMaxSize || Size < kMinSize) return 0;
  // ... feed Data/Size to the API under test ...
  return 0;
}
```
 - A good fuzz target does not use I/O:
    * Avoid debug output to stderr or stdout as it slows down fuzzing.
    * Avoid reading from disk other than during one-time initialization.
    * Avoid writing to disk

### Structure-Aware Fuzzing with libFuzzer ###
 - Generation-based fuzzers usually target a single input type, generating inputs according to a pre-defined grammar. Good examples of such fuzzers are:
    * csmith (generates valid C programs) 
    * Peach (generates inputs of any type, but requires such a type to be expressed as a grammar definition)

 - Coverage-guided mutation-based fuzzers, such as libFuzzer or AFL, are not restricted to a single input type and do not require grammar definitions. 
   Thus, mutation-based fuzzers are generally easier to set up and use than their generation-based counterparts. 
    * But the lack of an input grammar can also result in inefficient fuzzing for complicated input types, where any traditional mutation 
      (e.g. bit flipping) leads to an invalid input rejected by the target API in the early stage of parsing.

 - With some additional effort, however, libFuzzer can be turned into a grammar-aware (i.e. structure-aware) fuzzing engine for a specific input type.
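
 - Concretely, libFuzzer exposes a custom-mutator hook for this. The sketch below (with a hypothetical 4-byte magic header) keeps the header valid and lets libFuzzer mutate only the payload, so mutants survive the format's first parsing check:

```
// Sketch of libFuzzer's LLVMFuzzerCustomMutator hook. The kMagic header
// is a hypothetical example of structure that plain bit flips would break.
#include <cstddef>
#include <cstdint>
#include <cstring>

// Provided by libFuzzer: applies its built-in mutations in place.
extern "C" size_t LLVMFuzzerMutate(uint8_t *Data, size_t Size, size_t MaxSize);

static const uint8_t kMagic[4] = {'F', 'O', 'O', '1'};  // assumed format magic

extern "C" size_t LLVMFuzzerCustomMutator(uint8_t *Data, size_t Size,
                                          size_t MaxSize, unsigned int Seed) {
  (void)Seed;
  if (MaxSize < sizeof(kMagic)) return 0;
  if (Size < sizeof(kMagic)) Size = sizeof(kMagic);
  // Mutate only the payload after the header.
  size_t payload = LLVMFuzzerMutate(Data + sizeof(kMagic),
                                    Size - sizeof(kMagic),
                                    MaxSize - sizeof(kMagic));
  // Restore the magic so the mutant passes the first parsing check.
  memcpy(Data, kMagic, sizeof(kMagic));
  return sizeof(kMagic) + payload;
}
```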

---

## Glossary ##
 - Naming things is hard, so this page tries to reduce confusion around fuzzing-related terminology.

### Corpus (Or test corpus, or fuzzing corpus.) ###
 - A set of test inputs. In most contexts, it refers to a set of minimal test inputs that generate maximal code coverage.

### Cross-pollination ###
 - The term is taken from botany, where one plant pollinates a plant of another variety. 
 - In fuzzing, cross-pollination means using a corpus for one fuzz target to expand a corpus for another fuzz target. 
    * For example, if there are two libraries that process the same common data format, it is often beneficial to cross-pollinate their respective corpora.

### Dictionary ###
 - A file which specifies interesting tokens for a fuzz target.
 - Most fuzzing engines support dictionaries, and will adjust their mutation strategies to process these tokens together.

### Fuzz Target (Or Target Function, or Fuzzing Target Function, or Fuzzing Entry Point) ###
 - A function to which we apply fuzzing. A specific signature is required for OSS-Fuzz. Examples: openssl, re2, SQLite.

### Fuzzer ###
 - The most overloaded term, used in a variety of contexts, which makes it ambiguous. 
 - Sometimes "fuzzer" refers to a fuzz target, a fuzzing engine, a mutation engine, a test generator, or a fuzzer build.

### Fuzzer Build ###
 - A build that contains all the fuzz targets for a given project, which is run with a specific fuzzing engine, in a specific build mode 
   (e.g. with enabled/disabled assertions), and optionally combined with a sanitizer. 
    * In OSS-Fuzz, it is also known as a job type.

### Fuzzing Engine ###
 - A tool that tries to find interesting inputs for a fuzz target by executing it. Examples: libFuzzer, AFL, honggfuzz, etc.
 - See related terms Mutation Engine and Test Generator.

### Mutation Engine ###
 - A tool that takes a set of testcases as input and creates their mutated versions. 
 - It is just a generator and does not feed the mutations to a fuzz target. 
    * Example: radamsa (a generic test mutator).

### Reproducer (Or Test Case.) ###
 - A test input that can be used to reproduce a bug when processed by a fuzz target.

### Sanitizer ###
 - A dynamic testing tool that can detect bugs during program execution. Examples: ASan, DFSan, LSan, MSan, TSan, UBSan.

### Seed Corpus ###
 - A small initial corpus prepared with the intent of providing initial coverage for fuzzing. 
 - Rather than being created by the fuzzers themselves, seed corpora are often prepared from existing test inputs or may be hand-crafted 
   to provide interesting coverage. 
    * They are often checked into source alongside fuzz targets.

### Test Generator ###
 - A tool that generates testcases from scratch according to some rules or grammar. 
    * Examples: csmith (a test generator for C language), cross_fuzz (a cross-document DOM binding test generator).

### Test Input ###
 - A sequence of bytes that is used as input to a fuzz target. Typically, a test input is stored in a separate file.


---


  3. libFuzzerTutorial

  4. ClusterFuzz pdf, talk, repo

  5. Oss-Fuzz link

  1. read "The Art, Science, and Engineering of Fuzzing:A Survey" paper link

  7. Now follow the libFuzzer-workshop
  • You may skip the first 3-4 challenges, because they are the same as the ones in libFuzzerTutorial

  8. From here, move on to AFL++

  9. (part 2) antonio-morales workshops and tutorials

  • (part 3) Fuzzing in a real case scenario

  10. Fuzzing Python stuff

  11. Fuzzing native C/C++

  12. Fuzzing network protocols

  13. Angora fuzzer
  • Read its paper here

  • Source code here

  • Repeat the workshop, but this time using angora

    • Do at least quickstart example and libxml2 challenge
    • Angora is really bad at finding vulnerabilities
  • If you have issues with LLVM's DFSan, use Pin mode instead


  14. Fuzzer software dev

  15. Extra
