Coverage guided fuzzing for native Android libraries (Frida & Radamsa)

# Intro

Recently I have been getting into userland application testing on Android. I want to credit `Iddo` and `Jacob` for their excellent course on attacking `IM Applications` which I took at `zer0con`. As a result, I'm currently pivoting some of my research interests. Android userland fires all my receptors.

- There is `Java` which is just like `.NET` from an analysis perspective
- There is `Frida` for dynamic `Java` instrumentation
- There is a bridge to interface with native application libraries

All in all, it is a great playground to work on. If you are interested in userland full-chain I do recommend the `IM` course. `OffensiveCon` is too late at this point obviously but they have upcoming courses at `REcon` and `Hexacon`.

- REcon (2024) - [here](https://recon.cx/2024/training.html#ATTACKINGINSTANTMESSAGINGAPPLICATIONS)
- Hexacon (2024) - [here](https://www.hexacon.fr/trainer/bech_eldor/)

# The Will and the Word

Ok, what are we doing here today? Great question! I was reading [a post](https://blog.quarkslab.com/android-greybox-fuzzing-with-afl-frida-mode.html) by [Quarkslab](https://twitter.com/quarkslab) on fuzzing Android applications, on-device, using `AFL++ Frida mode`. The post is very good and I plan to get the same environment set up locally. However, as a first step, I wanted to replicate the results using a slightly different approach. `Quarkslab` provides an `apk` we can use to test our `fuzzer` which is helpful.

Our game-plan is to use `Frida` to orchestrate `Java` function calling and `Stalker` to generate coverage feedback for the native library. We, of course, need to mutate our inputs and for that we will use `Radamsa`.

```
I charge thee, Sir Knight, to care for this tree. It hath grown here to renew thy faith and trust. Thy debt to it must be paid with tender and loving attention to its needs. In time it will bear fruit, and thou wilt gather the fruit and give it freely to any who ask it of thee.
For thy soul's sake, thou wilt refuse none, no matter how humble. As the tree gives freely, so shalt thou.

-Belgarath, The Eternal Man
```

# Setup

We have some setup to do for our environment to be ready. Let's tackle that first.

#### Radamsa

In my case, I am on `mac arm64` and there are no releases for `Radamsa` (or python packages) so we have to [build a binary](https://gitlab.com/akihe/radamsa/-/releases) that we can run ourselves.

```
curl -s https://gitlab.com/akihe/radamsa/uploads/d774a42f7893012d0a56c490a75ae12b/radamsa-0.7.c.gz | gzip -d | cc -O2 -x c -o radamsa -
```

![[fuzz_1.png]]

Good, later we will have to duct tape case mutation into our fuzzer but it will do.

#### Android native library

We are using a physical phone attached with `USB debugging`, you can see some information below:

```
b33f@p0wn % adb shell "getprop | grep -E 'ro.product.model|ro.vendor.product.cpu.abilist64|ro.build.description'"
[ro.build.description]: [panther-user 14 AP1A.240405.002 11480754 release-keys]
[ro.product.model]: [Pixel 7]
[ro.vendor.product.cpu.abilist64]: [arm64-v8a]
```

I installed the `apk` from the [github repository](https://github.com/quarkslab/android-fuzzing/blob/main/apk/qb.blogfuzz.apk) using `adb` which worked fine and I could launch the app on my phone but I noticed that the native library was not loaded. You can see the effect below.

```js
// Get module base address
let base = Module.findBaseAddress("libblogfuzz.so");
console.log("[?] libblogfuzz base: " + base);

// Loop loaded modules and print the name and base address
Process.enumerateModules({
    onMatch: function(module) {
        if (module.name.includes("blog")) {
            console.log("[>] " + module.name + ": " + module.base);
        }
    },
    onComplete: function() {
        console.log("[+] Module enumeration complete.");
    }
});
```

I filter the module list because there are a lot of modules, but you can of course drop the filter to list everything that is loaded.

![[fuzz_2.png]]

Curious, I looked in the application directory under `lib` and there is no library on disk.

```
panther:/data/app # find /data/app/ -name libblogfuzz.so
panther:/data/app # find /data/app/ -type d -name '*qb.blogfuzz*'
/data/app/~~B2Py8GOoE8ay13Gr-6ZJHw==/qb.blogfuzz-WMzbAMgbPiQMdL-ISSJX_g==
panther:/data/app # cd /data/app/~~B2Py8GOoE8ay13Gr-6ZJHw==/qb.blogfuzz-WMzbAMgbPiQMdL-ISSJX_g==
panther:/data/app/~~B2Py8GOoE8ay13Gr-6ZJHw==/qb.blogfuzz-WMzbAMgbPiQMdL-ISSJX_g== # ls
base.apk lib
panther:/data/app/~~B2Py8GOoE8ay13Gr-6ZJHw==/qb.blogfuzz-WMzbAMgbPiQMdL-ISSJX_g== # ls lib/arm64/
```

I tried some silly things like manually writing the library in the `lib` folder and then loading it. This does work but it doesn't get at the heart of the problem. We can tell what is really happening here if we use `Frida` to list all loaded classes, like so:

```js
Java.perform(function() {
    Java.enumerateLoadedClasses({
        onMatch: function(className) {
            // If class starts with 'qb.blogfuzz', print it
            if (className.startsWith('qb.blogfuzz')) {
                console.log("[>] App class loaded --> " + className);
            }
        },
        onComplete: function() {
            console.log('Class enumeration complete.');
        }
    })
});
```

Again, note that we filter here on the relevant prefix to avoid getting spammed.

![[fuzz_3.png]]

The problem comes into focus now, if we look at the `apk` in `jadx` there is a class, `qb.blogfuzz.NativeHelper`, that is actually responsible for loading the native library.

![[fuzz_4.png]]

Clearly if that class isn't loaded then our module will not be loaded either. Typically classes get loaded when they are first accessed so we can do a little trick here using a `js promise` to load the class and then continue execution based on that promise being resolved.

```js
Java.perform(function() {
    function loadLibraryAsync() {
        return new Promise((resolve, reject) => {
            try {
                Java.use("qb.blogfuzz.NativeHelper");
                Java.use("qb.blogfuzz.Wrapper");
                resolve();
            } catch (e) {
                console.error("[!] Error:", e);
                reject(e);
            }
        });
    }

    // Await the promise to load the library
    loadLibraryAsync().then(function() {
        // Here the class is loaded, so the library is loaded
        let base = Module.findBaseAddress("libblogfuzz.so");
        console.log("[?] libblogfuzz base: " + base);

        // Print all export names and addresses
        let exports = Module.enumerateExportsSync("libblogfuzz.so");
        exports.forEach(function(exp) {
            console.log("[>] " + exp.name + ": " + exp.address);
        });
    }).catch(function(error) {
        console.error("[!] Error:", error);
    });
});
```

Notice that I am also loading a different class `qb.blogfuzz.Wrapper` because it has some functions we may need to call the `fuzzMe` native interfaces.

![[fuzz_5.png]]

We could reimplement these purely in `Frida` but why not use the `Java` functions in the `apk` as our harness. In the end, I did not target the `Wrapper` variation but this way we have the ability if we need it. Now when we run the script everything looks good!

![[fuzz_6.png]]

You may notice that there is a slight delay when you do this. For our fuzzing purposes that doesn't matter because the delay only happens on launch and we will fuzz the app while it is running.

# What izz we doing tho?

We should briefly outline the challenge here. The native library has two exports that can be called from the `apk`, these exports then each call into a `fuzzMe` sub-function. First, `Java_qb_blogfuzz_NativeHelper_fuzzMeArray` (note the typical naming convention) takes a byte array.

![[fuzz_7.png]]

And second, `Java_qb_blogfuzz_NativeHelper_fuzzMeWrapper` passes in an object and then calls a method to get an array back from it.

![[fuzz_8.png]]

You can read the `Quarkslab` [post](https://blog.quarkslab.com/android-greybox-fuzzing-with-afl-frida-mode.html) to understand more about why this setup is used. There isn't really any difference, we could use either.
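
As an aside on the naming convention mentioned above: the export names follow the standard JNI short-name scheme (`Java_`, then the mangled fully-qualified class name, then the mangled method name). Here is a minimal sketch of that mangling, which can be handy on the controller side to derive the symbol to look up for a given `native` method. The helper names `jni_mangle` and `jni_export_name` are made up for illustration and are not part of the original tooling, and the sketch ignores the long form used for overloaded native methods as well as characters outside the BMP.

```python
def jni_mangle(component: str) -> str:
    """Escape one name component using the JNI short-name escape rules."""
    out = []
    for ch in component:
        if ch == "_":
            out.append("_1")          # literal underscore
        elif ch == ";":
            out.append("_2")          # only appears in signatures
        elif ch == "[":
            out.append("_3")          # only appears in signatures
        elif ch.isascii() and ch.isalnum():
            out.append(ch)            # plain ASCII letters and digits pass through
        else:
            out.append("_0%04x" % ord(ch))  # everything else becomes _0xxxx
    return "".join(out)


def jni_export_name(class_name: str, method: str) -> str:
    """Build the short JNI symbol for a native method of a dotted class name."""
    parts = [jni_mangle(p) for p in class_name.split(".")]
    return "Java_" + "_".join(parts) + "_" + jni_mangle(method)


print(jni_export_name("qb.blogfuzz.NativeHelper", "fuzzMeArray"))
# Java_qb_blogfuzz_NativeHelper_fuzzMeArray
```

The result matches the export name we saw in the library, so the same helper should work for other `native` methods as long as they are not overloaded.
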
I will note that the `fuzzMeWrapper` would have more overhead because in `Java` we have to make the `object` first and in native code we also have to perform extra steps. We will deal with the `fuzzMeArray` version only in the rest of this post.

Finally, the `fuzzMe` function itself, which is what we will fuzz. This is just a big tree basically. If the input `byte array` is equal to `Quarksl4bfuzzMe!` then it will trigger a `null pointer` exception. Each correct character of the array will result in more blocks being executed.

![[fuzz_9.png]]

# Hooking & Java & Coverage

Let's first do some groundwork to make sure we can:

- Hook the native `Java_qb_blogfuzz_NativeHelper_fuzzMeArray` function (we can also hook `fuzzMe` directly).
- Trigger the native function from the `apk` with a custom buffer using `Frida`.

We use the `apk` as our harness. Calling the interface is pretty easy, we only have to invoke `NativeHelper->fuzzMeArray`.

```java
public native void fuzzMeArray(byte[] bArr);
```

We can adjust our code as shown below. Note that we have some extra things going on because we are taking a `String` and turning it into `Byte[]`.

```js
Java.perform(function() {
    function loadLibraryAsync() {
        return new Promise((resolve, reject) => {
            try {
                Java.use("qb.blogfuzz.NativeHelper");
                Java.use("qb.blogfuzz.Wrapper");
                resolve();
            } catch (e) {
                console.error("[!] Error:", e);
                reject(e);
            }
        });
    }

    // Await the promise to load the library
    loadLibraryAsync().then(function() {
        // Get the address of the native function
        let pfuzzMeArray = Module.findExportByName("libblogfuzz.so", "Java_qb_blogfuzz_NativeHelper_fuzzMeArray");
        console.log("[+] Address of fuzzMeArray:", pfuzzMeArray);

        // Hook the native function
        Interceptor.attach(pfuzzMeArray, {
            onEnter: function(args) {
                console.log("[>] fuzzMeArray()");
            },
            onLeave: function(retval) {
                console.log("[<] fuzzMeArray()");
            }
        });

        // Call the Java function
        Java.perform(function() {
            let String = Java.use("java.lang.String");
            let StandardCharsets = Java.use("java.nio.charset.StandardCharsets");
            let javaString = String.$new("AAABBBCCC"); // Quarksl4bfuzzMe!
            let buffer = javaString.getBytes.overload("java.nio.charset.Charset").call(javaString, StandardCharsets.US_ASCII.value);

            let NativeHelper = Java.use("qb.blogfuzz.NativeHelper");

            // Define a new implementation for the method
            NativeHelper["fuzzMeArray"].implementation = function (bArr) {
                console.log(`NativeHelper.fuzzMeArray is called: bArr=${bArr}`);
                this["fuzzMeArray"](bArr);
            };

            // Create an instance of NativeHelper
            let nativeHelperInstance = NativeHelper.$new();

            // Call the method multiple times
            for (var i = 0; i < 5; i++) {
                nativeHelperInstance.fuzzMeArray(buffer);
            }
        });
    }).catch(function(error) {
        console.error("[!] Error:", error);
    });
});
```

![[fuzz_10.png]]

You may wonder why we are doing this stuff with `java.nio.charset.StandardCharsets` and `java.nio.charset.Charset`. We don't need to do that and it does not appear in the fuzzer but it's just there to illustrate that we can exactly copy what the `apk` does if we need to.

![[fuzz_11.png]]

Let's also make sure that the app will crash if we provide it with the trigger string (`Quarksl4bfuzzMe!`).

![[fuzz_12.png]]

Finally, let's generate coverage for individual runs by attaching `Stalker` to our native function hook.

```js
// Coverage global variable
let coverageSet = [];

Java.perform(function() {
    function loadLibraryAsync() {
        return new Promise((resolve, reject) => {
            try {
                Java.use("qb.blogfuzz.NativeHelper");
                Java.use("qb.blogfuzz.Wrapper");
                resolve();
            } catch (e) {
                console.error("[!] Error:", e);
                reject(e);
            }
        });
    }

    // Await the promise to load the library
    loadLibraryAsync().then(function() {
        // Get the address of the native function
        let pfuzzMeArray = Module.findExportByName("libblogfuzz.so", "Java_qb_blogfuzz_NativeHelper_fuzzMeArray");
        console.log("[+] Address of fuzzMeArray:", pfuzzMeArray);

        // Get the ranges of the library
        let libraryName = "base.apk" // Note that the lib shows up as base.apk (not sure why)
        let ranges = Process.enumerateRangesSync({
            protection: 'r-x',
            coalesce: true
        }).filter(range => range.file && range.file.path.includes(libraryName));

        // Hook the native function
        Interceptor.attach(pfuzzMeArray, {
            onEnter: function(args) {
                // Follow the current thread and record every instruction
                // that executes inside the library's executable range
                Stalker.follow(this.threadId, {
                    events: {
                        call: false,
                        ret: false,
                        exec: false,
                        block: false,
                        compile: false,
                    },
                    transform: function(iterator) {
                        var instruction = iterator.next();
                        while (instruction !== null) {
                            if (instruction.address >= ranges[0].base && instruction.address <= ranges[0].base.add(ranges[0].size)){
                                iterator.putCallout(function(context) {
                                    // Store the offset relative to the library base
                                    coverageSet.push(context.pc - ranges[0].base)
                                })
                            }
                            iterator.keep();
                            instruction = iterator.next();
                        }
                    }
                });
            },
            onLeave: function(retval) {
                Stalker.unfollow(this.threadId)
                Stalker.garbageCollect()
                Stalker.flush()
                console.log("[*] Execution coverage set: " + coverageSet.length);
                coverageSet = [];
            }
        });

        // Call the Java function
        Java.perform(function() {
            let String = Java.use("java.lang.String");
            let StandardCharsets = Java.use("java.nio.charset.StandardCharsets");
            let javaString = String.$new("AAAAAAAAAAAAAAAA"); // Quarksl4bfuzzMe!
                                                              // AAAAAAAAAAAAAAAA
            let buffer = javaString.getBytes.overload("java.nio.charset.Charset").call(javaString, StandardCharsets.US_ASCII.value);

            let NativeHelper = Java.use("qb.blogfuzz.NativeHelper");

            // Define a new implementation for the method
            NativeHelper["fuzzMeArray"].implementation = function (bArr) {
                console.log(`NativeHelper.fuzzMeArray is called: bArr=${bArr}`);
                this["fuzzMeArray"](bArr);
            };

            // Create an instance of NativeHelper
            let nativeHelperInstance = NativeHelper.$new()

            // Call the method multiple times
            for (var i = 0; i < 1; i++) {
                nativeHelperInstance.fuzzMeArray(buffer);
            }
        });
    }).catch(function(error) {
        console.error("[!] Error:", error);
    });
});
```

The `fuzzMe` function has an input length check of `>= 0x10`. Our fuzzer won't care of course, but to test we should make our string the correct length. We can see the output below for `AAAAAAAAAAAAAAAA`.

```
[+] Address of fuzzMeArray: 0x745ee69a10
NativeHelper.fuzzMeArray is called: bArr=65,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65
[*] Stalking thread
[*] Execution coverage set: 85
```

Then, if we replace the first character with the correct value, like so `QAAAAAAAAAAAAAAA`, the coverage goes up.

```
[+] Address of fuzzMeArray: 0x745ee69a10
NativeHelper.fuzzMeArray is called: bArr=81,65,65,65,65,65,65,65,65,65,65,65,65,65,65,65
[*] Stalking thread
[*] Execution coverage set: 89
```

Perfect! Notice also that I am saving the executed addresses in `coverageSet = [];` (for now they are cleared in `onLeave`) which would allow us later to paint the binary with `Lighthouse`. You can see an example here where I paint a buffer with `3` correct characters `QuaAAAAAAAAAAAAA`.

![[fuzz_13.png]]

# Fuzzing

Great, now we just have to automate passing inputs to the binary and mutate those inputs.

#### Radamsa

Let's start by briefly implementing an ugly python wrapper around the `radamsa` binary. Remember, on `arm64` we don't have `pyradamsa`.

```python
import subprocess

# Define global variables
radamsa_path = "/Users/b33f/Tools/radamsa-v0.7/radamsa"

# Function which takes bytes and returns mutated bytes
def mutate_bytes(input_bytes):
    # call radamsa with arguments
    process = subprocess.Popen([radamsa_path, '-m', 'ber'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    process.stdin.write(input_bytes)
    process.stdin.close()
    mutated_bytes = process.stdout.read()
    return mutated_bytes

# Test the function
for i in range(1000):
    input_bytes = b"Hello, World!"
    mutated_bytes = mutate_bytes(input_bytes)
```

It's not great actually, we take quite a performance hit on this but it will have to do for now.

```
b33f@p0wn fuzzcase % time python3 fuzzer.py
python3 fuzzer.py 2.60s user 0.91s system 94% cpu 3.703 total
```

Notice that I am selecting the `ber` mutator, this mutator will swap random bytes for different ones so we have `0xff` mutation possibilities `per byte` in the array.

#### Fuzzing Controller

There isn't much magic going on here really:

- We create an initial corpus (one entry will do).
- We mutate an entry in the corpus (the corpus is biased to return the entry with the highest coverage).
- We send the mutated byte array to the `js` script running in the `app`, on the `phone`.
- We record the coverage result.
- We add a `case` to the corpus if the coverage is unique.
- `while->true` loop.

Your initial `case` can be created as shown below. I made a simple object to have some bookkeeping facilities.

```python
case_obj = {
    'case': b'AAAABBBBCCCCDDDDEEEE',
    'has_execute': False,
    'coverage': 0
}
cases.append(case_obj)
```

`case` bytes are mutated like this:

```python
mutated_case = mutate_bytes(case)
```

Finally, you can send a `JSON serializable` object to the `js` script and record the coverage.

```python
# Send the mutated case to the script
cov = set(script.exports_sync.fuzz(list(mutated_case)))
```

#### Fuzzing Client

On the client side there is an interface to receive remote communications and pass them to a helper that triggers the case.

```js
rpc.exports = {
    fuzz: fuzz
};

// We get our python case here
function fuzz(fuzzcase) {
    coverageSet = []
    dofuzz(fuzzcase)
    return coverageSet
}

// We execute our Java harness like so
function dofuzz(fuzzCase) {
    // Convert to a Java byte array
    var convert = Java.array('byte', fuzzCase)

    // Call the Java function with the input
    Java.perform(function() {
        let NativeHelper = Java.use("qb.blogfuzz.NativeHelper");
        NativeHelper["fuzzMeArray"].implementation = function (bArr) {
            this["fuzzMeArray"](bArr);
        };

        // Create an instance of NativeHelper & execute
        let nativeHelperInstance = NativeHelper.$new();
        return nativeHelperInstance.fuzzMeArray(convert);
    });
}
```

There is one bit of tradecraft we need to cover aside from this. If we trigger a crash we obviously want information about what happened. In `Frida` you can register your own `exception handler` to take care of that.

```js
Process.setExceptionHandler(function(details) {
    // Build the initial info object
    // |_ There is more information available than we are using here
    // |_ Build your own as needed
    let info = Object.assign(details, {
        lr: DebugSymbol.fromAddress(details.context.lr),
        pc: DebugSymbol.fromAddress(details.context.pc)
    });
    info['case'] = fuzzCase
    //send(JSON.stringify(info))

    function formatException(exception) {
        let formattedMessage = `\n[+] Exception: ${exception.type}\n`;
        formattedMessage += ` |_ Message: ${exception.message}\n`;
        formattedMessage += ` |_ Memory Operation: ${exception.memory.operation} at address ${exception.memory.address}\n`;
        formattedMessage += `[+] Context\n`;
        formattedMessage += ` |_ Program Counter (PC): ${exception.context.pc}\n`;
        formattedMessage += ` |_ Stack Pointer (SP): ${exception.context.sp}\n`;
        formattedMessage += ` |_ Link Register (LR): ${exception.lr.name} at address ${exception.lr.address}\n`;
        formattedMessage += ` |_ Faulting Module: ${exception.pc.moduleName} function ${exception.pc.name}\n`;
        formattedMessage += `[+] Backtrace\n`;
        formattedMessage += ` |-> ${Thread.backtrace(details.context, Backtracer.ACCURATE).map(DebugSymbol.fromAddress).join('\n |-> ')}\n`;

        // Custom "case" param for fuzzer
        formattedMessage += `\n[?] Fuzz Case:\n`;
        exception.case = new Uint8Array(exception.case).buffer;
        formattedMessage += hexdump(exception.case, {
            offset: 0,
            length: exception.case.byteLength, // ArrayBuffer exposes byteLength, not length
            header: true,
            ansi: false
        });
        formattedMessage += `\n`;
        return formattedMessage;
    }

    console.log(formatException(info));

    // Returning false lets the exception propagate to the application
    return false
});
```

We are just printing some informative `human-readable` messages but you can send back the `JSON` itself to do something with it on the `controller` side if you like.

#### Results

Our `fuzzer` is stupid ok. Since it is randomly changing bytes from `0xff` options, per byte, it can take a while till it hits new coverage.
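
To put a rough number on "a while", here is a back-of-the-envelope sketch. It assumes a deliberately idealised mutator: every execution replaces one uniformly chosen position with one uniformly chosen byte value, and the corpus always hands back the entry with the most correct leading characters. This is an assumption for illustration, not a description of what `Radamsa` does internally, and `expected_random_execs` is a hypothetical helper name.

```python
def expected_random_execs(case_len: int, target_len: int, byte_values: int = 256) -> int:
    """Expected executions to recover target_len characters one at a time.

    Assumed model (not a measurement of Radamsa): each execution rewrites a
    single random position with a single random byte value, and only the next
    unsolved position with exactly the right value yields new coverage.
    """
    # Probability that one execution fixes the next character is
    # 1 / (case_len * byte_values); the expected wait is its inverse.
    execs_per_char = case_len * byte_values
    return execs_per_char * target_len


# 16-character trigger string, 16-byte case
print(expected_random_execs(16, 16))  # 65536
# 16-character trigger string, 20-byte case like the initial corpus entry
print(expected_random_execs(20, 16))  # 81920
```

Under these assumptions we should expect execution counts in the tens of thousands before the full trigger string is recovered, and the exact number will swing a lot from run to run.
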
Additionally, it is pretty slow (`don't you dare ask me` 🤣) but keep in mind that we are fuzzing from the running Android `app` using `app` functions as the harness. Also, we take a big performance hit because we run `Radamsa` as a binary on disk each time we need a new input. That doesn't mean it doesn't work, it definitely completes its task.

![[fuzz_14.png]]

In this case it found the crashing input after `98k+ executions` but it varies since it's random. I would say we are trending unlucky on this particular run. Notice in the `case list` how it slowly builds the string and that each correct letter increases the coverage count.

#### How about being smarter?

Clearly, the more you understand about your target the better you can tune your inputs. If we adjust our case generation to test each character sequentially we should find the crashing input much faster.

```python
def mutate_bytes(input_bytes):
    # at position, create 0xff variations and add cases to the corpus
    for i in range(0xff):
        case_obj = {
            'case': input_bytes[:position] + bytes([i]) + input_bytes[position + 1:],
            'has_execute': False,
            'coverage': 0
        }
        cases.append(case_obj)
```

Every time there is new coverage we can increment the position and add new cases to the queue.

![[fuzz_15.png]]

The results are a lot better obviously, we find the crashing input in less than `2k` execs.

# Future work

Some standardization around the `controller` and the `client` would be useful to have better generic tooling. Also, the way we used `Radamsa` is really costing us a lot of performance. It would be greatly beneficial to implement a module to load `Radamsa` as a (web) service that the controller can talk to.

Realistically though, we most likely want to use a different setup for fuzzing. If you look at the `Quarkslab` [post](https://blog.quarkslab.com/android-greybox-fuzzing-with-afl-frida-mode.html) you will see that there is excellent performance.

```
Target                                          Execution speed
------                                          ---------------
Standard native function                        ~10k/sec
Weakly linked JNI function                      ~9k/sec
Strongly linked JNI function                    ~5k/sec
Strongly linked JNI function (with Java hook)   ~3.5k/sec
```

Keep in mind though that the setup is very different from what we are doing. Alternatively, we can use something like [fpicker-aflpp-android](https://github.com/marcinguy/fpicker-aflpp-android).