1) QEMU translation blocks
I wrote the basic code for function interception in QEMU; it’s probably overengineered, with too many layers of indirection (in other words, it might be slow). One thing that caught me out for a short while was an optimisation QEMU performs on translation blocks.
QEMU translates native instructions into ‘micro operations’ and builds them up into ‘translation blocks’. When execution occurs, one of the first things that happens is a lookup for a translation block that has already been created. Translation blocks are small units of code, generally finishing at a basic block boundary. QEMU has a nice optimisation where, if a call or jump is performed, the current translation block simply continues at the target address of the call/jmp. This makes the translation blocks much bigger and results in faster execution: there is no setup time to call or return from a translation block, and more optimisation can occur inside the translation block too.
Initially I was checking whether the program counter was at an intercepted function at the start of each translation block. But this is incorrect. Given the optimisation I just described, a translation block might cover more than one native basic block, so the entry point of an intercepted function can end up in the middle of a translation block, where my check never sees it. I disabled this feature in QEMU for kernel code so my interception would work. There are certainly better ways to perform interception than what I did. I will consider optimisation later, as for now I’m just happy to have it working.
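To make the failure concrete, here is a minimal sketch of the two checks. DemoBlock is a hypothetical, simplified stand-in that just lists the guest addresses a block was translated from; it is not QEMU's real translation block structure, and the function names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a translation block: the guest addresses of
 * the instructions it was built from. NOT QEMU's real structure. */
typedef struct {
    const uint32_t *insn_pcs; /* insn_pcs[0] is the block's entry point */
    size_t n_insns;
} DemoBlock;

/* True if pc is one of the intercepted function addresses. */
static bool is_hooked(uint32_t pc, const uint32_t *hooks, size_t n_hooks)
{
    for (size_t i = 0; i < n_hooks; i++) {
        if (hooks[i] == pc)
            return true;
    }
    return false;
}

/* The check I started with: only look at the first address of the block. */
static bool hook_at_block_start(const DemoBlock *tb,
                                const uint32_t *hooks, size_t n_hooks)
{
    return tb->n_insns > 0 && is_hooked(tb->insn_pcs[0], hooks, n_hooks);
}

/* What is actually needed if a block can run on into a call target:
 * look at every instruction address the block covers. */
static bool hook_anywhere_in_block(const DemoBlock *tb,
                                   const uint32_t *hooks, size_t n_hooks)
{
    for (size_t i = 0; i < tb->n_insns; i++) {
        if (is_hooked(tb->insn_pcs[i], hooks, n_hooks))
            return true;
    }
    return false;
}
```

If a block starts at 0x1000, follows a call into a hooked function at 0x2000 and keeps translating there, the first check misses the hook and the second one finds it.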
2) Linux kernel calling convention
I then tried intercepting __kmalloc and kfree from the Linux kernel. It wasn’t working too well, as the arguments I was intercepting seemed incorrect. It wasn’t until I disassembled vmlinux that I realized the calling convention was passing arguments in registers. I don’t know when this was introduced into Linux (I guess it’s been some years since I’ve done any kernel hacking). I don’t even know the gcc option to do it!
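For reference, on 32-bit x86 GCC's regparm(3) convention (the -mregparm=3 option) passes the first three integer arguments in EAX, EDX and ECX, in that order, and this looks like what the kernel was doing. The sketch below assumes a 32-bit x86 guest built that way; DemoRegs and regparm3_arg are hypothetical names for illustration, not QEMU structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the guest registers at function entry.
 * Assumes a 32-bit x86 guest; this is not QEMU's CPU state structure. */
typedef struct {
    uint32_t eax;
    uint32_t edx;
    uint32_t ecx;
} DemoRegs;

/* Fetch integer argument `index` under the regparm(3) convention:
 * argument 0 is in EAX, 1 in EDX, 2 in ECX. Later arguments live on the
 * stack, which this sketch does not handle, so it returns false. */
static bool regparm3_arg(const DemoRegs *regs, unsigned index, uint32_t *out)
{
    switch (index) {
    case 0: *out = regs->eax; return true;
    case 1: *out = regs->edx; return true;
    case 2: *out = regs->ecx; return true;
    default: return false;
    }
}
```

Under that assumption, a hook on __kmalloc would read the size from argument 0 and the flags from argument 1, and a hook on kfree would read the pointer from argument 0.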
3) Heap management in Linux
After I fixed the calling convention, and intercepted __kmalloc and kfree correctly, I built my own representation of the heap. It didn’t work. I was getting reports that kfree() was being called on pointers that hadn’t been allocated by __kmalloc.
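The kind of bookkeeping involved looks roughly like this. It is a minimal sketch, not my actual code: a fixed-size table of live allocations, with made-up names (shadow_alloc, shadow_free) and 32-bit guest addresses assumed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_MAX 1024 /* arbitrary capacity for the sketch */

/* One tracked allocation: guest address, requested size, in-use flag. */
typedef struct {
    uint32_t addr;
    uint32_t size;
    bool live;
} ShadowEntry;

static ShadowEntry shadow[SHADOW_MAX];

/* Record an allocation seen at the return of an intercepted allocator.
 * Returns false if the table is full. */
static bool shadow_alloc(uint32_t addr, uint32_t size)
{
    for (size_t i = 0; i < SHADOW_MAX; i++) {
        if (!shadow[i].live) {
            shadow[i].addr = addr;
            shadow[i].size = size;
            shadow[i].live = true;
            return true;
        }
    }
    return false;
}

/* Record a free. Returns false when the pointer was never seen being
 * allocated (or was already freed), which is the report I was getting. */
static bool shadow_free(uint32_t addr)
{
    for (size_t i = 0; i < SHADOW_MAX; i++) {
        if (shadow[i].live && shadow[i].addr == addr) {
            shadow[i].live = false;
            return true;
        }
    }
    return false;
}
```

A false return from shadow_free means either a real bug in the guest or, as in my case, an allocation path that was never intercepted.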
The first thing is that there are a number of __kmalloc functions: __kmalloc_node_track_caller, __kmalloc_track_caller and a couple of others, depending on kernel configuration. Intercepting all these calls helped a little, but not much.
After looking at the source code, I found there is actually a layer above __kmalloc. It’s dependent on which heap allocator is being used: slab, slob, or slub. But kmalloc() is actually an inline function that can potentially call kmem_cache_alloc or __get_free_pages. I tried intercepting kmem_cache_alloc and this resolved most of the problems, but I left out __get_free_pages as I would have to contend with __free_pages also, which leads to alloc_pages as well.
That might seem like a good approach, intercepting everything, but it’s not. The problem is that precision is lost. kmalloc works with buffers of really any size. By the time __get_free_pages or kmem_cache_alloc is called, the buffers have all been rounded up to cache sizes or page sizes. If I intercepted kmem_cache_alloc and friends, I could fail to identify buffer overflows because of that lost precision. They wouldn’t be exploitable overflows in all likelihood, but they would be overflows nonetheless. These types of overflows are really the ones my tool would be best at detecting.
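A small sketch of the lost precision. It assumes power-of-two size classes starting at 32 bytes purely for illustration; the real cache sizes depend on the allocator and kernel configuration and are not all powers of two. demo_slot_size and write_fits are hypothetical helpers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed size classes: powers of two starting at 32 bytes. This is an
 * illustration only, not the kernel's actual cache size table. */
static size_t demo_slot_size(size_t requested)
{
    size_t slot = 32;
    while (slot < requested)
        slot <<= 1;
    return slot;
}

/* True if writing `len` bytes from the start of a buffer stays inside a
 * buffer the tool believes is `tracked_size` bytes long. */
static bool write_fits(size_t tracked_size, size_t len)
{
    return len <= tracked_size;
}
```

With a kmalloc request of 100 bytes, a 110-byte write is an overflow if the tool tracks the requested size (write_fits(100, 110) is false), but looks fine if the tool only ever saw the rounded-up 128-byte slot (write_fits(demo_slot_size(100), 110) is true).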
My solution is to make kmalloc() non-inline and export it to the modules. kmalloc_node is also now exported and non-inline. I’m building a new kernel as I type, but it’s probably not going to be finished until tomorrow, so no more coding for the day.
4) System.map doesn’t match /proc/kallsyms
This was somewhat annoying. In my default kernel build, and also in Fedora 8, the System.map doesn’t match /proc/kallsyms. The bug is discussed at https://bugzilla.redhat.com/show_bug.cgi?id=309751.
I changed the kernel config with the fix described and am hoping that it will be resolved.
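To check whether the fix worked, something like the following could compare entries from the two sources. Both System.map and /proc/kallsyms use lines of the form "address type name" (kallsyms may append a module name in brackets, which this sketch ignores). DemoSymbol, parse_symbol_line and same_symbol_address are hypothetical names, and reading the files line by line is left out.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed symbol table line: "c0123456 T __kmalloc". */
typedef struct {
    unsigned long long addr;
    char type;
    char name[128];
} DemoSymbol;

/* Parse "hex-address type name"; returns false on a malformed line. */
static bool parse_symbol_line(const char *line, DemoSymbol *out)
{
    return sscanf(line, "%llx %c %127s",
                  &out->addr, &out->type, out->name) == 3;
}

/* True only if both lines parse, name the same symbol, and agree on
 * its address. */
static bool same_symbol_address(const char *map_line,
                                const char *kallsyms_line)
{
    DemoSymbol a, b;
    if (!parse_symbol_line(map_line, &a) ||
        !parse_symbol_line(kallsyms_line, &b))
        return false;
    return strcmp(a.name, b.name) == 0 && a.addr == b.addr;
}
```

Running this over the lines for the intercepted functions (__kmalloc, kfree and so on) would be enough to tell whether the addresses I take from System.map can be trusted.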