I have been working on emulating PESpin. For those who don’t know, PESpin is a public packer. I came across its ‘stolen bytes’ protection, which turns out to be very hard for my emulator to handle.
Remember that my emulator works by emulating userland x86 instructions and the win32 api. That is, it never actually executes the real win32 code contained in dlls like kernel32.dll. It maintains a list of addresses representing the exported functions of each dll, and if the program counter changes to point to one of those addresses, the emulator runs its own implementation of that function. Loading the actual dlls into the memory image is more or less optional. I do it for more faithful emulation, but for the most part that memory is only for show. It never gets executed.
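To make the address-based interception concrete, here is a minimal sketch in Python. It is not my emulator's actual code: the class, the method names, and the export address are all made up for illustration. It only shows the idea that a control transfer to a registered export address runs an emulated handler, while any other address inside the dll is not recognised.

```python
from typing import Callable, Dict, List


class Emulator:
    """Toy model of API interception by address (hypothetical design)."""

    def __init__(self) -> None:
        self.pc = 0
        self.calls: List[str] = []  # names of emulated API calls, in order
        self.handlers: Dict[int, Callable[["Emulator"], None]] = {}

    def register_export(self, address: int, name: str) -> None:
        # A real handler would read arguments off the emulated stack and
        # fake the API's effects; this one only records the call.
        def handler(emu: "Emulator") -> None:
            emu.calls.append(name)

        self.handlers[address] = handler

    def transfer_to(self, address: int) -> bool:
        """Move the program counter; return True if an API was emulated."""
        self.pc = address
        handler = self.handlers.get(address)
        if handler is None:
            return False
        handler(self)
        return True


emu = Emulator()
emu.register_export(0x7C801D7B, "LoadLibraryA")  # address is invented
print(emu.transfer_to(0x7C801D7B))       # exact export address: intercepted
print(emu.transfer_to(0x7C801D7B + 10))  # 10 bytes into the function: missed
```

The second call is exactly the case that stolen bytes creates: execution enters the function body without ever touching the export address.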
Most AV emulators work in that fashion from what I can gather. Another approach is to use a whole system emulator like bochs or qemu, and let the emulated operating system execute the win32 functions itself. In this case, the emulator doesn’t need separate code for each api function. Whole system emulation is more reliable than api emulation, but it isn’t really used in commercial AV.
So back to PESpin.. Instead of calling a win32 function directly, it copies the first 10 or so bytes of the function onto the heap. At the end of those bytes on the heap, it places a jump back into the original function, skipping the bytes that were copied. Calls to that win32 function are then redirected to the copy on the heap. In some versions of stolen bytes, the 10 or so bytes of code in the original function are destroyed. Also, in some versions, the copied code is obfuscated with junk instructions or instruction reordering.
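The copy-and-jump-back step can be sketched like this. It is a simplified model, not PESpin's code: memory is a flat bytearray, the addresses are invented, and the copy is a fixed 10 bytes, whereas a real packer has to cut on an instruction boundary. The jump uses the standard x86 `E9 rel32` encoding, where the displacement is relative to the end of the 5-byte jump instruction.

```python
import struct

STOLEN = 10  # assumed length of the stolen prologue, in bytes


def steal_bytes(memory: bytearray, func_addr: int, heap_addr: int,
                n: int = STOLEN) -> int:
    """Copy n bytes of a function to heap_addr and append a jump back."""
    # 1. Copy the first n bytes of the function onto the "heap".
    memory[heap_addr:heap_addr + n] = memory[func_addr:func_addr + n]
    # 2. Append `jmp rel32` (opcode E9) targeting func_addr + n.
    jmp_at = heap_addr + n
    rel = (func_addr + n) - (jmp_at + 5)
    memory[jmp_at:jmp_at + 5] = b"\xE9" + struct.pack("<i", rel)
    # Callers are redirected to this address instead of func_addr.
    return heap_addr


def jump_target(memory: bytearray, jmp_at: int) -> int:
    """Decode the destination of an E9 rel32 jump located at jmp_at."""
    (rel,) = struct.unpack_from("<i", memory, jmp_at + 1)
    return jmp_at + 5 + rel


mem = bytearray(0x3000)
mem[0x1000:0x1010] = bytes(range(0x10, 0x20))  # stand-in for function code
stub = steal_bytes(mem, 0x1000, 0x2000)
print(hex(stub), hex(jump_target(mem, stub + STOLEN)))
```

Note that the program counter goes from the heap stub straight to the function address plus 10, so the export address itself is never executed.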
What this achieves for PESpin is that it’s hard to trace calls to win32. Intercepting execution of the original win32 function doesn’t work, because the function being called is the copied version on the heap.
For an AV style emulator, this spells disaster. My win32 emulation is entirely dependent on intercepting calls to the specific address of each exported function. The short story is that the stolen win32 function never gets emulated, and the analysis aborts. I can detect easily enough that something is wrong: the jump back lands at the original win32 function plus 10 or so bytes, and I know that part of memory belongs to a loaded dll, which I have marked as a special case of being non executable.
Even whole system emulator based unpackers like Pandora’s Bochs have problems with PE reconstruction of the unpacked binary when stolen bytes are being used, though the actual emulation itself is handled correctly.
So how do you solve the problem of stolen bytes in an AV style emulator? The short answer is that I don’t have a good solution yet, and I’m not sure that I will be able to find one either. (edit: while writing this post I think I have come up with some reasonable solutions)
One thing that could be done is to replace the first few bytes of each win32 api function with a byte sequence that the emulator can detect. This could be a software breakpoint (not an int3, as this is normally checked for), or a transfer instruction to memory which is designated as representing that function. If using a software breakpoint, we need a special way of marking it so that we can associate the breakpoint with the specific function. Maybe just the breakpoint followed by the address of the original function.
If we implemented this scheme, then stealing the first 10 or so bytes of a win32 function would also copy our marker, and when the copied function is called, we can easily detect which function it is. The problem with this approach, and why I’m reluctant to implement it, is that it seems fairly easy to fingerprint the emulator, for example if the win32 functions being emulated all start with the same basic first few bytes.
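Here is a rough sketch of the marker idea, under several assumptions of mine: memory is a flat bytearray, the marker is the two-byte UD2 opcode (0F 0B) picked purely for illustration, and it is followed by the 4-byte little-endian address of the original function. None of these choices are settled; they just show how a stolen copy carries its own identification with it.

```python
import struct
from typing import Optional

MARKER = b"\x0F\x0B"           # UD2; an arbitrary choice for this sketch
MARKER_LEN = len(MARKER) + 4   # marker plus 32-bit original address


def plant_marker(memory: bytearray, func_addr: int) -> None:
    """Overwrite the start of an API function with marker + its address."""
    memory[func_addr:func_addr + MARKER_LEN] = (
        MARKER + struct.pack("<I", func_addr)
    )


def identify(memory: bytearray, pc: int) -> Optional[int]:
    """If pc points at a marker, return the original function address."""
    if pc + MARKER_LEN > len(memory):
        return None
    if bytes(memory[pc:pc + len(MARKER)]) != MARKER:
        return None
    (func_addr,) = struct.unpack_from("<I", memory, pc + len(MARKER))
    return func_addr


mem = bytearray(0x3000)
plant_marker(mem, 0x1000)
# Simulate the packer stealing the first 10 bytes onto the heap.
mem[0x2000:0x200A] = mem[0x1000:0x100A]
print(identify(mem, 0x2000) == 0x1000)  # the copy still names its origin
```

The fingerprinting worry is visible here too: every marked function starts with the same two bytes, which a packer could check for.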
Maybe we could make this marker polymorphic. Another approach might be to borrow a technique used in viruses known as surface tracing: disassemble the function until the first control transfer instruction is reached, and replace it with a transfer of our own that can be used to identify the function. It should be possible to track function arguments automatically with some work, even when esp changes.
That might work ok. The biggest problem with implementing it would be integrating it with my program tracer.
Hmm.. another idea which might be better is to catch the jump back into the original win32 function. I can do this reasonably well already. I can narrow it down to a particular function without real worries (the nearest symbol for instance).
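The nearest symbol lookup is simple enough to sketch. This assumes the exports of a dll are available as a list of (address, name) pairs sorted by address; the names and addresses below are invented. It uses a binary search to find the last export at or below the address the jump landed on.

```python
import bisect
from typing import List, Optional, Tuple


def nearest_symbol(exports: List[Tuple[int, str]],
                   address: int) -> Optional[Tuple[str, int]]:
    """Return (name, offset) of the closest export at or below address.

    `exports` must be sorted by address. Returns None if the address
    lies below the first export.
    """
    addresses = [addr for addr, _ in exports]
    i = bisect.bisect_right(addresses, address) - 1
    if i < 0:
        return None
    base, name = exports[i]
    return name, address - base


exports = [(0x1000, "CreateFileA"), (0x1200, "ReadFile"), (0x1500, "Sleep")]
print(nearest_symbol(exports, 0x120A))
```

A non-zero offset tells me both which function was stolen and how many bytes of it were skipped.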
Now I imagine that stolen bytes is implemented by having a disassembler disassemble the original win32 function up until the first control transfer instruction, or maybe just up until the first 10 instructions. It might all be a disaster on my part if it doesn’t disassemble, and copies the function using the reloc information instead.
So I know which original function was stolen. I just need to get the arguments in working order, and I can do a static analysis of the skipped bytes to adjust esp and recover the original arguments. This analysis would be a very simplified (not following branches) version of IDA’s stack tracing.
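A minimal sketch of that esp adjustment follows. To keep it self-contained it works on pre-decoded instructions, given as (mnemonic, operand) tuples in a format I made up, rather than raw bytes. It only understands push, pop, and add/sub on esp, assumes 32-bit code, and stops at the first control transfer; anything like `mov esp, ebp` or `leave` would need extra handling.

```python
from typing import Callable, List, Optional, Tuple, Union

Insn = Tuple[str, Optional[Union[int, str]]]  # hypothetical decoded form
CONTROL_TRANSFER = {"jmp", "call", "ret"}


def esp_delta(instructions: List[Insn]) -> int:
    """Net change to esp made by a straight-line run of instructions."""
    delta = 0
    for mnemonic, operand in instructions:
        if mnemonic in CONTROL_TRANSFER:
            break                      # branches are not followed
        if mnemonic == "push":
            delta -= 4
        elif mnemonic == "pop":
            delta += 4
        elif mnemonic == "sub_esp":
            delta -= int(operand)
        elif mnemonic == "add_esp":
            delta += int(operand)
        # any other instruction is assumed to leave esp untouched
    return delta


def recover_args(read_dword: Callable[[int], int], esp_now: int,
                 delta: int, count: int) -> List[int]:
    """Read `count` stack arguments as they were at function entry."""
    entry_esp = esp_now - delta        # undo the stolen prologue's effect
    # At entry, [esp] is the return address and the arguments follow it.
    return [read_dword(entry_esp + 4 + 4 * i) for i in range(count)]


stolen = [("push", "ebp"), ("mov", "ebp, esp"), ("sub_esp", 0x10),
          ("jmp", "back")]
delta = esp_delta(stolen)
stack = {0x0012FF00: 0x00401000, 0x0012FF04: 0x11, 0x0012FF08: 0x22}
print(delta, recover_args(stack.__getitem__, 0x0012FF00 + delta, delta, 2))
```

In the emulator, the instruction list would come from disassembling the bytes between the export address and the point where the jump back lands.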
OK.. I think I will go ahead and implement that. The other idea I had was using dynamic taint analysis to track the stealing of each function, but I think the idea I just talked about will work better, as I didn’t have a great plan for the taint analysis.
A few more words on the case where it copies the entire function and uses the reloc information to relocate it. I am really stuck here (but maybe dynamic taint analysis is the way to go). The long approach is to implement the windows system calls, so that in theory the emulator could execute windows api functions directly.
Anyway.. this post turned into more of a ramble than anything else. I guess I will implement a stack tracer now..