Since Ruxcon, there has been some interest in my automated unpacker, which is based on an x86/win32 emulator I wrote. I hadn’t worked on it for some months, but I spent a day on it last week and have continued its development over the past couple of days.
I had a look at a few packers with the hope of getting them to work with my emulator. I noted a few things of interest so I thought I’d write about them now before forgetting. It makes me wish I’d made notes of the anti-debugging tricks and features I had to implement to overcome a particular packer.
pelock is a packer which I still can’t emulate, but it did raise a few points of interest. It uses an undocumented x86 encoding: opcode c1 /6, which really behaves as a shift left (shl). Binutils (gdb, objdump etc) couldn’t disassemble this, and neither could libdasm, which I’m using for my emulator. ollydbg could disassemble it fine. I added the instruction to libdasm’s opcode table and it works fine now.
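To make the encoding concrete, here is a minimal Python sketch of how the reg field of the ModRM byte selects the operation for opcode c1. The table and the `decode_group2` helper are illustrative assumptions, not libdasm's actual opcode table; the only point is that slot 6 decodes to the same operation as slot 4.

```python
# ModRM.reg (bits 5..3) selects the operation for the x86 "group 2"
# shift/rotate opcodes such as c1. Slot 6 is left out of many opcode
# tables but behaves like slot 4 (shl). Illustrative table only.
GROUP2_MNEMONICS = ["rol", "ror", "rcl", "rcr", "shl", "shr", "shl", "sar"]


def decode_group2(code: bytes) -> str:
    """Return the mnemonic for a 'c1 /r imm8' instruction (hypothetical helper)."""
    if len(code) < 2 or code[0] != 0xC1:
        raise ValueError("not a c1 group-2 instruction")
    reg = (code[1] >> 3) & 0b111  # the /digit in 'c1 /6'
    return GROUP2_MNEMONICS[reg]
```

For example, the bytes `c1 f0 04` have a ModRM byte of 0xf0, whose reg field is 6, so a decoder without that table slot would give up on them.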
pelock also checks the first byte of imported functions whose addresses are known via the IAT, looking for 0xcc, which is int3, the software breakpoint opcode. Up to this point my emulator didn’t actually load the libraries into memory; only the executable’s text, data and other sections were loaded in the guest. So I decided the easiest approach would be to emulate the first few bytes of every import and just substitute a non-0xcc value at that location. This worked, or rather, the packer ran a little longer before running into problems.
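A rough sketch of both sides of this check, in Python. Guest memory is modelled as a plain dict from address to byte value, and the function names and the example address are assumptions made up for illustration rather than anything from the emulator itself.

```python
INT3 = 0xCC  # one-byte software breakpoint opcode


def has_breakpoint(memory: dict, iat: dict) -> bool:
    """Packer-style check: does any imported function start with int3?"""
    return any(memory[addr] == INT3 for addr in iat.values())


def stub_imports(memory: dict, iat: dict, filler: int = 0x8B) -> None:
    """Emulator-side workaround: back the first byte of every import
    with a value that is not 0xcc. The filler is arbitrary; 0x8b is
    simply the first byte of 'mov edi, edi'."""
    if filler == INT3:
        raise ValueError("filler must not be the int3 opcode")
    for addr in iat.values():
        memory[addr] = filler
```

With a hypothetical `iat = {"GetProcAddress": 0x7C80AE40}`, calling `stub_imports(memory, iat)` makes the read succeed and `has_breakpoint(memory, iat)` return False, which is all the packer's check looks at.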
At this point I decided that the best thing to do would be to load all the imported libraries directly into the guest. I won’t ever actually execute any of that code, as I intercept control whenever it transfers to a library call (or at least to what’s in the import table).
I actually have two modes to run the emulator in. In one mode, the emulator runs alone, and in the other, the packer is run in parallel to the emulator, and the emulator can copy from memory or registers what it likes. I was able to hack together library loading for when the packer is being run in parallel. I have not finished the code yet to run the emulator standalone, but I’ll continue that in the next day or so.
Funnily enough, even with the libraries mapped in, the same code in the packer was failing to be emulated. At this point I thought, wrongly, that it was a PE relocation that hadn’t been applied, so I wrote fixup code to apply the relocations. PE relocations are so much easier to handle than ELF ones, and it was fairly straightforward to implement. In practice there is only one type of relocation to apply on 32-bit x86, and it’s done by adjusting the 32-bit value stored at an RVA, determined by a page address and an offset, by the difference between the address the binary was loaded at and its preferred ImageBase. ELF has many types of relocations, and everything is relative to sections in the binary, which makes it much harder.
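The fixup loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the emulator's code: `image` is the mapped binary indexed by RVA, `reloc` is the raw contents of the base relocation table, and `delta` is the actual load address minus the preferred ImageBase. Each block starts with a page RVA and a block size, followed by 16-bit entries holding a 4-bit type and a 12-bit offset.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0  # padding entry, nothing to patch
IMAGE_REL_BASED_HIGHLOW = 3   # patch a full 32-bit value


def apply_relocations(image: bytearray, reloc: bytes, delta: int) -> None:
    """Apply base relocations in place (hypothetical helper).

    delta = actual load address - preferred ImageBase.
    """
    pos = 0
    while pos + 8 <= len(reloc):
        page_rva, block_size = struct.unpack_from("<II", reloc, pos)
        if block_size < 8:
            break  # malformed or terminating block
        end = min(pos + block_size, len(reloc))
        for entry_pos in range(pos + 8, end - 1, 2):
            (entry,) = struct.unpack_from("<H", reloc, entry_pos)
            rel_type, offset = entry >> 12, entry & 0x0FFF
            if rel_type == IMAGE_REL_BASED_HIGHLOW:
                rva = page_rva + offset
                (value,) = struct.unpack_from("<I", image, rva)
                struct.pack_into("<I", image, rva, (value + delta) & 0xFFFFFFFF)
            # IMAGE_REL_BASED_ABSOLUTE and anything else is skipped here
        pos += block_size
```

If the image lands at its preferred base, `delta` is zero and the loop changes nothing, which is why a missing relocation pass only shows up when the binary is mapped somewhere else.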
With that done, I still can’t emulate pelock completely, so I’ll have to continue work on it this week. Along the way, I ran into some other problems while emulating another packer.
I was seeing a mismatch between the emulator’s view and the program tracer’s view of the packer: after returning from an exception handler, the instruction pointer was in the wrong place.
I was sure I was calculating the new eip after the return correctly. I was taking it from the CONTEXT structure, which is placed on the stack during an exception (it’s really an argument to the exception handler). I am not very good with ollydbg, but using the stollystruct plugin I verified that eip was what my emulator believed it was.
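For reference, reading that value out of guest memory might look like the Python sketch below. It assumes a 32-bit SEH handler has just been entered, so the return address is at [esp] and the handler's four arguments follow, with the pointer to CONTEXT as the third. Eip sits at offset 0xb8 in the 32-bit x86 CONTEXT structure. The flat `memory` buffer and the `resume_eip` name are assumptions for illustration.

```python
import struct

CONTEXT_EIP_OFFSET = 0xB8  # offset of Eip in the 32-bit x86 CONTEXT


def resume_eip(memory: bytes, esp: int) -> int:
    """Return CONTEXT.Eip as seen on entry to an SEH handler.

    Stack on entry: [esp] return address, [esp+4] exception record,
    [esp+8] establisher frame, [esp+0xc] pointer to CONTEXT.
    """
    (context_ptr,) = struct.unpack_from("<I", memory, esp + 0x0C)
    (eip,) = struct.unpack_from("<I", memory, context_ptr + CONTEXT_EIP_OFFSET)
    return eip
```

Because a handler is free to rewrite that field before returning, the value has to be read when the handler returns rather than when it is entered.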
To cut a long story short: when single stepping (in Olly, for instance) through an exception handler, the first instruction after returning from the handler is skipped (Olly gets this wrong too). In fact, that missing instruction was a runtime-patched jump to the address that was showing up in my tracer.
My tracer works in XP (or at least it did last time I ran it), and what I was doing was setting the debug registers to enable a hardware breakpoint on the eip address from CONTEXT. In XP it seems you can do this and it works, but in Vista it doesn’t. I have no idea why I was setting a hardware breakpoint from inside the CONTEXT structure.
Using a software breakpoint at CONTEXT.eip corrected the behaviour in my tracer, and my emulator, which had no problem with this at all, was verified by the new tracing algorithm.
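The software breakpoint itself is just a saved byte and a 0xcc written in its place. A minimal Python sketch of that bookkeeping follows, with traced memory modelled as a bytearray; the `SoftBreakpoint` class is hypothetical and leaves out the real debugger plumbing, such as reading and writing the target process.

```python
INT3 = 0xCC  # one-byte software breakpoint opcode


class SoftBreakpoint:
    """Patch an int3 over one byte and remember what was there."""

    def __init__(self, memory: bytearray, address: int):
        self.memory = memory
        self.address = address
        self.saved = None  # original byte while the breakpoint is armed

    def set(self) -> None:
        if self.saved is None:
            self.saved = self.memory[self.address]
            self.memory[self.address] = INT3

    def clear(self) -> None:
        # Restore the original byte. A real tracer would also rewind
        # eip by one here, since the trap reports the address after
        # the int3.
        if self.saved is not None:
            self.memory[self.address] = self.saved
            self.saved = None
```

In the tracer's case the address would be the one read from CONTEXT.eip, so the breakpoint fires on the very first instruction executed after the handler returns.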