Monthly Archives: December 2008

Dynamic Taint Analysis of packers during packing

I was trying to come up with ideas to automatically generate a static unpacker given a random packer executable.  I didn’t arrive at anything concrete but did have a couple ideas that could aid the analysis.

Take a packer (the packer), and a binary (the binary) to be packed.  The idea is to do dynamic taint analysis of the binary as the packer is run.  The analysis would need to follow file reads and writes.  What you would end up with is a packed executable in which the tainted bytes are the ‘data’ (what the unpacking code unpacks), and the rest of the packed executable is the code (the static code put in by the packer).
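To make the idea concrete, here is a toy model in Python. It is only a sketch of byte-level taint propagation, not a real instruction-level tracer: the function names, the XOR "packing" transform and the stub bytes are all made up for illustration.

```python
from typing import List, Optional, Tuple

# Each byte carries an optional taint label: the offset in the input
# binary that it was derived from (None means "not from the binary").
TaintedByte = Tuple[int, Optional[int]]

def read_input(data: bytes) -> List[TaintedByte]:
    """Model a file read: every byte of the binary is tainted with its offset."""
    return [(b, i) for i, b in enumerate(data)]

def untainted(data: bytes) -> List[TaintedByte]:
    """Bytes supplied by the packer itself (its unpacking stub) carry no taint."""
    return [(b, None) for b in data]

def xor_transform(buf: List[TaintedByte], key: int) -> List[TaintedByte]:
    """Hypothetical packer transform: the value changes, the taint label survives."""
    return [((b ^ key) & 0xFF, t) for b, t in buf]

def tainted_ranges(out: List[TaintedByte]) -> List[Tuple[int, int]]:
    """Return half-open [start, end) ranges of the output file that are tainted."""
    ranges = []
    start = None
    for i, (_, taint) in enumerate(out):
        if taint is not None and start is None:
            start = i
        elif taint is None and start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, len(out)))
    return ranges

# Model a file write: stub code followed by the transformed binary.
stub = untainted(b"\x60\xe8\x00\x00")  # made-up stub bytes
payload = xor_transform(read_input(b"MZ\x90\x00hello"), 0x5A)
packed = stub + payload
print(tainted_ranges(packed))
```

The tainted ranges are the 'data' and everything else is the packer's own code. A real implementation would have to propagate taint through every traced instruction rather than through one helper function.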

This would be useful when doing manual analysis I think, and I guess it is one step towards making an automatic static unpacker.  That is, it can automatically identify what is data and what is code, although that data/code separation is specific to that single packed binary.

How about doing dynamic taint analysis on PE header members of the binary?  Say, the entry point member in the PE headers.  By following where that ends up you could determine references to the original entry point in the resulting packed executable.  By doing a bindiff and structurally analysing the ‘code’ (from above), you could find equivalent locations in other executables that have been packed by the same packer.
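The taint source for the entry point would be the four bytes of AddressOfEntryPoint in the optional header. A minimal sketch of locating them in a file, assuming the standard PE layout (e_lfanew at offset 0x3C, a 4 byte signature, a 20 byte file header, then the entry point 16 bytes into the optional header). The function name and the synthetic test image are invented for the example.

```python
import struct

def entry_point_field(image: bytes):
    """Return (file_offset, value) of AddressOfEntryPoint in a PE file."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # signature (4) + IMAGE_FILE_HEADER (20) + offset within optional header (16)
    offset = e_lfanew + 4 + 20 + 16
    (value,) = struct.unpack_from("<I", image, offset)
    return offset, value

# Build a tiny synthetic header just to exercise the function.
demo = bytearray(0x200)
demo[0:2] = b"MZ"
struct.pack_into("<I", demo, 0x3C, 0x80)          # e_lfanew
demo[0x80:0x84] = b"PE\x00\x00"
struct.pack_into("<I", demo, 0x80 + 40, 0x1000)   # AddressOfEntryPoint
offset, value = entry_point_field(bytes(demo))
print(hex(offset), hex(value))
```

Tainting the bytes at [offset, offset + 4) when the packer reads the file would then show where the original entry point lands in the packed output.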

Maybe by tracking other members such as members indicating the size of sections, you could arrive at ways to automatically determine the length of the packed executable data.

I think this is something worth implementing, but it seems to be a fair bit of work from where I am now.. The easiest way to do this would be to modify something like Pandora’s Bochs (even though it uses Python, arggh), which can trace individual Windows processes (in our case, the packer that we are running), then perform analysis on each instruction traced to do the dynamic taint analysis.

I don’t see my emulator being as useful as something like Bochs, since my emulator would have extreme trouble emulating a regular standalone program in its entirety.

I hope when I start working again soon, I will have time to implement some of these things.


Some emulator fixes

First up, I got winupack to unpack successfully which required me to implement and fix some x86 emulated instructions.  I also implemented tracking/emulation of export forwarding, though I think there are still some bugs in it.

A few other things got fixed up, including writing a PEB without the BeingDebugged flag set into the process image during tracing.  This doesn’t seem to affect anything badly during tracing, and it evades the debugger checks.  pelock was doing that check.

pelock also uses the sidt instruction, presumably to check if it’s running inside a VM.  I implemented that instruction in the emulator along with sgdt and sldt.  This makes me think I should use QEMU and not VMware for malware testing.  pelock does some other anti-debugging checks including checking for the existence of some drivers using CreateFileA.

pelock fails early when emulating in standalone mode, but runs longer when being traced.  I checked the instruction trace, and it seems it uses the status flags after a few library calls (LoadLibraryA and MessageBoxA iirc).  In the traced version I copy the flags from the traced program back into the emulator, as I consider them undefined after a library call (I try to keep the register state the same in the emulator and the tracer so I can track when there are real differences and so easily detect bugs).  I’ll try to track down this problem a bit later, as I think the next thing I should do is implement file system emulation, and hopefully then be able to unpack telock and pespin.

x86 hardware debug emulation, VirtualProtect return value?

Merry xmas everyone.  After a long day of family festivities I had a spare few hours so I worked on my emulator.

I tried unpacking telock.  The special thing about telock is that it uses hardware execution breakpoints.

I had to modify my program tracer to not use any hardware breakpoints at all.  There is one slight caveat in that I use an int3 on NtContinue that exists while the exception handler is executing, which could be detected.  I might change the int3 to a clc to avoid simple checks, but for now nothing I am trying to emulate does this kind of check.

Now that I could trace telock, I went about implementing the hardware debug emulation.  It took a few hours but I was able to implement most of the behaviour.  I don’t set dr6 (the status) in my emulation so that’s something I should do in the future, but it’s not necessary for now.
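For reference, the core of an execution breakpoint check is small. This is a sketch in Python rather than the emulator's actual code, assuming the documented dr7 layout: local/global enable bits for breakpoint i at bits 2i and 2i+1, and its R/W field at bits 16+4i, where 00 means break on instruction execution. The function name is made up.

```python
def exec_breakpoint_hit(eip, dr, dr7):
    """Return the index (0-3) of an enabled execution breakpoint at eip, or None.

    dr  -- list of the four address registers dr0..dr3
    dr7 -- the debug control register
    """
    for i in range(4):
        enabled = (dr7 >> (2 * i)) & 0b11       # L_i or G_i set
        rw = (dr7 >> (16 + 4 * i)) & 0b11       # 00 = break on execution
        if enabled and rw == 0b00 and dr[i] == eip:
            return i
    return None

dr = [0x401000, 0, 0, 0]
print(exec_breakpoint_hit(0x401000, dr, dr7=0x1))  # L0 set, R/W0 = 00
```

On a hit the emulator would raise a single step exception before executing the instruction. A fuller version would also set bit i of dr6, which is the part still missing here.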

Which reminds me.  After implementing DLL loading the other day, I was able to unpack rlpack without further modification.  Today I finally finished the code to do the library loading completely within the emulator (without the program tracer run in parallel).

I fixed a number of other bugs in the emulator, including some misbehaving x86 instructions, and a number of win32 functions.

I came across what seems to be a problem in the MSDN documentation for VirtualProtect: in one instance it returned zero, which is meant to indicate failure, but by all other accounts, including GetLastError, the call was a success.  There is some discussion on the wine mailing list from 2001 saying that in win98 and earlier VirtualProtect returns the original protection flags of the pages that were set.  I couldn’t find a consistent explanation so I decided to simply follow the MSDN documentation, ignoring what actually happens in win32.  Hopefully nothing bad happens..

I have another thing I have to implement.. import/export forwarding.. During program tracing, a call to GetProcAddress returned RtlExitUserThread from ntdll.dll, when in fact ExitThread from kernel32.dll had been requested.  This apparently is a feature of the PE format, which I don’t implement.  That’s what is currently the problem with telock, though I know that it also uses win32 file functions (CreateFileA iirc), which I don’t have implemented yet.  That is one of the next things I’m going to implement: a virtual file system for the emulator.  I had done the code for Linux emulation and attempted to merge it with win32 emulation, but it’s kind of broken now.
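As far as I understand the PE format, an export is a forwarder when its function RVA points inside the export directory itself. In that case the RVA leads to a string like NTDLL.RtlExitUserThread instead of code. A rough sketch of resolving that in Python, where the MODULES table, the address in it and the helper names are all invented for illustration:

```python
def is_forwarder(func_rva, export_dir_rva, export_dir_size):
    """A function RVA that lands inside the export directory is a forwarder string."""
    return export_dir_rva <= func_rva < export_dir_rva + export_dir_size

def split_forwarder(text):
    """Split 'NTDLL.RtlExitUserThread' into ('ntdll.dll', 'RtlExitUserThread')."""
    dll, _, symbol = text.partition(".")
    return dll.lower() + ".dll", symbol

# Hypothetical, already-parsed export tables: a str value models a forwarder
# string, an int models an ordinary exported address (made-up value).
MODULES = {
    "kernel32.dll": {"ExitThread": "NTDLL.RtlExitUserThread"},
    "ntdll.dll": {"RtlExitUserThread": 0x7C901234},
}

def resolve(module, symbol, depth=0):
    """Follow forwarders until a real address is found, like GetProcAddress does."""
    if depth > 8:
        raise RuntimeError("forwarder chain too long")
    entry = MODULES[module][symbol]
    if isinstance(entry, str):
        fwd_module, fwd_symbol = split_forwarder(entry)
        return resolve(fwd_module, fwd_symbol, depth + 1)
    return module, entry

print(resolve("kernel32.dll", "ExitThread"))
```

This leaves out forwarding by ordinal (strings of the form DLL.#123) and loading the target DLL when it is not mapped yet.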

Oh.. I can also unpack expressor now.  And also packman – dunno if I mentioned that earlier.

Oh.. one last thing that needs to be implemented.  It’s hard to know when to stop the unpacking process.  Breaking on execution of memory locations that were previously written to is the basic algorithm, but sometimes there are multiple layers.  I am thinking of implementing an entropy check to guess whether memory is still packed, to decide if the unpacking process should be reset and continued.
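The entropy check itself is only a few lines. A sketch in Python using Shannon entropy over byte frequencies, where the 7.0 bits per byte threshold is just a guess at a starting point and would need tuning against real packers:

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())

def looks_packed(data: bytes, threshold: float = 7.0) -> bool:
    """Guess whether a memory region is still compressed or encrypted.

    The threshold is an assumption, not a measured value.
    """
    return shannon_entropy(data) >= threshold

print(shannon_entropy(bytes(range(256))), shannon_entropy(b"\x00" * 64))
```

When execution reaches written memory, the emulator could run this over the remaining written regions and keep going if they still look packed.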

Working again on my auto unpacker emulator

Since Ruxcon, there has been some interest in my automated unpacker which is based on an x86/win32 emulator I wrote.  I haven’t worked on it for some months, but spent a day last week working on it and over the past couple of days I’ve been continuing its development further.

I had a look at a few packers with the hope of getting them to work with my emulator.  I noted a few things of interest so I thought I’d write about them now before forgetting.  It makes me wish I’d made notes of the anti-debugging tricks and features I had to implement to overcome a particular packer.

pelock is a packer which I still can’t emulate, but it did raise a few points of interest.  It uses an undocumented x86 instruction encoding, opcode c1 /6, which is really a shift left (shl).  Binutils (gdb, objdump etc) couldn’t disassemble this, and neither could libdasm, which I’m using for my emulator.  ollydbg could disassemble it fine.  I added the instruction to libdasm’s opcode table and it works fine now.
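The /6 refers to the reg field of the ModRM byte, which selects the operation for opcode c1. A small sketch of that decode in Python (this is not libdasm's table format, and it only handles register operands):

```python
# Operation selected by the ModRM reg field for opcode C1 (r/m32, imm8).
# Slot 6 is the undocumented encoding, which behaves like shl.
GROUP2 = ["rol", "ror", "rcl", "rcr", "shl", "shr", "shl", "sar"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_c1(code: bytes) -> str:
    """Decode 'C1 modrm imm8' when the operand is a register (mod == 3)."""
    if len(code) < 3 or code[0] != 0xC1:
        raise ValueError("not a C1 instruction")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 3:
        raise NotImplementedError("memory operands are not handled in this sketch")
    return f"{GROUP2[reg]} {REG32[rm]}, {code[2]}"

print(decode_c1(bytes([0xC1, 0xF0, 0x04])))  # the /6 form
print(decode_c1(bytes([0xC1, 0xE0, 0x04])))  # the documented /4 form
```

Both byte sequences decode to the same shift, so a disassembler that leaves slot 6 empty gives up on code that the CPU runs happily.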

pelock also checks the 1st byte of imported function(s) whose addresses are known via the IAT.  It checks if it’s a 0xcc, which is an int3 or software breakpoint.  My emulator didn’t actually load the libraries into memory up until this point.  Only the executable text/data etc sections were loaded in the guest.  So I decided the easiest approach would be to emulate the first few bytes of every import, and just substitute a non 0xcc value at that location.  This worked.. or rather, the packer ran a little longer before running into problems.
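A sketch of that workaround in Python, modelling guest memory as a sparse dict of address to byte. The stub bytes (a typical mov edi, edi / push ebp prologue), the import addresses and the function names are assumptions for the example:

```python
INT3 = 0xCC

def plant_import_stubs(memory: dict, iat: dict, stub: bytes = b"\x8b\xff\x55"):
    """Give every imported address a few readable, non-0xCC bytes."""
    for address in iat.values():
        for i, byte in enumerate(stub):
            memory[address + i] = byte

def first_byte_is_int3(memory: dict, address: int) -> bool:
    """The check the packer performs on each imported function."""
    return memory[address] == INT3

memory = {}
iat = {"LoadLibraryA": 0x7C801D77, "MessageBoxA": 0x7E45058A}  # made-up addresses
plant_import_stubs(memory, iat)
print(any(first_byte_is_int3(memory, a) for a in iat.values()))
```

It only fools a check of the first byte or so. Anything that reads further into the library needs the real DLL contents mapped in.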

At this point I decided that the best thing to do would be to load directly into the guest all the libraries that are imported.  I won’t ever actually execute any of that code, as I intercept when control transfers to a library call (or at least what’s in the import table).

I actually have two modes to run the emulator in.  In one mode, the emulator runs alone, and in the other, the packer is run in parallel to the emulator, and the emulator can copy from memory or registers what it likes.   I was able to hack together library loading for when the packer is being run in parallel.  I have not finished the code yet to run the emulator standalone, but I’ll continue that in the next day or so.

Funnily enough, even with the libraries mapped in, the same code in the packer was failing to be emulated.  At this point I wrongly thought it was a PE relocation that hadn’t been applied.  So I wrote fixup code to apply the relocations.  PE relocations are so much easier to handle than ELF and it was fairly straightforward to implement.  There is only 1 type of relocation to apply: each entry gives an RVA (a page address plus an offset), and the address stored at that location is adjusted for where the image is actually loaded relative to its ImageBase.  ELF has many types of relocations, and everything is relative to sections in the binary, which makes it much harder.
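A sketch of the fixup loop in Python, assuming the usual base relocation layout: blocks made of a page RVA, a block size and a list of 16 bit entries, each with a 4 bit type and a 12 bit offset. Only IMAGE_REL_BASED_HIGHLOW (type 3) is applied and type 0 is padding. The image is modelled as a flat buffer indexed by RVA, and delta is the actual load address minus the preferred ImageBase.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0   # padding entry, nothing to do
IMAGE_REL_BASED_HIGHLOW = 3    # add delta to the 32 bit value at the target

def apply_relocations(image: bytearray, reloc: bytes, delta: int) -> int:
    """Apply base relocations in place and return how many fixups were made."""
    pos = 0
    applied = 0
    while pos + 8 <= len(reloc):
        page_rva, block_size = struct.unpack_from("<II", reloc, pos)
        if block_size < 8:
            break  # malformed or terminating block
        entries = reloc[pos + 8:pos + block_size]
        for (entry,) in struct.iter_unpack("<H", entries):
            kind, offset = entry >> 12, entry & 0x0FFF
            if kind == IMAGE_REL_BASED_HIGHLOW:
                target = page_rva + offset
                (value,) = struct.unpack_from("<I", image, target)
                struct.pack_into("<I", image, target, (value + delta) & 0xFFFFFFFF)
                applied += 1
            elif kind != IMAGE_REL_BASED_ABSOLUTE:
                raise NotImplementedError(f"relocation type {kind}")
        pos += block_size
    return applied

# One block for page 0x1000 with a HIGHLOW entry at offset 0x10 plus padding.
image = bytearray(0x2000)
struct.pack_into("<I", image, 0x1010, 0x00401234)
reloc = struct.pack("<IIHH", 0x1000, 12, 0x3010, 0x0000)
count = apply_relocations(image, reloc, delta=0x10000)
print(count, hex(struct.unpack_from("<I", image, 0x1010)[0]))
```

When the image is loaded at its preferred ImageBase, delta is zero and the whole pass is a no-op.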

With that done, I still can’t emulate pelock completely, so I’ll have to continue work on it this week.   That brought on some other problems in emulating another packer.

I was seeing a mismatch between the emulator view and the program tracer view of the packer.  It showed that after returning from an exception handler, the instruction pointer was at the wrong place.

I was sure I was calculating the new eip after the return correctly.  I was taking it from the CONTEXT structure which is placed on the stack in an exception (it’s actually an argument to the exception handler).  I am not very good with ollydbg but using the stollystruct plugin, I verified that eip was as my emulator believed it was.

To cut a long story short, when single stepping (in Olly for instance) through an exception handler, the first instruction after returning from the handler is skipped (olly gets this wrong too).  In fact, that missing instruction was a runtime patched jump to the address that was showing up in my tracer.

My tracer works in XP (or at least it did last time I ran it), and what I was doing was setting the debug registers to enable a hardware breakpoint on the eip address from CONTEXT.  In XP, it seems you can do this and it works, but in Vista, it doesn’t work.  I have no idea why I was setting a hardware breakpoint from inside the CONTEXT structure.

Using a software breakpoint at CONTEXT.eip corrected the behaviour in my tracer, and my emulator, which had no problem at all with it, was verified by the new tracing algorithm.

Ruxcon 2008, MemCheck 0.01 release and presentation slides

I’m back from Ruxcon2008.

First up, a link to where you can find my MemCheck tool for finding out of bounds heap access in the Linux Kernel and the slides of my Ruxcon presentation Security Applications for Emulation.  If you have trouble getting MemCheck to build, which is highly likely since the dependencies are pretty hairy, then send me an email.  I will make a better package with precompiled binaries and maybe even a qemu image, and write up better docs in the future.

Now for my small roundup of Ruxcon.

I arrived in Sydney on the Friday, so with time to kill I went to UTS in the evening where it was being set up.  I am officially staff, but really I don’t do much, so basically I just loitered and practiced my presentation for the most part.

Lots of people lining up to enter in the morning, even before doors officially were opened.  It was also raining a little which was not good for those waiting.

I didn’t see all the presentations over the weekend as I was manning the staffroom for a couple hours each day.  My favourite talk was probably Netscreen of the Dead: Developing a Trojaned Firmware for Juniper Netscreen Appliances – Graeme Neilson.

I also liked other talks including Browser Rider by Ben Mosse, as I am pretty new to the web side of things so seeing an XSS tunnel was quite surprising for me.

Neil and Steph Archibald’s talk on Intelligent Web Fuzzing I thought was interesting, and I was impressed that they were able to instrument applications like php and mysql quite nicely.

Daniel Hodson’s talk on Uninitialized Variables was nicely presented and well structured even though a security guard interrupted his talk.  For about 5 minutes I thought it was some crazy guy who just wanted to get on stage, but it turned out he had not been informed of when the lecture theatre was closing.

kuza55’s and Stefano Di Paola’s talk mostly went over my head as it was on Attacking Rich Internet Applications, which was quite in-depth on web topics.

I saw Nishad Herath’s talk on Now you see it, now you don’t – Obfuscation ’08 style.. I guess I agree with his point of view that it’s essentially a back and forth between attackers and defenders, and I also agree that we need better analysis tools (and as he said, compilers have advanced at a really significant rate, while binary analysis tools have had little progress).

I only saw half of Ben Hawkes’ talk on Attacking the Vista Heap, as I was rostered for staff duties.  He was impressing on the crowd the importance of heap vulnerabilities and giving a background on the technology.  I think it was becoming more technical as the talk went on, but alas, I had to leave..

Paul Ducklin’s talk on Javascript is Harder Than you think was very entertaining.  He is a very good speaker.

GPU Powered Malware by Daniel Reynaud covered an interesting topic, and is probably something that we should keep an eye on in the future.

The police obviously see Ruxcon as a more casual affair these days as the AFP representative was wearing jeans during the panel discussion.  It was appropriate however for the occasion.

My presentation went well I think.  If you think so too, send me an email! 😉