x86 hardware debug emulation, VirtualProtect return value?

Merry Xmas everyone. After a long day of family festivities I had a spare few hours, so I worked on my emulator.

I tried unpacking telock.  The special thing about telock is that it uses hardware execution breakpoints.

I had to modify my program tracer to not use any hardware breakpoints at all. There is one slight caveat: I use an int3 on NtContinue that stays in place while the exception handler is executing, which could be detected. I might change the int3 to a clc to avoid simple checks, but for now nothing I am trying to emulate does these kinds of checks.

Now that I could trace telock, I set about implementing the hardware debug emulation. It took a few hours, but I was able to implement most of the behaviour. I don’t set DR6 (the debug status register) in my emulation, so that’s something I should do in the future, but it’s not necessary for now.
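A minimal sketch of the execution-breakpoint check, written in Python for brevity and assuming a hypothetical `DebugRegs` structure rather than the emulator's real one. It follows the documented DR7 layout (Ln/Gn enable bits at 2n and 2n+1, R/Wn at bits 16+4n, where 00 means instruction execution) and also sets the DR6 B0-B3 status bits that are not yet emulated. LEN fields and data breakpoints are ignored. On a hit, the emulator would raise a debug exception, which on Windows reaches the SEH handler as EXCEPTION_SINGLE_STEP.

```python
from dataclasses import dataclass, field


@dataclass
class DebugRegs:
    """Hypothetical debug-register state for an emulated x86 CPU."""
    dr: list = field(default_factory=lambda: [0, 0, 0, 0])  # DR0-DR3: breakpoint addresses
    dr6: int = 0  # debug status (B0-B3 live in bits 0-3)
    dr7: int = 0  # debug control


def check_exec_breakpoints(regs: DebugRegs, eip: int, resume_flag: bool = False) -> bool:
    """Return True if the instruction about to run at eip hits an enabled
    execution breakpoint, setting the matching Bn bit(s) in DR6."""
    if resume_flag:  # EFLAGS.RF suppresses instruction breakpoints for one instruction
        return False
    hit = False
    for n in range(4):
        enabled = (regs.dr7 >> (2 * n)) & 0b11   # Ln (bit 2n) or Gn (bit 2n+1)
        rw = (regs.dr7 >> (16 + 4 * n)) & 0b11   # R/Wn: 00 = instruction execution
        if enabled and rw == 0 and regs.dr[n] == eip:
            regs.dr6 |= 1 << n                   # Bn: breakpoint n triggered
            hit = True
    return hit


regs = DebugRegs()
regs.dr[1] = 0x401000
regs.dr7 = 1 << 2  # L1 enabled, R/W1 = 00 (execute)
print(check_exec_breakpoints(regs, 0x401000), hex(regs.dr6))  # True 0x2
```

The check runs before each instruction because x86 instruction breakpoints are fault-like: the instruction has not executed when the exception is delivered, which is why the resume flag exists to step past it.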

Which reminds me: after implementing DLL loading the other day, I was able to unpack rlpack without further modification. Today I finally finished the code that does library loading entirely within the emulator (without the program tracer running in parallel).

I fixed a number of other bugs in the emulator, including some misbehaving x86 instructions, and a number of win32 functions.

I came across what seems to be a problem with the MSDN documentation for VirtualProtect: in one instance it returned zero, which is meant to indicate failure, yet by all accounts, including GetLastError, the call succeeded. There is some discussion on the Wine mailing list from 2001 saying that in Win98 and earlier VirtualProtect returns the original protection flags of the pages that were changed. I couldn’t find a consistent explanation, so I decided to simply follow the MSDN documentation and ignore what actually happens in win32. Hopefully nothing bad happens..
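For reference, here is a minimal sketch of a VirtualProtect handler that follows the MSDN contract: nonzero on success, zero plus a last-error code on failure, and the old protection of the first page in the range returned through `lpflOldProtect`. The page map, the `virtual_protect` signature and the choice of ERROR_INVALID_ADDRESS for untracked pages are assumptions for illustration; the Win98-era behaviour discussed on the Wine list is deliberately not modelled.

```python
PAGE_SIZE = 0x1000
PAGE_READONLY = 0x02
PAGE_READWRITE = 0x04
PAGE_EXECUTE_READ = 0x20
ERROR_SUCCESS = 0
ERROR_INVALID_ADDRESS = 487  # assumed error code for untracked pages


def virtual_protect(pages, address, size, new_protect):
    """Emulated VirtualProtect following the MSDN contract.

    pages maps page base address -> protection flags (hypothetical layout).
    Returns (return_value, old_protect, last_error)."""
    first = address & ~(PAGE_SIZE - 1)
    # Every page containing at least one byte of [address, address + size);
    # treating size 0 as a single byte is an assumption.
    last = (address + max(size, 1) - 1) & ~(PAGE_SIZE - 1)
    bases = range(first, last + PAGE_SIZE, PAGE_SIZE)
    if any(b not in pages for b in bases):
        return 0, None, ERROR_INVALID_ADDRESS  # zero means failure
    old_protect = pages[first]                 # old protection of the first page
    for b in bases:
        pages[b] = new_protect
    return 1, old_protect, ERROR_SUCCESS       # nonzero means success


pages = {0x400000: PAGE_READONLY, 0x401000: PAGE_EXECUTE_READ}
print(virtual_protect(pages, 0x400010, 0x1000, PAGE_READWRITE))  # (1, 2, 0)
```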

I have another thing to implement: import/export forwarding. During program tracing, a call to GetProcAddress returned RtlExitUserThread from ntdll.dll, when in fact ExitThread from kernel32.dll had been requested. This is apparently a feature of the PE format (forwarded exports), which I don’t implement yet. That’s what is currently the problem with telock, though I know that it also uses win32 file functions (CreateFileA, iirc), which I haven’t implemented yet either. That is one of the next things I’m going to implement: a virtual file system for the emulator. I had written the code for Linux emulation and attempted to merge it with the win32 emulation, but it’s kind of broken now.
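A minimal sketch of forwarded-export resolution, assuming a hypothetical `Module` view of a mapped image. In the PE format, an export address table entry whose RVA falls inside the export directory’s own range does not point at code but at an ASCII forwarder string such as `NTDLL.RtlExitUserThread`, naming the DLL and function to look up instead. Ordinal forwarders (`DLL.#123`) and loading the target DLL on demand are left out, and the base addresses in the usage example are made up.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """Hypothetical view of a loaded PE image, just enough for export lookup."""
    name: str
    base: int
    image: bytes          # mapped image, indexed by RVA
    export_dir_rva: int   # DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress
    export_dir_size: int  # DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].Size
    exports: dict         # export name -> RVA from the export address table


def read_cstring(image: bytes, rva: int) -> str:
    end = image.index(b"\0", rva)
    return image[rva:end].decode("ascii")


def get_proc_address(modules: dict, dll: str, func: str, depth: int = 0) -> int:
    if depth > 8:
        raise RuntimeError("forwarder chain too long")
    mod = modules[dll.lower()]
    rva = mod.exports[func]
    # An export RVA inside the export directory is a forwarder string, not code.
    if mod.export_dir_rva <= rva < mod.export_dir_rva + mod.export_dir_size:
        target_dll, _, target_func = read_cstring(mod.image, rva).rpartition(".")
        return get_proc_address(modules, target_dll + ".dll", target_func, depth + 1)
    return mod.base + rva


k32_image = bytearray(0x3000)
fwd = b"NTDLL.RtlExitUserThread\0"
k32_image[0x2010:0x2010 + len(fwd)] = fwd
modules = {
    "kernel32.dll": Module("kernel32.dll", 0x7C800000, bytes(k32_image),
                           0x2000, 0x100, {"ExitThread": 0x2010}),
    "ntdll.dll": Module("ntdll.dll", 0x7C900000, b"",
                        0x1000, 0x100, {"RtlExitUserThread": 0x5000}),
}
print(hex(get_proc_address(modules, "KERNEL32.DLL", "ExitThread")))  # 0x7c905000
```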

Oh.. I can also unpack expressor now.  And also packman – dunno if I mentioned that earlier.

Oh.. one last thing that needs to be implemented. It’s hard to know when to stop the unpacking process. Breaking on execution of previously written memory locations is the basic algorithm, but sometimes there are multiple layers. I am thinking of implementing an entropy check to guess whether memory is still packed, and from that decide whether the unpacking process should be reset and continued.
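A minimal sketch of the entropy check, using Shannon entropy in bits per byte (0.0 for a constant buffer, 8.0 for uniformly distributed bytes). The 7.0 threshold and the `looks_packed` helper are hypothetical placeholders that would need tuning against real samples. Entropy alone cannot tell compressed data from encrypted data, and a real check would probably run per page or per section rather than over the whole image.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of data in bits per byte, between 0.0 and 8.0."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


PACKED_THRESHOLD = 7.0  # hypothetical cut-off, needs tuning on real samples


def looks_packed(region: bytes, threshold: float = PACKED_THRESHOLD) -> bool:
    """Guess that a memory region is still compressed or encrypted."""
    return shannon_entropy(region) >= threshold


print(shannon_entropy(b"ab" * 50))  # 1.0
```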

5 responses to “x86 hardware debug emulation, VirtualProtect return value?”

  1. >breaking on execution on priorly written to memory locations is the basic algorithm, but sometimes there are multiple layers

    If you execute a previously written memory address, you know you’ve reached the first layer. Now if you apply that recursively (recording the memory addresses written by the first layer and breaking on them), you should be able to reach the second and subsequent layers.

    Is there any problem with this approach?

  2. Hi Dan. Merry XMas in France!

    There is no problem in recursively applying the algorithm. After a layer has been unpacked, I clear the store of addresses that have been written to and restart unpacking. The problem is knowing when there are no more layers, so that I can stop recursively applying the algorithm. I was hoping that multiple layers were unusual in packers, but it seems quite a few use them.

    An approach I talked about in my talk at Ruxcon was to use a timeout while unpacking each layer: if the timer expires during unpacking, stop unpacking the current layer. But I don’t think this is very good, as the timeout must be significantly large to be effective, which makes the unpacker take too long, especially if you’re aiming for real-time unpacking. Perhaps I’m not giving the timeout solution enough thought, as I think some AV products use that approach successfully.

    A simple heuristic I thought of earlier today for deciding whether unpacking should stop after a break on execute (of previously written memory) is to check the memory image for at least one large block of sequentially written memory, and to keep going if there isn’t one. The original executable has, I guess, two large chunks of memory for text and data. Requiring a sequential block of 4K or so might avoid stopping at unpackers that only write a small stub and pass execution there.

    I’ll try to write the entropy code today and see whether it helps decide if unpacking should be restarted.

    Incidentally, I finished the handling of forwarded exports. I have two things that take priority now – a vfs for the emulator, and the entropy calculator code.

  3. Actually, I don’t really know how common multiple layers of packing are. Aspack does it in my sample list of packers, but I haven’t tested against any real list of malware.

  4. Assuming the packer ever gives you a chance to see the whole binary/memory decrypted/descrambled.

  5. Yeah.. this is one of the problems with these types of unpackers. Apart from on-the-fly decryption, the biggest problem is virtualization.
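The recursive approach from the first two comments, combined with the 4K sequential-block heuristic, might look roughly like this. `LayerTracker` and its methods are hypothetical names, and tracking individual byte addresses in a Python set is only for clarity; an emulator would more likely keep a per-page bitmap. On each break on execute the caller invokes `finish_layer`, which either stops (a large contiguous region was written) or clears the written-address store so the next layer can be detected.

```python
class LayerTracker:
    """Hypothetical write-then-execute tracker for one unpacking session."""

    def __init__(self):
        self.written = set()  # addresses written during the current layer
        self.layers = 0

    def on_write(self, addr: int, size: int) -> None:
        self.written.update(range(addr, addr + size))

    def on_execute(self, eip: int) -> bool:
        """True when execution reaches memory written in this layer."""
        return eip in self.written

    def longest_written_run(self) -> int:
        """Length of the largest block of sequentially written bytes."""
        longest = run = 0
        prev = None
        for a in sorted(self.written):
            run = run + 1 if prev is not None and a == prev + 1 else 1
            longest = max(longest, run)
            prev = a
        return longest

    def finish_layer(self, min_block: int = 0x1000) -> bool:
        """Called on a layer break. Return True if unpacking should stop;
        otherwise clear the store so the next layer can be tracked."""
        self.layers += 1
        if self.longest_written_run() >= min_block:
            return True
        self.written.clear()
        return False
```

Because a packer that never writes a large contiguous block would keep this loop going, it would probably still be paired with a layer or instruction limit, much like the timeout idea above.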
