Merging basic blocks to deobfuscate non-contiguous control flow

In some binaries, basic blocks may be connected only by jumps.  These basic blocks may also be non-contiguous in the file, i.e. scattered throughout the binary.

In cases like this, if you’re looking at the disassembly, you need to constantly jump around the image to follow the logical order of the control flow.  When the control flow is graphed, it appears logically linear, but when reading the code, it sometimes helps to fall back on the older-style text dump of the disassembly.

The way I implemented this was to construct a control flow graph of each procedure, then merge each basic block with its predecessor iff it has exactly one predecessor and that predecessor has exactly one successor (the basic block we are looking at merging).  To dump the disassembly, I take a recursive approach for each basic block: dump the assembly of the current basic block, then the next linear basic block (applied recursively), then the branched basic block (if it exists, also applied recursively).
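The two steps above can be sketched roughly as follows.  This is only an illustration under my own assumptions, not the disassembler's actual code: the `Block` class, the `next_addr`/`branch_addr` fields, the textual instructions and the `loc_` labels are all hypothetical.  In particular I assume an unconditional jump target is stored in `next_addr`, and that the jump which linked two merged blocks can be recognised by its `jmp` mnemonic and dropped.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Block:
    addr: int
    insns: List[str]                    # hypothetical: instructions as text
    next_addr: Optional[int] = None     # fall-through or unconditional jump target
    branch_addr: Optional[int] = None   # taken target of a conditional branch

    def succs(self) -> List[int]:
        return [a for a in (self.next_addr, self.branch_addr) if a is not None]


def predecessors(cfg: Dict[int, Block]) -> Dict[int, List[int]]:
    preds: Dict[int, List[int]] = {addr: [] for addr in cfg}
    for block in cfg.values():
        for succ in block.succs():
            preds.setdefault(succ, []).append(block.addr)
    return preds


def merge_blocks(cfg: Dict[int, Block], entry: int) -> Dict[int, Block]:
    """Merge a block into its predecessor when it is that predecessor's only
    successor and the predecessor is its only predecessor."""
    changed = True
    while changed:
        changed = False
        preds = predecessors(cfg)
        for addr in list(cfg):
            # The entry block has an implicit predecessor (the caller).
            if addr == entry or len(preds[addr]) != 1:
                continue
            pred = cfg[preds[addr][0]]
            if pred.addr == addr or pred.succs() != [addr]:
                continue
            block = cfg.pop(addr)
            # Assumption: the linking jump is the last instruction and is a jmp.
            if pred.insns and pred.insns[-1].startswith("jmp"):
                pred.insns.pop()
            pred.insns.extend(block.insns)
            pred.next_addr, pred.branch_addr = block.next_addr, block.branch_addr
            changed = True
            break  # predecessor lists are stale, so recompute them
    return cfg


def dump(cfg: Dict[int, Block], addr: Optional[int],
         seen: Optional[Set[int]] = None,
         out: Optional[List[str]] = None) -> List[str]:
    """Emit the current block, then the next linear block, then the branch."""
    seen = set() if seen is None else seen
    out = [] if out is None else out
    if addr is None or addr in seen or addr not in cfg:
        return out
    seen.add(addr)
    block = cfg[addr]
    out.append(f"loc_{addr:x}:")
    out.extend("    " + insn for insn in block.insns)
    dump(cfg, block.next_addr, seen, out)
    dump(cfg, block.branch_addr, seen, out)
    return out
```

Calling `merge_blocks(cfg, entry)` and then printing `"\n".join(dump(cfg, entry))` gives a listing in logical rather than file order.  The `seen` set is there so that loops in the graph do not recurse forever; a very large procedure might need an explicit stack instead of recursion.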

I made these improvements to my disassembler, so it now prints the disassembly in logical order, following the jumps.  In at least one piece of malware out of a sample of about ten, this deobfuscation proved successful: over 800 basic blocks were merged in an object with around 14000 instructions.  The malware samples I’ve been using have come from http://www.offensivecomputing.net/

I’m in the process of looking at more malware samples to see how common this type of obfuscation is.  If anyone can suggest names of malware samples, they would be great for me to look at and run my disassembler on.

Probably more useful than the deobfuscator I’ve described would be an automatic unpacker.  Most of the malware is packed, and in fact, the disassembly is non-trivial since indirect jumps and calls seem common.  This might be something that I will work on in the future.

In at least one other malware sample I have, dead code is common.  That is, registers are assigned, modified, then reassigned new values (without making any further use of the original values), making the older assignments dead.  I would like to automate detecting this, and liveness analysis should be able to identify these cases; however, I have yet to implement dataflow analysis in my disassembler.
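To make the idea concrete, here is a minimal sketch of a backward liveness scan over a single basic block.  It is not something my disassembler does yet, and everything in it is an assumption for illustration: the `dead_stores` function, the instruction tuples of (text, registers written, registers read), and the simplification that instructions only touch registers (no flags, memory writes or calls, which a real analysis would have to treat as side effects).

```python
from typing import List, Set, Tuple

# Hypothetical instruction model: (text, registers written, registers read).
Insn = Tuple[str, Set[str], Set[str]]


def dead_stores(block: List[Insn], live_out: Set[str]) -> List[int]:
    """Return indices of instructions whose written registers are never read
    before being overwritten, scanning the block from last to first."""
    live = set(live_out)          # registers still needed after this point
    dead: List[int] = []
    for i in range(len(block) - 1, -1, -1):
        _text, defs, uses = block[i]
        if defs and not (defs & live):
            dead.append(i)
            continue              # a dead instruction's reads keep nothing alive
        live -= defs
        live |= uses
    return sorted(dead)
```

For a block such as `mov eax, 1` / `add eax, 2` / `mov eax, ebx` / `ret` (with `ret` modelled as reading `eax`), the first two instructions are reported as dead, since `eax` is reassigned before anything reads it.  Doing this across a whole procedure would need the live-out sets to be propagated over the control flow graph.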

2 responses to “Merging basic blocks to deobfuscate non-contiguous control flow”

  1. Nice and simple idea. I was wondering if you will ever make your disassembler available.

    Cheers!

  2. silviocesare

    I will probably release my disassembler once I have a GUI up and running. I’m looking at Gtk/Glade.
