In some binaries, basic blocks may be connected only by jumps. These basic blocks may also be non-contiguous in the file, i.e. scattered throughout the binary.
In cases like this, if you're looking at the disassembly, you need to constantly jump around the image to follow the logical order of the control flow. When the control flow is graphed, it appears logically linear, but when reading the code, it sometimes helps to go back to the older text dump of the disassembly.
The way I implemented this was to construct a control flow graph of each procedure, then merge each basic block with its predecessor if and only if it has exactly one predecessor and that predecessor has exactly one successor (the basic block we are looking at merging). To dump the disassembly, a recursive approach is taken for each basic block: dump the assembly for the current basic block, then the next linear basic block (applied recursively), then the branched basic block (if it exists, also applied recursively).
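The merge rule and the recursive dump can be sketched roughly as follows. This is only an illustration and not the disassembler's actual code: the Block class, its fallthrough and branch fields, the loc_ labels and the toy instructions are all made up for the example, and it assumes each block has at most one linear successor and one branch target.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Block:
    """Hypothetical basic block: address, instruction text, and up to two edges."""
    addr: int
    insns: List[str]
    fallthrough: Optional[int] = None  # next linear block, if any
    branch: Optional[int] = None       # jump target, if any


def successors(b: Block) -> List[int]:
    return [a for a in (b.fallthrough, b.branch) if a is not None]


def predecessors(blocks: Dict[int, Block]) -> Dict[int, List[int]]:
    preds: Dict[int, List[int]] = {a: [] for a in blocks}
    for b in blocks.values():
        for s in successors(b):
            preds[s].append(b.addr)
    return preds


def merge_blocks(blocks: Dict[int, Block], entry: int) -> Dict[int, Block]:
    """Merge a block into its predecessor iff it has exactly one predecessor
    and that predecessor has exactly one successor. Repeats until stable."""
    changed = True
    while changed:
        changed = False
        preds = predecessors(blocks)
        for addr, blk in list(blocks.items()):
            if addr == entry or len(preds[addr]) != 1:
                continue
            pred = blocks[preds[addr][0]]
            if pred.addr == addr or successors(pred) != [addr]:
                continue  # self-loop, or predecessor has other successors
            # The connecting unconditional jump is redundant after merging.
            if pred.branch == addr and pred.insns and pred.insns[-1].startswith("jmp"):
                pred.insns.pop()
            pred.insns.extend(blk.insns)
            pred.fallthrough, pred.branch = blk.fallthrough, blk.branch
            del blocks[addr]
            changed = True
            break  # edges changed, so recompute predecessors
    return blocks


def dump(blocks, addr, seen=None, out=None) -> List[str]:
    """Emit the current block, then the linear successor, then the branch target."""
    if seen is None:
        seen, out = set(), []
    if addr is None or addr in seen:
        return out
    seen.add(addr)
    b = blocks[addr]
    out.append(f"loc_{addr:x}:")
    out.extend("    " + i for i in b.insns)
    dump(blocks, b.fallthrough, seen, out)
    dump(blocks, b.branch, seen, out)
    return out


# Toy procedure: three blocks chained only by jumps, scattered by address.
blocks = {
    0x10: Block(0x10, ["mov eax, 1", "jmp loc_40"], branch=0x40),
    0x40: Block(0x40, ["add eax, 2", "jmp loc_20"], branch=0x20),
    0x20: Block(0x20, ["cmp eax, 3", "jne loc_30"], fallthrough=0x28, branch=0x30),
    0x28: Block(0x28, ["ret"]),
    0x30: Block(0x30, ["xor eax, eax", "ret"]),
}
merge_blocks(blocks, entry=0x10)
listing = dump(blocks, 0x10)
print("\n".join(listing))
```

In the toy input the blocks at 0x40 and 0x20 are folded into the entry block, so the listing reads top to bottom without following any unconditional jumps. A plain recursive dump like this could hit Python's recursion limit on very large procedures; an explicit stack would avoid that.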
I made these improvements to my disassembler, so it prints the disassembly in logical order, following the jumps. In at least one piece of malware out of a sample of about ten, this deobfuscation proved successful, and over 800 basic blocks were merged in an object with around 14000 instructions. The malware samples I’ve been using have come from http://www.offensivecomputing.net/
I’m in the process of looking at more malware samples to see how common this type of obfuscation is. If anyone can suggest names of malware samples for me to look at and run my disassembler on, that would be great.
Probably more useful than the deobfuscator I’ve described would be an automatic unpacker. Most of the malware is packed, and in fact the disassembly is non-trivial since indirect jumps and calls seem common. This might be something that I will work on in the future.
In at least one other malware sample I have, dead code is common. That is, registers are assigned, modified, then reassigned new values (without making any further use of the original values), making the older assignments dead. I would like to automate detecting this, and liveness analysis should be able to identify these cases. However, I have yet to implement dataflow analysis in my disassembler.
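As a rough idea of what that could look like, here is a minimal sketch of a backward liveness pass over a single basic block. It is not something the disassembler does today. The Insn tuple, the hand-written defs and uses sets and the live_out argument are assumptions for illustration, and the sketch ignores flags, memory and other side effects, all of which a real pass would need to model.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Insn(NamedTuple):
    """Hypothetical instruction record: text plus registers written and read."""
    text: str
    defs: FrozenSet[str]
    uses: FrozenSet[str]


def dead_stores(insns: List[Insn], live_out: Iterable[str]) -> List[int]:
    """Return indices of instructions whose register results are never read.

    Walks the block backwards, tracking which registers are live. An
    instruction that only defines non-live registers is dead, and because it
    is dead, the registers it reads are not made live by it.
    Assumption: flags and memory are ignored entirely.
    """
    live = set(live_out)
    dead = []
    for i in range(len(insns) - 1, -1, -1):
        ins = insns[i]
        if ins.defs and not (ins.defs & live):
            dead.append(i)
            continue
        live -= ins.defs
        live |= ins.uses
    return sorted(dead)


def reg(*names: str) -> FrozenSet[str]:
    return frozenset(names)


# eax is assigned, modified, then reassigned before anything reads it.
block = [
    Insn("mov eax, 1", reg("eax"), reg()),
    Insn("add eax, 2", reg("eax"), reg("eax")),
    Insn("mov eax, 5", reg("eax"), reg()),
    Insn("mov ebx, eax", reg("ebx"), reg("eax")),
]
dead = dead_stores(block, live_out={"ebx"})
for i in dead:
    print("dead:", block[i].text)
```

Extending this across a whole procedure would mean iterating the same transfer function over the control flow graph until the live sets stop changing, with each block's live-out being the union of its successors' live-in sets.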