Disassembling Obfuscated Assembly

I wrote a disassembler a week or two ago.  Actually I used libdasm to do the grunt work while I just played with the higher level code.

I want to write some static analysis tools, and disassembly is the first part of the process. Analysis of control flow follows the disassembly, so obfuscating the control flow can cause static analysis tools to fail.


  JMP A
  .byte 0xE8 ; 0xE8 is the opcode for a CALL
A: CALL _myfunc

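This is a small obfuscation that works against disassemblers that use the Linear Sweep method. A linear sweep decodes instructions one after another without following jumps, so it treats the junk 0xE8 byte as the start of a CALL and swallows the first bytes of the real one. This successfully hides the real call, making static analysis tools that depend on correctly identifying the control flow and call graph fail. The Linear Sweep method of disassembly is used most notably by objdump.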
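To make the failure concrete, here is a minimal Python sketch of a linear sweep. The opcode table is a toy that covers only a handful of opcodes (it is not libdasm's or objdump's interface), and the 0x10 call displacement is an arbitrary placeholder.

```python
# Toy decoder: first opcode byte -> (mnemonic, instruction length).
# Illustrative assumption only; real x86 decoding needs far more than this.
LENGTHS = {0xEB: ("jmp", 2), 0xE8: ("call", 5), 0x90: ("nop", 1), 0xC3: ("ret", 1)}

def linear_sweep(code):
    """Decode instructions back to back, never following control flow."""
    listing, offset = [], 0
    while offset < len(code):
        mnemonic, length = LENGTHS.get(code[offset], (".byte", 1))
        if offset + length > len(code):   # truncated instruction: emit a raw byte
            mnemonic, length = ".byte", 1
        listing.append((offset, mnemonic))
        offset += length
    return listing

# JMP A; .byte 0xE8; A: CALL rel32
CODE = bytes([0xEB, 0x01, 0xE8, 0xE8, 0x10, 0x00, 0x00, 0x00])
listing = linear_sweep(CODE)
for offset, mnemonic in listing:
    print(f"{offset:04x}: {mnemonic}")
```

The sweep reports a call at offset 2 (the junk byte) and never lists the real call at offset 3.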

The Recursive Traversal method can successfully disassemble the previous obfuscation, because it follows the control flow. It stops disassembling after the unconditional jump, and then correctly continues at the jump target.
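A rough sketch of recursive traversal on the same bytes might look like the following. The opcode table is again a toy assumption, only short (rel8) jumps are handled, and call targets are not followed because the callee lies outside this small buffer.

```python
# Toy decoder: first opcode byte -> (mnemonic, instruction length).
LENGTHS = {0xEB: ("jmp", 2), 0x74: ("jz", 2), 0xE8: ("call", 5),
           0x90: ("nop", 1), 0xC3: ("ret", 1)}

def recursive_traversal(code, entry=0):
    """Decode only the offsets that control flow can reach from entry."""
    seen, worklist = {}, [entry]
    while worklist:
        offset = worklist.pop()
        while 0 <= offset < len(code) and offset not in seen:
            mnemonic, length = LENGTHS.get(code[offset], (None, 0))
            if mnemonic is None or offset + length > len(code):
                break                     # undecodable bytes end this path
            seen[offset] = mnemonic
            if mnemonic in ("jmp", "jz"):
                rel8 = int.from_bytes(code[offset + 1:offset + 2], "little", signed=True)
                worklist.append(offset + length + rel8)   # queue the jump target
            if mnemonic in ("jmp", "ret"):
                break                     # no fall-through after these
            offset += length
    return sorted(seen.items())

# JMP A; .byte 0xE8; A: CALL rel32
CODE = bytes([0xEB, 0x01, 0xE8, 0xE8, 0x10, 0x00, 0x00, 0x00])
found = recursive_traversal(CODE)
print(found)
```

Here the junk byte at offset 2 is never decoded, and the real call at offset 3 is found through the jump target.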

It is possible to modify the code again to make it harder to disassemble by recursive traversal.


  XORL %eax,%eax ; sets ZF, so the JZ is always taken (MOV would leave the flags untouched)
  JZ A
  .byte 0xE8
A: CALL _myfunc

The conditional jump now replaces our earlier unconditional one. A recursive traversal disassembler will see the conditional jump and believe control can flow both to the jump target and to the bytes immediately following the jump. The bytes immediately following the jump are not real code, so they should be ignored. But how can that be done automatically?

The first idea I came up with was modifying the disassembler to look for fixed code patterns. In the case above, the disassembler would try to recognize a jz +1, that is, a JZ that skips exactly one byte.

What happens, though, if the following occurs?


  XOR %eax,%eax ; sets ZF, so the JZ is always taken (MOV would leave the flags untouched)
  JZ A
  NOP
  NOP
  NOP
  .byte 0xe8
A: CALL _myfunc

In this case, it’s no longer a jz +1, but a jz +4. The distance can be varied indefinitely. The NOPs could be replaced with junk instructions, or any other polymorphic or metamorphic code.

There will however be a conflict using recursive traversal when it reaches A via the jump target, and reaches A-1 via disassembling in a straight line. There will be overlapping disassemblies, and it is unclear which disassembly is correct.
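The conflict can be detected mechanically. The sketch below is a toy under stated assumptions: the opcode table is made up for this example, a two-byte XOR %eax,%eax stands in for the instruction that zeroes the register, and the overlap check only compares neighbouring instructions in address order.

```python
# Toy table: first opcode byte -> (mnemonic, length). Assumes 0x31 is always
# the two-byte register-to-register XOR; real x86 decoding is far messier.
LENGTHS = {0x31: ("xor", 2), 0x74: ("jz", 2), 0xE8: ("call", 5),
           0x90: ("nop", 1), 0xC3: ("ret", 1)}

def recursive_traversal(code, entry=0):
    """Return {offset: (mnemonic, length)} for every reachable instruction."""
    seen, worklist = {}, [entry]
    while worklist:
        offset = worklist.pop()
        while 0 <= offset < len(code) and offset not in seen:
            mnemonic, length = LENGTHS.get(code[offset], (None, 0))
            if mnemonic is None or offset + length > len(code):
                break                     # undecodable bytes end this path
            seen[offset] = (mnemonic, length)
            if mnemonic == "jz":          # both successors look possible
                rel8 = int.from_bytes(code[offset + 1:offset + 2], "little", signed=True)
                worklist.append(offset + length + rel8)
            elif mnemonic == "ret":
                break
            offset += length
    return seen

def find_overlaps(instructions):
    """Pairs of start offsets whose byte ranges overlap (neighbours only)."""
    spans = sorted((off, off + length) for off, (_, length) in instructions.items())
    return [(a[0], b[0]) for a, b in zip(spans, spans[1:]) if b[0] < a[1]]

# XOR %eax,%eax; JZ A; .byte 0xE8; A: CALL rel32
CODE = bytes([0x31, 0xC0, 0x74, 0x01, 0xE8, 0xE8, 0x10, 0x00, 0x00, 0x00])
conflicts = find_overlaps(recursive_traversal(CODE))
print(conflicts)
```

The instruction decoded at offset 4 (the junk byte, reached by falling through) overlaps the one at offset 5 (label A, reached via the jump), which is exactly the ambiguity described above.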

The root problem is that recursive traversal identifies bytes as code when in fact it should not. This is what is termed a false positive in disassembly.

Perhaps disassembly should keep both sequences of disassembly, and consider them unique parts of the program. Later on, dead code elimination can be performed if there is enough data flow analysis.

Or should the data flow analysis be done during the recursive traversal?  This might enable the disassembler to recognize the conditional jump as really being unconditional.
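As a rough illustration of that second option, the sketch below tracks a single fact, whether the zero flag is known to be set, while it traverses. It assumes the register is zeroed with XOR %eax,%eax, which sets ZF (a MOV would leave the flags unchanged). The opcode table is a toy, and a real analysis would have to merge flag states when several paths reach the same offset, which this one does not.

```python
# Toy table: first opcode byte -> (mnemonic, length). Assumes 0x31 is always
# the two-byte register-to-register form of XOR.
LENGTHS = {0x31: ("xor", 2), 0x74: ("jz", 2), 0xE8: ("call", 5),
           0x90: ("nop", 1), 0xC3: ("ret", 1)}

def traverse_with_flags(code, entry=0):
    seen, worklist = {}, [(entry, None)]      # zf: True = known set, None = unknown
    while worklist:
        offset, zf = worklist.pop()
        while 0 <= offset < len(code) and offset not in seen:
            mnemonic, length = LENGTHS.get(code[offset], (None, 0))
            if mnemonic is None or offset + length > len(code):
                break
            seen[offset] = mnemonic
            if mnemonic == "xor":
                modrm = code[offset + 1]
                same_reg = (modrm >> 6) == 3 and ((modrm >> 3) & 7) == (modrm & 7)
                zf = True if same_reg else None   # xor r,r always yields zero
            elif mnemonic == "call":
                zf = None                         # the callee may change the flags
            elif mnemonic == "jz":
                rel8 = int.from_bytes(code[offset + 1:offset + 2], "little", signed=True)
                worklist.append((offset + length + rel8, zf))
                if zf is True:
                    break                         # always taken: skip the fall-through
            elif mnemonic == "ret":
                break
            offset += length
    return sorted(seen.items())

# XOR %eax,%eax; JZ A; .byte 0xE8; A: CALL rel32
CODE = bytes([0x31, 0xC0, 0x74, 0x01, 0xE8, 0xE8, 0x10, 0x00, 0x00, 0x00])
found = traverse_with_flags(CODE)
print(found)
```

With the flag known, the junk byte at offset 4 is never decoded and only the real call at offset 5 is reported.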

Another method of obfuscation is to replace the conditional jump with an indirect jump.  This works, because data flow analysis is required to see where the jump target occurs.  Static disassembly has a hard time dealing with this.


  PUSH $B        ; set up the return address, i.e. emulate a CALL
  PUSH $_myfunc  ; now do the jump: RET pops this address and jumps to it
  RET
B:               ; illustrative label: _myfunc returns here

Again, there are a number of methods of doing this. We could scan for PUSH/RET pairs and identify them as jump sites. But what if an obfuscator changes that code?


  PUSH $B        ; return address (B is an illustrative label)
  NOP
  MOV %eax,%eax
  PUSH $_myfunc
  NOP
  MOV %ebx,%ebx
  RET
B:

There are many such morphisms that can occur. One way to disassemble this could be to perform data flow analysis as we perform disassembly. That is, for each basic block found, eliminate all NOP-equivalent instructions and then check for jumps.
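A minimal sketch of that idea follows. It works on an already decoded basic block, represented here as hypothetical (mnemonic, operands) tuples rather than any real disassembler's output, and it only knows two NOP equivalents: NOP itself and a MOV from a register to itself in 32-bit code. The `ret_addr` operand is a placeholder for whatever return address the code pushes.

```python
def is_nop_equivalent(insn):
    """True for instructions that leave the machine state unchanged."""
    mnemonic, operands = insn
    if mnemonic == "nop":
        return True
    # mov %reg,%reg with identical source and destination changes nothing
    return (mnemonic == "mov" and len(operands) == 2
            and operands[0] == operands[1] and operands[0].startswith("%"))

def find_push_ret_jumps(block):
    """Strip NOP equivalents, then report the target of each PUSH/RET pair."""
    core = [insn for insn in block if not is_nop_equivalent(insn)]
    targets = []
    for prev, cur in zip(core, core[1:]):
        if prev[0] == "push" and cur[0] == "ret":
            targets.append(prev[1][0])    # RET jumps to the last pushed value
    return targets

BLOCK = [
    ("push", ("ret_addr",)),              # emulated CALL: return address
    ("nop", ()),
    ("mov", ("%eax", "%eax")),
    ("push", ("_myfunc",)),               # the real jump target
    ("nop", ()),
    ("mov", ("%ebx", "%ebx")),
    ("ret", ()),
]
targets = find_push_ret_jumps(BLOCK)
print(targets)
```

Once the padding is removed, the PUSH and RET become adjacent again and the simple pair scan recovers `_myfunc` as the jump target. Junk that is not a strict no-op would need real data flow analysis rather than this filter.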

This idea still wouldn’t be perfect I imagine, but it could improve the situation considerably.

4 responses to “Disassembling Obfuscated Assembly”

  1. Have you tried it under IDA? IDA uses Recursive Traversal so with a bit of a plugin, this technique can be defeated. Will have to look a bit more into it after I’m done relocating

  2. Pingback: A party trick against dead listing « Cyberpunk as a commodity

  3. There’s definitely a serious amount of static analysis difficulty around these cases. I first noticed something similar when reversing generated shellcode looking for bugs…

    One IDA bug/UI de-synchronization I recently came across was with respect to FLIRT. FLIRT caps its fingerprint at 32 bytes, to support minor variations and whatever other copyright infringement problems full copies would cause.
    So the problem became, a malware/trojan using its own C runtime lib (tinylib purposes) had used identical prologues and apparently padded its wanna-be-libc-functions with bytes to match the CRC16 (used to compensate for only handling the initial 32 bytes).
    This led to the case where many of the MAL sequences were collapsed (“press + to expand”) or what have you. Obviously, non-optimal when trying to analyze some code, being fooled into assuming it was simply some random static lib.
    It’d be nice if they had more than 1 technology, something like an n-gram derived from basic block structure.

    Of course, IDA, is not your friend with MAL wasrez, even though it’s great and all… it would be nice if there were more integration with statistical tools, updated visualizations for human guided pattern recognition… etc..😉
    Anyhow it’s great and all but will be nice to see what Silvio’s got cooking. peace.

  4. I know it’s an old post, but take a look at my asm obfuscator:

    http://www.pelock.com/products/obfuscator
    http://www.pelock.com/download.php?f=obfuscator_example.zip

    activation code 820D-B29C-9A6D-9B28 so you can play around with it at:

    http://www.pelock.com/obfuscator/
