I wrote a disassembler a week or two ago. Actually, I used libdasm to do the grunt work while I just played with the higher-level code.
I want to write some static analysis tools, so disassembling is the first part of the process. Analysis of control flow soon follows the disassembly. Obfuscating the control flow can cause static analysis tools to fail.
.byte 0xE8 ; 0xE8 is the opcode for a CALL
A: CALL _myfunc
This is a small obfuscation that works against disassemblers that use the Linear Sweep method. It successfully hides the real call, so static analysis tools that depend on correctly identifying the control flow and call graph fail. The Linear Sweep method of disassembly is used most notably by objdump.
The Recursive Traversal method can successfully disassemble the previous obfuscation because it follows the control flow. It stops disassembling after the jump site and then correctly follows the jump to the target.
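To make the desynchronisation concrete, here is a minimal Python sketch of a linear sweep. It is not libdasm: the toy decoder knows only a few real x86 opcodes and treats anything else as a one-byte `db`. The byte string is my own assumption about how the snippet would assemble, with a two-byte short jump over the junk byte and an arbitrary call displacement of 0x10.

```python
import struct

def decode(code, off):
    """Toy x86 decoder: returns (mnemonic, length, branch_target)."""
    op = code[off]
    if op == 0xE8 and off + 5 <= len(code):            # call rel32
        rel = struct.unpack_from("<i", code, off + 1)[0]
        return ("call", 5, off + 5 + rel)
    if op == 0xEB and off + 2 <= len(code):            # jmp rel8
        rel = struct.unpack_from("<b", code, off + 1)[0]
        return ("jmp", 2, off + 2 + rel)
    if op == 0xC3:                                     # ret
        return ("ret", 1, None)
    return ("db", 1, None)                             # anything else

def linear_sweep(code):
    """Decode from offset 0 to the end, never following control flow."""
    off, listing = 0, []
    while off < len(code):
        mnem, length, _target = decode(code, off)
        listing.append((off, mnem))
        off += length
    return listing

# Assumed layout: jmp +1 ; .byte 0xE8 ; A: call rel32 ; ret
CODE = bytes([0xEB, 0x01, 0xE8, 0xE8, 0x10, 0x00, 0x00, 0x00, 0xC3])
listing = linear_sweep(CODE)
print(listing)
```

The sweep decodes the junk 0xE8 at offset 2 as a call and swallows the first bytes of the real call, so offset 3 (label A) never shows up as an instruction start.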
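A rough sketch of recursive traversal over the same kind of byte string follows. The toy decoder, the assumed two-byte short jump in front of the junk byte, and the arbitrary call displacement are all illustrative assumptions, not output from a real disassembler. Calls are assumed to return, and call targets outside the buffer are simply not followed.

```python
import struct

def decode(code, off):
    """Toy x86 decoder: returns (mnemonic, length, branch_target)."""
    op = code[off]
    if op == 0xE8 and off + 5 <= len(code):            # call rel32
        rel = struct.unpack_from("<i", code, off + 1)[0]
        return ("call", 5, off + 5 + rel)
    if op == 0xEB and off + 2 <= len(code):            # jmp rel8
        rel = struct.unpack_from("<b", code, off + 1)[0]
        return ("jmp", 2, off + 2 + rel)
    if op == 0xC3:                                     # ret
        return ("ret", 1, None)
    return ("db", 1, None)

def recursive_traversal(code, entry=0):
    """Decode only the offsets that control flow can reach from entry."""
    seen, work = {}, [entry]
    while work:
        off = work.pop()
        while 0 <= off < len(code) and off not in seen:
            mnem, length, target = decode(code, off)
            seen[off] = mnem
            if mnem == "jmp":          # unconditional: no fall-through
                work.append(target)
                break
            if mnem == "ret":          # path ends here
                break
            if mnem == "call" and 0 <= target < len(code):
                work.append(target)    # callee lives inside this buffer
            off += length              # assume the call returns
    return sorted(seen.items())

# Assumed layout: jmp +1 ; .byte 0xE8 ; A: call rel32 ; ret
CODE = bytes([0xEB, 0x01, 0xE8, 0xE8, 0x10, 0x00, 0x00, 0x00, 0xC3])
reached = recursive_traversal(CODE)
print(reached)
```

Because decoding stops at the unconditional jump and resumes at its target, the junk byte at offset 2 is never decoded and the real call at offset 3 is found.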
It is possible to modify the code again to make it harder to disassemble by recursive traversal.
A: CALL _myfunc
The conditional jump now replaces our earlier unconditional one. A recursive traversal disassembler will see the conditional jump and believe control can flow both to the jump target and to the code immediately following the jump site. The code immediately following the jump site is junk, so it should be ignored. But how can that be done automatically?
The first idea I came up with was modifying the disassembler to look for static code patterns. In the case above, the disassembler would try to recognize jz+1.
But what happens if the following occurs?
A: CALL _myfunc
In this case, it’s no longer a jz+1 but a jz+4. This can be varied indefinitely. The NOPs could be replaced with junk instructions, or with any other polymorphic or metamorphic code.
There will, however, be a conflict in recursive traversal when it reaches A via the jump target and reaches A-1 by disassembling in a straight line. The two disassemblies overlap, and it is unclear which one is correct.
The root problem is that recursive traversal identifies bytes as code when in fact it should not. This is what is termed a false positive in disassembly.
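The conflict can be detected mechanically. The sketch below records the length of every instruction that recursive traversal decodes and then reports any instruction that starts inside the bytes of another. The decoder is a toy, and the byte string (a `jz +1` over a single junk 0xE8, with an arbitrary call displacement) is an assumed encoding of the example rather than anything taken from a real binary.

```python
import struct

def decode(code, off):
    """Toy x86 decoder: returns (mnemonic, length, branch_target)."""
    op = code[off]
    if op == 0xE8 and off + 5 <= len(code):            # call rel32
        rel = struct.unpack_from("<i", code, off + 1)[0]
        return ("call", 5, off + 5 + rel)
    if op == 0x74 and off + 2 <= len(code):            # jz rel8
        rel = struct.unpack_from("<b", code, off + 1)[0]
        return ("jz", 2, off + 2 + rel)
    if op == 0xC3:                                     # ret
        return ("ret", 1, None)
    return ("db", 1, None)

def traverse(code, entry=0):
    """Recursive traversal; returns {start_offset: instruction_length}."""
    seen, work = {}, [entry]
    while work:
        off = work.pop()
        while 0 <= off < len(code) and off not in seen:
            mnem, length, target = decode(code, off)
            seen[off] = length
            if mnem == "ret":
                break
            if mnem == "jz":           # both edges look possible
                work.append(target)
            off += length
    return seen

def overlaps(insns):
    """Pairs (a, b) where instruction b starts inside instruction a."""
    return sorted((a, b) for a, la in insns.items()
                  for b in insns if a < b < a + la)

# Assumed layout: jz +1 ; .byte 0xE8 ; A: call rel32 ; ret
CODE = bytes([0x74, 0x01, 0xE8, 0xE8, 0x10, 0x00, 0x00, 0x00, 0xC3])
conflicts = overlaps(traverse(CODE))
print(conflicts)
```

Here the bogus call decoded at offset 2 (A-1) overlaps the real call at offset 3 (A). A tool could flag such pairs and defer the decision about which one is real to a later analysis.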
Perhaps disassembly should keep both sequences of disassembly, and consider them unique parts of the program. Later on, dead code elimination can be performed if there is enough data flow analysis.
Or should the data flow analysis be done during the recursive traversal? This might enable the disassembler to recognize the conditional jump as really being unconditional.
Another method of obfuscation is to replace the conditional jump with an indirect jump. This works because data flow analysis is required to determine the jump target. Static disassembly has a hard time dealing with this.
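As a very small sketch of what data flow during traversal might look like, the code below tracks a single fact, whether the zero flag is known to be set, while it walks the code. I am assuming here that the obfuscator forces the condition with `xor eax, eax` before the `jz`; that setup, the byte string, and the toy decoder are all hypothetical. A real analysis would track far more state and merge it where paths join.

```python
import struct

def decode(code, off):
    """Toy x86 decoder: returns (mnemonic, length, branch_target)."""
    op = code[off]
    if code[off:off + 2] == b"\x31\xc0":               # xor eax, eax
        return ("xor eax,eax", 2, None)
    if op == 0xE8 and off + 5 <= len(code):            # call rel32
        rel = struct.unpack_from("<i", code, off + 1)[0]
        return ("call", 5, off + 5 + rel)
    if op == 0x74 and off + 2 <= len(code):            # jz rel8
        rel = struct.unpack_from("<b", code, off + 1)[0]
        return ("jz", 2, off + 2 + rel)
    if op == 0xC3:                                     # ret
        return ("ret", 1, None)
    if op == 0x90:                                     # nop
        return ("nop", 1, None)
    return ("db", 1, None)

def traverse_with_flags(code, entry=0):
    """Recursive traversal that knows when ZF is definitely 1."""
    seen, work = {}, [(entry, False)]
    while work:
        off, zf_one = work.pop()
        while 0 <= off < len(code) and off not in seen:
            mnem, length, target = decode(code, off)
            seen[off] = mnem
            if mnem == "ret":
                break
            if mnem == "jz":
                if zf_one:             # always taken: acts like a jmp
                    work.append((target, zf_one))
                    break
                work.append((target, False))   # unknown: keep both edges
            elif mnem == "xor eax,eax":
                zf_one = True          # result is zero, so ZF = 1
            elif mnem != "nop":
                zf_one = False         # be conservative about flags
            off += length
    return sorted(seen)

# Assumed layout: xor eax,eax ; jz +1 ; .byte 0xE8 ; A: call rel32 ; ret
CODE = bytes([0x31, 0xC0, 0x74, 0x01, 0xE8,
              0xE8, 0x10, 0x00, 0x00, 0x00, 0xC3])
starts = traverse_with_flags(CODE)
print(starts)
```

With the flag tracked, the fall-through after the `jz` is never decoded, so the junk byte at offset 4 produces no false positive and the call at offset 5 is found cleanly.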
PUSH _myfunc; // setup return address. aka emulate call
PUSH _myfunc; // now to do the jump
Again, there are a number of ways of handling this. We could scan for PUSH/RET pairs and identify them as jump sites. But what if an obfuscator changes that code?
There are many such morphisms that can occur. One way to disassemble this could be to perform data flow analysis as we disassemble. That is, for each basic block found, eliminate all instructions equivalent to NOPs and then check for jumps.
This idea still wouldn’t be perfect, I imagine, but it could improve the situation considerably.
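A minimal sketch of that last idea: decode one basic block, drop anything known to be equivalent to a NOP, and then look for a PUSH immediately followed by a RET. The list of NOP equivalents (`nop` and, in 32-bit code, `mov eax, eax`), the toy decoder, and the sample bytes are assumptions for illustration; a real tool would need a much richer notion of equivalence.

```python
import struct

def decode(code, off):
    """Toy 32-bit x86 decoder: returns (mnemonic, length, immediate)."""
    op = code[off]
    if op == 0x68 and off + 5 <= len(code):            # push imm32
        imm = struct.unpack_from("<I", code, off + 1)[0]
        return ("push", 5, imm)
    if op == 0x90:                                     # nop
        return ("nop", 1, None)
    if code[off:off + 2] == b"\x89\xc0":               # mov eax, eax
        return ("nop", 2, None)                        # no effect in 32-bit code
    if op == 0xC3:                                     # ret
        return ("ret", 1, None)
    return ("db", 1, None)

def decode_block(code):
    """Straight-line decode of one basic block."""
    off, block = 0, []
    while off < len(code):
        mnem, length, imm = decode(code, off)
        block.append((off, mnem, imm))
        off += length
    return block

def find_push_ret(block):
    """After removing NOP equivalents, report (offset, target) for push/ret."""
    useful = [insn for insn in block if insn[1] != "nop"]
    return [(a[0], a[2]) for a, b in zip(useful, useful[1:])
            if a[1] == "push" and b[1] == "ret"]

# Hypothetical block: push 0x18 ; nop ; mov eax,eax ; ret
BLOCK = bytes([0x68, 0x18, 0x00, 0x00, 0x00, 0x90, 0x89, 0xC0, 0xC3])
jumps = find_push_ret(decode_block(BLOCK))
print(jumps)
```

A plain scan for adjacent PUSH/RET bytes would miss this block because of the padding in between; filtering first recovers the jump to 0x18.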