
Removing semantic NOPs from Malware

A common obfuscation technique used by malware is to randomly insert sequences of instructions that have no effect on the functionality of the program.  This technique is also used in polymorphic and metamorphic malware.

There are a couple of interesting papers that try to address this by identifying some classes of “semantic nops” in the malware and removing them.  The basic model is to pick a particular sequence or set of instructions to examine, typically generated by brute force over all possible combinations within each basic block, and then ask an oracle if it is a semantic nop.
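To make that concrete, here is a rough Python sketch of the search loop (entirely hypothetical, not taken from either paper): slide a window over each basic block and hand each candidate window to the oracle, which I cover below.

def find_semantic_nops(block, is_semantic_nop, max_len=8):
    # block: list of instructions; is_semantic_nop: the oracle query
    hits = []
    for start in range(len(block)):
        for end in range(start + 1, min(start + max_len, len(block)) + 1):
            if is_semantic_nop(block[start:end]):   # ask the oracle
                hits.append((start, end))
    return hits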

There are two papers which are probably of interest.  Malware Normalization (http://www.cs.wisc.edu/wisa/papers/tr1539/tr1539.pdf) is an interesting paper with several techniques to canonicalize a binary so that antivirus scanning over it can be more effective.  Another paper of interest, on Semantics-Aware Malware Detection, is http://www.eecs.berkeley.edu/~sseshia/pubdir/oakland05.pdf, which is a little more formal (and thus, for me, harder to understand).  This paper performs static analysis on the malware and tries to match it against a template signature describing particular behaviours of the malware, such as its memory accesses.

The semantic nop oracle that can tell you whether a sequence of code is a nop or not is an SMT decision procedure (http://en.wikipedia.org/wiki/Satisfiability_Modulo_Theories).  SMT is an extension of SAT.  You give it a list of constraints, such as equalities and arithmetic, and you can then query the decision procedure to see if certain conditions can possibly be true or false.  Counterexamples can be returned to back up the results of the decision procedure.
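As a toy example, here is a query using the Z3 Python bindings (the z3-solver package) — Z3 rather than STP purely because it is convenient here, and the variables and constraints are made up:

from z3 import BitVec, Solver, sat

x = BitVec('x', 32)
y = BitVec('y', 32)
s = Solver()
s.add(x + y == 10, x > y)    # the constraints
if s.check() == sat:         # can the constraints all hold at once?
    print(s.model())         # a concrete witness assignment for x and y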

To determine if a sequence of instructions is a semantic NOP, you ask the decision procedure whether the registers modified by the instructions end up with the same values they held beforehand.  If they do, then it’s clearly a NOP.  You also need to check that all memory written to is the same before and after.  Care should be taken with the status flags, and the papers mentioned don’t talk in depth about this.  One option might be to check for a lack of liveness in the flags, or that the flags are preserved at the end of a potential nop sequence.  All these types of checks are only applied within basic blocks.  Detecting semantic nops in code that branches seems a much harder problem.
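Here is what the nop query itself might look like, again sketched with Z3 rather than STP, for just an inc %eax; dec %eax pair, ignoring memory and flags entirely: encode the sequence symbolically and ask whether the final value can ever differ from the initial one.  If that query is unsatisfiable, the sequence is a nop.

from z3 import BitVec, Solver, unsat

eax0 = BitVec('eax0', 32)   # eax before the candidate sequence
eax1 = eax0 + 1             # inc %eax (bitvectors wrap, like the hardware)
eax2 = eax1 - 1             # dec %eax
s = Solver()
s.add(eax2 != eax0)         # search for a counterexample
print("semantic nop" if s.check() == unsat else "not a nop")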

This appears to work fine in most cases, but what those papers I mentioned earlier fail to describe is the case of exceptions generated by memory accesses.  Consider the following code, which to all appearances is a semantic NOP.

mov temp_mem, %eax   # load temp_mem into %eax (potential exception)
inc %eax
dec %eax
mov %eax, temp_mem   # store the same value back (potential exception)

Note: the status flags are not live at the end of the above code sequence, making it eligible as a nop; that is, some subsequent operation overwrites the flags before anything uses them.  The same has to hold for %eax, since the load clobbers it with temp_mem’s contents.

That code indeed is a semantic NOP, _except_ when accessing temp_mem generates an exception.  If the final store faults, control is transferred to the exception handler with eax clearly different from its starting condition, and liveness computed along the normal path says nothing about what the handler reads.  This opens up a can of worms I think.

If we only look for semantic nops in sequences that contain no memory accesses, we will undoubtedly miss a large number of cases that really are nops.  But if we ignore the potential for exceptions, then removing those misidentified nops results in an unsound transformation.

Another problem was suggested to me: exceptions resulting from hardware breakpoints.  If a breakpoint lands inside the sequence, removing that code, or applying a transformation to it, may likewise not be sound.

PS.  I implemented a basic and incomplete version of dynamic taint analysis, and am using it to track the return value of GetProcAddress so I can reconstruct the import table after auto unpacking.  It works fairly nicely.  And I implemented most of the code to transform my IR into STP constraints (STP is an open source SMT decision procedure).


Interval Analysis and Register Tracking (Taint Analysis)

It’s been a busy day.  http://www.phrack.org/issues.html?issue=64&id=8 describes several methods of finding bugs using static analysis.  At one point it talks about path-insensitive interval analysis.  Interval analysis tries to discover the potential range, or interval(s), a particular variable might take.  Path-insensitive means that the interval it generates for a particular variable is quite conservative: it contains values that would be excluded if specific execution paths were taken into account.
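A tiny sketch of why path insensitivity is conservative (hypothetical Python, intervals as (lo, hi) pairs, no overflow handling):

def join(a, b):    # merge the intervals flowing in from two paths
    return (min(a[0], b[0]), max(a[1], b[1]))
def add(a, b):     # transfer function for x + y
    return (a[0] + b[0], a[1] + b[1])
x = join((1, 1), (100, 100))   # x = 1 on one path, x = 100 on the other
print(add(x, (5, 5)))          # (6, 105): also covers 7..104, which no real path produces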

Register Tracking is what Zynamics calls what appears to be taint analysis; http://www.zynamics.com/downloads/csw09-slides.pdf talks about it.  It starts by tracking a particular register, and then all other registers that are influenced by it.  I think this would be the basis for a (forward) program slice, but my knowledge of program slicing is pretty limited.  Slicing is all about finding the instructions that are dependent on, or a result of, a particular source instruction.
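My mental model of it, as a hypothetical sketch (instructions abstracted to (dest, sources) pairs, nothing like REIL’s real encoding):

def track(seed, instructions):
    tainted, sliced = {seed}, []
    for dest, srcs in instructions:
        if tainted & set(srcs):        # a source is influenced by the seed
            tainted.add(dest)
            sliced.append((dest, srcs))
        elif dest in tainted:          # overwritten from untainted sources
            tainted.discard(dest)
    return tainted, sliced

# mov ebx, eax ; add ecx, ebx ; mov eax, 0
print(track('eax', [('ebx', ['eax']), ('ecx', ['ecx', 'ebx']), ('eax', [])]))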

The Zynamics slides above also talk about MonoREIL, which implements a dataflow analysis framework.  Most dataflow analysis is founded on formal constructs such as monotonic functions and partially ordered lattices.  In the original dataflow analysis model, each node has a set of dataflow equations, and the analysis tries to arrive at a solution for those equations.  To guarantee a solution, there are specific restrictions on the types of equations that can be analysed: they need to be monotonic, and the lattice they operate over should be finite (excuse my simple and probably incorrect explanations).  The solution is typically derived through an iterative fixed-point algorithm.  The fixed point part basically says that after some number of iterations of applying the dataflow equations, the result stops changing, and this fixed point is the solution you want.  The monotonicity and the finite ceiling mean that a fixed point solution always occurs.
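For what it’s worth, the iterative scheme can be sketched in a few lines of Python (a made-up forward “may” analysis in the style of reaching definitions; nodes, preds, gen and kill are hypothetical inputs).  The out-sets only ever grow, and they live in a finite lattice of sets, so the loop must terminate.

def fixpoint(nodes, preds, gen, kill):
    out = {n: set() for n in nodes}
    changed = True
    while changed:                     # iterate until a fixed point
        changed = False
        for n in nodes:
            in_n = set().union(*[out[p] for p in preds[n]]) if preds[n] else set()
            new = gen[n] | (in_n - kill[n])
            if new != out[n]:
                out[n], changed = new, True
    return out
nodes, preds = ['a', 'b'], {'a': [], 'b': ['a']}
print(fixpoint(nodes, preds, {'a': {'d1'}, 'b': set()}, {'a': set(), 'b': set()}))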

I’m not the strongest in the theory, but nonetheless, as I posted sometime last week or so, I implemented a dataflow analysis framework on top of my IR (intermediate representation).  I used this framework to implement classic dataflow analyses such as reaching definitions and live variable analysis.  Today, I implemented interval analysis and register tracking.  I’m not convinced the interval analysis is working 100%, so I’ll have to spend some more time looking over the results.  Interval analysis is not well suited to many of my IR instructions, such as bitwise arithmetic (logical operations, shifts and rotates), but it is still worth applying in any case.

Zynamics also describes what they call “Range Tracking” to find negatively indexed arrays.  This is something I haven’t grasped yet, but I’ll certainly go over it again and try an implementation in the future.

A rough implementation of the IR flowgraphs in the auto unpacker

Title says most of it.  I have now integrated into my emulator the IR component of the static analysis that I will require to perform the control flow classification in my Masters research.  To help me debug, I generated some graphviz output of the callgraph and control flow graphs.  The following is a control flow graph of one of the procedures in the callgraph after auto unpacking the binary hostname.exe, which was packed by UPX: http://silvio.cesare.googlepages.com/cfg.png .
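The graphviz output itself is trivial to produce; something along these lines (a hypothetical sketch, one node per basic block, not my actual code):

def to_dot(blocks, edges):   # blocks: {addr: label}, edges: [(src, dst)]
    lines = ['digraph cfg {']
    for addr, label in blocks.items():
        lines.append('  "%#x" [shape=box, label="%s"];' % (addr, label))
    for src, dst in edges:
        lines.append('  "%#x" -> "%#x";' % (src, dst))
    lines.append('}')
    return '\n'.join(lines)

print(to_dot({0x401000: 'entry', 0x401010: 'exit'}, [(0x401000, 0x401010)]))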

The IR implementation is still somewhat buggy.  It doesn’t generate everything required for the status flags handling in x86, and I have also noticed some inconsistency in how I use the StoreMem instruction.  But nonetheless, it should give an idea of what I’m doing.

I generate the callgraph and control flow graph from the IR.  This differs from some implementations (like ERESI, I believe), which build these graphs from the native assembly.  There is some redundancy between the disassembler and building the graphs from the IR, but this system allows me to use linear sweep disassembly when I need to.  I currently use linear sweep disassembly to cover any areas that are not reached by recursive traversal.  This technique is referred to as speculative disassembly.
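The coverage strategy looks roughly like this in Python (a hypothetical sketch; decode(addr) stands in for a real x86 decoder and returns an instruction length plus its successor addresses):

def disassemble(entry, size, decode):
    starts, covered, worklist = set(), set(), [entry]
    while worklist:                        # recursive traversal from the entry
        addr = worklist.pop()
        if addr in starts or not 0 <= addr < size:
            continue
        length, succs = decode(addr)
        starts.add(addr)
        covered.update(range(addr, addr + length))
        worklist.extend(succs)             # fallthrough and branch targets
    addr = 0
    while addr < size:                     # linear sweep over leftover bytes
        if addr in covered:
            addr += 1
            continue
        length, _ = decode(addr)           # speculative: this may decode data
        starts.add(addr)
        covered.update(range(addr, addr + length))
        addr += length
    return starts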