Removing semantic NOP’s from Malware

A common obfuscation technique used by malware is by randomly inserting a sequence of instructions that have no other effect on the functionality of the program.  This technique is additionally used in polymorphic and metamorphic malware.

There are a couple of interesting papers that try to address this by identifying some classes of “semantic nops” in the malware and removing them.  The basic model is to identify a particular sequence or set of instructions to examine, typically generated by bruteforce of all possible combinations in each basicblock, and then ask an oracle if its a semantic nop.

There are two papers which are probably of interest.  Malware Normalization http://www.cs.wisc.edu/wisa/papers/tr1539/tr1539.pdf, is an in interesting paper and has several techniques to canonicalize a binary so that it can be more effective when used scanning for in antivirus.  Another paper of interest is http://www.eecs.berkeley.edu/~sseshia/pubdir/oakland05.pdf which is a little more formal (and thus for me, harder to understand), on Semantics-Aware Malware Detection.  This paper performs static analysis on the malware and tries to match it to a template signature which describes particular behaviours of the malware such as its memory access.

The semantic nop oracle that can tell you if a sequence of code is a nop or not, is an SMT decision procedure (http://en.wikipedia.org/wiki/Satisfiability_Modulo_Theories).  SMT is an extension of SAT.  You give it a list of constraints, such as equalities and arithmetic, and then you can query the decision procedure to see if certain conditions can possibly be true or false.  Counter examples can be given in return to prove the results of the decision procedure.

To determine if a sequence of instructions is a semantic NOP, you ask the decision procedure if the set of registers that are modified by the instructions, result in the same values as what they were before hand.  If they are the same, then it’s clearly a NOP.  You also need to check that all memory written to is the same before and after as well.  Care should be taken with the status flags, and the papers mentioned don’t talk in depth about this. Perhaps checking for a lack of liveness in the flags or that the flags are the preserved at the end of a potential nop sequence. All these types of checks are only applied within basicblocks.  Detecting semantic nops as code that branches seems a much harder problem.

This appears to fine in most cases, but what those papers I mentioned earlier fail to describe is the case of exceptions being generated from memory access.  Consider the following code, which for all purposes appears to be a semantic NOP.

mov temp_mem, %eax; # potential exception
inc %eax
dec %eax
mov %eax,temp_mem

Note: status flag are not live at the end of the above code sequence making it eligable as a nop, which means there is a subsequent operation that overwrites the flags before using them.

That code indeed is a semantic NOP, _except_ when an exception occurs from accessing temp_mem.  If an exception occurs, then eax is clearly different from its starting condition when control is transferred to the exception handler.  This opens up a can of worms I think.

If we only look for semantic nops between memory accesses, we will undoubtedly miss a large number of cases that really are nops.  But if we ignore the potential for exceptions, then removing those misidentified nops will result in an unsound transformation.

Another problem was suggested to me, in terms of exceptions resulting from hardware break points. Removing a sequence of code, or applying a transformation to it, may not be sound.

PS.  I implemented a basic and incomplete version of dynamic taint analysis, and am using it to track the return value of GetProcAddress, so I can reconstruct the import table after auto unpacking.  It works fairly nicely.  And I implemented most of the code to transform my IR into STP constraints (STP is an opensource SMT decision procedure).

8 responses to “Removing semantic NOP’s from Malware

  1. Just ran across this from phrack, hadn’t seen you since you popped up in nologin some months ago. You are truly gifted, which always causes problems (no gift is free). It’s amazing however, a few months later and its like you never left (quality of work wise).

    On a side note, if you have any idea how to get ahold of mandragore, he disappeared a year or two ago also.

  2. hey it’s nice to read about it, so i can make my own code more “aware” of removing nop sequences🙂

  3. Why did you stop posting ?:/

  4. I have not stopped posting!🙂

    For the past couple months I have been working on a prototype for a flowgraph based malware classification tool using a novel graph matching algorithm. I have finished the prototype and am currently working on writing a paper for an Australian academic conference. This has taken up most of my time. I am unable to publish the details of the prototype or algorithm until the paper is public, hopefully later this year.

    I hope that explains why I’ve not posted during this time.

  5. Neat.

    Send us a link once the paper becomes public.

    • I will definately put the paper online after the conference – presuming my submission is accepted. The conference is in January 2010.

  6. out of interest, what is this graph matching algorithm you speak of?

    • Hi. I am unable to comment on the graph matching algorithm I’m using, as it is central to a paper I am currently trying to publish. Sorry😦

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s