Silviocesare’s Weblog

Bug in single stepping over a popf setting the trap flag

June 4, 2008 · No Comments

The title of the post nearly sums it up.  In Win32, when single stepping over a popf that sets the trap flag, the context returned by GetThreadContext reports the trap flag as being clear.

I also installed OllyDbg for the first time today, and Olly has no problem detecting the trap flag as set.  I’m not sure how it is able to do this.

To implement a solution in my own debugger, I will have to disassemble from the instruction pointer.  If it’s a popf, I will retrieve the contents from the top of the stack and check for the trap flag being set.  If it is, I will call DbgContinue with DBG_EXCEPTION_NOT_HANDLED.
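The check described above can be sketched as follows.  This is a minimal illustration, not the debugger's actual code: it assumes the bytes at the instruction pointer and at the stack pointer have already been read from the debuggee, it only recognises the plain one-byte popf encoding (0x9D, no prefixes), and the function name is hypothetical.

```python
POPF_OPCODE = 0x9D   # one-byte encoding of popf/popfd (prefixes are ignored here)
TRAP_FLAG = 0x100    # TF is bit 8 of EFLAGS


def popf_will_set_trap_flag(code: bytes, stack: bytes) -> bool:
    """Return True if the next instruction is popf and the value it pops has TF set.

    `code` holds the bytes at the instruction pointer and `stack` holds the
    bytes at the stack pointer; both are assumed to come from the debuggee.
    """
    if not code or code[0] != POPF_OPCODE:
        return False
    if len(stack) < 4:
        return False
    # popf loads EFLAGS from the 32-bit little-endian value on top of the stack.
    eflags = int.from_bytes(stack[:4], "little")
    return bool(eflags & TRAP_FLAG)
```

If the function returns True, the debugger knows the following single step exception belongs to the program rather than to its own tracing.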

→ No Comments · Categories: C Bugs · Reverse Engineering

Single stepping through NtContinue, faking pushf, and trapping int1

June 3, 2008 · No Comments

In my bid to emulate Win32 SEH in my emulator, I needed to single step through exception handling in order to test it.  My first approach was to just single step through the program by setting the trap flag in the eflags register.  I had some problems with this.

One of the problems is single stepping through NtContinue.  Looking at the code, it does a sysenter which transfers control to the kernel.  The kernel then sets the new context to have somewhere to return to.  I tried single stepping past the sysenter; the problem, however, is that a single step exception isn’t raised until execution reaches the 2nd instruction at the new context.eip.  Bizarre.

Also bizarre is the fact that NtContinue changes Dr6 and Dr7.  I had some bad code that just checked for Dr6 being non-zero to identify a breakpoint exception, but in fact, while the lower 4 bits were 0 (1 bit for each possible breakpoint register Dr0-Dr3), the upper bits were trashed.  Dr7 was also changed.  Perhaps it’s even conceivable that there is information leakage from the kernel.  Dunno for sure.

The only solution for trapping on the first instruction following NtContinue was setting a breakpoint on context.eip.  This requires parsing NtContinue’s argument (a CONTEXT structure), which is fairly straightforward.  Set a breakpoint on NtContinue, then parse the CONTEXT and set a new breakpoint on context.eip.
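Parsing the argument might look like the sketch below.  It rests on several assumptions that are not stated in the post: the target is 32-bit x86, where the Eip field sits at offset 0xB8 of the CONTEXT structure; the breakpoint is hit at the very first instruction of NtContinue, so the first argument is at [esp+4]; and `read_memory` is a hypothetical callback (in a real debugger it could be backed by ReadProcessMemory).

```python
import struct

CONTEXT_EIP_OFFSET = 0xB8  # assumed offset of Eip in the 32-bit x86 CONTEXT


def ntcontinue_target(read_memory, esp: int) -> int:
    """Return the Eip stored in the CONTEXT passed to NtContinue.

    `read_memory(address, size)` is a hypothetical callback returning `size`
    bytes of debuggee memory.  At function entry [esp] is the return address
    and [esp+4] is the first argument, the pointer to the CONTEXT.
    """
    (context_ptr,) = struct.unpack("<I", read_memory(esp + 4, 4))
    (eip,) = struct.unpack("<I", read_memory(context_ptr + CONTEXT_EIP_OFFSET, 4))
    return eip
```

The returned address is where the second breakpoint would be placed.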

How do we get the breakpoint for NtContinue?  We can do a couple of things.

1) resolve the symbol, and place a breakpoint on it

2) look at the return address of the exception handler, which eventually ends up taking you into NtContinue.

I chose 2).

Both these solutions have problems if the exception handler decides to implement its own version of NtContinue.  Perhaps the best solution is to emulate down to the native API.  I didn’t implement emulation of the native API, so I’m stuck for the time being with the partial solution.

In the process of partially emulating VMProtect, I came across a few pushf and popf instructions.  These instructions push and pop the eflags register.  Initially I thought the packer might be checking to see if the trap flag was set.  It turns out it wasn’t doing this check, but I nonetheless implemented in my tracer/debugger the ability to process pushf instructions by modifying the stack contents to have the trap flag clear.  This hack should allow my debugger to work on binaries that include this type of anti-debugging.
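The stack fix-up after a pushf amounts to clearing one bit in the pushed value.  A minimal sketch, assuming the four bytes on top of the stack have been read after the pushf executed and will be written back by the caller; the function name is hypothetical.

```python
TRAP_FLAG = 0x100  # TF is bit 8 of EFLAGS


def hide_trap_flag(pushed: bytes) -> bytes:
    """Return the pushed EFLAGS dword with the trap flag cleared.

    `pushed` is the 32-bit little-endian value that pushf left on top of the
    stack; the caller is expected to write the result back to the debuggee.
    """
    eflags = int.from_bytes(pushed[:4], "little")
    return (eflags & ~TRAP_FLAG).to_bytes(4, "little")
```

For example, a pushed value of 0x346 (TF set because the debugger is stepping) becomes 0x246, which is what the program would have seen without a debugger.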

Also implemented in my emulator is processing of explicit int1 instructions in the code.  Before DbgContinue is called, I check the current instruction pointer, and if it’s an int1 instruction, I keep note of this and allow the application to process the int1 by itself.  Naturally, I have to set a breakpoint in the exception handler if I still want to maintain control.
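Recognising an explicit int1 at the instruction pointer can be sketched as a byte comparison.  The post does not say which encoding it looks for, so this sketch assumes both the one-byte opcode 0xF1 (INT1, also known as ICEBP) and the two-byte form 0xCD 0x01 (`int 1`) are of interest, and it ignores prefixes.

```python
def is_int1(code: bytes) -> bool:
    """Return True if the bytes at the instruction pointer encode an int1.

    Two encodings are checked: the one-byte 0xF1 opcode and the two-byte
    `int imm8` form with an immediate of 1.
    """
    return code[:1] == b"\xf1" or code[:2] == b"\xcd\x01"
```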

→ No Comments · Categories: C Programming · Reverse Engineering · Windows

Fast string operations, Was x86 CPU bug in rep movsb

May 23, 2008 · No Comments

UPDATE:  This isn’t a bug after all.  Aspect provided documentation of what is actually occurring.

It’s been a feature since Pentium Pro computers to do ‘fast string’ or block operations.  A block operation (eg, movsb) of 64 bytes is performed if ecx >= 64, if edi is aligned to an 8 byte boundary, and if esi and edi are not both in the same cache line (64 byte block).  Otherwise, it performs single operations.

This seems to have resolved my emulation problems :-)
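The conditions in the update can be turned into a small model of what a single-stepping debugger would observe.  This is only a sketch of the rule as described above (with the direction flag assumed clear), not a statement of how any particular CPU behaves; the function names are hypothetical.

```python
CACHE_LINE = 64  # bytes per cache line, as given in the update above
BLOCK = 64       # bytes moved by one fast string block operation


def uses_fast_string(ecx: int, esi: int, edi: int) -> bool:
    """Apply the three conditions from the update to the current registers."""
    same_line = (esi // CACHE_LINE) == (edi // CACHE_LINE)
    return ecx >= BLOCK and edi % 8 == 0 and not same_line


def ecx_after_each_step(ecx: int, esi: int, edi: int) -> list:
    """Return the value of ecx seen after every single step of rep movsb."""
    seen = []
    while ecx > 0:
        moved = BLOCK if uses_fast_string(ecx, esi, edi) else 1
        ecx -= moved
        esi += moved
        edi += moved
        seen.append(ecx)
    return seen
```

With ecx=65 and well separated, aligned buffers, `ecx_after_each_step(65, 0x1000, 0x2000)` gives `[1, 0]`: one block of 64 bytes and one single byte, which matches the two steps reported in the original post.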

While unpacking MEW in my emulator, I came across an interesting bug.  Single stepping through rep movsb with ecx=65 completes the instruction in 2 steps.

movsb copies a byte from the memory pointed to by esi into the memory pointed to by edi.  The rep part of the instruction repeats the movsb ecx times.  It does this by iteratively decrementing the ecx register until it is 0.

On my computer, an old P4, single stepping rep movsb with ecx=65 steps from ecx=65 to ecx=1.  This is incorrect (I presume); it should single step through every decrement of ecx.

nemo courteously tested this bug on his own PC, and reported that it single stepped through every decrement of ecx.  This bug is probably specific to my CPU type.

→ No Comments · Categories: Uncategorized

cpu bug, repne changes status flag in scasb

May 21, 2008 · No Comments

Another CPU bug uncovered while testing my emulator.  I came across a repne scasb while emulating the win32 version of upx.  The logic of scasb (scan string), to paraphrase the Intel manuals, is:


SRC = dereference(edi)
temp = al -  SRC
SetStatusFlags(temp)
update_edi

In the code I ran across, %al was set to 0, the byte at (%edi) was 70 (decimal), and %ecx was large.  Following the operation, the carry flag was cleared.  This is incorrect; the carry flag should be set (0 - 70 sets carry).

I was unsure if my understanding of carry was wrong, so I tried 0 - 70 in a sub.  Carry was set as expected.  scasb’s logic is to perform a temporary subtraction of %al-(%edi) and set the status flags using the temporary result as explained earlier.

When scasb was performed in isolation with the same test case, carry was set.  It seems that adding the repne prefix to the scasb changes the carry flag to an incorrect result.
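The flag logic paraphrased earlier can be modelled in a few lines, together with the documented behaviour of the repne prefix (repeat while ecx is non-zero and the zero flag is clear).  The sketch assumes the direction flag is clear, and the memory contents in the usage note are made up.  It is offered as a way to check one possible explanation that the post does not confirm: if the prefixed instruction was allowed to run to completion rather than for a single iteration, the final flags would come from the last byte compared, not from the byte 70.

```python
def scasb(al: int, byte: int) -> dict:
    """Flags from the temporary subtraction al - byte (8-bit operands)."""
    temp = (al - byte) & 0xFF
    return {"CF": al < byte, "ZF": temp == 0}


def repne_scasb(al: int, memory: bytes, edi: int, ecx: int):
    """Model repne scasb: repeat while ecx != 0 and the last compare was unequal.

    Returns (flags, edi, ecx).  `flags` is None when ecx starts at 0, because
    no comparison is made and the real flags would be left unchanged.
    """
    flags = None
    while ecx > 0:
        flags = scasb(al, memory[edi])
        edi += 1
        ecx -= 1
        if flags["ZF"]:
            break
    return flags, edi, ecx
```

`scasb(0, 70)` reports carry set, as the sub experiment showed.  `repne_scasb(0, bytes([70, 65, 0, 9]), 0, 100)` stops at the zero byte, where the last comparison is 0 - 0 and carry is clear.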

→ No Comments · Categories: Reverse Engineering

gdb leaves file descriptors open in debuggee

May 13, 2008 · 6 Comments

I have my emulator running reasonably successfully on upx now.  It’s actually an auto unpacker, and identifies when the program is unpacked by monitoring execution of previously written memory.  In the process of emulating file I/O, I came across a particular bug in gdb.
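The unpacking heuristic mentioned above (execution reaching memory that the program itself wrote earlier) can be sketched with a set of written addresses.  The class and method names are hypothetical, and a real emulator would more likely track pages or ranges than individual bytes.

```python
class UnpackMonitor:
    """Track written addresses and flag execution of previously written memory."""

    def __init__(self):
        self.written = set()

    def on_write(self, address: int, size: int = 1) -> None:
        # Record every byte touched by an emulated store.
        self.written.update(range(address, address + size))

    def on_execute(self, address: int) -> bool:
        # True means the instruction about to run was produced by the program
        # itself, which is taken as the sign that unpacking has finished.
        return address in self.written
```

The emulator would call `on_write` for each store and `on_execute` before each instruction, stopping (and dumping memory) the first time `on_execute` returns True.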

The file descriptor returned from an open call inside the debuggee was 6.  I was expecting 3.

stdin=0, stdout=1, stderr=2

gdb must be using file descriptors 3, 4 and 5, and must have forgotten to close them before calling execve.

I’m not sure what the descriptors are used for.  Anyone care to take a look?

In the best case scenario, this bug can be used as another test to see if a debugger is present, and in the worst case, if these file descriptors were used for control, *gasp* control gdb?  Probably they aren’t used for anything important, but I haven’t looked any further.
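The debugger-presence test hinted at above relies on open returning the lowest unused descriptor.  A minimal sketch for a POSIX system follows; the function names and the scan limit of 64 are arbitrary choices for illustration, and descriptors above 2 can of course be inherited from parents other than gdb.

```python
import os


def next_descriptor() -> int:
    """Return the descriptor number the next open call would receive."""
    fd = os.open(os.devnull, os.O_RDONLY)
    os.close(fd)
    return fd


def inherited_descriptors(limit: int = 64) -> list:
    """List descriptors above stderr that are already open in this process."""
    open_fds = []
    for fd in range(3, limit):
        try:
            os.fstat(fd)  # fails with EBADF when the descriptor is not open
        except OSError:
            continue
        open_fds.append(fd)
    return open_fds
```

A program started from a plain shell would normally expect `next_descriptor()` to be 3 at startup; a higher value, or a non-empty `inherited_descriptors()`, suggests the parent left something open.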

→ 6 Comments · Categories: C Bugs · Reverse Engineering

CPU Bug x86 shl behaviour sets overflow flag

May 9, 2008 · No Comments

I’ve been writing an x86 emulator, and to debug it, I ran it on a P4 computer in parallel with a debugger on a target program (a upx packed binary).  Well.. I got to shl $8, %eax where eax = 0x00ffffff.

The Intel documentation says that the overflow flag is only changed for 1 bit shifts.  Surprisingly, in the 8 bit shift, the overflow flag became set.  In a 7 bit or 9 bit shift of the same value, the overflow flag remains clear (or perhaps unchanged).
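One rule that happens to reproduce all three observations is the rule Intel documents for 1 bit left shifts (overflow = most significant bit of the result XOR carry), applied to the final result of the multi-bit shift.  This is only a hypothesis that fits the 7, 8 and 9 bit cases above; the flag is documented as undefined for these counts, and the sketch below is a model, not a description of the hardware.

```python
def shl32(value: int, count: int):
    """Model a 32-bit shl and return (result, CF, OF).

    CF is the last bit shifted out.  OF uses the documented 1 bit rule
    (MSB of result XOR CF) for every count, which is the hypothesis under test.
    """
    count &= 31  # the CPU masks the shift count to 5 bits
    if count == 0:
        return value, None, None  # flags are not affected
    wide = value << count
    result = wide & 0xFFFFFFFF
    cf = (wide >> 32) & 1
    msb = result >> 31
    return result, cf, msb ^ cf
```

For eax = 0x00ffffff the model gives OF=1 for a shift of 8 and OF=0 for shifts of 7 and 9.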

I’ve been googling to see other reports of this undocumented behaviour, but either it’s not out there, or more likely my googling skills are poor.  I couldn’t find a reference.

Anyone got more information on this?

[Update:  I have had a report from one person who said the behavior varied between setting and clearing the flag depending on the cpu.]

→ No Comments · Categories: Reverse Engineering

Merging basic blocks to deobfuscate non-contiguous control flow

May 1, 2008 · 2 Comments

In some binaries, basic blocks may be connected only by jumps.  These basic blocks may also be non-contiguous in the file, ie scattered throughout the binary.

In cases like this, if you’re looking at the disassembly, you need to constantly jump around the image to follow the logical order of the control flow.  When the control flow is graphed, it appears logically linear, but when reading the code, it sometimes helps to go for the older text dump of the disassembly.

The way I implemented this was to construct a control flow graph of each procedure, then merge each basic block with its predecessor iff it has only one predecessor and that predecessor has only one successor (the basic block we are looking at merging).  To dump the disassembly, a recursive approach is taken for each basic block: dump the assembly representing the current basic block, then the next linear basic block (applied recursively), and then the branched basic block (if it exists, also applied recursively).
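The merge rule can be sketched on a control flow graph held as a dictionary of successor lists.  This is an illustration of the rule as stated, not the disassembler's code: block names stand in for real basic blocks, every block is assumed to appear as a key, and the entry block is never merged into a predecessor.

```python
def merge_blocks(successors: dict, entry) -> list:
    """Group basic blocks into chains that can each be merged into one block.

    `successors` maps a block to the list of its successor blocks.  A block is
    merged into its predecessor iff it has exactly one predecessor and that
    predecessor has exactly one successor.
    """
    preds = {block: [] for block in successors}
    for block, succs in successors.items():
        for succ in succs:
            preds.setdefault(succ, []).append(block)

    def merges_into_pred(block) -> bool:
        if block == entry:
            return False
        p = preds[block]
        return len(p) == 1 and successors.get(p[0], []) == [block]

    chains = []
    for block in successors:
        if merges_into_pred(block):
            continue  # emitted as part of its predecessor's chain
        chain = [block]
        while True:
            succs = successors.get(chain[-1], [])
            if len(succs) == 1 and merges_into_pred(succs[0]) and succs[0] not in chain:
                chain.append(succs[0])
            else:
                break
        chains.append(chain)
    return chains
```

For a graph where A jumps to B, B jumps to C, and C branches to D or E, which both reach F, the blocks A, B and C collapse into one chain while D, E and F stay separate.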

I made these improvements to my disassembler, so it prints the disassembly in logical order, following the jumps.  In at least one piece of malware out of a sample of about ten, this deobfuscation proved successful, and over 800 basic blocks were merged in an object with around 14000 instructions.  The malware samples I’ve been using have come from http://www.offensivecomputing.net/

I’m in the process of looking at more malware samples to see how common this type of obfuscation is.  If anyone can suggest names of malware samples for me to look at and run my disassembler on, that would be great.

Probably more useful than the deobfuscator I’ve described is an automatic unpacker.  Most of the malware is packed, and in fact, the disassembly is non-trivial since indirect jumps and calls seem common.  This might be something that I will work on in the future.

In at least one other malware sample I have, dead code is common.  That is, registers are assigned, modified, then reassigned new values (without making any further use of the original values), making the older assignments dead.  I would like to automate removing this, and liveness analysis should be able to identify these cases; however, I have yet to implement dataflow analysis in my disassembler.
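For a single straight-line block, the liveness idea can be sketched as one backward pass.  This is a simplified illustration: each instruction is reduced to a hypothetical (destination, sources) pair, flags and memory side effects are ignored, and a full analysis would iterate over the whole control flow graph.

```python
def dead_assignments(instructions: list, live_out=frozenset()) -> list:
    """Return indices of instructions whose result is never used.

    `instructions` is a list of (dest, sources) pairs, eg ("eax", ["eax"])
    for an add to eax.  `live_out` names the registers still needed after
    the block.  Instructions are assumed to have no other side effects.
    """
    live = set(live_out)
    dead = []
    for index in range(len(instructions) - 1, -1, -1):
        dest, sources = instructions[index]
        if dest not in live:
            # The value is overwritten or ignored before any read: dead.
            # Its sources do not become live, since it could be removed.
            dead.append(index)
            continue
        live.discard(dest)
        live.update(sources)
    return sorted(dead)
```

With the sequence "eax = 1; eax = eax + 2; eax = 5; ebx = eax" and only ebx live afterwards, the first two instructions are reported as dead, which is the assign, modify, reassign pattern described above.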

→ 2 Comments · Categories: Reverse Engineering

retn $0xhh consistency across function tails

April 13, 2008 · 1 Comment

Some procedures, following one calling convention, simply return (ret) without modifying the stack pointer (they expect the caller to perform stack correction).  In another calling convention, procedures (callees) modify the stack using retn $0xhh.

Yesterday I made some changes to my disassembler, so that it would look at the stack correction in procedures.  But there is a possibility of inconsistency when there is more than one control path that exits the procedure.  Each separate path could have a different stack correction.
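The consistency check can be sketched by decoding the return instruction at the end of each exit path and comparing the corrections.  The sketch assumes the exit instructions have already been located and only handles the near return encodings 0xC3 (ret) and 0xC2 imm16 (retn $0xhh); the function names are hypothetical.

```python
def ret_correction(code: bytes) -> int:
    """Return the number of bytes a near return removes from the stack."""
    if code[:1] == b"\xc3":
        return 0  # plain ret: the caller corrects the stack
    if code[:1] == b"\xc2" and len(code) >= 3:
        return int.from_bytes(code[1:3], "little")  # retn imm16
    raise ValueError("not a near return instruction")


def stack_corrections(exit_instructions: list) -> set:
    """Collect the distinct stack corrections over all exits of a procedure."""
    return {ret_correction(code) for code in exit_instructions}
```

A procedure is consistent when the returned set has exactly one element; `stack_corrections([b"\xc3", b"\xc2\x04\x00"])` yields two values, 0 and 4, which is the kind of inconsistency being counted here.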

I ran a test involving all the binaries on Windows XP Home Edition in \windows\system32\ that my disassembler can handle (278 in total).  It doesn’t yet process DLLs, just executables.  The results were interesting.  9 binaries had inconsistencies in some of their procedures.

  • The majority of binaries had a failed analysis of a procedure using an idiom where a call is issued and the following instruction is an int3.  This apparently is a call to a procedure that should not return, or if it does, it should raise a debug exception (thanks to nevar for telling me this).
  • A couple of binaries failed in procedures that were using the frame pointer (for local variables and procedure parameters), but did not set up or destroy a new frame using the standard prologue and epilogue.  I suspect that these are nested functions.  However, Cygwin’s compiler fails to generate such code.  I should experiment with other compilers.

For comparison, I ran several of the failed binaries through IDA, which fails to perform SP (stack pointer) analysis.  Note that the procedures in question only have one entry point.

My question then, are these code sequences correct?  Is it ever valid for a compiler to generate code that performs stack correction differently, depending on the code path taken in the function tail?

My suspicion is that these are compiler bugs… but maybe someone can answer definitively.

→ 1 Comment · Categories: Uncategorized

ClamAV.

February 19, 2008 · 7 Comments

I lied when I said I would write in a day the details of the ClamAV bug published by idefense last week.

ClamAV was acquired by Sourcefire, which is the software company that is responsible for the Snort IDS.

ClamAV code needs a fair amount of refactoring to be maintainable.  The current sources are quite disturbing.  I’m not surprised there have been a number of bugs posted against it in the past 6 months.  Mind you, the ClamAV website doesn’t seem to keep all the advisories that have been posted against it on its list of security advisories.

Sourcefire is now developing ClamAV, and is obviously working hard to get its acquisition (the source code) up to standard.

→ 7 Comments · Categories: Uncategorized

IDefense Updates Continued..

February 12, 2008 · 1 Comment

An IDefense advisory for the vulnerability I submitted should be released tomorrow.  I’ll make a posting tomorrow with more details of the bug.

→ 1 Comment · Categories: Uncategorized