Category Archives: Reverse Engineering

Bug in single stepping over a popf setting the trap flag

The title of the post nearly sums it up.  In win32, after single stepping over a popf that sets the trap flag, GetThreadContext reports the trap flag as being clear.

I also installed OllyDbg for the first time today, and Olly has no problem detecting that the trap flag is set.  I’m not sure how it is able to do this.

To implement a solution in my own debugger, I will have to disassemble from the instruction pointer.  If it’s a popf, I will read the value it is about to pop from the top of the stack and check whether the trap flag is set.  If it is, I will call DbgContinue with DBG_EXCEPTION_NOT_HANDLED.
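The check described above could look roughly like the following sketch.  It is not taken from the debugger itself: the function name and its two inputs (the bytes at the instruction pointer and the value at the top of the stack, both of which a debugger would read from the debuggee) are assumptions made for illustration.

```python
TRAP_FLAG = 0x100            # TF is bit 8 of EFLAGS
POPF_OPCODE = 0x9D           # popf / popfd
OPERAND_SIZE_PREFIX = 0x66   # selects the 16-bit form of popf


def popf_will_set_trap_flag(code: bytes, stack_top: int) -> bool:
    """Return True if `code` starts with a popf whose popped value has TF set.

    `code` holds the bytes at the instruction pointer and `stack_top` the
    value currently on top of the stack (hypothetical inputs).
    """
    index = 0
    if code[index:index + 1] == bytes([OPERAND_SIZE_PREFIX]):
        index += 1  # the 16-bit form still pops the low word, which holds TF
    if code[index:index + 1] != bytes([POPF_OPCODE]):
        return False
    return bool(stack_top & TRAP_FLAG)
```

Only the operand-size prefix is handled here; a real debugger would want a proper instruction decoder in front of this test.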

Single stepping through NtContinue, faking pushf, and trapping int1

In my bid to emulate win32 SEH in my emulator, I needed to single step through exception handling so that I could test the emulator against the real thing.  My first approach was to just single step through the program by setting the trap flag in the eflags register.  I had some problems with this.

One of the problems is single stepping through NtContinue.  Looking at the code, it does a sysenter which transfers control to the kernel.  The kernel then installs the new context so that it has somewhere to return to.  I tried single stepping past the sysenter.  The problem, however, is that a single step exception isn’t raised until execution reaches the 2nd instruction at the new context.eip.  Bizarre.

Also bizarre is the fact that NtContinue changes Dr6 and Dr7.  I had some bad code that just checked for Dr6 being non-zero to detect a breakpoint exception, but in fact, while the lower 4 bits were 0 (1 bit for each possible breakpoint register Dr0-Dr3), the upper bits were trashed.  Dr7 was changed as well.  Perhaps it’s even conceivable that there is information leakage from the kernel.  Dunno for sure.
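A more careful Dr6 check only looks at the bits that have a defined meaning.  The sketch below is an illustration rather than the code from my debugger; the function names are made up, and the bit positions (B0-B3 in bits 0-3, BS in bit 14) are the ones given in the Intel manuals.

```python
DR6_SINGLE_STEP = 1 << 14   # BS: the debug exception came from single stepping


def triggered_breakpoints(dr6: int) -> list:
    """Return the indices (0-3) of the debug registers reported as hit."""
    return [index for index in range(4) if dr6 & (1 << index)]


def is_single_step(dr6: int) -> bool:
    """Return True if Dr6 reports a single step trap."""
    return bool(dr6 & DR6_SINGLE_STEP)
```

With this, a Dr6 value whose upper bits are garbage but whose low nibble is zero no longer looks like a hardware breakpoint.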

The only solution for trapping on the first instruction following NtContinue was setting a breakpoint on context.eip.  This requires parsing NtContinue’s argument, which is a pointer to a CONTEXT structure.  This is fairly straightforward.  Set a breakpoint on NtContinue, then parse the CONTEXT and set a new breakpoint on context.eip.
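Pulling eip out of the CONTEXT can be sketched as follows.  The assumptions are that the debugger has already copied the structure out of the debuggee into a bytes object, and that it is the 32-bit x86 layout from winnt.h, where Eip sits at offset 0xB8.  The function name is hypothetical.

```python
import struct

CONTEXT_EIP_OFFSET = 0xB8   # offset of Eip in the 32-bit x86 CONTEXT


def eip_from_context(context_bytes: bytes) -> int:
    """Return the Eip field of a raw 32-bit x86 CONTEXT structure."""
    (eip,) = struct.unpack_from("<I", context_bytes, CONTEXT_EIP_OFFSET)
    return eip
```

The returned address is where the next breakpoint would go.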

How do we get the breakpoint for NtContinue?  We can do a couple of things.

1) resolve the symbol, and place a breakpoint on it

2) look at the return address of the exception handler, which eventually ends up taking you into NtContinue.

I chose 2).

Both these solutions have problems if the exception handler decided to implement its own version of NtContinue.  Perhaps the best solution is to emulate down to the native api.  I didn’t implement emulation of the native api, so I’m stuck for the time being with the partial solution.

In the process of partially emulating vmprotect, I came across a few calls to pushf and popf.  These instructions push and pop the eflags register.  Initially I thought the packer might be checking to see if the trap flag was set.  It turns out it wasn’t doing this check, but I nonetheless implemented in my tracer/debugger the ability to process pushfs by modifying the stack contents to have the trap flag clear.  This hack should allow my debugger to work on binaries that include this type of anti-debugging.
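The stack patch amounts to clearing one bit in the value that pushf just wrote.  A minimal sketch, assuming the debugger has the relevant stack bytes in a bytearray that it will write back afterwards (the function name and the buffer are illustrative only):

```python
import struct

TRAP_FLAG = 0x100   # TF is bit 8 of EFLAGS


def hide_trap_flag(stack: bytearray, offset: int = 0) -> None:
    """Clear TF in the 32-bit flags value stored at `offset` in `stack`."""
    (flags,) = struct.unpack_from("<I", stack, offset)
    struct.pack_into("<I", stack, offset, flags & ~TRAP_FLAG)
```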

Also implemented in my emulator is processing of explicit int1s in the code.  Before DbgContinue is called, I check the current instruction pointer, and if it’s an int1 instruction, I keep note of this and allow the application to process the int1 by itself.  Naturally I have to set a breakpoint in the exception handler if I still want to maintain control.
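Recognising an explicit int1 only needs the first one or two bytes at the instruction pointer.  This sketch assumes two encodings count: the generic `int 1` (CD 01) and the single-byte int1/icebp opcode (F1).  The function name is made up for the example.

```python
def is_explicit_int1(code: bytes) -> bool:
    """Return True if `code` starts with `int 1` (CD 01) or int1/icebp (F1)."""
    return code[:2] == b"\xcd\x01" or code[:1] == b"\xf1"
```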

cpu bug, repne changes status flag in scasb

Another CPU bug uncovered while testing my emulator.  I came across a repne scasb while emulating the win32 version of upx.  The logic of scasb (scan string), to paraphrase the Intel manuals, is

SRC = dereference(edi)
temp = al -  SRC

In the code I ran across, %al was set to 0, the byte at (%edi) was 70 (decimal).   %ecx was large.  Following the operation, the carry flag was cleared.  This is incorrect, the carry flag should be set (0 – 70 sets carry).

I was unsure if my understanding of carry was wrong, so I tried 0 – 70 in a sub.  Carry was set as expected.  scasb’s logic is to perform a temporary subtraction of %al-(%edi) and set the status flags using the temporary result as explained earlier.

When scasb was performed in isolation with the same test case, carry was set.  It seems that adding the repne prefix to the scasb changes the carry flag to an incorrect result.
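One thing worth ruling out before blaming the CPU is the repeat loop itself.  As I read the manuals, repne repeats the scasb until ecx reaches 0 or a comparison sets the zero flag, so the flags seen afterwards belong to the last comparison, not the first.  The sketch below models that reading; the memory contents after the first byte are invented for the example, and the direction flag is assumed clear.

```python
def scasb_flags(al: int, src: int) -> dict:
    """Flags of the temporary subtraction al - src (only CF and ZF modelled)."""
    return {"CF": al < src, "ZF": al == src}


def repne_scasb(al: int, memory: bytes, edi: int, ecx: int):
    """Model repne scasb with DF=0; returns (flags, edi, ecx).

    `flags` is None if ecx was already 0 and no comparison ran.
    """
    flags = None
    while ecx != 0:
        flags = scasb_flags(al, memory[edi])
        edi += 1
        ecx -= 1
        if flags["ZF"]:   # repne stops once a comparison is equal
            break
    return flags, edi, ecx


# First byte is 70 as in the test case; the rest is hypothetical.
flags, edi, ecx = repne_scasb(0, bytes([70, 65, 0, 99]), 0, 100)
```

In this model the first comparison (0 - 70) does set carry, but the scan carries on to the zero byte, and the final comparison (0 - 0) leaves carry clear and zero set.  If the real buffer contained a zero byte within ecx bytes, that alone would explain a clear carry flag.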

gdb leaves file descriptors open in debuggee

I have my emulator running reasonably successfully on upx now.  It’s actually an auto unpacker, and identifies when the program is unpacked by monitoring execution on previously written memory.  In the process of emulating file io I came across a particular bug in gdb.
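The unpack detection boils down to remembering which addresses have been written and checking each executed address against that set.  A minimal sketch of the idea follows; the class and method names are invented, and a real emulator would track pages or ranges rather than individual bytes.

```python
class WriteThenExecuteMonitor:
    """Flags execution of memory that was written earlier in the run."""

    def __init__(self):
        self.written = set()

    def on_write(self, address: int, size: int) -> None:
        """Record that `size` bytes starting at `address` were written."""
        self.written.update(range(address, address + size))

    def on_execute(self, address: int) -> bool:
        """Return True if the instruction at `address` was written earlier."""
        return address in self.written
```

The emulator would call on_write from its memory store path and on_execute before each instruction; the first True is the candidate original entry point.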

The file descriptor returned from an open call inside the debuggee was 6.  I was expecting 3.

stdin=0, stdout=1,stderr=2

gdb must be using file descriptors 3,4,5, and forgot to close them before calling execve.

I’m not sure what the descriptors are used for.  Anyone care to take a look?

In the best case scenario, this bug can be used as another test to see if a debugger is present, and in the worst case, if these file descriptors were used for control, *gasp* control gdb?  Probably they aren’t used for anything important, but I haven’t looked any further.
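As a debugger-presence test, the program would look for descriptors above stderr that it never opened itself.  A rough sketch, assuming it runs at startup before the program opens anything (the function name and the probing limit are arbitrary choices for the example):

```python
import os


def inherited_descriptors(limit: int = 64) -> list:
    """Return the open file descriptors in the range 3..limit-1."""
    found = []
    for fd in range(3, limit):
        try:
            os.fstat(fd)      # fails with OSError if fd is not open
        except OSError:
            continue
        found.append(fd)
    return found
```

A non-empty result is only a hint, since shells and other parent processes can leak descriptors too.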

CPU Bug x86 shl behaviour sets overflow flag

I’ve been writing an x86 emulator, and to debug it, I ran it on a p4 computer in parallel to a debugger on a target program (a upx packed binary).  Well.. I got to shl $8, %eax where eax = 0x00ffffff.

The Intel documentation says that the overflow flag is only changed for 1 bit shifts.  Surprisingly, in the 8 bit shift, the overflow flag became set.  In a 7 bit or 9 bit shift of the same value, the overflow flag remains clear (or perhaps unchanged).
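One hypothesis that happens to fit all three observations is that the CPU applies the 1 bit rule (overflow = top bit of the result XOR carry) no matter what the shift count is.  The sketch below models that guess.  It is not documented behaviour, and the function name is made up.

```python
def shl32_guess(value: int, count: int):
    """Model a 32-bit shl for 1 <= count <= 31.

    Returns (result, carry, overflow), where overflow uses the documented
    1 bit rule applied to any count (a hypothesis, not documented behaviour).
    """
    if not 1 <= count <= 31:
        raise ValueError("count must be between 1 and 31")
    result = (value << count) & 0xFFFFFFFF
    carry = (value >> (32 - count)) & 1     # last bit shifted out
    overflow = (result >> 31) ^ carry       # top bit of result XOR carry
    return result, carry, overflow
```

For eax = 0x00ffffff this gives overflow set for a shift of 8 and clear for shifts of 7 and 9, which matches what I saw on the p4.  That doesn’t prove anything about other values or other CPUs.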

I’ve been googling to see other reports of this undocumented behaviour, but either it’s not out there, or more likely my googling skills are poor.  I couldn’t find a reference.

Anyone got more information on this?

[Update:  One person has reported that the behavior varied between setting and clearing the flag depending on the cpu.]

Merging basic blocks to deobfuscate non-contiguous control flow

In some binaries, basic blocks may be connected only by jumps.  These basic blocks may also be non-contiguous in the file, i.e. scattered throughout the binary.

In cases like this, if you’re looking at the disassembly, you need to constantly jump through the image to follow the logical order of the control flow.  When the control flow is graphed, it appears logically linear, but when reading the code, it sometimes helps to go for the older text dump of the disassembly.

The way I implemented this was to construct a control flow graph of each procedure, then merge a basic block with its predecessor iff it has exactly one predecessor and that predecessor has exactly one successor (the basic block we are looking at merging).  To dump the disassembly, a recursive approach is taken for each basic block: dump the assembly of the current basic block, then the next linear basic block (applied recursively), then the branched basic block (if it exists, also applied recursively).
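The merge rule can be sketched on a toy graph as follows.  This is an illustration, not the disassembler’s actual code: blocks are lists of instruction strings keyed by a label, the successor map is a plain dict, and dropping the connecting jmp is an extra assumption of the example.

```python
def merge_blocks(blocks: dict, succs: dict, entry: str):
    """Merge each block into its predecessor when it has exactly one
    predecessor and that predecessor has exactly one successor.

    blocks: label -> list of instruction strings
    succs:  label -> list of successor labels
    entry:  label of the procedure entry, which is never merged away
    """
    blocks = {label: list(body) for label, body in blocks.items()}
    succs = {label: list(targets) for label, targets in succs.items()}
    changed = True
    while changed:
        changed = False
        preds = {label: [] for label in blocks}
        for source, targets in succs.items():
            for target in targets:
                preds[target].append(source)
        for label in list(blocks):
            if label == entry or len(preds[label]) != 1:
                continue
            pred = preds[label][0]
            if pred == label or len(succs[pred]) != 1:
                continue
            # The jump that only linked the two blocks is no longer needed.
            if blocks[pred] and blocks[pred][-1].startswith("jmp"):
                blocks[pred].pop()
            blocks[pred].extend(blocks.pop(label))
            succs[pred] = succs.pop(label)
            changed = True
            break   # predecessor lists are stale, so recompute them
    return blocks, succs
```

Recomputing the predecessor lists after every merge is wasteful but keeps the sketch short; on 14000 instructions a worklist would be the better choice.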

I made these improvements to my disassembler, so it prints the disassembly in logical order, following the jumps.  In at least one piece of malware out of a sample of about ten, this deobfuscation proved successful, and over 800 basic blocks were merged in an object with around 14000 instructions.  The malware samples I’ve been using have come from

I’m in the process of looking at more malware samples to see how common this type of obfuscation is.  If anyone can suggest names of malware samples to look at and run my disassembler on, that would be great.

Probably more useful than the deobfuscator I’ve described is an automatic unpacker.  Most of the malware is packed, and in fact, the disassembly is non-trivial since indirect jumps and calls seem common.  This might be something that I will work on in the future.

In at least one other malware sample I have, dead code is common.  That is, registers are assigned, modified, then reassigned new values (without making any further use of the original values), making the older assignments dead.  I would like to automate detecting this, and liveness analysis should be able to identify these cases.  However, I have yet to implement dataflow analysis in my disassembler.
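For a single basic block, the liveness idea can be sketched with one backward pass.  Everything here is a simplification made for the example: each instruction is reduced to a (destination register, source registers) pair, and side effects such as flags and memory writes are ignored.

```python
def dead_stores(block: list, live_out: set) -> list:
    """Return the indices of instructions whose result is never used.

    block:    list of (dest, sources) tuples; dest may be None
    live_out: registers still needed after the block
    """
    live = set(live_out)
    dead = []
    for index in range(len(block) - 1, -1, -1):
        dest, sources = block[index]
        if dest is not None and dest not in live:
            dead.append(index)   # overwritten or unused before any read
            continue             # its sources do not become live
        live.discard(dest)
        live.update(sources)
    return sorted(dead)


# eax is assigned, modified, then reassigned before anything reads it.
example = [
    ("eax", []),        # 0: mov $1, %eax
    ("eax", ["eax"]),   # 1: add $2, %eax
    ("eax", ["ebx"]),   # 2: mov %ebx, %eax
    ("ecx", ["eax"]),   # 3: mov %eax, %ecx
]
```

On this example, with only ecx live at the end of the block, instructions 0 and 1 come back as dead.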

Disassembling Obfuscated Assembly

I wrote a disassembler a week or two ago.  Actually I used libdasm to do the grunt work while I just played with the higher level code.

I want to write some static analysis tools, so disassembling is the first part of the process.  Analysis of control flow soon follows the disassembly.  Obfuscating the control flow can cause static analysis tools to fail.

  JMP A
  .byte 0xE8 ; 0xE8 is the opcode for a CALL
A: CALL _myfunc

This is a small obfuscation that works on disassemblers that use the Linear Sweep method.  It successfully hides the real call, making static analysis tools that depend on correctly identifying the control flow and call graph fail.  The Linear Sweep method of disassembly is used most notably by objdump.

The Recursive Traversal method can successfully disassemble the previous obfuscation, because it follows the control flow.  It would cease disassembly after the jump site, and then correctly follow the jump to the target.
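The difference between the two methods can be shown with a toy decoder.  This is only a sketch: it knows five opcodes (call rel32, jmp rel8, jz rel8, nop, ret), the byte sequence is a hand-assembled stand-in for the obfuscation above, and none of it reflects how libdasm or objdump are actually built.

```python
LENGTHS = {0xE8: 5, 0xEB: 2, 0x74: 2, 0x90: 1, 0xC3: 1}
NAMES = {0xE8: "call", 0xEB: "jmp", 0x74: "jz", 0x90: "nop", 0xC3: "ret"}


def decode(code: bytes, offset: int):
    """Return (mnemonic, length) for the toy instruction at `offset`."""
    opcode = code[offset]
    length = LENGTHS.get(opcode)
    if length is None or offset + length > len(code):
        return "(bad)", 1
    return NAMES[opcode], length


def rel8(code: bytes, offset: int) -> int:
    """Signed 8-bit displacement of the jump at `offset`."""
    return int.from_bytes(code[offset + 1:offset + 2], "little", signed=True)


def linear_sweep(code: bytes) -> list:
    """Decode every byte in order, ignoring control flow."""
    listing, offset = [], 0
    while offset < len(code):
        name, length = decode(code, offset)
        listing.append((offset, name))
        offset += length
    return listing


def recursive_traversal(code: bytes, entry: int = 0) -> list:
    """Decode only what control flow can reach from `entry`."""
    seen, worklist = {}, [entry]
    while worklist:
        offset = worklist.pop()
        while 0 <= offset < len(code) and offset not in seen:
            name, length = decode(code, offset)
            seen[offset] = name
            following = offset + length
            if name == "jmp":
                offset = following + rel8(code, offset)
            elif name == "jz":
                worklist.append(following + rel8(code, offset))
                offset = following
            elif name in ("ret", "(bad)"):
                break
            else:
                offset = following
    return sorted(seen.items())


# jmp +1 ; junk 0xE8 ; call rel32 ; ret
OBFUSCATED = bytes([0xEB, 0x01, 0xE8, 0xE8, 0x10, 0x00, 0x00, 0x00, 0xC3])
```

On OBFUSCATED, linear_sweep decodes a bogus call at offset 2 and then trips over the leftover bytes, while recursive_traversal follows the jmp and finds the real call at offset 3.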

It is possible to modify the code again to make it harder to disassemble by recursive traversal.

  XORL %eax,%eax ; sets the zero flag, so the JZ is always taken
  JZ A
  .byte 0xE8
A: CALL _myfunc

The conditional jump now replaces our earlier unconditional one.  A recursive traversal disassembler will see the conditional jump and believe control can flow both to the jump target and to the code immediately following the jump site.  The code immediately following the jump site is junk, so it should be ignored.  But how to do this by automatic methods?

The first idea I came up with was modifying the disassembler to look for fixed code patterns.  In the case above, the disassembler would try to recognize jz+1.

What happens though if the following occurs?

  XOR %eax,%eax
  JZ A
  NOP
  NOP
  NOP
  .byte 0xe8
A: CALL _myfunc

In this case, it’s no longer a jz+1, but a jz+4.  This can be changed indefinitely.  The NOPs could be replaced with junk instructions, or any other polymorphic and metamorphic code.

There will however be a conflict using recursive traversal when it reaches A via the jump target, and reaches A-1 via disassembling in a straight line. There will be overlapping disassemblies, and it is unclear which disassembly is correct.
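Detecting the conflict is straightforward once every decoded instruction is recorded with its address and length.  A sketch, with a hypothetical function name and hypothetical offsets standing in for a jz+1 layout:

```python
def find_overlaps(instructions) -> list:
    """Return pairs (first_address, second_address) of decoded instructions
    that claim at least one common byte.

    instructions: iterable of (address, length) tuples
    """
    owner = {}          # byte address -> start of the instruction owning it
    conflicts = set()
    for address, length in sorted(set(instructions)):
        for byte in range(address, address + length):
            first = owner.setdefault(byte, address)
            if first != address:
                conflicts.add((first, address))
    return sorted(conflicts)


# Two 2-byte instructions, a bogus 5-byte call decoded from the junk byte
# at offset 4, and the real 5-byte call at the jump target, offset 5.
decoded = [(0, 2), (2, 2), (4, 5), (5, 5)]
```

Here the result is the single pair (4, 5).  It says that the two decodings disagree, not which of them is right.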

The root problem is that the recursive traversal identifies code when in fact it should not.  This is what is termed a false positive in disassembly.

Perhaps disassembly should keep both sequences of disassembly, and consider them unique parts of the program. Later on, dead code elimination can be performed if there is enough data flow analysis.

Or should the data flow analysis be done during the recursive traversal?  This might enable the disassembler to recognize the conditional jump as really being unconditional.

Another method of obfuscation is to replace the conditional jump with an indirect jump.  This works, because data flow analysis is required to see where the jump target occurs.  Static disassembly has a hard time dealing with this.

PUSH _myfunc; // setup return address. aka emulate call
PUSH _myfunc; // now to do the jump
RET;          // pops the pushed address into the instruction pointer

Again, there are a number of methods of doing this.  We could scan for PUSH/RET pairs and identify them as jump sites.  But what if an obfuscator changes that code?

PUSH _myfunc
MOV %eax,%eax
PUSH _myfunc
MOV %ebx,%ebx
RET

There are many such morphisms that can occur.  The solution for disassembling this could be to perform data flow analysis as we perform disassembly.  That is, for each basic block found, eliminate all instructions that are equivalent to a NOP and then check for jumps.
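A crude version of that idea, working on text rather than on decoded instructions, might look like this.  It is a sketch under loud assumptions: instructions are AT&T-style strings, the only NOP equivalents recognized are `nop` and a `mov` of a register onto itself, and the function names are invented.

```python
import re

# mov %reg,%reg with the same register on both sides changes nothing
SELF_MOVE = re.compile(r"^mov\s+(%\w+)\s*,\s*\1$")


def is_nop_equivalent(instruction: str) -> bool:
    """Return True for `nop` and for a register moved onto itself."""
    text = instruction.strip().lower()
    return text == "nop" or SELF_MOVE.match(text) is not None


def find_push_ret_jumps(instructions: list) -> list:
    """Return the operands of PUSHes that are directly followed by a RET
    once NOP-equivalent instructions have been removed."""
    cleaned = [i.strip() for i in instructions if not is_nop_equivalent(i)]
    targets = []
    for previous, current in zip(cleaned, cleaned[1:]):
        if previous.lower().startswith("push ") and current.lower() == "ret":
            targets.append(previous.split(None, 1)[1])
    return targets
```

Fed a PUSH _myfunc / RET pair with self-moves sprinkled in between, it reports _myfunc as the jump target.  Anything cleverer than a self-move, such as an add followed by a matching sub, slips straight past it, which is where real data flow analysis would have to take over.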

This idea still wouldn’t be perfect I imagine, but it could improve the situation considerably.