What appears to be a QEMU segment limit bug, and some DBT implementation thoughts

[Edit:  I received an Email from Otto Ebeling who told me that this is a known and documented limitation of QEMU.  But practically no OS would require these segment limit or access checks in any case.]

First up, an unverified QEMU bug handling segmented memory.  From looking quickly at the code QEMU has special handling for checking the segment limit in places say like, executing code in the code segment.  In other places such as what happens during a movs instruction, it calculates a linear address by adding the segment base to the address/offset held in esi or edi.  It doesn’t check the segment limit however (or even check for an integer overflow in adding the segment base for that matter).  This to me seems like a bug.  If you set up 2 segments, it could be possible to reference the 2nd segment from the 1st by overflowing the 1st segments limit.

Fixing this bug seems to be quite tricky.  The correct behaviour I think is to generate an access violation/exception when trying to dereference a segment:offset.  But QEMU generates translated code that calculates the linear address and holds it in its cache.  After calculating the address, at some point the software MMU is called using the linear address, and this is when the exception should be raised.

I had this same bug in my interpreted emulator although I knew of the problem when I first wrote the code, but then forgot about it.  The fix was pretty straight forward.  To fix the problem in the dynamic binary translation code I’m working on seems a little harder.  The ReadMemory function I use in the emulator actually takes a segment selector as a parameter, so I could simply pass the segment selector instead of trying to calculate a linear address from the segment base.  This is probably the easiest fix, but it means that the IR i’m working on becomes really ugly.  I don’t think its wise to use an IR that has segmentation, and it would be annoying to use such an IR in other projects..  I’m undecided how I’ll try to fix the segment limit problem for now..

I also spent a fair amount of time deciding about the problem of register aliasing.  In x86, al, ax, and eax all reference the same register.  They are register aliases.  I wasn’t sure if my IR should handle everything as 32bit, and not use different size operands.  I decided against this because I would lose some kinds of information.  The type reconstruction in a decompiler would seem to need the original operand sizes, and I wanted my IR to be potentially useful if I tried implementing this in the future.  I have also decided on using IR instructions that extract the lo/hi bytes akin to al/ah etc from a 32bit word, and also extracting the lower 16bits of a 32bit word.  This will make for better code generation for the DBT to have such instructions.  Valgrind actually has similar functions, except to get say ah from eax, it extracts the lower 16 bits first in 1 instruction, then the high byte from that 16bits in another instruction.

I have also currently decided to use registers in my IR instead of a block of memory representing the guest state like Valgrind uses.  Maybe the Valgrind approach is better, but for now I’m using registers.

REIL (the IR used in binnavi/zynamics etc) doesnt seem to have instructions that do the type of casts that I’m talking about above.  I only have the public online REIL documentation, so clearly they must have equivalent but for now I can’t see it documented.

3 responses to “What appears to be a QEMU segment limit bug, and some DBT implementation thoughts

  1. Hi Silvio,

    REIL does not have any special instructions for working with aliased registers.

    When we encounter an instruction like “add ax, 10” we mask the ax part of eax into a new variable, perform the operation on the temp variable, cut off overflows and mask the result back into eax.

    So, kinda like this:

    and eax, 0xFFFF, t0
    add t0, 10, t1
    and t1, 0xFFFF, t2
    and eax, 0xFFFF0000, t3
    or t2, t3, eax

    It is unclear at this point whether this design decision is a good idea or not. We have simply not yet faced a situation where this approach is preferable to another approach or vice versa.

    • I think what you describe is a very reasonable approach.

      I decided against special handling of register aliasing also.
      My needs are based heavily on dynamic binary translation (with some thoughts on not shooting myself in the foot in using the IR in other applications). Since I am required to translate my IR back to x86 I’m adding instructions that are more complicated than they necessarily need to be. I’m building my IR to use Add8/Add16/Add32, and also instructions for loading 16 bit words, and lo/hi bytes from words – LoadLo16/LoadLo8/LoadHi8 (I’ll probably rename these to something more meaningful later). Valgrind/VEX has similar instructions in its IR also. I also needed a conversion from 8/16 bit to 32 bit to handle some x86 instructions and also setting some of my lazy eflags operands from 8/16 bit results.

      So the IR for add $10,ax (at&t!) I will generate looks more like –>

      LoadLo16 eax,t1
      Add16 t1,10,t2
      StoreLo16 t2,eax (not sure about this instruction, but you get the idea)

      These instructions should generally help me select better instructions when generating native code from the IR. The problem with this of course, is that I have all these extra instructions in my IR. I’m thinking of possibly having another pass over the initial IR in the case where I want to do static analysis (I’m toying with the distant idea that I might be able to reuse my code to do code normalization to remove nops/re-order instructions/reassign registers etc) which might be helpful in a malware detection. I am not sure if this will ever come to fruition, as it’s not part of the research proposal I made for my degree.

      One thing I see at face value missing from REIL but maybe you implemented it differently, is move with sign/zero extend. Maybe you implemented
      these by your store register instruction with operands of different sizes?

      PS. I read the recent cansecwest slides on abstract interpretation using MonoREIL. I’m oh so envious of your framework🙂

  2. Yeah, it’s kind of a trade-off we are having here. We decided to have fewer instructions to make analysis as simple as possible (fewer instructions = fewer possible abstract program transformations). On the other hand, translating REIL code back to native code might turn out to be difficult. We actually have an external student working on a problem where he has to translate REIL code back to x86 code but he just started and it’s not yet clear how successful/easy this will be.

    We do not have special zero/sign-extend instructions. STR is zero-extend by default so you can just use STR to move ax zero-extended to ebx. Sign-extension is simulated using some bit trickery.

    About MonoREIL, yeah we have a lot of plans for it. We’ll see how this turns out in the next few months. What we presented at CSW were basically just the very first steps. We already have a few people working on MonoREIL stuff (like the student I mentioned previously) that will hopefully be released until the end of the year.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s