[Edit: I received an email from Otto Ebeling, who told me that this is a known and documented limitation of QEMU. In practice, though, almost no OS requires these segment limit or access checks anyway.]
First up, an unverified QEMU bug in handling segmented memory. From a quick look at the code, QEMU has special handling for checking the segment limit in some places, such as when executing code in the code segment. In other places, such as during a movs instruction, it calculates a linear address by adding the segment base to the offset held in esi or edi. It doesn’t check the segment limit, however (or even check for an integer overflow when adding the segment base, for that matter). This seems like a bug to me. If you set up two segments, it could be possible to reference the second segment from the first by using an offset beyond the first segment’s limit.
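To make the problem concrete, here is a minimal sketch of the two behaviours. It is not QEMU's code: the Segment type, the function names, and the two example segments are all made up for illustration, and it assumes plain expand-up segments with an inclusive byte limit.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF


@dataclass
class Segment:
    base: int   # linear address where the segment starts
    limit: int  # highest valid offset in the segment (inclusive)


class SegmentLimitError(Exception):
    """Stands in for the fault the guest should receive (hypothetical)."""


def linear_unchecked(seg: Segment, offset: int) -> int:
    # base + offset with 32-bit wraparound and no limit check at all.
    return (seg.base + offset) & MASK32


def linear_checked(seg: Segment, offset: int, size: int = 1) -> int:
    # Refuse any access whose last byte lies beyond the segment limit.
    if offset + size - 1 > seg.limit:
        raise SegmentLimitError(f"offset {offset:#x} exceeds limit {seg.limit:#x}")
    return (seg.base + offset) & MASK32


# Two adjacent 4 KiB segments (illustrative values only).
first = Segment(base=0x1000, limit=0x0FFF)
second = Segment(base=0x2000, limit=0x0FFF)

# One byte past the first segment's limit lands on the second segment's base.
leaked = linear_unchecked(first, 0x1000)
```

With the unchecked version, leaked equals second.base, which is exactly the cross-segment reference described above. The checked version raises instead of returning an address.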
Fixing this bug seems quite tricky. The correct behaviour, I think, is to raise an access violation/exception when trying to dereference a segment:offset that lies outside the segment limit. But QEMU generates translated code that calculates the linear address and holds it in its cache. After the address is calculated, the software MMU is called at some point with the linear address, and this is when the exception should be raised.
I had this same bug in my interpreted emulator. I knew of the problem when I first wrote the code, but then forgot about it. The fix was pretty straightforward. Fixing the problem in the dynamic binary translation code I’m working on seems a little harder. The ReadMemory function I use in the emulator actually takes a segment selector as a parameter, so I could simply pass the segment selector instead of trying to calculate a linear address from the segment base. This is probably the easiest fix, but it means that the IR I’m working on becomes really ugly. I don’t think it’s wise to use an IR that has segmentation, and it would be annoying to use such an IR in other projects. I’m undecided on how I’ll fix the segment limit problem for now.
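Roughly, the "pass the segment selector" option looks like the sketch below. This is not my actual emulator code: the Emulator class, set_segment, and the selector-to-(base, limit) table are simplified stand-ins, and the read assumes little-endian memory and expand-up segments.

```python
class SegmentFault(Exception):
    """Hypothetical stand-in for the exception delivered to the guest."""


class Emulator:
    def __init__(self, memory_size: int) -> None:
        self.memory = bytearray(memory_size)
        # selector -> (base, limit); a simplified stand-in for descriptor tables
        self.segments: dict[int, tuple[int, int]] = {}

    def set_segment(self, selector: int, base: int, limit: int) -> None:
        self.segments[selector] = (base, limit)

    def read_memory(self, selector: int, offset: int, size: int) -> int:
        base, limit = self.segments[selector]
        # The limit check happens here, while the segment is still known.
        if offset + size - 1 > limit:
            raise SegmentFault(f"selector {selector:#x}, offset {offset:#x}")
        linear = (base + offset) & 0xFFFFFFFF
        return int.from_bytes(self.memory[linear:linear + size], "little")


emu = Emulator(0x4000)
emu.set_segment(0x10, base=0x1000, limit=0x0FFF)
emu.memory[0x1004:0x1008] = (0xDEADBEEF).to_bytes(4, "little")
value = emu.read_memory(0x10, offset=4, size=4)
```

The catch is visible in the signature: every memory access in the IR would have to carry a selector, which is the ugliness I would like to avoid.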
I also spent a fair amount of time on the problem of register aliasing. In x86, al, ax, and eax all reference the same register; they are register aliases. I wasn’t sure whether my IR should handle everything as 32-bit and not use different-sized operands. I decided against this because I would lose some information. Type reconstruction in a decompiler would seem to need the original operand sizes, and I want my IR to be potentially useful if I try implementing this in the future. I have also decided to use IR instructions that extract the low/high bytes (akin to al/ah) from a 32-bit word, and that extract the lower 16 bits of a 32-bit word. Having such instructions will make for better code generation in the DBT. Valgrind actually has similar functions, except that to get, say, ah from eax, it extracts the lower 16 bits first in one instruction, then the high byte from those 16 bits in another instruction.
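As a rough illustration of the extraction operations, here is a sketch using plain integers. The function names are invented for this post and are not the real IR or Valgrind operation names; the two-step version just mirrors the sequence described above.

```python
def low16(value32: int) -> int:
    # ax from eax: keep the lower 16 bits of a 32-bit word.
    return value32 & 0xFFFF


def low8(value32: int) -> int:
    # al from eax: keep the lowest byte.
    return value32 & 0xFF


def high8_direct(value32: int) -> int:
    # ah from eax in a single operation: bits 8..15.
    return (value32 >> 8) & 0xFF


def high8_of_16(value16: int) -> int:
    # High byte of a 16-bit value.
    return (value16 >> 8) & 0xFF


def high8_two_step(value32: int) -> int:
    # Two-step style: narrow to 16 bits first, then take the high byte.
    return high8_of_16(low16(value32))


eax = 0x12345678
al, ah, ax = low8(eax), high8_direct(eax), low16(eax)
```

Both routes to ah give the same value. The single-operation form just gives the code generator one IR instruction to match instead of two.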
I have also decided, for the moment, to use registers in my IR instead of a block of memory representing the guest state, which is what Valgrind uses. Maybe the Valgrind approach is better, but for now I’m using registers.
REIL (the IR used in binnavi/zynamics etc) doesn’t seem to have instructions that do the type of casts that I’m talking about above. I only have the public online REIL documentation, so they must surely have an equivalent, but for now I can’t see it documented.