When writing an interpreter or a dynamic binary translator, it’s typical to maintain a CPU context in memory. This is memory set aside to store CPU state such as register contents, the program counter and so forth. In an interpreter, since you only handle a single instruction at a time, you make all changes to the guest CPU registers directly in that in-memory CPU context. In dynamic binary translation, there is an opportunity to use real registers in the translated blocks. But typically, at the beginning and end of each block, you transfer between the stored CPU context and those real registers.
But how does that transfer of registers to the CPU context actually work?
Valgrind uses a nice idea: specific IR instructions, Get and Put, are set aside to do this transfer. You specify an offset in the CPU context, the size, and an IR register. Get then copies the CPU context contents into the IR register, and Put copies the IR register back into the CPU context.
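To make the idea concrete, here is a minimal Python sketch of Get and Put acting on a CPU context held as a flat byte buffer. The layout table, the class and the function names are all made up for illustration. This is not Valgrind's actual API, just the offset/size/value shape described above.

```python
# Hypothetical guest context layout: register name -> (byte offset, size in bytes).
CONTEXT_LAYOUT = {"eax": (0, 4), "ecx": (4, 4), "eip": (8, 4)}


class CpuContext:
    """The in-memory guest CPU state, stored as raw bytes."""

    def __init__(self, size=12):
        self.mem = bytearray(size)


def ir_get(ctx, offset, size):
    """Get: read `size` bytes at `offset` and return them as an IR temporary value."""
    return int.from_bytes(ctx.mem[offset:offset + size], "little")


def ir_put(ctx, offset, size, value):
    """Put: write an IR temporary value back to `size` bytes at `offset`."""
    mask = (1 << (8 * size)) - 1  # truncate the value to the register width
    ctx.mem[offset:offset + size] = (value & mask).to_bytes(size, "little")


# A translated "inc eax" would look roughly like: Get, add, Put.
ctx = CpuContext()
off, size = CONTEXT_LAYOUT["eax"]
ir_put(ctx, off, size, 0x12345678)   # pretend the guest had this in eax
t0 = ir_get(ctx, off, size)          # t0 = GET(eax)
t1 = (t0 + 1) & 0xFFFFFFFF           # t1 = Add32(t0, 1)
ir_put(ctx, off, size, t1)           # PUT(eax) = t1
```

Between the Get and the Put, the value lives only in IR temporaries, which is what lets the translator keep it in a real register.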
A question is, should Get/Put be manually generated in the native code to IR translation? But first, what are all the places Get/Put should be used?
At the beginning of a code fragment, registers need to be loaded. So this is the main place to use Get. At the end of the code fragment, or rather, at every exit point in the fragment, modified registers need to be Put back.
That is quite reasonable, but more annoying are places where control can be indirectly transferred. Take memory loads and stores. These operations, handled by callbacks in the translated code when using a software MMU implementation, can cause exceptions in the guest. This means that every memory load or store is a potential exit point for the code fragment. Thus, it seems we need to Put the registers back beforehand.
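Here is a rough Python sketch of what "Put the registers back beforehand" could look like as a pass over a fragment. The tuple-based IR (def, load, store, exit) is invented for this example and is much simpler than any real IR. The pass tracks which guest registers have been modified since they were last written back, and flushes them before anything that can leave the fragment.

```python
# Statements that can leave the fragment: explicit exits, plus memory
# accesses, which may raise a guest exception in the soft-MMU callback.
EXIT_POINTS = ("load", "store", "exit")


def flush_before_exit_points(block):
    """Insert ("put", reg) before each exit point for every register
    modified since it was last written back to the CPU context."""
    out = []
    dirty = []  # modified registers not yet Put back, in order of first write
    for stmt in block:
        op = stmt[0]
        if op in EXIT_POINTS:
            out.extend(("put", reg) for reg in dirty)
            dirty.clear()
        out.append(stmt)
        # ("def", dst) and ("load", dst, addr) both write a guest register.
        if op in ("def", "load"):
            dst = stmt[1]
            if dst not in dirty:
                dirty.append(dst)
    return out


block = [
    ("def", "eax"),
    ("load", "ebx", "eax"),   # ebx = [eax], may fault
    ("def", "ecx"),
    ("store", "eax", "ecx"),  # [eax] = ecx, may fault
    ("exit",),
]
flushed = flush_before_exit_points(block)
```

For this five-statement fragment the pass adds three Puts, which gives a feel for the bloat: one before the load and two before the store.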
Putting back registers before every memory access is really annoying and seems likely to cause a bit of bloat in the translated fragments. I don’t have a nicer solution, however. Valgrind doesn’t seem to have this problem, and I’m not quite sure how it avoids it. More investigation is required.
So back to automatically generating Get and Put statements in the IR. This can be done. A simple but slightly incorrect solution is to Get, at the entry point, all registers that are used by the rest of the code fragment. Another simple solution is to Put, at each exit point, all registers that are defined (written to) by the code fragment.
The problem is that you might Put a register before it’s been defined, depending on the order of instructions. I imagine the correct solution is to only Put registers that are in the set of reaching definitions at that exit.
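As a sketch of the reaching definitions idea, assume the fragment is a single-entry, straight-line sequence with side exits. In that special case the definitions reaching an exit are just the registers written earlier in the sequence, so no full dataflow analysis is needed. The tuple IR here is invented for illustration, and a fragment with internal branches would need a proper iterative analysis instead.

```python
def put_reaching_defs(block):
    """Before each exit, Put only the registers defined earlier in the
    straight-line fragment (the definitions that reach that exit)."""
    out = []
    defined = []  # registers written so far, in order of first definition
    for stmt in block:
        if stmt[0] == "exit":
            out.extend(("put", reg) for reg in defined)
        out.append(stmt)
        if stmt[0] == "def" and stmt[1] not in defined:
            defined.append(stmt[1])
    return out


block = [
    ("def", "eax"),
    ("exit", "side"),  # ebx is not defined yet on this path
    ("def", "ebx"),
    ("exit", "end"),
]
with_puts = put_reaching_defs(block)
```

At the side exit only eax is Put. The naive "Put everything the fragment defines" approach would also Put ebx there, writing back a temporary that has not been assigned yet. Note this sketch Puts eax again at the final exit even though it has not changed. Tracking dirtiness would remove that.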
I didn’t implement the reaching definitions solution. I used the more conservative approach of Getting the set of all registers that are used, AND defined. That is faster to generate, but produces worse IR.
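One way to read the conservative approach, sketched in Python: Get every register the fragment either uses or defines at the entry point, and Put every defined register at every exit. Because each register that will be Put has already been loaded by a Get, an early exit just writes the unchanged value back. The tuple IR and the function name are made up for this sketch, and it may not match the real implementation in detail.

```python
def conservative_get_put(block):
    """Get all used-or-defined registers at entry, Put all defined
    registers at every exit. Statements are ("op", dst, srcs) or
    ("exit", label)."""
    touched = []  # registers used or defined anywhere in the fragment
    defined = []  # registers written anywhere in the fragment
    for stmt in block:
        if stmt[0] != "op":
            continue
        _, dst, srcs = stmt
        for reg in (*srcs, dst):
            if reg not in touched:
                touched.append(reg)
        if dst not in defined:
            defined.append(dst)

    out = [("get", reg) for reg in touched]
    for stmt in block:
        if stmt[0] == "exit":
            out.extend(("put", reg) for reg in defined)
        out.append(stmt)
    return out


block = [
    ("op", "eax", ("ecx",)),  # eax = f(ecx)
    ("exit", "side"),
    ("op", "ebx", ("eax",)),  # ebx = g(eax)
    ("exit", "end"),
]
result = conservative_get_put(block)
```

Here ebx is loaded at entry even though the fragment never reads its old value, and it is Put at the side exit even though it is unchanged there. That is the worse IR: extra Gets and Puts in exchange for needing only one linear scan and no dataflow analysis.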
Incidentally, another problem for memory access is the saving of the lazy eflags registers. These also need to be saved before a potential exception, else the SEH handler might have the wrong value in the SEH context.
In summary, a lot of extra headaches are caused by potential exceptions in memory accesses. Maybe there is a better solution, but it might require a more hackish approach than the one I’ve described.