I’ve been writing code to convert three address code, a linear IR , to a structured IR, a tree based IR. I’m doing this with the hopes of better instruction selection using the structured IR. The tree IR is essentially an expression tree, but it also has to deal with statements (which for me, are instructions which don’t have a result).
It hasn’t seemed like a large task to construct an expression tree from the instructions for the most part. There are some interesting things that cropped up however.
Common sub-expressions are evident in the tree as nodes with more than one parent. To output the expression below as plain text requires some thought, eg
[note: Put32 and Get32 are Valgrind/VEX style instructions that transfer a value to/from a stored context (in Valgrind, this stores CPU state like registers etc)].
If there is a common sub-expression, then this should be placed in a temporary, and the temporary used instead of that particular sub-expression in the code. In the code I showed above, it should be
t0 = Add32(ebx,ecx) Put32(Add32(t0, t0), 0)
This brought up the problem I’m having with conditional branches. Generating the tree IR only seems to really work in the context of a basic block. And it’s quite possible for my three
address code IR to generate multiple basic blocks from a single native instructions. For instance, I explictly generate an iterative loop when representing a rep movs etc.
To co-ordinate the interaction between basic blocks, I’m having to explicitly assign the value of expressions to their originally assigned register. Remember that by the original IR to an expression in the tree IR, I’m essentially eliminating those registers, eg
Put32(Add32([eax=]Add32(eax, ebx), ebx), 0)
The [eax=] isnt really required.
Anyway, I have to add that explicit assignment in between basic blocks and I do this immediately before a conditional branch is made. But which registers and expressions should be explicitly assigned? Its at most the set of registers that were written to in the original code to build the expressions. To reduce this set even further, I imagine that a liveness analysis could give the set of minimal registers. The liveness analysis would tell us which registers would be used in the future. Clearly, if the register isn’t used we do not need to remember it.
But this all seems a bit over the top. I don’t really want to do a liveness analysis just to convert my linear IR to a tree based IR. This translation has to be fast if its going to be effective in a dynamic binary translator. Anyway, I’m doubtfully going to implement the liveness analysis at this point, but if I’m feeling particularly motivated one day after the rest of the code works, who knows..
I’ll try to finish the tree IR in the next couple days, and have a look at the output it generates before making up my mind on whether the new IR is worth the effort.