The start of a tree based IR, and some problems with basic blocks

I’ve been writing code to convert three address code, a linear IR , to a structured IR, a tree based IR.  I’m doing this with the hopes of better instruction selection using the structured IR.  The tree IR is essentially an expression tree, but it also has to deal with statements (which for me, are instructions which don’t have a result).

It hasn’t seemed like a large task to construct an expression tree from the instructions for the most part.  There are some interesting things that cropped up however.

Common sub-expressions are evident in the tree as nodes with more than one parent.  To output the expression below as plain text requires some thought, eg


[note: Put32 and Get32 are Valgrind/VEX style instructions that transfer a value to/from a stored context (in Valgrind, this stores CPU state like registers etc)].

If there is a common sub-expression, then this should be placed in a temporary, and the temporary used instead of that particular sub-expression in the code. In the code I showed above, it should be

t0 = Add32(ebx,ecx) Put32(Add32(t0, t0), 0)  

This brought up the problem I’m having with conditional branches. Generating the tree IR only seems to really work in the context of a basic block. And it’s quite possible for my three
address code IR to generate multiple basic blocks from a single native instructions. For instance, I explictly generate an iterative loop when representing a rep movs etc.

To co-ordinate the interaction between basic blocks, I’m having to explicitly assign the value of expressions to their originally assigned register. Remember that by the original IR to an expression in the tree IR, I’m essentially eliminating those registers, eg

Put32(Add32([eax=]Add32(eax, ebx), ebx), 0)

The [eax=] isnt really required.

Anyway, I have to add that explicit assignment in between basic blocks and I do this immediately before a conditional branch is made. But which registers and expressions should be explicitly assigned? Its at most the set of registers that were written to in the original code to build the expressions. To reduce this set even further, I imagine that a liveness analysis could give the set of minimal registers. The liveness analysis would tell us which registers would be used in the future. Clearly, if the register isn’t used we do not need to remember it.

But this all seems a bit over the top. I don’t really want to do a liveness analysis just to convert my linear IR to a tree based IR. This translation has to be fast if its going to be effective in a dynamic binary translator. Anyway, I’m doubtfully going to implement the liveness analysis at this point, but if I’m feeling particularly motivated one day after the rest of the code works, who knows..

I’ll try to finish the tree IR in the next couple days, and have a look at the output it generates before making up my mind on whether the new IR is worth the effort.


2 responses to “The start of a tree based IR, and some problems with basic blocks

  1. Probably my IE, but I can’t seem to make out the textbox you have in the middle of the post…

  2. Thanks for notifying me of the problem. I suppose I missed it in the preview. Sometimes wordpress changes your embedded html when switching from visual to html mode and back again, though I probably still should have seen the problem in preview. The problem was because I had 1 long preformatted () text line. It seems the vertical height of the text box was created wrongly. Adding some breaks () inside the preformatted text was able to fix it – although this really shouldn’t be necessary.. In the end I got rid of the text box altogether.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s