The topic says it all. Basically I take the intermediate representation (IR) and translate this back into x86 assembly. The IR I use is tree based. I translate these trees representing expressions into x86 assembly, taking care to allocate registers and avoiding too many redundant copies.
The instruction selection works by a process of greedy tree matching. The top of the tree is matched against a list of patterns. The biggest patterns are matched first. The top of the tree containing the pattern is then chopped off and an x86 instruction emitted and the subtrees are then recursively matched. In actual fact, generally the subtrees are recursed before emitting an instruction. Registers can be allocated either from the leaves of the tree where a an expression is calculated with registers or immediates as operands. Or register allocation can occur more to the top of the tree. This is still a work in progress to generate more optimal code.
I still however assume an unlimited number of registers, so next to implement is register allocation. I intend to use the linear scan algorithm which is supposedly a fast register allocation algorithm especially useful for dynamic binary translators and JIT compilers. Other algorithms for register allocation use a graph colouring approach, but these can reportedly be quite slow. After register allocation comes the assembling of x86 instructions into machine code.
There are still lots of problems to fix. My code doesn’t work very nicely still with non sequential code (ie, code with more than one basic block). Also, I can still generate IR sequences that result in semantic NOPS..
Incidentally, I have enough of the x86 -> IR code done so that it can now process all the instructions executed by the UPX packer. But I still haven’t finished the status flags handling, so it’s not quite complete.