That tree-based IR was causing me grief, and at one point I decided that the solution was to convert my IR into SSA form. And then I started thinking I needed a data flow analysis framework. After a day or two of coding I’ve got some initial results on the data flow analysis, but I still haven’t been able to generate a tree-based IR.
The data flow analysis framework has only reaching definitions implemented so far, but it should now be fairly easy to add new types of analysis such as live variables or available expressions. I have three main handlers: Join, TransferFunction, and a final function that moves the results from the in state to the out state and checks whether the state has changed. That final function is what lets me solve the dataflow equations iteratively: keep re-running the analysis over the blocks until no state changes.
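To make the three handlers concrete, here is a minimal Python sketch of reaching definitions solved iteratively. It is not my actual framework code: the `Block` class, the `(def_id, variable)` representation and the names `join`, `transfer` and `commit` are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    # Hypothetical basic block: a name, its definitions in program
    # order as (def_id, variable) pairs, and predecessor block names.
    name: str
    defs: list
    preds: list = field(default_factory=list)


def join(states):
    """Reaching definitions is a 'may' analysis, so join is set union."""
    result = set()
    for state in states:
        result |= state
    return result


def transfer(block, in_state):
    """Apply out = gen | (in - kill), one definition at a time."""
    state = set(in_state)
    for def_id, var in block.defs:
        # Kill every earlier definition of the same variable...
        state = {(d, v) for (d, v) in state if v != var}
        # ...then generate the new one.
        state.add((def_id, var))
    return state


def commit(out_states, name, new_out):
    """Store the new out state and report whether it changed."""
    changed = out_states[name] != new_out
    out_states[name] = new_out
    return changed


def reaching_definitions(blocks):
    """Iterate over all blocks until no out state changes."""
    in_states = {b.name: set() for b in blocks}
    out_states = {b.name: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            in_states[b.name] = join(out_states[p] for p in b.preds)
            new_out = transfer(b, in_states[b.name])
            if commit(out_states, b.name, new_out):
                changed = True
    return in_states, out_states


# A tiny loop: x and y are defined on entry, x is redefined in the loop.
blocks = [
    Block("entry", [("d1", "x"), ("d2", "y")]),
    Block("loop", [("d3", "x")], preds=["entry", "loop"]),
    Block("exit", [], preds=["loop"]),
]
ins, outs = reaching_definitions(blocks)
```

In this sketch, swapping in a different analysis would mostly mean replacing `join` and `transfer` (for example, intersection instead of union for available expressions), while the loop in `reaching_definitions` stays the same.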
Along the way I had to introduce some other bits and pieces. My IR works as a kind of stack. Each IR instruction gets handed down through the layers of the stack. The first layer is handed the IR instructions that are generated from the translation of native code. The next layer could be used to provide a storage mechanism for each instruction.
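A rough Python sketch of the layering idea, where each layer receives an instruction and decides what to hand to the layer below it. The class names (`Layer`, `NopFilterLayer`, `StorageLayer`) and the tuple encoding of instructions are hypothetical and only here to illustrate the shape of the design.

```python
class Layer:
    """Base layer: receives an IR instruction and passes it down."""

    def __init__(self, next_layer=None):
        self.next_layer = next_layer

    def emit(self, insn):
        # Default behaviour: hand the instruction down unchanged.
        self.forward(insn)

    def forward(self, insn):
        if self.next_layer is not None:
            self.next_layer.emit(insn)


class NopFilterLayer(Layer):
    """Example of a layer that rewrites the stream: it drops nops."""

    def emit(self, insn):
        if insn[0] != "nop":
            self.forward(insn)


class StorageLayer(Layer):
    """Keeps every instruction it sees, then passes it further down."""

    def __init__(self, next_layer=None):
        super().__init__(next_layer)
        self.instructions = []

    def emit(self, insn):
        self.instructions.append(insn)
        self.forward(insn)


# Build the stack bottom-up; the translator would feed the top layer.
storage = StorageLayer()
top = NopFilterLayer(next_layer=storage)

for insn in [("mov", "r0", 1), ("nop",), ("add", "r0", "r0", 2)]:
    top.emit(insn)
```

The appeal of this arrangement is that a new pass is just another layer slotted into the chain, and the translator only ever needs to know about the top of the stack.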
I added another layer to my code today to convert single entry, multiple exit blocks into single entry, single exit blocks. It’s much easier to write code when you don’t have to put explicit labels on every basic block, but for automated analysis those demarcations are essential. I also wrote code to support control flow graphs; this is handled by another layer on the IR stack. It takes in a sequence of IR instructions and, provided each basic block is labelled (which the single entry, single exit code now takes care of), it builds a control flow graph.
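Here is a small Python sketch of both steps under some loud assumptions: instructions are tuples, the only control flow opcodes are `("jmp", target)`, `("br", cond, target)` with fall-through, and `("ret",)`, and generated labels use a made-up `.genN` scheme that is assumed not to clash with existing labels. My real IR differs, so treat this as an illustration of the idea rather than the implementation.

```python
from itertools import count

BRANCHES = {"jmp", "br", "ret"}


def split_blocks(insns):
    """Insert generated labels so that every block starts with a label
    and ends at its first branch (single entry, single exit)."""
    fresh = (f".gen{n}" for n in count())
    out = []
    need_label = True  # true at the start and right after any branch
    for insn in insns:
        if need_label and insn[0] != "label":
            out.append(("label", next(fresh)))
        out.append(insn)
        need_label = insn[0] in BRANCHES
    return out


def build_cfg(insns):
    """Return {label: [successor labels]} for a fully labelled list."""
    cfg = {}
    current = None  # label of the block being scanned
    last = None     # last non-label instruction seen in that block
    for insn in insns:
        op = insn[0]
        if op == "label":
            # Unless the previous block ended in jmp or ret, control
            # falls through into this block.
            falls = last is None or last[0] not in ("jmp", "ret")
            if current is not None and falls:
                cfg[current].append(insn[1])
            current = insn[1]
            cfg[current] = []
            last = None
        else:
            if op == "jmp":
                cfg[current].append(insn[1])
            elif op == "br":
                cfg[current].append(insn[2])
            last = insn
    return cfg


code = [
    ("mov", "r0", 0),
    ("label", "loop"),
    ("add", "r0", "r0", 1),
    ("br", "r0", "done"),   # conditional exit in the middle of the code
    ("jmp", "loop"),
    ("label", "done"),
    ("ret",),
]
cfg = build_cfg(split_blocks(code))
```

For the sample above, the splitter adds a label in front of the first `mov` and another in front of the `jmp` that follows the conditional branch, so the graph ends up with four blocks.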
I’m worried that all this analysis is not going to be viable for use in the dynamic binary translation, as it’s really quite computationally expensive, but at least I’ll have the code for other things too.