The iterative solution is a well-known method for solving data flow equations. It processes each node by applying the transfer function and join operator, and repeats until every node reaches a fixed point – that is, applying transfer+join no longer changes the node's state.
There are different ways of implementing the iterative solution. Round robin is the simplest: traverse every node in a single pass, applying the data flow equations (transfer function and join) at each one, and if any node's state changed, repeat the pass. My first implementation ignored the ordering of the nodes and the control flow graph completely.
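A minimal sketch of the round-robin pass, for a forward problem along the lines of reaching definitions. The CFG shape (`nodes`, `preds`) and the gen/kill transfer function are hypothetical stand-ins, not the actual implementation from this post:

```python
def round_robin(nodes, preds, gen, kill):
    """Sweep all nodes repeatedly until no OUT set changes."""
    state = {n: set() for n in nodes}  # OUT set per node
    changed = True
    while changed:
        changed = False
        for n in nodes:
            # Join: union of the predecessors' OUT sets.
            in_set = set()
            for p in preds[n]:
                in_set |= state[p]
            # Transfer: OUT = gen ∪ (IN − kill).
            out = gen[n] | (in_set - kill[n])
            if out != state[n]:
                state[n] = out
                changed = True  # something moved, do another pass
    return state
```

Note that a whole extra pass runs over every node just to confirm nothing changed, which is part of why traversal order matters so much for this scheme.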
Today I thought I would make a faster version using the worklist approach. The worklist approach starts with all nodes in a worklist. A node is removed from the worklist and processed; if its state changes, its successors (for forward dataflow problems) are added to the worklist. Another item is then taken from the worklist if one exists, and the process repeats until the list is empty. I implemented that earlier today.
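The same problem solved with a worklist might look like the sketch below (again with hypothetical `preds`/`succs`/gen/kill inputs). The idea is that a node is only reprocessed when one of its inputs might have changed:

```python
from collections import deque

def worklist_solve(nodes, preds, succs, gen, kill):
    """Worklist iteration: reprocess only nodes whose inputs may have changed."""
    state = {n: set() for n in nodes}
    work = deque(nodes)  # start with every node queued
    while work:
        n = work.popleft()
        in_set = set()
        for p in preds[n]:
            in_set |= state[p]
        out = gen[n] | (in_set - kill[n])
        if out != state[n]:
            state[n] = out
            # Forward problem: a change here can only affect successors.
            work.extend(succs[n])
    return state
```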
It turns out that my worklist approach wasn't at all well implemented. At one point it required processing 9000 nodes. It turned out I wasn't ignoring duplicates in the worklist. Fixed. Now it took 1900 node visits to reach a fixed point.
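One way to get that duplicate suppression is to pair the queue with a membership set, so pushing a node that is already waiting becomes a no-op. This is a hypothetical sketch of the fix, not the code from my translator:

```python
from collections import deque

class Worklist:
    """FIFO worklist that ignores pushes of nodes already queued."""

    def __init__(self, items=()):
        self._queue = deque(items)
        self._queued = set(items)

    def push(self, n):
        if n not in self._queued:  # already waiting, skip the duplicate
            self._queued.add(n)
            self._queue.append(n)

    def pop(self):
        n = self._queue.popleft()
        self._queued.discard(n)  # node may be re-queued after this point
        return n

    def __bool__(self):
        return bool(self._queue)
```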
BUT… the round robin approach implemented by traversing the CFG is much faster. It only took 580-odd node visits. The order of the traversal is supposed to be quite important: for forward data flow problems, reverse postorder is used, where each node is processed before its successors. There is a modification to this described on Wikipedia involving exceptions for back edges. I haven't implemented that.
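Reverse postorder is just a DFS postorder, reversed. A sketch (recursive, so it assumes CFGs shallow enough not to blow the stack; `succs` is a hypothetical adjacency map):

```python
def reverse_postorder(entry, succs):
    """Return nodes reachable from entry in reverse postorder: each node
    comes before its successors, except along back edges."""
    seen, order = set(), []

    def dfs(n):
        seen.add(n)
        for s in succs[n]:
            if s not in seen:
                dfs(s)
        order.append(n)  # postorder: emitted after all successors

    dfs(entry)
    order.reverse()  # reverse postorder
    return order
```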
This is just a note to say that when you think you've got the faster algorithm, it isn't certain until you test it. I'm sure it's possible to get the worklist approach to be faster than round robin, and in fact a paper I'm reading notes that the naive worklist algorithm is slower than the round robin method.
Even so, I suspect that even with a better algorithm, data flow analysis is still too slow to be used in a dynamic binary translator.