I was running my emulator against some malware and came across a sample that was packed with an early version of telock (0.7x iirc). My emulator didn’t handle it nicely, and neither did my tracer. The problem was a rep stosb instruction that overwrites itself (the target of the stosb is the address of the instruction). Remember that a rep takes many steps, depending on ecx and whether fast string operations are being used on your CPU. So while single stepping through the rep, my emulator was decoding the instruction at that address again after each step, and so it picked up the overwritten bytes.
In the real world, the instruction is cached in a Prefetch Input Queue (PIQ): http://en.wikipedia.org/wiki/Prefetch_input_queue. Even though that wiki article is a bit old and most of the prefetch tricks don’t work anymore, I’ve read that rep stos and rep movs still use the prefetch. What this means is that the rep stos instruction in the malware I was looking at is cached, and even if the instruction is overwritten in memory, the prefetched copy is still the one used for execution. The PIQ is flushed in a number of situations, including single stepping. So when the malware with the prefetch trick is single stepped, it behaves differently than when it runs normally. In my case, the malware was crashing.
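To make the difference concrete, here is a minimal sketch in Python of a toy emulator that knows only rep stosb. It is not my actual emulator: the memory layout, the register dictionary and all the function names are made up for illustration, and it assumes the direction flag is clear. One loop decodes the instruction from memory again on every iteration, the other decodes it once and keeps using that copy, which is roughly what the prefetch does.

```python
REP_STOSB = b"\xf3\xaa"  # x86 encoding of "rep stosb"


class DecodeError(Exception):
    """Raised when the bytes at eip are no longer the expected instruction."""


def make_state():
    # Hypothetical layout: the instruction sits at address 4 and edi points at it.
    mem = bytearray(16)
    mem[4:6] = REP_STOSB
    regs = {"eip": 4, "edi": 4, "ecx": 4, "al": 0x90}
    return mem, regs


def fetch(mem, regs):
    # Read the instruction bytes from memory and check they still decode.
    ip = regs["eip"]
    raw = bytes(mem[ip:ip + len(REP_STOSB)])
    if raw != REP_STOSB:
        raise DecodeError(f"unexpected bytes at {ip:#x}: {raw.hex()}")


def store_once(mem, regs):
    # One iteration of rep stosb (direction flag assumed clear).
    mem[regs["edi"]] = regs["al"]
    regs["edi"] += 1
    regs["ecx"] -= 1
    if regs["ecx"] == 0:
        regs["eip"] += len(REP_STOSB)


def run_redecode(mem, regs):
    # Naive emulator: decode from memory before every single iteration.
    start = regs["eip"]
    while regs["eip"] == start and regs["ecx"] > 0:
        fetch(mem, regs)
        store_once(mem, regs)


def run_cached(mem, regs):
    # Prefetch-like behaviour: decode once, then finish the whole rep.
    fetch(mem, regs)
    while regs["ecx"] > 0:
        store_once(mem, regs)
```

In this toy setup, run_redecode fails on its second iteration because the first store has already replaced the f3 prefix, while run_cached finishes the rep and leaves the instruction overwritten in memory.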
It was a fairly straightforward task to modify my emulator to handle this correctly, but making my tracer work around it was a different story. I really needed to single step for the most part, but could cope with not stepping through a rep stos/movs. I thought of a few ideas, including setting a breakpoint after the instruction to simulate a single step. In the end I came up with a different approach.
I single step as per usual, but when I come across a rep movs/stos, I handle it specifically. I copy the memory at eip for the instruction length – this is what is in the prefetch.
I still step through the rep movs/stos, but before executing each step, I take note of edi. If it points into the bytes of the instruction itself, I copy the contents of that memory location after the step, as this will be the real contents of memory once the rep has finished.
I then restore the memory that was modified to its original data (what is in the prefetch). This allows the instruction to continue as if the prefetch and the target memory were exactly the same.
At the end of the rep, I replace the memory with the modified version I’ve been building.
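The steps above can be sketched roughly like this in Python. This is not my tracer's code: single_step stands in for whatever the debugging API does, the memory and registers are a toy model, and only rep stosb with the direction flag clear is handled. The names (trace_rep, prefetched, pending) are invented for the example.

```python
REP_STOSB = b"\xf3\xaa"  # x86 encoding of "rep stosb"


def single_step(mem, regs):
    # Toy stand-in for a real single step: because the prefetch is flushed,
    # the instruction is fetched from memory again for every iteration.
    ip = regs["eip"]
    if bytes(mem[ip:ip + len(REP_STOSB)]) != REP_STOSB:
        raise RuntimeError("instruction bytes changed under the stepper")
    mem[regs["edi"]] = regs["al"]
    regs["edi"] += 1
    regs["ecx"] -= 1
    if regs["ecx"] == 0:
        regs["eip"] = ip + len(REP_STOSB)


def trace_rep(mem, regs):
    start = regs["eip"]
    length = len(REP_STOSB)  # assumption: the length comes from the disassembler

    # Copy the instruction bytes: this is what would be in the prefetch.
    prefetched = bytes(mem[start:start + length])
    # The modified version of those bytes, built up as the rep runs.
    pending = bytearray(prefetched)

    while regs["ecx"] > 0:
        target = regs["edi"]  # note edi before executing the step
        single_step(mem, regs)
        offset = target - start
        if 0 <= offset < length:
            # The step wrote into the instruction itself: remember the new
            # byte, then put the original back so the next fetch still works.
            pending[offset] = mem[target]
            mem[target] = prefetched[offset]

    # End of the rep: memory now gets the modified version.
    mem[start:start + length] = pending


def make_state():
    # Hypothetical layout: instruction at address 4, edi pointing at it.
    mem = bytearray(16)
    mem[4:6] = REP_STOSB
    return mem, {"eip": 4, "edi": 4, "ecx": 4, "al": 0x90}
```

With the state from make_state, trace_rep steps through all four iterations without the stepper ever seeing a corrupted instruction, and memory ends up the same as if the rep had run unstepped.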
Dunno how good a description that is. My English skills were never the best. In any case, this works fairly well. I have some concerns about a rep movs whose source and destination both point to parts of the current instruction, but I don’t think any malware is using this form of rep movs, so I’ll not spend any more time on it.
And yes, I did manage to unpack telock with my emulator once this was working – and also verify it with my tracer :-)