
Finding free() in IOS

This is a continuation of the post https://silviocesare.wordpress.com/2008/07/27/finding-malloc-in-ios/.

Now that I have the address of malloc(), I can go about finding free().  The logic behind identifying free() is to look for mallocs that return the same address.  This must surely indicate that a free() has occurred and memory is being recycled.  Next, we track all calls that take the address of the malloced buffer as an argument (registers $a0-$a3).  One of these functions must be free().

Applying this logic over a number of test cases, we intersect the functions that are called on the buffer before the following malloc returns it again.  This gives us an address which we think is free().
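Roughly, the bookkeeping looks like this (a minimal C++ sketch; the hook names and the way arguments are sampled from $a0-$a3 are my own plumbing, not anything in dynamips):

```cpp
#include <cstdint>
#include <map>
#include <set>

// live tracks buffers handed out by malloc; touched_by collects every
// function called with a live buffer in $a0-$a3 before that buffer is
// seen coming out of malloc again.
static std::set<uint32_t> live;                            // live buffers
static std::map<uint32_t, std::set<uint32_t>> touched_by;  // buffer -> callees

void on_call(uint32_t callee, const uint32_t args[4])      // at JAL/JALR
{
    for (int i = 0; i < 4; i++)
        if (live.count(args[i]))
            touched_by[args[i]].insert(callee);
}

static std::set<uint32_t> candidates;   // intersected free() candidates

void on_malloc_return(uint32_t buf)     // buffer address in $v0
{
    if (live.count(buf)) {
        // Buffer recycled: one of the functions that touched it freed it.
        if (candidates.empty())
            candidates = touched_by[buf];
        else {
            std::set<uint32_t> inter;   // narrow by intersection
            for (uint32_t f : candidates)
                if (touched_by[buf].count(f)) inter.insert(f);
            candidates = inter;
        }
    }
    live.insert(buf);
    touched_by[buf].clear();
}
```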

To test that we have the right address, I track all mallocs and suspected frees, making sure that no malloc returns the same buffer without a free() in between, and that there are no double frees.  That is, I make sure all our mallocs and frees match up.
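The matching check itself is simple (again a sketch; the hook names are assumed):

```cpp
#include <cstdint>
#include <cstdio>
#include <map>

static std::map<uint32_t, bool> live_chunk;   // buffer -> currently allocated

void record_malloc(uint32_t buf)              // suspected malloc returned buf
{
    if (live_chunk[buf])
        printf("ERROR: %08x returned again with no free() in between\n", buf);
    live_chunk[buf] = true;
}

void record_free(uint32_t buf)                // suspected free() called on buf
{
    if (!live_chunk[buf])
        printf("ERROR: double free or free of unknown buffer %08x\n", buf);
    live_chunk[buf] = false;
}
```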

So did it work?

I spent several hours last night testing without success, and was truly becoming frustrated.  A few minutes this morning with fresh eyes identified the problem.  I was hooking free() by breaking on jal/jalr (the MIPS call instructions), instead of hooking directly at the head of the real function.  Hooking at the first instruction resolved the issues I was having.  This must mean that there are jumps made directly to free() (i.e. tail calls) instead of using the call subroutine approach.

Anyway, a happy ending to several hours of frustration.  I have the addresses of malloc and free, in a fairly portable way across IOS versions.


Finding malloc() in IOS

I needed to find the address of malloc() in Cisco IOS.  Incidentally, I also need to find free(), but I haven't gotten around to doing that yet.

The first thing to note is ‘show mem’ on the IOS command line.  It shows malloc chunk information and is extremely useful; it gives you the address of each malloc chunk and its header information.  The next thing to note is the IOS exploitation presentation by FX from 2002, http://www.blackhat.com/presentations/bh-usa-02/bh-us-02-phenoelit-network.pdf.  This presentation gives information on the malloc chunk itself, the header and the trailer.  Also essential is dynamips, http://www.ipflow.utc.fr/index.php/Cisco_7200_Simulator, which I use for dynamic analysis to locate malloc.

Dynamips has a command vm_debug pmem_r32 (and r16, w16 and w32) to print data at a physical address.  I suspect IOS has a command to dump specific memory contents, but I don’t know what it is (can someone who knows IOS tell me if this is true or not?).  I made some modifications to dynamips to do virtual address lookups.  The problem with implementing a memory read from a virtual address is that, on MIPS, virtual-to-physical address translation is done in software in the event of a TLB miss.

In the FX presentation, and visible by looking at the malloc chunks in memory, we can see that the first 32-bit word of a malloc chunk is a constant magic value of 0xab1234cd.  At the end of a malloc chunk (following the malloc data) is the ‘redzone’, which is another constant value, 0xfd0110df.
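Putting those observations together, the chunk layout looks roughly like this (only the fields these posts rely on are shown; the rest of the header, per FX's presentation, is collapsed into filler, and the 48-byte header size is the value I observe on IOS 12.4 later in this post):

```cpp
#include <cstdint>

// Rough picture of an IOS malloc chunk, as far as these posts need it.
struct ios_malloc_chunk {
    uint32_t magic;            // always 0xab1234cd
    uint32_t header_rest[11];  // remaining header words: 48-byte header total
    // ... user data of 'size' bytes follows ...
    // uint32_t redzone;       // 0xfd0110df, immediately after the data
};

// The pointer malloc() hands back is the chunk address plus the header:
//   user_ptr = chunk_addr + sizeof(ios_malloc_chunk)   (i.e. +48 on 12.4)
```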

Finding malloc’s address has been done before: http://www.lemuria.org/mirrors/lynn-cisco.pdf finds the address of malloc by its relation to an error string it uses.  I do not believe his code is public, however.

Attempt 1) An approach that doesn’t give us results

First, load the IOS image in IDA (which requires unzipping the original image and setting the e_machine type to MIPS in the resulting ELF binary).  Because MIPS doesn’t allow a 32-bit immediate load, such constants are loaded in two 16-bit halves (typically a lui of the upper half followed by an ori of the lower half).  The logic we follow is that the malloc function writes the constant values to the malloc chunk, so finding those stores gives us the malloc procedure.  Searching for an immediate value of 0xab12 gives us a list of functions to work with.  Then searching for 0xfd01 and finding a matching function leaves us with one function to work with.

Is this the function we are looking for?  It seems like we followed the right approach, but this is not the code we are looking for.  I have included this approach, even though it did not work for me, because it seemed fairly reasonable.

Attempt 2) A dynamic approach

This approach works and gives us a malloc location in any IOS that runs inside dynamips.

If we track the MIPS store word (sw) instruction and look for a write of 0xab1234cd, we can tell when a malloc chunk is being created and at what location in memory the chunk sits.  So we instrument sw in dynamips and have it keep track of the chunk address in an associative map.
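In sketch form (the hook name is mine; the real code patches dynamips' sw handler):

```cpp
#include <cstdint>
#include <map>

static std::map<uint32_t, bool> chunk_at;   // chunk start addresses seen

void on_store_word(uint32_t vaddr, uint32_t value)
{
    if (value == 0xab1234cd)    // magic: a chunk header is being written here
        chunk_at[vaddr] = true;
}
```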

We also keep track of the stack trace.  I did this by instrumenting JAL/JALR and JR to build a shadow stack implemented as a C++ list.  JAL is jump and link; JALR is jump and link register.  These are the MIPS call instructions: they leave a copy of the return address in the $ra (return address) register.  To return, JR (jump register) is executed with a target of $ra.
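The shadow stack amounts to something like this (a sketch; in reality the hooks sit inside dynamips' instruction handlers):

```cpp
#include <cstdint>
#include <list>

struct frame { uint32_t callee; uint32_t return_addr; };
static std::list<frame> call_stack;   // the "stack implemented as a C++ list"

void on_jal(uint32_t target, uint32_t ra)   // JAL/JALR: push a frame
{
    call_stack.push_back({target, ra});
}

void on_jr_ra(uint32_t target)              // JR $ra: pop the matching frame
{
    if (!call_stack.empty() && call_stack.back().return_addr == target)
        call_stack.pop_back();
}
```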

Now when a function returns (JR $ra), we look at the return value (register $v0) and see if that address is in our map of malloc chunks.  Because the caller sees the buffer without the prepended header, the returned value should actually be the address of the malloc chunk plus an offset.  It turns out that offset is 48 in IOS 12.4.  Having verified the return value is the address of the malloc buffer (chunk address + header), we take note of the current procedure on the shadow call stack and call this our malloc.

Also important is that only the first function to return a given malloc buffer can be malloc itself.  Otherwise we could pick up functions that simply pass the malloc buffer around.
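Tying the pieces together at JR $ra (a sketch; chunk_at and current_procedure() are stubbed stand-ins for the earlier sketches so this compiles on its own):

```cpp
#include <cstdint>
#include <cstdio>
#include <map>
#include <set>

static const uint32_t HEADER_SIZE = 48;     // observed on IOS 12.4

// Stubs standing in for the pieces built above.
static std::map<uint32_t, bool> chunk_at;           // from the sw hook
static uint32_t current_procedure() { return 0; }   // top of shadow stack

static std::set<uint32_t> seen;   // only the FIRST return of a buffer counts

void on_function_return(uint32_t v0)        // $v0 as a JR $ra executes
{
    uint32_t chunk = v0 - HEADER_SIZE;      // user pointer back to chunk head
    if (chunk_at.count(chunk) && !seen.count(v0)) {
        seen.insert(v0);
        printf("malloc candidate %08x returned buffer %08x\n",
               current_procedure(), v0);
    }
}
```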

So running the code, lo and behold, gives a single procedure address across all the times we see a malloc buffer being returned for the first time.  Comparing the results with IOS’s show mem command validates our findings.  A little more inspection shows that register $a3 (argument 4) contains the size of the malloc buffer.

Interestingly, the size passed to malloc() is 4 less than the size shown by the show mem command.  Perhaps this extra 4 bytes is used internally for debugging.

Symbolic Execution (of Cisco IOS)

Symbolic execution was first described in 1976 in a paper by King.  Java PathFinder uses symbolic execution to find bugs in Java programs.  In recent years, Dawson Engler’s group released a paper on EXE, used to find bugs in the Linux kernel.  Dawn Song’s BitBlaze group also uses symbolic execution to show the capabilities of Windows malware.  On the binary side, catchconv can find signedness bugs in Linux executables using a symbolic execution implementation forked from Valgrind.

Symbolic execution is a promising technology that appears more capable than either purely static or purely dynamic approaches to binary analysis in a number of applications, including bug checking and intelligent fuzzing.

So what is Symbolic Execution?

Symbolic execution treats variables as algebraic symbols capable of representing a range of concrete values.  As the program executes, a list of formulas is built over those symbols.  The concrete result of execution at any point in time is the set of solutions that satisfy the formulas built up so far.

We can determine solutions to queries about the formulas we have built.  If the expression x divided by y is parsed, we can ask whether y can ever equal 0.  Answering that query considers all the possible values for y and tells us if this is the case.  This is much more powerful than fuzz testing, which can only ever witness a divide by zero in a single test case.  In symbolic execution, more bugs than divide by zeros can be identified.  Assertions can be resolved at any point in the program; assertions about null pointer dereferences, or even buffer overflows.
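To make the divide-by-zero query concrete, here is a toy version that brute-forces a small 8-bit domain instead of calling a real solver like STP; the constraints and domain are invented purely for illustration:

```cpp
#include <cstdio>
#include <functional>
#include <vector>

using Constraint = std::function<bool(int x, int y)>;

// Is there any assignment satisfying the path constraints with y == 0?
bool can_y_be_zero(const std::vector<Constraint>& path)
{
    for (int x = 0; x < 256; x++)
        for (int y = 0; y < 256; y++) {
            bool ok = true;
            for (const auto& c : path) ok = ok && c(x, y);
            if (ok && y == 0) return true;   // satisfying assignment found
        }
    return false;
}

int main()
{
    // Path so far: x < 5 and y == x - 1.  Is a division x/y reachable
    // with y == 0?  Yes: x = 1 gives y = 0.
    std::vector<Constraint> path = {
        [](int x, int)   { return x < 5; },
        [](int x, int y) { return y == x - 1; },
    };
    printf("y can be zero: %s\n", can_y_be_zero(path) ? "yes" : "no");
}
```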

When a condition is introduced in a program, a constraint is added to the formulas that represent program execution.  An example constraint is x being less than 5, or y not being equal to x.  There are two paths for a condition: the constraint being true and the constraint being false.  To represent this symbolically, execution splits into two, one set of formulas representing the true path, another set representing the false path.  The constraint corresponding to the branch being followed is added to the formulas.
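In sketch form, the split looks like this (the state and constraint representation are invented for illustration):

```cpp
#include <functional>
#include <utility>
#include <vector>

using Constraint = std::function<bool(int, int)>;

struct State {
    std::vector<Constraint> path_condition;
    // ... registers, memory and the program counter would live here too ...
};

// At a branch on 'cond', execution splits: one copy of the state takes
// the constraint, the other takes its negation.
std::pair<State, State> fork_at_branch(const State& s, Constraint cond)
{
    State taken = s, not_taken = s;
    taken.path_condition.push_back(cond);
    not_taken.path_condition.push_back(
        [cond](int x, int y) { return !cond(x, y); });
    return { taken, not_taken };
}
```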

Forking at branches allows symbolic execution to systematically explore different paths in the program.  The BitBlaze project uses this to determine the capabilities of malware by systematically identifying what it does along all identified paths.

Symbolically executing a program introduces a number of paths to analyse that grows exponentially with the number of branches.  A practical solution to this path explosion is to execute symbolically only the parts of the program we are interested in.

Mixed symbolic execution marks data as either symbolic or concrete.  If an expression is purely concrete, it is executed natively.  If the expression has symbolic components, it is executed symbolically.  BitBlaze, Dawson Engler’s EXE and catchconv all use mixed symbolic execution and can run on reasonably priced PCs; you don’t need a supercomputer.
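The core of mixed execution is a tag on every value; something like the following (a sketch, with the solver expression left opaque and the expression builder stubbed):

```cpp
#include <cstdint>

struct SymExpr;            // opaque solver expression (an STP handle, say)

struct Value {
    bool     symbolic;
    uint32_t concrete;     // valid when !symbolic
    SymExpr* expr;         // valid when symbolic
};

// Stub: the real version would build a solver bit-vector add expression.
static SymExpr* make_add_expr(const Value&, const Value&) { return nullptr; }

Value add(const Value& a, const Value& b)
{
    if (!a.symbolic && !b.symbolic)                  // purely concrete: native
        return { false, a.concrete + b.concrete, nullptr };
    return { true, 0, make_add_expr(a, b) };         // any symbolic operand
}
```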

The constraint solver used in BitBlaze, EXE and catchconv is STP (Simple Theorem Prover), written in conjunction with EXE specifically for applications in bug checking.  These kinds of solvers are SMT, or Satisfiability Modulo Theories, solvers: a generalisation of SAT, or Boolean satisfiability, solving.  Apparently much progress has been made in SAT solvers in the past 10 or 15 years, and this is said to account for their recent popularity in program verification.

Part 2)

I’m pencilled in to speak at Ruxcon this year.  The title of my talk is Code Emulation for Malware Unpacking.  It’s a summary of the work I’ve done over the past few months writing an x86 and Win32/Linux emulator.  An application of code emulation is automated unpacking of malware.  Code emulation has been a mainstay of commercial AV for the last 15 years.  It is used in unpacking and in identifying polymorphic viruses.  There are no public unpackers based on code emulation to my knowledge.

My work in emulation is good, but not very exciting.

I’m not promising anything, but I am hoping that by the time Ruxcon comes around, I will have some working results in something more exciting: symbolic execution of Cisco IOS.  I’ve started coding only this week, so the clock is ticking.

Dynamips is an open-source Cisco emulator.  Perhaps emulator is not the right word, as it also uses dynamic binary translation (just-in-time compilation) to achieve speed.  QEMU also uses dynamic binary translation.

I had to learn MIPS assembly.  It’s a RISC architecture and has a really easy to understand instruction set.  It’s much easier than x86.

The input to be marked as symbolic in Cisco IOS is received network traffic.  My code is planned to symbolically explore the network stack.

I have to dynamically instrument each instruction that is emulated in dynamips.  Each instruction is executed concretely or symbolically.  If all operands are concrete, the instruction is executed concretely (by dynamips).  If any operand is symbolic, the instruction generates a set of formulas for use by STP, and the result of the instruction is marked as symbolic.  I have written the code for all the general arithmetic and bitwise operations; STP constraints are generated for each respective instruction.
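For example, an addu with a symbolic operand might emit something like the following.  The output mimics the CVC-style input language STP accepts (a fresh bit-vector variable for the destination, tied to the operands with BVPLUS); treat the exact syntax as illustrative rather than authoritative:

```cpp
#include <cstdio>
#include <string>

static int next_id = 0;

// Returns the fresh variable now standing for the destination register.
std::string emit_addu(const std::string& rs, const std::string& rt)
{
    std::string rd = "t" + std::to_string(next_id++);
    printf("%s : BITVECTOR(32);\n", rd.c_str());
    printf("ASSERT( %s = BVPLUS(32, %s, %s) );\n",
           rd.c_str(), rs.c_str(), rt.c_str());
    return rd;
}

int main()
{
    emit_addu("a0", "a1");   // addu with $a0, $a1 tracked as symbolic
}
```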

Memory loads and stores are tricky.  The problem is getting the theorem prover to work on only the data you are interested in.  For example, when performing a load operation where the address is symbolic, potentially ALL of memory is referenced by the symbolic operation.  I solve this by maintaining a table over all of memory identifying each word as symbolic or concrete.  At loads and stores with symbolic addresses, I solve the points-to set (where the symbolic addresses can point to) using STP, and update my tables and data accordingly.  I hope STP does not take too long when running live.  I have only a couple of load and store operations implemented.  There are quite a handful in MIPS: LB, LBU, LW, LWL, LWR, LWU, LD, LDL and LDR for the loads (if my memory is right), and an equal number for the stores.
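Conceptually, the memory table looks like this (a sketch; solve_points_to() is a stub standing in for the STP query, and the update policy is a simplification):

```cpp
#include <cstdint>
#include <map>
#include <vector>

struct SymExpr;   // opaque solver expression

// Stub: the real version asks STP for the feasible targets of addr.
static std::vector<uint32_t> solve_points_to(SymExpr*) { return {}; }

struct MemCell { bool symbolic; uint32_t concrete; SymExpr* expr; };
static std::map<uint32_t, MemCell> shadow_mem;   // sparse: touched words only;
                                                 // untouched words fall through
                                                 // to dynamips' concrete memory

// Store through a symbolic address: every feasible target may now hold
// the value, so mark them all.  (A simplification: a sound implementation
// would perform a weak update rather than overwriting each target.)
void store_word_symbolic(SymExpr* addr, const MemCell& v)
{
    for (uint32_t a : solve_points_to(addr))
        shadow_mem[a] = v;
}
```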

The only problem I have with this is that while the number of memory addresses I refer to is probably quite small, the size of the (sparse) array handled by STP is 4 GB: all of the MIPS virtual memory.  I am not sure if STP is capable of solving the constraints I generate.  I will have to wait until I have a working implementation before I can say whether this approach will work.

At branches, execution will need to fork.  There will also be a need to select which paths to execute.  I haven’t implemented any of this yet.

Booting IOS takes a couple of seconds on my vintage hardware.  Executing it symbolically will probably take even longer.  I made modifications to dynamips to save and restore running VMs, much like VMware’s suspend and resume.  It’s not perfect, as I don’t save the state of things like DMA and many other things, but it did work in a test run tonight.  This was saving state while IOS was idling.  Startup time on restore was almost instantaneous.

I will start symbolic execution once IOS is fully booted.  But I will need to determine when a run terminates.  I imagine I will need to check that the program counter is back in IOS’s idle loop.  I haven’t implemented this part either.

So that’s my plan and where I’m at.  Hopefully this isn’t pie in the sky stuff and I’m able to get a working implementation.  At best I hope to find real bugs in the Cisco network stack.  At worst I hope to perform intelligent fuzzing of IOS, covering code and paths efficiently.  I dare not say that the worst case is not finishing at all.

Incidentally, I am thinking of doing a Master’s degree by research next year.  I think symbolic execution would make a good project.