Monthly Archives: October 2007

Kernel Pointer Overflows in Read and Writes

I was looking at some Linux Kernel code today when I came across this piece of code implementing the write operation in a device driver.

if (copy_from_user(writebuf, buffer + writecount, transfer_length) ...

buffer is user controlled.  writecount is derived from the user controlled count parameter that is used the write(…,const char __user *buffer,size_t count,…) operation.

 This type of code of pointer indexing as in buffer + writecount is not uncommon and found in a number of places with the Linux kernel, and presumably many other kernels including NetBSD, OpenBSD, FreeBSD, and perhaps even non Unix Kernels.

 There is an explicit check in the Linux file system layer that validates the count parameter, so that it is not greater than INT_MAX.  The reason for this validation, amongst eliminating a number of kernel bugs when passing large values of count, is that the return value on the read and write operations is of type ssize_t.

ssize_t is a signed integral type and is in the range of INT_MIN and INT_MAX.   Therefore if the return value, which indicates the number of bytes read or written is to fit into size_t, the count parameter must be less than INT_MAX.  The unused bit between INT_MAX and UINT_MAX is used in the return vaue, and indicates the sign.

 Back to the Linux Kernel code..

An interesting thing occurs when we start using buffer + count with large values of count.  It is possible to have pointer overflows and wraparounds.

 In Linux (default build), the userland stack begins at 0xbfffffff and the kernel begins at 0xc0000000. The largest address that a 32 bit pointer can hold is 0xffffffff.  0xffffffff – 0xc0000000 is 0x40000000.  In cases where we use a user supplied count from a read or write operation to index either a kernel buffer or a buffer controlled by userland (at or below 0xbfffffff), count must be less than INT_MAX. 

But 0x40000000 is much less than INT_MAX, and we can have a pointer overflow or wraparound when we use this to index a buffer.

This means that there are number of pointer overflows within the Kernel.  buffer + count can overflow and wraparound when count is greater than 0x40000000.

Is code like this exploitable?

 There is no doubt that code like this is exploitable in the sense that pointer overflows can occur.  But are they exploitable from a malicious attacker’s point of view?  I’m not sure of this, and will have to audit and investigate each case individually.  I suspect that it may be hard to trigger such code without being caught out by memory access concerns.


A good use of goto for error handling

Not all use of goto is considered evil by C programmers. Used appropriately, it can make code easier to read, and increase code performance.

It is often you have a series of mallocs.

static char *a, *b, *c, *d;
void do_initialization(void)

To be robus, each malloc should be checked for failure. The do_initialization should reflect either success or failure. The following code implements that, but introduces more bugs.

int do_initialization()
if ((a = malloc(10)) == NULL) return -1;
if ((b = malloc(10)) == NULL) return -1;
if ((c = malloc(10)) == NULL) return -1;
if ((d = malloc(10)) == NULL) return -1;
return 0;

Memory leaks have been introduced. Lets fix that.

int do_initializae()
if ((a = malloc(10)) == NULL) return -1;
if ((b = malloc(10)) == NULL) {
return -1;
if ((c = malloc(10)) == NULL) {
return -1;
if ((d = malloc(10)) == NULL) {
return 0;

That correctly implements do_initialization, but it appears somewhat redundant and can be hard to read in larger projects.

A good use of goto can make the above code much better.

int do_initialization()
if ((a = malloc(10)) == NULL) return -1;
if ((b = malloc(10)) == NULL) goto err_a;
if ((c = malloc(10)) == NULL) goto err_b;
if ((d = malloc(10)) == NULL) goto err_c;
return 0;
err_c: free(c);
err_b: free(b);
err_a: free(a);
return -1;
The code demonstrated above is very often seen in production code. The opensource kernels and other large C projects, make extensive use of this type of construct.

Linux Kernel *fpos += count signed integer overflow bug

I’ve included an interesting genre of bug I’ve seen in some kernel drivers.  In some cases, it may be exploitable.  However, not everyone would agree that this code represents the bug.

  loff_t pos = *ppos;
         *ppos = pos + nbytes;
         res = nbytes;
         free_page((unsigned long) page);
         return res;

I’ve included smore code than is necessary, but what is the bug?  The bug is that the file pointer ppos can overflow.  No validation is performed on the result of pos + nbytes.  The file pointer is of type loff_t and is a signed long long, so an overflow leads to a negative result which is not valid.  That the file pointer is now in an undefined state, may leave other sections of code that depend on it vulnerable.

Some may argue that these are not bugs at all.   In 2002 no-one in the Linux camp (and neither I at the time; as I was looking for exploitable conditions) cared much for such bugs and were generally ignored.

But are these types of bugs exploitable?  Depends. It requires another area of code to depend on the value of the file pointer for the above code to become useful from an attackers point of view. Alof of kernel code is definately vulnerable if the lseek code in general was allowed to overflow and make the file pointer negative, but there are explicit checks for this in most of the lseek code. 

Most of the time these bugs are not exploitable.   In 2002, I did however find a number of exploitable conditions concerning the use of interesting reads and seeks.  Some of those bugs were in default installs as part of the proc filesystem.

Setting sin_zero to 0 in struct sockaddr_in

I was looking at the Linux DHCP source code today which recently had a security advisory posted on it.   I looked at the patched version – without looking at the patch, and could not find any striking vulnerabilities.

I did however come across code that did something like this.

struct sockaddr_in sin;
sin.sin_family = PF_INET;
sin.sin_port = htons(PORT);
sin.sin_addr.s_addr = addr;

ret = sendto(...(struct sockaddr *)&sin...);

What it didn’t do was clear the sin_zero field in sockaddr_in to zero. This is a bug. I see it occur occasionaly.  This bug can cause undefined behaviour in applications.  In practice, I’ve seen functions such as sendto or recvfrom fail randomly because of lack of clearing sin_zero.  The intermittant nature of the error probably occured becuase I was using the stack for sockaddr_in, and often sockaddr_in was automatically set to zero, but occasionaly was not depending on the application in question.

 Firstly, what is sin_zero, and why should we care about it.

struct sockaddr_in
  sa_family_t    sin_family;   
  in_port_t      sin_port;     
  struct in_addr sin_addr;     

  unsigned char  __pad[__SOCK_SIZE__ - sizeof(short int)
                        - sizeof(unsigned short int) - sizeof(struct in_addr)];
#define sin_zero        __pad

That is from Windows cygwin /usr/include/cygwin/in.h.  The include file to use that is portable is /usr/include/in.h

Most of the net code does not use sockaddr_in, it uses sockaddr.  When you use a function like sendto, you must explicitly cast sockaddr_in, or whatever address your using, to sockaddr.  sockaddr_in is the same size as sockaddr, but internally the sizes are the same because of a slight hack.

 That hack is sin_zero.  Really the length of useful data in sockaddr_in is shorter than sockaddr.  But the difference is padded in sockaddr_in using a small buffer; that buffer is sin_zero. 

On some architectures, it wont cause any problems not clearing sin_zero.  But on other architectures it might.  Its required by specification to clear sin_zero, so you must do this if you intend your code to be bug free for now and in the future.

There are two ways of clearing sin_zero.

memset(&sin, 0, sizeof(sin));
sin.sin_family = ...;
sin.sin_addr = ...;
sin.sin_port = ...;


sin.sin_family = ...;
sin.sin_addr = ...;
sin.sin_port = ...;
memset(&sin.sin_zero, 0, sizeof(sin.sin_zero));

I prefer the later because in my view it performs slightly faster due to the fact that it doesnt set data to zero which then later gets reassigned.
I have seen people use a hardcoded figure of 8 for sizeof(sin.sin_zero). I believe this is also incorrect, and not as portable; for example, if a possible extention of sockaddr occurs in the future.  This probably will never occur, and some might argue that using 8 is ok, but in my opinion it is just nicer style.

Electric fence helps finds buffer overflows

There is a library that implements a replacement malloc library that helps developers find buffer overflows.

 Sometimes buffer overflows dont SEGV.  A buffer overflow only causes a segmentation violation, when it tries to access memory that it doesnt have access too.  The heap occupies a large space of memory that houses any number of buffers.  A buffer could overflow into another buffer on the heap.  But the overflow wouldn’t go past the end of the heap, where it doesnt have access to.

 Electric fence uses a stategy of placing guard pages at the end of buffers it allocates.  Memory is dived into a series of pages, each page having its own acces s rights.  A memory allocation in electric fence is placed so the end of the buffer, is directly at a page boundery.  The next page is marked as non readable or writeable.  If the buffer overflows, it hits this guard page.  The result is a segmentation violation.

Electric fence can also be used to find buffer underuns, where a guard page is placed before the buffer.

Running your software in a debugger, or debugging the core dump will then establish where the buffer overflow occured in source.

 The disadvantage of electric fence, is that it uses alot more memory than normal.  It allocates between 1 or 2 extra pages for every malloc.  And also rounds up all allocations to a page boundary.

 In 2001, a co-worker was using electric fence for the first time on the software we were developing, and discovered that early Linux kernels would not at kernel compile time allocate enough vm_area_struct’s for use at runtime.  The result was that electric fence wasnt able to allocate enough memory, even though it had available physical ram and a large enough swap.  This bug was fixed shortly after.

Function definitions begin on a new line

The below defines a new function.

int f()

A common convention, used in FreeBSD is to define it like this.


At the beginning of my programming career, I used the first definition. Later on, I used the second. The advantage of the second definition, is that you can do this

grep ^f foo.c
or in VIM /^foo

Using regular expressions to find the function definition, as opposed to its use or declaration is very useful. This is the reason why I always use this convention of defining functions.

What to do with bugs? Money or Glory.

I’ve been auditing for the past week now, and have uncovered several vulnerable bugs in the Linux Kernel, and a bug in some popular userland code.  The question now: what to do with them?

 Previously when I’ve discovered a vulnerability, I would contact the developers in control of the software and work with them towards developing a patch.  Working with developers also helps establish a rapport and trust with software makers.  It also helps establish the contributer as exactly that; someone who contributes development time to opensource.  This approach was very successful in 2002 when I last did any auditing.

 Today, things are different.  Having not worked in computing for the past 4 to 5 years, and now having the status of a university student means I am financially worse off than I was in 2002.  $1000 would make a huge impact to my financial status. and the pay money for software vulnerabilities.  IDefese pays from $100 to $1000 for most bugs, and for bugs in OpenSSH or Apache and other critical software, awards are in the tens of thousands of dollars.  The zero day iniative make custom bids on each vulnerability.

 Both companies require a person to submit or disclose the vulnerability on their secure web site.  After evaluation, the vulnerability is given a monetary bid by the company to establish rights to it, or no bid is made.  IDefense/ZDI then works with the vendor to establish a resolution.  The submitter gets credit in any advisories released.

There isnt as much glory in submitting vulnerabilities the monetary way.  I would prefer to work in conjunction with IDefense/ZDI to develop a patch with the software maker.  But I am biased, being 80% developer and 20% security auditer.  Many people would see the advisory credit as 99% of the glory.

The bugs I’ve discovered in the last week are not high impact bugs.  All but the one userland bug I found are local in nature and require specific hardware or configurations.  For all the bugs, priviledge escalation is not the direct result of exploitation.  I am truely skeptical if IDefense/ZDI will make a bid on these bugs.  But they are still vulnerabilities none the less, and if for each bug I was offered $100, I would, financially, be better off.

I encourage people to write comments to this post, in regards to how they think bugs should be handled.  Is IDefense/ZDI popular in the community?  Or is the feeling that direct communication with the developers/vendors should be made?