read() buffer overflows

While auditing some code recently, I tried to find ‘entry points’ into it.  I don’t know if anyone else uses this terminology, but I’ll give it a shot.

Basically, I didn’t want to read all the code, or understand too many details.  So an entry point is just a place where bugs around it are likely to lead to exploitation.  Examples in userland are malloc; examples in the kernel are kmalloc, copy_from_user and copy_to_user.  It’s the same idea as grepping for strcpy.

I had this idea that if I could find an integer overflow in the size parameter of a read system call, I might be able to copy a large amount of data into the destination buffer of the read call.  It’s a genuine source of bugs, and I found some instances of it.  But alas, in the instance I wrote an exploit for, read returned an error and did not overflow my buffer.  The size parameter I was using was very large; I was hoping read would simply copy the rest of the file in question into the buffer.

I think that on some other or older operating systems, this could potentially be exploitable.  I gave up and didn’t investigate any further why it didn’t work.  I’ll have to continue this in the future.


4 responses to “read() buffer overflows”

  1. Modern Linux read() system calls will check to see if the address as well as the address+len are both mapped…

  2. hello jduck,
    integer overflow and buffer overflow (not checking buffer length when copying data) are two
    different types of bugs.

    so you may be talking about different bugs from
    the integer overflow.

  3. Hi Silvio, we don’t know each other. I read your paper “Unix ELF parasites and virus” (wow, from ’98, a lot of time has passed ^^) and I have been studying it … first of all I really have to compliment you on that and on all your other work scattered around the net; it’s rare to find material this interesting and well written when it comes, alas, to us Italians.

    I wanted to ask you a simple question about the code attached to that paper. The prototype of the function ‘infect_elf’ in the source ‘infect-elf-p.c’ is the following:

    infect_elf(char *filename, char *v, int len, int he, int e)

    and if I understood correctly:

    filename : name of the file to infect
    v : pointer to the parasite code to insert
    len : length of said parasite code
    he : offset within the parasite code at which the original entry point of the host program is stored (which then gets patched into the code, replacing it according to the host)
    e : !??!?!?!

    what the hell is ‘e’? I’ve been going crazy for hours trying to figure out what on earth it is 😛

    I see that the only place in the code where it is used is:

    ehdr.e_entry = evaddr + e;

    in the loop where you update the ELF’s program segment headers.

    Shortly before that in the code I see this:

    "Parasite length: %i, "
    "Host entry point index: %i, "
    "Entry point offset: %i"
    len, he, e
    printf("Host entry point: 0x%x\n", ehdr.e_entry);
    *(int *)&v[he] = ehdr.e_entry;

    here you label ‘he’ as “host entry point index” and ‘e’ as “entry point offset” … I don’t understand the difference you mean between offset and index … especially since in the source ‘parasite-v.c’ the two values are a good 605 bytes apart, so evidently they mean very different things.

    Would you be so kind as to enlighten me on this?

    Thanks a lot.

  4. Dammit.. why didn’t I know about your blog earlier.

    We used read() as the canonical endpoint example of the “bad thing” of what can happen when u get an over-large length.

    I did a quick survey of the kernels though, and the upper-half system call code looks for reads higher than max signed int IIRC. I made a list at one point–i think older linux was vuln. jduck might be right – (prolly is since he stated it with authority) It could check for contiguous mapping.. i don’t remember exactly. that might be gameable though as address + MAXUINT is just subtracting 1. If I wasn’t so lazy, I’d just go look.

    Basically, it screwed a couple of our examples at a point that it was too late to fix b/c they were in copy-edit, so we just slapped a big Disclaimer box around it explaining it.

    Anyway, I can go find the list of kernels and their behaviors but you can prolly recreate the search before I can find it on my HD.

    We ended up using snprintf’s length arg as our synthetic win case I think.. strncat might be ok, but i believe strncpy was fail because some implementations zero out the entire buffer before copying.

    It’s always fun to wildly speculate about specific technical facts that can be checked easily using my horrible memory as I end up making 15 mistakes that ppl can own me on. ;> Keeps me humble tho. :p

    Anyway, as far as your entry point nomenclature, yeah, I think we have similar words though we use it in a slightly different context. it’s funny to me that basically, if you do the work, and you’re smart, independent ppl end up at the same ideas even though they call them different things.

    It’s the dumb people that just a priori figure out how things should be in some platonic version of the real world of code, make a formal taxonomy, and assign them all names and create complicated causalities and logical relationships between completely fictional ideas. I think we call those people PHDs. ;>
