ELF Binary Code Injection, Loader/'Decrypter'

Finding code caves as a way to inject code into a binary has been around for a long time, pioneered by virus writers years ago. Similarly, binary protection in the form of encryption has also been around for years, mostly pioneered by shareware developers as a way to protect their software from being statically analyzed and patched.

Therefore, this article will not demonstrate any new or novel techniques. I simply attempt to walk through the process so that a novice coder/security enthusiast (like myself) can understand at a basic level how this technique works, and how to develop your own tools both to reverse engineer existing binary protection schemes and to build your own.

I am neither a professional developer nor a reverse engineer. I am sure this paper and the accompanying code are riddled with bugs and mistakes, and that I've taken the long way around in places due to a simple lack of knowledge. Any constructive criticism is greatly appreciated, as part of the reason I undertook this project was to learn myself and possibly assist others who could benefit from the perspective of a novice.

Now that the disclaimers are out of the way, without further ado, here is my attempt to explain my last few days of hacking away at writing my own ELF loader/crypter and injecting it into a binary. Hope you enjoy!

One of the first things I had to do, and a big reason for this project, was to get familiar with the Executable and Linking Format. I started by learning how to programmatically determine the initial entry point of a binary. In other words, when we run an ELF binary, how can I determine where in memory the first instruction lies?

Luckily, it's fairly simple and straightforward not only to obtain the entrypoint but also to change it to whatever we want. The entrypoint is available in the ELF header, in the field e_entry. The ELF header is the first 64 bytes of any ELF binary file (on 64-bit systems, which are the target of this paper), and the e_entry member can be found 0x18 bytes into the file. You could easily use the standard open/read system calls to read this 8-byte value directly, but Linux already provides ready-made ELF structure definitions in the header file /usr/include/elf.h.
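
To make the 0x18 figure concrete, here is a minimal sketch of pulling e_entry straight out of a raw copy of the header bytes. The helper name entry_from_raw_header is hypothetical, and the sketch assumes a little-endian ELF file examined on a little-endian host such as x86-64.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: extract e_entry from a raw copy of the first
 * 64 bytes of an ELF64 file. */
uint64_t entry_from_raw_header(const unsigned char *hdr, size_t len)
{
    uint64_t entry;

    assert(len >= sizeof(Elf64_Ehdr));  /* need the whole 64-byte header */

    /* offsetof(Elf64_Ehdr, e_entry) is 0x18, the offset quoted in the text */
    memcpy(&entry, hdr + offsetof(Elf64_Ehdr, e_entry), sizeof(entry));
    return entry;  /* assumes file and host are both little-endian */
}
```

Using the structure from elf.h, as the code in this article does, avoids having to remember such offsets at all.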

The very first code I wrote reads and modifies the entrypoint, as you can see below. It is pretty self-explanatory: we declare an Elf64_Ehdr structure and pass it to read() to fill it from the opened file descriptor. We then set the e_entry member to a new value and write the structure back. One caveat to remember here is that you have to lseek() back to the beginning of the file before writing, as the read call moves the file offset.

        Elf64_Ehdr ehdr;
        int fd = open(argv[1], O_RDWR);
        read(fd, &ehdr, sizeof(ehdr));
        printf("Entry: %lx\n", (unsigned long)ehdr.e_entry); // e_entry is 64 bits wide, so %x is not enough
        ehdr.e_entry = 0x41414141;
        lseek(fd, 0, SEEK_SET);
        write(fd, &ehdr, sizeof(ehdr));
        close(fd);

At every stage of the development process I was assisted greatly by the readelf and xxd programs, which you should have available to you. readelf, as its name implies, will read an ELF file and output relevant data such as the ELF header, program headers, section headers etc. It was a good tool to confirm the changes I was making were correct. I used xxd to examine bytes at certain offsets in the binary file I was working with, and it was valuable to visually inspect and verify that my changes were correct.

Make a simple hello world program, compile and run the above code against it and note how the entrypoint changes to 0x41414141. Obviously, now when you try and run your hello world program you should get a segmentation fault due to trying to access memory at 0x41414141. So at this point we've changed the execution flow, specifically the starting address of this binary. Now we need to find somewhere in our binary that we can use to write some code and then redirect the entrypoint to this position in the code.

What I did next was try to find space inside the binary. I looked for a sequence of bytes with a 0 value, assuming that this space was not used and that I could write my own code there without clobbering actual code/data used by the program. These sequences of zeroes inside a binary have been called "code caves" by reversers and virus writers for years.

In order to find these "code caves", I wrote some code to read in the binary file and step through it byte by byte, locating all of the "code caves" available in the file and then reporting the file offset of the largest cave found. This would be the offset where I would insert my code, or in virus terminology, the "parasite". The code is too large to include in this article but I have included the relevant function for you to peruse here. Disclaimer again: the code is very poorly written with no error checking. I hacked it together over a few hours while undertaking this project, but it seems to work and maybe you can learn from it.
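
A scan of this kind can be sketched roughly as follows. This is an illustration under assumptions, not the original function: the names struct cave and find_largest_cave are hypothetical, and the sketch works on a buffer that already holds the whole file rather than reading from a file descriptor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: where the longest zero run starts and its length. */
struct cave {
    size_t offset;
    size_t length;
};

/* Scan buf[0..len) and return the longest run of zero bytes.
 * If several runs share the maximum length, the first one wins. */
struct cave find_largest_cave(const unsigned char *buf, size_t len)
{
    struct cave best = {0, 0};
    size_t run_start = 0, run_len = 0;

    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0) {
            if (run_len == 0)
                run_start = i;          /* a new run begins here */
            run_len++;
            if (run_len > best.length) {
                best.offset = run_start;
                best.length = run_len;
            }
        } else {
            run_len = 0;                /* run interrupted by a non-zero byte */
        }
    }
    return best;
}
```

A run of zero bytes is only a candidate. Whether it is really unused, and whether it ends up in memory that is mapped and executable, still has to be checked against the program headers.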

One thing to understand is that the file offset reported by the function above is not the same as the address in memory. A file offset and a memory address are different things: the contents at a file offset are mapped into memory at some address. The ELF format tells you the base address at which your file is going to be mapped, so what I did was retrieve the image base and add the file offset to it to obtain the memory address corresponding to that file offset. Note that this simple addition only holds when the offset falls inside the first loadable segment and that segment starts at file offset 0.

The image base address can be located inside a program header. ELF files usually have multiple program headers. The first program header in an ELF file is located at the offset given by Elf64_Ehdr.e_phoff. That is, there is a field in the ELF header that contains the offset into the file where you can read the first program header.

The image base address is located in the very first Program Header with a p_type of PT_LOAD. Once you have identified this Program Header you can retrieve the base address from the field p_vaddr. Once we have the base address we can simply add that to our file offset to obtain a virtual address where our code at the file offset will be when loaded into memory. The code below attempts to show you how to retrieve the base address.

long getImageBase(int fd)
{
    Elf64_Ehdr ehdr;
    Elf64_Phdr phdr;

    lseek(fd, 0, SEEK_SET);

    read(fd, &ehdr, sizeof(ehdr)); // read ELF header
    int numheaders = ehdr.e_phnum; // how many program headers are there?

    lseek(fd, ehdr.e_phoff, SEEK_SET); // seek to the first program header, as given by e_phoff

    for (int i = 0; i < numheaders; i++) // loop through all program headers
    {
        read(fd, &phdr, sizeof(phdr));
        if (phdr.p_type == PT_LOAD) // the first PT_LOAD header contains the image base
        {
            printf("ImageBase: %lx\n", (unsigned long)phdr.p_vaddr);
            return phdr.p_vaddr;
        }
    }

    return -1; // no PT_LOAD header found
}
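
For binaries where the simple "image base plus file offset" sum does not apply, a more general translation walks the PT_LOAD headers and uses p_vaddr + (offset - p_offset). The sketch below assumes the program header table has already been read into an array; the name offset_to_vaddr is hypothetical. It is deliberately conservative and treats any offset outside a segment's file-backed range as unmapped.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: translate a file offset to a virtual address using an
 * in-memory copy of the program header table. Returns 0 if no PT_LOAD
 * segment covers the offset. */
uint64_t offset_to_vaddr(const Elf64_Phdr *phdrs, size_t count, uint64_t offset)
{
    assert(phdrs != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        const Elf64_Phdr *p = &phdrs[i];

        if (p->p_type != PT_LOAD)
            continue;
        /* only bytes in [p_offset, p_offset + p_filesz) come from the file */
        if (offset >= p->p_offset && offset - p->p_offset < p->p_filesz)
            return p->p_vaddr + (offset - p->p_offset);
    }
    return 0;
}
```

When the first PT_LOAD segment starts at file offset 0, this reduces to the addition used in this article.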

So at this point I had a file offset to the beginning of a code cave and I had the image base; adding them together gave me the virtual address of the cave once the program was loaded into memory. I now simply had to change the entrypoint to this address, and then I had some room in which I could write some assembly code to do whatever I wanted. I chose to write some code which would act as an ELF decrypter. What I mean by that is that I first wrote a program to encrypt all of the code in an ELF binary, so that anyone trying to examine the code statically with a disassembler like IDA would only see garbage instructions.

So you may wonder why you would need to include code inside the binary to decrypt it. Why not just decrypt it with an external program built to do this? This is because we want the binary to be self-contained and able to run without any external utilities. Without the 'loader' or 'decryption' routine that we inject into the ELF, you'd need an external program to decrypt the binary file before running it (which then leaves it unencrypted on disk). This way the binary is self-contained and can still run, but is also encrypted on disk.

I use the term encryption loosely, as all I am doing in this first iteration of my project is a simple XOR. If you know anything about encryption, you understand this isn't strong and is easily broken. I used a simple XOR for simplicity, so that I could perfect the actual process of injecting the loader code into the file and having it work before looking at making the encryption stronger.

Now we have found space in our file, computed the virtual address of where that space is located in memory, and changed the entrypoint of our target ELF binary to reflect this address. The next step is to develop the code that we plan to run at this location and then write it into the file at the file offset we found earlier.

I made a decision at this point to only encrypt the .text section of ELF binaries. The .text section generally holds the executable code of an ELF binary. So other sections, such as .rodata containing strings etc., will still be unencrypted. I made this choice to keep the code simple at this point and ensure everything works as I expect, but encrypting other sections is trivial.

In order to encrypt just the .text section (again, the section that contains all of the ELF binary code) we have to first obtain the file offset of this section on disk. That is, where the actual bytes for this section lie in the file. We also need to know how big this section is so that we don't encrypt anything outside of just the data in the .text section.

Once again we can retrieve all of the necessary information from the various structures provided by elf.h to describe the ELF format. The first thing we have to do is find out where our section headers exist in the file. The ELF header has a field called e_shoff which contains the file offset of the section header table, that is, of the first section header. The ELF header also contains several other fields that are useful for locating the .text section: e_shnum, which is the number of section headers in this ELF binary, and e_shentsize, which contains the size of each section header.

Finding a certain section takes a little bit more work, as the section header does not contain the name itself. Instead each section header has an sh_name field, which is an offset into the .shstrtab section (the section header string table, referred to as strtab below) where the name of the section is stored. The strtab section can be found through e_shstrndx in the ELF header, which holds the index of its section header. Below is code showing how to find a certain section, specifically the .text section.

First we must locate the strtab section, using the index retrieved from the ELF header:

int get_str_table(int fd, int *size)
{
    Elf64_Ehdr ehdr;
    Elf64_Shdr shdr;

    lseek(fd, 0, SEEK_SET); // seek to the beginning of the file to read in the ELF header
    read(fd, &ehdr, sizeof(ehdr));
    lseek(fd, (ehdr.e_shoff + (ehdr.e_shentsize * ehdr.e_shstrndx)), SEEK_SET); // move to SECTION_HEADER_TABLE_OFFSET + (SIZE_OF_ONE_SECTION_HEADER * INDEX_OF_STRTAB_SECTION_HEADER)
    read(fd, &shdr, sizeof(shdr)); // read strtab section header

    *size = shdr.sh_size; // hand back the size of the strtab section so the caller can read it all in and index into it
    return shdr.sh_offset; // return the file offset of the strtab section
}

And once we have the offset of where our strtab section is in file, we can use it to read in section names.

int convert_section_names(elf_sections_t ***pelf_sections, int fd)
{
    Elf64_Ehdr ehdr;
    Elf64_Shdr shdr;

    int size;

    lseek(fd, get_str_table(fd, &size), SEEK_SET); // as seen above, retrieve offset of where strtab section is
    char *mem = malloc(size); // allocate enough memory to read in all of the strings in the table
    read(fd, mem, size);
    lseek(fd, 0, SEEK_SET); // seek to start of file to read in ELF header

    read(fd, &ehdr, sizeof(ehdr));
    lseek(fd, ehdr.e_shoff, SEEK_SET); // start processing at the very first section header, as given by e_shoff
    int i = 0;

    while (i < ehdr.e_shnum) // read up to e_shnum, all section headers
    {
        // each (*pelf_sections)[i] must already be allocated (see process_elf_sections below)
        read(fd, &shdr, sizeof(shdr)); // read in section header
        (*pelf_sections)[i]->pname = malloc(strlen(mem + shdr.sh_name) + 1); // room for the string at strtab(mem) + sh_name(offset into it), plus its terminating NUL
        strcpy((*pelf_sections)[i]->pname, mem + shdr.sh_name); // and copy it into a structure holding data about this section
        i++;
    }

    free(mem); // the names have been copied out, so the table is no longer needed
    return i; // number of section names converted
}

This function essentially loops through every section header and uses the sh_name field as an offset into the strtab section, where the name of that section is stored. We then copy the name into a structure that holds the other relevant data for that section, to be used later.
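
Because sh_name comes straight from the file, a slightly more defensive lookup can check the offset against the size of the string table before using it. The following is a minimal sketch of that idea; the helper name section_name is hypothetical and is not part of the code in this article.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the NUL-terminated name stored at offset
 * sh_name inside a loaded string table, or NULL if the offset is out of
 * range or the name is not terminated within the table. */
const char *section_name(const char *strtab, size_t strtab_size, size_t sh_name)
{
    assert(strtab != NULL);

    if (sh_name >= strtab_size)
        return NULL;
    if (memchr(strtab + sh_name, '\0', strtab_size - sh_name) == NULL)
        return NULL;
    return strtab + sh_name;
}
```

With a check like this, a damaged or deliberately malformed section header results in a NULL return instead of a read past the end of the buffer.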

Now that we have all of the section names in the pelf_sections array of structures, we also need to read in other fields of each section header; specifically, we want to know the file offset (where on disk this section starts) and the size of the section. The code below will do this. Note that, as written, this is the function that allocates the array, so it has to run before convert_section_names.

int process_elf_sections(elf_sections_t ***pelf_sections, int fd)
{
    Elf64_Ehdr ehdr;
    Elf64_Shdr shdr;
    int i = 0;

    lseek(fd, 0, SEEK_SET); // start of file to read header

    read(fd, &ehdr, sizeof(ehdr));

    *pelf_sections = malloc( ( sizeof(elf_sections_t *) * ehdr.e_shnum ) ); // allocate pointers for all of the sections
    lseek(fd, ehdr.e_shoff, SEEK_SET);

    while (i < ehdr.e_shnum)
    {
        (*pelf_sections)[i] = malloc(sizeof(elf_sections_t)); // allocate memory to store the section data
        read(fd, &shdr, sizeof(Elf64_Shdr)); // read the next section header
        (*pelf_sections)[i]->file_offset = shdr.sh_offset; // store file offset
        (*pelf_sections)[i]->size = shdr.sh_size; // section size
        (*pelf_sections)[i]->flags = shdr.sh_flags; // flags
        (*pelf_sections)[i]->vaddr = shdr.sh_addr; // virtual address
        i++;
    }

    return ehdr.e_shnum; 
}

You'll probably notice the above two functions could be combined into one. They are in my code, but I felt it was easier to explain how to retrieve a section name if I separated them in two.

Now we simply loop through all of the available sections which are contained in our pelf_sections array and compare each pname with the section we want to get data for, which in this case is .text, like so:

for(int i=0;i<size;i++)
        if (!strcmp(pelf_sections[i]->pname, ".text"))
        {
            text_section_addr = pelf_sections[i]->vaddr;
            text_section_size = pelf_sections[i]->size;
            text_section_offset = pelf_sections[i]->file_offset;
	
        }

So the above code will search for the .text section and, when located, give us its virtual address, file offset and size.

The next thing I did was to actually encrypt the .text section. Armed with the file offset and the size of the section it was trivial to open() the target binary, seek to the file offset and read in bytes until I had read the full size of the section. I did a simple xor with a constant value on each byte to transform/encrypt the section, then wrote the result back to the same offset. Due to the length of this code I have chosen to omit it, but it's pretty straightforward and I will include all sources in the near future when I clean them up, if anyone is interested.
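
Since the original code is omitted, here is a minimal sketch of what such a step could look like. It is an assumption-laden stand-in rather than the omitted source: the name xor_file_region is hypothetical, it takes a path instead of an open descriptor, and it uses pread/pwrite instead of lseek plus read/write.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: XOR `size` bytes of the file at `path`, starting at
 * file offset `offset`, with a one-byte key. Returns 0 on success, -1 on error. */
int xor_file_region(const char *path, off_t offset, size_t size, unsigned char key)
{
    unsigned char buf[4096];
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    while (size > 0) {
        size_t want = size < sizeof(buf) ? size : sizeof(buf);
        ssize_t got = pread(fd, buf, want, offset);         /* read a chunk of the region */

        if (got <= 0) {                                     /* error or unexpected end of file */
            close(fd);
            return -1;
        }
        for (ssize_t i = 0; i < got; i++)
            buf[i] ^= key;                                  /* XOR is its own inverse */
        if (pwrite(fd, buf, (size_t)got, offset) != got) {  /* write it back in place */
            close(fd);
            return -1;
        }
        offset += got;
        size -= (size_t)got;
    }
    return close(fd);
}
```

Called with text_section_offset, text_section_size and the key, this would transform the section on disk. Calling it a second time with the same arguments restores the original bytes, which is exactly the property the injected loader relies on.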

With the .text section encrypted the binary will no longer run. This may be the desired behavior: as described earlier, we could simply encrypt the binary and then use an external program to decrypt it when we want to run it. However, the point of this exercise is to inject code into the binary which will act as a "loader" and decrypt the binary on the fly, allowing it to be run yet remain encrypted on disk.

We now have an encrypted binary, a file offset containing space for us to write new code to, and the virtual address of said file offset. I changed the entrypoint to point into this free space, so when the program runs it starts execution at the empty space we located, the code cave. I then set about developing some code that could decrypt our .text section.

First some pseudo code with what I needed it to do:

 call mprotect() to allow us to write to .text segment
 load address of .text segment
 get byte from address of .text
 xor byte with a value
 inc address of .text segment to get next byte
 cmp address to size of .text 
 jl back to get byte
 jmp to ORIGINAL entrypoint as all should be unencrypted now

The above has to be written in assembly, translated byte by byte to opcodes, and then we can write() it into our code cave.

So the first thing to do is call mprotect to change the permissions of the memory page allocated to our .text section. I was stuck on this for hours before realizing the reason my code was segfaulting was due to a permission issue. Now, in order to call mprotect() we have to actually call the syscall directly, we have no access to libc, so what I did was write it out in assembly like below:

int mprotect(void *addr, size_t len, int prot);

System Call Information like the system call number, registers used etc can be found here

 __asm__("mov $0x400400, %rdi\n\t" // address in %rdi. I hard-coded the address to start so I could see what it looked like when I dumped the bytes of the instruction
            "mov $0x182, %rsi\n\t" // size in %rsi
            "mov $0x07, %rdx\n\t" // permissions in %rdx (read/write/exec)
            "mov $0xA, %rax\n\t" // 0xA is the number of the mprotect syscall
            "syscall");        // make the call

In order to write this assembly to the target binary I had to translate each instruction into bytes, which I did with the help of gdb. I think it's better explained in my video which you can find here. However, as an example, if you compile the mprotect assembly using GCC (inline assembly) you can then load the resulting file in gdb and do a "disas main", which will disassemble your main function, and you should see your code to call the mprotect syscall. Now for each line you'll want to eXamine the bytes with: x/?bx 0x???????? replacing the first question mark with the number of bytes and the remaining question marks with the address to examine, as demonstrated below for the first instruction:

(gdb) disas main
Dump of assembler code for function main:
   0x00000000004004a6 <+0>:     push   %rbp
   0x00000000004004a7 <+1>:     mov    %rsp,%rbp
   0x00000000004004aa <+4>:     mov    $0x400400,%rdi
   0x00000000004004b1 <+11>:    mov    $0x182,%rsi
   0x00000000004004b8 <+18>:    mov    $0x7,%rdx
   0x00000000004004bf <+25>:    mov    $0xa,%rax
   0x00000000004004c6 <+32>:    syscall 
(gdb) x/7bx 0x4004aa
0x4004aa <main+4>:      0x48    0xc7    0xc7    0x00    0x04    0x40    0x00
(gdb)

So the returned bytes 0x48, 0xc7, 0xc7, 0x00, 0x04, 0x40, 0x00 are the bytes you'd have to write into your target to get the "mov $0x400400, %rdi" instruction. Take note that we get the length to examine by looking at the difference between the offsets gdb shows for consecutive instructions (+4 to +11 = 7 bytes).
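
One detail worth keeping in mind about the address loaded into %rdi: mprotect() requires its address argument to be aligned to a page boundary and fails with EINVAL otherwise, so a value such as 0x400400 would normally need rounding down to 0x400000, with the length extended to match. A minimal sketch of that arithmetic follows; the helper name page_align_range is hypothetical, and a page size of 4096 is assumed in the usage note.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: widen [addr, addr + len) to whole pages.
 * page_size must be a power of two. */
void page_align_range(uint64_t addr, uint64_t len, uint64_t page_size,
                      uint64_t *aligned_addr, uint64_t *aligned_len)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    uint64_t start = addr & ~(page_size - 1);                        /* round down */
    uint64_t end = (addr + len + page_size - 1) & ~(page_size - 1);  /* round up */

    *aligned_addr = start;
    *aligned_len = end - start;
}
```

For the hard-coded values used here (address 0x400400, size 0x182) and 4096-byte pages, this yields address 0x400000 and length 0x1000.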

Our address won't always be 0x400400, so we have to take a variable (i.e. text_section_addr) and somehow write that as the source operand of our mov. Now that we can see the bytes which make up the mov instruction, we can see that the last 4 bytes are the address in little-endian order, so we will always write 0x48, 0xC7, 0xC7 for the first three bytes and use a bit of C to write out the address as such:

    /* mov address, %rdi */

    mprotect_call[0] = 0x48; // constant as described above
    mprotect_call[1] = 0xc7; //and byte two
    mprotect_call[2] = 0xc7; // and three
    /* the next four bytes contain the address, which is dynamic and not static */
    mprotect_call[3] = mprotect_addr & mask;
    mprotect_addr >>=8;
    mprotect_call[4] = mprotect_addr & mask;
    mprotect_addr >>=8;
    mprotect_call[5] = mprotect_addr & mask;
    mprotect_addr >>=8;
    mprotect_call[6] = mprotect_addr & mask;

Essentially we're building a buffer to write out to our file. You can see how we build the address dynamically from mprotect_addr by masking off the low byte (so mask must be 0xff) and then shifting the address right by 8 bits to get the next byte in the address. You will have to read up on bitwise operators to fully understand the above. Simply put, we're stuffing a 4-byte value into a character array, least significant byte first.
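
The same mask-and-shift pattern recurs for every 32-bit value written into the buffer, so it can be folded into a small loop. This is just a sketch; the helper name put_le32 is hypothetical and does not appear in the code in this article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value at dst[0..3],
 * least significant byte first. */
void put_le32(unsigned char *dst, uint32_t value)
{
    assert(dst != NULL);

    for (size_t i = 0; i < 4; i++) {
        dst[i] = value & 0xff;   /* mask off the low byte */
        value >>= 8;             /* shift the next byte into position */
    }
}
```

For example, put_le32(mprotect_call + 3, 0x400400) stores 0x00, 0x04, 0x40, 0x00, which matches the last four bytes of the gdb dump.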

The buffer built so far encodes only the first instruction (the mov into %rdi), but we must repeat the process for the next 3 mov instructions and the syscall instruction, so the buffer continues like this:

    /* mov size, %rsi */
    mprotect_call[7] = 0x48;
    mprotect_call[8] = 0xc7;
    mprotect_call[9] = 0xc6; // this byte changes due to using a different register to copy the value to

    /* size is dynamic so again we have to take it as a variable and then read in each byte into our array */
    mprotect_call[10] = mprotect_size & mask;
    mprotect_size >>= 8;
    mprotect_call[11] = mprotect_size & mask;
    mprotect_size >>= 8;
    mprotect_call[12] = mprotect_size & mask;
    mprotect_size >>= 8;
    mprotect_call[13] = mprotect_size & mask;

    /* mov access_rights, %rdx */
    mprotect_call[14] = 0x48;
    mprotect_call[15] = 0xc7;
    mprotect_call[16] = 0xc2; // again, different register
    mprotect_call[17] = 0x07; // this is a constant, so no bit manipulation required
    mprotect_call[18] = 0x00;
    mprotect_call[19] = 0x00;
    mprotect_call[20] = 0x00;

    /* mov syscall_number, %rax */
    mprotect_call[21] = 0x48;
    mprotect_call[22] = 0xc7;
    mprotect_call[23] = 0xc0; // different register
    mprotect_call[24] = 0x0A; // constant, syscall number will always be 0xA for mprotect
    mprotect_call[25] = 0x00;
    mprotect_call[26] = 0x00;
    mprotect_call[27] = 0x00;
	
    /* syscall */
    mprotect_call[28] = 0x0f; // syscall
    mprotect_call[29] = 0x05;

    write(fd, &mprotect_call, 30);
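
The third byte of each mov differs because it is the ModRM byte, whose low three bits select the destination register (0xc0 for %rax, 0xc2 for %rdx, 0xc6 for %rsi, 0xc7 for %rdi). Under that observation the four mov instructions can be produced by one routine. A minimal sketch, with the hypothetical names reg64 and emit_mov_imm32:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register numbers as used in the ModRM byte (low eight registers only). */
enum reg64 { RAX = 0, RCX = 1, RDX = 2, RBX = 3, RSP = 4, RBP = 5, RSI = 6, RDI = 7 };

/* Hypothetical helper: write the 7-byte encoding of `mov $imm32, %reg`
 * (REX.W C7 /0 id) into out and return the number of bytes written. */
size_t emit_mov_imm32(unsigned char *out, enum reg64 reg, uint32_t imm)
{
    assert(out != NULL && reg <= RDI);

    out[0] = 0x48;                  /* REX.W: 64-bit operand size */
    out[1] = 0xc7;                  /* opcode: mov imm32 to r/m64 */
    out[2] = 0xc0 | (unsigned)reg;  /* ModRM: register-direct, low 3 bits pick the register */
    for (size_t i = 0; i < 4; i++)  /* immediate, little-endian */
        out[3 + i] = (imm >> (8 * i)) & 0xff;
    return 7;
}
```

Note that this encoding sign-extends the 32-bit immediate to 64 bits, so it only reproduces the intended value for addresses and sizes below 0x80000000.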


Now, as long as we made sure that our file descriptor offset was placed at the beginning of our code cave before writing, then once we change our entrypoint to point into the code cave it will execute our mprotect syscall and give us permission to write to the address we pass to it (our .text vaddr). So at this point we've actually rerouted control and executed code in a binary. All that's left to do is translate the rest of our pseudo code to assembly and then translate each instruction to bytes to write to our file. The rest of my assembly looked like this:

         __asm__("xorl %edx, %edx\n\t" // set edx = 0
                "leal ADDR, %eax\n\t" // lea ADDR into eax
                "1:\n\t" // temporary label to jump to (later we find out how many bytes away it is and hardcode the byte value)
                "xorb  $0xC, (%eax, %edx)\n\t" // xor byte pointed to by eax + edx (essentially loop through what eax points to, the .text section)
                "addl $1, %edx\n\t" // add 1 to our counter
                "cmp $SIZE, %edx\n\t" // compare to see if we've processed the whole section
                "jne 1b"); // jump back to the label; we find this offset in gdb, represented in number of bytes to jump

Translating the first two instructions (the xor and leal instructions):

    char lea_addr[7];
    lea_addr[0] = 0x8d; // obtained in gdb with eXamine
    lea_addr[1] = 0x04;
    lea_addr[2] = 0x25;
   
    /* addr is an argument passed in, we do the same thing we did with mprotect to stuff it into our array */
    lea_addr[3] = addr & mask;
    addr >>= 8;
    lea_addr[4] = addr & mask;
    addr >>= 8;
    lea_addr[5] = addr & mask;
    addr >>= 8;
    lea_addr[6] = addr & mask;

    char xor_edx[] = {0x31, 0xd2}; // obtained from gdb eXamine
    write(fd, &xor_edx, 2); // write out xor instruction
    write(fd, &lea_addr, 7); // followed by leal

After this we continue on and do the same for the rest of our instructions. To encode the conditional jump (jne), you work out how many bytes forward (a positive number) or backward (a negative number) the target is, counted from the end of the 2-byte jne instruction itself, and represent it as 2 bytes: 0x75 0x??, where ?? is that distance as a signed 8-bit value.

To finish off our code we have to jmp to the original entrypoint once we're done decrypting our .text section. So when our jne is not taken, execution falls through onto an unconditional JMP. One last thing which confused me was that if we don't save the %rdx value at the very start of our code (push %rdx) and then restore it right before we jump to the original entrypoint (pop %rdx), the program will crash after running. This turns out to be expected: under the x86-64 System V ABI, %rdx holds a function pointer at process entry that the startup code registers to be called at exit, so clobbering it breaks the exit path. So the last of the assembly code will look like this:

    __asm__("mov $ORIGINAL_ENTRYPOINT, %rax\n\t" // move address into %rax
            "pop %rdx\n\t" // restore %rdx
            "jmp *%rax"); // jump to the address held in %rax

And converted to bytes, something like this:

    unsigned char jne[] = {0x75, 0xe9}; // 0x75 is jne; 0xe9 is the signed 8-bit displacement (-23), i.e. jump 23 bytes backwards
    write(fd, &jne, 2);

    unsigned char mov_jmp[7]; // mov $original_ep_addr, %rax
    mov_jmp[0] = 0x48;
    mov_jmp[1] = 0xc7;
    mov_jmp[2] = 0xc0;
    mov_jmp[3] = original_ep_addr & mask;
    original_ep_addr >>= 8;
    mov_jmp[4] = original_ep_addr & mask;
    original_ep_addr >>= 8;
    mov_jmp[5] = original_ep_addr & mask;
    original_ep_addr >>= 8;
    mov_jmp[6] = original_ep_addr & mask;

    write(fd, &mov_jmp, 7);
    char pop_edx = 0x5A;
    write(fd, &pop_edx, 1); // pop rdx
    char jmp[] = {0xff, 0xe0};
    write(fd, &jmp, 2);

So to summarize, the following is done: 1. We encrypt our binary's code (.text) section on disk. 2. We inject a parasite into a code cave we locate. 3. We change the entrypoint to our code cave location. 4. We write code into this location to decrypt our .text on the fly, essentially a loader. 5. We jmp back to the original entry point when our code has finished decrypting.

I realize this was a long read and it's riddled with mistakes both technical and non-technical (editing, spelling, grammar), and I apologize for that, but I hope maybe someone was able to learn something.

For a demonstration view the video on YouTube