Skip to content
This repository has been archived by the owner on Oct 15, 2023. It is now read-only.

Latest commit

 

History

History

Securecomparator

TeamItalyCTF 2023

[rev] Secure comparator (1 solve)

My Italian friend just discovered a new way to compute securely!! Can you guess what the secret advice is?

Authors: Dario Petrillo <@dp_1>, Giulia Aloia <@daisy>

Setup

The challenge is made of two files, secure_comparator and checker.bin. Running the first without any arguments shows its intended usage:

$ ./secure_comparator
Usage: ./secure_comparator checker.bin

Let's do just that, and it looks like the classic "crackme" flag checker (With a progress bar! That took too much time not to mention it):

SECure COMParator v0.2
>>> flag{test}
[===============================================================]
Skill issue
Halted.

Initial exploration

strings secure_comparator gives some initial insights on what the challenge might be, with references to seccomp and some kind of cpu:

$ strings secure_comparator
[...]
Halted.
main.c
false && "Device not found"
Can't open program file
Reading code failed
Reading data failed
Reading call table failed
PR_SET_NO_NEW_PRIVS
PR_SET_SECCOMP
Usage: %s checker.bin
cpu_get_device

We can't find SECure COMParator v0.2 and the rest of the program output, which are instead in checker.bin:

$ strings checker.bin
SECure COMParator v0.2
>>>
Looks like a flag!
Go and submit it
Skill issue
 I'p
 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

At this point you probably tried to look for side channels, and I really hope I didn't leave any. If you did find one, let us know!

Let's jump into IDA, and look at this binary in more detail.

Actual reversing

At a high level, the binary does the following:

  • Load a Seccomp filter, which we will analyze in a bit;
  • Perform some initialization on a pretty bit calloc-ed chunk of memory;
  • Load the checker.bin file, with some nice error messages for us to use;
  • Run sub_16B0 in a loop.

Given the error messages in sub_15C0, we can guess this is some sort of CPU. After renaming things a bit, and defining a Cpu structure, main looks something like this:

__int64 __fastcall main(__int64 argc, char **argv, char **envp)
{
  int v3;
  Cpu *cpu;

  v3 = argc;
  load_seccomp_filter();
  cpu = (Cpu *)calloc(1uLL, 0x8420uLL);
  cpu->running = 1;

  some_pointer_stuff(cpu, off_4BA0);
  some_pointer_stuff(cpu, qword_4BC0);
  some_pointer_stuff(cpu, qword_4B80);

  if ( v3 <= 1 )
  {
    __fprintf_chk(stderr, 1LL, "Usage: %s checker.bin\n", *argv);
    exit(1);
  }
  load_program(cpu, argv[1]);

  while ( cpu->running )
    cpu_step(cpu);

  free(cpu);
  return 0LL;
}

load_program (what was sub_15C0 before) helps in recostructing what the Cpu struct should look like, and at this point it should be something like:

00000000 Cpu             struc ; (sizeof=0x841C, mappedto_8)
00000000 field_0         dq ?
00000008 field_8         dq ?
00000010 field_10        dq ?
00000018 running         db ?
00000019                 db ? ; undefined
0000001A                 db ? ; undefined
0000001B                 db ? ; undefined
0000001C code            db 16384 dup(?)
0000401C data            db 16384 dup(?)
0000801C call_table      db 1024 dup(?)
0000841C Cpu             ends
0000841C

Most of the stuff is there, we are just missing a meaning for the first few bytes.

cpu_step is going to be our next target. It first performs two raw syscalls to SYS_sendto, passing some state from the cpu instance as its arguments. It then combines the errno value it gets from both, and interprets it as some sort of command.

At this point, the start of Cpu should look like this:

00000000 Cpu             struc ; (sizeof=0x841C, mappedto_8)
00000000 field_0         dq ?
00000008 field_8         dw ?
0000000A field_A         dw ?
0000000C field_C         dw ?
0000000E                 db ? ; undefined
0000000F                 db ? ; undefined
00000010 field_10        dq ?
00000018 running         db ?

The first two bits of the value it gets from errno mark the four major operations it handles. "opcode" 00 is the most complex, with a sub-operation encoded in the next three bits, while opcodes 01, 10, 11 simply set one of the three 16-bit variables contained in Cpu (field_8, field_A, and field_C, maybe some registers?).

While it's not clear yet how these errno values are generated, it's worth it to continue reversing this function, and especially the suboperations of opcode 00. Let's see the suboperations:

  • 0 and 1 both call sub_1530, passing it the next three bits of the result, and a value. For 0, the value is the lowest byte of the result, while for 1 it comes from memory. A good guess here is that it is setting some register, either with a constant or loading from some address.

    • Of note is field_8, one of the 16 bit registers that we mentioned before which is only ever used as a memory address, once in operation 1 and once in operation 2. Looks like a Memory address register, let's rename it as such.
    • sub_1530 (and its companion sub_1510) should be setting and getting the value of a register, if our guess is right. Indeed, the first is setting a single byte in field_0, while the latter is getting a specific byte from it:
    __int64 __fastcall sub_1530(Cpu *cpu, char a2, unsigned __int8 a3)
    {
    __int64 result; // rax
    
    result = ((unsigned __int64)a3 << (8 * a2)) | cpu->field_0 & ~(255LL << (8 * a2));
    cpu->field_0 = result;
    return result;
    }
    unsigned __int64 __fastcall sub_1510(Cpu *cpu, char a2)
    {
    return (unsigned __int64)cpu->field_0 >> (8 * a2);
    }

    Looks like this is an 8 bit cpu, with up to 8 registers (but what about the bigger registers we saw before?). Let's rename them and field_0 accordingly.

  • We saw 2 before, it's storing something in memory.

  • 3 is the only one referencing the call_table, and it behaves like a call should: it stores the program counter (field_A) in memory, decrements a stack pointer (field_C), and changes the program counter to some other value. Somehow, calls must be indirect, with the instruction only containing an index into che call table, which stores the actual addresses. This makes sense, as so far the best guess is 8 bit registers, but a 16 bit address bus.

  • 4 and 5 look pretty similar, they both get a pointer from sub_1570 (which, from its error message, is now cpu_get_device), and call a function pointer stored in the resulting struct. (Remember the three structs passed to some_pointer_stuff?)

  • 6 is a nop

  • finally, 7 reads two bytes from memory, sets pc to their concatenation and increases the stack pointer by two. Looks like a ret.

cpu_get_device also shows that the devices are kept in a linked list:

Device *__fastcall cpu_get_device(Cpu *cpu, int a2)
{
  Device *result; // rax

  result = cpu->devices;
  if ( !result )
LABEL_6:
    __assert_fail("false && \"Device not found\"", "main.c", 0x58u, "cpu_get_device");
  while ( result->id != a2 )
  {
    result = result->next;
    if ( !result )
      goto LABEL_6;
  }
  return result;
}

So our structs should now look like this:

00000000 Cpu             struc ; (sizeof=0x841C, mappedto_8)
00000000 registers       dq ?
00000008 mar             dw ?
0000000A pc              dw ?
0000000C stack_pointer   dw ?
0000000E                 db ? ; undefined
0000000F                 db ? ; undefined
00000010 devices         dq ?                    ; offset
00000018 running         db ?
00000019                 db ? ; undefined
0000001A                 db ? ; undefined
0000001B                 db ? ; undefined
0000001C code            dd 4096 dup(?)
0000401C data            db 16384 dup(?)
0000801C call_table      dd 256 dup(?)
0000841C Cpu             ends
0000841C
00000000 ; ---------------------------------------------------------------------------
00000000
00000000 Device          struc ; (sizeof=0x20, mappedto_9)
00000000 next            dq ?                    ; offset
00000008 id              db ?
00000009                 db ? ; undefined
0000000A                 db ? ; undefined
0000000B                 db ? ; undefined
0000000C                 db ? ; undefined
0000000D                 db ? ; undefined
0000000E                 db ? ; undefined
0000000F                 db ? ; undefined
00000010 read_byte       dq ?                    ; offset
00000018 write_byte      dq ?                    ; offset
00000020 Device          ends

with read_byte and write_byte in Device being the two function pointers called by suboperations 4 and 5.

To complete this part, let's look at the devices that get registered at the start. some_pointer_stuff is now register_device, and it links the given device at the start of the linked list:

void __fastcall register_device(Cpu *cpu, Device *dev)
{
  dev->next = cpu->devices;
  cpu->devices = dev;
}

And after some renaming the devices can look like this:

.data:0000000000004B80 ; Device device_42
.data:0000000000004B80 device_42       dq 0                    ; next
.data:0000000000004B80                                         ; DATA XREF: main+46↑o
.data:0000000000004B88                 db 2Ah                  ; id
.data:0000000000004B89                 db 7 dup(0)
.data:0000000000004B90                 dq offset dev_42_read   ; read_byte
.data:0000000000004B98                 dq offset dev_42_write  ; write_byte
.data:0000000000004BA0 ; Device device_128
.data:0000000000004BA0 device_128      dq 0                    ; next
.data:0000000000004BA0                                         ; DATA XREF: main+24↑o
.data:0000000000004BA8                 db 80h                  ; id
.data:0000000000004BA9                 db 7 dup(0)
.data:0000000000004BB0                 dq offset dev_128_read  ; read_byte
.data:0000000000004BB8                 dq 0                    ; write_byte
.data:0000000000004BC0 ; Device device_127
.data:0000000000004BC0 device_127      dq 0                    ; next
.data:0000000000004BC0                                         ; DATA XREF: main+3A↑o
.data:0000000000004BC8                 db 7Fh                  ; id
.data:0000000000004BC9                 db 7 dup(0)
.data:0000000000004BD0                 dq 0                    ; read_byte
.data:0000000000004BD8                 dq offset dev_127_write ; write_byte

x86 is boring

We looked at everything in the binary, except for the seccomp filter. Let's dump it:

$ seccomp-tools dump ./secure_comparator
 line  CODE  JT   JF      K
=================================
 0000: 0x20 0x00 0x00 0x00000004  A = arch
 0001: 0x15 0x00 0x02 0xc000003e  if (A != ARCH_X86_64) goto 0004
 0002: 0x20 0x00 0x00 0x00000000  A = sys_number
 0003: 0x15 0x02 0x01 0x0000002c  if (A == sendto) goto 0006 else goto 000
[...]
 0357: 0x74 0x00 0x00 0x00000008  A >>= 8
 0358: 0x54 0x00 0x00 0x000000ff  A &= 0xff
 0359: 0x44 0x00 0x00 0x00050000  A |= 0x50000
 0360: 0x16 0x00 0x00 0x00000000  return A

It's pretty long, and it definitely doesn't look like your classical syscall filter. It only handles sendto, the same syscall we were using before - that's promising.

Fun fact: this challenge almost didn't happen. See that A |= 0x50000 at the very end? It's effectively doing a return ERRNO(A), but the seccomp-tools assembler doesn't recognize that, and the seccomp documentation doesn't mention it as a possibility either. Very understandable, you usually return some constant value for errno, and not the result of a calculation. But if you have a look at seccomp.c you can see that the returned action is taken from the high 16 bits of the return code, and 0x50000 corresponds to SECCOMP_RET_ERRNO.

The filter is made up by 3 parts: it first decodes the instruction, then "executes" it generating the commands that we reversed before, and at the very end it chooses which byte to send out as errno.

Birds eye view of the filter

Bird's eye view of the filter, as output from my seccomp binaryninja plugin, which will hopefully be free enough of bugs by the time this writeup comes out to be published. It's definitely not needed for this challenge, but was fun to write.

The first section of the filter (up to instruction 0050) functions as an instruction decoder, and parses the opcode (passed as fifth argument) into its fields, stored as mem[0]...mem[5]:

 31      17   14    11    8    0
 | imm14 | rd | rs2 | rs1 | op |
    |imm8|
    23

The rs1 and rs2 filters have a more complex handling, where the filter extracts the corresponding register value and stores that in memory instead of the raw index found in the instruction. This is the rs1 decoder, it takes the registers as the first argument and select the byte indexed by the rs1 field.

 0011: 0x20 0x00 0x00 0x00000030  A = addr # sendto(fd, buff, len, flags, addr, addrlen)
 0012: 0x74 0x00 0x00 0x00000008  A >>= 8
 0013: 0x54 0x00 0x00 0x00000007  A &= 0x7
 0014: 0x24 0x00 0x00 0x00000008  A *= 0x8
 0015: 0x35 0x03 0x00 0x00000020  if (A >= 32) goto 0019
 0016: 0x07 0x00 0x00 0x00000000  X = A
 0017: 0x20 0x00 0x00 0x00000010  A = fd # sendto(fd, buff, len, flags, addr, addrlen)
 0018: 0x05 0x00 0x00 0x00000003  goto 0022
 0019: 0x14 0x00 0x00 0x00000020  A -= 0x20
 0020: 0x07 0x00 0x00 0x00000000  X = A
 0021: 0x20 0x00 0x00 0x00000014  A = fd >> 32 # sendto(fd, buff, len, flags, addr, addrlen)
 0022: 0x7c 0x00 0x00 0x00000000  A >>= X
 0023: 0x54 0x00 0x00 0x000000ff  A &= 0xff
 0024: 0x02 0x00 0x00 0x00000001  mem[1] = A

Then each instruction is handled, and finally the filter chooses one of two bytes to output based on the sixth argument. This is needed because errno can only go up to 4096, while the filter needs to output a full 16 bits.

 0351: 0x20 0x00 0x00 0x00000038  A = addrlen # sendto(fd, buff, len, flags, addr, addrlen)
 0352: 0x15 0x00 0x03 0x00000000  if (A != 0x0) goto 0356
 0353: 0x87 0x00 0x00 0x00000000  A = X
 0354: 0x54 0x00 0x00 0x000000ff  A &= 0xff
 0355: 0x05 0x00 0x00 0x00000003  goto 0359
 0356: 0x87 0x00 0x00 0x00000000  A = X
 0357: 0x74 0x00 0x00 0x00000008  A >>= 8
 0358: 0x54 0x00 0x00 0x000000ff  A &= 0xff
 0359: 0x44 0x00 0x00 0x00050000  A |= 0x50000
 0360: 0x16 0x00 0x00 0x00000000  return A

Mapping the output of each instruction with its interpretation in cpu_step we can decode the instructions as:

opcode name description
1 set rd = imm8
2-8 alu rd = rs1 <alu_operation> rs2
9 load rd = mem[mar]
10 store mem[mar] = rs1
11 in rd = read_byte_from_device(id: imm8)
12 out write_byte_to_device(id: imm8, source: rs1)
13-18 jumps conditional and unconditional jumps, the target address is always imm14
19 load_mar mar = concat(rs1, rs2)
24 call call call_table[imm8]
25 ret well, ret

I only included some of the instructions used in the challenge, see arch.h and filter.bpf in the source for all the details.

x86 is boring seccomp is boring

Time to reverse checker.bin, shall we?

Here, depending on how much info you got from the previous parts, you should be able to write some form of a disassembler. A simple, but working disassembler is here, and its output is here, with some annotations.

I split the assembly code in functions, by seeing which instructions were targets of a call. The entry point is at address 0 (cpu comes from a calloc, so pc == 0 at startup).

output_string and input_string are the shortest and don't call anything else, so reversing should probably start from those (of course, you'd get the names after reversing them). main calls most of the other functions, then based on the value stored at address 0x440 prints one of two strings, so 0x440 most likely marks whether the input is correct. The big remaining functions implement the rest of the crackme: init_something is called before user input, while work_with_input, well, does something with the user input. compare_char looks complex, but the important code is all at the start, where it sets 0x440 to 1 if its two inputs (r4 and r5) differ. The rest of it is a delay loop with code to display the progress bar; we can ignore it.

crypto?

kinda

Let's look at init_something in more detail. Here is an annotated version of it:

init_something:
0090: r0 = 10 # data1 starts at 0x1000
0094: r1 = 7  # data2 starts at 0x700
0098: r2 = 0
009c: r3 = 0
00a0:   mar = r0 || r2
00a4:   r5 = mem[mar]   # r5 = data1[r2]
00a8:   r7 = f
00ac:   r7 = r2 & r7
00b0:   mar = r1 || r7
00b4:   r6 = mem[mar]   # r6 = data2[r2 % 0x0f]
00b8:   r3 = r3 + r5
00bc:   r3 = r3 + r6    # r3 = r3 + r5 + r6
00c0:   mar = r0 || r2
00c4:   r5 = mem[mar]
00c8:   mar = r0 || r3
00cc:   r6 = mem[mar]
00d0:   mar = r0 || r2
00d4:   mem[mar] = r6
00d8:   mar = r0 || r3
00dc:   mem[mar] = r5  # swap data1[r2], data1[r3]
00e0:   r7 = 1
00e4:   r2 = r2 + r7   # r2++
00e8:   r7 = 0
00ec:   if r2 != r7: goto 00a0
00f0: ret

If you recognize this as the RC4 key scheduling, you can probably guess what work_with_input does, and invert it to get the flag. Otherwise, let's keep reversing.

work_with_input:
# main sets r0:r1 to the address of our input before calling this function
00f4: r3 = 10    # data1 starts at 0x1000
00f8: r4 = 0
00fc: r5 = 0
0100:   r7 = 1
0104:   r4 = r4 + r7    # r4++
0108:   mar = r3 || r4
010c:   r6 = mem[mar]
0110:   r5 = r5 + r6    # r5 += data1[r4]
0114:   mar = r3 || r4
0118:   r6 = mem[mar]
011c:   mar = r3 || r5
0120:   r7 = mem[mar]
0124:   mar = r3 || r4
0128:   mem[mar] = r7
012c:   mar = r3 || r5
0130:   mem[mar] = r6   # swap data1[r4], data1[r5]
0134:   r6 = r6 + r7
0138:   mar = r3 || r6
013c:   r6 = mem[mar]   # r6 = data1[data1[r4] + data1[r5]]
0140:   mar = r0 || r1
0144:   r7 = mem[mar]
0148:   r7 = r6 ^ r7
014c:   mem[mar] = r7   # *input++ ^= r6
0150:   r7 = 1
0154:   r1 = r1 + r7
0158:   r2 = r2 - r7
015c:   r7 = 0
0160:   if r2 != r7: goto 0100
0164: ret

This is, once again, RC4. If you recognize it, you can take the key from memory (stored at address 0x700) and decrypt with python:

import sys, random
from Crypto.Cipher import ARC4

rc4_key = bytes([157, 121, 177, 163, 127, 49, 128, 28, 209, 26, 103, 6, 251, 64, 214, 189])
enc_flag = bytes([32, 73, 39, 112, 140, 234, 6, 196, 10, 148, 154, 143, 41, 74, 203, 217, 7, 155, 212, 127, 181, 42, 2, 49, 8, 6, 176, 130, 68, 133, 171, 218, 240, 224, 247, 94, 140, 190, 230, 23, 246, 238, 206, 191, 102, 193, 221, 101, 172, 156, 254, 139, 176, 99, 151, 238, 171, 69, 184, 195, 167, 112, 207, 30])

flag = ARC4.new(rc4_key).decrypt(enc_flag)
print(bytes(flag))

Otherwise, you can see that work_with_input doesn't do anything with our input other than xor-ing it with the bytes it generates, so you can repeat the process and get the flag.