Using angr and symbolic execution for reverse engineering challenges (RPI MBE Labs)

2018-02-21

This blog post highlights how you can use the angr binary analysis framework and symbolic execution for reverse engineering tasks.

More precisely, we will look at the first two tasks in lab1 of the RPISEC MBE labs.

While angr’s internals are quite complex and take substantial effort to master, our simple examples require very little prior knowledge.

lab1C

The first example we will look at is lab1C from lab01, which requires the user to enter a certain password:

./lab1C
-----------------------------
--- RPISEC - CrackMe v1.0 ---
-----------------------------

Password: bluab

Invalid Password!!!

Disassembly

When inspecting the program’s disassembly, we see that the argument for system() is set up at address 0x08048711 and the call itself follows at 0x08048718:

Disassembly of lab1C:

   0x080486ad <+0>: push   ebp
   0x080486ae <+1>: mov    ebp,esp
   0x080486b0 <+3>: and    esp,0xfffffff0
   0x080486b3 <+6>: sub    esp,0x20
   0x080486b6 <+9>: mov    DWORD PTR [esp],0x80487d0
   0x080486bd <+16>: call   0x8048560 <puts@plt>
   0x080486c2 <+21>: mov    DWORD PTR [esp],0x80487ee
   0x080486c9 <+28>: call   0x8048560 <puts@plt>
   0x080486ce <+33>: mov    DWORD PTR [esp],0x80487d0
   0x080486d5 <+40>: call   0x8048560 <puts@plt>
   0x080486da <+45>: mov    DWORD PTR [esp],0x804880c
   0x080486e1 <+52>: call   0x8048550 <printf@plt>
   0x080486e6 <+57>: lea    eax,[esp+0x1c]
   0x080486ea <+61>: mov    DWORD PTR [esp+0x4],eax
   0x080486ee <+65>: mov    DWORD PTR [esp],0x8048818
   0x080486f5 <+72>: call   0x80485a0 <__isoc99_scanf@plt>
   0x080486fa <+77>: mov    eax,DWORD PTR [esp+0x1c]
   0x080486fe <+81>: cmp    eax,0x149a
   0x08048703 <+86>: jne    0x8048724 <main+119>
   0x08048705 <+88>: mov    DWORD PTR [esp],0x804881b
   0x0804870c <+95>: call   0x8048560 <puts@plt>
   0x08048711 <+100>: mov    DWORD PTR [esp],0x804882b
   0x08048718 <+107>: call   0x8048570 <system@plt>
   0x0804871d <+112>: mov    eax,0x0
   0x08048722 <+117>: jmp    0x8048735 <main+136>
   0x08048724 <+119>: mov    DWORD PTR [esp],0x8048833
   0x0804872b <+126>: call   0x8048560 <puts@plt>
   0x08048730 <+131>: mov    eax,0x1
   0x08048735 <+136>: leave  
   0x08048736 <+137>: ret    

angr Script

Without looking further at the program logic, we have enough information to write a little script that invokes angr and helps us with the challenge:

import sys

import angr

# generate angr project
p = angr.Project('./lab1C')

# get a generic representation of the possible program states at the program's entry point
state = p.factory.entry_state()

# get a SimulationManager to handle the flow of the symbolic execution engine
sm = p.factory.simulation_manager(state)

# explore until a state reaches the instruction that sets up the system() call
sm.explore(find=0x08048711)

if not sm.found:
    sys.exit("Could not find an input to reach the target address!")

# dump the bytes the found state read from stdin (file descriptor 0)
targetstate = sm.found[0]
solution = targetstate.posix.dumps(0).strip(b"\n")
print("We found a satisfying input: {}".format(solution.decode(errors="replace")))

The part of angr that matters most to us is the SimulationManager object, which guides the symbolic execution engine. We tell it to find an execution that reaches address 0x08048711 and start the symbolic execution of the program. Once an execution has reached that address, we want the input that led there. We retrieve it with posix.dumps(0), where 0 is the file descriptor of stdin.

Within a few seconds, the following output is generated:

python solve-lab1C.py
WARNING | 2018-02-21 13:12:01,239 | angr.analyses.disassembly_utils | Your verison of capstone does not support MIPS instruction groups.
WARNING | 2018-02-21 13:12:02,652 | angr.state_plugins.symbolic_memory | Concretizing symbolic length. Much sad; think about implementing.
We found a satisfying input: +0000005274
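
The leading plus sign and the zeros are presumably just padding chosen by the solver. As a quick sanity check, the following sketch confirms that this input parses to the constant used by the cmp instruction at 0x080486fe. It assumes that the scanf format string at 0x8048818 is %d, which the disassembly does not show.

```python
# constant from `cmp eax,0x149a` in the disassembly above
expected = 0x149a

# input reported by angr; int() accepts the sign and the leading zeros,
# much like scanf("%d") would (the format string is an assumption)
angr_input = "+0000005274"

parsed = int(angr_input)
print(parsed, hex(parsed))
assert parsed == expected
```

In other words, typing 5274 at the password prompt should work just as well as the padded value.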

lab1B

While lab1C just compares the input to a hard-coded value, lab1B is a little more complicated. To the user it looks the same as lab1C, as a password has to be provided:

./lab1B 
.---------------------------.
|-- RPISEC - CrackMe v2.0 --|
'---------------------------'

Password: asas

Invalid Password!

Disassembly

Again, we first have a look at its disassembly, in particular the decrypt function:

Dump of assembler code for function decrypt:
   0x080489b7 <+0>: push   ebp
   0x080489b8 <+1>: mov    ebp,esp
   0x080489ba <+3>: sub    esp,0x38
   0x080489bd <+6>: mov    eax,gs:0x14
   0x080489c3 <+12>: mov    DWORD PTR [ebp-0xc],eax
   0x080489c6 <+15>: xor    eax,eax
   0x080489c8 <+17>: mov    DWORD PTR [ebp-0x1d],0x757c7d51
   0x080489cf <+24>: mov    DWORD PTR [ebp-0x19],0x67667360
   0x080489d6 <+31>: mov    DWORD PTR [ebp-0x15],0x7b66737e
   0x080489dd <+38>: mov    DWORD PTR [ebp-0x11],0x33617c7d
   0x080489e4 <+45>: mov    BYTE PTR [ebp-0xd],0x0
   0x080489e8 <+49>: push   eax
   0x080489e9 <+50>: xor    eax,eax
   0x080489eb <+52>: je     0x80489f0 <decrypt+57>
   0x080489ed <+54>: add    esp,0x4
   0x080489f0 <+57>: pop    eax
   0x080489f1 <+58>: lea    eax,[ebp-0x1d]
   0x080489f4 <+61>: mov    DWORD PTR [esp],eax
   0x080489f7 <+64>: call   0x8048810 <strlen@plt>
   0x080489fc <+69>: mov    DWORD PTR [ebp-0x24],eax
   0x080489ff <+72>: mov    DWORD PTR [ebp-0x28],0x0
   0x08048a06 <+79>: jmp    0x8048a28 <decrypt+113>
   0x08048a08 <+81>: lea    edx,[ebp-0x1d]
   0x08048a0b <+84>: mov    eax,DWORD PTR [ebp-0x28]
   0x08048a0e <+87>: add    eax,edx
   0x08048a10 <+89>: movzx  eax,BYTE PTR [eax]
   0x08048a13 <+92>: mov    edx,eax
   0x08048a15 <+94>: mov    eax,DWORD PTR [ebp+0x8]
   0x08048a18 <+97>: xor    eax,edx
   0x08048a1a <+99>: lea    ecx,[ebp-0x1d]
   0x08048a1d <+102>: mov    edx,DWORD PTR [ebp-0x28]
   0x08048a20 <+105>: add    edx,ecx
   0x08048a22 <+107>: mov    BYTE PTR [edx],al
   0x08048a24 <+109>: add    DWORD PTR [ebp-0x28],0x1
   0x08048a28 <+113>: mov    eax,DWORD PTR [ebp-0x28]
   0x08048a2b <+116>: cmp    eax,DWORD PTR [ebp-0x24]
   0x08048a2e <+119>: jb     0x8048a08 <decrypt+81>
   0x08048a30 <+121>: mov    DWORD PTR [esp+0x4],0x8048d03
   0x08048a38 <+129>: lea    eax,[ebp-0x1d]
   0x08048a3b <+132>: mov    DWORD PTR [esp],eax
   0x08048a3e <+135>: call   0x8048770 <strcmp@plt>
   0x08048a43 <+140>: test   eax,eax
   0x08048a45 <+142>: jne    0x8048a55 <decrypt+158>
   0x08048a47 <+144>: mov    DWORD PTR [esp],0x8048d14
   0x08048a4e <+151>: call   0x80487e0 <system@plt>
   0x08048a53 <+156>: jmp    0x8048a61 <decrypt+170>
   0x08048a55 <+158>: mov    DWORD PTR [esp],0x8048d1c
   0x08048a5c <+165>: call   0x80487d0 <puts@plt>
   0x08048a61 <+170>: mov    eax,DWORD PTR [ebp-0xc]
   0x08048a64 <+173>: xor    eax,DWORD PTR gs:0x14
   0x08048a6b <+180>: je     0x8048a72 <decrypt+187>
   0x08048a6d <+182>: call   0x80487c0 <__stack_chk_fail@plt>
   0x08048a72 <+187>: leave  
   0x08048a73 <+188>: ret    
End of assembler dump.
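
To get a feeling for what this function does, the loop at decrypt+81 can be replayed by hand. The sketch below rebuilds the 16-byte string that is written to the stack, XORs every byte with a single-byte key (only al is stored back, so only the low byte of the argument matters) and tries all 256 keys. The string at 0x8048d03 that strcmp compares against is not shown in the listing, so the filter for "looks like text" is a heuristic of this sketch and not something taken from the binary.

```python
import struct

# the four dwords written to [ebp-0x1d] .. [ebp-0x11], in program order
DWORDS = [0x757c7d51, 0x67667360, 0x7b66737e, 0x33617c7d]

# x86 is little-endian, so each dword is laid out low byte first
CIPHERTEXT = b"".join(struct.pack("<I", d) for d in DWORDS)


def xor_decrypt(key):
    """Mirror the loop at decrypt+81: XOR every byte with the low byte of key."""
    return bytes(b ^ (key & 0xFF) for b in CIPHERTEXT)


# try every single-byte key and keep results that look like text:
# letters only, apart from one printable trailing character (a heuristic)
candidates = {}
for key in range(256):
    plain = xor_decrypt(key)
    if plain[:-1].isalpha() and 0x20 <= plain[-1] < 0x7f:
        candidates[key] = plain.decode("ascii")

for key, text in candidates.items():
    print(hex(key), text)
```

Among the candidates, the key 0x12 decodes to Congratulations!, which is plausibly the string the binary compares against. angr reaches the same conclusion symbolically, without us having to understand the loop.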

As before, the goal is to reach the call to system(), whose argument is set up starting at address 0x08048a47. The solving script is thus almost identical to the previous example:

angr Script

import sys

import angr

# generate angr project
p = angr.Project('./lab1B')

# get a generic representation of the possible program states at the program's entry point
state = p.factory.entry_state()

# get a SimulationManager to handle the flow of the symbolic execution engine
sm = p.factory.simulation_manager(state)

# explore until a state reaches the instruction that sets up the system() call
sm.explore(find=0x08048a47)

if not sm.found:
    sys.exit("Could not find an input to reach the target address!")

# dump the bytes the found state read from stdin (file descriptor 0)
targetstate = sm.found[0]
solution = targetstate.posix.dumps(0).strip(b"\n")
print("We found a satisfying input: {}".format(solution.decode(errors="replace")))

Running it, however, takes more time, because angr has to explore several conditional branches and check their satisfiability:

python solve-lab1B.py 
WARNING | 2018-02-21 12:35:23,576 | angr.analyses.disassembly_utils | Your verison of capstone does not support MIPS instruction groups.
WARNING | 2018-02-21 12:35:25,180 | angr.state_plugins.symbolic_memory | Concretizing symbolic length. Much sad; think about implementing.
We found a satisfying input: +0322424827Z
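
Since the two scripts differ only in the binary and the target address, they could be folded into one helper. The sketch below is one possible way to do that; the function name solve and the CHALLENGES table are made up for this example. It also passes the addresses of the two failure branches (0x08048724 and 0x08048a55, taken from the listings above) as avoid targets, so that states reaching them are dropped from exploration. Whether this noticeably speeds up lab1B has not been measured here.

```python
def solve(binary_path, find_addr, avoid_addr=None):
    """Return the stdin bytes that drive the binary to find_addr, or None."""
    import angr  # imported lazily so the helper can be defined without angr installed

    project = angr.Project(binary_path)
    state = project.factory.entry_state()
    simgr = project.factory.simulation_manager(state)

    # states that reach avoid_addr are moved aside and not explored further
    simgr.explore(find=find_addr, avoid=avoid_addr)

    if not simgr.found:
        return None
    # bytes the found state read from stdin (file descriptor 0)
    return simgr.found[0].posix.dumps(0)


# (find, avoid) address pairs taken from the two disassembly listings
CHALLENGES = {
    "./lab1C": (0x08048711, 0x08048724),
    "./lab1B": (0x08048a47, 0x08048a55),
}
```

A call such as solve("./lab1C", *CHALLENGES["./lab1C"]) would then return the raw input bytes, assuming angr is installed and the binary is in the current directory.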

Further examples of applying angr to challenges of this kind are available in the angr developers' GitHub repository.