Practical Reverse Engineering Exercise Solutions: Page 79 / Exercise 8
Exercise 8 on page 79 of the book Practical Reverse Engineering specifies the following ARM disassembly of a function called mystery8
:
|
|
The function was compiled in Thumb mode, as we can see from the presence of 16 bit instructions, PUSH
and POP
instructions and Thumb-specific instructions, e.g. CBZ
and instructions with the .W
suffix.
It apparently takes three input arguments in R0
, R1
and R3
. Both R0
and R1
are pointers to a structure from which successive single bytes are read (lines 7, 10, 27 and 29), so we assume type char* for these two arguments (i.e. a string). R2
, on the other hand, is a numerical value being decreased (lines 8, 20) and compared to 0. Thus, we assume it is a signed 32 bit integer value and used for counting.
The function either returns 0 (lines 24 and 25) or performs a subtraction involving R3
and R2
(line 31). Therefore, we assume the return value to also be a 32 bit integer value.
Our provisional function prototype is as follows:
|
|
As mentioned in the description, the variable byteArray is an array of 256 byte values, i.e. byteArray[] = {0,1,..,0xFE, 0xFF}
Due to the multitude of conditionals and load operations, it is in our opinion easier to first decompile it and afterwards analyze in detail what its purpose could potentially be. While the following C program can certainly be optimized, the resemblance to the disassembly could make it easier to understand:
|
|
The function takes two string values and inputs and uses their single characters for indexing into the array byteArray. However, for each possible index in the range 0, 1, … , 0xFF, the value of byteArray[index]
is exactly the index itself. So it is not entirely clear why the byteArray
is actually used. We feel it can be completely omitted for simplification:
|
|
The central loop always compares the same positions of arg1 and arg2 and exits if and only if one of the following conditions is satisfied:
- The counter variable arg3 reaches 0 OR
- We have reached the last character of the first string arg1, i.e. the null character OR
- We have found a character in the strings arg1 and arg2 that do not match.
Before exiting the function, the counter variable is decremented and checked for non-negativity. If the value is negative, we know that either arg3 was originally <= 0 or we have exited the loop because arg3 was decremented until it reached 0. The return value in this case is 0.
If it is not negative, we know the counter variable was not the limiting factor, but rather the input strings did not match at a certain character position or the end of the first string argument was reached. The return value in this case is the difference between the character values of the input strings arg1 and arg2 at the lastly changed position. Thus, the possible return values are:
0
when *arg1 == *arg2 (i.e. the characters are equal)< 0
when *arg1 < * arg2 (i.e. the character of arg1 is numerically smaller than the character of arg2)>0
when *arg1 > *arg2 (i.e. the character of arg1 is numerically greater than the character of arg2)
Lastly, let us consider a few examples
mystery8("abc","abc",5)
returns: 0 (because *arg1 - *arg2 = 0 - 0 = 0)mystery8("abc","abc",2)
returns: 0 (because the counter variable reached the limit)mystery8("abb","aba",5)
returns: 1 (because *arg1 - *arg2 = ‘b’ - ‘a’ = 0x62 - 0x61 = 1)mystery8("abb","aba",2)
returns: 0 (because the counter variable reached the limit)mystery8("aba","abb",5)
returns: -1 (because *arg1 - *arg2 = ‘a’ - ‘b’ = 0x61 - 0x62 = -1)mystery8("aba","abb",2)
returns: 0 (because the counter variable reached the limit)
To conclude, we can summarize the function behaviour in more descriptive words. The function takes two strings and a limiting position variable as inputs and compares the input strings for equality from the first character up to the position imposed by the limit variable. If no difference in the strings is found up to the limiting position, it returns 0. When a difference is found, it returns the numerical difference between the first mismatching character of the input strings.
Lastly, we provide a version of mystery8
with more descriptive names:
|
|