Practical Reverse Engineering Exercise Solutions: Page 35 / Exercise 9

soffensive included in Practical Reverse Engineering

2017-09-15

Our task:

Sample L. Explain what function sub_1000CEA0 does and then decompile it back to C.

Here we have the function’s disassembly:

                push    ebp
                mov     ebp, esp
                push    edi
                mov     edi, [ebp+8]
                xor     eax, eax
                or      ecx, 0FFFFFFFFh
                repne scasb
                add     ecx, 1
                neg     ecx
                sub     edi, 1
                mov     al, [ebp+0Ch]
                std
                repne scasb
                add     edi, 1
                cmp     [edi], al
                jz      short loc_1000CEC7
                xor     eax, eax
                jmp     short loc_1000CEC9

loc_1000CEC7:                       
                mov     eax, edi

loc_1000CEC9:              
                cld
                pop     edi
                leave
                retn
  endp

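Firstly, the function takes two arguments, at ebp+0x8 (arg1) and ebp+0x0C (arg2) respectively. The arguments are pushed from right to left on the stack. Because the function returns with a plain retn (no operand), it does not remove its arguments from the stack itself, which points to the cdecl convention (caller cleans up) rather than stdcall.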

Presumably, arg1 is a string, as the function applies the scasb instruction to it.

As I constantly forget what scasb means in conjunction with repne, here is an excellent refresher from Reverse Engineering Stack Exchange, which I shamelessly copy/paste (https://reverseengineering.stackexchange.com/questions/2774/what-does-the-assembly-instruction-repne-scas-byte-ptr-esedi, thanks peter ferrie):

The SCAS instruction is used to scan a string (SCAS = SCan A String). It compares the content of the accumulator (AL, AX, or EAX) against the current value pointed at by ES:[EDI].
When used together with the REPNE prefix (REPeat while Not Equal), SCAS scans the string searching for the first string element which is equal to the value in the accumulator.

Since eax was simply zeroed, al holds 0x0 and the first repne scasb searches for the first null byte in arg1. For every compared byte, edi is incremented and ecx is decremented. As ecx started at 0xFFFFFFFF (-1), it holds -(length + 2) once the null byte has been found. The value in ecx is then incremented by one and negated with neg (a two's complement negation, not a bitwise not), which leaves length + 1. In other words, we thereby obtain the length of the string stored in arg1, including the trailing null byte.
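To make the ecx arithmetic easier to follow, here is a minimal C sketch of the first half of the function. The helper name length_with_terminator is made up for illustration, and the loop only approximates repne scasb: it ignores the (practically unreachable) case of ecx running down to zero before a null byte is found.

```c
#include <stdint.h>

/* Hypothetical helper: mimics
 *   or ecx, 0FFFFFFFFh / repne scasb / add ecx, 1 / neg ecx
 * with al = 0. Returns the string length including the terminator. */
uint32_t length_with_terminator(const char *s) {
    uint32_t ecx = 0xFFFFFFFFu;   /* or ecx, 0FFFFFFFFh */
    const char *edi = s;          /* mov edi, [ebp+8]   */

    /* repne scasb: every iteration decrements ecx, compares one byte
     * with al and advances edi; it stops after the matching byte. */
    for (;;) {
        ecx--;
        if (*edi++ == '\0')
            break;
    }

    ecx += 1;                     /* add ecx, 1 */
    ecx = 0u - ecx;               /* neg ecx (two's complement negation) */
    return ecx;
}
```

For the string "abc" the loop runs four times (three characters plus the terminator), so ecx goes from -1 to -5, then to -4 after the add, and to 4 after the neg.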

The function continues by loading the byte value of arg2 (i.e. type char) into the register al and uses the interesting instruction std, which I had not come across before. The std instruction sets the direction flag, which reverses the direction of string operations such as scasb. Instead of incrementing edi for every processed byte, edi is decremented, and the string is therefore processed from end to start.

Beforehand, edi is decremented by one. After the first scan it points one byte past the null terminator, so the decrement moves it back onto the terminator itself. Afterwards, repne scasb is performed again, this time searching backwards for the last occurrence of the character arg2 in the string arg1. Note that it is crucial for ecx to hold the length of the string (including the terminator) at the start of this repne scasb, as otherwise the scan would not know where the string begins and would run past its first byte.

When the repne scasb instruction has completed, edi is incremented by one, because scasb has already moved edi past the last byte it compared. The byte at that address is then compared against arg2. If it matches, we have found the last occurrence of arg2 in the string arg1 and the function returns the pointer to the corresponding memory address.

Otherwise, a null pointer is returned. It is also worth mentioning that cld is executed before returning, which clears the previously set direction flag.
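The second half of the disassembly (std, repne scasb, add edi, cmp) can be sketched in C as follows. This is only an approximation under a few assumptions: the name scan_backward is made up, edi is modelled as an index into the string instead of a raw pointer, and strlen stands in for the length computation of the first half.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: mimics the backward scan with the direction flag set. */
char *scan_backward(char *str, char key) {
    size_t ecx = strlen(str) + 1;            /* length incl. terminator      */
    ptrdiff_t edi = (ptrdiff_t)strlen(str);  /* index of the terminator,
                                                i.e. edi after "sub edi, 1"  */

    /* std + repne scasb: compare one byte, then move edi DOWN by one,
     * until ecx is exhausted or a byte equal to al (key) was compared. */
    while (ecx != 0) {
        char current = str[edi];
        ecx--;
        edi--;
        if (current == key)
            break;
    }

    edi++;                                   /* add edi, 1  */
    if (str[edi] == key)                     /* cmp [edi], al / jz */
        return &str[edi];                    /* mov eax, edi */
    return NULL;                             /* xor eax, eax */
}
```

Note that the scan starts on the terminator, so searching for '\0' yields a pointer to the end of the string. If nothing matches, edi ends up one position before the string, the increment moves it back to position 0, and the final compare fails.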

Finally, we provide a C-decompilation of the function with more comprehensible variable names:

char* getLastOccurrenceOfCharacter(char* string, char key) {
    int countChars = 0;
    while (*string) {
        countChars++;
        string++;
    }

    while (countChars) {
        if (key == *string) {
            return string;
        }
        countChars --;
        string --;
    }

    return 0;
}

UPDATE:

Unfortunately, there is a bug in the decompilation above. When the only occurrence of key is the very first character of string, the second loop stops before inspecting position 0, so the function returns 0 instead of the pointer. Therefore, we have to adjust the second while-loop to take position 0 into consideration as well:

char* getLastOccurrenceOfCharacter(char* string, char key) {
    int countChars = 0;
    while (*string) {
        countChars++;
        string++;
    }

    while (countChars >= 0) {
        if (key == *string) {
            return string;
        }
        countChars --;
        string --;
    }

    return 0;
}
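One remaining wrinkle in the corrected function above is that, when key is not found, string is decremented to one position before the start of the buffer, which C does not strictly allow for pointers. As a hedged alternative, here is a sketch that walks by index instead. The name last_occurrence is made up, and using strlen for the first loop is my own shortcut, not something taken from the disassembly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant: same contract as the corrected decompilation,
 * but it never forms a pointer outside the string. */
char *last_occurrence(char *string, char key) {
    size_t i = strlen(string) + 1;   /* bytes to inspect, terminator included */

    /* Inspect string[len], string[len - 1], ..., string[0]. */
    while (i-- > 0) {
        if (string[i] == key)
            return &string[i];
    }
    return NULL;
}
```

For the inputs I tried, this returns the same pointers as the standard strrchr from string.h, including the case where key is '\0'.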