Monday, August 22, 2011

Experiments with Disassembly

Recognizing assembly language patterns is essential to understanding what the original high-level code looked like. I'm planning a series of articles on assembly language patterns, and this is the first post in that series. I'm going to use gcc 4.5.2 on Linux 2.6 to compile my programs. For disassembly I'll use both gdb and objdump. The x86 assembly language syntax will be AT&T. If you are not familiar with it you can read about it here.

I'll start very simple and gradually move to more complex constructs. In this post I'm going to cover two very basic constructs of the C/C++ languages: if and if-else. I'll assume a very basic knowledge of assembly language.

Compile the following program with the command gcc -g -o ifelse ifelse.c -
/*ifelse.c*/
#include <stdio.h>

void func()
{
    int i = 0;
    if (i == 1)
    {
        printf("i is set.\n");
    }

    i = 1;
    if (i == 1)
    {
        printf("i is set.\n");
    }
    else if (i == 2)
    {
        printf("i is really set\n");
    }
    else
    {
        printf("i is not set\n");
    }
}

int main(void)
{
    func();
    return 0;
}
Now open the compiled binary in gdb with the command gdb ifelse. At the gdb command prompt type disassemble func to disassemble the func function. You'll get the following assembly -
0x080483b4 <+0>:   push   %ebp
0x080483b5 <+1>:   mov    %esp,%ebp
0x080483b7 <+3>:   sub    $0x28,%esp
0x080483ba <+6>:   movl   $0x0,-0xc(%ebp)
0x080483c1 <+13>:  cmpl   $0x1,-0xc(%ebp)
0x080483c5 <+17>:  jne    0x80483d3 <func+31>
0x080483c7 <+19>:  movl   $0x80484e0,(%esp)
0x080483ce <+26>:  call   0x80482f0 <puts@plt>
0x080483d3 <+31>:  movl   $0x1,-0xc(%ebp)
0x080483da <+38>:  cmpl   $0x1,-0xc(%ebp)
0x080483de <+42>:  jne    0x80483ee <func+58>
0x080483e0 <+44>:  movl   $0x80484e0,(%esp)
0x080483e7 <+51>:  call   0x80482f0 <puts@plt>
0x080483ec <+56>:  jmp    0x804840e <func+90>
0x080483ee <+58>:  cmpl   $0x2,-0xc(%ebp)
0x080483f2 <+62>:  jne    0x8048402 <func+78>
0x080483f4 <+64>:  movl   $0x80484ea,(%esp)
0x080483fb <+71>:  call   0x80482f0 <puts@plt>
0x08048400 <+76>:  jmp    0x804840e <func+90>
0x08048402 <+78>:  movl   $0x80484fa,(%esp)
0x08048409 <+85>:  call   0x80482f0 <puts@plt>
0x0804840e <+90>:  leave  
0x0804840f <+91>:  ret    
For the time being you can ignore the lines <+0> to <+3> and <+90> to <+91>. These two groups of instructions are, respectively, the prologue and the epilogue of the function. They do some essential bookkeeping when calling functions, which I'll try to cover in a later post. At <+6> the local variable i is assigned the value zero. How do you know this? The negative offset from ebp indicates that a local variable is being accessed: movl $0x0,-0xc(%ebp) stores zero at the address 12 bytes below ebp.

Next, at <+13>, i is compared to one. The cmpl instruction sets the zero flag if the compared values are equal. At <+17> there is a conditional jump, jne (jump if not equal, which is the same instruction as jnz, jump if not zero). It jumps to the address <func+31> if the zero flag is not set, that is, if i is not equal to one. In doing so it skips the next two instructions, <+19> and <+26>, which make up the body of the if block.
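To make the pattern easier to see, here is a rough C sketch of the first if block written the way the compiler laid it out: the source condition i == 1 becomes its opposite, and a goto plays the role of the jne. The function name if_as_compiled, its parameter, and the body_ran flag are my own inventions for illustration and are not part of ifelse.c. The comments map each line to the offsets in the listing above.

```c
#include <stdio.h>

/* Hypothetical helper, not part of ifelse.c.
 * Mirrors the control flow at <+13>..<+31> using goto.
 * Returns 1 if the body of the if block ran, 0 otherwise. */
int if_as_compiled(int i)
{
    int body_ran = 0;    /* illustration only, has no counterpart in the listing */

    if (i != 1)          /* <+13> cmpl $0x1,-0xc(%ebp) ; <+17> jne <func+31> */
        goto after_if;   /* condition is inverted: jump AROUND the body */

    puts("i is set.");   /* <+19>, <+26>: body of the if block */
    body_ran = 1;

after_if:                /* <+31>: execution continues here either way */
    return body_ran;
}
```

Calling if_as_compiled(0) skips the puts call, just as the jne at <+17> is taken when i is zero, while if_as_compiled(1) falls through into the body. This is only a reading aid: the real compiler output has no such flag and works directly on the stack slot at -0xc(%ebp).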