Recognizing assembly language patterns is essential to understanding what the original high-level code looked like. I'm planning to do a series of articles on assembly language patterns, and this is the first post of the series. I'm going to use gcc 4.5.2 on Linux 2.6 to compile my programs. For disassembly I'll use both gdb and objdump. The x86 assembly language syntax will be AT&T. If you are not familiar with it you can read about it here.
I'll start very simple and gradually move to more complex constructs. In this post I'm going to cover two very basic constructs of the C/C++ languages - if and if-else. I'll assume a very basic knowledge of assembly language.
Save the program listed below as ifelse.c and compile it with the command gcc -g -o ifelse ifelse.c. Then open the compiled binary in gdb with the command gdb ifelse. At the gdb command prompt, type disassemble func to disassemble the function func.
    /* ifelse.c */
    #include <stdio.h>

    void func()
    {
        int i = 0;

        /* a plain if */
        if (i == 1) {
            printf("i is set.\n");
        }

        i = 1;

        /* an if / else if / else chain */
        if (i == 1) {
            printf("i is set.\n");
        } else if (i == 2) {
            printf("i is really set\n");
        } else {
            printf("i is not set\n");
        }
    }

    int main(void)
    {
        func();
        return 0;
    }

Disassembling func in gdb gives the following assembly -
    0x080483b4 <+0>:  push   %ebp
    0x080483b5 <+1>:  mov    %esp,%ebp
    0x080483b7 <+3>:  sub    $0x28,%esp
    0x080483ba <+6>:  movl   $0x0,-0xc(%ebp)
    0x080483c1 <+13>: cmpl   $0x1,-0xc(%ebp)
    0x080483c5 <+17>: jne    0x80483d3 <func+31>
    0x080483c7 <+19>: movl   $0x80484e0,(%esp)
    0x080483ce <+26>: call   0x80482f0 <puts@plt>
    0x080483d3 <+31>: movl   $0x1,-0xc(%ebp)
    0x080483da <+38>: cmpl   $0x1,-0xc(%ebp)
    0x080483de <+42>: jne    0x80483ee <func+58>
    0x080483e0 <+44>: movl   $0x80484e0,(%esp)
    0x080483e7 <+51>: call   0x80482f0 <puts@plt>
    0x080483ec <+56>: jmp    0x804840e <func+90>
    0x080483ee <+58>: cmpl   $0x2,-0xc(%ebp)
    0x080483f2 <+62>: jne    0x8048402 <func+78>
    0x080483f4 <+64>: movl   $0x80484ea,(%esp)
    0x080483fb <+71>: call   0x80482f0 <puts@plt>
    0x08048400 <+76>: jmp    0x804840e <func+90>
    0x08048402 <+78>: movl   $0x80484fa,(%esp)
    0x08048409 <+85>: call   0x80482f0 <puts@plt>
    0x0804840e <+90>: leave
    0x0804840f <+91>: ret
For the time being you can ignore the lines <+0> to <+3> and <+90> to <+91>. These two sets of instructions are, respectively, the prologue and the epilogue of the function. They do some essential bookkeeping around function calls, which I'll try to cover in a later post. At <+6> the local variable i is assigned the value zero. How do you know this? The negative offset from ebp, -0xc(%ebp), indicates that a local variable is being accessed, and i is the only local variable in func.
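Next, at <+13>, i is compared to one. The cmpl instruction sets the zero flag if the compared values are equal. At <+17> there is a conditional jump, jne (jump if not equal, which is the same instruction as jnz, jump if not zero): it jumps to the address <func+31> if the zero flag is not set. The jump skips the next two instructions, <+19> and <+26>, which make up the body of the if block. Note that the call goes to puts rather than printf: gcc makes this substitution when printf is given a constant string with no format specifiers that ends in a newline.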