Sometimes you want to play with assembly language without coding it all yourself. Below I use a program whose assembly I wanted to analyze, to see how the compiler had written it. For this whole exercise, I assume you've named the source file adder.c.

First, grab the code from this other project, AddingWithoutPlus.

Compiling with an extra assembly step

Type this: gcc -O3 -c -S adder.c

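This means: compile with -O3 optimizations and stop before producing an executable. Strictly speaking, -S alone is enough here. It stops right after compilation and writes the assembly (adder.s) instead of an object file, which makes -c (compile and assemble, but don't link) redundant.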

You should now see a file called adder.s in your working directory. Open it up and take a look. Even if you don't know much about assembly, you might notice some familiar names. We did not compile with debugging symbols (-g), yet names like main and add1, the routine we'll be messing with, still appear: function names show up as labels in the assembly regardless, and -g only adds extra debugging information on top.

Anyway, just to prove that this assembly file is actually the thing that turns into the binary you run, let's make our executable from it.

Type this: gcc adder.s -o adder

You should now see the adder executable, and it should run just fine. Pay attention to the calls per ms figure on the "hard way" line of output; on my 1 GHz P3 it's around 13.7. Note that if you were to go back and compile it like this...

gcc -O3 adder.c -o adder

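...you would get the same performance. This works because gcc is really a driver for several tools, including the compiler proper, the assembler (as) and the linker (ld). It decides which ones to run from the input file suffix (.c, .s, .o) and from options such as -c and -S; add -v to any of these commands to see each tool it invokes.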

Hacking around with a routine

Ok, so now let's mess with the add1 routine. Open adder.s and delete everything before and after the add1: subroutine, so that the file looks just like this. We have also renamed add1 to add_asm and added a .globl add_asm directive. Without it, add_asm would be a local symbol in the object file, and the final link would fail with an undefined reference to add_asm.

.globl add_asm
add_asm:
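   pushl %ebp                # prologue: save the caller's frame pointer
   movl  %esp, %ebp
   pushl %esi                # save the callee-saved registers used below
   pushl %ebx
   movl  8(%ebp), %edx       # edx = x, the function's argument
   xorl  %ebx, %ebx          # ebx = 0, collects the result
   movl  $-1, %esi           # esi = 0xFFFFFFFF, counts 32 passes
   movl  $1, %ecx            # ecx = carry-in of 1 (the "+1")
   .p2align 2,,3             # align the loop start to 4 bytes
.L43:
   movl  %edx, %eax
   xorl  %ecx, %eax          # bit 0 of eax = sum bit (x ^ carry)
   shrl  $1, %ebx            # make room at the top of the result
   sall  $31, %eax           # move the sum bit up to bit 31
   andl  %edx, %ecx          # carry-out = x & carry
   orl   %eax, %ebx          # insert the sum bit into the result
   sarl  $1, %edx            # bring the next bit of x into bit 0
   shrl  $1, %esi            # counter reaches 0 after 32 shifts
   jne   .L43                # loop until it does
   movl  %ebx, %eax          # return value goes in eax
   popl  %ebx                # restore the saved registers
   popl  %esi
   leave                     # restore esp and ebp
   ret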

Now, let's assemble it. Simply do...

gcc -c adder.s

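Note that we don't have to specify -O3: the optimizations were already done when gcc generated adder.s, and the assembler just translates the instructions as written. You should now have a new file called adder.o in your directory. (The listing above is 32-bit x86 code from my P3; if you use it as-is on a 64-bit x86 system, add -m32 to this command and to the final link, which also needs 32-bit support libraries installed.)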

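Now, before we link, we need to make one change: adder.c has to call add_asm instead of add1. Because add_asm is defined in adder.o rather than in adder.c, adder.c also needs a declaration for it; assuming add1 takes and returns an int, adding int add_asm(int); near the top of the file does the job (newer gcc versions treat a call to an undeclared function as an error). The call itself is around line 75, and the loop should now look like this...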

   ...
   for ( test = 0; test < iters; test++)
   {
      check = add_asm(test);   /* was: check = add1(test); */
#ifdef CHECK_ON
      if (test + 1 != check)
      {
         printf ("nope!  %d != %d\n", check, test + 1);
         return 1;
      }
#endif
   ...
   }

Ok, now we should be ready. Type...

gcc -O3 adder.c adder.o -o asm_adder

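...and poof, you now have asm_adder. Interestingly, when I run it, my performance is now worse! Each call to add_asm does very little work, so even a tiny bit of overhead makes a difference. I didn't expect linking alone to cost anything. The most likely explanation is that when add1 lived in adder.c, gcc -O3 could inline it into the calling loop, whereas add_asm sits in a separate object file, so every iteration now pays for a real call, including the push/pop prologue and epilogue.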

Anyway, now you have the ability to do some cool stuff...

  • Take the compiler-generated assembly for a function, tweak it, and re-try
  • Patch the assembly generated by one compiler into a program built with another

-- MattWalsh - 09 Aug 2004

Topic revision: r1 - 10 Aug 2004 - MattWalsh