Sometimes you want to play with assembly language without writing it all yourself. Below I take a program I wanted to analyze and look at the assembly code the compiler generates for it. Throughout this exercise, I assume the source file is named adder.c.
First, grab the code from this other project, AddingWithoutPlus.
Compiling with extra assembly step
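Type this:
gcc -O3 -c -S adder.c
This tells gcc to compile with -O3 optimizations, not to make an executable (-c), and to emit the assembly (-S). Strictly speaking, -S alone already stops before the assembler runs, so -c is redundant here, though harmless.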
You should now see a file called adder.s in your working directory. Open it and take a look. Even if you don't know much about assembly, you might notice some familiar things. Note that we did not compile with debugging symbols (-g), but we can still see names like main and add1, the routine we'll be messing with.
Anyway, just to prove that this assembly file is actually the thing that turns into the binary you run, let's make our executable from it.
Type this:
gcc adder.s -o adder
You should now see the adder executable, and it should run just fine. Pay attention to the calls per ms figure in the hard way line of output. On my 1 GHz P3 it's around 13.7. Note that if you were to go back and compile it like this...
gcc -O3 adder.c -o adder
...you would get the same performance. This works because gcc is really a driver for several tools, including the compiler, assembler and linker, and it runs whichever of them your command needs.
Hacking around with a routine
OK, so now let's mess with the add1 routine. Open the adder.s file and delete everything before and after the add1: subroutine, so that the file looks just like the listing that follows. Note two changes: we have renamed add1 to add_asm, and we have added a .globl add_asm directive. Without that directive the symbol would not be visible outside this file, and we wouldn't be able to link.
.globl add_asm
add_asm:
        pushl %ebp
        movl %esp, %ebp
        pushl %esi
        pushl %ebx
        movl 8(%ebp), %edx
        xorl %ebx, %ebx
        movl $-1, %esi
        movl $1, %ecx
        .p2align 2,,3
.L43:
        movl %edx, %eax
        xorl %ecx, %eax
        shrl $1, %ebx
        sall $31, %eax
        andl %edx, %ecx
        orl %eax, %ebx
        sarl $1, %edx
        shrl $1, %esi
        jne .L43
        movl %ebx, %eax
        popl %ebx
        popl %esi
        leave
        ret
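For readers who don't read x86 assembly, here is my reading of what the loop in the listing does, written back out in C. This is a reconstruction from the assembly alone, not the original add1 source. The function name add1_sketch and the variable names are made up for illustration, and unsigned 32-bit types are assumed so that the shifts are well defined.

```c
#include <stdint.h>

/* Hypothetical C rendering of the add_asm loop: adds 1 to x one bit at a time. */
uint32_t add1_sketch(uint32_t x)
{
    uint32_t result = 0;           /* %ebx: sum, built from the top bit down   */
    uint32_t mask   = 0xFFFFFFFFu; /* %esi: loop counter, zero after 32 shifts */
    uint32_t carry  = 1;           /* %ecx: the 1 being added, then the carry  */

    do {
        /* Sum bit for this position is (x ^ carry); move it to bit 31. */
        uint32_t bit = (x ^ carry) << 31;
        result >>= 1;              /* make room at the top */
        carry &= x;                /* carry survives only if this bit of x is 1 */
        result |= bit;
        /* The listing shifts x arithmetically (sarl); only the low bit is
           ever read during the 32 iterations, so a logical shift matches. */
        x >>= 1;
        mask >>= 1;
    } while (mask != 0);

    return result;
}
```

Each pass handles one bit of the input, so the loop always runs 32 times regardless of the value passed in.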
Now, let's assemble it. Simply do...
gcc -c adder.s
Note that we don't have to specify -O3. The optimizations have already been done and are baked into the assembly. You should now have a new file in your directory called adder.o.
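Before we link, we need to make one change: adder.c must call add_asm instead of add1. The call is around line 75. If your compiler complains that add_asm is undeclared, adding a prototype near the top of adder.c (presumably int add_asm(int); to match add1) should fix that. The loop should now look like this...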
...
for ( test = 0; test < iters; test++)
{
    check = add_asm(test);
#ifdef CHECK_ON
    if (test + 1 != check)
    {
        printf ("nope! %d != %d\n", check, test + 1);
        return 1;
    }
#endif
    ...
}
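OK, now we should be ready. Type...
gcc -O3 adder.c adder.o -o asm_adder
...and poof, you now have asm_adder. Interestingly, when I run it, my performance is now worse! This is probably because each call to add_asm is very short, so even a tiny bit of overhead makes a difference. One plausible source of that overhead: at -O3 the compiler may have inlined add1 into the loop, whereas a routine living in a separately assembled object file has to go through a real call. I didn't expect linking alone to cause that.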
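To compare the two builds yourself, a figure like calls per ms can be measured along these lines. This is only a sketch and not the timing code from adder.c, which isn't shown here. The names calls_per_ms and add_plain are hypothetical, and clock() reports CPU time at whatever resolution your platform offers.

```c
#include <time.h>

/* Hypothetical stand-in for the routine under test. */
static int add_plain(int x) { return x + 1; }

/* Time `iters` calls of fn and return calls per millisecond.
   Returns 0.0 if the run was too short for clock() to register. */
static double calls_per_ms(int (*fn)(int), int iters)
{
    volatile int sink = 0;   /* keeps the calls from being optimized away */
    clock_t start = clock();
    for (int i = 0; i < iters; i++)
        sink = fn(i);
    clock_t end = clock();
    (void)sink;

    double ms = 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
    return ms > 0.0 ? iters / ms : 0.0;
}
```

Calling calls_per_ms(add_plain, 10000000) gives a baseline number. Passing add_asm instead, after declaring it and linking adder.o, would time the assembly routine through the same call path.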
Anyway, now you have the ability to do some cool stuff...
- Take the compiler-generated assembly for a function, tweak it and retry
- Patch the assembly output of one compiler into a program built with another
--
MattWalsh - 09 Aug 2004