- Test Program:
$ cat fp_test.c
double foo (double x, double y)
{
double z;
z = (x + y) / (x * y);
return z;
}
- Build for normal FPU support (using the ppc_6xx target architecture):
$ export CROSS_COMPILE=ppc_6xx-
$ ppc_6xx-gcc -S -O fp_test.c
Check results:
$ cat fp_test.s
.file "fp_test.c"
.section ".text"
.align 2
.globl foo
.type foo, @function
foo:
fadd 0,1,2
fmul 1,1,2
fdiv 1,0,1
blr
.size foo, .-foo
.ident "GCC: (GNU) 4.2.2"
.section .note.GNU-stack,"",@progbits
The floating-point machine instructions ("fadd", "fmul", "fdiv"), and
the fact that the function needs no stack frame and no extra register
moves, clearly indicate that this configuration fully supports the
hardware FPU.
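As a complement to reading the assembler output, the compiler itself can
be asked which floating-point model it targets. The following is only a
minimal sketch: it assumes that the PowerPC GCC in use predefines the
macros __SPE__ (SPE code generation) and _SOFT_FLOAT (software
emulation), which should be verified for a given toolchain with
"${CROSS_COMPILE}gcc -dM -E - < /dev/null". The names fp_model() and
print_fp_model() are hypothetical helpers, not part of any toolchain.

```c
#include <assert.h>
#include <stdio.h>

/* Return a label for the floating-point model the compiler targets.
 * Assumption: the PowerPC GCC in use predefines __SPE__ and _SOFT_FLOAT;
 * if it does not, the fallback branch is taken. */
const char *fp_model(void)
{
#if defined(__SPE__)
    return "SPE";
#elif defined(_SOFT_FLOAT)
    return "soft-float";
#else
    return "hard-float (or macros not provided)";
#endif
}

/* Hypothetical helper: print the label, e.g. from a board self-test. */
void print_fp_model(void)
{
    const char *model = fp_model();

    assert(model != NULL);
    printf("FP model: %s\n", model);
}
```

Calling print_fp_model() from main() and building the program once per
CROSS_COMPILE setting gives a quick cross-check of what the assembler
listings show. The fallback branch cannot tell a real FPU apart from a
compiler that simply does not define these macros, so the listing
remains the authoritative check.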
- Build for soft-float emulation (using the ppc_8xx target architecture):
$ export CROSS_COMPILE=ppc_8xx-
$ ppc_8xx-gcc -S -O fp_test.c
$ cat fp_test.s
.file "fp_test.c"
.globl __adddf3
.globl __muldf3
.globl __divdf3
.section ".text"
.align 2
.globl foo
.type foo, @function
foo:
stwu 1,-48(1)
mflr 0
stw 24,16(1)
stw 25,20(1)
stw 26,24(1)
stw 27,28(1)
stw 28,32(1)
stw 29,36(1)
stw 0,52(1)
mr 28,3
mr 29,4
mr 26,5
mr 27,6
bl __adddf3
mr 24,3
mr 25,4
mr 3,28
mr 4,29
mr 5,26
mr 6,27
bl __muldf3
mr 5,3
mr 6,4
mr 3,24
mr 4,25
bl __divdf3
lwz 0,52(1)
mtlr 0
lwz 24,16(1)
lwz 25,20(1)
lwz 26,24(1)
lwz 27,28(1)
lwz 28,32(1)
lwz 29,36(1)
addi 1,1,48
blr
.size foo, .-foo
.ident "GCC: (GNU) 4.2.2"
.section .note.GNU-stack,"",@progbits
The compiler calls helper functions (__adddf3, __muldf3, __divdf3) and
makes heavy use of the General Purpose Registers. This clearly
indicates software-emulated FP support, and it explains why this
configuration is so much slower than a real FPU.
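To put a number on the speed difference, the test function can be timed
on the target. This is a rough sketch rather than a calibrated
benchmark: time_foo() is a hypothetical helper, it relies on the
standard clock() function being usable on the target system, and the
iteration count has to be chosen large enough for the clock resolution.

```c
#include <assert.h>
#include <time.h>

/* The function from fp_test.c. */
double foo(double x, double y)
{
    double z;
    z = (x + y) / (x * y);
    return z;
}

/* Hypothetical helper: CPU seconds spent in `iterations` calls of foo(). */
double time_foo(long iterations)
{
    /* volatile forces the operands to be re-read and the result to be
     * stored on every pass, so the loop is not optimized away. */
    volatile double x = 1.0, y = 2.0;
    volatile double sink = 0.0;
    clock_t start, end;
    long i;

    assert(iterations > 0);

    start = clock();
    for (i = 0; i < iterations; i++)
        sink = foo(x, y);
    end = clock();

    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Building the same source with each toolchain and calling time_foo()
with the same iteration count on comparable hardware shows the relative
cost of the helper-function calls.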
- Build for SPE v2 support (as needed, for example, for a P2020 QorIQ
processor, using the ppc_85xxDP target architecture):
$ export CROSS_COMPILE=ppc_85xxDP-
$ ppc_85xxDP-gcc -S -O fp_test.c
$ cat fp_test.s
.file "fp_test.c"
.section ".text"
.align 2
.globl foo
.type foo, @function
foo:
stwu 1,-48(1)
stw 3,8(1)
stw 4,12(1)
stw 5,16(1)
stw 6,20(1)
evmergelo 0,3,4
evmergelo 9,5,6
efdadd 11,0,9
efdmul 0,0,9
efddiv 11,11,0
evstdd 11,24(1)
evmergehi 9,11,11
mr 10,11
stw 9,32(1)
stw 10,36(1)
mr 3,9
mr 4,10
addi 1,1,48
blr
.size foo, .-foo
.ident "GCC: (GNU) 4.2.2"
.section .note.GNU-stack,"",@progbits
Here we can see moderate use of General Purpose Registers combined with
the use of SPE machine instructions (evmergelo, efdadd, efdmul, efddiv,
evstdd, evmergehi), which shows that the compiler really generates code
that uses the SPE.
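The manual inspection done for the three builds above can be sketched as
a small classifier that searches the generated assembler text for the
instructions and helper symbols seen in the listings. The enum fp_kind
and the functions contains_any() and classify_fp_asm() are hypothetical
names chosen for illustration. The check is a plain substring search,
so it can be fooled by, for example, a file or symbol name that happens
to contain "fadd".

```c
#include <assert.h>
#include <string.h>

enum fp_kind { FP_UNKNOWN, FP_HARD, FP_SOFT, FP_SPE };

/* Return 1 if any of the n strings in `needles` occurs in `text`. */
static int contains_any(const char *text,
                        const char *const needles[], size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strstr(text, needles[i]) != NULL)
            return 1;
    return 0;
}

/* Classify an assembler listing (the contents of fp_test.s) by the
 * markers that appear in the three listings of this section. */
enum fp_kind classify_fp_asm(const char *asm_text)
{
    static const char *const soft[] = { "__adddf3", "__muldf3", "__divdf3" };
    static const char *const spe[]  = { "efdadd", "efdmul", "efddiv" };
    static const char *const hard[] = { "fadd", "fmul", "fdiv" };

    assert(asm_text != NULL);

    /* Helper calls and SPE mnemonics are checked before the plain
     * FPU mnemonics, which are the least specific patterns. */
    if (contains_any(asm_text, soft, 3))
        return FP_SOFT;
    if (contains_any(asm_text, spe, 3))
        return FP_SPE;
    if (contains_any(asm_text, hard, 3))
        return FP_HARD;
    return FP_UNKNOWN;
}
```

Reading fp_test.s into a buffer and passing it to classify_fp_asm()
yields FP_HARD, FP_SOFT or FP_SPE for the three listings shown above.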