
14.1.10. How can I check if Floating Point support is working?

The floating point performance of my P2020 QorIQ processor is really poor. I am not using the ELDK, but a tool chain from FOOBAR. Can this be a problem? What can I do to verify this?

The P20xx QorIQ processors use an e500v2 core, which does not include a classic Floating Point Unit (FPU) but a Signal Processing Engine (SPE Version 2) instead. You can run FP calculations on the SPE, but it has no dedicated FP registers like a classic FPU, so General Purpose Registers must be used to hold and pass FP operands. While this is still much faster than pure soft-float emulation, it lacks the speed and other advantages of a full-blown, separate standard FPU with its own FP register set.

Your tool chain must also be aware of the SPE and include support for it. Special compiler options may be needed; check your tool chain's documentation.

With the ELDK, the required settings are preset automatically when you choose the correct target architecture packages; cf. 3.4. Supported Target Architectures.

To test what your tool chain is doing, it is best to compile a small test program and check the generated code. The following examples were done with ELDK 4.2 for Power Architecture targets:
  1. Test Program:
    $ cat fp_test.c
    double foo (double x, double y)
    {
            double z;
            z = (x + y) / (x * y);
            return z;
    }
  2. Build for normal FPU support (using the ppc_6xx target architecture):
    $ export CROSS_COMPILE=ppc_6xx-
    $ ppc_6xx-gcc -S -O fp_test.c
    Check results:
    $ cat fp_test.s
            .file   "fp_test.c"
            .section        ".text"
            .align 2
            .globl foo
            .type   foo, @function
            fadd 0,1,2
            fmul 1,1,2
            fdiv 1,0,1
            .size   foo, .-foo
            .ident  "GCC: (GNU) 4.2.2"
            .section        .note.GNU-stack,"",@progbits
    The use of floating point machine instructions ("fadd", "fmul", "fdiv") and the fact that no additional register use is needed is a clear indication that full support for the hardware FPU is available in this configuration.
  3. Build for soft-float emulation (using the ppc_8xx target architecture):
    $ export CROSS_COMPILE=ppc_8xx-
    $ ppc_8xx-gcc -S -O fp_test.c 
    $ cat fp_test.s
            .file   "fp_test.c"
            .globl __adddf3
            .globl __muldf3
            .globl __divdf3
            .section        ".text"
            .align 2
            .globl foo
            .type   foo, @function
            stwu 1,-48(1)
            mflr 0
            stw 24,16(1)
            stw 25,20(1)
            stw 26,24(1)
            stw 27,28(1)
            stw 28,32(1)
            stw 29,36(1)
            stw 0,52(1)
            mr 28,3
            mr 29,4
            mr 26,5
            mr 27,6
            bl __adddf3
            mr 24,3
            mr 25,4
            mr 3,28
            mr 4,29
            mr 5,26
            mr 6,27
            bl __muldf3
            mr 5,3
            mr 6,4
            mr 3,24
            mr 4,25
            bl __divdf3
            lwz 0,52(1)
            mtlr 0
            lwz 24,16(1)
            lwz 25,20(1)
            lwz 26,24(1)
            lwz 27,28(1)
            lwz 28,32(1)
            lwz 29,36(1)
            addi 1,1,48
            .size   foo, .-foo
            .ident  "GCC: (GNU) 4.2.2"
            .section        .note.GNU-stack,"",@progbits
    The fact that the compiler calls helper functions (__adddf3, __muldf3, __divdf3), combined with heavy use of the General Purpose Registers, is a clear indication of software-emulated FP support, and explains why this is so slow compared to a real FPU.
  4. Build for SPE v2 support (as needed for example for a P2020 QorIQ processor, using the ppc_85xxDP target architecture):
    $ export CROSS_COMPILE=ppc_85xxDP-
    $ ppc_85xxDP-gcc -S -O fp_test.c 
    $ cat fp_test.s
            .file   "fp_test.c"
            .section        ".text"
            .align 2
            .globl foo
            .type   foo, @function
            stwu 1,-48(1)
            stw 3,8(1)
            stw 4,12(1)
            stw 5,16(1)
            stw 6,20(1)
            evmergelo 0,3,4
            evmergelo 9,5,6
            efdadd 11,0,9
            efdmul 0,0,9
            efddiv 11,11,0
            evstdd 11,24(1)
            evmergehi 9,11,11
            mr 10,11
            stw 9,32(1)
            stw 10,36(1)
            mr 3,9
            mr 4,10
            addi 1,1,48
            .size   foo, .-foo
            .ident  "GCC: (GNU) 4.2.2"
            .section        .note.GNU-stack,"",@progbits
    Here we can see moderate use of General Purpose Registers combined with the use of SPE machine instructions (evmergelo, efdadd, efdmul, efddiv, evstdd, evmergehi), which proves that the compiler really generates code that supports the SPE.