DENX . PPCEmbedded . Performance
ICTRL to 0x7).
To get maximum performance, you need to enable the copyback data cache. It can
be disabled in order to make the standard Linux/PPC libraries work without
recompiling. If you build your own glibc as described under RuntimeLibrary, you
can enable copyback. Look for a make config option, or grep for DC_SFWT in
arch/ppc/kernel/head.S and change the #if 0 to #if 1.
/proc/profile is a standard kernel feature which provides simple kernel
profiling based on Instruction Pointer sampling in the periodic timer interrupt
routine. It's simplistic but effective, and low overhead since the interrupt is
going to happen anyway. The data is processed with readprofile, which looks up
the System.map to show which kernel functions are using the most CPU time. It
doesn't work for modules yet, so at present you need to compile them in for
profiling.
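
The lookup that readprofile performs against System.map can be sketched roughly as follows. This is not readprofile's actual code. The function names, the sample System.map lines and the sampled addresses are all invented for illustration, and the sketch assumes the usual "address type name" layout of System.map lines.

```python
from bisect import bisect_right
from collections import Counter


def parse_system_map(text):
    """Return a sorted list of (address, name) pairs from System.map text."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip anything that is not "address type name"
        addr, _sym_type, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def attribute_hits(symbols, hits):
    """Charge each sampled address to the nearest symbol at or below it."""
    starts = [addr for addr, _name in symbols]
    totals = Counter()
    for pc, count in hits.items():
        i = bisect_right(starts, pc) - 1
        if i >= 0:  # ignore samples below the first known symbol
            totals[symbols[i][1]] += count
    return totals


# Made-up symbol table and sample counts, for illustration only.
SYSTEM_MAP = """\
c0001000 T func_a
c0001040 T func_b
c0001100 T func_c
"""
hits = {0xC0001004: 3, 0xC000103C: 2, 0xC0001080: 7}

totals = attribute_hits(parse_system_map(SYSTEM_MAP), hits)
for name, ticks in totals.most_common():
    print(ticks, name)
```

Sorting the symbols once and using a binary search keeps the lookup cheap even for a full kernel symbol table.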
You need to enable this at boot time by passing profile=2 on the command
line. The number gives the power of 2 granularity used for the counters -- 2
will give you a separate counter for each PowerPC instruction (each 4 bytes).
Higher numbers consume less memory and give less precise results. The data from
/proc/profile will be in target byte order, so if you're cross-developing you
may need to either byte swap it, or compile readprofile to run on your
target.
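
The granularity and byte-order points can be illustrated with a small host-side sketch. It assumes that the counters are 32-bit unsigned integers and that the target is big-endian, as is typical for PowerPC. The helper names, the text start address and the sample bytes are made up for illustration, and the real layout of /proc/profile should be checked against the readprofile source for your kernel.

```python
import struct


def bucket_index(pc, text_start, shift):
    """Index of the counter covering address pc when booted with profile=<shift>."""
    return (pc - text_start) >> shift


def decode_counters(raw, target_byteorder=">"):
    """Decode raw bytes as 32-bit unsigned counters in the target's byte order.

    ">" means big-endian (assumed here for PowerPC), "<" means little-endian.
    """
    count = len(raw) // 4
    return list(struct.unpack(f"{target_byteorder}{count}I", raw[:count * 4]))


# Hypothetical start of kernel text, for illustration only.
TEXT_START = 0xC0000000

# With profile=2 each 4-byte instruction has its own counter ...
assert bucket_index(TEXT_START + 8, TEXT_START, 2) == 2
# ... while with profile=4 sixteen bytes (four instructions) share one.
assert bucket_index(TEXT_START + 8, TEXT_START, 4) == 0

# Three made-up counters as they would appear on a big-endian target.
raw = bytes.fromhex("00000001" "00000002" "00000102")
counters = decode_counters(raw)
print(counters)  # [1, 2, 258]
```

Giving the byte order explicitly to struct lets the same script run unchanged on a little-endian development host.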
The PowerPC branch of the Linux kernel has been slow to implement the
Instruction Pointer sampling function necessary to generate the /proc/profile
data. If it isn't implemented in your kernel, you'll see that readprofile
always shows zero time for every kernel function.
gprof are available.
gprof profiling available for the kernel. However, it hasn't been ported to the
PowerPC architecture yet.
The -Os option is likely to provide both the smallest code size and best
performance, because it inhibits loop unrolling optimisation, which tends to
have a negative effect on embedded processors with relatively small cache
sizes. Furthermore, PowerPC processors can speculatively execute branches
overlapped with other loop instructions, making the branch effectively execute
in zero cycles, so loop unrolling is unnecessary in many circumstances.
----- Revision r1.3 - 18 Sep 2003 - 15:23 - DetlevZundel