Comparing Linux 2.4 and Linux 2.6 Kernels:
Many customers ask for support of Linux Kernel
version 2.6 for their embedded projects.
This is easy enough to understand: after all, 2.6 is the latest
and greatest version of the Linux kernel, and who wants
to base a new project on old stuff?
They are surprised when we tell them that we do not recommend using
Linux 2.6 for most embedded systems.
Here is a comparison of 2.4 and 2.6 on real embedded hardware
that explains why:
Build and Test Environment, Kernel Versions and Test Hardware:
The ELDK 3.1 toolchain was used to build the Linux kernels and
to provide the necessary test environment (NFS root filesystem, ramdisk images).
The following versions of the Linux Kernel were used in the tests:
The following hardware was used for the tests:
- Sandpoint 8240 (MPC8240 at 132 MHz CPU clock, 64 MB RAM)
- TQM860L (MPC860-T at 50 MHz CPU clock, 16 MB RAM)
Building the Kernels:
2.4.25:
2.6.11.7:
Results:
Sandpoint | 2.4.25 | 2.6.11.7 | Delta |
real | 77.7s | 101.6s | + 31 % |
user | 268.2s | 348.6s | + 30 % |
sys | 28.3s | 39.4s | + 39 % |
Compressed Kernel | 887.9kB | 1123.5kB | + 27 % |
Uncompressed kernel | 2086.4kB | 2513.6kB | + 20 % |
TQM860L | 2.4.25 | 2.6.11.7 | Delta |
real | 50.3s | 72.2s | + 43 % |
user | 175.3s | 251.5s | + 43 % |
sys | 19.6s | 29.4s | + 50 % |
Compressed Kernel | 495.8kB | 706.8kB | + 43 % |
Uncompressed kernel | 1189.6kB | 1602.8kB | + 35 % |
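The Delta column is simply the relative change from 2.4.25 to 2.6.11.7. As a minimal sketch (the helper name `percent_delta` is ours, not part of the test setup), the Sandpoint figures from the table above can be re-checked like this:

```python
def percent_delta(old, new):
    """Return the relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Sandpoint values copied from the table above: (2.4.25, 2.6.11.7).
sandpoint = {
    "real [s]": (77.7, 101.6),
    "user [s]": (268.2, 348.6),
    "sys [s]": (28.3, 39.4),
    "compressed [kB]": (887.9, 1123.5),
    "uncompressed [kB]": (2086.4, 2513.6),
}

for name, (v24, v26) in sandpoint.items():
    print(f"{name:18s} {percent_delta(v24, v26):+6.1f} %")
```

Values in the tables are rounded, so a recomputed delta may differ from the published one by a percentage point.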
Booting Linux:
Both kernel images were booted
in the same environment, with the
root filesystem mounted over NFS on an otherwise idle network.
The timings were recorded in the following log files:
Results:
Time from start of loading the kernel image into RAM ... | 2.4.25 | 2.6.11.7 | Delta |
... until "Linux version" message |
TQM860L | 1.9s | 2.0s | + 6 % |
Sandpoint | 13.0s | 15.0s | + 15 % |
... until "Freeing unused kernel memory" message (= enter user space) |
TQM860L | 3.7s | 3.8s | + 1 % |
Sandpoint | 23.0s | 22.0s | - 4 % |
... until "login:" message (= full multi-user mode) |
TQM860L | 49.7s | 58.0s | + 17 % |
Sandpoint | 56.2s | 59.6s | + 6 % |
The slight advantage for 2.6 on the Sandpoint when entering user space
is at least partially due to the fact
that the configurations were slightly different:
in 2.4, the SCSI driver was enabled,
which was not the case in 2.6.
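For readers who want to repeat the measurement, the three milestones can be pulled out of a timestamped console log. The sketch below is only an illustration: it assumes the log has already been turned into (seconds, text) pairs, which is our assumption and not necessarily the format of the log files used here. The names `time_until` and `boot_milestones` are hypothetical.

```python
# Messages used as milestones in the table above.
MARKERS = ("Linux version", "Freeing unused kernel memory", "login:")


def time_until(log, marker, start=0.0):
    """Return seconds from start to the first log line containing marker.

    log is an iterable of (timestamp_seconds, text) pairs in time order
    (an assumed format). Returns None if the marker never appears.
    """
    for timestamp, text in log:
        if marker in text:
            return timestamp - start
    return None


def boot_milestones(log, start=0.0):
    """Map each milestone message to its elapsed time since start."""
    log = list(log)  # allow several passes over a one-shot iterator
    return {marker: time_until(log, marker, start) for marker in MARKERS}
```

Here `start` would be the timestamp at which loading of the kernel image into RAM begins.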
Running with ramdisk based root file system:
The systems were booted using the ramdisk images provided by the ELDK.
The main purpose was to find out how much memory remained free for application code. A simple memory allocator (mem_eat.c) was used to test for free (= allocatable by an application process) memory.
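The source of mem_eat.c is not reproduced here. The idea can be sketched as follows, in Python rather than C and purely as an illustration: allocate fixed-size blocks, touch every page so the kernel has to back it with real RAM, and keep a running total. The chunk size, the page size and the `limit` safety cap are assumptions of this sketch, not properties of the original tool.

```python
def eat_memory(chunk_size=64 * 1024, limit=None, page_size=4096):
    """Allocate chunk_size-byte blocks until allocation fails or limit is hit.

    Returns the total number of bytes successfully allocated.
    """
    chunks = []  # keep references so nothing is freed early
    total = 0
    while limit is None or total + chunk_size <= limit:
        try:
            chunk = bytearray(chunk_size)
        except MemoryError:
            break
        # Touch every page so it is really backed by RAM.
        for offset in range(0, chunk_size, page_size):
            chunk[offset] = 1
        chunks.append(chunk)
        total += chunk_size
    return total


def report(total):
    """Format the total like the 'Got total' lines in the logs."""
    return f"Got total = {total} = {total // 1024}kB"


if __name__ == "__main__":
    # limit is a safety cap added for this sketch (hypothetical parameter).
    print(report(eat_memory(limit=8 * 1024 * 1024)))
```

Without a cap, such a program does not necessarily end with a failed allocation: in the logs below, the run ends when the kernel kills the process.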
2.4.25:
- Sandpoint Board:
- TQM860L Board:
...
Memory: 13396k available (892k kernel code, 252k data, 52k init, 0k highmem)
...
# free
        total   used   free   shared   buffers
Mem:    15032   7112   7920        0       108
Swap:       0      0      0
Total:  15032   7112   7920
# ./mem_eat
...
Got total = 9240576 = 9024kB
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
VM: killing process mem_eat
2.6.11.7:
- Sandpoint Board:
- TQM860L Board:
...
Memory: 12980k available (1240k kernel code, 288k data, 88k init, 0k highmem)
...
# free
        total   used   free   shared   buffers
Mem:    14676   6572   8104        0      4092
Swap:       0      0      0
Total:  14676   6572   8104
# ./mem_eat
...
Got total = 8519680 = 8320kB
oom-killer: gfp_mask=0xd2
...
Results:
TQM860L | 2.4.25 | 2.6.11.7 | Delta |
Kernel Code + Data | 1144kB | 1528kB | + 34 % |
total "free" | 15032kB | 14676kB | - 2 % = - 356kB |
Available for malloc() | 9024kB | 8320kB | - 8 % = - 704kB |
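The "Kernel Code + Data" row is the sum of the code and data figures from the kernel's "Memory:" boot message. A small sketch of that arithmetic, using the two TQM860L lines quoted above (the function name `kernel_footprint` is ours):

```python
import re

# Matches the start of the kernel's "Memory:" boot message quoted above.
MEMORY_LINE = re.compile(
    r"Memory: (\d+)k available \((\d+)k kernel code, (\d+)k data, (\d+)k init"
)


def kernel_footprint(line):
    """Extract the sizes (in kB) from a kernel 'Memory:' boot line."""
    match = MEMORY_LINE.search(line)
    if match is None:
        raise ValueError("not a kernel 'Memory:' line")
    available, code, data, init = map(int, match.groups())
    return {
        "available": available,
        "code": code,
        "data": data,
        "init": init,
        "code_plus_data": code + data,
    }


line_24 = "Memory: 13396k available (892k kernel code, 252k data, 52k init, 0k highmem)"
line_26 = "Memory: 12980k available (1240k kernel code, 288k data, 88k init, 0k highmem)"

for version, line in (("2.4.25", line_24), ("2.6.11.7", line_26)):
    print(version, kernel_footprint(line)["code_plus_data"], "kB")
```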
Running the lmbench benchmark:
To get more extensive performance data, the
lmbench
benchmark (version 3.0-a4) was run on the systems.
The results give a pretty clear
and somewhat disappointing picture:
- Even in tests that should not depend on the OS environment
at all (like basic integer or float operations), the 2.6 kernel is
1...3% slower than the 2.4 kernel. This is probably caused by the
bigger kernel size which, especially in combination with the small
caches on embedded processors, results in some loss of performance.
- Context switching, which is essential for the efficient operation
of a multi-tasking OS, is
on average 55% slower (range: 10...94%).
- Local communication latencies are significantly higher, too:
80% for pipes,
79% for UNIX domain sockets,
34% for UDP,
12% for RPC over UDP,
25% for TCP,
12% for RPC over TCP,
and 25% for TCP connections.
- File system and VM system latencies go up, too, in some cases dramatically:
76% for file creation,
17% for file deletion,
32% for the mmap() latency,
6% for page faults,
and 2% for select() on 100 file descriptors.
- Local communication bandwidth suffers badly, too,
but the results differ depending on the processor used:
- Pipes become slightly (3%) faster on the MPC860, but reach only
49% of the 2.4 throughput on the MPC8240
- UNIX domain sockets are 11% slower on the MPC8240,
but run at the same speed on the MPC860
- TCP is 2% faster on the MPC8240, but 31% slower on the MPC860
- File re-reads drop by 6%
Note that some tests did not run under 2.6 on the MPC860 system because
the memory was exhausted (the test system had only 16 MB RAM).
Management Summary:
Using the 2.6 kernel on embedded systems has the following disadvantages:
- Slow to build:
2.6 takes 30...40% longer to compile
- Big memory footprint in flash:
the 2.6 compressed kernel image is 30...40% bigger
- Big memory footprint in RAM:
the 2.6 kernel needs 30...40% more RAM;
the available RAM size for applications is 700kB smaller
- Slow to boot:
2.6 takes 5...15% longer to boot into multi-user mode
- Slow to run: context switches up to 94% slower,
local communication latencies up to 80% slower,
file system latencies up to 76% slower,
local communication bandwidth less than 50% in some cases.
There may be a few cases where the use of Linux kernel version 2.6
makes sense even for embedded systems
(typically on "bigger" systems with more powerful processors),
but if memory footprint or system performance are important
you probably want to stick with a 2.4 kernel for now.
--
WolfgangDenk - 24 Apr 2005