Hi,

We are having trouble with our latest series of worker nodes (WNs):

Sometimes we see log messages like "kernel: Remounting
filesystem read-only". A subsequent filesystem check on the
affected filesystem fails.

At first we assumed a hardware problem and upgraded the BIOS
on all WNs. However, the problems still occur.

The disks don't report any SMART failures.

In order to demonstrate this problem to the vendor of the
boxes, we tried to generate heavy IO load, using commands
like "dd if=/dev/zero of=/tmp/100gb bs=50000 count=2000000".
Once this command is running, the "cached" memory size
rises until it occupies the whole RAM. The OOM killer then
often kills one or more processes.
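
To document the memory behaviour during the dd test above, the
page cache can be sampled in a second terminal. The following is
only a minimal sketch, assuming a Linux /proc/meminfo with the
usual "MemFree", "Cached" and "Dirty" fields; the function names
parse_meminfo and watch, the 5 second interval and the sample
count are illustrative choices, not something we have settled on.

```python
import time


def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to integer values.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are
    plain counts. Lines that do not look like "Name: number" are skipped.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def watch(interval=5, samples=12, path="/proc/meminfo"):
    """Print MemFree, Cached and Dirty every `interval` seconds.

    `interval`, `samples` and `path` are hypothetical defaults chosen
    for illustration only.
    """
    for _ in range(samples):
        with open(path) as f:
            info = parse_meminfo(f.read())
        print("%s MemFree=%s kB Cached=%s kB Dirty=%s kB" % (
            time.strftime("%H:%M:%S"),
            info.get("MemFree"),
            info.get("Cached"),
            info.get("Dirty"),
        ))
        time.sleep(interval)
```

Calling watch() while dd is running gives a timestamped record
that can be set next to the kernel log entries of the OOM killer.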

This is possibly a kernel bug. Have other sites encountered
the same problem?

Upgrading the WNs to SL4 x86_64 could be a solution. Are
other sites already running 64-bit WNs?

Thanks in advance!

Best regards,
Manfred Alef