On Tue, Jul 19, 2005 at 04:17:22PM +0100, Greig A Cowan wrote:
> Hi everyone,
>
> We are currently involved in the file transfers from RAL. However, we have
> been having trouble with our pool node in that all the CPU (8*1.9 GHz)
> and memory (physical RAM is 32 GB) resources have been quickly used up,
> grinding the machine to a halt. This has prevented us from accepting
> files.
>
> When Steve Thorn (NeSC) analysed the machine, it appears that dCache was
> spawning java processes:
>
> 1195 ? S 0:00 /bin/sh /opt/d-cache/jobs/pool -pool=dcache
> -logfile=
> 1197 ? S 0:00 \_ /usr/java/j2sdk1.4.2_08/bin/java -server
> -Xmx256m
> 1200 ? S 9:55 \_ /usr/java/j2sdk1.4.2_08/bin/java
> -server -Xmx
> 1201 ? S 0:57 \_ /usr/java/j2sdk1.4.2_08/bin/java
> -server
> 1202 ? S 0:00 \_ /usr/java/j2sdk1.4.2_08/bin/java
> -server
> 1203 ? S 0:00 \_ /usr/java/j2sdk1.4.2_08/bin/java
> -server
> 1204 ? S 0:00 \_ /usr/java/j2sdk1.4.2_08/bin/java
> -server
> ...
>
> There were ~200 each using 57 MB RAM. At one point, the total RAM used was
> 31 GB. At the moment, dcache services have been stopped on the pool node
> and after a reboot the machine appears to have returned to normal. Has
> anyone seen/heard of this before?
I can see around 400 threads from the two java processes on one of our pool
nodes. Total memory in use is ~380MB for both of them (~100MB for the pool,
~180MB for gridftp). Are you sure that the problem was caused by low memory?
Threads share all their data, so it seems more likely to me that the whole
process was only using 57MB in total ;P
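
To illustrate the point about shared memory: some older Linux thread
implementations make ps list every thread as its own entry, each showing the
memory of the whole process, so adding the entries up overcounts. Below is a
rough sketch of the arithmetic, assuming the ~200 entries were threads of a
single JVM. The rows are made-up sample data, and the grouping key
(owner_pid) is a hypothetical field, not something taken from the listing
above.

```python
# Sketch: compare naive per-entry summing with per-process accounting.
# Each row is (entry_pid, owner_pid, rss_kb). owner_pid is a hypothetical
# field identifying the process that a thread entry belongs to.

def naive_rss_kb(rows):
    """Sum the reported RSS of every entry, counting shared memory repeatedly."""
    return sum(rss_kb for _, _, rss_kb in rows)


def per_process_rss_kb(rows):
    """Count each owning process once, using the largest RSS seen for it."""
    per_process = {}
    for _, owner_pid, rss_kb in rows:
        per_process[owner_pid] = max(per_process.get(owner_pid, 0), rss_kb)
    return sum(per_process.values())


# Made-up sample: 200 thread entries of one process, each reported at 57 MB.
sample_rows = [(1200 + i, 1200, 57 * 1024) for i in range(200)]

naive_mb = naive_rss_kb(sample_rows) // 1024
actual_mb = per_process_rss_kb(sample_rows) // 1024
print("naive sum: %d MB, per-process sum: %d MB" % (naive_mb, actual_mb))
```

With this sample the naive sum is 11400 MB while the per-process figure is
57 MB, which is why the per-entry numbers alone do not show that memory ran
out.
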
On our pool node there are also 165 connections from csfnfs*.rl.ac.uk, and the
disk spends most of its time seeking instead of doing something useful
(writing), which causes a huge load.
Cheers,
Kostas