Martin,
On the GOC UI we also see lots of python2.2 processes.
Currently there are 100+ such processes, the earliest of which is over a
week old, dating from about the time I started using this machine as a
monitoring server.
webman 6421 0.0 0.0 40172 4 ? S Dec02 0:00 python2.2
/opt/edg/bin/edg-job-submit -c /home/webman/GPPMon/cron/config/BUDRB.conf
--config-vo /home/webman/GPPMon/cron/config-vo/BUDRB.conf
/tmp/gppmon/monitor.BUDRB.IN2P3.jdl
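For anyone wanting to check their own UI, a rough sketch of counting such processes is below. It assumes the usual 11-column `ps aux` layout (as in the line above) and that the text is already captured in a string; the function name count_processes is made up for illustration.

```python
def count_processes(ps_output, name="python2.2"):
    """Count lines of `ps aux` output whose command starts with `name`.

    Assumes the standard 11-column `ps aux` layout:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    count = 0
    for line in ps_output.splitlines():
        # Split off the first 10 columns; the remainder is the full command.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # blank or truncated line
        command = fields[10].split()
        if command and command[0] == name:
            count += 1
    return count
```

The text to feed in could come from something like `ps aux > /tmp/ps.txt` on the machine in question.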
The monitoring server, if you recall, submits jobs to all sites from each
RB. The monitoring scripts run once an hour and submit 4 RBs * 28 CEs = 112
jobs each time, so the submission rate is very high. Perhaps this puts too
much load on the system? I'll add a sleep command to the scripts to reduce
the submission rate and report back.
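A minimal sketch of the kind of throttling I have in mind is below. This is not the actual GPPMon script; the function name submit_throttled and the 30-second delay are only illustrative assumptions. With 112 submissions per hourly run, a 30-second gap would add 111 * 30 = 3330 seconds of waiting, so the run would still fit inside the hour only if the submissions themselves are quick.

```python
import subprocess
import time


def submit_throttled(commands, delay_seconds=30.0,
                     run=subprocess.call, sleep=time.sleep):
    """Run each submission command in turn, sleeping between them.

    `commands` is a list of argument lists, e.g. one edg-job-submit
    invocation per RB/CE pair. `run` and `sleep` can be swapped out
    for testing. Returns the exit code of each command, in order.
    """
    exit_codes = []
    for index, command in enumerate(commands):
        if index > 0:
            # Pause before every submission except the first.
            sleep(delay_seconds)
        exit_codes.append(run(command))
    return exit_codes
```

Because each command is waited on before the next starts, this also stops the submissions from piling up in parallel.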
For now, lcgui01 should not suffer from this problem, as the monitoring is
no longer running on this machine.
Cheers, Dave.
-----Original Message-----
From: LHC Computer Grid - Rollout
[mailto:[log in to unmask]]On Behalf Of Martin Bly
Sent: 11 December 2003 09:23
To: [log in to unmask]
Subject: [LCG-ROLLOUT] RAL UI rebooted twice in two days - out of memory
We have now had three incidents of the RAL UI becoming frozen due to being
out of memory - twice in two days this week, and once last week. The only
solution is a reboot.
I don't have a handle on what is causing this except that all the processes
killed as a result are running python2.2. I will attempt to find out
more...
Martin.