Hi Rodney,
I will ask the SAM administrators to look into this issue and will
report back to you.
Tiziana
On 20/11/2012 09:39, Rodney Walker wrote:
> Hi,
> At LRZ there are 30 ops jobs running, the oldest for 15hrs. The process
> is nagrun (which I list fully below). Tracing the python process, I see
> repeated timeouts on connection to nympha1-2.zcu.cz:
>
> connect(4, {sa_family=AF_INET, sin_port=htons(6163),
> sin_addr=inet_addr("147.228.240.74")}, 16) = -1 ETIMEDOUT (Connection
> timed out)
>
> Anyone know what this host is, and why it is so important to connect to
> it that it occupies cpu slots for so long?
>
> Cheers,
> Rod.
>
> ops001 3880 3877 0 06:20 ? 00:00:00 /bin/sh ./nagrun.sh -v
> ops -f /ops/NGI/Germany -d
> /queue/grid.probe.metricOutput.EGEE.rocmon-fzk_gridka_de -n PROD -t 600
> -w 1 -l prod-lfc-shared-central.cern.ch
> -s fornax-se2.itwm.fhg.de,cmssrm-fzk.gridka.de,lcg-se1.ifh.de
> ops001 3903 3880 0 06:20 ? 00:00:01 python
> /home/grid/lcg/home/ops001/home_crem3_055458426/CREAM055458426/nagios/bin/mta-simple
> --dirq /tmp/sam.3880.23222/msg-outgoing --destination
> /queue/grid.probe.metricOutput.EGEE.rocmon-fzk_gridka_de
> --broker-network PROD --pidfiledir
> /home/grid/lcg/home/ops001/home_crem3_055458426/CREAM055458426/nagios/var/
> -v info --bdii-uri lcg-bdii.cern.ch:2170
>
>
> --
> Tel. +49 89 289 14152
--
Tiziana Ferrari
EGI.eu Operations
0031 (0)6 3037.2691