Hi Massimo,
I can see that there are 79 lhcb jobs running at Legnaro. Of these,
18 are actually "Running" the Monte Carlo Generation and the rest
are "Waiting for Data Transfer", most of them since early yesterday
morning.
All data transfer operations are supposed to be wrapped with a
timeout to avoid these locks, but I cannot guarantee that there are no
errors in the logic. Can you please check a couple of these stuck jobs
to see what is currently being executed?
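
To make clear what "wrapped with a timeout" is meant to achieve, here is
a minimal sketch in Python. It is not the actual production wrapper; the
function name, the retry count and the pause are assumptions made only
for illustration. The idea is that a transfer command that hangs is
killed after a fixed time and then retried.

```python
import subprocess
import sys
import time


def run_with_timeout(cmd, timeout_s, retries=2, pause_s=0.0):
    """Run cmd, killing it if it exceeds timeout_s; retry on failure.

    Hypothetical helper, not the real wrapper. Returns True as soon as
    one attempt exits with status 0, otherwise False.
    """
    for attempt in range(1 + retries):
        try:
            # subprocess.run kills the child process and raises
            # TimeoutExpired when timeout_s is exceeded.
            result = subprocess.run(cmd, timeout=timeout_s)
            if result.returncode == 0:
                return True
        except subprocess.TimeoutExpired:
            pass  # a hang is treated like any other failed attempt
        if attempt < retries:
            time.sleep(pause_s)
    return False


if __name__ == "__main__":
    # Stand-in for a transfer command: a child that sleeps too long.
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(hung, timeout_s=1, retries=1))
```

Note that in this sketch only the direct child is killed. If the
transfer tool starts further processes of its own, they can survive the
timeout, which is one way a wrapper like this could still leave a job
blocked.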
Running "netstat" on the node could give some more information about
possible reasons for the hangs.
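
Alongside netstat, a quick way to see what has been running for too long
on a WN is to filter the process list by elapsed time. The sketch below
assumes a Linux "ps" that supports "-eo pid,etime,args"; the function
names, the search pattern and the age threshold are hypothetical and
should be adapted to whatever the transfer processes are actually
called on the node.

```python
import subprocess


def etime_to_seconds(etime):
    """Convert a ps elapsed-time value ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours with zero
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_old_processes(ps_output, pattern, min_age_s):
    """Return (pid, age_s, command) for matching, long-running processes."""
    hits = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        pid, etime, args = fields
        age = etime_to_seconds(etime)
        if pattern in args and age >= min_age_s:
            hits.append((int(pid), age, args))
    return hits


def scan_host(pattern, min_age_s):
    """Run ps on the local host and filter its output (assumes Linux ps)."""
    out = subprocess.run(["ps", "-eo", "pid,etime,args"],
                         capture_output=True, text=True, check=True).stdout
    return find_old_processes(out, pattern, min_age_s)
```

For example, scan_host("gridftp", 6 * 3600) would list processes whose
command line contains "gridftp" and which have been alive for more than
six hours. The sketch only reports them and does not kill anything.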
Some of the affected WNs are:
cmsfarm-01-14.lnl.infn.it
cmsfarmbl01.lnl.infn.it
cmsfarm-01-03.lnl.infn.it
If you could do these checks and let me know the result, it would
help us to avoid the problem in the future.
If you see stuck gridftp processes, please kill them; the
wrappers will retry.
Regards
Ricardo
PS: the jobs will eventually be killed by proxy expiration, but this will
only happen in about 2 days from now.
===============================================================================
Ricardo Graciani Diaz
Dept. Estructura i Constituents de la Materia
Facultat de Fisica Tel: +34 93 403 7062
Universitat de Barcelona Fax: +34 93 402 1198
Diagonal, 647
E-08028 Barcelona
===============================================================================
> -----Original Message-----
> From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
> On behalf of Massimo Biasotto
> Sent: Friday, 17 December 2004 12:33
> To: [log in to unmask]
> Subject: [LCG-ROLLOUT] lhcb jobs stalled
>
> Hi,
>
> here in Legnaro we have 80 lhcb jobs running which have not
> been using any cpu time for the last couple of days, they are
> just waiting for some data transfer to happen. Maybe this is
> related to the castorgrid problem at cern (has it been solved
> now?).
> Anyway, I'm running a local production here, and I don't like
> having all these slots filled up by jobs which are just sitting
> there doing nothing. Is there any chance to see some improvement
> soon? As time goes by, I'm getting more and more tempted to
> issue a massive bkill...
>
> Cheers,
> Massimo Biasotto
>
> --
> Massimo Biasotto phone: +39 049 8068383
> INFN - Lab. Naz. di Legnaro fax: +39 049 641925
> Viale dell'Universita' 2 email: [log in to unmask]
> I-35020 Legnaro (Padova)