> There may be other causes. Some questions:
>
> - Is the problem intermittent _within_ affected jobs?
Not that I know of, and I don't think the job retries after a timeout
(it's not my code though, so I can't be sure).
> - Are different users affected?
Yes, but they all seem to be coming through PanDA pilots (so the setup etc.
should be standard).
> - Does the problem only affect particular job _types_?
Hard for me to say, but from talking with ATLAS people (well, Rod Walker)
I don't think so.
It also doesn't seem to be related to a specific ATLAS software version.
I think your original hints towards load on the nodes or network sound
like a good place to start.
cheers
johnk
--
+------------------------------------------------------------+
|Dr. John Alan Kennedy Rechenzentrum Garching (RZG) |
|Mail: [log in to unmask] Boltzmannstrasse 2 |
|Phone: +49 89 3299 2694 85748 Garching |
|Fax: +49 89 3299 1301 |
+------------------------------------------------------------+