Hi Andreas,
despite the obvious problems with SFTs that we all see from time to time (you
can get pretty annoyed when you work hard to eliminate all problems and then
see failures appearing out of thin air), most of the time SFTs work just fine
and are a very useful tool for debugging, either of sites or central services.
We can take a more proactive role in suggesting improvements to the existing
tests, ideas for new types of tests etc. - that way we will all profit...
Best regards, Antun
-----
Antun Balaz
Research Assistant
E-mail: [log in to unmask]
Web: http://scl.phy.bg.ac.yu/
Phone: +381 11 3160260, Ext. 152
Fax: +381 11 3162190
Scientific Computing Laboratory
Institute of Physics, Belgrade
Serbia and Montenegro
-----
---------- Original Message -----------
From: Andreas Haupt <[log in to unmask]>
To: [log in to unmask]
Sent: Fri, 19 May 2006 11:29:28 +0200
Subject: [LCG-ROLLOUT] SFTs again
> Hi everybody,
>
> I think it's time to talk about the sense (or better nonsense) of
> official SFTs. Since yesterday two main problems have been occurring
> (and no one really seems to care). The first is that most of the sites
> are in critical state as the central CERN SE reached its quota limit:
>
> lcg-rep -v --vo dteam -d lxn1183.cern.ch
> lfn:sft-lcg-rm-cr-globe30.ifh.de.0605190816
>
> 0 bytes 0.00 KB/sec avg 0.00 KB/sec inst
> 0 bytes 0.00 KB/sec avg 0.00 KB/sec inst
> the server sent an error response: 553 553 /storage/dteam/generated/2006-05-19/file6555d706-1b02-410b-b7f2-7821c7f0bd41: No space left on device.
>
> lcg_rep: No such file or directory
>
> So there's actually nothing wrong with the site itself - it's blamed
> for this error, though. If it is not possible to distinguish between
> real site problems and central problems (and I see it is not that
> simple) this test simply makes no sense.
>
> Another problem occurred yesterday as well. Some (not really many)
> CEs constantly suffer from JS problems (including globe-ce1.ifh.de).
> In the STDERR output of a failed job I can see this error message:
>
> submit-helper script running on host heliade9 gave error:
> cache_export_dir
> (/usr1/localhome/dteamsm/.lcgjm/globus-cache-export.FP3859) on
> gatekeeper did not contain a cache_export_dir.tar archive
>
> As internal SFTs and other test jobs work without any problems I can
> only come to the conclusion that the real error must be situated at
> the RB the SFT uses.
>
> There would be nothing wrong with this if this "random failure
> generator" called SFT were taken as what it actually is: a toy.
> Unfortunately the SFT results are still taken into consideration
> when submitting production jobs to our site (at least ATLAS does it).
>
> As long as situations like these two cannot be avoided I do not
> think SFTs are useful at all. The damage they cause outweighs
> their benefit.
>
> Greetings
> Andreas
>
> --
> | Andreas Haupt | E-Mail: [log in to unmask]
> | DESY Zeuthen | WWW: http://www-zeuthen.desy.de/~ahaupt
> | Platanenallee 6 | Phone: +49/33762/7-7359
> | D-15738 Zeuthen | Fax: +49/33762/7-7216
------- End of Original Message -------