
Christopher J. Walker wrote:
> QMUL seems to be having problems with the CE-sft-lcg-rm-cr test.
> 
> The offending piece of the output seems to be:
> 
> [SE][Mkdir] httpg://se03.esc.qmul.ac.uk:8444/srm/managerv2: CGSI-gSOAP: 
> Error reading token data header: Connection closed
> 
> We seem to have significantly more failures with ce01 than with ce03,
> even though both should send jobs to the same set of worker nodes.
> 
> I've tried running the CRL update by hand a few times and still see
> the problems, so I have now set it to update every hour.
> 
> What I don't understand is why ce01 should see more failures than
> ce03 when they both use the same worker nodes.
> 
> Any ideas?
> 

I presume this is also what is causing our job failures in HammerCloud:


- pilotlog.txt -
18 Aug 2009 14:56:16| !!WARNING!!2990!! Command failed: export 
X509_USER_PROXY=/tmp/globus-tmp.cn466.8212.0; which lcg-cr; lcg-cr 
--version; lcg-cr --verbose --vo atlas -T srmv2 -s ATLASSCRATCHDISK -b 
-l 
/grid/atlas/users/pathena/user09.JohannesElmsheuser/user09.JohannesElmsheuser.ganga.sitetest.ANALY_QMUL.1250602342.702660fc-7717-4699-9f46-2e8a7ba9ee1a_sub02956913/user09.JohannesElmsheuser.ganga.sitetest.ANALY_QMUL.1250602342.702660fc-7717-4699-9f46-2e8a7ba9ee1a.AANT._00303.root 
-g 54c4d213-f40d-4c49-a2d4-ded159ac4b72 -d 
srm://se03.esc.qmul.ac.uk:8444/srm/managerv2?SFN=/atlas/atlasscratchdisk/user09.JohannesElmsheuser/user09.JohannesElmsheuser.ganga.sitetest.ANALY_QMUL.1250602342.702660fc-7717-4699-9f46-2e8a7ba9ee1a_sub02956913/user09.JohannesElmsheuser.ganga.sitetest.ANALY_QMUL.1250602342.702660fc-7717-4699-9f46-2e8a7ba9ee1a.AANT._00303.root 
file:/data/scratch/tmp/condorg_bAYM8327/pilot3/Panda_Pilot_8350_1250605330/PandaJob_1019068164_1250605330/user09.JohannesElmsheuser.ganga.sitetest.ANALY_QMUL.1250602342.702660fc-7717-4699-9f46-2e8a7ba9ee1a.AANT._00303.root
18 Aug 2009 14:56:16| !!WARNING!!5000!! Abnormal termination: ecode=256, 
ec=1, sig=-, len(etext)=1478
18 Aug 2009 14:56:16| !!WARNING!!5000!! Error message: 
/opt/grid/glite/3.1.19/lcg/bin/lcg-cr lcg_util-1.6.15 
GFAL-client-1.10.17 Using grid catalog type: lfc Using grid catalog : 
lfc-atlas.gridpp.rl.ac.uk Using LFN : 
/grid/atlas/users/pathena/user09.JohannesElmsheuser/user09.JohannesElmsheuser.ganga.sitetest.ANALY_QMUL.1250602342.702660fc-7717-4699-9f46-2e8a7ba9ee1a_sub02956913/user09.JohannesElmsheuser.ganga.sitetest.ANALY_QMUL.1250602342.702660fc-7717-4699-9f46-2e8a7ba9ee1a.AANT._00303.root 
SE type: SRMv2 Using SURL : 
srm://se03.esc.qmul.ac.uk:8444/srm/managerv2?SFN=/atlas/atlasscratchdisk/user09.JohannesElmsheuser/user09.JohannesElmsheuser.ganga.sitetest.ANALY_QMUL.1250602342.702660fc-7717-4699-9f46-2e8a7ba9ee1a_sub02956913/user09.JohannesElmshe

- Walltime -
jobRetrival=1, StageIn=84, Execution=1867, StageOut=63, CleanUp=4
UHURA (37t)

Chris