Isn't this a firewall problem? Check with Jon Churchill. He had a similar problem with the NGS WMS.
John
-----Original Message-----
From: "[log in to unmask]" <[log in to unmask]>
To: "[log in to unmask]" <[log in to unmask]>
Sent: 17/02/10 09:20
Subject: SAM test issues with lcg-CE at RAL
Hi,
We are experiencing some problems with an lcg-CE at RAL Tier1. It appears
that jobs coming from WMSes outside RAL are never terminated. The errors
are like the one below:
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job :
https://wms208.cern.ch:9000/kQY1kKjUIacCjeuCgiMQfA
Current Status: Aborted
Logged Reason(s):
- File not available.Cannot read JobWrapper output, both from Condor
and from Maradona.
- File not available.Cannot read JobWrapper output, both from Condor
and from Maradona.
Status Reason: hit job shallow retry count (1)
Destination:
lcgce02.gridpp.rl.ac.uk:2119/jobmanager-lcgpbs-grid1000M
Submitted: Wed Feb 17 08:35:51 2010 CET
*************************************************************
Jobs submitted via the local WMS appear to be OK, so we suspect some
miscommunication between the external WMSes and the CE or WNs.
In Derek's absence I am trying to solve this problem so I am asking for
any possible hints on this list. An option would be to restart services
on (or reboot) that machine...
Many thanks,
Catalin Condurache
RAL Tier1 Grid Services