Hi,

> -----Original Message-----
> From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
> On Behalf Of Alvise Dorigo
> Sent: 30 September 2011 13:38
>
> > Restarting tomcat did not seem to fix it, though restarting the node
> > did
>
> do you mean the CE node ?

Yes, full reboot of the host.

> > (although since the error is apparently the WMS refusing to submit to
> > the CreamCE it is possible that blacklisting expired after my test job
> > after restarting tomcat and before my test job after restarting the
> > node).
> > The glite-cream-ce.log shows connections from the WMS in question
> > after I submit the job (only for delegation) with no apparent failures
> > but no attempt to submit a job. (See attached fragments of the log file)
>
> Please remember that a CE remain in the Blacklist for 30 minutes (only
> EventQuery is allowed to that CE during this period).
>

Yes, that's one of the things that makes getting to the bottom of this so hard.

For this current incident I submitted a job to heplnx206.pp.rl.ac.uk via wms208.cern.ch at 10:19 and it failed. I then rebooted the node and submitted another job at 10:31, which succeeded. However, we still see SAM test failures via wms208.cern.ch at 11:15 and 12:15 before it succeeds at 13:15 (http://bit.ly/ppz5tW). After the reboot I made no other interventions on the CreamCE.

Due to other changes at the site, I don't really have the luxury at the moment of putting the node into unscheduled downtime, restarting services one at a time, and waiting a couple of hours to see which one fixes the problem.

Yours,
Chris.