On 29/08/12 18.05, Matt Doidge wrote:
> Hello,
> Our cream CE, since having a crash the other week, has stopped
> reporting job statuses to WMSes. This is causing us to fail ops tests
> as, although jobs are successfully submitted and a jobid is returned
> to the WMS, the WMS never receives word that the job completes, and thus
> the "JobSubmit" test eventually times out. Checking the logs shows
> that the jobs achieve a "Done-OK" status. I've restarted services,
> made sure everything is running (tomcat, BNotifier, BUpdater).
> Following some anecdotal advice from this list I even tried rebooting
> the node, but nothing helps.
>
Hi Matt, I have some questions:
- which log files did you check to verify that the jobs are in DONE-OK?
- are the CEs CREAM CEs?
- if yes to the previous question, is ICE running on the WMS node?
thanks,
Alvise
> The cream CE is an older glite 3.2 version (3.2.10-0) due for a
> reinstall soon so it's not worth tearing apart to try to fix it, but
> on the other hand we're not in a position to do the upgrade just yet
> so would like to try to keep the CE going for a few more weeks. The
> cream is in front of an LSF batch system (just to be awkward).
>
> No changes to firewalls or anything like that have been made; the cream
> crashed (stopped accepting connections, writing to logs or doing
> anything) and required a restart of all services.
>
> Any help would be appreciated, it seems like it should be a little
> thing that we're missing but I can't for the life of me think what it
> is, and I can't find any answers in the usual places (i.e. google and
> the lcg-rollout archives).
>
> Thanks in advance,
> Matt