Heya,
>
> Did you get anywhere with this problem?
I'm afraid not. After finding some hopeful threads in LCG-ROLLOUT I
found that RALPP had seen a similar-sounding problem but could only
clear it with a reboot (which I tried, but no joy).
For some reason the WMS can submit jobs for Lancaster but isn't getting
updates about the status of those jobs. I can't see anything wrong on the
CREAM CE (which is due for a reinstall soonish, but not for a fortnight,
and because of how this CE integrates with the local LSF cluster I can't
bring the upgrade forward). So any help would be appreciated (even if
it's pointing out the obvious).
Thanks,
Matt
>
> Jeremy
>
>
> On 23 Aug 2012, at 12:25, Matt Doidge wrote:
>
>> Heya all, once again I come to my peers in search of aid.
>>
>> One of our CEs at Lancaster (the one in front of an LSF cluster), after crashing at the weekend, hasn't been right and is consistently failing the "JobSubmit" tests (although passing all the other tests). The failures are happening on tests from both nagios servers, and other sites aren't seeing this problem, so it's definitely us that's bad. The machine in question is a crusty glite 3.2 cream CE due for a reinstall, but I wasn't planning on upgrading it for a month (partly due to needing to understand the risks to the cluster posed by reinstalling a licence-holding node).
>>
>> The server in question is running atlas jobs fine, so it's not inherently broken, and I can't see anything exciting in the logs. The tests seem to get to the point where a jobid is returned, then time out after a few hours. Checking the progress of one of these jobs, I see that it lasted a few minutes and completed with a "DONE-OK", and I see nothing exciting left over in the sandbox.
>>
>> I thought that perhaps the lb daemons weren't running, but the bnotifier and bupdater daemons appear to be doing their job - they're running and the relevant logs are updating.
>>
>> Links to the failed tests:
>> https://gridppnagios.lancs.ac.uk/nagios/cgi-bin/extinfo.cgi?type=2&host=abaddon.hec.lancs.ac.uk&service=org.sam.CREAMCE-JobSubmit-%2Fops%2FRole%3Dlcgadmin
>> https://gridppnagios.physics.ox.ac.uk/nagios/cgi-bin/extinfo.cgi?type=2&host=abaddon.hec.lancs.ac.uk&service=org.sam.CREAMCE-JobSubmit-%2Fops%2FRole%3Dlcgadmin
>>
>>
>> Has anyone had this issue with this test before? I'm fairly stumped and would greatly appreciate some help or insight!
>>
>> Thanks in advance,
>> Matt