Hi Chris,
That explains the cancelled jobs I see (it would be nice to get
some kind of notice in these cases), but the failures must have
come before you turned the SE off, so they should be unrelated to that.
Do you have anything to add on what Sam wrote:
"It is likely that the issue is not with se03, but with QMUL WNs <->
the RAL and Imperial Top-level BDIIs." ?
Cheers,
Gustav
2011/1/17 Christopher J.Walker <[log in to unmask]>:
> Gustav Wikström wrote:
>> Hi Chris,
>>
>> OK, but shouldn't the WMS take care of that, and not send jobs to QMUL then?
>>
> It should, and in fact I killed the few remaining jobs before shutting
> things down.
>
> If you are experiencing problems at the moment, then something is wrong
> with the jobs. I've stopped the queues, so no jobs are running at the
> moment.
>
> Assuming the jobs specify their data requirements, then they should not
> be trying to pull data from QMUL.
>
> Chris
>
>> Cheers,
>> Gustav
>>
>> 2011/1/17 Christopher J.Walker <[log in to unmask]>:
>>> Sam Skipsey wrote:
>>>> Ah, this issue.
>>>>
>>>> 2011/1/17 Gustav Wikström <[log in to unmask]>:
>>>>> Hi experts,
>>>>>
>>>>> I'm having big trouble with my grid jobs running on QMUL. The jobs
>>>>> seem to run OK, but in the end lcg-cr fails:
>>> QMUL is (or at least should be) in downtime for a power outage tomorrow
>>> morning. I've turned the SE off. That would explain any problems now,
>>> but not any before this morning.
>>>
>>>
>>> Scheduled to be back Wednesday evening - but will probably be back before.
>>>
>>> Chris
>>>
>>>>> lcg-cr -d srm://se03.esc.qmul.ac.uk//t2k.org/nd280/v8r5p11/unpk/ND280/ND280/00005000_00005999//oa_nd_spl_00005007-0003_ot3a2qrmcuec_unpk_000_v8r5p11.root
>>>>> -l lfn:/grid/t2k.org/nd280/v8r5p11/unpk/ND280/ND280/00005000_00005999/oa_nd_spl_00005007-0003_ot3a2qrmcuec_unpk_000_v8r5p11.root
>>>>> oa_nd_spl_00005007-0003_ot3a2qrmcuec_unpk_000_v8r5p11.root
>>>>>
>>>>> ['srm://se03.esc.qmul.ac.uk//t2k.org/nd280/v8r5p11/unpk/ND280/ND280/00005000_00005999//oa_nd_spl_00005007-0003_ot3a2qrmcuec_unpk_000_v8r5p11.root:
>>>>> Invalid argument\n', 'lcg_cr: Invalid argument\n']
>>>>>
>>>> This error (which is horribly non-specific) is an issue with
>>>> communication with the BDII used to get information about the source
>>>> and destination systems.
>>>> It is likely that the issue is not with se03, but with QMUL WNs <->
>>>> the RAL and Imperial Top-level BDIIs.
>>>>
>>>> I'll let Chris comment on which end the problem is at...
>>>>
>>>> Sam
>>>>
>>>>> A few files end up on se03, so not every lcg-cr call fails, but the vast majority do.
>>>>> The jobs that end up on RAL are copied without problems to
>>>>> srm-t2k.gridpp.rl.ac.uk/castor/ads.rl.ac.uk/prod.
>>>>>
>>>>> Is it just se03.esc.qmul.ac.uk being flaky or is lcg-cr not to be run on se03?
>>>>>
>>>>> Any help appreciated!
>>>>> Cheers,
>>>>> Gustav
>>>>>
>
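
Sam's suggestion in the thread above (that the failure lies between the QMUL WNs and the top-level BDIIs rather than with se03) could be checked from a worker node with a small connectivity probe. The following is only a sketch under stated assumptions: it assumes the BDII endpoints are given as a comma-separated host:port list in the LCG_GFAL_INFOSYS environment variable and that 2170 is the BDII LDAP port; the function names are hypothetical and not part of lcg-utils. It only tests whether a TCP connection opens, not whether the BDII returns sensible information.

```python
import os
import socket

BDII_PORT = 2170  # assumed default BDII LDAP port


def parse_infosys(value, default_port=BDII_PORT):
    """Split a 'host:port,host:port' string into (host, port) tuples."""
    endpoints = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, sep, port = item.partition(":")
        endpoints.append((host, int(port) if sep else default_port))
    return endpoints


def can_connect(host, port=BDII_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_endpoints(endpoints, timeout=5.0):
    """Map each (host, port) pair to True/False reachability."""
    return {(h, p): can_connect(h, p, timeout) for h, p in endpoints}


if __name__ == "__main__":
    # Hypothetical usage: probe whatever the job environment points at.
    configured = parse_infosys(os.environ.get("LCG_GFAL_INFOSYS", ""))
    for (host, port), ok in check_endpoints(configured).items():
        print(f"{host}:{port} {'reachable' if ok else 'NOT reachable'}")
```

Run at the start of a job, a probe like this would show whether the worker node can reach its configured BDIIs at all before lcg-cr is attempted; an unreachable endpoint would point towards the network or information-system side rather than the SE.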