The thing is that I don't even know how to debug problems like this,
especially as they are intermittent faults. Is there any way to find out
more information?
I have cc'ed the UK tb-support mailing list in case there is anybody
there who can offer some advice.
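In the meantime, one low-tech way to narrow down intermittent failures is to wrap the transfer in a retry loop that timestamps and logs every attempt, so each failure at least leaves a trace to correlate with the SE and network logs. This is only a sketch: the `retry` function, the log file name, and the `RETRY_DELAY` variable are my own inventions, not part of lcg-utils.

```shell
#!/bin/sh
# Sketch of a retry/logging wrapper (my own names; not an lcg-utils feature).
# LOG and RETRY_DELAY are illustrative defaults.
LOG="${LOG:-transfer_debug.log}"

retry() {
    # usage: retry <max_attempts> <command> [args...]
    max=$1; shift
    n=1
    while [ "$n" -le "$max" ]; do
        echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ') attempt $n: $*" >> "$LOG"
        "$@" >> "$LOG" 2>&1
        rc=$?
        if [ "$rc" -eq 0 ]; then
            echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ') attempt $n succeeded" >> "$LOG"
            return 0
        fi
        echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ') attempt $n failed (exit $rc)" >> "$LOG"
        n=$((n + 1))
        sleep "${RETRY_DELAY:-5}"
    done
    return 1
}
```

You could then wrap the failing call, e.g. `retry 3 lcg-cr --verbose ...`, and compare the timestamps of failed attempts against whatever logs LAL and Imperial can pull for those moments.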
All the best,
david
Roman Poeschl wrote:
> Roman Poeschl wrote:
>
> ... and again.
>
> SE (SRM) service not found for host : grid05.lal.in2p3.fr
> CGSI-gSOAP: Error reading token data: Success
>
>
> this time on
>
> SL4 architecture detected
> 100
> SRM Server is: srm://srm-dcache.desy.de//pnfs/desy.de
> Default SE is: grid05.lal.in2p3.fr
> ############################################################################
>
> System information
> ==================
> Host:
> wd31.hep.ph.ic.ac.uk
> CPU(s):
> Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
> Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
> Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
> Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
> RAM:
> 4036900 kB
> Swap:
> 2097144 kB
> ############################################################################
>
>
>> Mona Aggarwal wrote:
>> Hi Mona,
>>
>> thanks for your reply. In this very case the job failed during
>>
>> lcg-cr --verbose -d grid05.lal.in2p3.fr -l
>> /grid/calice/poeschl/pcbtestana/log/therec.steer.331473.tar.gz --vo
>> calice file:$CWD/therec.steer.331473.tar.gz
>>
>> where $CWD is the working directory in which the job was executed.
>>
>> But it also happens during lcg-cp actions. I don't claim that there
>> is something wrong at your site (or at LAL).
>> But the error message occurs (so far) only when accessing the LAL SE,
>> and very often for jobs running at your site.
>> It is mysterious in any case, and at least we should follow it up
>> until the issue is clarified, even if it takes a while. It is no
>> showstopper for me, but as you can imagine it is a bit annoying.
>>
>> Thanks for your support.
>>
>> Cheers,
>>
>> Roman
>>> Roman Poeschl wrote:
>>>> Dear Experts,
>>>>
>>>> For quite some time now I have stumbled, not always but regularly,
>>>> over the following failure during execution of my grid jobs:
>>>>
>>>> SE (SRM) service not found for host : grid05.lal.in2p3.fr
>>>> CGSI-gSOAP: Error reading token data: Success
>>>>
>>>> In this very case a job running at Imperial tried to read and
>>>> write a file to the SE at LAL.
>>>>
>>>> In this case the job failure occurred at roughly Wed Nov 21
>>>> 18:59:33 GMT 2007
>>>> and was running on
>>>>
>>>> SL4 architecture detected
>>>> 100
>>>> SRM Server is: srm://srm-dcache.desy.de//pnfs/desy.de
>>>> Default SE is: grid05.lal.in2p3.fr
>>>> ############################################################################
>>>>
>>>> System information
>>>> ==================
>>>> Host:
>>>> wd33.hep.ph.ic.ac.uk
>>>> CPU(s):
>>>> Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
>>>> Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
>>>> Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
>>>> Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
>>>> RAM:
>>>> 4036900 kB
>>>> Swap:
>>>> 2097144 kB
>>>> ############################################################################
>>>>
>>>>
>>>> These kinds of failures appear very often when the job is running
>>>> at the Imperial site (though I have also observed them at other
>>>> sites). But they occur only during communication with the LAL SE
>>>> grid05.
>>>>
>>>> Does someone have an idea about the reason for the problem? The
>>>> LAL IT team suspects one of the following reasons:
>>>>
>>>> 1. An error on the server grid05. They largely exclude this, since
>>>> the error does not appear during the regular SAM tests.
>>>>
>>>> 2. A temporary network problem. Might be, but I never see this
>>>> message in conjunction with other SEs, e.g. the DESY or Lyon
>>>> dCache. OK, at DESY or Lyon I use the dCache system, i.e. another
>>>> storage backend and thus maybe a slightly different management of
>>>> the data transfer. I never use other SEs; it might be worthwhile
>>>> to test.
>>>>
>>>
>>> You can use Imperial SE.
>>>
>>> Endpoint:
>>>
>>> srm://gfe02.hep.ph.ic.ac.uk:8443/pnfs/hep.ph.ic.ac.uk/data/calice
>>>
>>>
>>>> 3. An obsolete version of the lcg-utils on the client side.
>>>>
>>>> I know this is maybe not easy to answer, but at least we may want
>>>> to keep an eye on it.
>>>>
>>> The lcg-utils installed on the WNs is the latest version.
>>>
>>> Could you please send us the exact command you are using to
>>> read/write files from the job running at Imperial?
>>>
>>> Regards,
>>> Mona
>>>
>>
>>