Thanks for the info, Andrew. We do not have tape storage. I think the
problem is just high load on dCache and/or FTS at RAL. The thing which
seems strange is that I started with the FTS channel set to allow
six concurrent file transfers.
I then saw that the first six files of any submission were fine but
the following ones were broken (though the odd one here and there gets
through, either first time round or on the retry).
Regards,
Brian
On 18/11/05, Andrew C. Smith <[log in to unmask]> wrote:
> Hi Brian,
>
> I have been working with FTS for the last three months or so, integrating it
> with LHCb's grid software, so I can appreciate the confusion with the errors it
> returns. The one you mention is particularly special, though, as it does
> finally insist that the error was a success(?!)...
>
> Failed on SRM put: Failed To Put SURL. Error in srm__put: SOAP-ENV:Client -
> CGSI-gSOAP: Error reading token data: Success
>
> There are a few other errors that I have come across regularly since using the
> FTS:
>
> 'Failed on SRM get: SRM getRequestStatus timed out on get'
>
> This error often occurs when trying to transfer a file that is not already
> staged in disk cache. When the FTS goes to the SRM it performs an SRM Get and
> will wait for 1 minute for the SRM to return the tURL. If the file is on tape
> the SRM will retrieve the file to disk before returning the tURL. The problem
> is that the file retrieval from tape will most likely take longer than the 1
> minute the FTS Agent waits for. There is no state in the FTS state machine to
> allow it to deal with a non-blocking 'stage-in'. To overcome this problem LHCb
> staged the data to be transferred for SC3 in CERN Castor. The staging problem
> is known to the developers, and changes to the state machine to support
> asynchronous stage-in are on the list of things to be added.
>
> There are also a few errors similar to the one you mentioned this morning, which
> result from the FTS service not being able to contact the endpoint, possibly
> due to a large load:
>
> Failed on SRM get: Cannot Contact SRM Service. Error in srm__ping:
> SOAP-ENV:Client - CGSI-gSOAP: Error reading token data: Success
>
> Failed on SRM put: Failed To Put SURL. Error in srm__put: SOAP-ENV:Client -
> CGSI-gSOAP: Error reading token data: Success
>
> SRMPUT Operation Timed out.
>
> TRANSFER - Transfer time out
>
> Failed on SRM put: SRM getRequestStatus timed out on put
>
> If all the files in your transfer are getting errors similar to these, the
> machine is probably down or the service misconfigured. Usually, though, it is
> only a few files (if any) that experience this. Resubmitting the transfer will
> usually result in success.
>
> There are a couple of errors which can cause a little bit of trouble:
>
> Failed on SRM put: SRM getRequestStatus timed out on put; also failing to do
> 'advisoryDelete' on target
>
> This can occur when the load on the target machine is high. The problem is that
> when the FTS fails to do the advisoryDelete you are left with a (corrupted) file
> entry in the SE filesystem. For Castor systems this is not a problem because
> Castor allows files to be overwritten by default. With dCache, however, this is
> not the case, and if the transfer is retried you get the following:
>
> Failed on SRM put: Failed SRM put on
> httpg://srm.grid.sara.nl:8443/srm/managerv1 ; id=-2147372632 call. Error is
> RequestFileStatus#-2147372631 failed with error:[ GetStorageInfoFailed : file
> exists, cannot write ]
>
> This means you have to delete the file by hand before resubmitting, which can
> be annoying if you are transferring 1000s of files. This problem was raised at
> a recent FTS workshop and the developers are working to make the differing
> behaviour of Castor/dCache/DPM SRMs transparent to the users of FTS.
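
A minimal sketch, in Python, of sorting the error strings quoted in this
thread by what to do next. The function name and the action labels are
hypothetical and not part of FTS; only the matched substrings come from the
messages above, and the mapping follows Andrew's description rather than any
documented FTS behaviour.

```python
# Hypothetical action labels. Substrings are copied from the errors quoted in
# this thread. Order matters: the advisoryDelete error also contains
# "timed out on put", so the cleanup rules are checked first.
RULES = [
    ("file exists, cannot write", "delete-target-then-resubmit"),
    ("advisoryDelete", "delete-target-then-resubmit"),
    ("getRequestStatus timed out on get", "check-staging-then-resubmit"),
    ("Error reading token data: Success", "resubmit"),
    ("getRequestStatus timed out on put", "resubmit"),
    ("SRMPUT Operation Timed out", "resubmit"),
    ("Transfer time out", "resubmit"),
]


def classify_fts_error(message):
    """Return a suggested action for one FTS error message."""
    # Collapse mail line wraps so substrings split across lines still match.
    text = " ".join(message.split())
    for needle, action in RULES:
        if needle in text:
            return action
    return "unknown"


if __name__ == "__main__":
    sample = (
        "Failed on SRM put: SRM getRequestStatus timed out on put; "
        "also failing to do\n'advisoryDelete' on target"
    )
    print(classify_fts_error(sample))
```

Anything that comes back as "unknown" would still need to be looked at by
hand.
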
>
> It was also mentioned at the workshop that some of the experiments would like
> to see more instructive error messages, so after a while these might disappear.
> Until then you will have to put up with 'Error: Success' ;)
>
> Cheers,
> Andrew.
>
> Quoting Derek Ross <[log in to unmask]>:
>
> > Fraser Speirs wrote:
> >> On 18 Nov 2005, at 11:54, Brian Davies wrote:
> >>
> >>> Failed on SRM get: Failed To Get SURL. Error in srm__get:
> >>> SOAP-ENV:Client - CGSI-gSOAP: Error reading token data: Success
> >>> Also get them with
> >>> Failed on SRM get: Cannot Contact SRM Service. Error in srm__ping:
> >>> SOAP-ENV:Client - CGSI-gSOAP: Error reading token data: Success
> >>
> >>
> >> Hi Brian,
> >>
> >> Yes, I've seen that today as well:
> >>
> >>> Failed on SRM get: Cannot Contact SRM Service. Error in srm__ping:
> >>> SOAP-ENV:Client - CGSI-gSOAP: Error reading token data: Success
> >>
> >>
> >> This is a transfer on the RAL-GLA channel from
> >> dcache.gridpp.rl.ac.uk to our DPM on se2-gla.scotgrid.ac.uk.
> >>
> >> I know Jamie's seen the same thing under his DN.
> >>
> >
> > Hi,
> >
> > They're due to the FTS timing out talking to the SRM at
> > dcache.gridpp.rl.ac.uk due to that system being heavily loaded. We're
> > looking at how to improve this.
> >
> > Derek
> >
>