On 18/08/05, Dr D J Colling wrote:
> Hi,
>
> A few weeks ago I was trying to do some CMS production, the last stage of
> which was to copy the output to the storage element at RAL... and we had
> lots of failures in the copy. In the end we cheated and forced all the
> jobs to go to the RAL CE and copy to the RAL SE.
>
> This worked (as you would hope), but it didn't seem very Grid-like, so
> last night and today I submitted lots (hundreds) of very short jobs that
> just tried doing an lcg-cr to the RAL dCache. Most copied the files
> successfully (a far greater fraction than a few weeks ago). The few that
> failed did so for one of two reasons:
>
> 1.
>
> SE type not found
> lcg_cr: Invalid argument
>
> This was the error I saw most often when trying to do the MC production,
> although there are far fewer of these now. It seemed to affect a whole
> site rather than individual nodes.
>
> 2.
> SE endpoint not found
> SE endpoint not found
> SE endpoint not found
>
Is this FTS retrying? (By default, FTS tries three times and then puts
the transfer into the HOLD state.)
If you have access to your channel (or know who does), check whether the
files associated with this error map to the files in the HOLD state
of FTS for your channel.
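To make the pattern concrete, here is a minimal Python sketch of the "retry three times, then HOLD" behaviour. It is a simplified model under stated assumptions, not the actual FTS code: `copy_once` is a hypothetical stand-in for a single transfer attempt that raises on failure, and the three-attempt limit is just the default mentioned above (your channel may be configured differently).

```python
from typing import Callable, List, Tuple

# Default attempt count as described above; an assumption, check your channel.
MAX_ATTEMPTS = 3


def transfer_with_hold(copy_once: Callable[[], None],
                       max_attempts: int = MAX_ATTEMPTS) -> Tuple[str, List[str]]:
    """Simplified model of 'retry N times, then go to HOLD'.

    copy_once is a hypothetical callable performing one copy attempt; it
    raises RuntimeError carrying the error message when the attempt fails.
    Returns the final state ("DONE" or "HOLD") and every error message seen.
    """
    errors: List[str] = []
    for _ in range(max_attempts):
        try:
            copy_once()
            return "DONE", errors  # success: stop retrying
        except RuntimeError as exc:
            errors.append(str(exc))  # record the failure and try again
    return "HOLD", errors  # all attempts used up


if __name__ == "__main__":
    def always_fails() -> None:
        raise RuntimeError("SE endpoint not found")

    state, errs = transfer_with_hold(always_fails)
    print(state)
    for msg in errs:
        print(msg)
```

If the failure is persistent, the model ends in HOLD and logs the same message three times, which is the pattern in your output; a transitory failure would instead show fewer messages followed by a successful copy.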
> Usually repeated three times as shown.
>
> Does anybody know what causes these two errors? How can I protect against
> them? The first seemed to affect all nodes at a site, so retrying would not
> help, whereas the second seemed to be transitory.
>
> Sorry if these are "Numpty" questions answered elsewhere... if they are,
> please could somebody point me to this information.
>
> All the best and thanks for your help,
> david
>
> PS For Stephen Burke:
> Numpty Dumpty didn't have a great fall ... he was hit by a car.
>