Hi,
A few weeks ago I was running some CMS production, the last stage of
which was to copy the output to the storage element at RAL, and we had
lots of failures in the copy. In the end we cheated and forced all the
jobs to go to the RAL CE and copy to the RAL SE.
This worked (as you would hope), but it didn't seem to be very
Grid-like, so last night and today I submitted lots (hundreds) of very
short jobs that just tried doing an lcg-cr to the RAL dcache. Most (a far
greater fraction than a few weeks ago) copied the files successfully. The
few that failed did so for one of two reasons:
1.
SE type not found
lcg_cr: Invalid argument
This was the one that I saw most of when trying to do the MC production.
However, there are far fewer of these now. This seemed to affect a whole
site rather than individual nodes.
2.
SE endpoint not found
SE endpoint not found
SE endpoint not found
Usually repeated three times as shown.
Does anybody know what causes these two errors? How can I protect against
them? The first seemed to be for all nodes at a site so retrying would not
help whereas the second seemed to be transitory.
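For what it is worth, the sort of protection I have in mind looks roughly
like the Python sketch below. It is only an illustration: copy_with_retry
and classify are made-up names, the copy command (for example an lcg-cr
invocation) is supplied by the caller, and the matching relies purely on
the two error strings quoted above. It retries only the "SE endpoint not
found" case and gives up at once on "SE type not found", on the assumption
that the latter affects the whole site.

```python
import subprocess
import time

# Error strings taken from the failures quoted above.
SITE_WIDE = "SE type not found"       # assumed to affect every node at a site
TRANSIENT = "SE endpoint not found"   # assumed to clear up on its own


def classify(output):
    """Return 'site', 'transient' or 'unknown' for the given command output."""
    if SITE_WIDE in output:
        return "site"
    if TRANSIENT in output:
        return "transient"
    return "unknown"


def copy_with_retry(cmd, attempts=3, delay=30.0,
                    run=subprocess.run, sleep=time.sleep):
    """Run cmd (a list of strings); retry only on the transient error.

    Returns True if the command eventually exits with status 0,
    False otherwise. run and sleep are parameters so they can be
    replaced when testing.
    """
    for attempt in range(1, attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return True
        kind = classify(result.stdout + result.stderr)
        if kind != "transient":
            # Site-wide or unrecognised failure: retrying here is pointless.
            return False
        if attempt < attempts:
            sleep(delay)
    return False
```

A job wrapper would call copy_with_retry with the full copy command as a
list and, on a False return, fall back to something else (a different SE,
or resubmission to another site).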
Sorry if these are "Numpty" questions answered elsewhere ... if they are
please could somebody point me to this information.
All the best and thanks for your help,
david
PS For Stephen Burke:
Numpty Dumpty didn't have a great fall ... he was hit by a car.