On Fri, Aug 19, 2005 at 11:21:48AM +0100 or thereabouts, Steve Traylen wrote:
> On Thu, Aug 18, 2005 at 07:42:31PM +0100 or thereabouts, Dr D J Colling wrote:
> > Hi,
> >
> > A few weeks ago I was trying to do some CMS production, the last stage of
> > which was to copy the output to the storage element at RAL....and we had
> > lots of failures in the copy. In the end we cheated and forced all the
> > jobs to go to the RAL CE and copy to the RAL SE.
> >
> > This worked (as you would hope) however it didn't seem to be very
> > Grid-like so last night and today I submitted lots (hundreds) of very
> > short jobs that just tried doing an lcg-cr to the RAL dcache. Most (a far
> > greater fraction than a few weeks ago) copied the files successfully. Those
> > few that failed failed for two reasons:

I've noticed the BDII is a little bit stressed out.

http://ganglia.gridpp.rl.ac.uk/?c=LCG%20Others&h=lcgbdii02.gridpp.rl.ac.uk&m=%5Bnone%5D&r=day&s=descending

and there is a spike in the return times,

http://goc.grid.sinica.edu.tw/gstat/RAL-LCG2/BDIINode_Perf_tim_.html

which corresponds roughly to when your lump of CMS jobs landed here.

http://ganglia.gridpp.rl.ac.uk/specials/pbs.php?h=OpenPBS%20server%2fcsflnx353.rl.ac.uk&m=%5Bnone%5D&r=day&s=descending

How many replications are we talking about per job here?

We can look into some load balancing or something if it is just a matter
of speed but I expect it is blocking or something....

Looking.

  Steve

> >
> > 1.
> >
> > SE type not found
> > lcg_cr: Invalid argument
> >
> > This was the one that I saw most of when trying to do the MC production.
> > However, there are far fewer of these now. This seemed to be for a whole
> > site rather than individual nodes.
> >
> > 2.
> > SE endpoint not found
> > SE endpoint not found
> > SE endpoint not found
> >
> > Usually repeated three times as shown.
>
> Hi Dave,
>
> I don't know the answers. GIS of course could help but basically they
> are all information failures of one sort or another.
>
>   Steve
> >
> > Does anybody know what causes these two errors? How can I protect against
> > them? The first seemed to be for all nodes at a site so retrying would not
> > help whereas the second seemed to be transitory.
> >
> > Sorry if these are "Numpty" questions answered elsewhere ... if they are
> > please could somebody point me to this information.
> >
> > All the best and thanks for your help,
> > david
> >
> > PS For Stephen Burke:
> > Numpty Dumpty didn't have a great fall ... he was hit by a car.
>
> --
> Steve Traylen
> http://www.gridpp.ac.uk/

--
Steve Traylen
http://www.gridpp.ac.uk/
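
On the "how can I protect against them?" question, a minimal sketch of a job-side wrapper might look like the following. It rests entirely on the observation quoted above: "SE type not found" appeared to affect a whole site, so it is treated as not worth retrying, while "SE endpoint not found" appeared transitory, so it is retried after a pause. The names `classify` and `copy_with_retry`, the retry count, and the delay are all hypothetical, and the `attempt` callable is assumed to run the copy (for example the job's own lcg-cr invocation) and return its exit code and output; nothing here is taken from the LCG tools themselves.

```python
import time
from typing import Callable, Tuple

# Error strings as quoted in this thread; the persistent/transient split
# is an assumption based only on the behaviour reported above.
PERSISTENT_MARKERS = ("SE type not found",)
TRANSIENT_MARKERS = ("SE endpoint not found",)


def classify(output: str) -> str:
    """Label a failed copy's output as 'persistent', 'transient' or 'unknown'."""
    if any(marker in output for marker in PERSISTENT_MARKERS):
        return "persistent"
    if any(marker in output for marker in TRANSIENT_MARKERS):
        return "transient"
    return "unknown"


def copy_with_retry(
    attempt: Callable[[], Tuple[int, str]],
    max_tries: int = 3,
    delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, str]:
    """Call attempt() until it succeeds, retrying only transient failures.

    attempt() must return (exit_code, output_text). The result is
    (succeeded, reason), where reason is 'ok' or the failure label.
    """
    for try_number in range(1, max_tries + 1):
        exit_code, output = attempt()
        if exit_code == 0:
            return True, "ok"
        kind = classify(output)
        if kind != "transient":
            # Retrying is assumed pointless: give up and report why.
            return False, kind
        if try_number < max_tries:
            # Back off a little longer after each transient failure.
            sleep(delay * try_number)
    return False, "transient"
```

A job that gets back `(False, "persistent")` could then fall back to a different storage element rather than waiting, which is roughly what forcing the jobs to the RAL CE achieved by hand.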