Hi,
I went through the latest LCG report and tried to comment on errors
seen in the UK.
http://lcg-testzone-reports.web.cern.ch/lcg-testzone-reports/cgi-bin/lastreport.cgi
BHAM - OK
BRUNEL - NotOK
CAM - NotOK
Imperial - NotOK
LANCS - OK
LIV - NotOK
ManHEP - OK (both of them)
Oxford - OK
QMUL - NotOK
RAL - NotOK
RALPP - OK
RHUL - OK
Edin - NotOK
Glas - NotOK
Sheff - NotOK
UCL - OK (Both of them)
and now the details of the failures.
+ Brunel - Hi Paul
The web page shows a failure, but it may be working now. Too many jobs
are queued to tell whether it works, though. One is running, so that is
a good sign.
+ Cambridge - Hi Santanu
Much the same as above but I don't understand why there are only 5
LHCB jobs, some free CPUs and lots of short dteam jobs waiting?
+ Imperial - Hi Barry
It looks like you cannot find yourself. A worrying condition.
I see you are using your own BDII , gw37.hep.ph.ic.ac.uk
which is fine.
In fact
ldapsearch -x -H ldap://gw37.hep.ph.ic.ac.uk:2170 \
-b 'Mds-vo-name=local,o=Grid'
is completely empty.
The BDII should be pointing at the whole world these days.
i.e. your site-cfg.h should contain
#define SITE_BDII_URL http://grid-deployment.web.cern.ch/grid-deployment/gis/lcg2-bdii/dteam/lcg2-all-sites.conf
If you already have this setting, log into the BDII and
rm /opt/lcg/var/bdii/lcg-bdii-update.conf
to make sure you get the latest one from the above URL when the cron
job runs.
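A quick way to confirm whether the BDII is still empty after the update is to count the entries it returns. This is only a rough sketch: count_ldif_entries is a helper name made up here, and it simply counts the "dn:" lines in whatever LDIF text it is given.

```shell
# Hypothetical helper (not part of any LCG tool): count the entries in
# LDIF text read from stdin, one "dn:" line per entry.
count_ldif_entries() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so force a zero exit status for the empty case.
    grep -c '^dn:' || true
}

# Possible usage against the BDII queried above (-LLL only trims
# comments from the ldapsearch output):
#   ldapsearch -x -LLL -H ldap://gw37.hep.ph.ic.ac.uk:2170 \
#       -b 'Mds-vo-name=local,o=Grid' | count_ldif_entries
```

A count of 0 means the BDII is still publishing nothing.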
+ Liverpool , Hi Andrew Michael
Jobs are still waiting in the queue, but according to
http://goc.grid.sinica.edu.tw/gstat/hepgrid2.ph.liv.ac.uk/
there appear to be 100 CPUs and no jobs waiting.
Confused, and I don't even seem to be able to get access:
globus-job-run hepgrid2.ph.liv.ac.uk /usr/bin/qstat -q
GRAM Job submission failed because authentication with the \
remote server failed (error code 7)
+ QMUL
Looks to be a transient error reaching the MDS system
on lxn1189.cern.ch.
You want to change to the much less busy and otherwise identical
lcgbdii02.gridpp.rl.ac.uk.
More generally the LCG version is now quite old.
+ RAL
This error makes no sense and is just a load of rubbish; ignoring it
for now.
+ Edin , Hi Steve
Looks to be much the same as QMUL above.
+ Glas , Hi Fraser
Looks like ERROR contains is not defined
lcg-cr -v --vo dteam -d WN_DEFAULT_SE_DTEAM
So this implies that $VO_DTEAM_DEFAULT_SE is not set in the
end user's environment for some reason.
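To check this directly, something like the following could be run as the job user on a worker node. It is a sketch only: check_default_se is a name made up here, and it does nothing more than test whether the variable is set and non-empty.

```shell
# Hypothetical helper: report whether VO_DTEAM_DEFAULT_SE is set
# and non-empty in the current environment.
check_default_se() {
    if [ -n "${VO_DTEAM_DEFAULT_SE:-}" ]; then
        echo "VO_DTEAM_DEFAULT_SE=${VO_DTEAM_DEFAULT_SE}"
    else
        # Complain on stderr and return non-zero so a test job fails visibly.
        echo "VO_DTEAM_DEFAULT_SE is not set" >&2
        return 1
    fi
}

check_default_se || echo "default SE missing from this environment"
```

If the variable is missing there but present in an interactive login, the difference is presumably in how the job environment is set up.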
+ Sheff, Hi Matt
This has the now famous error on an lcg-del of
lcg_del: Invalid argument
which of course means that the information system failed?
So it looks the same as QMUL above.
Steve
--
Steve Traylen
http://www.gridpp.ac.uk/