Dear Site Managers,
The following sites have configuration problems affecting the
current LHCb DC04. They are listed in my personal order of priority:
CNAF: All submitted jobs beyond the 337 currently running are
aborted with the error:
https://egee-rb-01.cnaf.infn.it:9000/FAlLl42Ar1maOGiJJFuSxA
---
Event: Done
- exit_code = 1
- host = egee-rb-01.cnaf.infn.it
- reason = Cannot read JobWrapper output, both from Condor and from Maradona.
- source = LogMonitor
- src_instance = unique
- status_code = FAILED
- timestamp = Wed Aug 18 04:07:07 2004
- user = /C=ES/O=DATAGRID-ES/O=UB/CN=Ricardo Graciani
---
This is likely a WN configuration problem.
SINP: All submitted jobs get aborted. The running queue is
emptying (the problem appeared yesterday at around 15:00 UTC).
https://lxn1182.cern.ch:9000/BgWmfx2xdDUfwYwIrv3JWg
---
Event: Done
- exit_code = 1
- host = lxn1182.cern.ch
- reason = Got a job held event, reason: Unspecified gridmanager error
- source = LogMonitor
- src_instance = unique
- status_code = FAILED
- timestamp = Wed Aug 18 05:42:40 2004
- user = /C=ES/O=DATAGRID-ES/O=UB/CN=Ricardo Graciani
---
QMUL: New jobs get aborted. We cannot load the queue with more than
~260 jobs, while there are still 31 free CPUs.
https://lcgrb01.gridpp.rl.ac.uk:9000/GPIpNpSt1lcxcon73UCnyw
---
Event: Done
- exit_code = 1
- host = lcgrb01.gridpp.rl.ac.uk
- reason = Got a job held event, reason: Globus error 94: the jobmanager does not accept any new requests (shutting down)
- source = LogMonitor
- src_instance = unique
- status_code = FAILED
- timestamp = Wed Aug 18 05:47:32 2004
- user = /C=ES/O=DATAGRID-ES/O=UB/CN=Ricardo Graciani
---
[rgracian@lxb2007 LCG-Job-Submit]$ grep -c qmul Aborted/2004_08_18_production.details
666
[rgracian@lxb2007 LCG-Job-Submit]$ grep -c cnaf Aborted/2004_08_18_production.details
244
[rgracian@lxb2007 LCG-Job-Submit]$ grep -c msu Aborted/2004_08_18_production.details
936
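The per-site counts above could also be collected in a single pass over the details file. What follows is only a minimal sketch: it assumes, as the grep commands do, that every line of interest contains the site string in lower case. The names count_aborted_by_site and report are illustrative and not part of any LCG tool.

```python
from collections import Counter
from typing import Iterable


def count_aborted_by_site(lines: Iterable[str], sites: Iterable[str]) -> Counter:
    """Count the lines that mention each site, like `grep -c <site>` run per site."""
    site_list = list(sites)
    counts = Counter({site: 0 for site in site_list})
    for line in lines:
        for site in site_list:
            # Case-sensitive substring match; a line counts once per site,
            # even if the site string occurs several times in it.
            if site in line:
                counts[site] += 1
    return counts


def report(path: str, sites: Iterable[str]) -> None:
    """Print one 'site: count' line per site, highest count first."""
    with open(path) as details:
        counts = count_aborted_by_site(details, sites)
    for site, n in counts.most_common():
        print(f"{site}: {n}")
```

For example, report("Aborted/2004_08_18_production.details", ["qmul", "cnaf", "msu"]) would print the three counts in one go.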
RAL: mostly not visible from the RB. The LDAP server is very slow.
http://goc.grid.sinica.edu.tw/gstat/lxn1194.cern.ch/
TAU: not visible from the RBs; the LDAP server responds slowly.
http://goc.grid.sinica.edu.tw/gstat/lcfgng.cs.tau.ac.il/
CNB: lhcb accounts misconfigured on the WNs (working on them)
IFCA: a wrong SI00 value prevents scheduling
Please take the necessary actions to correct the problems.
Regards
Ricardo