Hi Maarten,
> Might there be a hardware problem?
>
> Indeed, but check that /etc/grid-security/vomsdir is up to date,
> e.g. with the latest lcg-vomscerts rpm and/or correct *.lsc contents:
>
> http://goc.grid.sinica.edu.tw/gocwiki/Generic_verification_error_for_VOMS_%28failure%29%21
>
> Did you configure your CE _only_ as CE or also as a UI or so?
Indeed, we thought about this earlier. The most recent memory-related
events we found are the DIMM ECC errors below, which were detected in 2007,
and we may have replaced the memory module last year in the batch for all
blade servers.
03/09/2007 17:15:53 Major 32777 DIMM-4 Correctable ECC Error.
03/08/2007 19:53:08 Major 32777 DIMM-4 Correctable ECC Error.
Thanks for the wiki page. Indeed, we have 4.9.0 deployed rather than 5.0
as found in the latest slc4 repos. I will update the vomscerts later to
resolve the generic VOMS verification error.
And the CE is serving only as a CE; we don't run any other service on the
same box. The load should have dropped back to a normal level by now, as
shown below:
14:05:44 up 2:47, 0 users, load average: 9.25, 10.26, 10.31
914 processes: 898 sleeping, 15 running, 1 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 327.6% 0.0% 68.0% 0.0% 1.6% 0.0% 2.4%
cpu00 88.4% 0.0% 10.7% 0.0% 0.8% 0.0% 0.0%
cpu01 77.6% 0.0% 19.8% 0.0% 0.0% 0.0% 2.4%
cpu02 82.5% 0.0% 16.6% 0.0% 0.8% 0.0% 0.0%
cpu03 79.1% 0.0% 20.8% 0.0% 0.0% 0.0% 0.0%
Mem: 4087856k av, 3365168k used, 722688k free, 0k shrd, 463888k buff
2432872k active, 480492k inactive
Swap: 4192956k av, 0k used, 4192956k free 836476k cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
23769 cms001 25 0 88880 86M 1832 R 39.9 2.1 2:52 1 perl
12041 atlasprd 24 0 16744 16M 948 R 39.1 0.4 0:00 0 qstat
6452 root 16 0 1184 1148 900 S 18.3 0.0 0:48 3 pbs_mom
4025 atlasprd 21 0 7948 7948 1812 S 16.6 0.1 0:27 2 perl
6919 root 16 0 3172 3172 760 S 13.3 0.0 20:17 0 edg-fmon-agen
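As a rough sanity check on the figures above, here is a minimal sketch that divides the load averages from the uptime line by the number of CPUs. It is only an illustration, not part of any grid tooling: the function name load_per_cpu is made up, and the CPU count of 4 is an assumption taken from the cpu00..cpu03 rows in the listing.

```python
import re


def load_per_cpu(uptime_line, n_cpus):
    """Return the 1-, 5- and 15-minute load averages divided by n_cpus."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in line")
    return tuple(float(value) / n_cpus for value in match.groups())


# The uptime line pasted above; 4 CPUs assumed from cpu00..cpu03.
line = "14:05:44 up 2:47, 0 users, load average: 9.25, 10.26, 10.31"
per_cpu = load_per_cpu(line, 4)
print(per_cpu)
```

For the line above this gives about 2.3 per CPU for the one-minute average.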
Br,
J