On Jul 21, 2008, at 10:48 AM, Arnau Bria wrote:
> Hi all,
>
> Last week we had some problems with our CEs. They had a high load
> average (our record: 163).
>
> Our new one, with gLite 3.1, reached a value of 50:
>
> [root@ce05 ~]# uptime
> 12:17:44 up 2 days, 2:06, 3 users, load average: 49.77, 22.96, 12.31
>
> As some users have endpoints hardcoded, we see thousands of queries to a
> CE from the same user. E.g. lhsgm003 ran 4830 jobs in our batch system
> last Friday, and we had 4948 queries to our CE that day. But we also had
> queries from other users:
>
>   1304 atprd020
>    353 cmprd029
>     54 cms019
>     69 cms057
>     95 cms072
>     52 cms086
>     17 cms098
>     45 cms100
>     48 cms127
>    128 cms163
>     17 cms167
>      7 dteam004
>     24 dteam018
>    479 dteam020
>     41 lhcb016
>     25 lhcb050
>     11 lhcb080
>     19 lhcb089
>      6 lhcb104
>      5 lhprd011
>     13 lhprd025
>   1063 lhprd026
>    135 lhprd027
>   4948 lhsgm003
>     12 lhsgm004
>   1361 lhsgm006
>
> [...]
>
> So at any given moment we could have about 150 globus-job-managers
> running at the same time. And in our record case (ce07, with a load
> average of 163) we saw 2000 job-managers.
>
> So my question is: what could we do to prevent this problem? What
> could we do if we see this problem again? Ban some users?
>
> Cheers,
> Arnau