Hi Maarten and Jan,
> > > Otherwise jobs will not be cleaned up when the load is high, which could
> > > lead to an upward spiral. For example, the RB/WMS or the user may start
> > > canceling jobs, which cannot be cleaned up immediately, and new jobs may
> > > be sent instead, which also have to wait... Meanwhile the list of jobs
> > > gets longer, so it takes more and more time for the jobmanager to loop
> > > through them.
> >
> - it may take up to 1 hour before the change becomes effective, when the
> current grid_monitor for cms001 is restarted;
>
> - the load probably will become higher for (quite) a while, because the
> cleanup of jobs adds to the load.
Just to let you know that the two patches we tried yesterday were not
able to ease the situation. The load on the CE increased to more than 600
earlier this morning (local time), and the number of pending jobs has now
grown to over 8k. I have no idea why anyone needs to submit more than 8k
jobs to a batch farm with zero free slots, and I will also forward this
thread to CMS facility ops later.
 jobs  user    state
    1  cms001  H
 8349  cms001  Q
  939  cms001  R
    2  cms011  R
    1  cms027  Q
    1  cms033  Q
    8  cms034  Q
    7  cms038  R
    2  cms039  Q
  196  cmsprd  R
    1  cmsprd  W
Will try blocking the DN associated with the mass submission. FYI.
PS: for the CPU load, you can refer to the Ganglia plot at
https://osadm.grid.sinica.edu.tw/ganglia/?r=day&c=Taipei+LCG2&h=w-ce01.grid.sinica.edu.tw
Br,
J