On Thu, 6 Sep 2007, Jeff Templon wrote:
> Yo *,
>
> So Ronald and I just spent a thoroughly enjoyable hour trying to figure
> out why the load on the CE has been higher than we feel comfortable
> with, the last little while ... and at some point, tracing lots of perl
> processes that seemed to be taking a lot of time, descended into the
> gram_job_state directory ... where we found 28,800 files laying around,
> despite the fact that we only have a few hundred active jobs. Throwing
> away the bulk of these files resulted in a factor of three decrease in
> the load on the CE machine.
>
> I got a strange sense of deja vu while doing all this, and indeed, it's
> not the first time. I reproduce for you below, verbatim, a message from
> almost precisely three years ago, containing an analysis of the problem.

Was there any answer then?

> Is there any new collective wisdom on why this problem happens? Why
> is it still happening??

I suspect the "pbs" job manager has a larger probability of getting you
into this situation than the "lcgpbs" job manager.

In any case we should add a cleanup of gram_job_state etc. to the
cleanup-grid-accounts cron job. Please open a bug in Savannah.
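
For illustration only, here is a rough sketch of what such a cleanup step
could look like. It is not what cleanup-grid-accounts does today. The
function names, the 30-day threshold and the dry-run default are my own
assumptions, and it selects files purely by modification time, so it
does not check whether a state file still belongs to an active job.

```python
import os
import time


def stale_files(directory, max_age_days=30, now=None):
    """Return regular files in `directory` not modified for max_age_days.

    Hypothetical helper: age is the only criterion, nothing here knows
    about the batch system or about which jobs are still active.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # Skip subdirectories and symlinks; only plain files qualify.
            if not entry.is_file(follow_symlinks=False):
                continue
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                stale.append(entry.path)
    return sorted(stale)


def remove_stale(directory, max_age_days=30, dry_run=True):
    """Delete stale files; with dry_run=True only report what would go."""
    handled = []
    for path in stale_files(directory, max_age_days):
        if not dry_run:
            try:
                os.remove(path)
            except FileNotFoundError:
                # Something else removed it in the meantime; not an error.
                continue
        handled.append(path)
    return handled
```

Run it with dry_run=True against the gram_job_state directory first and
compare the list with the active jobs before letting it delete anything.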