Yo Cristina,
Take a look at the LCG-ROLLOUT archives, September '07, for messages
with subjects containing the words "drane bamage". There is a lot of
information there; the executive summary is this:
"Hence (for the wiki) : if there are thousands of files lying around
in the gram_job_state tmp directory, missing a PBS job id, and causing
high load on the system, check the permissions on your batch queues."
From
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind0709&L=LCG-ROLLOUT&T=0&O=A&P=127901
Actually, the first message on this topic that I know of is from 2004:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind04&L=LCG-ROLLOUT&T=0&O=A&P=2035038
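If you want to see how bad it is before deleting anything, here is a minimal read-only sketch in Python. It is not from the archive thread; it only counts the regular files in the state directory and lists those not modified in the last N days. The function name `scan_state_dir` and the 7-day default are arbitrary assumptions. The sketch does not try to detect the missing batch job id, because the layout of those state files isn't covered here.

```python
import os
import time


def scan_state_dir(path, max_age_days=7.0, now=None):
    """Count regular files in `path` and list those older than `max_age_days`.

    Read-only: nothing is deleted. Returns (total_files, sorted_stale_paths).
    The age threshold is a hypothetical default, not a Globus setting.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * 86400  # seconds per day
    total = 0
    stale = []
    with os.scandir(path) as entries:
        for entry in entries:
            # Skip subdirectories and symlinks; only count plain files.
            if not entry.is_file(follow_symlinks=False):
                continue
            total += 1
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                stale.append(entry.path)
    stale.sort()
    return total, stale
```

For example, `total, stale = scan_state_dir("/opt/globus/tmp/gram_job_state")` gives you the file count and the old files to look at. Per the summary above, a large count is a hint to check the batch queue permissions rather than just to clean up.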
J "perl in, perl out" T
On Apr 16, 2009, at 09:19, Cristina Aiftimiei wrote:
> Hi,
>
> Sorry to disturb you... but having had no answer from GGUS
> (https://gus.fzk.de/ws/ticket_info.php?ticket=47245), I'm trying this way.
> Have any of you had problems like the ones described there? (I have
> pasted the content of the ticket below.)
>
> Hi,
>
> as described in tickets 46929 and 45808:
> https://gus.fzk.de/ws/ticket_info.php?ticket=46929&from=search
> https://gus.fzk.de/ws/ticket_info.php?ticket=45808&from=search
>
> we had a production site with job-submission problems that were solved
> only by removing the old files in the /opt/globus/tmp/gram_job_state/
> directory.
>
> The symptoms were that a submitted job managed to pass from the WMS
> to the CE, and on the CE from Globus to the batch system (LSF 7.3),
> and finished correctly... and then everything stopped, with no error
> messages to the user. The status always showed the job in one of
> the states "Scheduled" or "Running", but never "Done".
>
> The number of files accumulated in the /opt/globus/tmp/gram_job_state/
> directory was ~31000. Once they were removed, the situation
> improved, but it is still a little slow to show the status
> "Done" to the user.
> I checked the communication between the CE and the WMS; it's working.
>
> The CE and WMS versions are the latest ones released to production
> (Update 41).
> Is there any way I could understand what happened, i.e. why there was
> such a huge number of files in that directory?
>
> Thank you,
> Cristina
>
> --
> ---
> Cristina Aiftimiei - EGEE-III/ETICS-II Projects
> Ist. Naz. di Fisica Nucleare - Padova
> Address: via F. Marzolo, 8 - 35131 Padova - ITALY
> Phone: +39.049.8277005
> Mobile: +39.3460230488