Filippidis christos wrote:
> hi again,
>
> i have an sft job with a "problem" right now (actually it has been
> "running" since yesterday)
>
> the info i can get for this job is:
> (i don't know how to get more)
Do not trust the output of "qstat" or "pbsnodes": PBS/Torque has bugs and
occasionally gets into a bad state. Log in to the WN and look around
with "ps" etc.
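
If it helps, here is a minimal sketch of what "looking around" could mean:
list every process owned by the job's user on the WN, with its parent and
how long it has been running, so the numbers can be compared with the
job's start time. It assumes a procps-style "ps" that accepts
"-eo pid,ppid,user,etime,args"; the helper names (etime_to_seconds,
processes_of, snapshot) are made up for this example and are not part of
PBS/Torque or the SFT framework.

```python
import subprocess


def etime_to_seconds(etime):
    """Convert the ps ELAPSED format [[dd-]hh:]mm:ss to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:  # pad missing hours with 0
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def processes_of(ps_output, user):
    """Return (pid, ppid, elapsed_seconds, command) for one user's processes."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # first line is the header
        fields = line.split(None, 4)
        if len(fields) < 5 or fields[2] != user:
            continue
        pid, ppid, _, etime, args = fields
        rows.append((int(pid), int(ppid), etime_to_seconds(etime), args))
    return rows


def snapshot(user):
    """Run ps on the local machine (the WN) and filter by user."""
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,user,etime,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    return processes_of(out, user)
```

Running snapshot("dteam002") on the WN and keeping the result together
with the time of the check should show which process of the job is still
alive and for how long it has been there.
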
> arxiloxos6.inp.demokritos.gr
> state = free
> np = 2
> properties = lcgpro
> ntype = cluster
> jobs = 0/3572.xg009.inp.demokritos.gr
> status = arch=linux,uname=Linux arxiloxos6.inp.demokritos.gr 2.4.21-32.0.1.EL.cernsmp #1 SMP Thu May 26 12:29:50 CEST 2005 i686,sessions=2389 6108,nsessions=2,nusers=2,idletime=5778321,totmem=1554432kb,availmem=1111516kb,physmem=510216kb,ncpus=2,loadave=0.00,rectime=1129124056
>
> [root@xg009 root]# qstat
> Job id           Name             User             Time Use S Queue
> ---------------- ---------------- ---------------- -------- - -----
> 3572.xg009       STDIN            dteam002         00:00:22 R dteam
>
> you can see at
> https://lcg-sft.cern.ch:9443/sft/sitehistory.cgi?site=xg009.inp.demokritos.gr
> this causes many problems, because today i don't get any new sft jobs,
> probably because it seems that there is a dteam job still running,
>
> if i delete this job then i will get new sft jobs until 18:00, and then
> the same thing will happen again
>
>
> thanks
> xristos
>
>
>
>>Hi Guys,
>>
>>I was trying to figure out why the test job could hang, but I must
>>admit that I was unable to reproduce the problem. Normally all tests
>>are killed automatically after 15 minutes by the SIGALRM signal
>>handler (the signal handler sends a KILL signal to the test process),
>>and when I try to simulate hanging tests everything works fine for me.
>>
>>Could you please check the list of running processes on the WN the
>>next time it happens? And, if possible, could you also note down the
>>time when the job actually started to execute and the time when you
>>checked the process table...
>>This is the most obvious way we can investigate what is happening.
>>
>>Piotr
>>
>>On Oct 12, 2005, at 1:00 PM, Gerhard Walzel wrote:
>>
>>
>>>Judit,
>>>I have exactly the same problem at the Hephy-Vienna site.
>>>Just starting at 00:15!
>>>For the last few days I have simply removed the job to enable
>>>SFT tests again...
>>>Gerhard
>>>
>>>
>>>On 10/12/05 11:59 AM, "NOVAK Judit" wrote:
>>>
>>>
>>>
>>>>Hi Christos,
>>>>
>>>>
>>>>In the site history I can see two Job Submission failures,
>>>>both from last week. The last one ran into a timeout (while gstat
>>>>reports many free CPUs -- is everything OK with the batch system?).
>>>>
>>>>
>>>>Judit
>>>>
>>>>
>>>>
>>>>
>>>>On Tue, Oct 11, Filippidis christos wrote:
>>>>
>>>>
>>>>>hi to all,
>>>>>
>>>>>i have the following problem:
>>>>>
>>>>>our site here at demokritos has been passing the sft, but for the
>>>>>last week, every day when dteam002
>>>>>"/c=ch/o=cern/ou=grid/cn=judit novak 0973" sends an sft at 18:00,
>>>>>the job never ends, or it stops the next day, and the result is
>>>>>CT or js
>>>>>
>>>>>at the same time, when i send an sft from this site:
>>>>>https://monitoring.egee.man.poznan.pl/
>>>>>everything is ok,
>>>>>
>>>>>
>>>>>it is also strange that when judit novak sends an sft at another
>>>>>time of day, for example in the morning, the sft is successful.
>>>>>
>>>>>do you have any ideas?
>>>>>
>>>>>thanks xristos
>>>>>
>>>>>
>>>>>Christos Filippidis
>>>>>NCSR DEMOKRITOS
>>>>>Institute of Nuclear Physics
>>>>>office block 6(ktirion 6)
>>>>>Gr-15310 Agia Paraskevi
>>>>>GREECE
>>>>>Tel:2106503425
>>>>>
>>>>>http://consult.cern.ch/xwho/people/117002
>>>>>http://www.inp.demokritos.gr/~filippidisx/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>----------------------------------------------
>>>>>
>>>>>"Institute of Nuclear Physics NCSR Demokritos"
>>>>> http://www.inp.demokritos.gr/
>>>>>
>>>
>