Hi all,
There is a problem with our Torque job manager. On our worker nodes,
"pbsnodes -a" gives output like this:
====================================================================================
wn106.grid.ucy.ac.cy
     state = free
     np = 2
     properties = lcgpro
     ntype = cluster
     jobs = 0/3940.ce101.grid.ucy.ac.cy, 1/3256.ce101.grid.ucy.ac.cy
     status = arch=linux,uname=Linux wn106.grid.ucy.ac.cy 2.4.21-20.ELsmp #1 SMP Thu Sep 2 16:47:25 CDT 2004 i686,sessions=3359,nsessions=1,nusers=1,idletime=151543,totmem=3065116kb,availmem=2612552kb,physmem=1024872kb,ncpus=4,loadave=0.00,rectime=1108626381
======================================================================================
As you can see, two jobs are running on wn106 (which has np = 2), but its
state is still "free". As a result, jobs keep arriving at my CE and the
queue fills up:
ce101.grid.ucy.ac.cy:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
104.ce101.grid. lhcb001  lhcb     STDIN        7798   1  --     -- 48:00 E 45:59
2190.ce101.grid lhcb001  lhcb     STDIN       20952   1  --     -- 48:00 E 10:14
3256.ce101.grid lhcb001  lhcb     STDIN       23316   1  --     -- 48:00 E 00:00
3870.ce101.grid lhcb001  lhcb     STDIN       10357   1  --     -- 48:00 E 15:08
3872.ce101.grid lhcb001  lhcb     STDIN       11166   1  --     -- 48:00 E 15:03
3940.ce101.grid lhcb001  lhcb     STDIN       30751   1  --     -- 48:00 E 06:18
4014.ce101.grid dteam002 short    STDIN        2826   1  --     -- 00:15 E 00:00
4652.ce101.grid lhcb001  lhcb     STDIN          --   1  --     -- 48:00 Q    --
4653.ce101.grid lhcb001  lhcb     STDIN          --   1  --     -- 48:00 Q    --
4654.ce101.grid lhcb001  lhcb     STDIN          --   1  --     -- 48:00 Q    --
4715.ce101.grid lhcb002  lhcb     STDIN          --   1  --     -- 48:00 Q    --
4716.ce101.grid lhcb002  lhcb     STDIN          --   1  --     -- 48:00 Q    --
4717.ce101.grid lhcb002  lhcb     STDIN          --   1  --     -- 48:00 Q    --
4718.ce101.grid lhcb002  lhcb     STDIN          --   1  --     -- 48:00 Q    --
4719.ce101.grid lhcb002  lhcb     STDIN          --   1  --     -- 48:00 Q    --
4720.ce101.grid lhcb002  lhcb     STDIN          --   1  --     -- 48:00 Q    --
4721.ce101.grid lhcb002  lhcb     STDIN          --   1  --     -- 48:00 Q    --
4722.ce101.grid lhcb002  lhcb     STDIN          --   1  --     -- 48:00 Q    --
4723.ce101.grid lhcb002  lhcb     STDIN          --   1  --     -- 48:00 Q    --
4724.ce101.grid lhcb002  lhcb     STDIN          --   1  --     -- 48:00 Q    --
4725.ce101.grid lhcb002  lhcb     STDIN          --   1  --     -- 48:00 Q    --
4727.ce101.grid lhcb002  lhcb     STDIN          --   1  --     -- 48:00 Q    --
4726.ce101.grid lhcb002  lhcb     STDIN          --   1  --     -- 48:00 Q    --
4728.ce101.grid lhcb002  lhcb     STDIN          --   1  --     -- 48:00 Q    --
4729.ce101.grid lhcb002  lhcb     STDIN          --   1  --     -- 48:00 Q    --
4929.ce101.grid dteam021 short    qsub.test.     --  --  --     -- 00:15 Q    --
===========================================================
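In case it helps anyone reproduce the check on other nodes, here is a minimal Python sketch that flags nodes whose "jobs" list already fills all "np" slots while the state still reads "free". It assumes the usual "pbsnodes -a" layout (node name on its own line, "key = value" attribute lines, each attribute on one unwrapped line) and that a node with as many jobs as slots would normally not be reported as "free". The function names parse_pbsnodes and inconsistent_nodes are made up for this example, and SAMPLE is an abridged copy of the wn106 record above.

```python
# Abridged wn106 record from the "pbsnodes -a" output above (status line omitted).
SAMPLE = """\
wn106.grid.ucy.ac.cy
     state = free
     np = 2
     properties = lcgpro
     ntype = cluster
     jobs = 0/3940.ce101.grid.ucy.ac.cy, 1/3256.ce101.grid.ucy.ac.cy
"""


def parse_pbsnodes(text):
    """Parse pbsnodes-style text into {node_name: {attribute: value}}."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate node records
        if " = " in line:
            # Attribute line; split on the first " = " only, so values such as
            # the status string (which contains "=" without spaces) stay intact.
            if current is not None:
                key, _, value = line.strip().partition(" = ")
                nodes[current][key] = value
        else:
            # Any other non-blank line is taken to be a node name.
            current = line.strip()
            nodes[current] = {}
    return nodes


def inconsistent_nodes(nodes):
    """Return names of nodes reported as 'free' although every slot has a job."""
    flagged = []
    for name, attrs in nodes.items():
        slots = int(attrs.get("np", "0"))
        jobs = [j for j in attrs.get("jobs", "").split(",") if j.strip()]
        if attrs.get("state") == "free" and slots > 0 and len(jobs) >= slots:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    print(inconsistent_nodes(parse_pbsnodes(SAMPLE)))
```

On a real server the text would come from running "pbsnodes -a" (for example via subprocess.run) instead of SAMPLE; the sketch only reports the mismatch and does not explain why the state is not updated.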
Does anyone have any idea what could be causing this?
Regards
Wei
--
============================================================
Wei Xing, M.Sc.
Research Associate Tel: 00357-22892663
Dept. of Computer Science Fax: 00357-22892701
University of Cyprus
PO Box 20537
CY1678, Nicosia, CYPRUS