
Hi, all,

There is a problem with our Torque job manager. On our worker nodes,
"pbsnodes -a" shows the following:

 ====================================================================================
wn106.grid.ucy.ac.cy
     state = free
     np = 2
     properties = lcgpro
     ntype = cluster
     jobs = 0/3940.ce101.grid.ucy.ac.cy, 1/3256.ce101.grid.ucy.ac.cy
     status = arch=linux,uname=Linux wn106.grid.ucy.ac.cy 2.4.21-20.ELsmp #1 SMP Thu Sep 2 16:47:25 CDT 2004 i686,sessions=3359,nsessions=1,nusers=1,idletime=151543,totmem=3065116kb,availmem=2612552kb,physmem=1024872kb,ncpus=4,loadave=0.00,rectime=1108626381

 ======================================================================================

As you can see, two jobs (3940 and 3256) are assigned to wn106, which fills
both of its np = 2 slots, BUT the state is still "free" where I would expect
"job-exclusive". As a result, jobs keep coming to my CE and the queue fills
up with them.
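To check all worker nodes for the same mismatch, the "pbsnodes -a" output can be scanned for nodes reported as "free" whose "jobs" list already covers every "np" slot. Below is a rough Python sketch under some assumptions. It expects unwrapped output, with each attribute on a single indented line. A wrapped line like the "status" line pasted above would be misread as a new node. The helper names parse_pbsnodes and inconsistent_nodes are hypothetical and are not part of Torque.

```python
# Sample "pbsnodes -a" record copied from the output above.
# The long "status" line is omitted because the parser expects one attribute per line.
SAMPLE = """\
wn106.grid.ucy.ac.cy
     state = free
     np = 2
     properties = lcgpro
     ntype = cluster
     jobs = 0/3940.ce101.grid.ucy.ac.cy, 1/3256.ce101.grid.ucy.ac.cy
"""


def parse_pbsnodes(text):
    """Split pbsnodes-style output into {node_name: {attribute: value}}.

    Hypothetical helper: assumes each record starts with an unindented node
    name followed by indented "key = value" lines, records separated by
    blank lines.
    """
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None          # blank line ends the current record
            continue
        if not line[0].isspace():
            current = nodes.setdefault(line.strip(), {})  # new node name
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")  # split on the first "=" only
            current[key.strip()] = value.strip()
    return nodes


def inconsistent_nodes(nodes):
    """Return (name, np, assigned_jobs) for nodes whose state includes
    "free" although every one of their np slots has a job assigned."""
    result = []
    for name, attrs in nodes.items():
        np_slots = int(attrs.get("np", "1"))
        jobs = [j for j in attrs.get("jobs", "").split(",") if j.strip()]
        states = attrs.get("state", "").split(",")
        if "free" in states and len(jobs) >= np_slots:
            result.append((name, np_slots, len(jobs)))
    return result


if __name__ == "__main__":
    for name, np_slots, n_jobs in inconsistent_nodes(parse_pbsnodes(SAMPLE)):
        print(f"{name}: state=free but {n_jobs}/{np_slots} slots have jobs")
```

On a real Torque server, the input could come from subprocess.run(["pbsnodes", "-a"], capture_output=True, text=True).stdout instead of the embedded sample.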

ce101.grid.ucy.ac.cy:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
104.ce101.grid. lhcb001  lhcb     STDIN        7798   1  --    --  48:00 E 45:59
2190.ce101.grid lhcb001  lhcb     STDIN       20952   1  --    --  48:00 E 10:14
3256.ce101.grid lhcb001  lhcb     STDIN       23316   1  --    --  48:00 E 00:00
3870.ce101.grid lhcb001  lhcb     STDIN       10357   1  --    --  48:00 E 15:08
3872.ce101.grid lhcb001  lhcb     STDIN       11166   1  --    --  48:00 E 15:03
3940.ce101.grid lhcb001  lhcb     STDIN       30751   1  --    --  48:00 E 06:18
4014.ce101.grid dteam002 short    STDIN        2826   1  --    --  00:15 E 00:00
4652.ce101.grid lhcb001  lhcb     STDIN         --    1  --    --  48:00 Q   --
4653.ce101.grid lhcb001  lhcb     STDIN         --    1  --    --  48:00 Q   --
4654.ce101.grid lhcb001  lhcb     STDIN         --    1  --    --  48:00 Q   --
4715.ce101.grid lhcb002  lhcb     STDIN         --    1  --    --  48:00 Q   --
4716.ce101.grid lhcb002  lhcb     STDIN         --    1  --    --  48:00 Q   --
4717.ce101.grid lhcb002  lhcb     STDIN         --    1  --    --  48:00 Q   --
4718.ce101.grid lhcb002  lhcb     STDIN         --    1  --    --  48:00 Q   --
4719.ce101.grid lhcb002  lhcb     STDIN         --    1  --    --  48:00 Q   --
4720.ce101.grid lhcb002  lhcb     STDIN         --    1  --    --  48:00 Q   --
4721.ce101.grid lhcb002  lhcb     STDIN         --    1  --    --  48:00 Q   --
4722.ce101.grid lhcb002  lhcb     STDIN         --    1  --    --  48:00 Q   --
4723.ce101.grid lhcb002  lhcb     STDIN         --    1  --    --  48:00 Q   --
4724.ce101.grid lhcb002  lhcb     STDIN         --    1  --    --  48:00 Q   --
4725.ce101.grid lhcb002  lhcb     STDIN         --    1  --    --  48:00 Q   --
4727.ce101.grid lhcb002  lhcb     STDIN         --    1  --    --  48:00 Q   --
4726.ce101.grid lhcb002  lhcb     STDIN         --    1  --    --  48:00 Q   --
4728.ce101.grid lhcb002  lhcb     STDIN         --    1  --    --  48:00 Q   --
4729.ce101.grid lhcb002  lhcb     STDIN         --    1  --    --  48:00 Q   --
4929.ce101.grid dteam021 short    qsub.test.    --   --  --    --  00:15 Q   --
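To see how the jobs on the CE break down by queue and state, a qstat listing like the one above can be tallied with a short script. In the S column, R means running, Q means queued and E means exiting. This is a rough sketch, not a Torque tool. It assumes the default "qstat -a" column layout with each job on a single line, where the queue is the third field and the state is the second-to-last. The function name job_states is hypothetical.

```python
from collections import Counter

# A few rows from the qstat listing above, one job per line.
SAMPLE = """\
104.ce101.grid. lhcb001  lhcb     STDIN        7798   1  --    --  48:00 E 45:59
3256.ce101.grid lhcb001  lhcb     STDIN       23316   1  --    --  48:00 E 00:00
4014.ce101.grid dteam002 short    STDIN        2826   1  --    --  00:15 E 00:00
4652.ce101.grid lhcb001  lhcb     STDIN         --    1  --    --  48:00 Q   --
4929.ce101.grid dteam021 short    qsub.test.    --   --  --    --  00:15 Q   --
"""


def job_states(text):
    """Return a Counter of (queue, state) pairs from qstat -a style rows.

    Hypothetical helper: assumes 11 whitespace-separated columns per job row.
    Header and separator lines are skipped because they either have a
    different number of fields or no single-letter state column.
    """
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 11:
            continue
        queue, state = fields[2], fields[-2]
        if len(state) != 1 or not state.isalpha():
            continue                # e.g. the "----- - -----" separator line
        counts[(queue, state)] += 1
    return counts


if __name__ == "__main__":
    for (queue, state), n in sorted(job_states(SAMPLE).items()):
        print(f"{queue:8} {state} {n}")
```

Tallied by hand in the same way, the complete listing above has seven jobs in state E and nineteen in state Q, with none in R. The seven E jobs include 3940 and 3256, the two jobs assigned to wn106.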

===========================================================

Does anyone have an idea what could be causing this?


Regards

Wei


--
============================================================
Wei Xing, M.Sc.
Research Associate                    Tel: 00357-22892663
Dept. of Computer Science             Fax: 00357-22892701
University of Cyprus                  email: [log in to unmask]
PO Box 20537
CY1678, Nicosia, CYPRUS