Hi Santanu,
looking at your BDII now, I can see that you are publishing 287 running
jobs and 126 waiting jobs. If I'm not wrong, you should only have 41 jobs
running and 18 jobs waiting as I write. I think lcg-info-dynamic-condor is
not clever enough to distinguish between VOs (maybe due to Condor
itself?), and publishes #jobs x #VOs, i.e. 41 x 7 = 287 running and
18 x 7 = 126 waiting jobs, since you support 7 VOs (because of the foreach
loop at the end of the lcg-info-dynamic-condor script). I read on some
mailing list that lcg-info-dynamic-condor was not written within EGEE;
perhaps its author is someone present at Condor Week?
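
To make the arithmetic concrete, here is a minimal Python sketch of the
effect I suspect. It is not the actual lcg-info-dynamic-condor code: the
function names are made up, the last three VO names are placeholders (only
the count of 7 matters), and mapping a job to a VO by the prefix of its
owner name is an assumption of mine.

```python
from collections import Counter

# Hypothetical VO list: the first four appear as job owners in the queue,
# the last three are placeholders so that the total is 7.
VOS = ["atlas", "cms", "lhcb", "dteam", "vo5", "vo6", "vo7"]


def publish_naive(running, waiting, vos):
    """Suspected behaviour: every VO is given the site-wide totals."""
    return {vo: (running, waiting) for vo in vos}


def vo_of(owner, vos):
    """Assumed mapping: pool account names start with the VO name."""
    for vo in vos:
        if owner.startswith(vo):
            return vo
    return None


def publish_per_vo(jobs, vos):
    """Count running (R) and idle (I) jobs separately for each VO."""
    running = Counter()
    waiting = Counter()
    for owner, state in jobs:
        vo = vo_of(owner, vos)
        if vo is None:
            continue  # e.g. a local job that belongs to no supported VO
        if state == "R":
            running[vo] += 1
        elif state == "I":
            waiting[vo] += 1
    return {vo: (running[vo], waiting[vo]) for vo in vos}


def aggregate(published):
    """Sum the per-VO values, as a consumer of the published data would."""
    return (sum(r for r, _ in published.values()),
            sum(w for _, w in published.values()))


print(aggregate(publish_naive(41, 18, VOS)))  # (287, 126)
```

With per-VO counting, the aggregate equals the real queue totals instead
of 7 times them.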
Anyway, this is only one part of the problem. What scripts have you got in
/opt/lcg/var/gip/plugin/ ? Could any CamGrid jobs also affect what you
publish?
Just saw your last mail and the 4444 is the hardcoded default in
lcg-info-static-ce.conf... What happens if you restart globus-mds?
Yves
> -----Original Message-----
> From: Testbed Support for GridPP member institutes
> [mailto:[log in to unmask]] On Behalf Of Santanu Das
> Sent: 26 April 2006 16:18
> To: [log in to unmask]
> Subject: Running and waiting jobs
>
> According to SFT, there are 609 running jobs and 154 waiting
> jobs ( GStat:
> 14:58:58 04/26/06 GMT), which is actually only 1 running and
> 18 waiting.
>
> -- Submitter: serv03.hep.phy.cam.ac.uk : <172.24.116.151:9512> :
> serv03.hep.phy.cam.ac.uk
> ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
> 22562.0 lhcb001 1/12 16:18 9+04:46:01 I 0 29.2 data
> 22564.0 lhcb001 1/12 16:32 9+02:53:26 I 0 19.1 data
> 22617.0 atlas007 1/12 21:04 12+10:58:52 I 0 27.9 data
> 22670.0 atlas007 1/12 21:19 12+06:12:42 I 0 27.8 data
> 22755.0 lhcb004 1/13 01:36 2+17:56:25 I 0 9.3 data
> 22896.0 atlas007 1/13 12:35 11+16:46:19 I 0 18.9 data
> 23112.0 atlas002 1/27 18:21 2+02:40:17 I 0 69.7 data
> 50137.0 cms004 3/27 17:00 3+01:42:25 I 0 113.9 data
> 50139.0 lhcb002 3/27 17:03 3+08:28:46 I 0 359.4 data
> 50196.0 lhcb002 3/28 12:48 2+11:20:31 I 0 373.3 data
> 50198.0 lhcb002 3/28 12:59 2+10:14:46 I 0 360.9 data
> 50231.0 lhcb002 3/28 17:49 2+04:40:29 I 0 360.9 data
> 50244.0 lhcb002 3/28 22:39 1+05:18:41 I 0 387.2 data
> 50278.0 lhcb002 3/29 05:15 1+16:17:27 I 0 358.4 data
> 50383.0 dteamsgm 3/31 10:23 21+16:33:02 I 0 27.8 data
> 50385.0 dteam001 3/31 14:17 4+18:35:03 I 0 86.7 data
> 51086.0 atlas002 4/21 15:17 0+09:08:55 I 0 53.3 data
> 51092.0 atlas002 4/21 15:49 0+06:23:45 I 0 53.3 data
> 51167.0 atlas006 4/25 17:17 0+22:04:33 R 0 123.0 data
>
> 19 jobs; 18 idle, 1 running, 0 held
>
>
> Sometimes it even reports more than 1500 waiting jobs,
> which is also not the actual case. Does anybody know what's
> wrong? I'm at Condor Week at the moment. If it has anything to
> do with Condor, I can ask one of the Condor team members,
> if that helps. Condor shows the correct result, though; SFT does not.
>
> Thanks,
> Santanu
>