Hi Steve

No, the box isn't overloaded:

10:46:11  up 145 days, 12:55,  1 user,  load average: 0.17, 0.20, 0.19
625 processes: 620 sleeping, 1 running, 4 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
             total    0.0%    0.0%    0.5%   0.0%     0.0%    0.9%   98.3%
             cpu00    0.0%    0.0%    0.7%   0.0%     0.0%    0.0%   99.2%
             cpu01    0.1%    0.0%    0.3%   0.0%     0.0%    1.9%   97.4%
Mem:  8195868k av, 3505276k used, 4690592k free,       0k shrd, 376700k buff
       1191472k active,            1615960k inactive
Swap: 4192880k av,       0k used, 4192880k free,  2008148k cached

It's quite a beefy machine.

Most of the load comes from the top-level BDII, which runs on the same box.
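For what it's worth, a quick way to sanity-check that claim is to sum the CPU share of the BDII processes (the process-name pattern "bdii" here is a guess at the local install; adjust it to match whatever ps shows on your mon box):

```shell
# Sum %CPU across processes whose command name matches "bdii".
# The name pattern is an assumption -- check `ps -eo comm` first.
ps -eo pcpu,comm | awk '$2 ~ /bdii/ {sum += $1} END {printf "BDII CPU: %.1f%%\n", sum + 0}'
```

If that number dominates the non-idle CPU in top, the BDII really is the main consumer.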

Where's the test client?

Cheers

Graeme

PS. It's actually been somewhat better this week (because the cluster is quiet?), but see the problems logged last weekend here:

http://www.gridpp.ac.uk/wiki/Glasgow_Middleware_Operations_Logbook

On 21 Mar 2007, at 10:38, Fisher, SM (Steve) wrote:

> Is your mon box overloaded? Does the r-gma client check script run
> quickly?
>
> Steve
>
>> -----Original Message-----
>> From: Testbed Support for GridPP member institutes
>> [mailto:[log in to unmask]] On Behalf Of Graeme Stewart
>> Sent: 21 March 2007 10:16
>> To: [log in to unmask]
>> Subject: Re: SAME tests run in job wrapper
>>
>> Yes, I've noticed this at Glasgow - and in particular R-GMA is adding
>> 5-15 minutes of wallclock time to every job, which is a
>> terrible waste of resources (particularly for our GRAM GT2
>> user groups, who run some pretty short jobs).
>>
>> Something I really need to look into...
>>
>> g
>>
>> On 21 Mar 2007, at 09:26, Stephen Childs wrote:
>>
>>> Apologies if this has already been discussed, but is it the
>> case that
>>> some SAME tests now get run within the job wrapper? I
>> remember seeing
>>> something about this at some stage and looking on a node where I'm
>>> running a simple /bin/hostname using globus-job-run I see
>> this kind of
>>> stuff:
>>>
>>> 6142      7118  0.3  2.6  9372 6848 ?        S    09:19   0:01
>>> python /opt/lcg/same//client/bin/same-exec --nodetest WN
>>> gridmon.cs.tcd.ie -- stage=start.stage1 sh_results=/home/gitest042/
>>> globus-tmp.gridmon.6767.0/.same_result.xq7111 sh_results_all=/home/
>>> gitest042/globus-tmp.gridmon.6767.0/.same_details.Fv7113
>>> starttime=1174468780 stage2_dir=/home/gitest042/globus-tmp.gridmon.
>>> 6767.0/.same_stage2.oF7109
>>>
>>> Of course the problem is that the SAME stuff is getting
>> stuck at the
>>> moment. Is there any way to disable it?
>>>
>>> Stephen
>>> --
>>> Dr. Stephen Childs,
>>> Research Fellow, EGEE Project,    phone:
>>> +353-1-8961797
>>> Computer Architecture Group,      email:        Stephen.Childs @
>>> cs.tcd.ie
>>> Trinity College Dublin, Ireland   web: http://www.cs.tcd.ie/
>>> Stephen.Childs
>>
>> --
>> Dr Graeme Stewart - http://wiki.gridpp.ac.uk/wiki/User:Graeme_stewart
>> GridPP DM Wiki - http://wiki.gridpp.ac.uk/wiki/Data_Management
>> ScotGrid - http://www.scotgrid.ac.uk/ http://scotgrid.blogspot.com/
>>

--
Dr Graeme Stewart - http://wiki.gridpp.ac.uk/wiki/User:Graeme_stewart
GridPP DM Wiki - http://wiki.gridpp.ac.uk/wiki/Data_Management
ScotGrid - http://www.scotgrid.ac.uk/ http://scotgrid.blogspot.com/