Hi,
I expect CondorG-submitted pilots from ATLAS are part of the perceived 
problem. The beauty of pilots is that we don't care whether they run or not, 
so feel free to kill the queued atlas prd jobs.

If you want to drain the site for downtime, then you should stop the 
queues in plenty of time beforehand, i.e. don't start jobs. It is not 
sufficient to say you are closed; lock the door. Torque's qstop and qdisable 
stop new job starts and new submissions respectively. This has some effect 
on the info, but it doesn't rely on the info to block submissions.

Killing running jobs is another matter and should be avoided, but ATLAS 
prodsys would deal with this too.

Info can be wrong - you publish

GlueHostMainMemoryRAMSize: 1024
GlueHostArchitectureSMPSize: 2

According to the Glue spec I should divide the machine RAM by the number 
of slots and so, to the best of my knowledge, assume 512 MB per slot. In this 
case ATLAS would send no jobs at all to Cambridge. Just so you understand 
why we are selective about the Info published.
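
As a rough sketch of that arithmetic in Python: the function names and the 
1024 MB per-slot minimum below are made up for illustration only, and are 
not an actual ATLAS requirement or any real brokering code.

```python
def ram_per_slot_mb(main_memory_ram_size_mb: int, smp_size: int) -> float:
    """Divide the published host RAM (MB) by the published number of slots."""
    if smp_size <= 0:
        raise ValueError("smp_size must be a positive number of slots")
    return main_memory_ram_size_mb / smp_size


def meets_minimum(main_memory_ram_size_mb: int, smp_size: int,
                  min_mb_per_slot: float) -> bool:
    """True if the per-slot RAM reaches a (hypothetical) job minimum."""
    return ram_per_slot_mb(main_memory_ram_size_mb, smp_size) >= min_mb_per_slot


# Values published by the site, as quoted above.
published_ram_mb = 1024   # GlueHostMainMemoryRAMSize
published_slots = 2       # GlueHostArchitectureSMPSize

per_slot = ram_per_slot_mb(published_ram_mb, published_slots)

# Hypothetical minimum, assumed here purely for the example.
assumed_min_mb_per_slot = 1024
site_is_eligible = meets_minimum(published_ram_mb, published_slots,
                                 assumed_min_mb_per_slot)
```

With the published values the per-slot figure comes out at 512 MB, so any 
job asking for more than that per slot would skip the site if the 
published Info were taken at face value.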

Cheers,
Rod.




On Mon, 15 Sep 2008, Santanu Das wrote:

> Hi Steve,
>> On Mon, Sep 15, 2008 at 3:57 PM, Santanu Das <[log in to unmask]> 
>> wrote:
>> 
>>> Greetings all!!
>>> 
>>> What should a site do about jobs from VOs and/or users who don't pay
>>> attention to the GlueCEStateStatus?
>>> 
>>> Our site is logged "down" and all the queues have been published as
>>> "Closed" since midnight, but jobs from several VOs keep coming in. So,
>>> can I simply remove all the queued/running jobs?
>>> 
>> 
>> It depends how ruthless you want to be.
>> 
>> Deleting the jobs is okay, and then when the support requests come you can
>> let them know what happened, but it's best if you contact the users in
>> question first and try to find out why they are not respecting your
>> GlueCEStateStatus.
>> 
> That's what I did, but it forces the site to reschedule its work plan, which 
> sometimes brings other inconveniences. I put out "down" and "Closed" last 
> night for a day, assuming the site would be free in the morning, which did 
> not happen. Now, as it stands,
>
>  1. we are down for nothing for the entire day,
>  2. I wasn't able to do what I was supposed to do today,
>  3. I need to log another downtime,
>  4. and I have to reschedule my to-do list.
>
>
> A couple of VOs (and their users) are just a pain in the neck, and it is 
> almost certain that they are not going to get back with anything. So the 
> entire thing ends up a complete mess, which causes huge inconvenience to 
> the site.
>
> Cheers,
> Santanu
>
>

-- 
Tel. +1 604 222 7667