Hi Yves
The cronus executor has been shut down. The production jobs you are
seeing will be coming from the standard EGEE grid LEXOR executor.
Have these jobs consumed CPU yet, or are they trying to get started?
I agree this is a terrible waste of sites' resources, and that has
been a big motivating factor in the decision to move ATLAS production
to PanDA. Because PanDA stages input datasets on the site's SE and
puts outputs onto the site's SE as well (it does all other data
movements asynchronously using ATLAS DDM), we will not see these large
data management timeouts which currently cripple ATLAS production in
EGEE.
If you send me the output from ps auxwww I'll try and see what the
jobs are doing. It's possible you can kill them off - but please
don't do it yet.
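
For illustration only, here is a minimal sketch of how ps output like
that quoted below could be filtered for sleeping production jobs. It
assumes the eight-column `ps -ef` layout (UID PID PPID C STIME TTY TIME
CMD) seen in the listing; the function name `find_sleepers` and the
`SAMPLE` text are hypothetical, not part of any grid tooling.

```python
def find_sleepers(ps_output, user="atlasprd"):
    """Return (pid, ppid, seconds) for each `sleep N` process owned by user.

    Assumes `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD [ARGS...].
    """
    sleepers = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Need at least the 8 fixed columns plus one argument to `sleep`.
        if len(fields) < 9 or fields[0] != user:
            continue
        if fields[7] == "sleep" and fields[8].isdigit():
            sleepers.append((int(fields[1]), int(fields[2]), int(fields[8])))
    return sleepers


# Sample lines copied from the epcf25 listing in the quoted message.
SAMPLE = """\
atlasprd 27603 23385 0 12:02 ? 00:00:00 sleep 9600
atlasprd 27604 23386 0 12:02 ? 00:00:00 sleep 9600
root 27712 8088 0 12:17 pts/0 00:00:00 grep sleep
"""

for pid, ppid, seconds in find_sleepers(SAMPLE):
    print(f"pid {pid} (parent {ppid}) is sleeping for {seconds} s")
```

The parent PIDs are the useful part: they point at the job wrapper that
issued the sleep, which is what needs inspecting before anything is
killed.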
Thanks
Graeme
PS. Yes, we also see inefficient atlasprd jobs at Glasgow.
On 15 Nov 2007, at 12:47, Yves Coppens wrote:
> Hello,
>
> While investigating why we were failing the Atlas test again, I found
> (once more) that many prd atlas jobs are sleeping.
>
> [root@epcf25 root]# ps -ef | grep sleep
> atlasprd 27603 23385 0 12:02 ? 00:00:00 sleep 9600
> atlasprd 27604 23386 0 12:02 ? 00:00:00 sleep 9600
> root 27712 8088 0 12:17 pts/0 00:00:00 grep sleep
>
> [root@epcf28 root]# ps -ef | grep sleep
> atlasprd 19537 6438 0 10:11 ? 00:00:00 sleep 9600
> atlasprd 19667 19352 0 11:42 ? 00:00:00 sleep 9600
> root 19873 19716 0 12:18 pts/0 00:00:00 grep sleep
> [root@epcf28 root]#
>
> and the same on three other worker nodes!
>
> I issued a GGUS ticket (25848) back in August about this, but no one has
> addressed it yet. Are they using CRONUS, and is it really that bad!?
>
> I do not think this has got anything to do with my failing Steve's test:
> the failure is caused by a missing file which is actually available in the
> Atlas software area on all my workers - I shall take this offline with
> Frederic.
>
> Are VOs really claiming that pilot jobs are necessary because they allow
> them to make more effective use of resources?
>
> We should definitely do wall time accounting rather than CPU time accounting.
>
> Have other sites seen this too?
>
> Yves
--
Dr Graeme Stewart - http://wiki.gridpp.ac.uk/wiki/User:Graeme_stewart
ScotGrid - http://www.scotgrid.ac.uk/ http://scotgrid.blogspot.com/