Hello,
While investigating why we were failing the Atlas test again, I found
(once more) that many Atlas production (atlasprd) jobs are sleeping:
[root@epcf25 root]# ps -ef | grep sleep
atlasprd 27603 23385 0 12:02 ? 00:00:00 sleep 9600
atlasprd 27604 23386 0 12:02 ? 00:00:00 sleep 9600
root 27712 8088 0 12:17 pts/0 00:00:00 grep sleep
[root@epcf28 root]# ps -ef | grep sleep
atlasprd 19537 6438 0 10:11 ? 00:00:00 sleep 9600
atlasprd 19667 19352 0 11:42 ? 00:00:00 sleep 9600
root 19873 19716 0 12:18 pts/0 00:00:00 grep sleep
[root@epcf28 root]#
The same is true on three other worker nodes!
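For anyone who wants to check their own workers, here is a minimal sketch
of how the sleepers could be counted. The function name count_sleepers is
made up for illustration, and it assumes the standard `ps -ef` column
layout shown above (owner in column 1, command in column 8):

```shell
# Count 'sleep' processes owned by a given user in `ps -ef`-style
# output read from stdin. Matching on the command column (field 8)
# also avoids counting the "grep sleep" line itself.
count_sleepers() {
    awk -v user="$1" '$1 == user && $8 == "sleep" { n++ } END { print n + 0 }'
}

# Example use on a worker node (atlasprd is the account seen above):
#   ps -ef | count_sleepers atlasprd
```

Fed the epcf25 listing above, this should report 2 for atlasprd.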
I opened GGUS ticket 25848 about this back in August, but no one has
addressed it yet. Are they using CRONUS, and is it really that bad?
I do not think this has anything to do with my failing Steve's test:
that failure is attributed to a missing file which is in fact available
in the Atlas software area on all my workers. I shall take this offline
with Frederic.
Are VOs really claiming that pilot jobs are necessary because they allow
them to make more effective use of resources?
We should definitely do wall time accounting rather than CPU time
accounting.
Have other sites seen this too?
Yves