Dear Rollout,
An ATLAS user has been advised to use this list to post messages about a
problem he is having. In particular, it seems that several sites are
attracting jobs even when their queues are overfull, although the ranking
expression he is using seems reasonable.
He cannot post to ROLLOUT since he is not on the list, and he is trying to
use the GGUS portal. In the meantime I am forwarding his message. The
interested users are in cc.
Cheers
Simone Campana
***** Start of forwarded message from Andreas *****
Dear lcg-rollout,
I have encountered a problem with the distribution of my jobs.
I submitted 100 jobs yesterday, and more than 90 were still scheduled
after 24 hours.
So I canceled them and resubmitted them. It is a bit better today: 30
out of 90 have started, and the rest are still scheduled. That is no
surprise, because they were all sent to the same site:
skurut17.cesnet.cz
So I wonder if I can use a different ranking statement in my jdl file so
that jobs are not sent to sites which are completely overloaded anyway
(even a random distribution would be fine with me, if that is possible
:-) )
Btw.: I cannot even find this site (skurut17.cesnet.cz) when I type
lcg-infosites --vo atlas ce | grep skurut17.cesnet.cz
My LCG_GFAL_INFOSYS is set to atlas-bdii.cern.ch:2170.
Any hint on how I can improve the job submission is welcome.
Thanks
Andi
P.S.:
Here is my ranking statement:
Rank = (other.GlueCEStateWaitingJobs == 0)
  ? ( (other.GlueCEStateFreeCPUs * 100) /
      ((other.GlueCEStateRunningJobs == 0) ? 1 : other.GlueCEStateRunningJobs) )
  : ( -(other.GlueCEStateWaitingJobs * 100) / other.GlueCEStateRunningJobs );
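
To make it easier to see how this statement scores a CE, it can be
transcribed into Python as a rough sketch. The function name and the
sample CE states are hypothetical and not taken from any real site. The
sketch uses ordinary floating-point division, so it does not model how
the broker's own evaluator handles integer division or a zero divisor.

```python
def rank(free_cpus, running_jobs, waiting_jobs):
    """Python transcription of the JDL Rank expression (illustrative only)."""
    if waiting_jobs == 0:
        # Empty queue: score by free CPUs relative to running jobs,
        # using a divisor of 1 when nothing is running.
        divisor = 1 if running_jobs == 0 else running_jobs
        return (free_cpus * 100) / divisor
    # Non-empty queue: negative score that gets worse as the queue grows.
    # This branch has no guard for running_jobs == 0; in Python it raises
    # ZeroDivisionError in that case.
    return -(waiting_jobs * 100) / running_jobs


# Hypothetical CE states: (free CPUs, running jobs, waiting jobs)
examples = {
    "idle CE": (20, 0, 0),
    "busy CE, empty queue": (5, 50, 0),
    "overloaded CE": (0, 100, 400),
}

for name, state in examples.items():
    print(f"{name}: {rank(*state)}")
```

With these made-up numbers the overloaded CE gets a negative rank and
the idle one a large positive rank. The case of waiting jobs with no
running jobs is the only one the statement does not protect against.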