FYI

-------- Original Message --------
Subject: ATLAS sites, please attention to this!
Date: Mon, 1 Oct 2007 12:36:18 +0200
From: Simone Campana
To: atlas-comp-oper (ATLAS Computing Operations)

Dear ATLAS sites,

please pay particular attention to this. Most sites are publishing wrong numbers in the ATLAS VOView. This is particularly bad for job distribution, since the WMS looks at the VOView information to decide whether a site is empty or full. As a result, jobs are piling up at sites which are already full, while sites which could run jobs are left empty.

Jeff offered an explanation (see mail below) with a description of how he fixed the problem. The problem was also reported at the last LCG operations meeting, but the situation still looks particularly bad.

I put a list of problematic CEs in
http://voatlas01.cern.ch/atlas/data/VOViewProblem.log

Next to each CE name you will find some numbers. They are the number of waiting and running jobs from the "all inclusive" view (which covers all VOs supported in that queue) and the number of waiting and running jobs obtained by adding up all the VOViews for the VOs supported by that site. In general the two numbers should be the same, for both waiting and running jobs, but they are not.

Further documentation on debugging Information Providers, besides Jeff's explanation, is in this TWiki page from Laurence:
http://twiki.cern.ch/twiki/bin/view/EGEE/TestingDynamicInformation

Could ATLAS sites please investigate and give a statement, possibly fixing the problem (there may be some false positives, but generally the problem is there). If this situation lasts long it will be quite bad for production, and at some point drastic measures like site banning will necessarily be enforced.

Thanks for your attention.
Simone

> -----Original Message-----
> From: Jeff Templon
> Sent: Friday, September 28, 2007 3:43 PM
> To: Nicholas Thackray; atlas-comp-oper (ATLAS Computing Operations);
> Gergely Debreczeni
> Subject: changed lcg-info-dynamic-scheduler.conf
>
> Hi *,
>
> It was reported at the ops meeting (and associated tickets opened)
> that most ATLAS jobs were invisible from the VOViews published by
> sites.
>
> The case here was that the dynamic-scheduler was being configured to
> map special groups to FQANs, while publishing of these FQANs was turned
> off in the machine BDII. Hence yes, the special groups have been
> configured to be invisible.
>
> I turned them back on, by configuring the dynamic scheduler (by
> hand) to map all VO special groups to the generic VO. I suspect this
> is what is needed at other sites as well; perhaps you could check and
> announce the necessary changes. Please make sure that the announcement
> is worded correctly: people seem to be getting the impression that
> there is a bug in the information provider, but it is definitely not
> responsible for what is seen; the info provider is doing exactly what
> is requested!
>
> Gergo, could you check: I suspect that YAIM is still doing this
> group-to-FQAN mapping, even though publishing is turned off. That's
> the only way I can understand that 130 separate queues are affected.
>
> JT
>
> ps: what I did here to fix it is in the attachment. Don't blindly
> apply the patch because your site's mix of VOs may be different.
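
For sites that want to run the same comparison on their own numbers, the consistency check described above (totals from the "all inclusive" view against the sum of the per-VO VOViews) can be sketched roughly as follows. This is only an illustration: the `View` class, the `voview_mismatch` function and the job counts are hypothetical and not taken from any site, and fetching the real values from the information system is left out.

```python
from dataclasses import dataclass


@dataclass
class View:
    """Waiting and running job counts published for one view (hypothetical container)."""
    waiting: int
    running: int


def voview_mismatch(ce_total, voviews):
    """Return (waiting_diff, running_diff) between the all-inclusive view
    and the sum over the per-VO views. (0, 0) means the numbers agree."""
    summed_waiting = sum(v.waiting for v in voviews.values())
    summed_running = sum(v.running for v in voviews.values())
    return (ce_total.waiting - summed_waiting,
            ce_total.running - summed_running)


# Made-up example: the queue as a whole reports 10 waiting and 40 running
# jobs, but the per-VO views only account for 5 waiting and 35 running.
ce_total = View(waiting=10, running=40)
voviews = {
    "atlas": View(waiting=2, running=15),
    "lhcb": View(waiting=3, running=20),
}

waiting_diff, running_diff = voview_mismatch(ce_total, voviews)
if (waiting_diff, running_diff) != (0, 0):
    print(f"mismatch: {waiting_diff} waiting and {running_diff} running jobs "
          "are not visible in any VOView")
```

A non-zero difference would be consistent with jobs from special groups not being counted in any published VOView, which is the situation Jeff describes, although as noted above there may be false positives.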