Hi Stephen,
Burke, S (Stephen) via RT wrote:
>>multiple brokers?
> https://savannah.cern.ch/bugs/?func=detailitem&item_id=7582
>
> Basically you can give a list of brokers (NS) and each NS can have a set
> of LBs specified.
In fact, that looks more like a "fail-over" solution
than a true "multiple brokers" == "load balancing" solution.
All in all, it doesn't solve the scalability issues we are talking about.
Can you confirm?
>>Now, we are just sysadmins reaching the point of speculation (...),
>>and the true developers of the software should pop-in for
>>real profiling, with concrete numbers and facts.
>
> The developers mostly don't read this list, and we don't seem to have
> very much contact in general between SA1 and JRA1/3.
OK, I certainly agree with that, and I think a lot of the frustration comes from this.
We know some of the devs because they show up on this list, but not all of them.
Their email addresses might be somewhere in the source or binaries,
but I haven't checked thoroughly... Does anyone know an RB developer?
> What do you mean by "known"? I assume that no sysadmins mean to have a
> broken site, but unfortunately there are lots of things which can go
> wrong and sites are often broken anyway in some way. Even filtering on
> the results of the SFT and gstat tests doesn't guarantee that it will
> work, they can't test everything (in particular I think they still only
> test things for the dteam VO, so other VOs could have problems).
Yep, so the minimum acceptable common denominator should be jobs that run
correctly for the dteam VO, as seen from e.g. lxplus.cern.ch. Any site that
can't manage even that for dteam should automatically be put into maintenance mode.
>>My advice to the CIC-on-duty is to always ask any relevant
>>sites to move to maintenance status, or to do it themselves anyway.
>
> It depends what the problem is, many things can be fixed quickly once
> they are spotted.
Quickly or not, the grid is no longer a handful of people in a single room,
so there should be a more formal way of telling the other poor admins
that something is wrong at that site.
That makes it easy to tell what is worth debugging and what isn't.
IMHO, anything else wastes a lot of sysadmin time.
> Anyway, the CIC can't force a site to go to
> maintenance, and it also can't remove it from BDIIs in general since
> they can be run by anyone.
OK, I didn't mean removing them from BDIIs (the Testzone BDII in particular).
That would definitely be Bad Practice(tm), bordering on over-reaction...
A site might work perfectly well for another VO anyway...
--
echo "sysadmin know better bash than english" | sed s/min/mins/ \
| sed 's/better bash/bash better/' # Yelling in a CERN forum