Hi,
This message is intended for those who currently submit jobs via the
RAL Resource Broker (lcgrb01.gridpp.rl.ac.uk), which probably means the
whole GridPP UK community and possibly others as well.
The RAL Tier1 team has installed, configured and started to operate a
second LCG Resource Broker (lcgrb02.gridpp.rl.ac.uk) in addition to the
existing one (lcgrb01.gridpp.rl.ac.uk). A load-balancing mechanism has
also been implemented (with help from CERN specialists) and tested, and
it is now ready to be used.
To use it, some changes are needed at the UI level (or in the central
job submission mechanism, if any), i.e. manual modification of some
config files:
1. In $EDG_LOCATION/etc/edg_wl_ui_cmd_var.conf comment out the line that
specifies the LoggingDestination.
Once commented out, it will probably look like this:
#LoggingDestination = "lcgrb01.gridpp.rl.ac.uk:9002";
2. For each supported VO, in $EDG_LOCATION/etc/$VO/edg_wl_ui.conf list
both load-balanced RBs like this:
NSAddresses = {"lcgrb01.gridpp.rl.ac.uk:7772","lcgrb02.gridpp.rl.ac.uk:7772"};
LBAddresses = {{"lcgrb01.gridpp.rl.ac.uk:9000"},{"lcgrb02.gridpp.rl.ac.uk:9000"}};
Beware the exact syntax of the curly braces: NSAddresses takes a single
pair around the whole list, whereas LBAddresses takes an outer pair plus
one inner pair around each RB.
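Since the brace syntax is easy to get wrong by hand, here is a minimal Python sketch that generates the two lines from a list of RB hosts. The function names and the RBS list are mine, for illustration only; the ports are the ones quoted above.

```python
def format_ns_addresses(hosts, port=7772):
    """Build the NSAddresses line: one pair of braces around the whole list."""
    items = ",".join(f'"{host}:{port}"' for host in hosts)
    return f"NSAddresses = {{{items}}};"


def format_lb_addresses(hosts, port=9000):
    """Build the LBAddresses line: outer braces plus one inner pair per RB."""
    items = ",".join(f'{{"{host}:{port}"}}' for host in hosts)
    return f"LBAddresses = {{{items}}};"


# Illustrative list of the two RAL Resource Brokers named in this message.
RBS = ["lcgrb01.gridpp.rl.ac.uk", "lcgrb02.gridpp.rl.ac.uk"]

print(format_ns_addresses(RBS))
print(format_lb_addresses(RBS))
```

The printed lines can then be pasted into the per-VO edg_wl_ui.conf in place of the existing NSAddresses and LBAddresses entries.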
On the RAL UIs, at least:
$EDG_LOCATION=/opt/edg
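For sites with several UIs to update, step 1 could be scripted. The following is only a sketch in Python, working on the text of the config file; the function name and the sample content (including the SomeOtherSetting line) are made up for illustration, and reading and writing the real file is deliberately left out.

```python
import re

# Matches a LoggingDestination line that is not already commented out.
_LOGGING_DEST = re.compile(r"^([ \t]*)(LoggingDestination\b.*)$", re.MULTILINE)


def comment_out_logging_destination(text):
    """Return the config text with any active LoggingDestination line prefixed by '#'."""
    return _LOGGING_DEST.sub(r"\1#\2", text)


# Hypothetical file content, for illustration only.
sample = (
    'LoggingDestination = "lcgrb01.gridpp.rl.ac.uk:9002";\n'
    "SomeOtherSetting = 1;\n"
)

print(comment_out_logging_destination(sample))
```

Running it twice leaves an already commented line untouched. If you adapt this to edit $EDG_LOCATION/etc/edg_wl_ui_cmd_var.conf directly, please keep a backup copy of the original file first.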
In theory, and as our tests confirm, the edg-job-submit command picks
an RB at random. If that RB fails to accept the job, the next RB is
tried, and so on. Once the job has been submitted successfully, it is
tied to the RB that accepted it.
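To make the behaviour described above concrete, here is a small Python model of it. This is not the edg-job-submit implementation, only a sketch: the function name and the try_submit callback are hypothetical, and I am assuming that "the next RB" means the next one in a random ordering.

```python
import random


def submit_with_failover(rbs, try_submit, rng=random):
    """Try the RBs in random order and return the first one that accepts the job.

    try_submit is a hypothetical callback returning True when the given RB
    accepts the job. Returns None if every RB refuses.
    """
    candidates = list(rbs)
    rng.shuffle(candidates)  # random starting point, then the remaining RBs
    for rb in candidates:
        if try_submit(rb):
            return rb  # from here on the job is tied to this RB
    return None


RBS = ["lcgrb01.gridpp.rl.ac.uk:7772", "lcgrb02.gridpp.rl.ac.uk:7772"]

# Simulate lcgrb01 refusing jobs: the job always ends up on lcgrb02.
chosen = submit_with_failover(RBS, lambda rb: not rb.startswith("lcgrb01"))
print(chosen)
```

With both RBs healthy, either one may be returned, which is how the load is spread between them.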
I would therefore ask those who maintain the UIs across the Tier2s to
modify the configuration files (for the VOs they support) accordingly.
No restart of services is needed.
Also, if you use local config files (instead of the standard config
files above) when submitting specific jobs, or you know of users who do
so, the same changes can be made in those local config files.
Finally, I would appreciate your feedback, i.e. where these changes
have been implemented (and for which VOs) and/or where they could not
be implemented (and, if possible, why, so that we can work together to
sort it out).
Please reply to me if you need more information.
Best regards,
Catalin Condurache
Tier1 Application and Experiment Support RAL