Hi all.
I thought I'd follow up on this morning's discussion about enabling pilot accounts for all VOs with some possibly helpful examples from the Oxford site. Everyone's existing config is almost certainly different already, so you might not want to use our pool account names, and you almost certainly don't want to use our numeric UIDs and GIDs, so this is very much a source of inspiration, not a set of configs to just use. And when I say 'inspiration', I mainly mean 'illustration of just how tediously straightforward this is'.
The key principles to bear in mind are that:
- You can forget about storage entirely; pilots are a purely CPU side thing.
- So that means dealing with CEs, batch servers, worker nodes and the ARGUS server.
- You already have pilot accounts for the LHC VOs, so all you really need to do is copy the relevant bits for them and change the names.
- It's good to make this a one-time thing if you can, so now is also a good time to review your supported VO list, add some new ones like cernatschool.org, hyperk.org and lsst, and drop any old and dead ones like Hone, Dzero, CDF and superB, which might also free up some account IDs.
Below is a more detailed memoir of what I did on our systems and suggestions based on it, but really, if you start from the principle that you already have pilot-enabled VOs and then just copy anything that says 'pilot' for all the other VOs, you're basically there.
Ewan
----------------------------------
So, for our change I started with a test worker node, deleted the old VOs from its config and re-YAIMed it, then manually removed all the old accounts from /etc/passwd, /etc/group, their shadows and their home directories, then re-YAIMed it again to make sure nothing came back from the dead.
For each VO I added a set of pilot pool accounts (we've got twenty each) with unique UIDs and membership of both the VO pilot group and the main VO group (I'm not completely sure how important the latter is now, but given that the principle was always to copy from what we had and not overthink it, that's what I did). So, for example, a cernatschool pilot account entry in users.conf looks like this:
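If you'd rather script the manual clean-up than do it by hand, here's a minimal Python sketch of the /etc/passwd part. It's not what I ran, and it assumes that every old pool account has one of the retired VOs' groups as its primary group. It only works on text, so you'd still need to write the result back, do the equivalent for /etc/group and the shadow files, and remove the home directories yourself. The function names and the dzero account in the example are hypothetical.

```python
"""Sketch: drop pool accounts belonging to retired VOs from passwd-format text."""


def gids_for_groups(group_text: str, names: set[str]) -> set[int]:
    """Return the numeric GIDs of the named groups from /etc/group-format text."""
    gids = set()
    for line in group_text.splitlines():
        fields = line.split(":")
        # /etc/group lines are name:password:gid:members
        if len(fields) >= 3 and fields[0] in names:
            gids.add(int(fields[2]))
    return gids


def drop_retired_accounts(passwd_text: str, retired_gids: set[int]) -> tuple[list[str], list[str]]:
    """Split passwd lines into (lines to keep, home directories of removed accounts)."""
    kept, removed_homes = [], []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        # /etc/passwd lines are name:password:uid:gid:gecos:home:shell
        if len(fields) == 7 and int(fields[3]) in retired_gids:
            removed_homes.append(fields[5])
        else:
            kept.append(line)
    return kept, removed_homes


if __name__ == "__main__":
    # Hypothetical example data, not our real accounts.
    group = "dzero:x:4100:\ncernatschool:x:3600:\n"
    passwd = (
        "dzero001:x:41001:4100::/home/dzero001:/bin/bash\n"
        "cernsch001:x:36003:3600::/home/cernsch001:/bin/bash\n"
    )
    kept, homes = drop_retired_accounts(passwd, gids_for_groups(group, {"dzero"}))
    print(kept)   # ['cernsch001:x:36003:3600::/home/cernsch001:/bin/bash']
    print(homes)  # ['/home/dzero001']
```

Keying on the primary GID rather than a name prefix means you don't have to remember every naming scheme your predecessors used, but do eyeball the list of removed homes before deleting anything.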
36200:cernschpilot000:3650,3600:cernschpilot,cernatschool:cernatschool.org:pilot:
which breaks down as:
uid : account name : primary gid,secondary gid : primary group,secondary group : VO name : flag that makes pilot-role proxies map to this account
as opposed to a normal pool account, which looks like this:
36003:cernsch001:3600:cernatschool:cernatschool.org::
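To make that field layout concrete, here's a small Python sketch that splits a users.conf line into the fields described above. The class and function names are made up for illustration; YAIM doesn't provide anything like them.

```python
"""Sketch: split a YAIM users.conf line into its fields."""
from dataclasses import dataclass


@dataclass
class PoolAccount:
    uid: int
    name: str
    gids: list[int]
    groups: list[str]
    vo: str
    flag: str  # e.g. "pilot", "sgm", "prd", or "" for a normal pool account


def parse_users_conf_line(line: str) -> PoolAccount:
    # Lines end with a trailing ':', so splitting gives six fields plus an empty seventh.
    uid, name, gids, groups, vo, flag = line.strip().split(":")[:6]
    return PoolAccount(
        uid=int(uid),
        name=name,
        gids=[int(g) for g in gids.split(",")],
        groups=groups.split(","),
        vo=vo,
        flag=flag,
    )


if __name__ == "__main__":
    pilot = parse_users_conf_line(
        "36200:cernschpilot000:3650,3600:cernschpilot,cernatschool:cernatschool.org:pilot:"
    )
    normal = parse_users_conf_line("36003:cernsch001:3600:cernatschool:cernatschool.org::")
    print(pilot.gids, pilot.flag)          # [3650, 3600] pilot
    print(normal.gids, repr(normal.flag))  # [3600] ''
```

The only differences between the two kinds of entry are the extra pilot group and the 'pilot' flag, which is why copy-and-rename works so well here.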
In practice I did this by copying an existing chunk of config, selecting it in vim, and doing simple regex substitutions to change the names. It's a bit of a faff, but once it's done, it's done.
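If vim regexes aren't your thing, the same block can be generated with a few lines of Python instead. This is a sketch, assuming consecutive UIDs from a starting value you pick and the zero-padded naming we use; the function and its parameters are my own invention, and you'd still paste the output into users.conf by hand. As ever, choose UIDs and GIDs that don't clash with anything you already have.

```python
"""Sketch: generate a block of pilot pool account lines for users.conf."""


def pilot_accounts(
    vo: str,
    prefix: str,
    first_uid: int,
    pilot_gid: int,
    vo_gid: int,
    pilot_group: str,
    vo_group: str,
    count: int = 20,
) -> list[str]:
    """Return `count` users.conf lines in the same layout as the cernatschool example."""
    lines = []
    for i in range(count):
        # Zero-padded three-digit suffix: cernschpilot000, cernschpilot001, ...
        name = f"{prefix}{i:03d}"
        lines.append(
            f"{first_uid + i}:{name}:{pilot_gid},{vo_gid}:"
            f"{pilot_group},{vo_group}:{vo}:pilot:"
        )
    return lines


if __name__ == "__main__":
    for line in pilot_accounts("cernatschool.org", "cernschpilot", 36200, 3650, 3600,
                               "cernschpilot", "cernatschool"):
        print(line)
```

With those arguments the first line printed is exactly the cernatschool entry shown above.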
Then the corresponding bits in groups.conf should look like this:
"/cernatschool.org/ROLE=pilot":::pilot:
"/cernatschool.org/ROLE=lcgadmin":::sgm:
"/cernatschool.org/ROLE=production":::prd:
"/cernatschool.org"::::
of which the top line is new and handles mapping proxies with the pilot role to the 'pilot'-flagged accounts. Then it can be YAIMed again, and after a couple of rounds of YAIM pointing out that you've left the VO out of the VOS variable in the main site-info.def, and then out of the vo.d directory, it'll all work and you'll have a load of new pilot accounts on the worker node. Rinse/repeat/puppet for the rest of the worker nodes. There is some glExec config that needs updating too, but if you're using YAIM it does that for you.
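To save yourself a few of those YAIM rounds, here's a rough Python sketch that builds the groups.conf lines for a VO and checks the two things YAIM tends to complain about. It assumes VOS is set on a single line as VOS="vo1 vo2 ..." in site-info.def, and that the per-VO file in vo.d is named after the VO (e.g. vo.d/cernatschool.org); the function names are hypothetical, so check both assumptions against your own setup.

```python
"""Sketch: groups.conf lines for a VO, plus a pre-flight check before re-running YAIM."""
import re
from pathlib import Path


def groups_conf_lines(vo: str) -> list[str]:
    """The four groups.conf lines from the cernatschool.org example, for any VO."""
    return [
        f'"/{vo}/ROLE=pilot":::pilot:',
        f'"/{vo}/ROLE=lcgadmin":::sgm:',
        f'"/{vo}/ROLE=production":::prd:',
        f'"/{vo}"::::',
    ]


def missing_vo_config(vo: str, site_info_text: str, vo_d: Path) -> list[str]:
    """List the problems YAIM would otherwise point out one run at a time."""
    problems = []
    # Assumes a single-line VOS="vo1 vo2 ..." assignment in site-info.def.
    match = re.search(r'^VOS="([^"]*)"', site_info_text, re.MULTILINE)
    if match is None or vo not in match.group(1).split():
        problems.append(f"{vo} is not in VOS in site-info.def")
    # Assumes the per-VO file in vo.d is named after the VO.
    if not (vo_d / vo).is_file():
        problems.append(f"no {vo_d / vo} file")
    return problems
```

For example, `missing_vo_config("lsst", Path("site-info.def").read_text(), Path("vo.d"))` run from your YAIM config directory returns an empty list once both are in place.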
Next up, the ARGUS server. I'm moderately confident it doesn't need YAIMing for this, but I think I may have done it anyway. Don't forget to put the central banning config back in if YAIM nukes it; there will be a test later. The main bit is the policy, which you should have a copy of in a file somewhere. For each section you need (any or all of the worker node (i.e. glExec) section, an ARC section and a CREAM CE section), just make sure that all the VOs are included. Our policy file is attached, but really, this is just a matter of cut/paste/search-and-replace-the-VO-name. Then remove all the existing policies with 'pap-admin rap', and apply the policy from the file with 'pap-admin apf name_of_file'.
At the time I did this we didn't have any CREAM CEs or any Torque left, but it would probably just be a matter of YAIMing them to create the pool accounts, and making sure that the new VOs have permission to submit to the queues.
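The cut/paste/search-and-replace step can be sketched in Python as below: take the rule for one VO from your existing policy file and repeat it for every VO. The rule line in the example is only illustrative (copy the real one from your own file), and the function name is made up. Pick a template VO name that doesn't appear anywhere else in the text you're copying, since this is a plain string replace.

```python
"""Sketch: expand one VO's ARGUS policy rule into rules for every VO."""


def expand_rules(template: str, template_vo: str, vos: list[str]) -> str:
    """Repeat a single-VO rule for each VO by plain search-and-replace of the VO name."""
    return "\n".join(template.replace(template_vo, vo) for vo in vos)


if __name__ == "__main__":
    # Illustrative rule line only; take the real one from your own policy file.
    rule = '        rule permit { pfqan = "/atlas/Role=pilot" }'
    print(expand_rules(rule, "atlas", ["atlas", "lhcb", "cernatschool.org", "hyperk.org"]))
```

Paste the output into each relevant section of the policy file, then do the 'pap-admin rap' / 'pap-admin apf' dance as above.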
For our Condor batch system and ARC CEs, the pool accounts were created by manually copying the relevant chunks of /etc/passwd, /etc/group and their shadows from one of the YAIMed worker nodes, and then creating the home directories. If you've done the obviously sensible thing that we never have and put all your pool accounts into something like NIS or LDAP, you can obviously skip this step in favour of feeling smugly well organised and farsighted.
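Picking out the relevant chunks can be done with a small filter like the Python sketch below, which keeps the lines whose first field (the account or group name) matches a pattern. Since the name is the first field in /etc/passwd, /etc/group and both shadow files, the same function works for all four; you'd use a pattern that also matches the pilot group names when filtering /etc/group. The function names, pattern and example entry are hypothetical, and actually creating the home directories (with the right ownership) is left to you.

```python
"""Sketch: pick pool account entries out of a worker node's passwd/group/shadow text."""
import re


def select_entries(text: str, name_pattern: str) -> list[str]:
    """Return lines whose first (name) field matches name_pattern in full."""
    pattern = re.compile(name_pattern)
    return [line for line in text.splitlines() if pattern.fullmatch(line.split(":", 1)[0])]


def home_dirs(passwd_lines: list[str]) -> list[str]:
    """Home directory (sixth field) of each passwd line, to create on the CE/batch nodes."""
    return [line.split(":")[5] for line in passwd_lines]


if __name__ == "__main__":
    # Hypothetical example entries.
    passwd = (
        "cernschpilot000:x:36200:3650::/home/cernschpilot000:/bin/bash\n"
        "cernsch001:x:36003:3600::/home/cernsch001:/bin/bash\n"
    )
    pilots = select_entries(passwd, r"cernschpilot\d{3}")
    print(home_dirs(pilots))  # ['/home/cernschpilot000']
```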
Finally, arc.conf also has a list of authorized VOs that might need updating if you've added whole new VOs, but not if you've just added pilot support to existing ones.
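A quick way to see which VOs are missing is something like the Python sketch below. It assumes the older ARC 5-style syntax of one authorizedvo="..." line per VO; newer ARC releases lay this out differently, so adjust the pattern to match whatever your arc.conf actually uses. The function name is hypothetical.

```python
"""Sketch: report VOs missing from arc.conf's authorizedvo lines (ARC 5-style syntax assumed)."""
import re

# Matches lines such as: authorizedvo="cernatschool.org"  (quotes optional)
AUTHORIZED_VO = re.compile(r'^\s*authorizedvo\s*=\s*"?([^"\s]+)"?\s*$', re.MULTILINE)


def missing_authorized_vos(arc_conf_text: str, wanted: list[str]) -> list[str]:
    """Return the wanted VOs that have no authorizedvo line, in the order given."""
    present = set(AUTHORIZED_VO.findall(arc_conf_text))
    return [vo for vo in wanted if vo not in present]
```

For example, `missing_authorized_vos(open("/etc/arc.conf").read(), ["atlas", "lsst", "hyperk.org"])` lists the VOs you still need to add.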
Then, once you're done, poke Daniela to test it.