On 22 Nov 2011, at 16:56, Ewan MacMahon wrote:
> So far my day has
> been email, meeting, lunch, meeting, paperwork, so you've not
> missed anything interesting actually happening because nothing's
> happened.
Email, meeting, meeting, coffee, meeting, email for me...
>> (I can't submit any job as I just found out that my dteam
>> membership expires, now in the progress of renewal)
>>
> And we're mostly going to start off using the gridpp VO for testing
> at Jeremy's suggestion - it means that we can muck about with things
> a bit more freely if need be.
Testing proxy renewal is annoying, because it, by definition, requires time. In order to minimise the time taken, here's what we ended up doing last time:
* _Always_ submit a fresh proxy to MyProxy just before starting a batch of tests, or otherwise check that the existing one has plenty of lifetime left. It's too easy to let it expire accidentally, because, unlike any other failed renewal, that gives you no clear error message.
* Submit jobs with a 1 to 2 hour local proxy. Proxy renewal takes about half an hour, so with a proxy much shorter than that, it fails for other reasons. With one much longer, you just get to wait longer.
* Have the job report every 15 to 20 minutes. We just had it run voms-proxy-info -all >> /some/shared/path, as we were only testing our own cluster. It ought not to be too tricky to use glite-wms-job-perusal to achieve much the same effect on arbitrary clusters.
* Once it's clear that the job has, or has not, renewed, kill it. In the end, I set the job to make a maximum number of voms-proxy-info calls and then die, typically set to 3 hours as a backstop.
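
For what it's worth, the reporting job described in the last two points could look roughly like the sketch below. This is not the script we actually ran, just a minimal reconstruction: the function name report_proxy and its argument order are made up for illustration, and /some/shared/path stands in for whatever shared location your cluster offers.

```shell
#!/bin/sh
# Sketch of a test-job payload: log the proxy state at intervals, then exit.
# report_proxy is a hypothetical helper, not part of any grid middleware.
#
# Usage: report_proxy LOGFILE INTERVAL_SECONDS MAX_REPORTS [COMMAND...]
# COMMAND defaults to "voms-proxy-info -all".
report_proxy() {
    log=$1
    interval=$2
    max_reports=$3
    shift 3
    # Fall back to the command used in the original tests if none is given.
    [ "$#" -gt 0 ] || set -- voms-proxy-info -all

    count=0
    while [ "$count" -lt "$max_reports" ]; do
        count=$((count + 1))
        {
            echo "=== report $count of $max_reports at $(date -u) ==="
            # Record a failure (e.g. an expired proxy) instead of aborting.
            "$@" 2>&1 || echo "command exited with status $?"
        } >> "$log"
        # Skip the final sleep so the job exits as soon as it is done.
        if [ "$count" -lt "$max_reports" ]; then
            sleep "$interval"
        fi
    done
    return 0
}
```

Calling report_proxy /some/shared/path 900 12 as the last line of the job would give a report every 15 minutes and make the job exit by itself after a little under three hours. Capping the number of reports rather than relying on remembering to kill the job means a forgotten test never sits in a slot for long.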
Interesting point that occurs now: I did not test proxy-renewal under load. However, if 1000 pheno jobs arrive at once, that's exactly what might happen, and so we should maybe keep that in mind. Testing for that is ... tricky ... however.