All,

I'm sorry for the awkward title, but I thought it would be better to 
start a new topic on this because the other thread got swamped with 
side-issues. There may be a problem with the planning that we have to 
get out of the way before we can use the new software.

Part 1 - Recap on original gLite 3.2 thread
--------------------------------------------------------

In the original thread, the following points were made:

* gLite 3.1 goes before 1 Oct (JH)

* There will be no exceptions ! (LC)

* gLite 3.2 should go before 1 Oct; must go before 1 Nov, with 
exceptions for 3.2 WN (LC)

* Reason: EMI-1 WN has problems (LC)

* But EMI-2 also has no usable WN, and more testing to be discussed (JC)

* Aside: it is arguably pointless to go from a working gLite 3.1 to a 
working gLite 3.2, as 3.2 will be retired soon; yet there is no EMI 1 or 
2 WN to go to. Therefore, stick with 3.1?

* Anyway, gLite 3.2 WN may be extended to the end of 2012 because of all 
this (LC)


Part 2 - Potential Problem with this plan
------------------------------------------------------

I have tried EMI Torque (version 2.5.7-7) with the gLite 3.2 WN 
(2.3.6-2), and it won't work. Someone else should verify this 
independently, but I am quite convinced that the two are incompatible. 
When I tried to get jobs to run, this error came out:

08/13/2012 14:36:01;0080;PBS_Server;Req;req_reject;Reject reply 
code=15043(Execution server rejected request MSG=cannot send job to mom, 
state=PRERUN), aux=0, type=RunJob, from [log in to unmask]
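
For anyone who wants to check their own server logs for the same 
rejection, here is a rough Python sketch that pulls the reject code out 
of a record like the one above. It assumes the record is 
semicolon-separated as timestamp;event;daemon;object type;object 
name;message, which is what the line above looks like; the function and 
field names are my own and purely illustrative, and the sample leaves 
off the masked "from" address.

```python
import re

# Assumption: each server log record has six semicolon-separated fields:
# timestamp;event;daemon;object type;object name;message
FIELDS = ("timestamp", "event", "daemon", "obj_type", "obj_name", "message")


def parse_record(line):
    """Split one log record into named fields; return None if it doesn't fit."""
    parts = line.strip().split(";", 5)  # the message itself may contain ';'
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def reject_code(record):
    """Return the numeric code from a 'Reject reply code=NNNNN' message, or None."""
    match = re.search(r"Reject reply code=(\d+)", record["message"])
    return int(match.group(1)) if match else None


# The record quoted above, without the masked 'from' address.
SAMPLE = (
    "08/13/2012 14:36:01;0080;PBS_Server;Req;req_reject;"
    "Reject reply code=15043(Execution server rejected request "
    "MSG=cannot send job to mom, state=PRERUN), aux=0, type=RunJob"
)

if __name__ == "__main__":
    rec = parse_record(SAMPLE)
    print(rec["obj_name"], reject_code(rec))
```

Feeding each line of a server log through parse_record and keeping the 
records whose obj_name is req_reject should show quickly whether a site 
is hitting the same code.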

The "Cluster Resources" documentation on the problem was unhelpful, but 
from what I can see of it, code 15043 means PBSE_STAGEIN; i.e. the 
Torque server couldn't even get its stage-in files onto the WN client. 
To get it to work, I had to kludge the old gLite 3.2 WN client with the 
new RPMs for TORQUE 2.5.7-7, and put a proper MUNGE key on the WN 
system. After that, jobs ran OK.

Part 3 - Implications
--------------------------

If it is true that the gLite 3.2 WN can't work with EMI Torque, then it 
is pointless to extend security support only for the WN - the whole 
stack would need special dispensation. It would be good if someone can independently 
verify this. Maybe I missed a trick somewhere in the validation. If not, 
then we're going to have to come up with a better plan for our upgrades.

Cheers,


Steve


-- 
Steve Jones                             [log in to unmask]
System Administrator                    office: 220
High Energy Physics Division            tel (int): 42334
Oliver Lodge Laboratory                 tel (ext): +44 (0)151 794 2334
University of Liverpool                 http://www.liv.ac.uk/physics/hep/