Hi Adeel,
The only peculiar thing I notice is that you have users
mapped to <VO>sgm accounts even though you have disabled such accounts in
the users.conf file. I think you should make sure that no such
old-style entries remain in the /etc/passwd, /etc/shadow and /etc/group
files, and that the home directories of those pool accounts are erased.
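
A minimal, read-only sketch of that check, assuming the old-style
accounts are named <VO>sgm (opssgm, alicesgm, atlassgm, as in your
mail). The function name list_sgm_entries is only illustrative, and it
prints matches without changing anything:

```shell
#!/bin/sh
# List account or group names ending in "sgm" in colon-separated files
# such as /etc/passwd, /etc/shadow and /etc/group.
# Assumption: old-style entries are named <VO>sgm, e.g. opssgm.
list_sgm_entries() {
    for f in "$@"; do
        [ -r "$f" ] || continue   # skip files we cannot read
        # The first colon-separated field is the account or group name.
        awk -F: '$1 ~ /sgm$/ { print FILENAME ": " $1 }' "$f"
    done
}

# Example (run as root, since /etc/shadow is not world-readable):
# list_sgm_entries /etc/passwd /etc/shadow /etc/group
```

Anything it reports would be a candidate for manual removal, together
with the matching home directory.
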
Then you should erase such entries from /etc/grid-security/gridmapdir/
(on the CE, SE, RB and WMS), and install and reconfigure the nodes
with yaim.
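
Before erasing anything from gridmapdir, it may help to list what would
be affected. A rough sketch, assuming each lease is a hard link between
the "%"-encoded DN file and the account file (which is why the inode
numbers from ls -li are useful there). list_sgm_leases is a hypothetical
helper, and it only prints:

```shell
#!/bin/sh
# Print gridmapdir entries named *sgm, plus any entry that shares an
# inode with one of them (assumed to be the "%"-encoded DN lease).
# Read-only: nothing is deleted.
list_sgm_leases() {
    dir=$1
    find "$dir" -maxdepth 1 -type f -name '*sgm' |
    while IFS= read -r acct; do
        # -samefile matches every name hard-linked to the same inode.
        find "$dir" -maxdepth 1 -type f -samefile "$acct"
    done | sort -u
}

# Example:
# list_sgm_leases /etc/grid-security/gridmapdir
```
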
For further details you can also consult the following guide:
http://wiki.egee-see.org/index.php/SG_GLITE-3_0_2_Guide
Regards,
Paschalis
On Nov 9, 2007, at 9:24 AM, Adeel-ur-Rehman wrote:
>
>
> Dear Maarten,
>
> On Friday, November 09, 2007 4:57 AM, Maarten Litmaath wrote:
>
>> How did you configure Torque?
>
> I am running SL-3.0.9 on all the nodes now. I installed Torque with the
> yaim installation command, specifying the lcg-CE_torque meta-package:
> (/opt/glite/yaim/bin/yaim -i -s /opt/glite/yaim/examples/siteinfo/site-info.def -m lcg-CE_torque)
>
> and configured it with the yaim configuration command, specifying
> CE_torque as the node type:
> (/opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/examples/siteinfo/site-info.def -n CE_torque -n BDII_site)
>
>
>
>> Any special settings?
>
>
> I haven't applied any special settings. I only configured the queues
> with the following commands:
>
> qmgr -c "set queue atlas max_running = 4"
> .... for all queues (of course, the value is not the same for all the
> queues)
>
>
> qmgr -c "set queue atlas Priority = 200"
> .... for all queues (of course, the value is not the same for all the
> queues)
>
> qmgr -c "set queue ops resources_max.walltime = 01:00:00"
> qmgr -c "set queue ops resources_max.cput = 00:30:00"
> .... only for dteam and ops
>
>
>> Please send me your users.conf and the output of these commands on that
>> node:
>
> rpm -qa | grep yaim
>
> ls -li /etc/grid-security/gridmapdir/
>
>
> The output of:
>
> rpm -qa | grep yaim is:
>
> [root@pcncp04 root]# rpm -qa | grep yaim
> glite-yaim-core-3.1.1-9
>
> and the output of the command:
>
> ls -li /etc/grid-security/gridmapdir/ can be found in the attached file
> "gridmapdir-contents".
>
> The users.conf file is also attached, as "users.conf".
>
>
> Thanks,
>
> -- Best Regards --
> Adeel-ur-Rehman
>
>
> -----Original Message-----
> From: Maarten Litmaath
> Sent: Friday, November 09, 2007 4:57 AM
> To: Adeel-ur-Rehman
> Subject: Re: [LCG-ROLLOUT] Jobs hanging in Running state
>
> On Thu, 8 Nov 2007, Adeel-ur-Rehman wrote:
>
>> After performing re-installation on almost all of our nodes, we are
>> still facing the same problem. That is, some of the jobs start running
>> and then, after a certain time, get stuck there forever without any
>> further progress in their elapsed time. This eventually ends with a
>> Job Proxy Expired message, while some of the jobs execute successfully.
>
> How did you configure Torque? Any special settings?
>
>> [...]
>>
>> P.S. Maarten, I again have accounts like opssgm, alicesgm and atlassgm
>> in my /etc/grid-security/gridmapdir/, starting with the "%" character.
>> Do I need to lock those accounts again or not?
>
> Please send me your users.conf and the output of these commands on that
> node:
>
> rpm -qa | grep yaim
>
> ls -li /etc/grid-security/gridmapdir/
> <gridmapdir-contents>
> <users.conf>