Dear Maarten,
On Friday, November 09, 2007 4:57 AM, Maarten Litmaath wrote:
> How did you configure Torque?
I am now running SL-3.0.9 on all the nodes. I installed Torque with the yaim
installation command, specifying lcg-CE_torque as the meta-package:
(/opt/glite/yaim/bin/yaim -i -s
/opt/glite/yaim/examples/siteinfo/site-info.def -m lcg-CE_torque)
and configured it with the yaim configuration command, specifying CE_torque
and BDII_site as the node types:
(/opt/glite/yaim/bin/yaim -c -s
/opt/glite/yaim/examples/siteinfo/site-info.def -n CE_torque -n BDII_site)
> Any special settings?
I haven't applied any special settings. I only configured the queues via the
following commands:
qmgr -c "set queue atlas max_running = 4"
... and likewise for all the other queues (the value differs per queue)
qmgr -c "set queue atlas Priority = 200"
... and likewise for all the other queues (the value differs per queue)
qmgr -c "set queue ops resources_max.walltime = 01:00:00"
qmgr -c "set queue ops resources_max.cput = 00:30:00"
... these two limits are set only for dteam and ops
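
The per-queue settings above could also be generated in one pass. The following is a minimal sketch, not the procedure actually used here: the helper name emit_queue_cmds and the queue table are hypothetical, and only the atlas row (max_running 4, Priority 200) reflects the values quoted above; the dteam and ops rows are placeholders. It only prints the qmgr commands so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical table: queue name, max_running, Priority.
# Only the atlas row matches the values quoted in this message;
# the dteam and ops rows are placeholders for illustration.
QUEUE_TABLE="atlas 4 200
dteam 2 100
ops 2 100"

# Print (do not run) the qmgr commands for each row of the table.
emit_queue_cmds() {
    echo "$QUEUE_TABLE" | while read -r queue max_running priority; do
        echo "qmgr -c \"set queue $queue max_running = $max_running\""
        echo "qmgr -c \"set queue $queue Priority = $priority\""
    done
}

emit_queue_cmds
```

Once the printed commands look right, they could be applied on the Torque server by piping the output to a shell (emit_queue_cmds | sh).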
> Please send me your users.conf and the output of these commands on that
> node:
> rpm -qa | grep yaim
> ls -li /etc/grid-security/gridmapdir/
The output of "rpm -qa | grep yaim" is:
[root@pcncp04 root]# rpm -qa | grep yaim
glite-yaim-core-3.1.1-9
The output of "ls -li /etc/grid-security/gridmapdir/" is in the attached file
"gridmapdir-contents", and users.conf is attached as "users.conf".
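
The inode numbers from "ls -li" can also be paired up automatically. The following is a rough sketch, assuming the usual gridmapdir layout in which an entry whose name starts with "%" is a URL-encoded certificate DN hard-linked to the account file it is mapped to, so both share one inode. The helper name list_leases is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: for each entry in directory $1 whose name starts
# with "%", print "<encoded-DN> -> <account>" where the account is the
# other file sharing the same inode (assumed to be a hard link).
list_leases() {
    ls -i "$1" | awk '
        { inode = $1; name = $2 }
        name ~ /^%/ { dn[inode] = name; next }   # encoded-DN entries
        { acct[inode] = name }                   # plain account entries
        END {
            for (i in dn)
                if (i in acct) print dn[i] " -> " acct[i]
        }
    '
}
```

For example, "list_leases /etc/grid-security/gridmapdir/" would list one line per mapped account; an account that appears on no line currently has no "%" entry linked to it.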
Thanks,
-- Best Regards --
Adeel-ur-Rehman
-----Original Message-----
From: [log in to unmask] [mailto:[log in to unmask]]
Sent: Friday, November 09, 2007 4:57 AM
To: Adeel-ur-Rehman
Cc: [log in to unmask]
Subject: Re: [LCG-ROLLOUT] Jobs hanging in Running state
On Thu, 8 Nov 2007, Adeel-ur-Rehman wrote:
> After re-installing almost all of our nodes, we are still facing the same
> problem. That is, some of the jobs start running and then, after a certain
> time, get stuck there forever without any further progress in their
> elapsed time. This eventually ends with a Job Proxy Expired message. Some
> of the jobs, however, execute successfully.
How did you configure Torque? Any special settings?
> [...]
>
> P.S. Maarten, I again have accounts like opssgm, alicesgm, atlassgm in
> my /etc/grid-security/gridmapdir/ starting with the "%" character. Do I
> need to lock those accounts again or not?
Please send me your users.conf and the output of these commands on that
node:
rpm -qa | grep yaim
ls -li /etc/grid-security/gridmapdir/