On Wed, Sep 28, 2005 at 01:35:21PM +0100 or thereabouts, Morag Burgon-Lyon wrote:
> Thanks Steve,
>
> I found I could not scp/ssh from the new nodes to ce without a password. This has been resolved following your instructions, but I still get the bad UID message when I try to qsub a job as dteam001.
This is something different: you can only qsub from hosts that are listed
in /etc/hosts.equiv on the pbs_server.
There is no need to allow qsub from WNs. In fact it would be a really
bad idea to do so.
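
As a rough sketch of how one might check whether a given host is listed there: the helper name host_in_equiv is made up for illustration, and it assumes the simple one-entry-per-line layout with the host name as the first field. It does not interpret '+'/'-' wildcard or netgroup entries, and it compares names exactly as written.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque): succeed if host $1 appears
# as the first field of a line in the hosts.equiv-style file $2.
# Assumption: plain "hostname [user]" entries; '+'/'-' and netgroup
# entries are not interpreted, and matching is exact (case-sensitive).
host_in_equiv() {
    awk -v h="$1" '
        /^[ \t]*#/ { next }        # skip comment lines
        $1 == h    { found = 1 }   # exact match on the first field
        END        { exit !found } # exit status 0 only if a match was seen
    ' "$2"
}

# Example use, run on the pbs_server itself (the host name is a placeholder):
#   host_in_equiv ui.example.org /etc/hosts.equiv && echo "may qsub"
```

On a WN this should normally report no match, which is the intended setup.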
>
> The original worker nodes (wn1 and wn2) appear to be accepting and running grid jobs; however, they give the same error on qsub.
>
> There are no new files in /var/spool/pbs/undelivered on the wn's.
>
> Thanks,
> Mòrag
>
> -----Original Message-----
> From: Testbed Support for GridPP member institutes [mailto:[log in to unmask]] On Behalf Of Steve Traylen
> Sent: 28 September 2005 12:18
> To: [log in to unmask]
> Subject: Re: Problem submitting jobs after adding new worker nodes
>
> On Wed, Sep 28, 2005 at 11:34:19AM +0100 or thereabouts, Morag Burgon-Lyon wrote:
> > Hi,
> >
> >
> >
> > Steve is away on holiday this week, so I'm holding the fort at Scotgrid Edinburgh. I added in two new worker nodes (wn0 and wn4) yesterday by altering the wn-list.conf and running the yaim script with WN_torque on the new worker nodes, and CE_torque on the ce. I amended the processor numbers, copied across the maui config and restarted maui, pbs_server and pbs_mom on ce.
> >
> >
> >
> > However, the queues haven't been filling back up and new jobs appear briefly and then disappear. Also qsub doesn't work from any node (including the existing nodes that worked fine before adding the new ones, such as wn2):
>
> Hi Morag,
>
> The problem sounds like unchallenged ssh from WN to CE not working.
>
> See:
>
> http://goc.grid.sinica.edu.tw/gocwiki/ssh_problem_from_WN_to_CE
>
> To test, log in to your new WNs:
>
> # su - dteam050
> dteam050> ssh yource.ed.ac.uk
>
> Use the full hostname. Does it work unchallenged?
>
> Also, if this is the problem, there will be some files in
> /var/spool/pbs/undelivered on the affected WNs.
>
> Steve
>
>
> >
> >
> >
> > [dteam001@wn2 dteam001]$ qsub qsubtest.sh
> >
> > qsub: Bad UID for job execution
> >
> > [dteam001@wn2 dteam001]$
> >
> >
> >
> > I've compared the /etc/passwd files for ce and the old worker nodes, and dteam001 has the same uid and gid in both. However, I noticed that alice001 was different (ce looked wrong as it has a uid of 10000). Also, the UIDs and GIDs in the users.conf file are different from the ones in /etc/passwd on all nodes and ce. The upgrade was done using lcg-yaim-2.6.0-9.
> >
> >
> >
> > Any suggestions?
> >
> >
> >
> > Thanks,
> >
> > Mòrag
> >
>
> --
> Steve Traylen
> [log in to unmask]
> http://www.gridpp.ac.uk/
--
Steve Traylen
[log in to unmask]
http://www.gridpp.ac.uk/