The mounts on the admin node are looking like a mess, and there are 3
mounts from the pnfs node on our admin node:
fal-pygrid-31.lancs.ac.uk:/pnfs on /pnfs/lancs.ac.uk type nfs (rw,intr,hard,addr=194.80.35.29)
fal-pygrid-31.lancs.ac.uk:/pnfs on /pnfs/fs type nfs (rw,intr,hard,addr=194.80.35.29)
fal-pygrid-31.lancs.ac.uk:/pnfsdoors on /pnfs/lancs.ac.uk type nfs (rw,intr,hard,addr=194.80.35.29)
Are any of them right? I don't like the looks of the mount on /pnfs/lancs.ac.uk.
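A quick way to spot a doubled-up mount point is to count how often each mount point shows up in the mount listing. This is only a minimal sketch, assuming the usual "<source> on <mountpoint> type <fstype> (<options>)" layout and mount points without spaces in them; dup_mount_points is a made-up helper name, not something that ships with dCache or pnfs:

```shell
# Print every mount point that appears more than once in `mount`-style
# output read from stdin. Expected line layout (an assumption):
#   <source> on <mountpoint> type <fstype> (<options>)
# dup_mount_points is a hypothetical helper name used for illustration.
dup_mount_points() {
    awk '$2 == "on" { count[$3]++ }
         END { for (m in count) if (count[m] > 1) print m }'
}

# Usage on a live system:
#   mount | dup_mount_points
```

Fed the three mounts listed here, it prints /pnfs/lancs.ac.uk, because both /pnfs and /pnfsdoors are mounted on that path.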
cheers,
Matt
On 01/06/07, Greig Alan Cowan wrote:
> pnfs is definitely mounted on the SRM node?
>
> Did you happen to close off a firewall while you were reconfiguring things?
>
>
> Matt Doidge wrote:
> > Greig wins my vote for King of the World once more, that got it.
> > pnfsManager was trying to start on the admin node; simply stopping it
> > there and restarting it has solved some of my problems. I've got
> > globus-url-copy to work, but not srmcp or dccp. I get these
> > errors:
> >
> >
> > Fri Jun 01 18:01:14 BST 2007: starting SRMGetClient
> > Fri Jun 01 18:01:14 BST 2007: In SRMClient ExpectedName: host
> > Fri Jun 01 18:01:14 BST 2007: SRMClient(https,srm/managerv1,true)
> > SRMClientV1 : user credentials are: /C=UK/O=eScience/OU=Lancaster/L=Physics/CN=matthew doidge
> > SRMClientV1 : SRMClientV1 calling org.globus.axis.util.Util.registerTransport()
> > SRMClientV1 : connecting to srm at httpg://fal-pygrid-20.lancs.ac.uk:8443/srm/managerv1
> > Fri Jun 01 18:01:15 BST 2007: connected to server, obtaining proxy
> > Fri Jun 01 18:01:15 BST 2007: got proxy of type class org.dcache.srm.client.SRMClientV1
> > SRMClientV1 : get: surls[0]="srm://fal-pygrid-20.lancs.ac.uk:8443/pnfs/lancs.ac.uk/data/dteam/pooltest/fal23_test"
> > SRMClientV1 : get: protocols[0]="http"
> > SRMClientV1 : get: protocols[1]="dcap"
> > SRMClientV1 : get: protocols[2]="gsiftp"
> > copy_jobs is empty
> > SRMClientV1 : java.net.ConnectException: Connection refused
> > SRMClientV1 : get : try # 0 failed with error
> > SRMClientV1 : java.net.ConnectException: Connection refused
> > copy_jobs is empty
> > stopping copier
> > srm copy of at least one file failed or not completed
> >
> > It's never just fixed is it! :-D
> >
> > cheers,
> > Matt
> >
> > On 01/06/07, Greig Alan Cowan wrote:
> >> Have you removed pnfs from the admin node? Presumably the startup script
> >> is still in /etc/init.d .
> >>
> >> Greig
> >>
> >> Matt Doidge wrote:
> >> > Well this guy shouldn't be on the admin node:
> >> > [root@fal-pygrid-20 root]# ps aux|grep -i pnfs
> >> > root 16046 0.0 0.0 4408 1308 pts/1 S 16:53 0:00 /bin/sh /opt/d-cache/jobs/pnfs start
> >> > root 14455 0.0 0.0 3692 676 pts/1 S 17:38 0:00 grep -i pnfs
> >> >
> >> > The pnfsDomain logs on the admin node contain complaints about not
> >> > being able to find a mount point (as pnfs itself isn't running). The
> >> > logs on the pnfs node contain complaints, seemingly about the location
> >> > manager:
> >> > 06/01 16:53:15 Cell(c-100@pnfsDomain) : runIO : java.io.EOFException
> >> > 06/01 16:53:15 Cell(c-100@pnfsDomain) : java.io.EOFException
> >> > 06/01 16:53:15 Cell(c-100@pnfsDomain) : java.io.EOFException
> >> > 06/01 16:53:15 Cell(c-100@pnfsDomain) : at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2498)
> >> > 06/01 16:53:15 Cell(c-100@pnfsDomain) : at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1273)
> >> > 06/01 16:53:15 Cell(c-100@pnfsDomain) : at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
> >> > 06/01 16:53:15 Cell(c-100@pnfsDomain) : at dmg.cells.network.LocationMgrTunnel.runIo(LocationMgrTunnel.java:283)
> >> > 06/01 16:53:15 Cell(c-100@pnfsDomain) : at dmg.cells.network.LocationMgrTunnel.connectionThread(LocationMgrTunnel.java:202)
> >> > 06/01 16:53:15 Cell(c-100@pnfsDomain) : at dmg.cells.network.LocationMgrTunnel.run(LocationMgrTunnel.java:347)
> >> > 06/01 16:53:15 Cell(c-100@pnfsDomain) : at java.lang.Thread.run(Thread.java:595)
> >> >
> >> > Any way of stopping this starting when I hit dcache-core start? The
> >> > pnfsManager is switched to no for the admin node node_config?
> >> >
> >> > cheers,
> >> > Matt
> >> >
> >> > On 01/06/07, Greig Alan Cowan wrote:
> >> >> Matt,
> >> >>
> >> >> What does the PnfsDomain.log file say?
> >> >>
> >> >> Can you do a
> >> >>
> >> >> ps aux|grep -i pnfs
> >> >>
> >> >> on the admin node to make sure that no pnfs processes are running.
> >> >>
> >> >> Cheers,
> >> >> Greig
> >> >>
> >> >> Matt Doidge wrote:
> >> >> > Heya,
> >> >> >
> >> >> > On the Pnfs Node:
> >> >> > serviceLocatorHost=fal-pygrid-20.lancs.ac.uk
> >> >> > serviceLocatorPort=11111
> >> >> >
> >> >> > On the Admin node:
> >> >> > serviceLocatorHost=fal-pygrid-20.lancs.ac.uk
> >> >> > serviceLocatorPort=11111
> >> >> >
> >> >> > so the same for both of them. It's also the same on all my other nodes.
> >> >> >
> >> >> > cheers,
> >> >> > Matt
> >> >> >
> >> >> > On 01/06/07, Greig Alan Cowan wrote:
> >> >> >> Matt,
> >> >> >>
> >> >> >> What are the entries:
> >> >> >>
> >> >> >> serviceLocatorHost
> >> >> >> serviceLocatorPort
> >> >> >>
> >> >> >> set to on the admin and pnfs nodes? The host should be the admin
> >> >> >> node hostname.
> >> >> >>
> >> >> >> Greig
> >> >> >>
> >> >> >> Matt Doidge wrote:
> >> >> >> > I uncommented that line, reran the install scripts and restarted
> >> >> >> > stuff, but still no joy. Checking the pnfsDomain logs on the Pnfs
> >> >> >> > node, I see a lot of complaints that look like it can't find the
> >> >> >> > location manager.
> >> >> >> >
> >> >> >> > I've increased some log verbosity to help find clues, and am
> >> >> >> > looking for references to "localhost" in my dcache configs on the
> >> >> >> > Pnfs node.
> >> >> >> >
> >> >> >> > cheers,
> >> >> >> > Matt
> >> >> >> >
> >> >> >> > On 01/06/07, Greig Alan Cowan wrote:
> >> >> >> >> Matt,
> >> >> >> >>
> >> >> >> >> Why is this line commented out in the pnfs_node_config ?
> >> >> >> >>
> >> >> >> >> #ADMIN_NODE=fal-pygrid-20.lancs.ac.uk
> >> >> >> >>
> >> >> >> >> Greig
> >> >> >> >>
> >> >> >> >> Matt Doidge wrote:
> >> >> >> >> > Here are the node_configs for both of the nodes,
> >> >> >> >> >
> >> >> >> >> > cheers,
> >> >> >> >> > Matt
> >> >> >> >> >
> >> >> >> >>
> >> >> >>
> >> >>
> >>
>