>
> I -put-lifetime (-get-lifetime) had been overridden by the upgrade. I've
> now reduced them again. However, I'd like to know who this guy is and why
> his requests accumulate so fast.
Alessandra, can you confirm whether it is -put-lifetime or not? I don't see
this option in srm.batch, but there are a variety of other possibilities.
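
If it helps, here is a rough Python sketch for listing whichever lifetime-related options a batch file actually sets. It is only an illustration: the sample text and the option names in it are made up, the function name is my own, and you would have to read in your own srm.batch (I have not assumed a path for it).

```python
import re

# Matches any dash-prefixed option whose name contains "lifetime".
LIFETIME_OPTION = re.compile(r"-[\w-]*lifetime[\w-]*", re.IGNORECASE)

def find_lifetime_options(batch_text):
    """Return (line_number, stripped_line) for every line mentioning a lifetime option."""
    hits = []
    for number, line in enumerate(batch_text.splitlines(), start=1):
        if LIFETIME_OPTION.search(line):
            hits.append((number, line.strip()))
    return hits

# Hypothetical batch-file fragment, for illustration only.
sample = "\n".join([
    "create example.Cell SRM",
    "  -example-get-lifetime=3600",
    "  -example-put-lifetime=3600",
    "  -other-option=1",
])

for number, line in find_lifetime_options(sample):
    print(number, line)
```

Running the same function over the text of the real batch file should show which of the possible option names is the one in use.
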
Greig
>
> cheers
> alessandra
>
> Alessandra Forti wrote:
> > There is no firewall.
> >
> > This is tcpdump on the head node grepping on the UI name
> >
> > 14:39:28.212737 niels003.tier2.hep.manchester.ac.uk.35625 > dcache01.tier2.hep.manchester.ac.uk.8443: P 2987:3845(858) ack 5622 win 17376 <nop,nop,timestamp 129050588 34551678> (DF)
> > 14:39:28.212743 dcache01.tier2.hep.manchester.ac.uk.8443 > niels003.tier2.hep.manchester.ac.uk.35625: . ack 3845 win 14480 <nop,nop,timestamp 34551680 129050588> (DF)
> > 14:39:28.317297 dcache01.tier2.hep.manchester.ac.uk.8443 > niels003.tier2.hep.manchester.ac.uk.35625: P 5622:5697(75) ack 3845 win 14480 <nop,nop,timestamp 34551691 129050588> (DF)
> > 14:39:28.318038 niels003.tier2.hep.manchester.ac.uk.35625 > dcache01.tier2.hep.manchester.ac.uk.8443: P 3845:3874(29) ack 5697 win 17376 <nop,nop,timestamp 129050599 34551691> (DF)
> > 14:39:28.318087 dcache01.tier2.hep.manchester.ac.uk.8443 > niels003.tier2.hep.manchester.ac.uk.35625: . ack 3874 win 14480 <nop,nop,timestamp 34551691 129050599> (DF)
> > 14:39:28.318930 niels003.tier2.hep.manchester.ac.uk.35625 > dcache01.tier2.hep.manchester.ac.uk.8443: P 3874:4727(853) ack 5697 win 17376 <nop,nop,timestamp 129050599 34551691> (DF)
> > 14:39:28.318937 dcache01.tier2.hep.manchester.ac.uk.8443 > niels003.tier2.hep.manchester.ac.uk.35625: . ack 4727 win 17376 <nop,nop,timestamp 34551691 129050599> (DF)
> > 14:39:28.318942 niels003.tier2.hep.manchester.ac.uk.35625 > dcache01.tier2.hep.manchester.ac.uk.8443: F 4727:4727(0) ack 5697 win 17376 <nop,nop,timestamp 129050599 34551691> (DF)
> > 14:39:28.356206 dcache01.tier2.hep.manchester.ac.uk.8443 > niels003.tier2.hep.manchester.ac.uk.35625: . ack 4728 win 17376 <nop,nop,timestamp 34551695 129050599> (DF)
> > 14:39:28.360362 dcache01.tier2.hep.manchester.ac.uk.8443 > niels003.tier2.hep.manchester.ac.uk.35625: . 5697:7145(1448) ack 4728 win 17376 <nop,nop,timestamp 34551695 129050599> (DF)
> > 14:39:28.360371 dcache01.tier2.hep.manchester.ac.uk.8443 > niels003.tier2.hep.manchester.ac.uk.35625: P 7145:8286(1141) ack 4728 win 17376 <nop,nop,timestamp 34551695 129050599> (DF)
> > 14:39:28.360428 dcache01.tier2.hep.manchester.ac.uk.8443 > niels003.tier2.hep.manchester.ac.uk.35625: F 8286:8286(0) ack 4728 win 17376 <nop,nop,timestamp 34551695 129050599> (DF)
> > 14:39:28.360874 niels003.tier2.hep.manchester.ac.uk.35625 > dcache01.tier2.hep.manchester.ac.uk.8443: . ack 8286 win 23168 <nop,nop,timestamp 129050603 34551695> (DF)
> > 14:39:28.360885 niels003.tier2.hep.manchester.ac.uk.35625 > dcache01.tier2.hep.manchester.ac.uk.8443: . ack 8287 win 23168 <nop,nop,timestamp 129050603 34551695
> >
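
Every packet in the capture above is between niels003 port 35625 and dcache01 port 8443, and both sides send a FIN, so this looks like a single conversation that was opened and closed cleanly. To check a longer capture for connections to other ports, a minimal Python sketch along these lines might do. It assumes the old-style tcpdump text format quoted above with each packet on one line (the mailer wrapped them); the function name is my own, and the three sample lines are copied from the capture with the wrapping undone.

```python
import re
from collections import Counter

# "time src.port > dst.port: flags ..." as printed by tcpdump in the capture above.
PACKET = re.compile(
    r"^(?P<time>\S+) (?P<src>\S+)\.(?P<sport>\d+) > "
    r"(?P<dst>\S+)\.(?P<dport>\d+): (?P<flags>\S+)"
)

def summarize(lines):
    """Count packets per host/port pair and list the hosts that sent a FIN."""
    conversations = Counter()
    fin_senders = []
    for line in lines:
        match = PACKET.match(line.strip())
        if match is None:
            continue  # skip anything that is not a packet line
        endpoints = tuple(sorted([
            (match["src"], int(match["sport"])),
            (match["dst"], int(match["dport"])),
        ]))
        conversations[endpoints] += 1
        if "F" in match["flags"]:
            fin_senders.append(match["src"])
    return conversations, fin_senders

capture = [
    "14:39:28.212737 niels003.tier2.hep.manchester.ac.uk.35625 > dcache01.tier2.hep.manchester.ac.uk.8443: P 2987:3845(858) ack 5622 win 17376 <nop,nop,timestamp 129050588 34551678> (DF)",
    "14:39:28.318942 niels003.tier2.hep.manchester.ac.uk.35625 > dcache01.tier2.hep.manchester.ac.uk.8443: F 4727:4727(0) ack 5697 win 17376 <nop,nop,timestamp 129050599 34551691> (DF)",
    "14:39:28.360428 dcache01.tier2.hep.manchester.ac.uk.8443 > niels003.tier2.hep.manchester.ac.uk.35625: F 8286:8286(0) ack 4728 win 17376 <nop,nop,timestamp 34551695 129050599> (DF)",
]

conversations, fin_senders = summarize(capture)
for endpoints, count in conversations.items():
    print(endpoints, count)
print("FIN sent by:", fin_senders)
```

If a transfer were reaching a gridftp door, I would expect a second host/port pair to turn up in the summary; with only the 8443 pair present, the capture shows no such connection.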
> >
> > Greig A Cowan wrote:
> >> Have you tried looking at the output of tcpdump during a transfer to
> >> see what connections are being established?
> >>
> >> What is the firewall setup with your door nodes?
> >>
> >>
> >> On Tue, 25 Apr 2006, Alessandra Forti wrote:
> >>
> >>
> >>> How can I know that? Besides, each line has a different "ID". If I
> >>> try a cacheinfoof on those numbers it returns a null pointer, as it
> >>> says in the log files.
> >>>
> >>> I tried to netstat my UI to see if there was any connection with one
> >>> of the gridftp doors but there isn't. There are only 2 time_wait
> >>> connections with the head node and the head node doesn't show any
> >>> connection with the UI.
> >>> cheers
> >>>
> >>> Greig A Cowan wrote:
> >>>
> >>>> So 00010000000000000003D378 is the PNFS ID of the file being
> >>>> transferred into the dCache using lcg-cr?
> >>>>
> >>>> On Tue, 25 Apr 2006, Alessandra Forti wrote:
> >>>>
> >>>>
> >>>>> The only log file that gets changed is the pnfsDomain.log one.
> >>>>> And there is this
> >>>>> ========================================================================
> >>>>>
> >>>>>
> >>>>> 04/25 13:34:53 Cell(cleaner@pnfsDomain) : Got error from PnfsManager for 00010000000000000003D378 [4] Pnfs lookup failed
> >>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : Exception in getCacheLocations java.lang.NullPointerException
> >>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : java.lang.NullPointerException
> >>>>> 04/25 13:34:53 Cell(cleaner@pnfsDomain) : Got error from PnfsManager for 00010000000000000003D398 [4] Pnfs lookup failed
> >>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : Exception in getCacheLocations java.lang.NullPointerException
> >>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : java.lang.NullPointerException
> >>>>> 04/25 13:34:53 Cell(cleaner@pnfsDomain) : Got error from PnfsManager for 00010000000000000003D3C8 [4] Pnfs lookup failed
> >>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : Exception in getCacheLocations java.lang.NullPointerException
> >>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : java.lang.NullPointerException
> >>>>> 04/25 13:34:53 Cell(cleaner@pnfsDomain) : Got error from PnfsManager for 00010000000000000003D3F8 [4] Pnfs lookup failed
> >>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : Exception in getCacheLocations java.lang.NullPointerException
> >>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : java.lang.NullPointerException
> >>>>> 04/25 13:34:53 Cell(cleaner@pnfsDomain) : Got error from PnfsManager for 00010000000000000003D408 [4] Pnfs lookup failed
> >>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : Exception in getCacheLocations java.lang.NullPointerException
> >>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : java.lang.NullPointerException
> >>>>> 04/25 13:34:53 Cell(cleaner@pnfsDomain) : Got error from PnfsManager for 00010000000000000003D430 [4] Pnfs lookup failed
> >>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : Exception in getCacheLocations java.lang.NullPointerException
> >>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : java.lang.NullPointerException
> >>>>> 04/25 13:34:53 Cell(cleaner@pnfsDomain) : Got error from PnfsManager for 00010000000000000003D438 [4] Pnfs lookup failed
> >>>>> ..................
> >>>>>
> >>>>> And then it starts again with logging every minute the following
> >>>>> (that's how you get 3GB files in 2 days).
> >>>>>
> >>>>> 04/25 13:35:07 Cell(PnfsManager@pnfsDomain) : Exception in mapPath (pathfinder) java.lang.NullPointerException
> >>>>> 04/25 13:35:07 Cell(PnfsManager@pnfsDomain) : java.lang.NullPointerException
> >>>>> 04/25 13:35:07 Cell(PnfsManager@pnfsDomain) : at diskCacheV111.namespace.provider.BasicNameSpaceProvider.pnfsidToPath(BasicNameSpaceProvider.java:351)
> >>>>> 04/25 13:35:07 Cell(PnfsManager@pnfsDomain) : at diskCacheV111.namespace.PnfsManagerV3.pathfinder(PnfsManagerV3.java:892)
> >>>>> 04/25 13:35:07 Cell(PnfsManager@pnfsDomain) : at diskCacheV111.namespace.PnfsManagerV3.mapPath(PnfsManagerV3.java:906)
> >>>>> 04/25 13:35:07 Cell(PnfsManager@pnfsDomain) : at diskCacheV111.namespace.PnfsManagerV3.processPnfsMessage(PnfsManagerV3.java:1060)
> >>>>> 04/25 13:35:07 Cell(PnfsManager@pnfsDomain) : at diskCacheV111.namespace.PnfsManagerV3$ProcessThread.run(PnfsManagerV3.java:952)
> >>>>> 04/25 13:35:07 Cell(PnfsManager@pnfsDomain) : at java.lang.Thread.run(Thread.java:534)
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> Greig A Cowan wrote:
> >>>>>
> >>>>>> Do these logfiles show the output from the dCache during the
> >>>>>> lcg-cr command?
> >>>>>>
> >>>>>> Some comments below:
> >>>>>>
> >>>>>>
> >>>>>>> ========================================================================
> >>>>>>>
> >>>>>>> /var/log/messages
> >>>>>>>
> >>>>>>> Apr 25 12:36:07 dcache01 kernel: nfs_refresh_inode: inode number
> >>>>>>> mismatch
> >>>>>>> Apr 25 12:36:07 dcache01 kernel: expected (0xa/0x1027), got
> >>>>>>> (0xa/0x1020)
> >>>>>>>
> >>>>>>> ========================================================================
> >>>>>>>
> >>>>>>>
> >>>>>> This is OK. See this bug:
> >>>>>>
> >>>>>> https://savannah.cern.ch/bugs/index.php?func=detailitem&item_id=10131
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> /var/log/PnfsDomain.log
> >>>>>>>
> >>>>>>> 04/24 01:11:20 Cell(PnfsManager@pnfsDomain) : Exception in
> >>>>>>> mapPath (pathfinder) java.lang.NullPointerException
> >>>>>>> 04/24 01:11:20 Cell(PnfsManager@pnfsDomain) :
> >>>>>>> java.lang.NullPointerException
> >>>>>>> 04/24 01:11:20 Cell(PnfsManager@pnfsDomain) : at
> >>>>>>> diskCacheV111.namespace.provider.BasicNameSpaceProvider.pnfsidToPath(BasicNameSpaceProvider.java:351)
> >>>>>>>
> >>>>>>> 04/24 01:11:20 Cell(PnfsManager@pnfsDomain) : at
> >>>>>>> diskCacheV111.namespace.PnfsManagerV3.pathfinder(PnfsManagerV3.java:892)
> >>>>>>>
> >>>>>>> 04/24 01:11:20 Cell(PnfsManager@pnfsDomain) : at
> >>>>>>> diskCacheV111.namespace.PnfsManagerV3.mapPath(PnfsManagerV3.java:906)
> >>>>>>>
> >>>>>>> 04/24 01:11:20 Cell(PnfsManager@pnfsDomain) : at
> >>>>>>> diskCacheV111.namespace.PnfsManagerV3.processPnfsMessage(PnfsManagerV3.java:1060)
> >>>>>>>
> >>>>>>> 04/24 01:11:20 Cell(PnfsManager@pnfsDomain) : at
> >>>>>>> diskCacheV111.namespace.PnfsManagerV3$ProcessThread.run(PnfsManagerV3.java:952)
> >>>>>>>
> >>>>>>> 04/24 01:11:20 Cell(PnfsManager@pnfsDomain) : at
> >>>>>>> java.lang.Thread.run(Thread.java:534)
> >>>>>>>
> >>>>>>> ==========================================================================
> >>>>>>>
> >>>>>>>
> >>>>>> This seems to be OK.
> >>>>>>
> >>>>>>
> >>>>>>> /var/log/srm-dcache01Domain.log
> >>>>>>>
> >>>>>>> 04/25 11:11:00 Cell(SRM-dcache01@srm-dcache01Domain) :
> >>>>>>> PutRequestHandler error: changing fr#-2147431311 to Done
> >>>>>>> 04/25 11:11:48 Cell(SRM-dcache01@srm-dcache01Domain) :
> >>>>>>> PutRequestHandler error: copy request state changed to Done
> >>>>>>>
> >>>>>>> ==========================================================================
> >>>>>>>
> >>>>>>>
> >>>>>> Yep, this is fine.
> >>>>>>
> >>>>>>
> >>>>>>> /var/log/gridftp-dcache01Domain.log
> >>>>>>>
> >>>>>>> 04/24 15:56:44
> >>>>>>> Cell(GFTP-dcache01-Unknown-228@gridftp-dcache01Domain) :
> >>>>>>> SocketAdapter: SocketRedirector(Thread-217):Starting a
> >>>>>>> SocketRedirector
> >>>>>>>
> >>>>>>> ==========================================================================
> >>>>>>>
> >>>>>>> /var/log/gridftp-<gridftp-node>.log
> >>>>>>>
> >>>>>>> same as above.
> >>>>>>>
> >>>>>>> ==========================================================================
> >>>>>>>
> >>>>>>>
> >>>>>> This all seems normal behaviour to me.
> >>>>>>
> >>>>>> Can you give me an example of the command that you are running?
> >>>>>>
> >>>>>> gc
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> if you need anything else let me know. BTW it would be good if
> >>>>>>> the log files were grouped under a common dcache directory in
> >>>>>>> /var/log. I've also noticed that on all the nodes the log files
> >>>>>>> of all domains are created (probably because I started
> >>>>>>> dcache-core). It would be better if only the ones that are
> >>>>>>> actually used were there; the others just confuse things.
> >>>>>>>
> >>>>>>> ta
> >>>>>>>
> >>>>>>> cheers
> >>>>>>> alessandra
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Greig A Cowan wrote:
> >>>>>>>
> >>>>>>>> Hi Alessandra,
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> is it possible to delete directories in dcache? I'm trying to
> >>>>>>>>> do some clean up. I'm also having problems with the SFTs right
> >>>>>>>>> now. They fail because they often time out on the lcg-cr
> >>>>>>>>> command. However srm commands seem to work perfectly.
> >>>>>>>>>
> >>>>>>>> You can delete empty directories using rmdir as root on the
> >>>>>>>> pnfs node (or any of the nodes where pnfs is mounted).
> >>>>>>>>
> >>>>>>>> Hmm, not sure about the lcg-cr commands. Do you see anything in
> >>>>>>>> the log files to give us more of a clue?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> I solved the problems I had with the WEB interface not
> >>>>>>>>> reporting doors correctly after the upgrade. I thought it was
> >>>>>>>>> either network or the overloaded head node but it was
> >>>>>>>>> stale java processes on the nodes that I couldn't reboot.
> >>>>>>>>>
> >>>>>>>> Thinking about it again, I have seen stale java processes
> >>>>>>>> causing problems when you try and start up gridftp doors. I'll
> >>>>>>>> add something to the wiki about this.
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Greig
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >>
> >
>
>
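
Since the pnfsDomain.log is growing so quickly, it may be worth counting which cell writes the most entries and how many distinct PNFS IDs the cleaner is failing on. Below is a rough Python sketch of that, assuming one log entry per line in the format quoted above (the mail has wrapped the lines). The function and variable names are my own, and the sample entries are copied from the quoted log.

```python
import re
from collections import Counter

# "MM/DD HH:MM:SS Cell(name@domain) : message"
ENTRY = re.compile(r"^\d\d/\d\d \d\d:\d\d:\d\d Cell\((?P<cell>[^)]+)\) : (?P<msg>.*)$")
# The cleaner's complaint about a PNFS ID it cannot look up.
PNFS_FAILURE = re.compile(r"Got error from PnfsManager for (?P<pnfsid>[0-9A-F]+)")

def summarize_log(lines):
    """Return (entries per cell, failed lookups per PNFS ID) as two Counters."""
    per_cell = Counter()
    failed_ids = Counter()
    for line in lines:
        entry = ENTRY.match(line.rstrip("\n"))
        if entry is None:
            continue  # not a log entry in the expected format
        per_cell[entry["cell"]] += 1
        failure = PNFS_FAILURE.search(entry["msg"])
        if failure is not None:
            failed_ids[failure["pnfsid"]] += 1
    return per_cell, failed_ids

log_lines = [
    "04/25 13:34:53 Cell(cleaner@pnfsDomain) : Got error from PnfsManager for 00010000000000000003D378 [4] Pnfs lookup failed",
    "04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : Exception in getCacheLocations java.lang.NullPointerException",
    "04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : java.lang.NullPointerException",
    "04/25 13:34:53 Cell(cleaner@pnfsDomain) : Got error from PnfsManager for 00010000000000000003D398 [4] Pnfs lookup failed",
]

per_cell, failed_ids = summarize_log(log_lines)
print(per_cell.most_common())
print(len(failed_ids), "distinct PNFS IDs failed lookup")
```

For the real file you would pass an open file object instead of the sample list, so that a 3GB log is read line by line rather than loaded into memory at once.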
--
=======================================================================
Dr Greig A Cowan http://www.ph.ed.ac.uk/~gcowan1
School of Physics, University of Edinburgh, James Clerk Maxwell Building
TIER-2 STORAGE SUPPORT PAGES: http://wiki.gridpp.ac.uk/wiki/Grid_Storage
=======================================================================