Hi Greig,
yeah it is not even there and it is not documented (which makes me
suspect there are other parameters in the same situation). It was
something Derek suggested last year.
cheers
alessandra
Greig A Cowan wrote:
>> I -put-lifetime (-get-lifetime) had been overridden by the upgrade. I've
>> now reduced them again. However I'd like to know who this guy is and why
>> his requests accumulate so fast.
>
> Alessandra, can you confirm if it is -put-lifetime or not? I don't see
> this option in srm.batch, but there are a variety of other possibilities.
>
> Greig
>
>> cheers
>> alessandra
>>
>> Alessandra Forti wrote:
>>> There is no firewall.
>>>
>>> This is tcpdump on the head node grepping on the UI name
>>>
>>> 14:39:28.212737 niels003.tier2.hep.manchester.ac.uk.35625 > dcache01.tier2.hep.manchester.ac.uk.8443: P 2987:3845(858) ack 5622 win 17376 <nop,nop,timestamp 129050588 34551678> (DF)
>>> 14:39:28.212743 dcache01.tier2.hep.manchester.ac.uk.8443 > niels003.tier2.hep.manchester.ac.uk.35625: . ack 3845 win 14480 <nop,nop,timestamp 34551680 129050588> (DF)
>>> 14:39:28.317297 dcache01.tier2.hep.manchester.ac.uk.8443 > niels003.tier2.hep.manchester.ac.uk.35625: P 5622:5697(75) ack 3845 win 14480 <nop,nop,timestamp 34551691 129050588> (DF)
>>> 14:39:28.318038 niels003.tier2.hep.manchester.ac.uk.35625 > dcache01.tier2.hep.manchester.ac.uk.8443: P 3845:3874(29) ack 5697 win 17376 <nop,nop,timestamp 129050599 34551691> (DF)
>>> 14:39:28.318087 dcache01.tier2.hep.manchester.ac.uk.8443 > niels003.tier2.hep.manchester.ac.uk.35625: . ack 3874 win 14480 <nop,nop,timestamp 34551691 129050599> (DF)
>>> 14:39:28.318930 niels003.tier2.hep.manchester.ac.uk.35625 > dcache01.tier2.hep.manchester.ac.uk.8443: P 3874:4727(853) ack 5697 win 17376 <nop,nop,timestamp 129050599 34551691> (DF)
>>> 14:39:28.318937 dcache01.tier2.hep.manchester.ac.uk.8443 > niels003.tier2.hep.manchester.ac.uk.35625: . ack 4727 win 17376 <nop,nop,timestamp 34551691 129050599> (DF)
>>> 14:39:28.318942 niels003.tier2.hep.manchester.ac.uk.35625 > dcache01.tier2.hep.manchester.ac.uk.8443: F 4727:4727(0) ack 5697 win 17376 <nop,nop,timestamp 129050599 34551691> (DF)
>>> 14:39:28.356206 dcache01.tier2.hep.manchester.ac.uk.8443 > niels003.tier2.hep.manchester.ac.uk.35625: . ack 4728 win 17376 <nop,nop,timestamp 34551695 129050599> (DF)
>>> 14:39:28.360362 dcache01.tier2.hep.manchester.ac.uk.8443 > niels003.tier2.hep.manchester.ac.uk.35625: . 5697:7145(1448) ack 4728 win 17376 <nop,nop,timestamp 34551695 129050599> (DF)
>>> 14:39:28.360371 dcache01.tier2.hep.manchester.ac.uk.8443 > niels003.tier2.hep.manchester.ac.uk.35625: P 7145:8286(1141) ack 4728 win 17376 <nop,nop,timestamp 34551695 129050599> (DF)
>>> 14:39:28.360428 dcache01.tier2.hep.manchester.ac.uk.8443 > niels003.tier2.hep.manchester.ac.uk.35625: F 8286:8286(0) ack 4728 win 17376 <nop,nop,timestamp 34551695 129050599> (DF)
>>> 14:39:28.360874 niels003.tier2.hep.manchester.ac.uk.35625 > dcache01.tier2.hep.manchester.ac.uk.8443: . ack 8286 win 23168 <nop,nop,timestamp 129050603 34551695> (DF)
>>> 14:39:28.360885 niels003.tier2.hep.manchester.ac.uk.35625 > dcache01.tier2.hep.manchester.ac.uk.8443: . ack 8287 win 23168 <nop,nop,timestamp 129050603 34551695
>>>
>>>
>>> Greig A Cowan wrote:
>>>> Have you tried looking at the output of tcpdump during a transfer to
>>>> see what connections are being established?
>>>>
>>>> What is the firewall setup with your door nodes?
>>>>
>>>>
>>>> On Tue, 25 Apr 2006, Alessandra Forti wrote:
>>>>
>>>>
>>>>> How can I know that? Besides, each line has a different "ID". If I
>>>>> try a cacheinfoof of those numbers it returns a null pointer as it
>>>>> says in the log files.
>>>>>
>>>>> I tried to netstat my UI to see if there was any connection with one
>>>>> of the gridftp doors but there isn't. There are only 2 time_wait
>>>>> connections with the head node and the head node doesn't show any
>>>>> connection with the UI.
>>>>> cheers
>>>>>
>>>>> Greig A Cowan wrote:
>>>>>
>>>>>> So 00010000000000000003D378 is the PNFS ID of the file being
>>>>>> transferred into the dCache using lcg-cr?
>>>>>>
>>>>>> On Tue, 25 Apr 2006, Alessandra Forti wrote:
>>>>>>
>>>>>>
>>>>>>> The only log file that gets changed is the pnfsDomain.log one.
>>>>>>> And there is this
>>>>>>> ========================================================================
>>>>>>>
>>>>>>>
>>>>>>> 04/25 13:34:53 Cell(cleaner@pnfsDomain) : Got error from PnfsManager for 00010000000000000003D378 [4] Pnfs lookup failed
>>>>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : Exception in getCacheLocations java.lang.NullPointerException
>>>>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : java.lang.NullPointerException
>>>>>>> 04/25 13:34:53 Cell(cleaner@pnfsDomain) : Got error from PnfsManager for 00010000000000000003D398 [4] Pnfs lookup failed
>>>>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : Exception in getCacheLocations java.lang.NullPointerException
>>>>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : java.lang.NullPointerException
>>>>>>> 04/25 13:34:53 Cell(cleaner@pnfsDomain) : Got error from PnfsManager for 00010000000000000003D3C8 [4] Pnfs lookup failed
>>>>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : Exception in getCacheLocations java.lang.NullPointerException
>>>>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : java.lang.NullPointerException
>>>>>>> 04/25 13:34:53 Cell(cleaner@pnfsDomain) : Got error from PnfsManager for 00010000000000000003D3F8 [4] Pnfs lookup failed
>>>>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : Exception in getCacheLocations java.lang.NullPointerException
>>>>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : java.lang.NullPointerException
>>>>>>> 04/25 13:34:53 Cell(cleaner@pnfsDomain) : Got error from PnfsManager for 00010000000000000003D408 [4] Pnfs lookup failed
>>>>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : Exception in getCacheLocations java.lang.NullPointerException
>>>>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : java.lang.NullPointerException
>>>>>>> 04/25 13:34:53 Cell(cleaner@pnfsDomain) : Got error from PnfsManager for 00010000000000000003D430 [4] Pnfs lookup failed
>>>>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : Exception in getCacheLocations java.lang.NullPointerException
>>>>>>> 04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : java.lang.NullPointerException
>>>>>>> 04/25 13:34:53 Cell(cleaner@pnfsDomain) : Got error from PnfsManager for 00010000000000000003D438 [4] Pnfs lookup failed
>>>>>>> ..................
>>>>>>>
>>>>>>> And then it starts again with logging every minute the following
>>>>>>> (that's how you get 3GB files in 2 days).
>>>>>>>
>>>>>>> 04/25 13:35:07 Cell(PnfsManager@pnfsDomain) : Exception in mapPath (pathfinder) java.lang.NullPointerException
>>>>>>> 04/25 13:35:07 Cell(PnfsManager@pnfsDomain) : java.lang.NullPointerException
>>>>>>> 04/25 13:35:07 Cell(PnfsManager@pnfsDomain) : at diskCacheV111.namespace.provider.BasicNameSpaceProvider.pnfsidToPath(BasicNameSpaceProvider.java:351)
>>>>>>> 04/25 13:35:07 Cell(PnfsManager@pnfsDomain) : at diskCacheV111.namespace.PnfsManagerV3.pathfinder(PnfsManagerV3.java:892)
>>>>>>> 04/25 13:35:07 Cell(PnfsManager@pnfsDomain) : at diskCacheV111.namespace.PnfsManagerV3.mapPath(PnfsManagerV3.java:906)
>>>>>>> 04/25 13:35:07 Cell(PnfsManager@pnfsDomain) : at diskCacheV111.namespace.PnfsManagerV3.processPnfsMessage(PnfsManagerV3.java:1060)
>>>>>>> 04/25 13:35:07 Cell(PnfsManager@pnfsDomain) : at diskCacheV111.namespace.PnfsManagerV3$ProcessThread.run(PnfsManagerV3.java:952)
>>>>>>> 04/25 13:35:07 Cell(PnfsManager@pnfsDomain) : at java.lang.Thread.run(Thread.java:534)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Greig A Cowan wrote:
>>>>>>>
>>>>>>>> Do these logfiles show the output from the dCache during the
>>>>>>>> lcg-cr command?
>>>>>>>>
>>>>>>>> Some comments below:
>>>>>>>>
>>>>>>>>
>>>>>>>>> ========================================================================
>>>>>>>>>
>>>>>>>>> /var/log/messages
>>>>>>>>>
>>>>>>>>> Apr 25 12:36:07 dcache01 kernel: nfs_refresh_inode: inode number mismatch
>>>>>>>>> Apr 25 12:36:07 dcache01 kernel: expected (0xa/0x1027), got (0xa/0x1020)
>>>>>>>>>
>>>>>>>>> ========================================================================
>>>>>>>>>
>>>>>>>>>
>>>>>>>> This is OK. See this bug:
>>>>>>>>
>>>>>>>> https://savannah.cern.ch/bugs/index.php?func=detailitem&item_id=10131
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> /var/log/PnfsDomain.log
>>>>>>>>>
>>>>>>>>> 04/24 01:11:20 Cell(PnfsManager@pnfsDomain) : Exception in mapPath (pathfinder) java.lang.NullPointerException
>>>>>>>>> 04/24 01:11:20 Cell(PnfsManager@pnfsDomain) : java.lang.NullPointerException
>>>>>>>>> 04/24 01:11:20 Cell(PnfsManager@pnfsDomain) : at diskCacheV111.namespace.provider.BasicNameSpaceProvider.pnfsidToPath(BasicNameSpaceProvider.java:351)
>>>>>>>>> 04/24 01:11:20 Cell(PnfsManager@pnfsDomain) : at diskCacheV111.namespace.PnfsManagerV3.pathfinder(PnfsManagerV3.java:892)
>>>>>>>>> 04/24 01:11:20 Cell(PnfsManager@pnfsDomain) : at diskCacheV111.namespace.PnfsManagerV3.mapPath(PnfsManagerV3.java:906)
>>>>>>>>> 04/24 01:11:20 Cell(PnfsManager@pnfsDomain) : at diskCacheV111.namespace.PnfsManagerV3.processPnfsMessage(PnfsManagerV3.java:1060)
>>>>>>>>> 04/24 01:11:20 Cell(PnfsManager@pnfsDomain) : at diskCacheV111.namespace.PnfsManagerV3$ProcessThread.run(PnfsManagerV3.java:952)
>>>>>>>>> 04/24 01:11:20 Cell(PnfsManager@pnfsDomain) : at java.lang.Thread.run(Thread.java:534)
>>>>>>>>>
>>>>>>>>> ==========================================================================
>>>>>>>>>
>>>>>>>>>
>>>>>>>> This seems to be OK.
>>>>>>>>
>>>>>>>>
>>>>>>>>> /var/log/srm-dcache01Domain.log
>>>>>>>>>
>>>>>>>>> 04/25 11:11:00 Cell(SRM-dcache01@srm-dcache01Domain) : PutRequestHandler error: changing fr#-2147431311 to Done
>>>>>>>>> 04/25 11:11:48 Cell(SRM-dcache01@srm-dcache01Domain) : PutRequestHandler error: copy request state changed to Done
>>>>>>>>>
>>>>>>>>> ==========================================================================
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Yep, this is fine.
>>>>>>>>
>>>>>>>>
>>>>>>>>> /var/log/gridftp-dcache01Domain.log
>>>>>>>>>
>>>>>>>>> 04/24 15:56:44 Cell(GFTP-dcache01-Unknown-228@gridftp-dcache01Domain) : SocketAdapter: SocketRedirector(Thread-217):Starting a SocketRedirector
>>>>>>>>>
>>>>>>>>> ==========================================================================
>>>>>>>>>
>>>>>>>>> /var/log/gridftp-<gridftp-node>.log
>>>>>>>>>
>>>>>>>>> same as above.
>>>>>>>>>
>>>>>>>>> ==========================================================================
>>>>>>>>>
>>>>>>>>>
>>>>>>>> This all seems normal behaviour to me.
>>>>>>>>
>>>>>>>> Can you give me an example of the command that you are running?
>>>>>>>>
>>>>>>>> gc
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> if you need anything else let me know. BTW it would be good if
>>>>>>>>> the log files were grouped under a common dcache directory in
>>>>>>>>> /var/log. I've also noticed that on all the nodes the log files
>>>>>>>>> of all domains are created (probably because I started
>>>>>>>>> dcache-core). It would be good if only the ones that are
>>>>>>>>> actually used were there; the others just confuse things.
>>>>>>>>>
>>>>>>>>> ta
>>>>>>>>>
>>>>>>>>> cheers
>>>>>>>>> alessandra
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Greig A Cowan wrote:
>>>>>>>>>
>>>>>>>>>> Hi Alessandra,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> is it possible to delete directories in dcache? I'm trying to
>>>>>>>>>>> do some clean up. I'm also having problems with the SFTs right
>>>>>>>>>>> now. They fail because they often time out on the lcg-cr
>>>>>>>>>>> command. However srm commands seem to work perfectly.
>>>>>>>>>>>
>>>>>>>>>> You can delete empty directories using rmdir as root on the
>>>>>>>>>> pnfs node (or any of the nodes where pnfs is mounted).
>>>>>>>>>>
>>>>>>>>>> Hmm, not sure about the lcg-cr commands. Do you see anything in
>>>>>>>>>> the log files to give us more of a clue?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> I solved the problems I had with the WEB interface not
>>>>>>>>>>> reporting doors correctly after the upgrade. I thought it was
>>>>>>>>>>> either network or the overloaded head node but it was
>>>>>>>>>>> stale java processes on the nodes that I couldn't reboot.
>>>>>>>>>>>
>>>>>>>>>> Thinking about it again, I have seen stale java processes
>>>>>>>>>> causing problems when you try and start up gridftp doors. I'll
>>>>>>>>>> add something to the wiki about this.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Greig
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>
--
*******************************************
* Dr Alessandra Forti *
* Technical Coordinator - NorthGrid Tier2 *
* http://www.hep.man.ac.uk/u/aforti *
*******************************************
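
Since the pnfsDomain.log discussed in this thread grows to several GB, it may help to summarise it rather than read it by eye. Below is a minimal sketch, not part of dCache, that counts the cleaner's "Pnfs lookup failed" errors per PNFS ID. The regular expression is an assumption based only on the log lines quoted in this thread (one entry per line in the real file, 24 hex digit IDs), and the function name count_failed_lookups is made up for illustration.

```python
import re
from collections import Counter

# Assumed line format, taken from the cleaner entries quoted in this thread.
LOOKUP_FAILED = re.compile(
    r"Cell\(cleaner@pnfsDomain\) : Got error from PnfsManager for "
    r"([0-9A-Fa-f]{24}) \[4\] Pnfs lookup failed"
)


def count_failed_lookups(lines):
    """Return a Counter mapping PNFS ID -> number of failed lookups."""
    counts = Counter()
    for line in lines:
        match = LOOKUP_FAILED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample entries copied from the thread (unwrapped to one entry per line).
sample = [
    "04/25 13:34:53 Cell(cleaner@pnfsDomain) : Got error from PnfsManager "
    "for 00010000000000000003D378 [4] Pnfs lookup failed",
    "04/25 13:34:53 Cell(PnfsManager@pnfsDomain) : Exception in "
    "getCacheLocations java.lang.NullPointerException",
    "04/25 13:34:53 Cell(cleaner@pnfsDomain) : Got error from PnfsManager "
    "for 00010000000000000003D398 [4] Pnfs lookup failed",
]

for pnfs_id, n in count_failed_lookups(sample).most_common():
    print(pnfs_id, n)
```

To run it on a real file, pass an open file object instead of the sample list, for example count_failed_lookups(open(path)), where path is wherever the log lives on your node. Reading line by line keeps memory use flat even for a multi-GB log.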