Hi,
I've vacuumed the database (and switched on autovacuum as detailed in
the wiki article), but if I move the gridftp door to the pool node I still
get the errors.
How much work would it be to move PNFS off to a separate server? Would it
be possible to move it back if it doesn't solve the problem?
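For reference, a minimal sketch of the postgresql.conf settings involved, based on the PostgreSQL 8.1 documentation rather than on your actual config: on 8.1 the integrated autovacuum only runs when the statistics collector is gathering row-level statistics, so all three parameters need to be on.

```
# postgresql.conf (PostgreSQL 8.1)
stats_start_collector = on   # enable the statistics collector
stats_row_level = on         # collect row-level stats (required by autovacuum)
autovacuum = on              # enable the integrated autovacuum daemon
```

PostgreSQL needs a restart after changing these for them to take effect.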
Yours,
Chris.
On 25/5/06 18:04, "Greig A Cowan" <[log in to unmask]> wrote:
>> We don't run gridftp doors on our pools - the gridftp doors run on
>> completely separate systems. Yes, pnfs is on its own server.
>>
>> Have you vacuumed your database recently?
>>
>> http://www.postgresql.org/docs/8.1/interactive/maintenance.html#ROUTINE-VACUUMING
>
> Hi Derek,
>
> In this article in the wiki:
>
> https://www.gridpp.ac.uk/wiki/PostgreSQL_administration
>
> I think it should be
>
> stats_start_collector = on
>
> and not
>
> stat_start_collector = on
>
> Do you agree? I can't find the stat_start_collector variable in the
> relevant PostgreSQL file, but I can find the other form.
>
> Greig
>
>>
>> Derek
>>
>>
>>>
>>> Thanks,
>>> Chris.
>>>
>>>> -----Original Message-----
>>>> From: Greig A Cowan [mailto:[log in to unmask]]
>>>> Sent: 24 May 2006 18:13
>>>> To: Brew, CAJ (Chris)
>>>> Cc: [log in to unmask]
>>>> Subject: RE: dCache SFT Failures
>>>>
>>>>
>>>> Hi Chris,
>>>>
>>>> I've just read your post on the user-forum. What you've found is
>>>> very interesting. Could we be seeing a scaling problem with
>>>> dCache? I hadn't realised that you were supporting 24 VOs, each
>>>> with their own database.
>>>>
>>>> I'll need to look into it, but there might be an option
>>>> within pnfs that lets you control things like this.
>>>>
>>>> Greig
>>>>
>>>>
>>>> On Wed, 24 May 2006, Brew, CAJ (Chris) wrote:
>>>>
>>>>> Hi Greig,
>>>>>
>>>>> I've just been running some tests and have a bit more info,
>>>>> which I've just posted to the dCache user-forum. It looks like
>>>>> the file info isn't getting into the pnfs databases quickly
>>>>> enough.
>>>>>
>>>>> That's probably why I'm failing the SFTs but haven't heard
>>>>> complaints from users.
>>>>>
>>>>> I'm not sure where to take this from here, unless I can tune
>>>>> the DB to get the info in more quickly.
>>>>>
>>>>> Yours,
>>>>> Chris.
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: GRIDPP2: Deployment and support of SRM and local storage
>>>>>> management [mailto:[log in to unmask]] On Behalf Of Greig A Cowan
>>>>>> Sent: 24 May 2006 17:56
>>>>>> To: [log in to unmask]
>>>>>> Subject: Re: dCache SFT Failures
>>>>>>
>>>>>> Hi Chris,
>>>>>>
>>>>>> I see that you are still failing the SFTs; in fact, the
>>>>>> situation seems worse than before!
>>>>>>
>>>>>> You are definitely using the correct pnfs mount options, aren't
>>>>>> you?
>>>>>> Have you tried rebooting the machine?
>>>>>>
>>>>>> Greig
>>>>>>
>>>>>> On Tue, 23 May 2006, Brew, CAJ (Chris) wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: GRIDPP2: Deployment and support of SRM and local storage
>>>>>>>> management [mailto:[log in to unmask]] On Behalf Of Greig A Cowan
>>>>>>>> Sent: 23 May 2006 12:14
>>>>>>>> To: [log in to unmask]
>>>>>>>> Subject: Re: dCache SFT Failures
>>>>>>>>
>>>>>>>> Hi Chris,
>>>>>>>>
>>>>>>>> What are the permissions of the generated directory that the
>>>>>>>> SFT is trying to write into?
>>>>>>>
>>>>>>> dteam001:dteam drwxr-xr-x
>>>>>>>
>>>>>>> As all the dteam directories appear to be.
>>>>>>>
>>>>>>>> What options are you using when mounting pnfs on pool nodes?
>>>>>>>
>>>>>>> Hmm, from /etc/mtab:
>>>>>>>
>>>>>>> heplnx204.pp.rl.ac.uk:/pnfsdoors /pnfs/pp.rl.ac.uk nfs rw,addr=130.246.47.204 0 0
>>>>>>> heplnx204.pp.rl.ac.uk:/fs /pnfs/fs nfs rw,hard,intr,noac,addr=130.246.47.204 0 0
>>>>>>>
>>>>>>> I had a problem earlier where the /fs filesystem hadn't mounted
>>>>>>> and the doors weren't working on the pool node; I ended up
>>>>>>> fixing it by putting it in /etc/fstab. I've remounted it with
>>>>>>> the same options as the pnfsdoors:
>>>>>>>
>>>>>>> heplnx204.pp.rl.ac.uk:/pnfsdoors /pnfs/pp.rl.ac.uk nfs rw,addr=130.246.47.204 0 0
>>>>>>> heplnx204.pp.rl.ac.uk:/fs /pnfs/fs nfs rw,addr=130.246.47.204 0 0
>>>>>>>
>>>>>>> Are the dCache filesystems in your fstab? What are the options?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Chris.
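One thing worth noting from the mount output above: the remount dropped the hard,intr,noac options that the original /pnfs/fs mount had, and noac (no attribute caching) is generally wanted for pnfs metadata consistency. A hypothetical /etc/fstab sketch restoring them follows; the host and mount points are copied from the mtab output above, but the choice of options is an assumption based on the original /fs mount, not a verified recommendation.

```
# /etc/fstab -- hypothetical entries; options are an assumption based on
# the original /fs mount shown earlier.
heplnx204.pp.rl.ac.uk:/pnfsdoors  /pnfs/pp.rl.ac.uk  nfs  rw,hard,intr,noac  0 0
heplnx204.pp.rl.ac.uk:/fs         /pnfs/fs           nfs  rw,hard,intr,noac  0 0
```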
>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Greig
>>>>>>>>
>>>>>>>> On Tue, 23 May 2006, Brew, CAJ (Chris) wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Removing the cron job doesn't seem to have solved the
>>>>>>>>> problem; the load on the machine is pretty low. Is there
>>>>>>>>> anything else I can try? My reliability is really low at the
>>>>>>>>> moment because of this.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Greig A Cowan [mailto:[log in to unmask]]
>>>>>>>>>> Sent: 22 May 2006 12:33
>>>>>>>>>> To: Brew, CAJ (Chris)
>>>>>>>>>> Cc: [log in to unmask]
>>>>>>>>>> Subject: RE: dCache SFT Failures
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Hmmm, yes, there's an hourly cron (on the hour, so it's
>>>>>>>>>>> probably still running if the SFT gets through the queue
>>>>>>>>>>> quickly) that du's the dCache area to get a per-VO breakdown
>>>>>>>>>>> of usage. I'll disable it and see if the SFT pass rate
>>>>>>>>>>> improves.
>>>>>>>>>>
>>>>>>>>>> You could run the cron at half past the hour instead. Do you
>>>>>>>>>> really need to run it every hour? The Tier-1 just runs a
>>>>>>>>>> similar command each night at 12pm.
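The schedule change suggested above could look like this in a crontab; the du command and log path here are purely illustrative, not taken from the actual cron job.

```
# Hypothetical crontab entries -- command and paths are illustrative.
# Run the per-VO usage scan at half past the hour, off the SFT schedule:
30 * * * *  du -s /pnfs/pp.rl.ac.uk/data/* >> /var/log/vo-usage.log
# ...or, like the Tier-1, only once a night:
0 0 * * *   du -s /pnfs/pp.rl.ac.uk/data/* >> /var/log/vo-usage.log
```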
>>>>>>>>>>
>>>>>>>>>>> p.s. Anyone know of another way of getting the information
>>>>>>>>>>> (a query on the DB perhaps)?
>>>>>>>>>>
>>>>>>>>>> Unfortunately not. I asked about this, but it's not possible
>>>>>>>>>> with dCache at the moment. It should be available in a future
>>>>>>>>>> release...
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: GRIDPP2: Deployment and support of SRM and local storage
>>>>>>>>>>>> management [mailto:[log in to unmask]] On Behalf Of Greig A Cowan
>>>>>>>>>>>> Sent: 22 May 2006 12:15
>>>>>>>>>>>> To: [log in to unmask]
>>>>>>>>>>>> Subject: Re: dCache SFT Failures
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Chris,
>>>>>>>>>>>>
>>>>>>>>>>>> I've seen this before, but it's unclear to me what causes
>>>>>>>>>>>> it. Looking at your latest SFT failure (10:10), the lcg-cp
>>>>>>>>>>>> command was successful, but the subsequent lcg-rep failed.
>>>>>>>>>>>>
>>>>>>>>>>>> Is there something else running on your dCache node which
>>>>>>>>>>>> could be interfering with pnfs? Maybe a cron job of some
>>>>>>>>>>>> sort?
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Greig
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, 22 May 2006, Brew, CAJ (Chris) wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm getting a lot of random failures in the SFTs from my
>>>>>>>>>>>> dCache, where the write of the file to the dCache appears
>>>>>>>>>>>> successful but then, when the SFT tries to read the file
>>>>>>>>>>>> back, you get:
>>>>>>>>>>>>
>>>>>>>>>>>> + lcg-cp -v --vo dteam
>>>>>>>>>>>> lfn:sft-lcg-rm-cr-heplnx48.pp.rl.ac.uk.0605220722
>>>>>>>>>>>> file:///scratch/WMS_heplnx48_018249_https_3a_2f_2fgdrb02.cern.ch_3a9000_2fLxXmsliu9ehFjCWOYEcxQg/sft-lcg-rm-cp.txt
>>>>>>>>>>>> the server sent an error response: 553 553 Permission denied, reason:
>>>>>>>>>>>> CacheException(rc=666;msg=can't get pnfsId (not a pnfsfile))
>>>>>>>>>>>>
>>>>>>>>>>>> lcg_cp: Permission denied
>>>>>>>>>>>> Using grid catalog type: lfc
>>>>>>>>>>>> Using grid catalog : prod-lfc-shared-central.cern.ch
>>>>>>>>>>>>
>>>>>>>>>>>> It appears that the write was indeed successful, because
>>>>>>>>>>>> the same SFT can later replicate it to CERN:
>>>>>>>>>>>>
>>>>>>>>>>>> Replicate the file from the default SE to castorgrid.cern.ch
>>>>>>>>>>>>
>>>>>>>>>>>> + lcg-rep -v --vo dteam -d castorgrid.cern.ch
>>>>>>>>>>>> lfn:sft-lcg-rm-cr-heplnx48.pp.rl.ac.uk.0605220722
>>>>>>>>>>>>
>>>>>>>>>>>>     0 bytes      0.00 KB/sec avg      0.00 KB/sec inst
>>>>>>>>>>>>     0 bytes      0.00 KB/sec avg      0.00 KB/sec inst
>>>>>>>>>>>>     0 bytes      0.00 KB/sec avg      0.00 KB/sec inst
>>>>>>>>>>>> Using grid catalog type: lfc
>>>>>>>>>>>> Using grid catalog : prod-lfc-shared-central.cern.ch
>>>>>>>>>>>> Source URL: lfn:/grid/dteam/SFT/sft-lcg-rm-cr-heplnx48.pp.rl.ac.uk.0605220722
>>>>>>>>>>>> File size: 233
>>>>>>>>>>>> VO name: dteam
>>>>>>>>>>>> Destination specified: castorgrid.cern.ch
>>>>>>>>>>>> Source URL for copy: gsiftp://heplnx204.pp.rl.ac.uk:2811//pnfs/pp.rl.ac.uk/data/dteam/generated/2006-05-22/file330985b9-5368-4e67-82ec-5ee6f6fd4fa8
>>>>>>>>>>>> Destination URL for copy: gsiftp://castorgrid.cern.ch/castor/cern.ch/grid/dteam/generated/2006-05-22/file8c15f735-de68-4949-aba5-33c9098462ff
>>>>>>>>>>>> # streams: 1
>>>>>>>>>>>> # set timeout to 0
>>>>>>>>>>>>
>>>>>>>>>>>> Transfer took 2020 ms
>>>>>>>>>>>> Destination URL registered in LRC: sfn://castorgrid.cern.ch/castor/cern.ch/grid/dteam/generated/2006-05-22/file8c15f735-de68-4949-aba5-33c9098462ff
>>>>>>>>>>>> + result=0
>>>>>>>>>>>> + set +x
>>>>>>>>>>>>
>>>>>>>>>>>> List replicas to check if replication was really successful
>>>>>>>>>>>>
>>>>>>>>>>>> + lcg-lr --vo dteam
>>>>>>>>>>>> lfn:sft-lcg-rm-cr-heplnx48.pp.rl.ac.uk.0605220722
>>>>>>>>>>>>
>>>>>>>>>>>> sfn://castorgrid.cern.ch/castor/cern.ch/grid/dteam/generated/2006-05-22/file8c15f735-de68-4949-aba5-33c9098462ff
>>>>>>>>>>>> srm://heplnx204.pp.rl.ac.uk/pnfs/pp.rl.ac.uk/data/dteam/generated/2006-05-22/file330985b9-5368-4e67-82ec-5ee6f6fd4fa8
>>>>>>>>>>>> + set +x
>>>>>>>>>>>>
>>>>>>>>>>>> I was always getting a few of these, but since I added
>>>>>>>>>>>> extra VOs a week ago I now seem to be failing between 30
>>>>>>>>>>>> and 50% of the SFT runs with this alone.
>>>>>>>>>>>>
>>>>>>>>>>>> I haven't managed to replicate the error by copying files
>>>>>>>>>>>> in and out multiple times, and the SFT deletes the file, so
>>>>>>>>>>>> I cannot check the status of the file that produced the
>>>>>>>>>>>> error.
>>>>>>>>>>>>
>>>>>>>>>>>> Googling for the error shows that it's not uncommon, but I
>>>>>>>>>>>> don't see any indication of a cause or solution. There
>>>>>>>>>>>> doesn't seem to be anything in the logs.
>>>>>>>>>>>>
>>>>>>>>>>>> Anyone know what I can do about this (other than install
>>>>>>>>>>>> DPM)?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Chris.
>>>>>>>>>>>>
>>>>>>>>>>>> Examples taken from:
>>>>>>>>>>>>
>>>>>>>>>>>> https://lcg-sft.cern.ch/sft/info/heplnx201.pp.rl.ac.uk/sft_2006-05-22_07.10.05.html#sft-lcg-rm_2006-05-22_07:22:49
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> ========================================================================
>>>>>>>>>>>> Dr Greig A Cowan                        http://www.ph.ed.ac.uk/~gcowan1
>>>>>>>>>>>> School of Physics, University of Edinburgh, James Clerk Maxwell Building
>>>>>>>>>>>>
>>>>>>>>>>>> TIER-2 STORAGE SUPPORT PAGES: http://wiki.gridpp.ac.uk/wiki/Grid_Storage
>>>>>>>>>>>> ========================================================================
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
> --
> ========================================================================
> Dr Greig A Cowan http://www.ph.ed.ac.uk/~gcowan1
> School of Physics, University of Edinburgh, James Clerk Maxwell Building
>
> TIER-2 STORAGE SUPPORT PAGES: http://wiki.gridpp.ac.uk/wiki/Grid_Storage
> ========================================================================
>