Hi,
Hmmm, yes, there's an hourly cron job (it starts on the hour, so it's
probably still running if the SFT gets through the queue quickly) that
runs du over the dCache area to get a per-VO breakdown of usage. I'll
disable it and see if the SFT pass rate improves.
Thanks,
Chris.
P.S. Does anyone know of another way of getting this information (a
query on the DB, perhaps)?
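
For reference, here is roughly what the cron job computes, written as a
minimal Python sketch. It assumes each VO has its own directory directly
under the data area (as the /pnfs/pp.rl.ac.uk/data/dteam path in the log
below suggests); the function name usage_per_vo is made up for
illustration. It sums apparent file sizes rather than disk blocks, so the
figures can differ from du's, and since it still stats every file it
would load pnfs just as the du does. It is not the DB-side alternative
I'm asking about.

```python
import os
import sys


def usage_per_vo(root):
    """Return {vo_name: total_bytes} for each top-level directory under root.

    Assumption: every directory directly under `root` belongs to one VO.
    """
    totals = {}
    for entry in sorted(os.listdir(root)):
        vo_dir = os.path.join(root, entry)
        if not os.path.isdir(vo_dir):
            continue  # ignore stray files at the top level
        total = 0
        for dirpath, _dirnames, filenames in os.walk(vo_dir):
            for name in filenames:
                try:
                    # lstat: do not follow symlinks, count the entry itself
                    total += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    # the file vanished or is unreadable; skip it
                    pass
        totals[entry] = total
    return totals


if __name__ == "__main__":
    # Usage: python usage_per_vo.py <data-area-root>
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for vo, size in usage_per_vo(root).items():
        print(f"{vo}\t{size}")
```

Running it at a few minutes past the hour, rather than on the hour, might
be enough to keep it out of the way of the SFTs, but that is a guess.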
> -----Original Message-----
> From: GRIDPP2: Deployment and support of SRM and local
> storage management [mailto:[log in to unmask]] On
> Behalf Of Greig A Cowan
> Sent: 22 May 2006 12:15
> To: [log in to unmask]
> Subject: Re: dCache SFT Failures
>
> Hi Chris,
>
> I've seen this before, but it's unclear to me what causes it.
> Looking at your latest SFT failure (10:10), the lcg-cp
> command was successful, but the subsequent lcg-rep failed.
>
> Is there something else running on your dCache node which
> could be interfering with pnfs? Maybe a cron job of some sort?
>
> Cheers,
> Greig
>
>
> On Mon, 22 May 2006, Brew, CAJ (Chris) wrote:
>
> > Hi All,
> >
> > I'm getting a lot of random failures in the SFTs from my dCache where
> > the write of the file to the dCache appears successful but then when
> > the SFT tries to read the file back you get:
> >
> > + lcg-cp -v --vo dteam
> > + lfn:sft-lcg-rm-cr-heplnx48.pp.rl.ac.uk.0605220722
> > file:///scratch/WMS_heplnx48_018249_https_3a_2f_2fgdrb02.cern.ch_3a9000_2fLxXmsliu9ehFjCWOYEcxQg/sft-lcg-rm-cp.txt
> > the server sent an error response: 553 553 Permission denied, reason:
> > CacheException(rc=666;msg=can't get pnfsId (not a pnfsfile))
> >
> > lcg_cp: Permission denied
> > Using grid catalog type: lfc
> > Using grid catalog : prod-lfc-shared-central.cern.ch
> >
> > It appears that the write was indeed successful because the same SFT
> > can later replicate it to CERN:
> >
> > Replicate the file from the default SE to castorgrid.cern.ch
> >
> > + lcg-rep -v --vo dteam -d castorgrid.cern.ch
> > lfn:sft-lcg-rm-cr-heplnx48.pp.rl.ac.uk.0605220722
> >
> > 0 bytes 0.00 KB/sec avg 0.00 KB/sec inst
> > 0 bytes 0.00 KB/sec avg 0.00 KB/sec inst
> > 0 bytes 0.00 KB/sec avg 0.00 KB/sec inst
> > Using grid catalog type: lfc
> > Using grid catalog : prod-lfc-shared-central.cern.ch
> > Source URL: lfn:/grid/dteam/SFT/sft-lcg-rm-cr-heplnx48.pp.rl.ac.uk.0605220722
> > File size: 233
> > VO name: dteam
> > Destination specified: castorgrid.cern.ch
> > Source URL for copy:
> > gsiftp://heplnx204.pp.rl.ac.uk:2811//pnfs/pp.rl.ac.uk/data/dteam/generated/2006-05-22/file330985b9-5368-4e67-82ec-5ee6f6fd4fa8
> > Destination URL for copy:
> > gsiftp://castorgrid.cern.ch/castor/cern.ch/grid/dteam/generated/2006-05-22/file8c15f735-de68-4949-aba5-33c9098462ff
> > # streams: 1
> > # set timeout to 0
> >
> > Transfer took 2020 ms
> > Destination URL registered in LRC:
> > sfn://castorgrid.cern.ch/castor/cern.ch/grid/dteam/generated/2006-05-22/file8c15f735-de68-4949-aba5-33c9098462ff
> > + result=0
> > + set +x
> >
> > List replicas to check if replication was really successful
> >
> > + lcg-lr --vo dteam lfn:sft-lcg-rm-cr-heplnx48.pp.rl.ac.uk.0605220722
> > sfn://castorgrid.cern.ch/castor/cern.ch/grid/dteam/generated/2006-05-22/file8c15f735-de68-4949-aba5-33c9098462ff
> > srm://heplnx204.pp.rl.ac.uk/pnfs/pp.rl.ac.uk/data/dteam/generated/2006-05-22/file330985b9-5368-4e67-82ec-5ee6f6fd4fa8
> > + set +x
> >
> > I was always getting a few of these, but since I added extra VOs a
> > week ago I now seem to be failing between 30 and 50% of the SFT runs
> > with this alone.
> >
> > I haven't managed to replicate the error by copying files in and out
> > multiple times, and the SFT deletes the file, so I cannot check the
> > status of the file I see the error with.
> >
> > Googling for the error seems to show that it's not uncommon, but I
> > don't see any indications of cause or solution. There doesn't seem to
> > be anything in the logs.
> >
> > Anyone know what I can do about this (other than install DPM)?
> >
> > Thanks,
> > Chris.
> >
> > Examples taken from:
> >
> > https://lcg-sft.cern.ch/sft/info/heplnx201.pp.rl.ac.uk/sft_2006-05-22_07.10.05.html#sft-lcg-rm_2006-05-22_07:22:49
> >
>
> --
> ========================================================================
> Dr Greig A Cowan
> http://www.ph.ed.ac.uk/~gcowan1
> School of Physics, University of Edinburgh, James Clerk Maxwell Building
>
> TIER-2 STORAGE SUPPORT PAGES:
> http://wiki.gridpp.ac.uk/wiki/Grid_Storage
> ========================================================================
>