> Hmmm, yes there's an hourly cron (on the hour so it's probably still
> running if the SFT gets through the queue quickly) that du's the dCache
> area to get a per VO breakdown of usage. I'll disable it and see if the
> SFT pass rate improves.
You could run the cron at half past the hour instead. Do you really need
to run it every hour? The Tier-1 just runs a similar command once each
night at midnight.
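
A minimal sketch of how that rescheduling might look, assuming the
breakdown is a plain du over the per-VO directories. The function names,
the lock directory and the install path are all made up for illustration;
the data path is just the one that appears in your SFT log. The lock is
there so that a slow run can never overlap with the next one.

```shell
#!/bin/sh
# Sketch only: per-VO usage summary with a guard against overlapping runs.
# All names below are illustrative assumptions, not part of dCache.

per_vo_usage() {
    # Print "<kilobytes> <directory>" for each directory directly under $1.
    base_dir=$1
    for vo_dir in "$base_dir"/*/; do
        [ -d "$vo_dir" ] || continue   # skip the unexpanded glob if empty
        du -sk "$vo_dir"
    done
}

run_exclusively() {
    # mkdir is atomic, so only one run at a time can create the lock dir.
    lock_dir=$1
    shift
    if mkdir "$lock_dir" 2>/dev/null; then
        "$@"
        status=$?
        rmdir "$lock_dir"
        return $status
    fi
    echo "previous run still active, skipping" >&2
    return 1
}

# Intended call (paths are assumptions, adjust for your site):
#   run_exclusively /tmp/vo-usage.lock per_vo_usage /pnfs/pp.rl.ac.uk/data
#
# Example crontab entries (hypothetical install path):
#   30 * * * *  /usr/local/bin/vo-usage.sh    # half past every hour
#   0 0 * * *   /usr/local/bin/vo-usage.sh    # once a night at midnight
```

Note that if a run is killed part way through, the lock directory is left
behind and has to be removed by hand before the next run will start.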
> p.s. Anyone know of another way of getting the information (A query on
> the DB perhaps)?
Unfortunately not. I asked about this, but it's not possible with dCache
at the moment. It should be available in a future release...
>
> > -----Original Message-----
> > From: GRIDPP2: Deployment and support of SRM and local
> > storage management [mailto:[log in to unmask]] On
> > Behalf Of Greig A Cowan
> > Sent: 22 May 2006 12:15
> > To: [log in to unmask]
> > Subject: Re: dCache SFT Failures
> >
> > Hi Chris,
> >
> > I've seen this before, but it's unclear to me what causes it.
> > Looking at your latest SFT failure (10:10), the lcg-cp
> > command was successful, but the subsequent lcg-rep failed.
> >
> > Is there something else running on your dCache node which
> > could be interfering with pnfs? Maybe a cron job of some sort?
> >
> > Cheers,
> > Greig
> >
> >
> > On Mon, 22 May 2006, Brew, CAJ (Chris) wrote:
> >
> > > Hi All,
> > >
> > > I'm getting a lot of random failures in the SFTs from my dCache where
> > > the write of the file to the dCache appears successful but then when
> > > the SFT tries to read the file back you get:
> > >
> > > + lcg-cp -v --vo dteam lfn:sft-lcg-rm-cr-heplnx48.pp.rl.ac.uk.0605220722
> > > file:///scratch/WMS_heplnx48_018249_https_3a_2f_2fgdrb02.cern.ch_3a9000_2fLxXmsliu9ehFjCWOYEcxQg/sft-lcg-rm-cp.txt
> > > the server sent an error response: 553 553 Permission denied, reason:
> > > CacheException(rc=666;msg=can't get pnfsId (not a pnfsfile))
> > >
> > > lcg_cp: Permission denied
> > > Using grid catalog type: lfc
> > > Using grid catalog : prod-lfc-shared-central.cern.ch
> > >
> > > It appears that the write was indeed successful because the same SFT
> > > can later replicate it to CERN:
> > >
> > > Replicate the file from the default SE to castorgrid.cern.ch
> > >
> > > + lcg-rep -v --vo dteam -d castorgrid.cern.ch
> > > lfn:sft-lcg-rm-cr-heplnx48.pp.rl.ac.uk.0605220722
> > >
> > > 0 bytes 0.00 KB/sec avg 0.00 KB/sec inst
> > > 0 bytes 0.00 KB/sec avg 0.00 KB/sec inst
> > > 0 bytes 0.00 KB/sec avg 0.00 KB/sec inst
> > > Using grid catalog type: lfc
> > > Using grid catalog : prod-lfc-shared-central.cern.ch
> > > Source URL: lfn:/grid/dteam/SFT/sft-lcg-rm-cr-heplnx48.pp.rl.ac.uk.0605220722
> > > File size: 233
> > > VO name: dteam
> > > Destination specified: castorgrid.cern.ch
> > > Source URL for copy:
> > > gsiftp://heplnx204.pp.rl.ac.uk:2811//pnfs/pp.rl.ac.uk/data/dteam/generated/2006-05-22/file330985b9-5368-4e67-82ec-5ee6f6fd4fa8
> > > Destination URL for copy:
> > > gsiftp://castorgrid.cern.ch/castor/cern.ch/grid/dteam/generated/2006-05-22/file8c15f735-de68-4949-aba5-33c9098462ff
> > > # streams: 1
> > > # set timeout to 0
> > >
> > > Transfer took 2020 ms
> > > Destination URL registered in LRC:
> > > sfn://castorgrid.cern.ch/castor/cern.ch/grid/dteam/generated/2006-05-22/file8c15f735-de68-4949-aba5-33c9098462ff
> > > + result=0
> > > + set +x
> > >
> > > List replicas to check if replication was really successful
> > >
> > > + lcg-lr --vo dteam lfn:sft-lcg-rm-cr-heplnx48.pp.rl.ac.uk.0605220722
> > > sfn://castorgrid.cern.ch/castor/cern.ch/grid/dteam/generated/2006-05-22/file8c15f735-de68-4949-aba5-33c9098462ff
> > > srm://heplnx204.pp.rl.ac.uk/pnfs/pp.rl.ac.uk/data/dteam/generated/2006-05-22/file330985b9-5368-4e67-82ec-5ee6f6fd4fa8
> > > + set +x
> > >
> > > I was always getting a few of these, but since I added extra VOs a week
> > > ago I now seem to be failing between 30 and 50% of the SFT runs with this
> > > alone.
> > >
> > > I haven't managed to replicate the error by copying files in and out
> > > multiple times, and the SFT deletes the file so I cannot check the
> > > status of the file that triggered the error.
> > >
> > > Googling for the error seems to show that it's not uncommon but I
> > > don't see any indications of cause or solution. There doesn't seem to
> > > be anything in the logs.
> > >
> > > Anyone know what I can do about this (other than install DPM)?
> > >
> > > Thanks,
> > > Chris.
> > >
> > > Examples taken from:
> > >
> > >
> > > https://lcg-sft.cern.ch/sft/info/heplnx201.pp.rl.ac.uk/sft_2006-05-22_07.10.05.html#sft-lcg-rm_2006-05-22_07:22:49
> > >
> >
> > --
> > ========================================================================
> > Dr Greig A Cowan http://www.ph.ed.ac.uk/~gcowan1
> > School of Physics, University of Edinburgh, James Clerk Maxwell Building
> > TIER-2 STORAGE SUPPORT PAGES: http://wiki.gridpp.ac.uk/wiki/Grid_Storage
> > ========================================================================
> >
>
--
=======================================================================
Dr Greig A Cowan http://www.ph.ed.ac.uk/~gcowan1
School of Physics, University of Edinburgh, James Clerk Maxwell Building
TIER-2 STORAGE SUPPORT PAGES: http://wiki.gridpp.ac.uk/wiki/Grid_Storage
=======================================================================