GRIDPP-STORAGE Archives
GRIDPP-STORAGE@JISCMAIL.AC.UK
June 2012

Subject: Re: Agenda for tomorrow
From: "Christopher J. Walker" <[log in to unmask]>
Reply-To: Christopher J. Walker
Date: Wed, 27 Jun 2012 00:14:31 +0100
Content-Type: text/plain
Parts/Attachments: text/plain (122 lines)

On 26/06/12 22:18, Jens Jensen wrote:
> Oh and before you ask, I wrote it in C :-)

The Castor guys have something written in Python - I'll dig out a copy; 
Shawn deWitt gave it to me, IIRC.

Bestman has a checksum script:

https://www.opensciencegrid.org/bin/view/Storage/BestmanAdler32Checksum

which also runs through and calculates the checksum - you'd just need to 
modify it slightly to do a comparison (oh, and worry about whether you 
include leading zeros or not).

In fact I ran this on some storage at QMUL - and nearly had a heart 
attack - due to the leading zeros.
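The leading-zeros trap is easy to reproduce: an Adler-32 value formatted with `%x` drops leading zeros, so it won't string-compare equal to a zero-padded catalogue entry even though the numeric value is identical. A minimal illustration in Python (function name is my own):

```python
import zlib

def adler32_hex(data: bytes, pad: bool = True) -> str:
    """Adler-32 of `data` as hex, with or without leading-zero padding."""
    cksum = zlib.adler32(data) & 0xFFFFFFFF
    return f"{cksum:08x}" if pad else f"{cksum:x}"

padded = adler32_hex(b"")               # "00000001" (Adler-32 of empty input is 1)
bare = adler32_hex(b"", pad=False)      # "1"
assert padded != bare                   # naive string comparison "fails"...
assert int(padded, 16) == int(bare, 16) # ...even though the values agree
```

Comparing the values as integers (or zero-padding both sides to eight digits) avoids the false alarm.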

>
> Thanks, Chris.
>
> On the subject of the "slow dump", this is something I played with at /home
> (ie /home/jens at home, so home squared) following my corrupted file blog
> post. I haven't got anything running yet, but it should only be a matter of
> finding some holiday time to get it running.
>
> My idea was I'd open a file and checksum it (using ADLER32 implemented in
> a separate compilation unit compiled with high optimisation), and the
> program would recurse through a given toplevel directory (like /home).
> Whenever it had checksummed a number of files whose combined filesizes
> are>N, or a single file of size>N (where N is, say, five megs), it would
> then sleep for a while.
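The "slow dump" idea sketched above - recurse a tree, checksum each file, and sleep once more than N bytes have been processed - is simple to prototype. This is a hypothetical illustration, not the actual tool (the function name and the 5 MB default are taken from the description, everything else is assumed):

```python
import os
import time
import zlib

def slow_checksum_walk(top, batch_bytes=5 * 1024 * 1024, sleep_secs=1.0):
    """Yield (relative_path, adler32_hex) for every file under `top`,
    sleeping after each `batch_bytes` of data so production I/O wins."""
    done = 0
    for dirpath, _dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(dirpath, name)
            cksum = 1  # Adler-32 initial value
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    cksum = zlib.adler32(chunk, cksum)
                    done += len(chunk)
            yield os.path.relpath(path, top), f"{cksum & 0xFFFFFFFF:08x}"
            if done >= batch_bytes:
                time.sleep(sleep_secs)  # throttle: back off between batches
                done = 0
```

Reading in 1 MB chunks keeps memory flat regardless of file size; the sleep interval is the obvious knob for trading completeness against disk load.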

The devil is in the detail - data coming in at 1.5 Gbit means that you'd 
need to checksum in parallel if the machine you are doing the 
checksumming on only has a Gbit connection (realising that checksumming 
1 day of input data was going to take more than a day caused me to worry 
about it). Also, you might wish to randomly sample across disk servers - 
but perhaps checking data from yesterday more rigorously (to make sure 
it really did hit disk). But you are right, it's fundamentally not a 
particularly difficult problem - and one we can and should easily solve.


>
> For any checksum it'd query a database with (name, cksum, ctime, atime),
> where name is the relative pathname, ctime is the time the entry is created
> (first checksum), and atime is the most recent time it is checked.
> Conversely, a checker could work the other way, from the names in the
> database, to see if files have gone missing.
>
> It could of course be adapted to using RFIO...
>
> Before I finish writing it, does anyone know if anyone else has written such a tool? I was just hacking, but I won't put any serious effort into it if a tool exists...
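The (name, cksum, ctime, atime) table described above maps naturally onto SQLite, including the "reverse" check for files that have vanished from disk. A sketch under assumed table and column names:

```python
import os
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS checksums (
    name  TEXT PRIMARY KEY,  -- relative pathname
    cksum TEXT NOT NULL,     -- e.g. adler32 as zero-padded hex
    ctime REAL NOT NULL,     -- time the entry was first created
    atime REAL NOT NULL      -- most recent time it was checked
)
"""

def record(db, name, cksum, now):
    """Insert a new entry, or bump atime on re-check of an existing one."""
    db.execute(
        "INSERT INTO checksums VALUES (?, ?, ?, ?) "
        "ON CONFLICT(name) DO UPDATE SET atime = excluded.atime",
        (name, cksum, now, now),
    )

def missing_files(db, top):
    """The reverse check: names in the database that no longer exist on disk."""
    return [name for (name,) in db.execute("SELECT name FROM checksums")
            if not os.path.exists(os.path.join(top, name))]
```

Keeping ctime and atime separate also lets a scheduler prioritise files that haven't been re-checked for the longest time.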

Chris

>
> Cheers
> --jens
>
> ________________________________________
> From: Christopher J.Walker [[log in to unmask]]
> Sent: 26 June 2012 16:00
> To: Jensen, Jens (STFC,RAL,ESC)
> Cc: [log in to unmask]
> Subject: Re: Agenda for tomorrow
>
> On 26/06/12 15:03, Jens Jensen wrote:
>> Folks,
>>
>> I am at CWI in the Netherlands and need to run soon to catch a rescheduled flight from sunny warm Amsterdam back to rainy old Blighty.
>>
>
> I'll have to send my apologies - I've got another meeting I have to be
> in (and a minor Lustre upgrade to do).
>
>> Things to cover tomorrow (possibly more stuff to be added):
>> * That checksummy thing - does it make sense to syncat (incl checksum) regularly for VOs?
>> How much work would that be?
>
> That's two questions.
>
> I think it would make sense for sites to regularly (for some definition
> of regularly) checksum the data held on their storage and compare with
> the known checksum (stored in file metadata for StoRM). This will at
> least warn against silent corruption on disk. Sites can then file a GGUS
> ticket if they do see corruption.
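Since StoRM keeps the known checksum in the file's own metadata (I believe as the extended attribute `user.storm.checksum.adler32`, though this should be verified against the local installation), the comparison reduces to reading the xattr and recomputing. A hedged sketch, Linux-only and with the attribute name assumed:

```python
import os
import zlib

STORM_XATTR = b"user.storm.checksum.adler32"  # assumed attribute name; verify locally

def verify_file(path, stored=None):
    """Compare a stored Adler-32 hex string against a fresh checksum of the file.

    If `stored` is None, read it from the StoRM extended attribute (Linux only).
    Leading zeros are stripped on both sides so that padded and unpadded
    representations of the same value compare equal.
    """
    if stored is None:
        stored = os.getxattr(path, STORM_XATTR).decode()
    cksum = 1  # Adler-32 initial value
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            cksum = zlib.adler32(chunk, cksum)
    fresh = f"{cksum & 0xFFFFFFFF:x}"
    return (stored.lstrip("0") or "0") == fresh
```

A mismatch here is exactly the condition that should trigger the GGUS ticket mentioned above.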
>
> This is quite resource intensive - but could at least potentially be
> throttled depending on site load. It could also be intelligent and try
> to randomly sample files stored on different servers.
>
> A program already exists for Castor - it needs some adapting for StoRM -
> and presumably could be adapted for DPM and dCache too.
>
> I'd estimate it would take a week of my time to get something like this
> working solidly (I just need to find that week).
>
>
> The second question is should we produce syncat dumps regularly. Well,
> the main purpose of this is to do consistency checking against the LFC.
> The only reason for us to produce them regularly is that it is believed
> that doing it systematically through the SRM interface is too resource
> intensive.
>
> Quite frankly, at the moment producing these dumps by hand is incredibly
> resource intensive on my precious time. I think that producing a tool
> that can slowly go through an SRM would be a big step forward  (perhaps
> one that produced syncat dumps) - even if it needed to be throttled and
> took a large amount of time - it would allow a VO to do this without
> site admin involvement - and that's the expensive thing IMHO. If it
> really is too resource intensive, then putting syncat dumps in a
> standard place would be a way forward - but whether we'd get agreement
> on doing this before the storage providers implement their syncing
> method I don't know.
>
>
> I'd estimate 2 weeks of someone's time (possibly more) to write a script
> that did srm calls and produced a syncat dump.  It might also be
> interesting for this to be linked into ganga somehow.
>
> A script from ATLAS already exists under a free licence to compare LFC
> and syncat dumps.
>
>> * Storage-relevant stuff from OGF last week (not that much actually).
>
> Chris
