Great, thanks Lee, I might well take you up on that!

Cheers,
Jez

--
Jez Cope (he/him)<https://pronoun.is/he> • Data Services Lead, British Library • 01937 54 6241 • [log in to unmask]

From: Research Data Management discussion list <[log in to unmask]> On Behalf Of Lee Wilson
Sent: 07 December 2018 18:10
To: [log in to unmask]
Subject: Re: Publishing very large datasets

Hi Jez,

Just to follow up on Eugene’s comment, I am the Service Manager for FRDR and would be happy to join in on any future discussion. Also, returning to your original message, Globus actually uses GridFTP as the backbone of its file transfer service.

Cheers,

Lee

--
Lee Wilson
Service Manager | Portage/ACENET
613.482.9344 ext. 108 | www.portagenetwork.ca<http://www.portagenetwork.ca/> | @portageCARLABRL

From: Research Data Management discussion list <[log in to unmask]<mailto:[log in to unmask]>> On Behalf Of Eugene Barsky
Sent: December 7, 2018 12:52 PM
To: [log in to unmask]<mailto:[log in to unmask]>
Subject: Re: Publishing very large datasets

Hello Jez:

For very large datasets in Canada, we tend to work with our national Federated Research Data Repository (FRDR) service: https://www.frdr.ca. It runs Globus as a back end and the UBC-designed Open Collections interface as a front end.

Globus is amazing for moving large datasets from place to place, as any machine or server can work as an endpoint for data transfer. Not to mention that FRDR also allows us to preserve digital data in the Archivematica platform...

Happy to talk more about our Canadian experience.

Eugene (@UBC in Vancouver)





On Fri, 7 Dec 2018 at 01:51, Cope, Jez <[log in to unmask]<mailto:[log in to unmask]>> wrote:
Hi folks, we’re currently looking at ways of improving the way we deliver the largest datasets on https://data.bl.uk/ to users. The largest today are several hundred GB, and it won’t be long before we’re into the TB. Large downloads can be a pain for users because they take a long time and can easily be interrupted. They also potentially present a significant cost to us as a data provider because outbound bandwidth costs can be high for data stored in the cloud. I know this is something that many people in the community will already have grappled with so I’m hoping there will be some experience to share.
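To put a rough number on the bandwidth concern, here is a back-of-envelope sketch. It assumes every download is served in full from cloud storage at a flat per-GB egress price; the function name and the example figures are hypothetical, not our actual usage or any provider's tariff.

```python
def egress_cost(dataset_gb: float, downloads: int, price_per_gb: float) -> float:
    """Estimate outbound transfer cost when every download is served in full."""
    return dataset_gb * downloads * price_per_gb


if __name__ == "__main__":
    # Hypothetical figures: a 500 GB dataset, 30 full downloads, 0.08 per GB.
    print(f"Estimated egress cost: {egress_cost(500, 30, 0.08):.2f}")
```

The cost grows linearly with both dataset size and number of downloads, which is why a workshop where everyone fetches the same dataset is the worrying case.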

Possibilities we’ve discussed so far include:

• Just let people download over HTTP but advise use of a download manager to handle interruptions to the connection

• Publish via BitTorrent, which has the advantage that if a large number of people are downloading the same thing at once (e.g. during a workshop) our outgoing bandwidth use could be significantly less than filesize × number of people

• Allow people to request a copy on disk via courier, probably charged to cover costs

• Split datasets into smaller chunks to make it easier to get just the bit you need (but makes it more effort if you do want the whole lot)

• Allow users to move their compute to the data, either in the cloud or by renting out space in our machine room (this is essentially what AWS have done for some big public datasets<https://aws.amazon.com/opendata/public-datasets/>)

• Provide a dedicated API and/or UI to allow users to browse the collection and select a custom subset to download
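
The first option in the list depends on downloads being resumable. As a minimal sketch of what a download manager does after an interruption, assuming the server supports HTTP range requests and answers with 206 Partial Content (the function names here are hypothetical, and this is untested against data.bl.uk):

```python
import os
import urllib.request


def range_header(bytes_on_disk: int) -> dict:
    """Build the header asking the server to skip the bytes we already have."""
    return {"Range": f"bytes={bytes_on_disk}-"} if bytes_on_disk > 0 else {}


def resume_download(url: str, dest: str, chunk_size: int = 1 << 20) -> None:
    """Download url to dest, continuing from a partial file if one exists."""
    have = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url, headers=range_header(have))
    with urllib.request.urlopen(request) as response:
        # 206 means the server honoured the range, so append to the partial file.
        # Any other success status means it sent the whole file, so start over.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest, mode) as out:
            while chunk := response.read(chunk_size):
                out.write(chunk)
```

Calling resume_download again after a dropped connection picks up where the partial file ends. If the file on disk is already complete, a server will typically reject the range with a 416 error, which this sketch does not handle.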

The last two would probably be my preference in the long run, but require the most resource to set up and maintain.
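
To illustrate the custom-subset idea, here is a rough sketch of the selection step, assuming each dataset comes with a manifest of file paths and sizes. The manifest entries, the pattern syntax and the function names are all hypothetical; nothing like this exists on our side yet.

```python
from fnmatch import fnmatchcase

# Hypothetical manifest: (relative path, size in bytes) for each file in a dataset.
MANIFEST = [
    ("newspapers/1850/issue-001.xml", 4_000_000),
    ("newspapers/1850/issue-002.xml", 6_000_000),
    ("newspapers/1851/issue-001.xml", 5_000_000),
    ("images/1850/page-001.tif", 90_000_000),
]


def select_subset(manifest, pattern):
    """Return the manifest entries whose path matches a shell-style pattern."""
    return [(path, size) for path, size in manifest if fnmatchcase(path, pattern)]


def total_bytes(entries):
    """Sum the sizes so the user sees the download size before committing."""
    return sum(size for _, size in entries)


if __name__ == "__main__":
    subset = select_subset(MANIFEST, "newspapers/1850/*")
    print(len(subset), "files,", total_bytes(subset), "bytes")
```

Showing the total size before the download starts lets users decide whether they really need the whole lot, which also helps with the bandwidth cost.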

There was also some discussion a few years ago of using GridFTP, but I don’t know where that went.

Any advice would be most welcome and I’d be happy to summarise responses for the community. Feel free to reply to me directly, and I’ll withhold or anonymise your advice from the summary on request.

Many thanks,
Jez

________________________________

Jez Cope (he/him<https://pronoun.is/he>)
Data Services Lead
Research Services

The British Library
Building 6a
Boston Spa
Wetherby
West Yorkshire
LS23 7BQ

www.bl.uk<http://www.bl.uk/>



01937 546241
[log in to unmask]<mailto:[log in to unmask]>

________________________________



________________________________

To unsubscribe from the RESEARCH-DATAMAN list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=RESEARCH-DATAMAN&A=1
