In addition to Globus, which is great for certain use cases, many
environmental data repositories use OPeNDAP (https://www.opendap.org/) and
THREDDS (https://www.unidata.ucar.edu/software/thredds/current/tds/) to
serve large datasets. Both let a dataset be accessed as a single whole
while allowing users to specify and download only the small subset they
need. This is particularly useful for high-resolution image datasets that
span large spatial or temporal extents and for which subsetting makes
sense. OPeNDAP supports data subsetting through its web service across a
wide variety of data structures.
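
To make the subsetting idea concrete, here is a minimal sketch in Python of
how a subset request can be expressed in the URL itself, using the DAP2
constraint-expression form variable[start:stride:stop] with inclusive
indices. The server address, dataset path, variable name and index ranges
are all hypothetical, and the helper functions are illustrative rather than
part of any OPeNDAP client library. Nothing is fetched; the code only
builds the request URL.

```python
from urllib.parse import quote


def hyperslab(start, stop, stride=1):
    """Format one DAP2 index range; both ends are inclusive."""
    if start < 0 or stop < start or stride < 1:
        raise ValueError("need 0 <= start <= stop and stride >= 1")
    return f"[{start}:{stride}:{stop}]"


def subset_url(dataset_url, variable, ranges, response="ascii"):
    """Build a URL asking the server for a slice of one variable.

    `ranges` holds one (start, stop) or (start, stop, stride) tuple
    per dimension of the variable, in the variable's dimension order.
    """
    constraint = variable + "".join(hyperslab(*r) for r in ranges)
    # Keep brackets and colons literal so the constraint stays readable.
    return f"{dataset_url}.{response}?{quote(constraint, safe='[]:')}"


# Hypothetical dataset and variable: one time step, a 100 x 100 tile.
BASE = "https://example.org/opendap/sst.nc"
url = subset_url(BASE, "sst", [(0, 0), (100, 199), (200, 299)])
print(url)
```

A request like this transfers only the selected tile rather than the whole
file, which is where the saving in time and outbound bandwidth comes from.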

Matt
--
Matthew B. Jones
http://orcid.org/0000-0003-0077-4738
Director of Informatics R&D
National Center for Ecological Analysis and Synthesis
University of California Santa Barbara
http://www.nceas.ucsb.edu/ecoinfo
PI, NSF Arctic Data Center
https://arcticdata.io
Co-PI, DataONE
https://dataone.org

On Mon, Dec 10, 2018 at 10:21 AM Cope, Jez <[log in to unmask]> wrote:

> Great, thanks Tom. I’ll take a look at this.
>
>
>
> Cheers,
>
> Jez
>
>
>
> *--*
>
> *Jez Cope* (he/him) <https://pronoun.is/he> • *Data Services Lead,
> British Library • 01937 54 6241 • [log in to unmask] <[log in to unmask]> *
>
>
>
> *From:* Research Data Management discussion list <
> [log in to unmask]> *On Behalf Of *Tom Griffin - UKRI STFC
> *Sent:* 08 December 2018 17:27
> *To:* [log in to unmask]
> *Subject:* Re: Publishing very large datasets
>
>
>
> There is a free Globus training course at Rutherford Appleton Laboratory
> on 10th Jan
>
> https://www.scd.stfc.ac.uk/Pages/The-Globus-World-Tour.aspx
>
>
>
>
> Do you need to transfer and share huge amounts of research data? If you
> do, you might be interested in this training course.
>
>
>
> The Scientific Computing Department (SCD), in collaboration with Globus
> Online, is hosting and managing a one-day training course on the Globus
> research data management service.
>
> Globus was developed by the University of Chicago and is used by hundreds
> of thousands of researchers at institutions worldwide. In this training
> course, which is free of charge, the Globus team will present the inner
> workings of Globus and the means for using Globus as part of your research.
>
> *Who is it for?*
>
> This course will be useful if you anticipate moving large amounts of data
> between data sources and your machine learning frameworks, and is
> targeted at:
>
>    - System administrators who are planning to deploy or use Globus at
>    their institution;
>    - Researchers building applications and frameworks;
>    - Anyone who is interested in learning more about the service for
>    research data management.
>
> *What will you get out of it?*
>
>    - Learn how the Globus platform simplifies development of applications
>    for researchers;
>    - Experiment with new Globus services and APIs;
>    - Exchange ideas with peers on ways to apply Globus technologies;
>    - Expand your knowledge of Globus administration features.
>
> *View the program <https://www.globusworld.org/tour/program?c=16>*
>
> The training is being held at the STFC Rutherford Appleton Laboratory,
> Didcot, Oxfordshire, on *10th January 2019* but places are limited so *REGISTER
> NOW* <https://www.globusworld.org/tour/register> to secure your place.
>
>
>
> Best Regards,
>
> Tom
>
>
>
>
>
> Tom Griffin
>
> Director, Scientific Computing Department
>
> Science and Technology Facilities Council
>
>
>
> [log in to unmask]
>
> +44 (0)1235 445305
>
>
>
>
>
>
>
> *From:* Research Data Management discussion list <
> [log in to unmask]> *On Behalf Of *Cope, Jez
> *Sent:* 07 December 2018 09:52
> *To:* [log in to unmask]
> *Subject:* Publishing very large datasets
>
>
>
> Hi folks, we’re currently looking at ways of improving the way we deliver
> the largest datasets on https://data.bl.uk/ to users. The largest today
> are several hundred GB, and it won’t be long before we’re into the TB.
> Large downloads can be a pain for users because they take a long time and
> can easily be interrupted. They also potentially present a significant cost
> to us as a data provider because outbound bandwidth costs can be high for
> data stored in the cloud. I know this is something that many people in the
> community will already have grappled with so I’m hoping there will be some
> experience to share.
>
>
>
> Possibilities we’ve discussed so far include:
>
>    - Just let people download over HTTP but advise use of a download
>    manager to handle interruptions to the connection
>    - Publish via BitTorrent, which has the advantage that if a large
>    number of people are downloading the same thing at once (e.g. during
>    a workshop) our outgoing bandwidth use could be significantly less
>    than filesize × number of people
>    - Allow people to request a copy on disk via courier, probably
>    charged to cover costs
>    - Split datasets into smaller chunks to make it easier to get just
>    the bit you need (but makes it more effort if you do want the whole
>    lot)
>    - Allow users to move their compute to the data, either in the cloud
>    or by renting out space in our machine room (this is essentially what
>    AWS have done for some big public datasets
>    <https://aws.amazon.com/opendata/public-datasets/>)
>    - Provide a dedicated API and/or UI to allow users to browse the
>    collection and select a custom subset to download
>
>
>
> The last two would probably be my preference in the long run, but require
> the most resource to set up and maintain.
>
>
>
> There was also some discussion a few years ago of using GridFTP, but I
> don’t know where that went.
>
>
>
> Any advice would be most welcome and I’d be happy to summarise responses
> for the community. Feel free to reply to me directly, and I’ll withhold or
> anonymise your advice from the summary on request.
>
>
>
> Many thanks,
>
> Jez
>
>
> ------------------------------
>
>
>
> *Jez Cope* (he/him <https://pronoun.is/he>)
> *Data Services Lead*
> *Research Services*
> *The British Library, Building 6a*
> *Boston Spa, Wetherby, West Yorkshire LS23 7BQ*
> www.bl.uk <http://www.bl.uk/>
> 01937 546241 [log in to unmask] <[log in to unmask]>
> ------------------------------
>
> To unsubscribe from the RESEARCH-DATAMAN list, click the following link:
> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=RESEARCH-DATAMAN&A=1