Dear Herb,
Thank you for your message. Before we respond to your suggestion
about "opening" our code, we would like to double-check that there is
no misunderstanding about what it does. We didn't think that what we
wrote in two different e-mails could be misunderstood, but of course
it might still be :-).
In our message of Thursday morning, addressed to Graeme, it was
clearly stated that
"[...] even if the primary use of our converter is also to provide
intermediate files (hidden from the user) for processing with
autoPROC/XDS, at least these files are intended to (and _should_) be
fully standardised and allow processing with any other package that
supports mini-cbf/CBF files. Regarding autoPROC itself, we are not
proposing that users convert HDF5 files into mini-cbf/CBF files
before running it - the documentation is very clear about that:
users should give autoPROC the HDF5 data directly."
and similarly in our Friday afternoon message, addressed to Herman,
that
"autoPROC uses HDF5 files directly as input and doesn't leave
miniCBF files around that might get archived and incur storage
costs. That point might not have been clear and [could] therefore
[be] causing confusion."
There should therefore not be any ambiguity about the fact that
"autoPROC deals directly with the HDF5 file" in the sense that the
user doesn't have to pre-convert its contents into mini-CBF files that
might be thought to give rise to an extra archiving burden: autoPROC
provides this conversion efficiently on-the-fly for XDS, while always
extracting all required metadata, needed by itself or by XDS, directly
from the original HDF5 file. There is thus no need to archive those
miniCBFs in order to be able to repeat the processing, later or
elsewhere, from the HDF5 file. At no time did we claim that everything
within autoPROC works on the HDF5 input "natively" - it couldn't, as
it uses XDS as the main processing engine - and yet this is perhaps
what you assumed we had claimed when you expressed the hope that we
might help in getting rid of the computational overhead of these
conversions.
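To make the "metadata always comes from the HDF5 file" point concrete, here is a minimal sketch of that idea in Python with h5py. This is purely illustrative and is not autoPROC's (closed) code; the NXmx-style dataset paths and the values in the mock file are assumptions for the example, and real Eiger master files carry many more fields.

```python
# Illustrative sketch (NOT autoPROC's actual code): read beam/detector
# metadata directly from an Eiger/NXmx-style HDF5 master file, so that
# any temporary miniCBF is only a derived product, never the source of
# truth for the metadata.
import os
import tempfile

import h5py

# Build a tiny mock "master" file with assumed NXmx-style paths.
path = os.path.join(tempfile.mkdtemp(), "mock_master.h5")
with h5py.File(path, "w") as f:
    f["/entry/instrument/beam/incident_wavelength"] = 0.9795   # Angstrom
    f["/entry/instrument/detector/detector_distance"] = 0.200  # metres
    f["/entry/instrument/detector/x_pixel_size"] = 75e-6       # metres
    f["/entry/sample/goniometer/omega_range_average"] = 0.1    # degrees

def read_metadata(master):
    """Collect the metadata a processing package needs, straight from
    the original HDF5 file rather than from a converted image header."""
    with h5py.File(master, "r") as f:
        return {
            "wavelength_A": float(
                f["/entry/instrument/beam/incident_wavelength"][()]),
            "distance_m": float(
                f["/entry/instrument/detector/detector_distance"][()]),
            "pixel_size_m": float(
                f["/entry/instrument/detector/x_pixel_size"][()]),
            "osc_width_deg": float(
                f["/entry/sample/goniometer/omega_range_average"][()]),
        }

meta = read_metadata(path)
print(meta["wavelength_A"])  # prints 0.9795
```

The key design point the example tries to convey: the converted images can be regenerated at any time from the master file, because nothing in the pipeline depends on state stored only in the intermediate files.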
Because of the above (metadata always read from original HDF data
and miniCBF files only present temporarily and hidden from users), our
converter in principle doesn't *need* to write a fully populated
miniCBF header. We could have gone for the bare-bones approach of
e.g. the Dectris H5ToXds tool, and simply provided binaries for the
other platforms we support. However, we thought that if we were
writing a miniCBF file anyway, we should do it properly right from the
start - following the available Eiger/HDF5/NXmx and CBF/miniCBF
specifications as much as possible.
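As a hedged illustration of what "properly populated" means in practice, the sketch below formats a few of the Pilatus/Eiger-style "# key value" comment lines that miniCBF readers parse out of _array_data.header_contents. It is not Global Phasing's converter; the field selection, formatting, and the sample values are assumptions made for the example.

```python
# Minimal sketch (NOT the actual converter): assemble a few
# Pilatus/Eiger-style miniCBF header lines from metadata previously
# read out of the HDF5 file, instead of leaving them blank as a
# bare-bones converter might.
def minicbf_header(meta):
    """Format '# key value' comment lines in the miniCBF convention.
    Only a representative subset of fields is shown here."""
    lines = [
        "# Detector: {}".format(meta["detector"]),
        "# Pixel_size {0:g} m x {0:g} m".format(meta["pixel_size_m"]),
        "# Wavelength {:.4f} A".format(meta["wavelength_A"]),
        "# Detector_distance {:.5f} m".format(meta["distance_m"]),
        "# Start_angle {:.4f} deg.".format(meta["start_angle_deg"]),
        "# Angle_increment {:.4f} deg.".format(meta["osc_width_deg"]),
    ]
    # miniCBF headers conventionally use CRLF line endings.
    return "\r\n".join(lines)

header = minicbf_header({
    "detector": "Dectris EIGER 16M",  # illustrative values only
    "pixel_size_m": 75e-6,
    "wavelength_A": 0.9795,
    "distance_m": 0.2,
    "start_angle_deg": 0.0,
    "osc_width_deg": 0.1,
})
print(header.splitlines()[2])  # prints: # Wavelength 0.9795 A
```

The point of populating every such field accurately is exactly the "transferability" argument: any package that understands miniCBF can then process the file with no beamline-specific knowledge.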
Incidentally, users of autoPROC were very appreciative that our
converter writes correct miniCBF images, since that allowed them to
use those files directly in their existing workflows and (internal)
deposition systems - thus avoiding a breakdown of their procedures
when Eiger/HDF5 datasets appeared. Achieving this "transferability of
(re)processing" for Eiger (or any other) datasets outside the
synchrotron where they were collected seemed to us imperative from the
start, and is especially so now that serious efforts are being made to
archive raw data, so that such data can play the same role in testing
improvements of data-processing programs that archived merged data
played in testing improvements of refinement programs.
Apologies if we are the ones who are misunderstanding the
assumptions behind your enquiry and are misinterpreting them as a
possible misunderstanding on your part :-) - we would be grateful if
you could confirm what you have in mind.
With best wishes,
Gerard, Clemens, Claus & Peter
--
On Fri, Mar 11, 2016 at 01:06:14PM -0500, Herbert J. Bernstein wrote:
> Dear Colleagues,
>
> I am very pleased to hear that "autoPROC uses HDF5 files directly as
> input". Might it be possible to "open" that portion of your code to the
> developer community as a useful worked example for others?
>
> For Eiger 16M images we see a very large fraction of processing time
> going into conversions to CBFs now. The more people who adapt their code to
> work directly with the HDF5 format version of these files, the less use of
> computer resources in conversions we will face and the more resources will
> then be available for actual processing.
>
> Regards,
> Herbert
>
> On Fri, Mar 11, 2016 at 12:20 PM, Clemens Vonrhein <
> [log in to unmask]> wrote:
>
> > Dear Herman,
> >
> > thank you very much for the supportive message, which describes very
> > well the environment we found ourselves in with regard to Eiger/HDF5
> > data. This is why we implemented a method into autoPROC so that our
> > users can process data coming off these exciting, new detectors. Far
> > from being "ad hoc", we tried to make the method as generally
> > applicable across beamlines as possible, and also to populate the
> > image headers as completely and accurately as we could (as a point of
> > reference: the very first solution had to be done in just over 1 week
> > between access to test data and users wanting to collect real data).
> >
> > To avoid any confusion that might arise from the very valid arguments
> > Herb is making about required lifetime of datasets for the need of
> > (re)processing at a later stage: autoPROC uses HDF5 files directly as
> > input and doesn't leave miniCBF files around that might get archived
> > and incur storage costs. That point might not have been clear and
> > therefore causing confusion. However, we are trying to open up the
> > capabilities and features of our software to as many users as possible
> > - even if we are not providing our software in an open-source model -
> > to avoid the impression of autoPROC as a "black box". This was the
> > reason for sending our initial reply to this thread.
> >
> > From a (partially) outside viewpoint it seems that there are several
> > reasons for having e.g. those miniCBF variants written at different
> > beamlines - as described and rightly lamented by Herb. There are of
> > course very practical restrictions and pressures that everyone is
> > reacting to. But there exists also the unique luxury at a beamline to
> > ignore image headers and metadata completely, not least because XDS
> > provides the great feature of being image header agnostic and
> > completely general. The beamline control software responsible for
> > populating the image header or metadata (through some detector API or
> > by generating them already beforehand) can at the same time write
> > e.g. a XDS.INP file - or any other input/command for some other data
> > processing software. This means there is no necessity to have
> > complete, self-consistent and correct image headers or metadata in
> > place in order to process the data at that point.
> >
> > This is nothing new: we've seen that for a long time e.g. with
> > beamlines producing completely wrong image headers when it was assumed
> > that processing would be done with Denzo and a def.site file. However,
> > this shifts focus into a direction whereby data can only (reliably) be
> > processed at the time of data collection and within a particular
> > software environment at the synchrotron. Of course, it is a very
> > important and valuable feature to process data while the crystal or
> > other samples are still available and decisions can be made for this
> > or the next data collection strategies. It is where a great strength
> > of beamline-specific solutions lies.
> >
> > Some approaches to processing just worry about producing an input file
> > to the processing program that has all the necessary information
> > harvested from whatever local way metadata are stored. If the images
> > are now archived in that way they can be reprocessed, but only at that
> > synchrotron or in the same way - making it nearly impossible to do the
> > same in different and new ways, for instance with another program.
> >
> > It is crucial to achieve true transferability of reprocessing by
> > providing complete and correct metadata (not something that is just a
> > derived product of these metadata). In that respect, autoPROC is a
> > useful external tool to provide extensive checking of metadata by
> > looking directly at them (e.g. in HDF5) and not at a derived subset
> > (e.g. an XDS.INP file archived with the data).
> >
> > We don't think there is any synchrotron-independent developer of
> > processing software that is happy about having to support all the
> > variants regarding image headers and metadata. We would be happy not
> > having to provide a list of beamline specifics [1] or workarounds
> > regarding buggy image headers or incomplete HDF5 metadata [2] in order
> > to provide users with a workable solution for their project.
> >
> > It is also important to recognize that "the user" means different
> > things to different people. For the detector manufacturer it is
> > typically the synchrotron and beamline staff, while for the beamline
> > scientist it is the actual people coming to collect data on their
> > samples. What is left out of that picture is the processing software
> > and its developers - whereas in a very real sense it is only through
> > those external packages and developers that a synchrotron user truly
> > interacts with the data collected at the beamline, as shown by the
> > fact that it is most often the data processing package that gets the
> > blame if something doesn't work, when the cause of such hiccups might
> > actually lie further upstream.
> >
> > So maybe adjusting your last sentence to
> >
> > When detector producers, beamlines and processing packages speak one
> > common language, the users will very quickly follow.
> >
> > It has been a very useful discussion and we think that a number of
> > important matters have been brought up.
> >
> > Cheers
> >
> > Clemens, Gerard, Andrew, Claus & Peter
> >
> > [1] http://www.globalphasing.com/autoproc/wiki/index.cgi?BeamlineSettings
> > [2]
> > http://www.globalphasing.com/autoproc/wiki/index.cgi?DataProcessingHdf5
> >
> > On Fri, Mar 11, 2016 at 01:37:02PM +0000, Herman Schreuder wrote:
> > > I fully agree. For me the Madness lies in the development of a new
> > > detector and image format without consulting beforehand the relevant
> > > software developers and in each beamline apparently implementing
> > > their own, mutually incompatible local format. All major data
> > > processing programs have a way to unambiguously describe detector
> > > and goniometer geometry and I see no reason why such information
> > > cannot be written into the headers.
> > >
> > > Once users have images, they want to have them processed as quickly
> > > as possible and when a chaos with new image formats has been
> > > created, one cannot blame Gerard and others for solving this problem
> > > in a maybe ad hoc manner. When the beamlines and detector producers
> > > speak one common language, the users will very quickly follow.
> >