On 05/08/13 09:03, Tim Gruene wrote:
> reading Gerard Kleywegt's latest announcement on the wwPDB Workshop
> (1st August) made me wonder whether it is planned to introduce mmCIF as
> working format to users in addition to using it at e.g. the PDB, because
> I think that would make life unnecessarily complicated.
There’s nothing to stop you using your /own/ working format—it’s easy to
extract a simpler file from the full archive file—but the archive file
obviously has to contain the full set of metadata, and to be useful,
that metadata has to be easily parsable.
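
As a rough illustration of that extraction step, here is a minimal Python sketch that pulls the _atom_site loop out of an mmCIF file and writes one space-separated line per atom. It is not a full CIF parser: it assumes one atom per line and no quoted values containing spaces, and the function names and demo coordinates are made up for the example.

```python
def atom_site_rows(cif_lines):
    """Yield one dict per row of the _atom_site loop (simplified parsing)."""
    labels = []
    for raw in cif_lines:
        line = raw.strip()
        if line.startswith("_atom_site."):
            # Column names of the loop, e.g. "_atom_site.Cartn_x" -> "Cartn_x"
            labels.append(line.split(".", 1)[1])
            continue
        if labels:
            # Once the labels have been read, anything that is not a data
            # row ends the loop.
            if not line or line.startswith(("#", "loop_", "_", "data_")):
                break
            values = line.split()  # assumption: no quoted values with spaces
            if len(values) == len(labels):
                yield dict(zip(labels, values))


def to_simple_lines(cif_lines, columns=("label_atom_id", "label_comp_id",
                                        "Cartn_x", "Cartn_y", "Cartn_z")):
    """Return a line-oriented working format with the chosen columns only."""
    return [" ".join(row[c] for c in columns)
            for row in atom_site_rows(cif_lines)]


# Hypothetical two-atom demo file, for illustration only.
demo = """\
data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N MET 27.340 24.430 2.614
ATOM 2 CA MET 26.266 25.413 2.842
#
"""

for simple in to_simple_lines(demo.splitlines()):
    print(simple)
```

The output is plain text that 'grep' and 'awk' can handle directly, while the archive file keeps the full metadata.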
> The example mmCIF file for GroEL is about 7.5 times bigger than its PDB
> file.
> I know that disk space is 'cheap' nowadays, but that does not make it fast.
>
> And personally I find mmCIF very awkward to work with, since it is not
> line-oriented. 'grep', 'awk', 'perl' etc. do not work well on XML-like
> files.
> Instead of using mmCIF, one could, e.g. introduce a free format PDB
> format, with space holders for non-assigned entities, and maybe a line
> continuation character.
Are you sure you’re talking about the CIF‐based mmCIF format here, not
the XML‐based PDBx format? mmCIF shouldn’t be much bigger than PDB.
> If mmCIF is not going to be the working format for MX (refinement)
> programs I would be happy for a reassurance, and otherwise I would
> appreciate some comments about the benefits of an XML file format over a
> line-oriented free format for the scientists that work with structural data.
> In my opinion, using XML (or mmCIF) for structural information is an
> attempt by programmers to make themselves more indispensable to
> scientists, rather than scientifically needed.
Even when searching the “simple” PDB format, you’re likely to run into
problems with line wrapping. Imagine trying to find all files that mention
PEG. Your script must reliably recognise something like:
REMARK 280 CRYSTALLIZATION CONDITIONS: 1.0M LITHIUM SULPHATE, 100MM POLY
REMARK 280 ETHYLENE GLYCOL
—in fact this sort of thing is much /easier/ to do, given the proper
tools, in a format like XML.
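
To make the wrapping problem concrete for the PDB case, here is a minimal Python sketch of what such a script has to do. It joins all REMARK 280 records and compares them with whitespace removed, so a phrase split across two records is still found. The function names are made up for the example, and squashing whitespace is a crude assumption that can also produce false matches across word boundaries.

```python
def remark_text(pdb_lines, number=280):
    """Concatenate the free-text part of all REMARK <number> records."""
    prefix = f"REMARK {number:3d}"
    parts = []
    for line in pdb_lines:
        if line.startswith(prefix):
            parts.append(line[len(prefix):].strip())
    return " ".join(parts)


def _squash(text):
    """Drop all whitespace and fold case, so line wraps cannot hide a match."""
    return "".join(text.split()).upper()


def mentions(pdb_lines, phrase, number=280):
    """True if phrase occurs in REMARK <number>, ignoring wraps and spacing."""
    return _squash(phrase) in _squash(remark_text(pdb_lines, number))


example = [
    "REMARK 280 CRYSTALLIZATION CONDITIONS: 1.0M LITHIUM SULPHATE, 100MM POLY",
    "REMARK 280 ETHYLENE GLYCOL",
]

# A naive line-by-line search misses the phrase; the joined search finds it.
print(any("POLYETHYLENE GLYCOL" in line for line in example))
print(mentions(example, "polyethylene glycol"))
```

Even this version only covers one spelling; abbreviations such as "PEG 4000" would need their own handling.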
With file formats, the devil is always in the details. If you set out to
create a “line‐oriented, free format” PDB replacement, and you carefully
ironed out all the potential ambiguities and awkward corner cases, I bet
you’d come up with something close to mmCIF.
--
Ian ◎