All that needs to happen is that the community agree on:
1. What is the finite set of essential/useful attributes of
macromolecular structural data.
2. What is the syntax of (a) accessing and (b) modifying those
attributes.
3. What is the syntax of selecting subsets of structural data
based on those attributes.
The resulting syntax (i.e. language) itself should be terse,
easy to learn, easy to use, and preferably easy to implement.
Ah, but the nice thing about mmCIF is that it isn't truly
"finite". The PDB may limit which tags are actually included
in the distributed files, but nothing prevents other
developers from including their own tags, and there is a
community process for extending the officially defined tags.
Item (2) is already well established in mmCIF, unlike the
current chaos of REMARK records. I think (3) will be left to
the various libraries to deal with.
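
To make the three items concrete, here is a rough sketch of what access,
modification, and selection might look like if atom records were kept as
plain dictionaries keyed by mmCIF-style tags. The helper names (get,
set_value, select) and the "_my_lab.is_flexible" tag are made up for
illustration; this is not the API of any existing library, and the
"_my_lab" category is not an officially defined one.

```python
# Toy records keyed by mmCIF-style tags (values invented for the example).
ATOMS = [
    {"_atom_site.label_comp_id": "ALA", "_atom_site.label_atom_id": "CA",
     "_atom_site.B_iso_or_equiv": 12.5},
    {"_atom_site.label_comp_id": "GLY", "_atom_site.label_atom_id": "CA",
     "_atom_site.B_iso_or_equiv": 30.1},
    {"_atom_site.label_comp_id": "GLY", "_atom_site.label_atom_id": "N",
     "_atom_site.B_iso_or_equiv": 28.4},
]


def get(row, tag, default=None):
    """Item 2a: access an attribute by its tag."""
    return row.get(tag, default)


def set_value(row, tag, value):
    """Item 2b: modify (or add) an attribute by its tag."""
    row[tag] = value


def select(rows, predicate):
    """Item 3: select the subset of rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


# Add a developer-defined tag next to the standard ones (hypothetical tag).
for row in ATOMS:
    set_value(row, "_my_lab.is_flexible",
              get(row, "_atom_site.B_iso_or_equiv") > 25.0)

# Select on a mix of standard and custom attributes.
flexible_ca = select(
    ATOMS,
    lambda r: r["_atom_site.label_atom_id"] == "CA" and r["_my_lab.is_flexible"],
)
print([r["_atom_site.label_comp_id"] for r in flexible_ca])  # ['GLY']
```

The point is only that the custom tag sits alongside the standard ones and
is selectable in exactly the same way; how selection is actually spelled
would differ from library to library.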
-Nat