On Aug 7, 2013, at 2:35 PM, Ed Pozharski wrote:
> If I understand your proposal and reference to SQL correctly, you want some scripting language that sounds like simple English.
I didn't say anything about being English-like. English and other natural languages are ill-adapted to describing the well-defined operations one might perform on a data structure.
> Is the advantage over existing APIs here that one does not need to learn Python, C++, (or, heaven forbid, FORTRAN)?
Anyone can learn Python in an hour and a half. That's not an issue (except for whitespace nuts). If one wants to use Python to modify PDB structural data, I recommend starting with the tutorial I wrote for CCTBX: http://cctbxwiki.bravais.net/CCTBX_Wiki#Working_with_pdb_Files
The advantage of a language over an API is that an API requires coding overhead and must (by the definition of "API") be part of an "Application". SQL has no such requirement and neither would an ideal language for *selecting* and *modifying* macromolecular structural data. In SQL, one can make selections and modifications without importing libraries, defining a main function, declaring variables, etc. Low overhead is probably the reason so many crystallographers (myself not included) are fluent in the likes of awk.
> I.e. programs would look like this
>
> ---
> GRAB protein FROM FILE "best_model_ever.cif";
> SELECT CHAIN A FROM protein AS chA;
> SET chA BFACTORS TO 30.0;
> GRAB data FROM FILE "best_data_ever.cif";
> BIND protein TO data;
> REFINE protein USING BUSTER WITH TLS+ANISO;
> DROP protein INTO FILE "better_model_yet.cif";
> ---
>
> Not necessarily a bad idea but now through the fog of time I remember something oddly reminiscent... ah, CNS! (for those googling for it it's not the "central nervous system" :).
Although a little too much like natural language, it is not a bad idea. But where is the link describing the layer of CNS that looks like that? In my X-PLOR 3.1 manual (Yale University Press, 1987) I see nothing remotely like what you describe. CNS, according to the most recent tutorial for 1.3, looks like this:
    topology
      evaluate ($counter=1)
      evaluate ($done=false)
      while ( $done = false ) loop read
        if ( &exist_topology_infile_$counter = true ) then
          if ( &BLANK%topology_infile_$counter = false ) then
            @@&topology_infile_$counter
          end if
        else
          evaluate ($done=true)
        end if
        evaluate ($counter=$counter+1)
      end loop read
    end
This example makes a point about the problems of APIs. Namely, they require loops and tests, and lack a true selection mechanism, except perhaps for the scripting layer of CNS. But even with CNS, once you have a selection, you must loop over it to modify the data.
Although it is likely the "best" library for working with structural data, CCTBX requires a loop just to change a specific chain ID (to the best of my knowledge):
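    from iotbx import pdb  # iotbx is distributed with CCTBX

    pdb_inp = pdb.input(file_name="best-model.pdb")
    hierarchy = pdb_inp.construct_hierarchy()
    # Visit every chain of every model and rename chain A to chain B.
    for model in hierarchy.models():
        for chain in model.chains():
            if chain.id == "A":
                chain.id = "B"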
I don't intend to pick on CCTBX specifically (because the CCTBX developers have specific needs to which they program), but loop/test mechanisms are awkward for selecting and modifying structural data, and get much more awkward as selections get more complex (e.g. selecting the C-alpha of every alanine of chain A, etc.).
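To make the contrast concrete, here is a minimal sketch of both styles on the same toy data. It is not CCTBX and not an existing structural-data language: the `atoms` table, its column names, and the five records are all hypothetical, and Python's standard `sqlite3` module merely stands in for the kind of set-based selection and modification I have in mind.

```python
import sqlite3

# Hypothetical toy records: (chain, residue number, residue name, atom name, B-factor).
ATOMS = [
    ("A", 1, "ALA", "N", 20.0),
    ("A", 1, "ALA", "CA", 21.0),
    ("A", 2, "GLY", "CA", 22.0),
    ("A", 3, "ALA", "CA", 23.0),
    ("C", 1, "ALA", "CA", 24.0),
]


def ca_of_ala_loop(atoms, chain_id):
    """Loop/test style: each condition is an explicit test inside the loop."""
    selected = []
    for chain, resseq, resname, name, b in atoms:
        if chain == chain_id:
            if resname == "ALA":
                if name == "CA":
                    selected.append((chain, resseq, resname, name, b))
    return selected


def load(atoms):
    """Copy the toy records into an in-memory table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE atoms "
        "(chain TEXT, resseq INTEGER, resname TEXT, name TEXT, b REAL)"
    )
    db.executemany("INSERT INTO atoms VALUES (?, ?, ?, ?, ?)", atoms)
    return db


db = load(ATOMS)

# Selection: the C-alpha of every alanine of chain A, in one statement.
ca_of_ala_sql = db.execute(
    "SELECT chain, resseq, resname, name, b FROM atoms "
    "WHERE chain = 'A' AND resname = 'ALA' AND name = 'CA' "
    "ORDER BY resseq"
).fetchall()

# Modification: set the B-factors of that same selection, no loop required.
db.execute(
    "UPDATE atoms SET b = 30.0 "
    "WHERE chain = 'A' AND resname = 'ALA' AND name = 'CA'"
)

# Modification: rename chain A to chain B, again without a loop.
db.execute("UPDATE atoms SET chain = 'B' WHERE chain = 'A'")
```

Both approaches select the same two atoms, but every extra condition adds a nesting level to the loop version, whereas it only adds one more term to the WHERE clause.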
James