Thanks, Frances, for the explanation. Indeed, the mmCIF format is a lot
more complicated, and grep can be a dangerous tool to use on such files.
But in most cases it can do the job, and thus it maintains some sort of
backwards compatibility. I can't agree more that using specialised tools
(for either PDB files or mmCIF files) that deal with the formats
properly is the best solution (see for instance
http://mmcif.wwpdb.org/docs/software-resources.html for some of the
mmCIF readers).
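
As a rough illustration of the middle ground between a hard-coded field
number and a full mmCIF reader, here is a minimal awk sketch that looks up
the B-factor column by its tag name in the _atom_site loop header. It
assumes the layout of deposited files (one atom per line, no quoted values
containing spaces in _atom_site), the function name avg_b is only for
illustration, and it is not a substitute for a proper mmCIF parser.

```shell
# Sketch only: average B-factor of ATOM records in an mmCIF file.
# Assumption: each _atom_site row sits on a single line and no value
# in that loop contains whitespace. avg_b is a hypothetical helper name.
avg_b() {
  awk '
    # Header lines of the _atom_site loop: remember the column positions
    # of the record type and of the isotropic B-factor.
    /^_atom_site\./ {
      ncol++
      if ($1 == "_atom_site.group_PDB")      g = ncol
      if ($1 == "_atom_site.B_iso_or_equiv") b = ncol
      next
    }
    # Data lines: sum B-factors of ATOM records only (as the grep "^ATOM"
    # one-liners do, HETATM records are skipped).
    g && b && $g == "ATOM" { s += $b; n++ }
    END { if (n) printf "%.2f\n", s / n }
  ' "$1"
}

# Usage (file name is a placeholder):
#   avg_b filename.cif
```

Because the column is found by name, this keeps working if a field is
added or the order of the tags changes, but it will still fail on files
that wrap atom records over several lines or quote values with spaces.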
In any case, I find it most surprising that this topic has come up yet
again on this BB, when it was thoroughly discussed last year in this thread:
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1308&L=ccp4bb&D=0&P=26939
I'm not sure why these urban legends about the evilness of the mmCIF
format keep coming back to the list...
As explained there and elsewhere endless times, the PDB format is
inadequate to represent the complexity of macromolecules and has
needed a replacement for a long time. The decision to move on to mmCIF
has been made and in my opinion the sooner we move forward the better.
Cheers
Jose
On 05.10.2014 15:52, Frances C. Bernstein wrote:
> mmCIF is a very general format with tag-value pairs, and loops
> so that tags do not need to be repeated endlessly. It was
> designed so that there is the flexibility of defining new terms
> easily and presenting the data in any order and with any kind
> of spacing.
>
> I understand that there are 100000+ files in cyberspace prepared
> by the PDB and that they all have the 'same' format.
>
> It is tempting to write software that treats these files as fixed
> format and hope that all software packages that generate coordinate
> files will use the same fixed format. But that loses the generality
> and flexibility of mmCIF, and software written that way will fail
> when some field requires more characters or a new field is added.
> There are software tools to allow one to read and extract data from
> any mmCIF file; using these is more complicated than using grep but
> using these assures that one's software will not fail when it encounters
> a data file that is not exactly what the PDB is currently producing.
>
> Note that mmCIF was defined when the limitations of the fixed-format
> PDB format became apparent with large structures. Let's not repeat
> the mistakes of the past.
>
> Frances
>
> =====================================================
> **** Bernstein + Sons
> * * Information Systems Consultants
> **** 5 Brewster Lane, Bellport, NY 11713-2803
> * * ***
> **** * Frances C. Bernstein
> * *** [log in to unmask]
> *** *
> * *** 1-631-286-1339 FAX: 1-631-286-1999
> =====================================================
>
> On Sun, 5 Oct 2014, Tim Gruene wrote:
>
>> Hi Jose,
>>
>> I see. In the example on page
>> http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Categories/atom_site.html,
>>
>> it is in field 12, though, and I would have thought that mmCIF allows
>> line breaks.
>>
>> But as long as all developers writing PDBx/mmCIF with their programs
>> follow the PDB constraints (``styling plans'' in their FAQ), everything
>> is fine.
>>
>> Cheers,
>> Tim
>>
>> On 10/05/2014 01:13 PM, Jose Manuel Duarte wrote:
>>> Well, if you simply replace that "beauty" by this one:
>>>
>>> grep "^ATOM" filename.cif | awk '{print $15}' | awk '{s+=$1;} END {print s/NR;}'
>>>
>>> You will achieve exactly the same result (the B-factors are in the 15th
>>> field of the _atom_site section in deposited mmCIF files). I'm not an
>>> expert in awk, but I'm sure that can be made even shorter ;)
>>>
>>> It is important to keep in mind that mmCIF files are designed to be
>>> usable with grep-like tools, so I don't see any problems in moving
>>> forward to that format, whilst I see a lot of problems in staying with
>>> the classic PDB format.
>>>
>>> Cheers
>>>
>>> Jose
>>>
>>>
>>>
>>> On 05.10.2014 11:18, Tim Gruene wrote:
>>>> Hi all,
>>>>
>>>> reading this beauty I would like to ask a question to the respective
>>>> developers:
>>>> Will the PDB format remain the working format for the users and only
>>>> upon deposition will it be converted to PDBml for archiving
>>>> purposes, or
>>>> are the refinement programs (et al.) going to abandon PDB, too?
>>>>
>>>> Best,
>>>> Tim
>>>>
>>>> On 10/04/2014 10:32 PM, Ed Pozharski wrote:
>>>>> grep "^ATOM " filename.pdb | cut -c 61-66 | awk '{s+=$1;} END {print s/NR;}'
>>>>>
>>>>> "Nobody likes a show off, Private"
>>>>> Skipper
>>>>>
>>>>> Cheers
>>>>>
>>>>>
>>>>> Sent on a Sprint Samsung Galaxy S® III
>>>>>
>>>>> -------- Original message --------
>>>>> From: Chen Zhao <[log in to unmask]>
>>>>> Date: 10/04/2014 4:03 PM (GMT-05:00)
>>>>> To: PHENIX user mailing list <[log in to unmask]>
>>>>> Subject: [phenixbb] Calculate average B-factor?
>>>>>
>>>>> Dear all,
>>>>>
>>>>> I am just wondering whether there is a command line tool in phenix
>>>>> that calculates the average B-factor of a PDB file? Can it deal with
>>>>> the ANISOU records (from TLS refinement or not) properly? I looked
>>>>> into previous posts but the --show-adp-statistics option in
>>>>> phenix.pdbtools seems to be no longer available in the version
>>>>> (1.9-1678) I installed.
>>>>>
>>>>> Thank you so much,
>>>>> Chen
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> phenixbb mailing list
>>>>> [log in to unmask]
>>>>> http://phenix-online.org/mailman/listinfo/phenixbb
>>>>>
>>>
>>
>> --
>> Dr Tim Gruene
>> Institut fuer anorganische Chemie
>> Tammannstr. 4
>> D-37077 Goettingen
>>
>> GPG Key ID = A46BEE1A
>>
>>