Jason, that was pretty fast! You delivered the script before someone
else could even suggest PyMOL ;)
If you use
from chempy import cpv
cpv.distance(aa.coord, bb.coord)
instead of
cmd.get_distance( "index %s" % aa.index, "index %s" % bb.index)
it will be significantly faster.
Cheers,
Thomas
On Mar 9, 2011, at 7:43 AM, Jason Vertrees wrote:
> Hi Cale,
>
>> For any given structure in the PDB, I want to identify all the
>> Histidine ND1
>> atoms. I then want to consider these atoms in pairs, measure the
>> distance
>> in Angstroms between the ND1 atoms in each pair, and compile these
>> distances
>> (along with residue numbers of the pair) in a table. I then want
>> to repeat
>> this procedure for each unique structure in the PDB and generate a
>> table
>> containing all occurrences of HisND1 pairs with their corresponding
>> separation distance. Amongst other things, I want e.g. generate a
>> histogram
>> from this table and determine e.g. the shortest HisND1 pair distance
>> observed and the structure in which this happens. Does anyone have
>> any
>> suggestions for any tools I might be able to use to perform this
>> search?
>
> This looked like a fun little PyMOL task. So, I wrote a script to do
> it. This will iterate over all the PDBs in a given directory and
> calculate the non-redundant distances of all HIS ND1s. A table is
> written to pre_hist.csv. Use R or something similar to create the
> histogram and do the numerical analysis.
>
> To use this, first, you'll need a prepared copy of the PDB, suitable
> for this task, downloaded locally in pdb_path. You will need to set
> "pdb_path" in the script below to wherever your PDBs are. (The other
> kind folks on this list already pointed you to resources for mirroring
> the PDB, if you haven't done it already.) To run the script below
> just save it to disk as "his.py" and launch it with:
>
> # from Linux
> pymol -cq his.py
>
> # or, from Mac
> /Applications/PyMOLX11Hybrid.app/Contents/MacOS/MacPyMOL -cq his.py
>
> If you're a Linux/Mac user, when the script finishes, just type:
>
> sort -n -k8 pre_hist.csv | head -2
>
> into your BASH shell to get the shortest distance. For my example
> PDB it was:
>
> $ sort -n -k8 pre_hist.csv | head -2
> PDB CHAIN RESI ATOM-A CHAIN RESI ATOM-B DISTANCE
> 5rla B 187 3729 C 312 7075 2.838208
>
> To view the shortest distance use the ATOM-A and ATOM-B fields from
> the output file. From my example above I would need to fetch the
> protein and create a distance from "index 3729" to "index 7075".
> Here's how:
>
> # fetch the protein
> fetch 5rla, async=0
>
> # show the distance
> distance index 3729, index 7075
>
> # zoom on the new distance
> zoom dist01
>
>
> Here's the Python script that does the work:
>
> import glob, os, pymol, sys
> from pymol import cmd
>
> the_pdb="/Users/vertrees/small_pdb"
> files = glob.glob(the_pdb+os.sep+"*.pdb")
>
> if not len(files):
> print "Please set 'the_pdb' variable to a valid path containing
> PDB files."
> sys.exit(1)
> else:
> print "Processing %d files." % len(files)
>
> s, outFile = "resn HIS and name ND1", "pre_hist.csv"
>
> f = open(outFile, 'wb')
> # write the header
> f.write("PDB\tCHAIN\tRESI\tATOM-A\tCHAIN\tRESI\tATOM-B\tDISTANCE\n")
> # for each file in the mirror
> for x in files:
> cmd.load(x,finish=1)
> n = cmd.get_names()[0]
> m = cmd.get_model(s).atom
> # pairwise for each atom
> for aa in m:
> for bb in m:
> # avoid duplicates
> if aa.index==bb.index: continue
> f.write( "%s\t%s\t%s\t%s\t%s\t%s\t%d\t%f\n" %
> (n, aa.chain, aa.resi, aa.index,
> bb.chain, bb.resi, bb.index,
> cmd.get_distance( "index %s" % aa.index,
> "index %s" % bb.index)))
> cmd.delete(n)
> f.close()
>
> print "Processed %d files. Please see %s for results." %
> (len(files), outFile)
>
> Cheers,
>
> -- Jason
>
> --
> Jason Vertrees, PhD
> PyMOL Product Manager
> Schrodinger, LLC
>
> (e) [log in to unmask]
> (o) +1 (603) 374-7120
--
Thomas Holder
MPI for Developmental Biology
Spemannstr. 35
D-72076 Tübingen
|