Hi Grant -

In addition to the other good suggestions, there is also the Biopython project's PDB parser.

http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc148

Cheers,
Jared

--
Jared Sampson
Xiangpeng Kong Lab
NYU Langone Medical Center
Old Public Health Building, Room 610
341 East 25th Street
New York, NY 10016
212-263-7898
http://kong.med.nyu.edu/

On Jun 6, 2013, at 12:37 AM, GRANT MILLS <[log in to unmask]> wrote:

Dear CCP4BB,

I'm trying to write a simple Python script to retrieve and manipulate PDB data using the following code:

#for line in open("PDBfile.pdb"):
#    if "ATOM" in line:
#        column=line.split()
#        c4=column[4]

and then writing to a new document with:

#with open("selection.pdb", "a") as myfile:
#    myfile.write(c4+"\n")

This works, except when the PDB file contains columns that run together, such as the occupancy and B-factor in the following:

ATOM    608  SG  CYS A  47      12.866 -28.741  -1.611  1.00201.10           S
ATOM    609  OXT CYS A  47      14.622 -24.151  -1.842  1.00100.24           O

My script seems to miscount the columns and read the two as one column. Does anyone know how to avoid this?

(PS, I've googled this like crazy, but either I don't understand the results or the link is irrelevant.)

Any advice would help.

Thanks for your time,
Grant
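
One way to handle the run-together occupancy and B-factor columns in the ATOM records above is to slice each line by character position instead of splitting on whitespace, since PDB ATOM records use fixed-width columns. The following is only a minimal sketch, assuming the standard PDB column layout (occupancy in columns 55-60, B-factor in columns 61-66) and lines that preserve their original spacing. The function name parse_atom_line and the dictionary keys are hypothetical, chosen for illustration.

```python
def parse_atom_line(line):
    """Slice one ATOM record by fixed column positions.

    Assumes the standard PDB layout; the spec numbers columns from 1,
    so spec columns 55-60 become the Python slice [54:60].
    """
    return {
        "record": line[0:6].strip(),       # columns 1-6
        "serial": int(line[6:11]),         # columns 7-11
        "name": line[12:16].strip(),       # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain_id": line[21],              # column 22
        "res_seq": int(line[22:26]),       # columns 23-26
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
        "occupancy": float(line[54:60]),   # columns 55-60
        "b_factor": float(line[60:66]),    # columns 61-66
    }


# Sample record with occupancy and B-factor touching ("1.00201.10").
sample = "ATOM    608  SG  CYS A  47      12.866 -28.741  -1.611  1.00201.10           S"
atom = parse_atom_line(sample)
print(atom["chain_id"], atom["occupancy"], atom["b_factor"])  # prints: A 1.0 201.1
```

Because the slices do not depend on whitespace, the two values are separated correctly even when there is no space between them. Testing line.startswith("ATOM") rather than "ATOM" in line would also avoid matching HETATM records or remarks that happen to contain the word.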