After I sent that I did find a way to redirect the console output to a file - not pretty but it works:
import sys
import os
import subprocess
so = open("testing.log", 'w', 0)
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
os.dup2(so.fileno(), sys.stdout.fileno())
And then I can grep the matching chain ID and pipe it out to a file using a subprocess call: subprocess.call('grep -Em1 "to chain" testing.log | cut -d"'" "'" -f6 > grepped.log', shell=True)
Agree re tail (and the subprocess.call() above!) being ugly - I need to read some python docs, rather than making frankensteinian shell/python combos. When all you have is a hammer...
No rush on fixes - it’s new years eve tomorrow! :)
Best,
Oliver.
>
> On Dec 30, 2013, at 8:32 PM, Paul Emsley <[log in to unmask]> wrote:
>
>> On 30/12/13 22:45, Oliver Clarke wrote:
>>> Great! Next scripting question that I’m puzzling over - how to take a set of sequences (present in a single fasta file), in arbitrary order, match and align them to their corresponding homologous subunits in a PDB file, and use align_and_mutate to correct the sequence of each subunit (and delete the gaps).
>>>
>>> Right now, I’m stuck on the first step - how to take a single sequence from a fasta file and output the matching chain ID.
>>>
>>> I can use:
>>>
>>> import subprocess
>>> align_to_closest_chain("%s"%(subprocess.check_output(["tail","+2","test.fasta"])).upper(), 0.95)
>>
>> FWIW, I think that using "tail" is ugly and you should read the contents of the fasta file into a list of strings.
>>
>> def file_name_to_string_list(file_name):
>> if os.path.isfile(file_name):
>> f = open(file_name, 'r')
>> return f.readlines()
>> else:
>> return []
>>
>> Maybe there is a python built-in for this already? :)
>>
>> Then use (say):
>>
>> fasta_seqs = file_name_to_string_list('all.fasta')
>> align_to_closest_chain(fasta_seqs[2].upper(), 0.95)
>>
>>>
>>> to check for a matching chain, but the output of align_to_closest_chain() is either 0 or 1.
>>>
>>> The matching chain id is present in the console output, but I haven’t yet figured out how to parse that and do anything useful with it - the console output does not seem be easily accessible via either stdout or stderr (was hoping I could redirect the console output to a file and grep it for the chain id - ugly but I don’t know a better way).
>>
>> OK, that's a good point.
>>
>> So the policy is that interesting variables/values should be returned by functions. In this case it would be good to know the selected chain-id, but you don't get that - so that's a problem. I'll fix that now. In future this function will return False or the molecule number and chain-id.
>>
>>>
>>> Also it is not immediately clear to me how the match_fraction parameter relates to sequence identity or alignment score for a given match (The alignment also seems to be case sensitive, which doesn’t seem right, hence the .upper() in the above).
>>
>> Yes, I think the input sequence for coot is case sensitive (and it should not be).
>>
>> IIRC, the alignment score has be at least as close as (as good as) match_fraction, otherwise it is not performed.
>>
>>>
>>> What am i doing wrong - thoughts/suggestions? Would it be better to do the sequence handling steps externally?
>>
>> I'll fix it and update later.
>>
>> Paul.
>>
>>
>
|