On Wednesday, September 12, 2012 09:52:09 am Jacob Keller wrote:
> >
> > For the specific purpose you list -
> > input from tab-delimited data
> > output to simple statistical summaries and (I assume) plots
> > - it sounds like gnuplot could do the job nicely.
> >
>
> I wasn't aware that gnuplot can do calculations--can it? I was probably
> going to use it somewhere as a plotting option.
Here's a simple-minded example using a dump of the current contents
of the PDB from www.pdb.org as a comma-separated file with ~65000 entries.
The input file was previously filtered to contain only X-ray structures
between 1 and 4 Angstroms resolution.
gnuplot> !head -3 PDB.csv
PDB ID,R Observed,R All,R Work,R Free,Refinement Resolution
"100D","0.145","","0.145","","1.90"
"101D","0.163","","","0.252","2.25"
gnuplot> set datafile separator ","
gnuplot> set datafile nofpe_trap # trap handling greatly slows large data sets
gnuplot> stats 'PDB.csv' using "R Observed" prefix "Robs"
* FILE:
  Records:       63029
  Out of range:  0
  Invalid:       0
  Blank:         2
  Data Blocks:   2
* COLUMN:
  Mean:          0.1982
  Std Dev:       0.0334
  Sum:           12494.6900
  Sum Sq.:       2547.3068
  Minimum:       0.0450 [24518]
  Maximum:       0.9700 [45024]
  Quartile:      0.1770
  Median:        0.1970
  Quartile:      0.2180
gnuplot> print Robs_mean
0.198237160672072
gnuplot> #calculate correlation of Robs with Resolution
gnuplot> stats 'PDB.csv' using "R Observed":"Refinement Resolution" nooutput
gnuplot> print STATS_correlation
0.595763711910418
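For reference, the number printed by STATS_correlation is the linear (Pearson) correlation coefficient between the two columns. Assuming gnuplot computes it over the n records in which both fields are valid, with x_i = "R Observed" and y_i = "Refinement Resolution", it is

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Because r is symmetric in x and y, the column order does not matter. A value of about 0.6 indicates a moderate positive linear trend: R Observed tends to be higher for structures refined at numerically larger (i.e. worse) resolution.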
I've attached graphical output of the same data after some sorting,
filtering, binning, etc., with output to a PDF file.
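The filtering, binning and PDF output described above can be sketched in gnuplot along these lines. This is a minimal sketch, not the script that produced the attached figure: the bin width of 0.01, the 2.5 Angstrom resolution cutoff, the output file name and the availability of the pdfcairo terminal are all assumptions. It uses the common bin() plus "smooth frequency" histogram idiom, and returns 1/0 (undefined) for rows that fail the filter so that gnuplot skips them.

```gnuplot
# Hypothetical sketch: histogram of R Observed for entries refined
# at 2.5 A or better, written to a PDF file.
set datafile separator ","
set datafile nofpe_trap

binwidth = 0.01                               # assumed bin width
bin(x) = binwidth * (floor(x/binwidth) + 0.5) # centre of the bin containing x

# Filter (assumed cutoff): keep R Observed only when resolution <= 2.5 A;
# 1/0 is undefined, so gnuplot skips rows that fail the test.
robs(res, r) = (res <= 2.5) ? r : 1/0

set boxwidth binwidth
set style fill solid 0.5
set xlabel "R Observed"
set ylabel "Number of entries"

set terminal pdfcairo                         # assumes a cairo-enabled build
set output "Robs_histogram.pdf"               # hypothetical output file name
plot "PDB.csv" using (bin(robs(column("Refinement Resolution"), column("R Observed")))):(1.0) \
     smooth frequency with boxes notitle
set output                                    # close the file so the PDF is complete
```

Each valid row contributes 1.0 to the bin that contains its R value, and "smooth frequency" sums those contributions per bin, so the box heights are counts per bin.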
You can do all this in R also. R has a larger collection of statistics options,
but is not as good at dealing with really large data sets. IMHO gnuplot has more
flexible options for graphical output.
> > Otherwise I'd recommend perl, and dis-recommend python.
>
>
> Why are you dis-ing python? Seems everybody loves it...
I'm sure you can google for many "reasons I hate Python" lists.
Mine would start
1) sensitive to white space == fail
2) dynamic typing makes it nearly impossible to verify program correctness,
and very hard to debug problems that arise from unexpected input or
a mismatch between caller and callee.
3) the language developers don't care about backward compatibility;
it seems version 2.n+1 always breaks code written for version 2.n,
and let's not even talk about version 3
4) sloooow unless you use it simply as a wrapper for C++,
in which case why not just use C++ or C to begin with?
5) not thread-safe
you did ask...
Ethan
--
Ethan A Merritt
Biomolecular Structure Center, K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742