One thing to keep in mind is that there's usually a trade-off between 
setup time (writing and testing) and execution time.  For one-off data 
processing, I'd focus on implementation speed rather than execution 
speed (in other words, FORTRAN might not be ideal unless you're already 
fluent in it).

That said, I'd take a look at Python, Octave, or R.  Python is 
relatively easy to learn and more flexible than Octave/R, but it 
doesn't have the built-in statistical functions that Octave and R do.
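
As a rough illustration of the Python route (the filename, column 
index, and function name below are placeholders, not anything from 
your actual data), here's a minimal one-pass sketch that computes the 
count, mean, and sigma of one column of a tab-delimited file, so the 
whole table never has to fit in memory:

import math

def column_stats(path, col=2):
    # One pass over a tab-delimited file: count, mean, and sigma
    # of the numeric values in column `col` (0-based).
    n, total, total_sq = 0, 0.0, 0.0
    with open(path) as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            try:
                x = float(fields[col])
            except (IndexError, ValueError):
                continue  # skip headers and malformed rows
            n += 1
            total += x
            total_sq += x * x
    if n == 0:
        raise ValueError("no numeric data found in column %d" % col)
    mean = total / n
    # Naive one-pass sigma; fine for a sketch, though Welford's
    # algorithm is more numerically stable.
    sigma = math.sqrt(max(total_sq / n - mean * mean, 0.0))
    return n, mean, sigma

n, mean, sigma = column_stats("data.tab", col=2)
print(n, mean, sigma)

Sorting and rejecting rows can be handled the same streaming way, or 
just piped through the unix sort you've already found to be fast.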

One other tip, which you've probably already thought of: depending on 
your runtimes (I don't think 100s of MB of data is usually considered 
an enormous amount, but it'll depend on what you're doing), it may be 
worth getting things working on a small subset of the data first.
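
For instance, pulling the first 10,000 rows into a working file (the 
filenames here are just placeholders) takes only a couple of lines of 
Python:

from itertools import islice

# Copy the first 10,000 lines into a small file for testing.
with open("data.tab") as src, open("subset.tab", "w") as dst:
    dst.writelines(islice(src, 10000))

(head -n 10000 at the shell does the same thing.)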

Pete

Jacob Keller wrote:
> Dear List,
> 
> since this probably comes up a lot in manipulation of PDB/reflection files
> and so on, I was curious what people thought would be the best language for
> the following: I have some huge (100s MB) tables of tab-delimited data on
> which I would like to do some math (averaging, sigmas, simple arithmetic,
> etc) as well as some sorting and rejecting. It can be done in Excel, but
> this is exceedingly slow even in 64-bit, so I am looking to do it through
> some scripting. Just as an example, a "sort" which takes >10 min in Excel
> takes ~10 sec max with the unix command sort (seems crazy, no?). Any
> suggestions?
> 
> Thanks, and sorry for being off-topic,
> 
> Jacob
>