Interesting - R was started by 2 people, has a current core group of fewer
than 20 people with write permission, and another 50 or so who are thanked
for debugging, contributing snippets etc. This looks like it is actually a
very tightly controlled collaborative writing exercise.
Now, a statistician might disagree with me, but... would it be fair to say
that in writing a statistics application of the sort that R aspires to be,
the specification of any particular feature would be pretty well determined?
E.g. would there be great disagreement on what was entailed in doing, say,
factor analysis? My guess is that there shouldn't be, although there could
well be great room for innovation in a) performance (writing really fast
algorithms to work with very big datasets) and b) graphical user interfaces
or visualisations. My feeling is that the same should hold for the CFD
example posted earlier in this thread (but a fluid dynamicist might well
shoot me down on that as well :-) ).
Here perhaps is where the syntax field might be a bit different. Why does
anyone want to write software? Well:
a) because it is a good way of getting inside what is happening in the
calculations - ie. it is a personal learning exercise;
b) because of some problems of platform independence - in the old days of
syntax stuff being Mac only this was a major reason why people with PCs
developed their own software (a disappearing motivation perhaps);
c) because of compatibility with other applications - linking to
GIS or CAD in an intimate way, rather than just file in/out;
d) because one wants to develop entirely new representations and measures,
or new variants on existing ones (relativisations etc.)
I think a), b) and c) hold for any domain. But in an emerging field like
syntax there is a lot of d) going on at the moment (also surely at the
leading edge of stats and CFD of course). Now in doing d)-type work one
doesn't really want to have to get to grips with all that tricky user
interface stuff: reading datasets in from old formats, exporting them out,
drawing to screen, colour maps etc. etc.
This is where there is a dilemma. In writing an integrated piece of software
- axman, webmap, depthmap all come to mind - the intimate link between
calculation, visualisation and user interaction is crucial to its
usefulness as a 'tool for thinking with' about the problem at hand (urban
systems, say). Making that relation seamless to the user (click on this
line here and it highlights in that scatter over there) requires a whole
lot of constraint and elimination of possibilities by the programmer. This
is where the skill and experience of the Sheeps and Alasdairs of the world
come in: in figuring out beforehand what Bill or I or some other user will
ask them for in the future, what should be retained as flexibility, and what
they may not ask for and what might as well be eliminated to make the whole
thing manageable. This is the skill of research computing development. An
example: In Istanbul Bill showed rather beautiful 'mountainscapes' relating
metric depth locally and globally. These are a 'created phenomenon' made
possible by Depthmap, but un-thought of at the time the facility to create
them in Depthmap was written. The facility is generic, smooth and simple to
the user (virtually transparent actually), but involves a lot of decisions
on the part of the programmer to achieve that transparency. Those decisions
all involve closing off other possibilities.
Now a really good open source project would allow exactly those 'closed off'
possibilities to be reopened without requiring recoding of absolutely
everything. But - and this is the dilemma - separating out the closed off
possibilities and making them re-openable requires them to be recognised in
advance, and allowed for in the way the code is modularised. This in itself
adds another layer of complexity to the whole project of coding, and one
which (I think) may only be amenable to formal code management, systems
engineering, top down control, rather than the open market place of open
source - everyone 'doing their thang' - perhaps I am wrong though...
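
To make the modularisation point a bit more concrete, here is a minimal
sketch of what 'allowing for it in advance' might look like. Everything in it
is hypothetical (the MEASURES registry, the register decorator and the
analyse function are names made up for illustration, and this is not how
Depthmap or R is actually organised). The core only knows how to run a
named measure over a graph; someone doing d)-type work adds a new measure
without touching file reading, drawing or the rest of the core.

```python
from collections import deque
from typing import Callable, Dict, List

# Hypothetical types: a graph is an adjacency list keyed by node name,
# and a measure maps (graph, node) to a single number.
Graph = Dict[str, List[str]]
Measure = Callable[[Graph, str], float]

# Hypothetical registry: the one extension point the core exposes.
MEASURES: Dict[str, Measure] = {}


def register(name: str) -> Callable[[Measure], Measure]:
    """Decorator that adds a measure to the registry under a given name."""
    def decorator(fn: Measure) -> Measure:
        MEASURES[name] = fn
        return fn
    return decorator


@register("connectivity")
def connectivity(graph: Graph, node: str) -> float:
    """Number of immediate neighbours of a node."""
    return float(len(graph[node]))


@register("mean_depth")
def mean_depth(graph: Graph, node: str) -> float:
    """Mean shortest-path step depth from a node to all reachable nodes."""
    depth = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search outwards from the start node
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in depth:
                depth[neighbour] = depth[current] + 1
                queue.append(neighbour)
    others = [d for n, d in depth.items() if n != node]
    return sum(others) / len(others) if others else 0.0


def analyse(graph: Graph, measure_name: str) -> Dict[str, float]:
    """Core routine: apply a registered measure to every node."""
    measure = MEASURES[measure_name]
    return {node: measure(graph, node) for node in graph}


# A three-node chain a - b - c, purely as toy data.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
result = analyse(graph, "mean_depth")
```

The catch is exactly the one described above: somebody had to decide in
advance that 'a measure' is a function of a graph and a node returning one
number. A new representation that does not fit that shape still means
reopening the core.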
> Alan Penn wrote:
> > Taken all in all, I wonder whether the open source model works for this
> > of analytic software development - there must be examples from other
> > of science. Does anyone know of them?
> The R project is a GNU project: it is a language and environment for
> statistical computing and graphics.
> It adopts the model of an official distribution of a quality-controlled
> core, and a modular structure for add-on packages that anyone can
> Andrew Smith