John (Bibby) is right that there’s a bit of a misnomer here. What’s meant by ‘algorithms’ in this context is what Leo Breiman called ‘algorithmic modelling’ in his paper that I mentioned at the conference on Saturday. It’s available on open access (now – it’s pretty old, 2001) at https://projecteuclid.org/euclid.ss/1009213726 or indeed at sundry other places on the Web, and there’s an article about it by Simon Raper in the latest issue of Significance. As John says, the distinction is with probabilistic models, which are nevertheless fitted using an algorithm in the wider sense of that word. I think it’s certainly still worth reading the Breiman paper, which is one of those papers in Statistical Science that have discussion contributions and a rejoinder from the original author. There’s a long discussion contribution from David Cox, who disagrees pretty fundamentally with Breiman, and gives many of the reasons why (some) statisticians hate the algorithmic approach, for some purposes at least. But Breiman does have many important arguments, and a lot of it hangs on why one is analysing the data in the first place. It’s also got to be borne in mind that there’s been a lot of work since Breiman’s time on interpreting the results of fitting data using non-probabilistic algorithms.

Actually it’s not really accurate to call them ‘non-probabilistic algorithms’ anyway, because some of them do have probabilistic (or at least pseudo-random) aspects. The distinction is more about whether you have a probability model for the data and are estimating its parameters or some other features of that model, and then interpreting the results in terms of the probabilistic aspects of the model. Regression, which John mentions, is an interesting case. If you’re talking about linear regression, then you would typically estimate the parameters using least squares, and that doesn’t involve a probabilistic model at all: you just minimise a deterministic function of the data (the sum of squares of the residuals), and that gives the parameter estimates. It’s what you do with them after that which determines which of Breiman’s two cultures you sit in. Often a statistician would assume independent, normally distributed errors, which is a probabilistic model for the data, and then make inferences based on that, with or without checking whether the model fits the data reasonably well. That would make you what Breiman calls a “data modeller”. But you don’t have to do that. You could estimate the parameters in the regression equation by least squares using only part of the data (the training set), and then check how good the predictions from the model with those estimates are in a separate test set. If they’re good enough, you could conclude that the model is useful without the normal distributions attached. (Or you could do that in fancier ways using cross-validation or some such.) In Breiman’s sense that would make you an algorithmic modeller. Though Breiman clearly favours algorithmic modelling in most circumstances, he does point out that both approaches are valid at least some of the time.
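To make the train/test idea concrete, here is a rough sketch in Python using only the standard library. It is an illustration rather than anything taken from Breiman's paper: the function names (fit_line, holdout_rmse, kfold_rmse), the single-predictor setup, the 30% test fraction and the use of root mean squared error as the measure of "good enough" are all my own assumptions.

```python
import random
from math import sqrt


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b*x.

    Purely a minimisation of the residual sum of squares;
    no distribution is assumed for the errors.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rmse(a, b, xs, ys):
    """Root mean squared prediction error of the fitted line on (xs, ys)."""
    return sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs))


def holdout_rmse(xs, ys, test_fraction=0.3, seed=0):
    """Fit on a random training subset, then score on the held-out test set."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(test_fraction * len(idx)))
    test, train = idx[:n_test], idx[n_test:]
    a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
    return rmse(a, b, [xs[i] for i in test], [ys[i] for i in test])


def kfold_rmse(xs, ys, k=5, seed=0):
    """The 'fancier' version: k-fold cross-validation, pooling squared errors."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    sq_err, count = 0.0, 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_err += sum((ys[i] - (a + b * xs[i])) ** 2 for i in test)
        count += len(test)
    return sqrt(sq_err / count)
```

Whether a given error counts as "good enough" depends on what the predictions are for; nothing in the code settles that. The point is only that the model is judged by how well it predicts data it was not fitted to, rather than by inferences that rest on an assumed error distribution.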

Kevin

Kevin McConway

Emeritus Professor of Applied Statistics

The Open University

From: email list for Radical Statistics <[log in to unmask]> On Behalf Of John Bibby
Sent: 02 March 2020 20:40
To: [log in to unmask]
Subject: Re: Do statisticians hate algorithms?

Isn't regression an algorithm? The word is misused. The statistical prejudice is against non-probabilistic algorithms.

Cluster analysis, such as the AID method featured at the conference, is non-probabilistic.

The question of generalizability is pertinent. Using training and testing samples can help with this.

John BIBBY

On Mon, 2 Mar 2020, 19:06 BYRNE, DAVE S., <[log in to unmask]> wrote:

Interesting sub-text at last week's excellent conference when people started talking about big data: it seems that statisticians don't like methods based on algorithms. Now cluster analysis, which has been around for more than 40 years, is algorithm-based, albeit that its mathematical basis is accessible in a way that the coding schemes of learning algorithms are not, and it has in my view been far too little used in exploratory data analysis. Is that also down to a prejudice against algorithms?

David Byrne Ph.D., FAcSS

see my recent book:

Class after Industry: a complex realist approach

https://www.palgrave.com/gb/book/9783030026431

****************************************************** Please note that if you press the 'Reply' button your message will go only to the sender of this message. If you want to reply to the whole list, use your mailer's 'Reply-to-All' button to send your message automatically to [log in to unmask]. Disclaimer: The messages sent to this list are the views of the sender and cannot be assumed to be representative of the range of views held by subscribers to the Radical Statistics Group. To find out more about Radical Statistics and its aims and activities and read current and past issues of our newsletter you are invited to visit our web site www.radstats.org.uk. *******************************************************
