This discussion was posted on our large LinkedIn group (100,000+ members) by our friend Gregory, pictured below. It has generated a tremendous volume of great comments from a number of top leaders. Below are some reactions:
--
@carey - why should statisticians "be the leaders of the Big Data and data science movement"? Except for a few statisticians like Breiman & Tibshirani, most statisticians missed the boat on Data Science and Big Data, and statistics deals neither with the computational aspect, which is critical for Big Data, nor with the business aspect, which is critical for getting results.
--
For those who believe that big data and data science are just pure engineering or CS fields with ignorance or poor application of statistics, I invite you to read my book at http://www.datasciencecentral.com/profiles/blogs/my-data-science-book
You'll see that data science has its own core of statistics and statistical research. For instance, in my article "the curse of big data", I discuss the fact that in big data, you are bound to find spurious correlations when you compute billions or trillions of correlations. These spurious correlations overshadow real correlations, which then go undetected. I mention that instead of looking at correlations, you should compare correlograms. Correlograms uniquely determine if two time series are similar; correlations do not. I also talk about normalizing for size. You don't need to be a statistician to identify these issues and biases, and correct them. A data scientist should know these things too, as well as other topics such as experimental design, applied extreme value theory and Monte Carlo simulations, confidence intervals created without an underlying statistical model (Analyticbridge's first theorem), identifying non-randomness, and much more.
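The spurious-correlation point can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the method from the article: the function names `max_spurious_correlation` and `correlogram`, the Gaussian noise, and the series sizes are all chosen here for illustration. The first function generates mutually independent series and reports the largest absolute pairwise correlation, which grows as more pairs are compared. The second computes a sample correlogram (autocorrelation at lags 0 to `max_lag`), the kind of profile one could compare between two series instead of a single correlation value.

```python
import numpy as np


def max_spurious_correlation(n_series, n_obs, seed=0):
    """Largest |correlation| among independent random series.

    All series are pure noise, so any strong correlation is spurious.
    """
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((n_series, n_obs))  # one series per row
    corr = np.corrcoef(data)                       # pairwise correlations
    np.fill_diagonal(corr, 0.0)                    # ignore self-correlation
    return float(np.max(np.abs(corr)))


def correlogram(x, max_lag):
    """Sample autocorrelation of x at lags 0..max_lag (lag 0 equals 1)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)]
    )


if __name__ == "__main__":
    # More series means more pairs, and a larger maximum spurious correlation.
    for n in (10, 100, 1000):
        print(n, "series:", round(max_spurious_correlation(n, 20), 3))

    # Correlogram of a simple trending series, lags 0 to 3.
    print(correlogram(np.arange(50), 3))
```

With only 20 observations per series, the maximum absolute correlation among 1,000 independent series is typically far from zero, even though no real relationship exists between any of them.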
--
Read entire discussion at http://bit.ly/197Jsfa