When doing math or numerical analysis, knowledge of the technique is far too often tied to the tool performing the calculation. Consider an engineer whose understanding of the Fast Fourier Transform is inseparable from the fft function in Matlab. Of course this hypothetical engineer understands what the results mean (more or less), but he may not be able to duplicate his analysis if Matlab were taken away.

In most cases, it is likely that no deeper understanding will be required. But what happens if the computer makes a mistake? Or the program becomes unavailable? Both situations are entirely possible. Computer algorithms aren't perfect and occasionally arrive at results that make little sense; and hardware has been known to fail.

When the engineer understands how the computer arrived at the answer, however, he can recognize, understand, and ultimately correct those cases where the results are unexpected. This is an important reality check that can prevent costly disasters down the line. Or, if the hardware is unavailable, he can duplicate the analysis with an alternative tool or software package.

But while such a situation can arise with any type of numerical software, it's most likely to happen to users of a statistical package. I find this extremely ironic, since a proper understanding of statistics is essential to living in the modern world. (Much more so than an understanding of the Fast Fourier Transform, at any rate.) The rules of probability, the normal curve, correlation, and multivariate statistics have a direct impact on how we live our lives. They are used in making important decisions in finance, medicine, science, and government. A misunderstanding of stats and the methods of science (from which statistics is inseparable) underlies the most divisive issues of our day: abortion, stem cell research, and global warming.

Moreover, neither side has a monopoly on ignorance or misunderstanding. People fail to distinguish between correlation and causality, or insist on using the word "average" as a slur. Nearly as bad are those who – like the hypothetical engineer described above – only understand statistics within the narrow context of their stats package. Casual statisticians are nearly as dangerous as the wholly uninformed.

The Statistical Package for the Social Sciences (SPSS) is one of the biggest perpetrators of this crisis. Which is hugely ironic, because I happen to love SPSS. SPSS is probably the first statistical package to place advanced statistical methods within the grasp of the novice user. I've been a happy user for nearly a decade (ever since I was introduced to the program in high school). But there is no doubt that I've come to understand statistics within the context of SPSS and its GUI.

Please don't misunderstand me, I have a pretty good grasp of basic statistics. I can sling probability with the best of them and relish describing when to use the Fisher Exact test instead of a Chi-Square; but advanced statistics are a completely different matter. Advanced stats *scare* me. I can certainly use these more complicated methods. I've analyzed and written about multivariate models and even ventured into Analysis of Variance (ANOVA). But I have to rely on SPSS and the aid of my institution's biostatistician to help me recognize when there is a problem.
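The Fisher-versus-Chi-Square choice above is one place where the analysis doesn't need a big statistical package at all. A minimal sketch in Python with scipy (the contingency table is made up for illustration): the Chi-Square test leans on a large-sample approximation that breaks down when expected cell counts are small, while Fisher's Exact test computes the p-value exactly from the hypergeometric distribution.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# A hypothetical 2x2 contingency table (e.g., treatment vs. outcome)
# with only 8 subjects -- far too sparse for the Chi-Square approximation.
table = np.array([[3, 1],
                  [1, 3]])

# chi2_contingency reports the expected counts under independence;
# the usual rule of thumb requires each to be at least 5.
chi2, p_chi2, dof, expected = chi2_contingency(table)
print("expected counts:\n", expected)  # every cell is 2.0 here

if (expected < 5).any():
    # Small expected counts: use the exact test instead.
    odds_ratio, p_fisher = fisher_exact(table)
    print("Fisher exact p-value:", p_fisher)
```

With margins this small the exact two-sided p-value is large (about 0.49), so the data give no evidence of association; the point is only that the exact test remains valid where the Chi-Square approximation does not.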

Which is why, in a time of tight budgets, losing the institution's SPSS license has been a crushing blow to my productivity. (Whoever made that decision should be hauled out and shot!) Because I don't have my statistics software any more, certain aspects of my job are much more difficult to do. And unfortunately, there is only one logical conclusion to draw: I've become a victim of the statistical ease of SPSS.


Tags: R,Scientific Computing,Statistics

Categories: Computer, rapidBOOKS
