Imagine for a moment that you are a statistics professor. You are teaching a STAT101 class and you assign a term project. The project is simple: students need to demonstrate mastery of the material taught in the class by defining a problem from their everyday life, collecting some data, and performing an appropriate analysis using at least one concept they learned in your class.
Now let's assume that one of the groups submits the project and states in its final report: “We pretty much didn’t have time to collect data, so John made up some numbers. Then we also figured out that we didn’t want to build a statistical model, so Mary made up some predicted values and confidence intervals. We thought that the problem we were trying to solve was so difficult that it probably made no difference whether we went with real data collection or just made up some results.” I guess not the greatest piece of work you’ve seen. Right? These folks will most probably have to repeat the class next semester.
Now let's go back to reality. We have elections in the US, and forecasting websites post their “sciency” data-driven forecasts every half hour. A lot of people follow them because they look trustworthy and are based on hardcore statistical models. After all, the people behind these websites know their work and have had several success stories so far. Then election day passes, and it turns out that none of the predictions were right. Everybody is astonished. People start believing that statistics is useless.
Following the election results, I found myself several times discussing or arguing with colleagues and friends about the value and usefulness of statistics. I kept repeating that statistics is neither “bad” nor “good”; forecasts can be right or wrong with a certain probability. Furthermore, the trustworthiness of the results depends on proper data collection and the appropriateness of the methods. This is common knowledge for anyone who has taken a decent graduate class in statistics, but for most people a failed prediction leads to the total rejection of a field with hundreds of years of history.
Discarding statistics as useless or pointless is the same as saying “we need to ban driving so that we don’t have accidents.” My opinion is that we need to build better cars and educate people about safe driving so that we efficiently reduce accidents.
Nate Silver, editor-in-chief of the major forecasting website fivethirtyeight.com, posted an apologetic letter stating the major reasons why their predictions went so wrong (the full letter can be found at http://fivethirtyeight.com/features/how-i-acted-like-a-pundit-and-screwed-up-on-donald-trump/ ). In a nutshell, these are the major reasons (I’m quoting the exact titles as they appear in the letter):
1. “Our early forecasts of Trump’s nomination chances weren’t based on a statistical model, which may have been most of the problem.”
2. “Trump’s nomination is just one event, and that makes it hard to judge the accuracy of a probabilistic forecast.”
3. “The historical evidence clearly suggested that Trump was an underdog, but the sample size probably wasn’t large enough to assign him quite so low a probability of winning.”
4. “Trump’s nomination is potentially a point in favor of “polls-only” as opposed to “fundamentals” models.”
5. “There’s a danger in hindsight bias, and in over correcting after an unexpected event such as Trump’s nomination.”
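Point 2 above is worth unpacking: a single surprising outcome does not, by itself, make a probabilistic forecast wrong. A minimal sketch (my own illustration with made-up numbers, not FiveThirtyEight's actual methodology) using the Brier score, a standard way of scoring probability forecasts against 0/1 outcomes:

```python
# Brier score: mean squared difference between the forecast probability
# and the actual outcome (1 if the event happened, 0 if it did not).
# Lower is better; it is only meaningful averaged over many events.

def brier_score(probs, outcomes):
    """Average (probability - outcome)^2 over a list of forecasts."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# One "80% chance" forecast where the 20% side happens looks terrible
# in isolation (score near 0.64)...
single_event = brier_score([0.8], [0])

# ...but a forecaster who says 80% and is right about 8 times in 10 is
# well calibrated, and the average score is far better (0.16 here).
ten_events = brier_score([0.8] * 10, [1] * 8 + [0] * 2)
```

The point is exactly Silver's: with one event you cannot distinguish a bad model from bad luck; only a track record over many forecasts can.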
I guess there is no need to explain how most of these arguments resemble those of the fictional group project at the beginning of this article. Right? The next question that naturally arises is: what is the difference between a group of college students working on their project and an independent website like fivethirtyeight issuing forecasts about major events? The answer is again simple: in the first case, there is a professor who can judge the quality of the work, whereas in the second case, the forecast is published and can be judged for its accuracy only after the real event happens.
I am not trying to point fingers or be judgmental about model flaws; I am rather making the point that most forecasts in popular media are published without any third-party review or quality check.
In academia, when a group of researchers wants to publish the results of their research, they write a paper and submit it to an editor-in-chief, who in turn asks for the expert opinion of several other scientists in the field. The task of these independent reviewers is to identify glitches and mistakes in the methodology and then ask the authors to submit a better, corrected version of their manuscript, along with a letter that explains all the changes made. This process is called peer review, and it is pretty standard for academic publications. Sometimes it takes several rounds of back-and-forth between the reviewers and the authors before the work is finally accepted for publication. Although this process is not perfect and definitely not flawless, it acts like a “firewall” that pushes back papers with major deficiencies.
The difference is that in popular articles there is hardly any quality check of the math, the models, or the data. Major flaws are usually revealed only when something goes wrong, and even then disclosure depends on the sincerity of the source.
I am the last person to suggest that popular media should adopt a process exactly like academic peer review. As I explained earlier, it is a process that still has plenty of flaws and is pretty time-consuming. It is imperative, though, to adopt some sort of quality control that will at least ensure that forecasts with major flaws do not reach the public. But until that day comes, I think we should all be extremely careful about what we read and what we trust.