Friday, 29 April 2016

Podcast: Donald Trump and the Limits of Forecasting

The rise of Donald Trump prompts Nick, Peter and Fraser to discuss the limits of forecasting behaviour.

To subscribe to the podcast, add this RSS feed to your preferred player.

Monday, 25 April 2016

Success metrics for prevention of rare events

The problem with things that don't happen very often is that it's hard to tell if you've prevented them. How long do you have to wait before you can conclude that your preventive measure has worked?

This is a ubiquitous question - one that appears, for example, in preventive medicine, national security, and airline safety - and we've touched on it before when looking at evidence of absence. The approach to take, as ever, depends on the system you're looking at and the assumptions you can make about it. But it's possible to generate rules that can act as a handrail to our beliefs, based on simple probabilistic reasoning.

So far, so good

For instance, if a type of event normally occurs with a frequency of, say, once every five years (or 0.2 times a year, on average), then it has roughly a 50% chance of happening in a three-year stretch. This means that, for every three years that go by without any event of that kind, the odds of the hypothesis that it's stopped happening roughly double. If you originally thought there was a 50% chance of the prevention activity succeeding, then after three years the probability would have risen to about 67%.
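The odds arithmetic in this example can be checked with a few lines of Python. This is only a sketch under stated assumptions, not part of the original analysis: the function name posterior_from_odds is invented for illustration, events are assumed to arrive as a Poisson process, and a successful measure is assumed to stop them completely.

```python
import math

def posterior_from_odds(prior, annual_rate, years):
    """Belief that prevention worked after `years` with no events.

    Illustrative assumptions: events follow a Poisson process at
    `annual_rate` if prevention failed, and never occur if it worked.
    """
    prior_odds = prior / (1.0 - prior)
    # Likelihood ratio: P(no events | worked) / P(no events | failed)
    likelihood_ratio = 1.0 / math.exp(-annual_rate * years)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

# Chance of three quiet years even if nothing has changed
p_quiet = math.exp(-0.2 * 3)
print(round(1 - p_quiet, 2))                         # chance of at least one event
print(round(posterior_from_odds(0.5, 0.2, 3), 2))    # updated belief after 3 quiet years
```

Under these assumptions the exact figures come out a little below the rounded ones in the text (about 45% rather than 50%, and about 65% rather than 67%), which is the price of the convenient "odds double every three years" rule.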

In general, for a randomly-occurring event with an annual frequency of f, the probability of no occurrences of the event in t years is e^(-ft), as a direct implication of the Poisson distribution, which governs the behaviour of these kinds of processes. If we start out very uncertain whether our new prevention measure will work - if we assign an initial 50% probability to its success - then we can calculate what the success probability will be after any amount of uneventful time has passed. By Bayes' theorem, and assuming a successful measure stops the events entirely, it is 1 / (1 + e^(-ft)).

For low-frequency events (strictly, whenever ft is small), this probability can be approximated reasonably well by 0.5 + 0.25ft. So if we try to prevent events of a kind that happens once every five years, and are initially 50% sure we'll succeed, then after four uneventful years the probability it worked will be around 70% (0.5 + 0.25 x 0.2 x 4). Using this simple approximation, we can think about how successful preventive measures have been.
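To see how far the rule of thumb can be trusted, it can be set beside the exact Bayesian answer. The sketch below assumes a 50% prior and a measure that, if it works, stops events entirely; the function names exact_posterior and linear_approx are made up for this illustration.

```python
import math

def exact_posterior(rate, years, prior=0.5):
    """Exact belief that prevention worked after `years` with no events."""
    no_event_if_failed = math.exp(-rate * years)  # Poisson chance of zero events
    return prior / (prior + (1.0 - prior) * no_event_if_failed)

def linear_approx(rate, years):
    """The 0.5 + 0.25ft rule of thumb (only meaningful for a 50% prior)."""
    return 0.5 + 0.25 * rate * years

# Once-in-five-years event, as in the example above
for years in (1, 4, 10, 20):
    print(years, round(exact_posterior(0.2, years), 3), round(linear_approx(0.2, years), 3))
```

The rule is the first-order expansion of 1 / (1 + e^(-ft)) around ft = 0, so it tracks the exact figure closely at four years (about 0.69 against 0.70) but drifts upwards as ft grows, and passes 1 once ft exceeds 2.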

For instance, how successful has the European Union (and its previous incarnations since the Treaty of Paris in 1951) been in preventing large wars between European states - say, wars with more than 1m deaths? These large wars are very infrequent: in the last thousand years, this category probably only includes the Hundred Years War, the Thirty Years War, the Napoleonic Wars, and the two World Wars: an average annual frequency for conflict-onset of about 0.005.

The sixty-five years of peace since the Treaty of Paris would therefore raise our belief that the EU has prevented large European wars from 50% in 1951, to about 58% today. But if we take our baseline frequency from the period since 1800, during which large wars began about every 70 years rather than every 200, instead of from the last millennium, our belief would have risen to about 70%.
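The two figures can be reproduced as follows. This is a sketch using the base rates quoted above (five large wars in a thousand years, or one every 70 years since 1800) and a 50% prior in 1951; the name posterior_no_war is hypothetical.

```python
import math

def posterior_no_war(annual_rate, years, prior=0.5):
    """Belief that large wars have been prevented after `years` of peace."""
    return prior / (prior + (1.0 - prior) * math.exp(-annual_rate * years))

years_of_peace = 2016 - 1951  # Treaty of Paris to the date of the post

base_rates = {"last millennium": 5 / 1000, "since 1800": 1 / 70}
for label, rate in base_rates.items():
    approx = 0.5 + 0.25 * rate * years_of_peace
    exact = posterior_no_war(rate, years_of_peace)
    print(label, round(approx, 2), round(exact, 2))
```

With the millennium-long base rate both methods give about 0.58. With the post-1800 base rate they give a little over 0.7, in line with the rounded "about 70%" in the text.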

Never again?
This is of course an outrageous simplification but nonetheless a somewhat useful one when thinking about our beliefs. To have a belief that diverged markedly from this probability would require some strong additional evidence over and above simply the prevalence of peace since 1945. One such piece of evidence might be the more marked (but harder to measure) decline in the frequency of small European wars. And there's no particular reason to have started the clock at 50% - i.e. at total ignorance. There might be good prior reasons to expect the EU to be good at preventing wars, or the opposite.

And of course, as always, strength of belief is by itself no guide to policy. We also need to take account of the costs and benefits of our choices. The EU's annual budget is around £100bn. No-one knows the economic cost to Europe of World War II, but an estimate in excess of £10tr wouldn't be unreasonable. Using the post-1800 base rate, this would imply an average annual cost of large wars of around £140bn (£10tr spread over 70 years).

This makes it a curiously close call: a probability of 70% that the EU is preventing large wars makes it just about viable, for that single purpose. Of course the EU doesn't just exist to prevent large wars, and there are reasons not to want wars that aren't just related to their economic cost. And we haven't taken account of other hypotheses in which their frequency is lessened but not reduced to zero. Wars have also been falling in frequency across the world - we haven't taken that into account either. As usual with these kinds of problems, there are a great many things to take into account to which our conclusion is sensitive. This isn't surprising really: if they were clear-cut, we wouldn't spend so much time debating them.
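The "close call" is a simple expected-value comparison, sketched below with the round figures used above. All the numbers are the post's rough estimates, in billions of pounds, and the variable names are invented for the illustration.

```python
war_cost_bn = 10_000        # rough cost of one large war (£10tr)
years_between_wars = 70     # post-1800 base rate
eu_budget_bn = 100          # approximate annual EU budget
p_prevents = 0.70           # belief that the EU is preventing large wars

annual_war_cost_bn = war_cost_bn / years_between_wars
expected_annual_saving_bn = p_prevents * annual_war_cost_bn
break_even_probability = eu_budget_bn / annual_war_cost_bn

print(round(annual_war_cost_bn), round(expected_annual_saving_bn),
      round(break_even_probability, 2))
```

On these figures the expected annual saving is almost exactly the size of the budget, and the break-even belief is about 70%: any lower, and war prevention alone would not cover the cost.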

Friday, 22 April 2016

Podcast: AlphaGo

Nick, Peter and Fraser discuss what AlphaGo's triumph over Lee Sedol might mean for analysis and decision making.

To subscribe to the podcast, add this RSS feed to your preferred player.

Saturday, 16 April 2016

Thinking about Sampling

The Channel 4 documentary 'What British Muslims Really Think', screened last Thursday, presented the results of a set of structured face-to-face interviews of 1000 British Muslims. The ensuing controversy - inevitable, one imagines, whatever the results of the survey might have been - presents an unusually topical opportunity to look at some interesting aspects of statistical methodology.

Some criticisms focused on the sample size (e.g. "How can it be possible that the views of 1,000 odd people can prove something about an entire community?", or "You can't get a thousand people and just ask them questions and make that a representation of [all] British Muslims"). The counterintuitive thing about sample size is that, by and large, the percentage of the population surveyed is less important than the absolute number of people surveyed. A survey of 1000 Muslims would be roughly as informative for most purposes if there were 100,000 Muslims in Britain as if there were 10,000,000, assuming of course the sampling was genuinely random.
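One way to see why population size barely matters is to compute the sampling error of an estimated proportion with the finite population correction included. The sketch below assumes simple random sampling without replacement and uses a conventional 95% margin of error purely as a yardstick; the function margin_of_error is written for this illustration.

```python
import math

def margin_of_error(sample_size, population, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a sampled proportion.

    Assumes simple random sampling without replacement; `proportion=0.5`
    is the worst case for the standard error.
    """
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    # Finite population correction: shrinks the error when the sample
    # is a noticeable fraction of the whole population
    fpc = math.sqrt((population - sample_size) / (population - 1))
    return z * standard_error * fpc

for population in (100_000, 10_000_000):
    print(population, round(100 * margin_of_error(1000, population), 2))
```

Both populations give a margin of roughly three percentage points: the correction factor only starts to bite when the sample is a sizeable share of the population, which 1000 out of 100,000 is not.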

Of course, as ever, what constitutes an 'adequate' sample size is not determined just by the things we're interested in, but equally by the decision we need to make based on the information the sample gives us. If the decision we wanted to make was very risky - if there were sizeable differences in the outcomes from getting it right or wrong - then we would need a larger sample size. If not, sample size matters less.

If there's no decision to make, however, then the concept of an 'adequate' sample size simply doesn't apply. It's a bit like the Type I / Type II error distinction - it can only be made meaningful in the context of a decision. Information from a sample is just information and should affect our beliefs about the world to an extent determined by how informative it is - there isn't a cut-off below which we should simply ignore it. A small sample of a large population won't give us much information, but it will give us some. As much as classical statistics tries to suppress this notion, we all understand this intuitively and would be incapable of functioning if it were not true: if we had to sample 1000 olives before deciding we didn't like them - just in case we'd just been unlucky - we'd spend a lot of time being miserable.
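The idea that a small sample still tells us something can be made concrete with a simple Bayesian update. The sketch below assumes a uniform Beta(1, 1) prior over the proportion of olives we would dislike, which is an illustrative choice rather than anything from the discussion above, and the function beta_posterior is hypothetical.

```python
import math

def beta_posterior(successes, failures, prior_a=1.0, prior_b=1.0):
    """Mean and standard deviation of a Beta posterior for a proportion."""
    a = prior_a + successes
    b = prior_b + failures
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd

# Every olive tried so far has been disliked
for n_olives in (0, 3, 30, 1000):
    mean, sd = beta_posterior(successes=n_olives, failures=0)
    print(n_olives, round(mean, 3), round(sd, 3))
```

Three bad olives already move the estimate well away from 50% and narrow the uncertainty; the thousandth olive adds almost nothing. There is no sample size at which the information suddenly switches on.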

If you're looking for a line-of-attack against a survey with whose conclusions you disagree, hunting for sampling bias is normally a better bet ("It is clear that the areas surveyed are disproportionately poorer and more conservative [and so] may potentially have different views than richer areas"). The great thing about sampling bias is that there is no 'diagnostic test' for it - in other words, you can't pick it up from any feature of the data. Instead, you have to look at the mechanics whereby the data were generated to see whether the way respondents were selected is likely to be correlated with the thing you're trying to measure. Among other things, this has the benefit of moving the debate away from discussion of statistical rules-of-thumb and widely-misunderstood concepts like confidence intervals, which can't by themselves ever force us to 'reject' a survey entirely.
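A small simulation shows why bias cannot be spotted from the data alone. Everything in the sketch below is invented for illustration: a made-up population in which 30% live in "area A", where 60% agree with some statement, against 30% agreement elsewhere.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population of (lives_in_area_a, agrees) pairs
population = []
for _ in range(100_000):
    in_area_a = random.random() < 0.3
    agrees = random.random() < (0.6 if in_area_a else 0.3)
    population.append((in_area_a, agrees))

def agreement_rate(sample):
    """Share of people in the sample who agree."""
    return sum(agrees for _, agrees in sample) / len(sample)

true_rate = agreement_rate(population)

# Genuinely random sample versus one drawn only from area A
random_sample = random.sample(population, 1000)
area_a_only = [person for person in population if person[0]]
biased_sample = random.sample(area_a_only, 1000)

print(round(true_rate, 3),
      round(agreement_rate(random_sample), 3),
      round(agreement_rate(biased_sample), 3))
```

Both samples contain 1000 people and would produce equally narrow error bars, yet only the random one lands near the true rate. Looking at the answers alone, nothing distinguishes the two: the difference lies entirely in how the respondents were chosen.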

When do you give up?
All animals deal with the 'sampling' problem all the time - it's fundamental to decision-making in a dynamic world. We continually have to decide whether we've got enough to make a call (to keep eating the fruit, attack the castle, put an offer in on that house etc.), or whether we need to collect more information first. Thinking statistically can certainly help (and is perhaps necessary) when we are trying to measure the power of information, but any claim that a survey should be ignored entirely based on its failure to meet some statistical threshold is almost certainly mistaken.