Wednesday, 12 August 2015

The Impact of Information on Probability

In this post we alluded to the fact that information tends to push probabilities towards 0 or 1. If you're not familiar with information theory, this might not seem obvious at all. After all, when we say something is 10% likely, on one occasion out of ten it will turn out to be true - further information will push its probability away from 0, towards 50%, and then up towards 100%. New information might then push it back again. Information seems to be able to push probabilities around in any direction - so why do we say that its effect is predictably to push them towards the extremes? This post zooms in on this idea, since it's a very important one in analysis.

To start with, it's worth taking time to consider what it means to assign a probability to a hypothesis. To say that a statement - such as "it will rain in London tomorrow", "Russia will deploy more troops to Georgia in the next week" or "China's GDP will grow by more than 8% this year" - has (say) a 10% probability implies a number of things. If we consider the class of statements to which 10% probabilities are assigned, what we know is that one in ten of them are true statements, and nine in ten are false statements. We don't know which is which though; indeed, if we had any information that some were more likely to be true than some others, they couldn't all have the same probability (10%) of being true. This is another way of saying that the probability of a statement encapsulates, or summarises, all the information supporting or undermining it.

Now let's imagine taking those statements - the ones assessed to be 10% probable - and wind time forward to see what happens to their probabilities. As more information flows in, their probabilities will be buffeted around. Most of the time, if the statement is true, the information that comes in will confirm the statement and the probability will rise. Most of the time, if the statement is false, the information that comes in will tend to disconfirm it and the probability will fall. This is not an empirical observation - it's not 'what we tend to see happening' - but instead it follows from the fundamental concepts of inference and probability. It means that things that are already likely to be true are more likely to be confirmed by new information, and things that are already likely to be false are more likely to be disproved with more information.

This means that most of the '10%' statements (the nine-out-of-ten false ones, in fact) will on average be disproved by new information, and the others (the one-in-ten true ones) will on average be confirmed by new information. Of course, this isn't a deterministic process. It's always possible to get unlucky with a true statement, and receive lots of information suggesting it's false. It's just less likely that that'll happen with a true statement than with a false one. And the more information you get, the smaller the probability becomes that it's all misleading.

But we need to be careful here. When we say that most statements assigned a 10% probability will be disconfirmed with new information, we're not saying that, on average, the probability of '10% probable' statements will fall. Far from it: in fact, the average probability of all currently '10% probable' statements, from now until the end of time, will be 10%. Even if we acquire perfect information that absolutely confirms the true ones and disproves the false ones, we'd have one '100% statement' for every nine '0%' statements - an average probability of 10%. But as time (and, more pertinently, information) goes on, this will be an average of increasingly-extreme probabilities that approach 0% or 100%.
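A small simulation illustrates both halves of this: individual probabilities get pushed towards 0% or 100%, while the average stays at 10%. (This is a sketch; the particular signal model - noisy binary signals that match the truth 70% of the time - is my assumption, chosen only to stand in for 'information flowing in'.)

```python
import random

def posterior_after_signals(prior, truth, n_signals, acc=0.7, rng=random):
    """Bayesian-update a prior as noisy signals arrive.
    Each signal matches the truth with probability `acc`."""
    p = prior
    for _ in range(n_signals):
        signal = truth if rng.random() < acc else (not truth)
        # likelihood of this signal if the statement is true / false
        l_true = acc if signal else 1 - acc
        l_false = 1 - acc if signal else acc
        p = p * l_true / (p * l_true + (1 - p) * l_false)
    return p

random.seed(0)
finals = []
for _ in range(10000):
    truth = random.random() < 0.1          # 10% of the statements are true
    finals.append(posterior_after_signals(0.1, truth, 20))

avg = sum(finals) / len(finals)
extreme = sum(1 for p in finals if p < 0.01 or p > 0.99) / len(finals)
print(f"average final probability: {avg:.3f}")   # stays close to 0.10
print(f"share pushed near 0% or 100%: {extreme:.2f}")
```

After twenty signals most statements have been driven close to 0% or 100%, yet the average over all of them remains (approximately) the original 10%.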

Perhaps surprisingly, we can be very explicit about how likely particular future probability time-paths are for uncertain statements. If we assume that information comes as a flow, rather than in lumps, the probability that a statement's probability will rise from p0 to p1 (for p1 > p0), at some point, is rather neatly given by p0/p1. For example, the probability that a statement that's 10% likely will (at some point) have a probability of 50% is (10% / 50%) = 20%. Why? Well, we know that only one in ten of the statements are true. We also know that for every two statements that 'get promoted' to 50%, exactly one will turn out to be true. So two out of every ten '10%' statements must at some point get to 50% probable - an actually-true statement, and a false fellow-traveller - before one of them (the true one) continues ascending to 100% probable and the other (the false one) gets disconfirmed again. (The equivalent formula for the probability that a statement will fall from p0 to p1 is (1-p0) / (1-p1).)
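The p0/p1 formula can be checked with a quick simulation. Using a symmetric random walk in small steps as a stand-in for a probability being buffeted by a flow of tiny information updates (my modelling choice, consistent with the 'flow, not lumps' assumption), the fraction of paths starting at 10% that ever reach 50% before being disconfirmed comes out at the predicted 20%:

```python
import random

def ever_reaches(p0, p1, step=0.01, rng=random):
    """Symmetric random walk on [0, 1] - a simple martingale stand-in
    for a probability under a flow of small information updates.
    Returns True if the walk reaches p1 before hitting 0."""
    p = p0
    while 0 < p < p1:
        p += step if rng.random() < 0.5 else -step
    return p >= p1

random.seed(1)
trials = 20000
hits = sum(ever_reaches(0.10, 0.50) for _ in range(trials))
print(f"simulated: {hits / trials:.3f}  theory (p0/p1): {0.10 / 0.50:.3f}")
```

This is the classic 'gambler's ruin' calculation: any martingale starting at p0 and stopped at 0 or p1 reaches p1 with probability exactly p0/p1.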

The time paths of probabilistic judgements are surprisingly predictable. We can say a surprising amount about them, but unfortunately none of it is 'useful' to decision-makers, because all of these possibilities are encapsulated within the probability itself.
This may not seem intuitive at all. In fact it might seem barely believable. But it's embedded in the concepts of probability and evidence. Information will, in the long run, push probabilities towards the extremes of 0% and 100%. Unfortunately, we don't know which statements new information is going to push one way or the other - if we did, we'd have the information already and it would be incorporated into our probabilities. Assuming, of course, we were doing things properly.

Monday, 10 August 2015

Confidence and Probability Part 9: The 'Expected Value' Theory

So far, in the quest to understand what statements of analytical confidence actually mean, we've appraised five separate theories on the basis of coherence, alignment to actual usage, and decision-relevance. The last two theories relate to the economics of information - its value, and its cost.

It is not widely understood, even by people in the information business, that the value of information can be quite precisely defined in terms of its impact on the outcomes of decisions, and in particular on risk. The 'expected value' theory of analytical confidence is that it captures this value and describes how useful more information is likely to be. It says that if new information is likely to be of low value, you can have higher confidence than if new information is likely to have high value. In this respect, the 'expected value' theory is a refinement of the 'ignorance' theory, but instead of relating confidence to the amount of unknown information (which is probably an incoherent idea), it relates it to its value.

Is it Coherent?

Information has value, ultimately, because it makes us more likely to choose a course-of-action matched optimally to our circumstances. Exactly why information adds value in this way follows from its intimate link to probability, discussed here. More information has the effect of - or rather, is the same as - pushing the probability of an unknown closer to 0 or to 1. So more information leads to greater certainty about outcomes. Where these outcomes are important to a decision - e.g. the prospect of rain, to a decision to take your umbrella with you to work - this has the effect of lowering risk.

More information means less chance of a decision-error

'Risk' has been defined in a large number of ways, varying from the woolly and circular to the relatively-robust. One simple definition, which works well for most purposes, is that a 'risk' is a possible outcome that, if it were to occur, would mean you'd wish you'd done things differently. If you take out insurance, the risk is that you don't actually need to make a claim. If you don't take out insurance, the risk is that you do. Risks don't necessarily represent decision errors. Risk, understood in this way, is inherent in decision-making under uncertainty: given that the uncertainty (whatever it is) is relevant to your decision, there will always be the possibility of being unlucky and retrospectively wishing you'd made a different choice.

This is where information comes in. Information pushes probabilities closer to 0 or 1. On average, this will tend to reduce your exposure to risk, even if it doesn't actually change a decision. For instance, let's say you are considering getting travel insurance for your camera, which is worth £500. This would cost £50. But you think there's only a 5% chance it'll be stolen or lost, so you decide not to get the insurance. Now suppose you receive information about your destination which suggests that crime is almost non-existent there, which means you revise your estimated probability of loss down to just 2%. This doesn't change your decision - you're still not going to take out insurance - but your exposure to risk has fallen from an average value of -£25 (5% of £500) to -£10 (2% of £500). In other words, you're better off, on average, as a result of the new information, even though it didn't change your behaviour.
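The arithmetic in the camera example can be written out explicitly (the variable names and the framing as a function are mine):

```python
def expected_loss(p_loss, value):
    """Expected loss (risk exposure) from going uninsured."""
    return -p_loss * value

CAMERA = 500    # value of the camera, from the example
PREMIUM = 50    # cost of the insurance

before = expected_loss(0.05, CAMERA)   # 5% chance of loss
after = expected_loss(0.02, CAMERA)    # revised to 2% after new information
print(f"exposure before the new information: £{before:.0f}")
print(f"exposure after the new information:  £{after:.0f}")
# In both cases the expected loss is smaller than the £50 premium,
# so the decision (don't insure) is unchanged - but the new
# information still improved the expected outcome by £15.
```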

We're not used to thinking of information as risk mitigation, but it has exactly the same effect. This is where information derives its economic value, and it's the basis for the 'expected value' theory of analytical confidence. The key idea is that if we expect new information to be of relatively high additional value, confidence will be low - because we ought not to make a decision yet, but to collect more information. But if new information is likely to be only of low value, confidence will be high, since further analysis is unlikely to add value for the decision-maker.

What determines the expected value of further information? The maths is a bit convoluted, but there are two key factors: current uncertainty, and the magnitude of the risks. The closer you get to a probability of 0 or 1 with your key uncertainties, the less valuable, on average, further information is likely to be. And, perhaps intuitively, the larger the risks and benefits of your decision, the more valuable (all else being equal) further information is likely to be.
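Both factors show up in a standard 'expected value of perfect information' calculation. The sketch below reuses the camera example's numbers; the EVPI framing (comparing the best decision you can make now with the decision you'd make under perfect foresight) is a textbook simplification, not something spelled out in the text:

```python
def evpi(p_loss, value, premium):
    """Expected value of perfect information for an insure/don't-insure
    decision: the gain from acting with perfect foresight rather than
    acting on the current best choice."""
    ev_insure = -premium
    ev_dont = -p_loss * value
    best_now = max(ev_insure, ev_dont)
    # with perfect foresight, choose per outcome: insure only if the
    # loss would actually occur
    ev_perfect = (p_loss * max(-premium, -value)
                  + (1 - p_loss) * max(-premium, 0))
    return ev_perfect - best_now

print(evpi(0.05, 500, 50))    # moderate uncertainty: information has value
print(evpi(0.001, 500, 50))   # near-certainty: information nearly worthless
print(evpi(0.05, 5000, 50))   # bigger stakes: information worth more
```

As the probability approaches 0 the value of further information shrinks towards nothing, and as the stakes grow the value of further information grows with them - the two factors described above.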

In summary, the notion that confidence relates to the expected value of further information has a sound theoretical basis.

Does it accord with usage?

Looking at our survey results, confidence was affected by the relative magnitudes of risks within questions, but there was no consistency about the levels of confidence expressed between questions. For example, assessments about the authenticity of a £10 watch attracted greater confidence levels than similarly-based assessments where the watch was worth £800. But the absolute levels of confidence expressed were not dramatically different from those in questions involving the potential collapse of a bridge, or the cessation of a rural bus route. Likewise, as we covered here, there was no significant observed relationship between expressed confidence and current levels of uncertainty.

Turning to the qualitative definitions of 'confidence' supplied by survey respondents, there was limited support for the idea that confidence is related to expected value of future information, apart from one or two references to the 'ability to make a decision' and to 'knowing when not to make a decision'. In general, analysts do not think that confidence is related to the characteristics of the decision their analysis is supporting, nor to current levels of uncertainty, and seek to root it instead in characteristics of the evidence base or assessment methods, even where (as we've seen) these ideas might be hard to define coherently.

Is it decision-relevant?

Expected value of information certainly is decision-relevant. It is an important factor determining whether it's optimal to make a decision immediately, or to defer and collect more information first. However, it's only one half of the information you need to make this judgement definitively. The other half is the expected cost of further information. This is the final theory of confidence, which we'll look at in the next post.


The 'expected value' theory of confidence is both decision-relevant and coherent. However, analysts' assessments of confidence are not heavily influenced by it, and it is not widely reflected in analysts' own proposed definitions of the term.

In the next post we'll cover the final theory of confidence - that it captures the expected cost of further information.