Tuesday, 24 November 2015

The Red Button Problem

Liverpool University Professor Simon Maskell created the 'red button problem' as a simple but messy conundrum that can be used to elucidate the similarities and differences between different approaches to handling uncertainty, such as Bayesian inference, Dempster-Shafer, frequentism, 'fuzzy logic' and so on. We'll outline the problem here, and present our answer, which comes from a squarely Bayesian angle.

The Red Button Problem

You are in charge of security for a building (called 'B' in the problem, but let's call it 'The Depot'). You are concerned about the threat of a VBIED (vehicle-borne improvised explosive device, or car bomb) being used against The Depot. 

An individual (called 'A' in the problem, but let's call him 'Andy'), has previously been under surveillance because of 'unstable behaviour'. He drives a white Toyota.

A white Toyota of the kind that may be owned by 'Andy'

10 minutes ago, a white Toyota was spotted on a video camera on a road 200m away from The Depot. An analyst ('Analyst 1' - let's call him 'Juan') with ten years' experience views the video footage and states that Andy is 'probably' near the Depot. An automated image recognition system, analysing the number plate, states that there is a 30% probability that the Toyota in the image is Andy's car. 

5 minutes ago, a white Toyota was spotted on a video camera on a road 15km away from The Depot. A second analyst ('Analyst 2', who we'll call 'Tahu'), who is new in post, reviews the footage and concludes that it is 'improbable' that Andy is 'near The Depot'.

There is a big red button in your office, that if pressed will order The Depot to be evacuated.

A big red button yesterday

Based on the evidence you have, should you press the button? 

Key Uncertainties

This problem is deliberately full of ambiguity. There are many questions that, in real life, you would want to know, and probably would already know or could easily resolve, including:

  • Is Andy's 'unstable behaviour' of a kind that is in any way empirically linked to VBIED attacks? Why was he under surveillance?
  • How many other individuals like Andy pose a security concern?
  • Which country and city is The Depot in, how easy is it to obtain the materials to manufacture a VBIED, and how prevalent are VBIED attacks?
  • Is The Depot in some way a salient target, in general and for Andy?
  • Do we have video feeds from the streets adjacent to The Depot, so we can scan them for the white Toyota?
  • Are analysts on the lookout for Andy?
  • What kind of analysis is Juan experienced in? 
  • To what extent is Juan already factoring in the NPR data?
  • When Tahu says he thinks it's improbable that Andy is near The Depot, does he mean that 15km away (where he spotted the Toyota) is not near, and that he thinks it's Andy's car, or that he thinks 15km is near, but that it isn't Andy's car?
  • Given the roads and traffic concerned, could Andy conceivably have driven at 180kmph - a fair lick, but not physically inconceivable - and thus be in both videos? Is this even possible in a Toyota?
  • Under what circumstances is it safer to evacuate The Depot rather than stay inside or perhaps retreat to the basement? 
  • How many people work in The Depot, and how costly is evacuation?

Thankfully, instead of requiring concrete answers or forcing us to make cavalier assumptions, the Bayesian approach to tackling these sorts of uncertainties is simply to expose and quantify each of them, based on reasonable comparison with similar situations and drawing on all the available evidence including the existence of the problem itself. For example, although we don't know anything about Andy's 'unstable behaviour', the fact that there are analysts keeping an eye out for him, and that we (the person in charge of the red button) are involved in the problem at all, suggests that he is at least more likely than the average person to commit a VBIED attack. Similar kinds of reasoning can be applied to think about likely answers to each of the questions above, so these uncertainties can be factored into our response. In this case, though, it's largely unnecessary to do this for all of the questions above, as we gain most of what we want for a decision through order-of-magnitude judgements, using a simple model.

Start with the Decision

All analysis, particularly in the realm of business or government, adds value because it reduces uncertainty and therefore risk. The more closely one's object of analysis - the key uncertainty, or target hypothesis - maps to one's decisions, the more valuable the analysis will be. In most cases, this means that analysing the decision needs to precede the analysis of the problem.

In the case of the Red Button Problem, the decision is a simple binary one: evacuate, or don't. If we evacuate, there are two stages to think about - during the evacuation, and after the evacuation. While people are filing out of The Depot, one assumes to a remote assembly point, they might be more vulnerable to a VBIED attack. The US National Counterterrorism Centre helpfully provides some guidance on how to respond to VBIED threats. In the case of an explosive-laden sedan (e.g. a Toyota) the guidance suggests that sheltering in place is the optimal strategy at a radius of 98m or more (98m is curiously, and implausibly, specific here - why not say 100m?) but that 560m is the safest distance. Closer than 98m, evacuation is always optimal. If The Depot is located in the centre of a compound, staying inside would therefore be better. But if it's roadside, it would be better to evacuate, presumably because being inside a building at that radius would be as dangerous as being caught outside during a blast.

The decision is therefore time- and distance-sensitive. If there's a VBIED closer than 100m to The Depot, evacuation is best under any circumstances. If there's a VBIED between 100m and 560m from The Depot, sheltering in place would be best if it's going to explode soon, but evacuation would be better if there's enough time to get people to a safe distance. This might take in the order of ten minutes. Beyond 560m, you certainly shouldn't evacuate as people will be safer inside.   

So we have three mutually-exclusive hypotheses that we're interested in here:

H1. There is no VBIED.
H2. There is a VBIED that is timed to explode in roughly the next ten minutes.
H3. There is a VBIED that is timed to explode in more than ten minutes. 

If The Depot is near the road, evacuation is optimal under H2 and H3. If The Depot is more than 100m from the road (but less than 560m), evacuation is optimal under H3. Under all other circumstances, evacuation isn't optimal.
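
The decision rule above is small enough to write down as a lookup. Here is a minimal Python sketch, assuming the rounded thresholds used in this post (100m and 560m). The function name and the string labels for the hypotheses are our own illustrative choices, not part of the original problem.

```python
def evacuation_is_optimal(distance_m, hypothesis):
    """Return True if evacuating beats staying put, under this post's rules.

    distance_m: distance from The Depot to the road, in metres.
    hypothesis: "H1" (no VBIED), "H2" (VBIED due within ~10 minutes),
                "H3" (VBIED due in more than ~10 minutes).
    The 100 m and 560 m thresholds are the post's rounded figures.
    """
    if hypothesis == "H1":
        return False                 # no bomb: evacuation is pure cost
    if distance_m < 100:
        return True                  # roadside: evacuate under H2 and H3
    if distance_m < 560:
        return hypothesis == "H3"    # only worth it if there is time to get clear
    return False                     # beyond 560 m people are safer inside


for distance in (50, 300, 800):
    for h in ("H1", "H2", "H3"):
        print(distance, h, evacuation_is_optimal(distance, h))
```

The point of writing it out is that only three of the nine combinations favour evacuation, which is why the timing question only matters in the middle distance band.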

Costs and Benefits

Optimal decision-making involves at the very least a comparison of costs and benefits with probabilities. A bomb might be very unlikely, but if the risk of staying put is significantly higher than the cost of evacuating, it might still be right to press the button. 

In this case, the cost of evacuating, whether or not a bomb goes off, will be equal to the lost productivity from workplace absence. How much is that worth? Depending on what sort of work The Depot does, it won't be too far off £10-100 an hour per person. Assuming, in the event of a false alarm, that the area can be confirmed safe (and people return to their desks) in about an hour, and that there are (say) 100 or so people at The Depot, the cost of an evacuation would be of the order of £1,000-10,000. 

What if we don't evacuate - or we evacuate at the wrong time, and people are still walking to their muster point - and a bomb goes off? There are two things we need to know: how many deaths are likely, and how much is a death worth? 

It's hard to find statistics about numbers of people killed by car bombs at various distances. But we have a few reference points. The Oklahoma bomb killed 168 people, in an office building that accommodated about 500. It was a massive bomb, and close to the building. As an upper limit, this suggests that something like one third of The Depot's occupants would be killed. Then you have other costs, such as ongoing medical costs for those injured. If The Depot were further away from the road, this figure would be lower, and we can assume that at 500m or so, the probability of being killed is negligible. So we are perhaps looking at something around 10% fatalities from a nearby VBIED if The Depot were not evacuated. For the trickier situation in the 'shelter-in-place' zone, we may have to do some handwavey guesswork if the decision turns on the relative probability of a bomb in the next 10 minutes (the evacuation time) compared to a bomb afterwards.

How much is a life worth? Although some organisations avoid putting explicit values on human lives, the 'yuck' factor aside, you have to do it somehow or you won't be able to make a decision. If you use lifetime productivity (at the lower end) or willingness-to-pay to avoid death (at the upper end), estimates of the value of a life are typically in the order of £1m-10m. Assuming there are around 100 staff, and that not evacuating when there's a bomb next to The Depot would lead to around 10% fatalities, this gives us an order-of-magnitude estimate for the maximum cost of an un-acted-upon roadside bomb of around £10m-100m.

The ratio between the two figures - the cost of failing to evacuate when there is a bomb, set against the cost of evacuation - could therefore be somewhere between 1000:1 and 100,000:1. This ratio - between costs and benefits - gives us, as its reciprocal, a 'critical probability' of a bomb, above which we should evacuate, and below which we should sit tight, for those situations in which evacuation may be optimal.
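
For anyone who wants to check the arithmetic, here is a rough Python sketch of the cost comparison. Every input is one of the order-of-magnitude guesses from this section, and the variable names are ours. It also assumes, simplistically, that evacuating in time avoids the whole loss, so the critical probability is just evacuation cost divided by bomb cost.

```python
staff = 100
hours_lost = 1                        # time until people are back at their desks
hourly_value_gbp = (10, 100)          # low and high guesses per person-hour

# Cost of a (possibly unnecessary) evacuation: lost productivity.
evacuation_cost = tuple(staff * hours_lost * v for v in hourly_value_gbp)

expected_deaths = 10                  # roughly 10% of 100 staff, roadside bomb
value_of_life_gbp = (1_000_000, 10_000_000)

# Cost of failing to evacuate when there is a bomb.
bomb_cost = tuple(expected_deaths * v for v in value_of_life_gbp)

# The ratio is smallest when the bomb is 'cheap' and evacuation is 'dear'.
ratio_low = bomb_cost[0] / evacuation_cost[1]
ratio_high = bomb_cost[1] / evacuation_cost[0]

# Evacuate if P(bomb) is above roughly 1/ratio.
critical_p_high = 1 / ratio_low
critical_p_low = 1 / ratio_high

print(evacuation_cost, bomb_cost)
print(ratio_low, ratio_high)
print(critical_p_high, critical_p_low)
```

The two critical probabilities bracket the zone of doubt: above the higher one evacuation is clearly worth it, below the lower one it clearly isn't.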

On to the Probability

The decision analysis has given us some useful information about roughly what we need to establish. If The Depot is close to the road, we should certainly evacuate if the evidence suggests a bomb probability of more than 1 in 1000, and certainly not evacuate if it's less than 1 in 100,000. If it's in between, we need to think a bit harder but we might take the risk-averse option and evacuate anyway. If The Depot is further from the road, but less than about 500m away, we have a trickier decision that will depend on our assessment of bomb timing and relative safety inside or outside the building.

In the usual Bayesian fashion, we'll take the approach of splitting our probability estimate into a prior probability, typically using background frequencies, and a set of conditional probabilities that take into account the evidence. This provides a useful audit trail for our estimate, to identify key sensitivities in our assessment, and to focus discussion on the main areas of disagreement.

First, then, what's the background frequency of VBIED attacks? Well, clearly it depends. In Iraq, there were over 800 VBIED attacks in 2014 alone, out of around 1400 worldwide (according to the ever-useful Global Terrorism Database). But assuming The Depot isn't in a troublespot - Iraq, Yemen, Nigeria etc. - the prior probability of a VBIED will be minuscule. There are - generously - a few dozen such attacks in stable countries worldwide. There are a number of back-of-the-envelope ways we could derive a prior probability from this, but it would be of the order of 1,000,000-to-1 a year that a VBIED attack would hit a particular office building, and therefore of around 1,000,000,000-to-1 or less that a VBIED was parked outside a particular building and timed to go off within the next (say) eight hours. 

The question, then, turns on the power of the evidence - the relative likelihood of that evidence under the 'bomb' and 'no bomb' hypotheses. Is this evidence sufficiently powerful to raise the odds of an imminent VBIED by a factor of 10,000 or more - to at least 100,000-to-1 - as would be needed to make evacuation potentially optimal?
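
The step from an annual prior to an eight-hour prior, and the factor the evidence then has to supply, can be sketched in a few lines of Python. The inputs are the rough figures from this post; the variable names are ours, and the 1-in-100,000 threshold is the lower end of the critical range from the decision analysis.

```python
annual_prior = 1e-6            # chance a given building is hit in a year
window_hours = 8               # the 'imminent' window considered here
windows_per_year = 365 * 24 / window_hours

# Spread the annual risk evenly across the year's eight-hour windows.
window_prior = annual_prior / windows_per_year

evacuation_threshold = 1e-5    # lowest probability at which evacuation might pay

# How much must the evidence multiply the prior to reach the threshold?
required_factor = evacuation_threshold / window_prior

print(f"windows per year: {windows_per_year:.0f}")
print(f"prior per window: about 1 in {1 / window_prior:,.0f}")
print(f"required factor:  about {required_factor:,.0f}")
```

With these inputs the required factor comes out a little above 10,000, which is the figure used in the rest of the argument.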

First, we will discount Tahu's evidence entirely. We don't even know what he means. Does he mean that the car he saw 5 minutes ago (15km away) was Andy's car, and that 15km is 'not near', or that although 15km is 'near', the car he saw wasn't Andy's car and so Andy is unlikely to be near The Depot simply because there's no reason to think he would be? We don't know. Both interpretations seem equally likely, and they pull in different directions: if the car was Andy's car, we can pretty much rule out his presence at the Depot, but if it wasn't Andy's car, the other evidence becomes more important. Action point: Tahu to be enrolled on an intelligence analysis communications course.
The more of these you have, the less diagnostic they are
The next piece of evidence is the video of the white Toyota, possibly Andy's, spotted 200m from The Depot ten minutes ago. We'll combine it with the evidence of Andy's previous instability to form a single piece of evidence:
  • E: "An individual with a history of instability was in a car near The Depot ten minutes ago."
In fact, this evidence should only be believed with a probability of either 30%, or whatever Juan means by 'probably', or some other number depending on how credible the NPR system or the analyst are. But we're going to pretend that E is known with certainty. This is the most diagnostic case. If it's still insufficient to push the probability into the 'evacuate' zone, then we don't need to worry about the finer points.

So, how likely is E if there's going to be a VBIED attack? Let's assume it's pretty close to 1. Some VBIED attacks will be carried out by individuals not known to have been unstable, but let's not worry about that. The key question here is the second probability - how likely is E under the assumption that there isn't going to be a VBIED attack? The lower this probability, the more powerful the evidence.

What this boils down to is how likely it is that an unstable person will be 'near' The Depot during the vast majority of time that there isn't about to be a VBIED attack. According to this article, the Security Service ('MI5') are 'watching' 3000 potential 'Jihadists' in the UK. Let's assume, including other types of threat, that there are something like 6000 people of 'concern' in the UK. This is about 1 in 10,000 people. The security infrastructure supporting The Depot may well have a similar proportion of people covered - after all, they have intelligence analysts, collection assets and so on. 

Finally, we need to guess roughly how many people are 'near' The Depot every day, and how probable it is that one of them is an individual of concern. Is it in the middle of a town, or out in the countryside? Let's give it the benefit of the doubt and assume that it's somewhere quiet, which again increases the power of the evidence. Let's say one car a minute is on the road nearby. This equates to about 500 cars during a working day.

And here's the key point: even with just 500 cars going past a day, each with just one occupant, you expect to see an individual of concern on average every twenty days. In an average eight hour stretch, the probability of seeing one of these individuals is about 5%. Even in the worst case - with a set of assumptions that make the evidence particularly diagnostic - the evidence presented raises the probability of an attack by a factor of just 20 or so - nowhere near the factor of 10,000 that would be needed to make evacuation optimal at any distance.
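
Here is the same calculation as a short Python sketch, using the generous assumptions above (E taken as certain, P(E | bomb) set to 1, one occupant per car). The variable names are ours, and the prior is the rough 1,000,000,000-to-1 figure from earlier in the post.

```python
share_of_concern = 1 / 10_000      # proportion of people who are 'of concern'
cars_per_day = 500                 # quiet road, one occupant per car

# Expected number of individuals of concern passing per working day.
expected_per_day = cars_per_day * share_of_concern

# P(E | no bomb): at least one such individual passes during the day.
p_e_given_no_bomb = 1 - (1 - share_of_concern) ** cars_per_day
p_e_given_bomb = 1.0               # generous to the evidence

likelihood_ratio = p_e_given_bomb / p_e_given_no_bomb

prior_odds = 1e-9                  # imminent VBIED, before any evidence
posterior_odds = prior_odds * likelihood_ratio

print(f"one sighting every {1 / expected_per_day:.0f} days on average")
print(f"P(E | no bomb) = {p_e_given_no_bomb:.3f}")
print(f"likelihood ratio = {likelihood_ratio:.1f}")
print(f"posterior odds = {posterior_odds:.1e}")
```

Even with everything tilted in favour of the evidence, the posterior stays hundreds of times below the 1-in-100,000 level at which evacuation might begin to pay.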


Don't press the button. Andy might be nearby, but the probability that he's about to conduct a VBIED attack is negligible. Instances of 'unstable person near a building' are far, far more frequent than instances of 'VBIED attack', to a multiple that greatly exceeds the ratio of costs and risks associated with the decision problem itself. The nature of the evidence is such that it simply cannot be diagnostic enough. In real life, you'd perhaps want to pursue further investigation - perhaps eyeball the street outside to see if the car's there. But ordering an immediate evacuation would be very jumpy indeed, and your tenure as head of security would probably not be a long one.

Some professional analysts would baulk at the approach taken above. It seems too full of assumptions and guesswork. It is, of course - in real life you would have a lot more information that would help guide the decision. But the broad approach taken - to start by analysing the decision, then ask whether the evidence could conceivably be sufficient to change it - is a robust one, and might save analytical resources that would otherwise be used up on estimating a probability that would not, in fact, make any difference to anything.

Friday, 6 November 2015

Sharm el-Sheikh and Terrorism Risks

The British government's choice to suspend flights from Sharm el-Sheikh, based on 'intelligence reports', makes an interesting subject for decision-analysis. On the benefit side is the (assumed) avoidance of an unacceptable risk of fatalities either by moving passengers geographically (to another airport) or temporally (by delaying flights for some period of time), and the longer-term reduction in risk through pressure on the Egyptians to improve airport security. On the cost side are the potential increased risk to those passengers through remaining in Sharm el-Sheikh, and the material costs (time, money) of delayed return to the UK. By making a few back-of-the-envelope calculations, we can get a sense of what the scales of these various risks and costs are, and draw some inferences about the UK government's decision calculus.

Background Risks

Before considering the specifics of the case, it's useful to look at the data on background terrorism risk levels. The Global Terrorism Database is the go-to source for this kind of information, even though the latest full database only runs up to the end of 2014 and therefore doesn't include recent events such as the bombing of the Italian consulate in Cairo or the downing of Flight 9268 (if this were to prove the work of IS).

Despite the conclusions our reptilian brains might draw from the background noise of an 'ever more dangerous world', for people living outside countries such as Syria, Iraq, Nigeria and Afghanistan, terrorism risks are minuscule. In the UK in 2010 (the latest year for which comprehensive cause-of-death data are available), no-one died in any terrorist attacks, but 34 died of bullous pemphigoid - we'd never heard of it either. Even in Iraq, an average of 5,400 people a year were killed in terrorist incidents in 2010-2014, compared to a reported 10,000 from road traffic accidents in 2014.

You're pretty safe here

Between 2010-2014, around 130 people a year died in terrorist attacks in Egypt. Given Egypt's population (80m), this means that a day spent in Egypt would have exposed you to around a 1-in-250,000,000 chance of dying in a terrorist attack. We might suppose that tourists were particularly likely to be targeted, but this is not so - only 3% of attacks targeted tourists, and of these, not a single one was fatal in the whole five year period. 
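
As a back-of-the-envelope check, the daily figure can be reproduced in Python from the two numbers quoted above. This sketch assumes, as the paragraph does, that the risk is spread evenly over the whole population and over every day of the year; the variable names are ours.

```python
deaths_per_year = 130          # average terrorism deaths in Egypt, 2010-2014
population = 80_000_000

# Average chance that a given person dies in an attack on a given day.
daily_risk = deaths_per_year / 365 / population

print(f"about 1 in {1 / daily_risk:,.0f} per day")
```

This lands somewhat above 1 in 200 million, the same order of magnitude as the rounded 1-in-250,000,000 used in the text.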

What about terrorism against non-military aircraft? Throughout that period, there were no terrorist attacks against such aircraft in Egypt, and there were only four such attacks throughout the region between 2010-2014. With such low-frequency events, we get a better idea of background risk looking at the global figures. The wider trend is clear: aviation is safer than ever. Looking at terrorism risks in particular, between 2010-2014 there were just 17 attacks of any kind on non-military aircraft, with 306 fatalities, but 298 of these were in a single incident: MH17. The risk of dying in a terrorist attack during a flight is therefore probably around 1-in-50,000,000. Five times more dangerous than the terrorism risk of spending a day in Egypt, but still negligible, particularly in comparison with terrestrial means of getting home.

Finally, given that the stranded passengers will be hanging around in the airport for longer, we should consider the background risk of terrorist attacks on those targets. Perhaps unsurprisingly, attacks on airports are more frequent, but considerably less deadly, than those on aircraft. 227 people were killed in 118 attacks on airports worldwide between 2010-2014. (Two of those attacks were in Egypt, with one fatality between them.) On average, this is a terrorism death once every eight days in airports worldwide. Perhaps around 16,000,000 hours are spent by people in airports per day (assuming 8,000,000 passenger journeys a day and 2 hours of airport-time per journey), so this works out at around one terrorism death every 130,000,000 hours spent in airports. To put it another way, about 150 minutes in an airport is as risky as the flight you take, and 24 hours in an airport exposes you to about a 1-in-5,000,000 chance of dying in a terrorist attack. It would certainly be safer to make your way back to the centre of town for a mint tea, as most of the passengers at Sharm seem to have done.
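
The aircraft and airport figures can be put side by side in a short Python sketch. It uses only numbers quoted in this post, and it reads the 8,000,000 daily figure as individual passenger journeys, which is our assumption; the variable names are also ours. Because the text rounds at each step, the outputs are close to, not identical with, the quoted values.

```python
days = 5 * 365                          # 2010-2014
journeys_per_day = 8_000_000            # assumed: passenger journeys, worldwide
airport_hours_per_journey = 2

aircraft_deaths = 306                   # attacks on non-military aircraft
airport_deaths = 227                    # attacks on airports

# Risk of dying in a terrorist attack on any one flight.
flight_risk = aircraft_deaths / (days * journeys_per_day)

# Airports: how often a death occurs, and how many person-hours that spans.
days_per_airport_death = days / airport_deaths
airport_hours_per_day = journeys_per_day * airport_hours_per_journey
hours_per_airport_death = airport_hours_per_day * days_per_airport_death

# Two ways of expressing the airport risk.
day_in_airport_one_in = hours_per_airport_death / 24
flight_equivalent_minutes = flight_risk * hours_per_airport_death * 60

print(f"flight: about 1 in {1 / flight_risk:,.0f}")
print(f"airport death every {days_per_airport_death:.1f} days")
print(f"one death per {hours_per_airport_death:,.0f} airport-hours")
print(f"24h in an airport: about 1 in {day_in_airport_one_in:,.0f}")
print(f"minutes in an airport matching one flight: {flight_equivalent_minutes:.0f}")
```

The unrounded numbers suggest a little over two and a half hours in an airport carries the same terrorism risk as the flight itself, in line with the rough figure given above.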

Some ways of exposing yourself to a 1 in 50m risk
of dying in a terrorist attack

'Intelligence Reports'

With these figures serving as anchor points, we can speculate as to what intelligence the UK government might have received, and consider its possible implications for the relative risks of postponing the flights versus allowing them to continue.

'It was a bomb'

The media are suggesting that the intelligence reports related to Flight 9268, and specifically that it was destroyed by a bomb. If this is all the intelligence reports suggested, then the only implication for future terrorism risk is if there is a clustering effect. The data for Egypt suggest a small clustering effect, such that the risk of a terrorist attack (of any kind) is around 15% higher on the same day as another attack, but this effect disappears thenceforth. Using the figures above, there might therefore be a rationale for sending passengers home on the day that Flight 9268 went down - a Saturday - but the travel ban was imposed on the following Wednesday. Assuming the UK government is acting to minimise risks, we might be able to assume that the intelligence also related to raised future threat levels.

'There is a general threat'

If the intelligence suggested a higher general threat level, this might appear to justify the travel ban. But if a 'general' threat increase applies to all terrorism risks in Egypt, including against flights, the figures presented above suggest that the best thing to do would be to fly passengers home as quickly as possible: the flight is unavoidable, and getting back to the UK would definitely be safer than spending time in the airport or roaming around Egypt, even at (albeit negligible) background risk levels.

'There is a specific threat'

It's possible the intelligence related to a specific threat against certain flights in a certain area or at certain defined times. If this is so, then the extent of increased risk is impossible to guess, but if the intelligence were credible enough it could easily justify the temporary travel ban.

Benefits and Costs

Scarier than a terrorist

The decision may not entirely have been driven by short-term risk minimisation. The UK government may hope that the Egyptians will be pressured into improving airport security, for a longer-term gain. But the decision is also costly - for the stranded people, for the Egyptian tourist industry, and (one assumes) for the Egyptian government in terms of increased security spending. The reduction in risk would have to be very significant or very long term, if this were the only goal of the UK's decision. Since 1970, an average of around 60-70 aircraft passengers a year have been killed in terrorist attacks worldwide. If we include indirect deaths from airborne terrorism (e.g. the people in the World Trade Centre) then this average figure would roughly double, but would still be lower than the number of people killed each year by either hippos, lions or buffalo. In other words, the risk from aviation terrorism is already so low that there are probably few gains left to be had from improved security. 

Thursday, 5 November 2015

Cause-Effect diagramming in focus

Another example of a popular and effective low-tech analytical tool is the 'Cause-Effect diagram' (aka Herringbone or Ishikawa diagram). As the name suggests, such diagrams are appropriate for identifying and delineating possible causal relationships. Examples include the ways a plan might fail, the causes of an accident or bottlenecks in processes. As with the previous post examining SWOT analysis, we won’t provide yet another ‘how to’ - there are lots of excellent guides out there. Instead, we'll look a little further into what is going on when an analyst uses such a method, in order to add robustness to the approach and encourage adaptation.

So what’s going on? Let's examine some of the things that seem to be happening when this method is used. There seem to be a few mechanisms in play:
  • Selecting and specifying the subject. 
  • Choosing the framework - selecting top-level headings to suit the question.
  • 'Ideation' - identifying and recording 'factors' relating to the question
  • Sorting and categorising ideas
  • Turning the output into action

Specifying the question: As observed in the previous SWOT post, there is an important process of selecting a manageable but useful scope to address. I won’t go into more detail on this (it is probably worthy of its own post). However, the selection of an appropriate framework of headings is an important element of this method, and it is informed by the choice of scope and question.

Choosing the framework: Various guides suggest different collections of top-level headings (choose from 5 Ms or 8 Ms in manufacturing, 7 Ps in marketing or 5 Ss in service industry) to provide the first level of features. These top-level headings provide broadly (woolly) defined causes; as additional layers of detail are added, these causes become more clearly defined. The headings also serve as a set of categories for sorting factors.

Naturally, it is important to choose headings which you consider likely to give you good coverage of the scope decided in the step above. For example, if human error is thought to be a root cause of the problem under examination, then 'PEOPLE' (or another appropriate category) should be included. Be aware that any prescriptive set of categories may limit ideation, so these stock frameworks should be used with discretion. They can provide useful suggestions of things to think about, but do not feel pressured to 'force' ideas in order to satisfy an empty heading; likewise, feel free to add more categories if those pre-selected don't fit your ideas.

Presenting this outline structure assists ideation. There seems to be a mode of thought while 'generating' ideas which is assisted by considering a single branch (category) at a time. This reduction in scope seems to make the task less cognitively demanding in some way. It is often easier to come up with answers to a series of questions, "What 'people'/'environment'/'equipment'... factors caused this?", than to answer the more general "What factors caused this?".

Guides for this method often suggest asking the question “what are the causes for this?” or “why?” to prompt ideas. This seems to play on the power of the human mind for generating narrative style explanations for observed phenomena. The categories already present provide a subject for the guiding question, and the question prompts you to seize at available possible causes from experience or imagination. The method seems to be making good use of the Availability Heuristic to reduce the burden of ideation. However, watch out: the first thing to come to mind is not necessarily the most important factor. To assume this is to succumb to a natural and powerful bias (the 'dark side' of the heuristic). Try asking yourself "Why else?" to generate more ideas on the theme until it becomes exhausted.

It’s clear that when you are ‘generating ideas’ (or perhaps, recalling things you already know) the visible categories help guide you to bits of your memory that contain useful/relevant concepts. It seems to me that this plays to the strengths of the associative nature of human memory: you might ask yourself “What are the possible PEOPLE causes of this problem?” and, by recalling the PEOPLE concept, find that other ‘proximate’ concepts come readily to mind. As with SWOT, the diagram and pre-selected categories provide a handrail to both prompt and sort ideas.

The ideas already present as well as the categories themselves (and the associated 'conceptual baggage') provide a useful prompt to the brainstormers. In this sense it is similar to SWOT, as mental short cuts can be used to find which are similar factors and add them to the diagram with less cognitive load than if they were generated from fresh.

It is important that the question asked is relevant to the task. Furthermore, to prevent the output being ambiguous and confusing, the question should be clear and consistent. Avoid mixing cause-effect questions such as 'why?' with other relationships. There may be a temptation to try to explicitly capture temporal relationships - perhaps by asking "What came before?". If this is relevant and interesting, consider a different, separate analysis.

When a participant comes up with an idea, and then wants to place it on the board, they will generally already have a category in mind - the category was the prompt. The guiding question invites a critical analysis of the category, and from that an idea is generated.

In some cases, ideas come out which are not driven by the guiding question. In these cases there is a need to switch between a ‘creative’ and a ‘critical’ mode of thought to assign the idea to an appropriate limb. If participants can’t find a suitable category to ‘hang’ the idea off, they may be dissuaded and assume that the idea is ‘wrong’ in some way. A good facilitator will invite people to put any ideas up, using the white space around the diagram, and worry about categorising/judging ideas later. The categorisation scheme can be used to sort ideas by looking for parent classes that the thing in mind is NOT like, just as much as finding it an appropriate parent. By elimination of candidates a home may be found.

Although this form of categorisation feels like concepts go through a process of gross simplification (and therefore data loss), this is only surface deep. By associating concepts with a parent class they inherit a great deal of information in return. The association with other siblings, and a proper place in the hierarchy (e.g. level relative to other items) tells you a great deal, implicitly, about the item in the tree.

The order in which one chooses to do things is again a matter of preference, but the two ‘modes’ (creative/critical) of thought seem to occur at some point whatever approach the participants take to the task. So, mechanistically, it is very similar to a SWOT: Ideas are generated (or recalled) and then sorted. Where this process differs significantly from SWOT is the structure into which the ideas are placed. Consider two parts to the structure of the ideas:
  • Visual Structure of the diagram and 
  • Logical Structure of the guiding questions
Logical Structure: The guiding question is clearly a powerful aid in ideation, but more than this it provides a prototype for the relationship between all the factors captured in the diagram. Asking a guiding question such as 'why?' means that a level of ‘meaning’ is woven into the output: information about the causal relationship between any ‘bone’ or branch and its subordinates is implicit in the relationship between the parent-child branches in the hierarchy. The link between any bone and its superior is the answer to the question: 'BRANCH...because...SUB-BRANCH'.

This is extra information, that the structure (layout) helps you visualise and capture. So, a doctrinally pure Cause and Effect diagram represents an understanding of sequential causality, and hence implicitly includes temporal information in the relationship between the concepts: the small bones must exist before their bigger parent bones. With each step ‘down’ from branch to sub-branch you are trading breadth of scope for specificity while moving back through time and sibling sub-branches share an implicit relationship via the parent branch’s broader meaning.
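
One way to see how much the structure carries is to treat the diagram as a simple tree and read the 'because' chains back out of it. The Python sketch below is only an illustration: the `Bone` class, the `because_chains` function and the example labels are all hypothetical, invented here for the purpose.

```python
from dataclasses import dataclass, field


@dataclass
class Bone:
    """One branch of the diagram; its children are its possible causes."""
    label: str
    causes: list["Bone"] = field(default_factory=list)

    def add(self, label):
        """Attach a sub-branch (a more specific, earlier cause) and return it."""
        child = Bone(label)
        self.causes.append(child)
        return child


def because_chains(bone, prefix=""):
    """Yield one 'X because Y because Z' story for every leaf under bone."""
    text = bone.label if not prefix else f"{prefix} because {bone.label}"
    if not bone.causes:
        yield text
    for child in bone.causes:
        yield from because_chains(child, text)


# Hypothetical example: the 'head' of the fish and two chains of causes.
head = Bone("the delivery was late")
van = head.add("the van broke down")
van.add("its service was skipped")
head.add("the driver was not briefed")

for story in because_chains(head):
    print(story)
```

Each printed line is one of the 'stories' the diagram tells, and the nesting depth of a bone stands in for how far back in the causal sequence it sits.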

Visual Structure: The SWOT diagram invites only one level of categories, which is fixed at four: all ideas must be made to fit into one of the ‘holes’. With this method we have potentially many more. For example, we might, at the start, have one of 8 Ms to choose, but as we drill down, adding more layers, we potentially have more and more sub-categories, which in turn may have their own sub-categories. And if the situation arises that ideas don’t ‘fit’ into any existing categories, then new ones can be appended. So, in this way the visual structure is extensible both horizontally (number of bones) and vertically (number of levels of sub-bones).

The fish bone shape, with its distinct hierarchy, slanting bones and ‘head’, makes use of semiotic 'baggage': there is a sense of ‘pointing’ forward, from which the human reader infers the temporal/causal relationship. You are left with a sense that a number of sub-things lead to the higher thing. The completed diagram represents a set of stories which tell how various factors fell into sequence to cause the main focus of the analysis.

Keep in mind that the fish bone shape aids this inference. For some problems this may not be appropriate. For example, if you are not looking for temporal or causal relationships between factors then you should question whether the fish shape is appropriate - does the shape complement the logic of the question that drives the relationship between elements and their parents and children?

When completed and agreed as a group it represents a shared and accepted vision of events. This is often a useful consensus to build for subsequent group action.

Commonalities: What is clear is that there are certain common mechanistic activities which are present in both SWOT and cause-effect diagramming:
  • Specifying the question: Selecting a usefully broad, but answerably narrow, scope.
  • Coming up with and recording ideas: Ideation, recall and capture.
  • Sorting ideas: Critical consideration and classification.
As we have discussed, it is important not to allow the structure and choice of categories to be too prescriptive. This is a theme which we shall examine further in the next post providing a critical examination of another common visual analytical method, Mind Mapping.

Monday, 26 October 2015

SWOT in focus

There are some highly-effective, simple, low tech, structured methodologies that are commonplace in intelligence and business analysis, such as SWOT, Cause-Effect diagrams (aka Fishbone diagrams), Mind Mapping and Concept Mapping.  However, sometimes these ‘analytical tools’ are wielded by users without giving much thought to exactly what kinds of problems they are designed to be used on. It can help analysts to think about these techniques in a more-fundamental way.  This helps them understand the possibilities and limitations of various common methods, and with sufficient command of these principles lets them customise and adapt them to be more appropriate for a particular question.

In a series of posts we're going to take a closer look at some commonly-used analytical tools. We don’t intend to provide a detailed description of how to use them as there are many excellent guides out there already. Instead, we'll de-construct them a little and consider why these techniques work, and what is going on when they are used. We'll suggest some ways they can be extended and adapted to suit particular analytical problems. We'll start with one of the most faithful tools in the toolbox of analysts from all sectors of business: the 'SWOT' (strengths, weaknesses, opportunities and threats) analysis.

What is SWOT?

SWOT is a simple but effective method of delineating and categorising the factors that will lead to the success or failure of an objective or set of objectives. We've found it particularly useful in the early stages of (for example) a technical delivery project but it can also be used when thinking about how third parties might behave. Although it's probably best done in a group (using ideas on Post-its around a whiteboard) it can be just as effective done privately, or asynchronously and remotely via electronic means.

What’s going on in a SWOT?

SWOT analysis involves three types of judgement or input:
  • Specifying the actor and objective
  • Generating and capturing 'factors' influencing the actor's achievement of their objective
  • Sorting into one of the four areas on the SWOT 
Let’s look at these in turn.

1. Specifying the Actor and Objectives: Failure to precisely define and agree on these will usually lead to woolly, unactionable output. When we define the actor (the person or organisation whose strengths, weaknesses etc. we are interested in, who in most cases will be ourselves) and the objectives (the things that the actor is trying to achieve), we are choosing a scope for the question. It isn't possible to do a SWOT without having both an actor and an objective, since a feature of an actor or their environment can only be a strength, weakness etc. with reference to something they're trying to achieve.

Generally the narrower the scope of the actor or objectives the ‘easier’ the analysis will be (that is to say, it is easier to exhaust possibilities or at least the imagination of the participants), but there is limited scope sensitivity and usually a SWOT analysis will take about the same amount of time (30 minutes to an hour) regardless of how strategic or tactical its scale.  But the narrower the scope, the easier it will be to map the output to particular, actionable decisions or policies.  For example, in decreasing breadth, and increasing actionability:
  • Who: OurCompany Inc. Objective: Increase profits
  • Who: OurCompany Inc.'s sales division. Objective: increase sales to financial sector clients
  • Who: OurCompany Inc.'s 'BuyMeNow' sales team. Objective: increase sales of the 'BuyMeNow' product to financial sector clients by 20% in the next financial year.
As the scope narrows, the outputs become more actionable: those of the last example are going to be the most relevant to the 'BuyMeNow' product developers and managers. By removing possible actions and factors from the possibility-space, narrowing the scope helps participants focus, and hence reduces cognitive burden. But it increases the risk of failure of imagination - of missing a potentially important factor that falls outside the SWOT's scope.

2. Generating and Capturing 'Factors': The most intensive component of a SWOT analysis is the generation of suggested 'factors' that actually or potentially help or hinder the actor's achievement of the identified objectives. Generating these factors in a group, and exposing them, has a number of benefits. First, there is the creation of a consensus and the identification of specific differences. Being explicit and open encourages focus on the relevance of particular factors with respect to the objective and, for example, challenges supposed 'strengths' that aren't clearly linked to any desirable outcomes.

Ideally, this element of the SWOT will yield lots of ideas pertinent to the problem, without introducing too much noise. It may also be the case that group workshops introduce a subconscious sense of competition that encourages people to generate more and more-creative ideas: this artificially-generated stress can be a powerful motivator. Finally, the simple act of recording ideas on paper (real or virtual) provides a useful audit trail for the subsequent work, and by doing it in an open group, accountability is shared.

3. Sorting ideas:  The SWOT process invites you to post a concept in one of four mutually exclusive areas. Categorisation is the placing of a concept within a framework of ‘higher’ concepts, which provide conceptually convenient groupings. Although in one sense categorical groupings are a gross simplification, they are also essential for understanding. The architecture of human cognition appears to involve the sorting of concepts into higher categories, and their division into lower ones, based on similar characteristics within each class.

Generating and sorting ideas is usually an iterative process, both cognitively and in collaboration with others. Ideas already on the board provide cognitive shortcuts by allowing comparison with other potential categorical siblings rather than on first principles: it is often easier to consider “does this idea I have feel more like the things in the S area than it does those things in the W area?” rather than to consider if the idea is a strength or a weakness in isolation.

In the end, you will have packaged up ideas in exclusive buckets which provide additional meaning to the ideas. For example, there is an obvious implication that you should use your strengths and exploit opportunities, while reducing the effects of your weaknesses and mitigating threats. So by simply classifying ideas, you have already decided the fate of them within a wider framework of action. If the SWOT analysis is about a third party, then you will have an idea of the kinds of things that party is likely to pursue, assuming they are broadly rationally self-interested.

Playing with the SWOT Analysis

Clearly, personal preference and problem-specifics will partly determine the best way to conduct a SWOT, but it can be easier to separate the generation and sorting of factors (steps 2 and 3) from one another. Switching between generating an idea and then immediately judging it and assigning it to a box involves transitioning between two different cognitive styles, which can be tiring and may take longer. It's sometimes easier either to take each quadrant in turn, and use it to further narrow the scope of thinking, or to generate ideas until they ‘dry up’ and judge each in turn.

It's not necessarily worth investing too much energy in thinking about whether a factor should be in one quadrant or another. If in doubt ‘split’ the idea into more narrowly specified sub-ideas and then they will be easier to categorise. Nevertheless, categorisation can be harder than it feels like it should be, and so it sometimes helps to exploit the 2x2 nature of the grid by thinking not about which quadrant a factor belongs in, but where on each axis it should be placed.

If instead of strengths, weaknesses, opportunities and threats being primary categories, the axes are considered to represent scales, one of control running from ‘in your control’ (strengths and weaknesses) to ‘outside your control’ (opportunities and threats), and one of value running from ‘good’ (i.e. they do or would tend to help achieve your objectives) to ‘bad’ (they tend to work against them), then a new type of diagram can be generated. This represents a more generic 2x2 matrix.
Instead of 'inside / outside control', 'actual / potential' or 'present / future' work well to help distinguish opportunities and threats from strengths and weaknesses. Another added advantage is that these axes can be considered continuous scales rather than discrete categories. This invites an optional bolt-on judgement where individual ideas can be compared to see how relatively strong they are on each axis (e.g. is X more helpful than Y?) and their position on the scale adjusted appropriately. This can, if used carefully, begin to indicate which ideas are the priority.
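
To make the idea of continuous axes concrete, here is a minimal sketch in Python. The scoring scale (-1 to +1 on each axis), the function name and the example factors are all our own assumptions for illustration; they are not part of any standard SWOT method.

```python
def quadrant(control: float, helpfulness: float) -> str:
    """Name the SWOT quadrant for a factor scored on two continuous axes.

    Both scores are hypothetical and run from -1 to +1:
    control > 0 means 'in your control', helpfulness > 0 means 'good'.
    """
    if control >= 0:
        return "strength" if helpfulness >= 0 else "weakness"
    return "opportunity" if helpfulness >= 0 else "threat"


# Made-up factors, scored as (control, helpfulness).
factors = {
    "experienced sales team": (0.8, 0.6),
    "ageing codebase": (0.7, -0.4),
    "new regulation favouring our product": (-0.9, 0.9),
    "rival product launch": (-0.6, -0.7),
}

# Rank by distance from neutral on the helpfulness axis, as a crude priority cue.
ranked = sorted(factors, key=lambda name: abs(factors[name][1]), reverse=True)
for name in ranked:
    control, helpfulness = factors[name]
    print(f"{quadrant(control, helpfulness):<12} {name}")
```

The quadrant falls out of the two scores, so the group only has to argue about where a factor sits on each axis, and the relative positions give a first, rough ordering of priorities.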

Through this substitution of the categories for continuous axes we can see that the SWOT may be considered a highly specific instance of a much (infinitely?) broader family of analytical tools: the 2x2 (matrix). Very briefly, you could judge ideas on any set of axes which are conceptually orthogonal - whether these provide enlightenment is up to the analyst. For example, here is a manifestation of the Dunning-Kruger effect for stakeholder analysis and communication strategies, based on the two axes 'understands' and 'thinks they understand'.
2x2 matrices are helpful in general because they force us to consider combinations of features that are sometimes overlooked (e.g. 'knowing' but 'unaware') and they encourage us to decompose high-level concepts into their lower-level determinants (e.g. 'strength' being a combination of 'in our control' and 'helpful to an objective').

Turning it into Action

The point of a SWOT analysis is not, of course, just situational awareness. Its primary function is to inform our action planning. If the analysis has been done properly, then action planning should flow very naturally from the output. One way explicitly to transition from a SWOT to a set of actions is to vote on the top (say) three factors in each quadrant, and then identify a specific action to take with each one: either to exploit (strengths and opportunities) or mitigate (weaknesses and threats).

Finally, if a SWOT is from the perspective of a third party, and particularly if their objectives are inimical to ours (e.g. if they are an enemy group or rival firm), we can plan our own actions using a trick called the 'Jolly Inversion', which broadly observes that their strengths are likely to map to our threats, their weaknesses to our opportunities, their opportunities to our weaknesses, and their threats to our strengths.
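
The inversion described above is mechanical enough to sketch in a few lines of Python. The dictionary layout and the function name are assumptions made for this illustration, and the example factors are invented.

```python
# Their quadrant -> our quadrant, following the mapping described above.
JOLLY_INVERSION = {
    "strength": "threat",
    "weakness": "opportunity",
    "opportunity": "weakness",
    "threat": "strength",
}


def invert_swot(their_swot: dict) -> dict:
    """Re-file a third party's SWOT factors into our own quadrants."""
    ours = {quadrant: [] for quadrant in JOLLY_INVERSION}
    for quadrant, factors in their_swot.items():
        ours[JOLLY_INVERSION[quadrant]].extend(factors)
    return ours


# Invented example: a rival firm's SWOT.
rival = {
    "strength": ["large marketing budget"],
    "weakness": ["slow release cycle"],
    "opportunity": ["our patchy customer support"],
    "threat": ["our patent portfolio"],
}
print(invert_swot(rival))
```

The output is only a starting point: the mapping is 'likely', not guaranteed, so each re-filed factor still needs a sanity check against our own objectives.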


A SWOT analysis is a straightforward but powerful precursor to action planning, if done with a modicum of care and precision. It's part of a wider family of analytical methods that help minimise the cognitive load associated with generating and classifying ideas into higher-level families that invite similar responses. It forms a useful bridge between your objective and the actions that can help achieve it, and provides an audit trail for future reference.

Monday, 19 October 2015

Confidence and Probability: Summary

(This post concludes a series that begins here.)

Why is defining confidence so hard?

We all understand the idea of analytical confidence. Analyst and non-analyst alike, we have a clear sense that some probabilistic judgements are dodgy - rough guesses, stabs-in-the-dark, first impressions, better than nothing but not by much - while others are gold-standard - well-evidenced, based on robust modelling, the judgements of experts, the results of controlled experiments with solid protocols. We also feel like this distinction is important - legally, perhaps, with regard to situations such as the L'Aquila earthquake - but also morally, connected to the concept of 'epistemic responsibility', the notion that we ought to exercise diligence in ensuring our beliefs are justified and that it is wrong to make decisions based on dubious evidential foundations. 

Given the force of our intuition regarding confidence, why does it seem so hard for us to say what 'confidence' actually means? And if it's important for decision-making, how then can we communicate our sense of confidence in anything other than vague, circular, and possibly incoherent terms?

Theories of confidence

We identified seven theories, each of which provides an alternative interpretation of the concept of confidence. For each theory, we looked at three factors: how coherent it was, how much it accorded with analysts' usage, and whether it was relevant to a decision (and therefore whether it was worth communicating without redundancy). The conclusions are summarised in the table below.

One salient feature is that none of the theories gets top marks on all three criteria. The theories that are both coherent and decision-relevant - those that relate to the expected value and expected cost of new information - are not ones that analysts themselves subscribe to. The 'quality' theory, which suggests that confidence relates to the general soundness of the available information, looks the best overall but it is only tangentially decision-relevant. We are faced with two problems that may have different answers: what does 'confidence' mean, and what should 'confidence' mean? In order to answer them we need to take a short philosophical detour. First, we'll look at why we may not be able to trust intuition and usage as a guide to finding a coherent definition of 'confidence'.

Naive realism

Naive realism is the approach to understanding the world that we are born with. We open our eyes, and the world is just there - we experience it directly, and what we experience is how it actually is. It is only through philosophical and scientific investigation that we find the demarcation between 'us' and 'the world' is fuzzier than it first appears. Our ability to perceive things at all depends on complex cognitive software, and some of the elements of our experience, such as the differences between colours, represent features that despite having significance in our evolutionary environment, do not form a qualitative distinction from a physics standpoint. Red things and green things are different, but the redness and the greenness, if it can be said to be anywhere, is in us. Our natural inclination, though, is to perceive the redness and greenness as 'out there', as a property of the world that we are merely passively perceiving. But the idea that 'redness' and 'greenness' are properties of things is not as coherent a theory as the idea that 'redness' and 'greenness' are instead properties of the way that we and things interact.

Our 'default setting' of naive realism may also apply to our concepts of probability and confidence. The view that probability is 'out there' is an intuitive one, as we saw when looking at the 'uncertain probability' theory of confidence, but it doesn't survive careful examination. It may be the same with 'confidence' itself. We might intuitively believe that confidence statements are statements about the system we're looking at, but this doesn't mean that a coherent theory can be made along these lines. Confidence might instead be better explained as a feature of us - of the payoffs from our decisions, or our personal information-set. Intuition cannot be relied on as a guide to what drives our intuitive sense of confidence.

Could our sense of 'confidence' be innate?

Questions of confidence generate strong and consistent intuitive responses, but as we've seen, the drivers of those responses are opaque to us. This is a clue that the notion of 'confidence' might be something rather fundamental for which we have evolved specialised computational software. By analogy, we have evolved to see red things as distinct from green things. As eye users, we don't need to know that the distinction in fact corresponds to differences in photic wavelengths. It is experienced as fundamental. This is because the distinction is so useful - for, say, finding fruit or spotting dangerous animals - that evolution has shaped our cognitive hardware around the distinction. The problem-space - the evolutionary environment and the decisions we need to make within it - has shaped our perception of the world, so animals with different problems see the world differently.

But how can ascription of analytical confidence possibly correspond to any problem faced by ancestral creatures? Isn't the application of a confidence judgement to a probabilistic statement an extremely-artificial, rarefied kind of problem, faced by desk-bound analysts but surely not by tree-dwelling primates? 

Well, perhaps not. There's no reason to believe that ancestral primates played cricket, but the skills involved - motor control, perception, throwing projectiles - all have plausible mappings to the evolutionary environment. If there is a common theme to the foregoing discussions about confidence, it is one that relates to the justification for a decision: the extent to which we should act on the information, rather than continuing to refine our judgements. This certainly is a fundamental problem, one faced not just by humans but by all animals and indeed systems in general that need to interact with their environment.

The universal analytical problem

James Schlesinger
"Seldom if ever does anyone ask if a further reduction
in uncertainty, however small, is worth the cost..."

Other than in the most simplistic environments, animals do not face the kind of static decision-problems, with a fixed information set, that decision-theory primers tend to focus on. For most organisms, information is a flow that both affects and is affected by their behaviour. There is usually a tension between gathering information and acting on it. Collecting information is normally risky, costly and time-consuming. But the benefit of more information is reduced decision-risk. This means there is a practical engineering trade-off to be made, and one that is expressible in a simple question: "when do I stop collecting information, and act on it?"

Even in very simplistic models, 'solving' this trade-off - finding the optimal level of information - is often mathematically intractable (because probability has a broadly linear impact on payoffs but is affected logarithmically by information). Our response to dynamic information problems is therefore likely to be heuristic in nature, and could well depend on ancient cognitive architecture. 

This means that if confidence is (as seems likely) something to do with the relative justifications for 'act now' versus 'wait and see', then it is entirely plausible that in making confidence judgements we are using the same cognitive machinery that we might use to, for example, decide whether to continue eating an unfamiliar fruit, to keep fishing in a particular stream, to make camp here or press on to the next valley, to go deeper into the cave, to hide in the grass or pounce. However it is experienced, optimal decision-making must involve consideration of the relative value, and relative cost, of further information-gathering, and of the risks associated with accepting our current level of uncertainty and acting now. A theoretical description of optimal decision-making in a dynamic information environment will utilise exactly these concepts, and so it shouldn't be surprising if we have cognitive software that appears to take them into account. Whether or not this is where our notion of 'confidence' comes from can only be a guess, but it's a satisfying theory with a robust theoretical foundation.

Should our definition of confidence capture intuition, or replace it?

Broadly speaking, there is a significant gulf between our intuitive concepts of probability and confidence, and what we might consider a coherent theoretical treatment of them. Crudely, we tend to think of probability as inherent in systems, whereas it is more-coherently thought of as a property of evidence. But we tend to see 'confidence' as a feature of the evidence - of its quantity or quality, say - whereas it might more-coherently be thought of as a function of the evidence we don't (yet) have, and plausibly of the importance of the decision that it informs. What, then, should we do in designing a metric for capturing confidence, and communicating it to customers? 

This is not by any means a unique problem. Many precise concepts start out pre-scientifically as messy and intuitive. In time, they are replaced by neater, more-coherent concepts that are then capable of supplanting and contradicting our naive beliefs. Through such a process, we now say that graphite and diamond are the same thing, that whales are mammals, that radiant heat and light are degrees on a scale, that Pluto is not a planet and so on. 

Photo: Steve Johnson

A similar thing has happened to the concept of probability. Seventeenth-century attempts to quantify uncertainty wrestled with the mathematical laws governing the outcomes of games of chance, but over several centuries these have been refined into a small but powerful set of axioms that have little superficial relevance to games involving dice and playing cards. The interpretation of probability has undergone a parallel journey, from the 'intuitive' concept that uncertainty resides in (for example) dice and coins, to the more-coherent notion that it is a feature of information. The intuition is a stepping-stone on the path to scientific precision.

How shall we define 'confidence'?

Over the course of the last few posts, we've reviewed a number of proposed systems for defining confidence. Mostly these are theoretically-weak, and often simply provide an audit trail for a probability rather than delineating a separate, orthogonal concept of the kind that we strongly believe 'confidence' must consist of. But they have intuitive appeal, probably because they were designed and drafted by working analysts rather than by information theorists. We have a choice to make, roughly between the following definitions:

Confidence-I: 'Confidence' measures the quality and quantity of the information available when the probabilistic judgement was formed. High-confidence probabilistic judgements will be founded on large bodies of evidence, repeatable experiments, direct experience, or extensive expertise. Low-confidence probabilistic judgements will be founded on only cursory evidence, anecdotes, indirect reports, or limited expertise.
Confidence-II: 'Confidence' captures the expected value of further information. High-confidence probabilistic judgements are associated with a high cost of information-collection, low-risk decisions, or low levels of uncertainty. Low-confidence probabilistic judgements are associated with a low cost of information-collection, high-risk decisions or high levels of uncertainty.

'Confidence-I' is easy for analysts to understand and to apply. But it adds little to what is already captured in the probability, and it doesn't directly tell the reader what they should do with the information. 'Confidence-II' will make little sense to most analysts, and will require training, and the development of robust heuristics, to understand and apply. But it says something meaningfully distinct from the probability of the judgement to which it applies, and carries the immanent implication that low-confidence judgements ought to be refined further while high-confidence ones can be acted on.

We hope that this series of posts has demonstrated a number of things: that 'confidence' is an important concept distinct from probability, that intuition is not a reliable guide as to its meaning, and that it can be made meaningful but only counter-intuitively. Is an attempt to move towards a meaningful Confidence-II-type definition worth the cost? Will it add more than the effort of getting there? To an analytical organisation with enough vision, capability, and commitment to rigour, perhaps it would at least be worth experimenting with. 

Thursday, 10 September 2015

Confidence and Probability Part 10: The 'Expected Cost' Theory

The 'ignorance' theory of analytical confidence proposes that it is negatively related to the amount of information which you think exists, but which you haven't seen. The idea is that if you've assimilated a high proportion of the available information, you'll have more confidence in your probabilistic judgements than if you've only assimilated a small proportion of it. We've seen that this isn't a coherent idea, since measures of information (the converse of ignorance) are either encapsulated by probability (the information-theoretical definition) or are vulnerable to arbitrary manipulation using redundant material (as with the 'in-tray' model).

The 'expected value' theory that we looked at last time is an attempted refinement of the 'ignorance' theory that relates confidence not to how much information is as-yet untapped (this is implicitly captured by a statement's probability), but to how valuable that information is likely to be, which depends on current uncertainty and the benefits and risks associated with the decision that needs to be made.

The 'expected cost' theory, which is the last one we'll be looking at in this series, adds an extra dimension to this theory by incorporating the expected cost of information. The idea is that if new information is relatively costly, we will have more confidence than if new information is relatively cheap. This theory captures the same intuition driving the 'ignorance' theory - that confidence is about analytical 'due diligence' - but in a more-sophisticated way. How does it stand up to closer examination?

Is it coherent?

It is meaningful to talk about the expected cost of new information, provided we are precise enough in our definition of 'information'. The 'units' that information theory deals in are called 'bits' (a number of exotic-sounding alternatives have been put forward, including the Hartley, the crumb, and the doublet but they all do fundamentally the same thing). A bit of information is the amount that would, if received, effectively double (or halve) the odds ratio of a hypothesis. We can therefore talk about the 'expected cost per bit' of future information and mean something both real and distinct from probability.
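
As a small worked illustration of that definition, the following Python sketch converts a probability to odds, applies a number of bits, and converts back. The function name is ours; positive bits are taken to favour the hypothesis and negative bits to count against it.

```python
def update_probability(prior: float, bits: float) -> float:
    """Return the probability after receiving `bits` of evidence.

    Each bit doubles the odds of the hypothesis; each negative bit halves them.
    """
    odds = prior / (1.0 - prior)            # e.g. 0.5 -> odds of 1 (evens)
    posterior_odds = odds * 2.0 ** bits     # one bit = one doubling
    return posterior_odds / (1.0 + posterior_odds)


print(update_probability(0.5, 1))    # evens, one bit in favour: 2/3
print(update_probability(0.5, -1))   # evens, one bit against: 1/3
print(update_probability(0.2, 2))    # odds of 1:4, two bits in favour: 1/2
```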

Wikipedia: the low-hanging fruit of the information orchard

'Cost' does not always (or indeed usually) have to mean financial cost. Collecting information has an indirect time cost, and may have 'psychic' costs - the technical term for the unpleasant mental effort involved in consciously assimilating and processing information. The cost of information is not, in practice, usually constant: each unit is likely to cost more than its predecessor, assuming information is being collected in a roughly sensible way so that the low-hanging informational fruit (Wikipedia, the paper, what your colleague thinks) is picked before the harder-to-reach stuff (academic journals, paywalled data, secret intelligence).

Additional bits of information push the probability of a hypothesis closer to 0 or 1, but at a diminishing rate. The two lines are reflections of one another, because the probability of something being true and the probability of it being false must add up to one.
The rising marginal cost of information (per bit) is mirrored by the diminishing marginal value of information (per bit). As we saw when we looked at the 'expected value' theory, information exhibits diminishing returns in terms of the expected value of a decision. Adding additional bits of information changes the (decimal) probability of a hypothesis by ever-decreasing amounts as it approaches either 0 or 1. Since expected rewards (or costs) of a decision are (more or less) linearly-related to the decimal probability, each subsequent bit of information is worth less than the previous bit.
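
The diminishing effect is easy to check numerically. This sketch (function and variable names are our own) starts from even odds and adds one supporting bit at a time, printing the probability of the hypothesis, the probability of its negation, and the change produced by the latest bit.

```python
def probability_after(bits: int, prior: float = 0.5) -> float:
    """Probability of a hypothesis after `bits` supporting bits of evidence."""
    odds = prior / (1.0 - prior) * 2.0 ** bits
    return odds / (1.0 + odds)


probabilities = [probability_after(b) for b in range(6)]
# Change in probability produced by each successive bit.
gains = [after - before for before, after in zip(probabilities, probabilities[1:])]

for bit, (p, gain) in enumerate(zip(probabilities[1:], gains), start=1):
    print(f"bit {bit}: P(true) = {p:.3f}  P(false) = {1 - p:.3f}  change = {gain:.3f}")
```

From a 50% starting point, each successive bit moves the probability by less than the one before, which is the pattern the decision-value argument above relies on.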

This means that if we bolt the 'expected value' and 'expected cost' theories together, we get a theory of confidence that turns it neatly into an optimal information-searching metric. Optimal information searching means continuing to collect information until the benefits of continuing to do so are outweighed by the cost; at that point, you should make a decision. If 'confidence' is higher when the expected value of new information is low, or its cost is high, we will tend to see confidence rising as new information comes in, because of diminishing returns and rising costs, until we are at a point where further investigation would be counter-productive.
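
A toy version of this stopping rule can be sketched in Python. Everything numerical here is an assumption made for illustration: the value of a bit is crudely taken to be the stake multiplied by the change in probability it produces, every bit is assumed to support the leading hypothesis, and the cost per bit is assumed to grow by a fixed factor each time. It is a sketch of the trade-off, not a model of any real collection problem.

```python
def bits_worth_collecting(stake: float, first_bit_cost: float,
                          cost_growth: float = 1.5, prior: float = 0.5,
                          max_bits: int = 50) -> int:
    """Count the bits collected before the next bit costs more than it is worth."""
    probability = prior
    cost = first_bit_cost
    collected = 0
    while collected < max_bits:
        odds = probability / (1.0 - probability)
        next_probability = 2.0 * odds / (1.0 + 2.0 * odds)   # one more doubling
        marginal_value = stake * (next_probability - probability)
        if marginal_value <= cost:
            break               # further investigation would be counter-productive
        probability = next_probability
        cost *= cost_growth     # assumed: each bit costs more than the last
        collected += 1
    return collected


print(bits_worth_collecting(stake=100, first_bit_cost=1))    # cheap information
print(bits_worth_collecting(stake=100, first_bit_cost=10))   # expensive information
```

In this toy model, dearer information means stopping sooner, and higher stakes mean searching for longer, which matches the direction of the 'expected cost' and 'expected value' theories.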

The 'expected cost' theory also has an appealing link to a metaphysical distinction, often made by intelligence analysts, between 'secrets' and 'mysteries'. 'Secrets' are things that someone knows (but not you), while 'mysteries' are things that no-one knows, and in broad terms, intelligence analysts feel that their job is to unearth or make inferences about secrets, but not necessarily to worry about mysteries. This distinction, of course, has nothing other than a very superficial foundation: what makes information contained in a human brain any more special than that contained in any other system? But when situated in a more general theory about the cost of obtaining information, it begins to make some sense; assuming it's relatively-easy to get information out of a human (compared to, say, getting it out of seismic activity or the economy), the distinction between secrets and mysteries maps neatly onto the distinction between cheap information and expensive information.

Does it accord with usage?

In general, analysts seem not to think that their appraisal of confidence is connected to the expected cost of further information.  One of the questions in our survey was designed to test the connection between consequence, cost and confidence; it concerned the possibility of a bridge failure in either an isolated or a busy location, and one of the variables was the cost of additional information (a quick site visit compared to an expensive structural survey). The consequences mattered: confidence was lower when the bridge was on a busy commuter route. But the cost made no difference to people's confidence assessments. Just as tellingly, no-one proposed a cost-driven measure of confidence when given the opportunity to define it in their own words.

Is it decision-relevant?

The expected cost of further information is decision-relevant. As laid out above, it is one half of the equation defining optimal information searching (the other being the expected value of that information).


The 'expected cost' theory of confidence is both decision-relevant and conceptually coherent. Its only drawback is that nobody believes that it's actually what they mean by 'confidence'.

In the next - and final - post in this series, we'll summarise the arguments and discuss what they mean for the prospects of a standard, cogent, workable metric that could be used by analysts to communicate confidence to customers alongside their probabilistic judgements. 

Wednesday, 12 August 2015

The Impact of Information on Probability

In this post we alluded to the fact that information tends to push probabilities towards 0 or 1. If you're not familiar with information theory, this might not seem obvious at all. After all, when we say something is 10% likely, on one occasion out of ten it will turn out to be true - further information will push its probability away from 0, towards 50%, and then up towards 100%. New information might then push it back again. Information seems to be able to push probabilities around in any direction - so why do we say that its effect is predictably to push them towards the extremes? This post zooms in on this idea, since it's a very important one in analysis.

To start with, it's worth taking time to consider what it means to assign a probability to a hypothesis. To say that a statement - such as "it will rain in London tomorrow", "Russia will deploy more troops to Georgia in the next week" or "China's GDP will grow by more than 8% this year" - has (say) a 10% probability implies a number of things. If we consider the class of statements to which 10% probabilities are assigned, what we know is that one in ten of them are true statements, and nine in ten are false statements. We don't know which is which though; indeed, if we had any information that some were more likely to be true than some others, they couldn't all have the same probability (10%) of being true. This is another way of saying that the probability of a statement encapsulates, or summarises, all the information supporting or undermining it.

Now let's imagine taking those statements - the ones assessed to be 10% probable - and winding time forward to see what happens to their probabilities. As more information flows in, their probabilities will be buffeted around. Most of the time, if the statement is true, the information that comes in will confirm the statement and the probability will rise. Most of the time, if the statement is false, the information that comes in will tend to disconfirm it and the probability will fall. This is not an empirical observation - it's not 'what we tend to see happening' - but instead it follows from the fundamental concepts of inference and probability. It means that things that are already likely to be true are more likely to be confirmed by new information, and things that are already likely to be false are more likely to be disproved with more information.

This means that most of the '10%' statements (the nine-out-of-ten false ones, in fact) will on average be disproved by new information, and the others (the one-in-ten true ones) will on average be confirmed with new information. By definition, this isn't a predictable process. It's always possible to get unlucky with a true statement, and receive lots of information suggesting it's false. It's just less likely that that'll happen with a true statement than with a false one. And as you get more information, the probability that it's all misleading becomes vanishingly small.

But we need to be careful here. When we say that most statements assigned a 10% probability will be disconfirmed with new information, we're not saying that, on average, the probability of '10% probable' statements will fall. Far from it: in fact, the average probability of all currently '10% probable' statements, from now until the end of time, will be 10%. Even if we acquire perfect information that absolutely confirms the true ones and disproves the false ones, we'd have one '100% statement' for every nine '0%' statements - an average probability of 10%. But as time (and, more pertinently, information) goes on, this will be an average of increasingly-extreme probabilities that approach 0% or 100%.

Perhaps surprisingly, we can be very explicit about how likely particular future probability time-paths are for uncertain statements. If we assume that information comes as a flow, rather than in lumps, the probability that a statement's probability will rise from p0 to p1, at some point, is rather-neatly given by p0/p1. For example, the probability that a statement that's 10% likely will (at some point) have a probability of 50% is (10% / 50%) = 20%. Why? Well, we know that only one in ten of the statements are true. We also know that for every two statements that 'get promoted' to 50%, exactly one will turn out to be true. So two out of every ten '10%' statements must at some point get to 50% probable - an actually-true statement, and a false fellow-traveller - before one of them (the true one) continues ascending to 100% probable and the other (the false one) gets disconfirmed again. (The equivalent formula for the probability that a statement will fall from p0 to p1 is (1-p0) / (1-p1).)
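For the sceptical, the p0/p1 rule can be checked with a rough simulation. This is only a sketch under assumptions of our own: evidence arrives as a stream of weak, independent observations (a 'coin' that lands heads 55% of the time if the statement is true and 45% if it is false), and we stop following a statement once its probability drops below 0.1%. Because the evidence comes in small lumps rather than as a true flow, the simulated figure should land close to, but not exactly on, the formula.

```python
import random


def reaches_target(p0, p1, rng, p_heads_true=0.55, p_heads_false=0.45, floor=0.001):
    """Simulate one statement; return True if its probability ever reaches p1."""
    is_true = rng.random() < p0  # the statement is true with probability p0
    p = p0
    while floor < p < p1:
        # Draw one weak observation, generated by the actual state of the world.
        heads_chance = p_heads_true if is_true else p_heads_false
        heads = rng.random() < heads_chance
        # Update the probability with Bayes' theorem.
        like_true = p_heads_true if heads else 1.0 - p_heads_true
        like_false = p_heads_false if heads else 1.0 - p_heads_false
        p = p * like_true / (p * like_true + (1.0 - p) * like_false)
    return p >= p1


def fraction_reaching(p0, p1, runs=5000, seed=1):
    """Fraction of simulated statements whose probability rises from p0 to p1."""
    rng = random.Random(seed)
    return sum(reaches_target(p0, p1, rng) for _ in range(runs)) / runs


estimate = fraction_reaching(0.10, 0.50)
print(f"simulated: {estimate:.3f}   formula p0/p1: {0.10 / 0.50:.3f}")
```

The simulation never tells the updating rule whether the statement is true; it only feeds it evidence. The one-in-five figure emerges anyway.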

The time paths of probabilistic judgements are surprisingly predictable
We can say a surprising amount about the future paths of probabilistic judgements. Unfortunately none of it is 'useful' to decision-makers because all of these possibilities are encapsulated within the probability itself.

This may not seem intuitive at all. In fact it might seem barely believable. But it's embedded in the concepts of probability and evidence. Information will, in the long run, push probabilities towards the extremes of 0% and 100%. Unfortunately, we don't know which statements new information is going to push one way or the other - if we did, we'd have the information already and it would be incorporated into our probabilities. Assuming, of course, we were doing things properly.

Monday, 10 August 2015

Confidence and Probability Part 9: The 'Expected Value' Theory

So far, in the quest to understand what statements of analytical confidence actually mean, we've appraised five separate theories on the basis of coherence, alignment to actual usage, and decision-relevance. The last two theories relate to the economics of information - its value, and its cost.

It is not widely understood, even by people in the information business, that the value of information can be quite precisely defined in terms of its impact on the outcomes of decisions, and in particular on risk. The 'expected value' theory of analytical confidence is that it captures this value and describes how useful more information is likely to be. It says that if new information is likely to be of low value, you can have higher confidence than if new information is likely to have high value. In this respect, the 'expected value' theory is a refinement of the 'ignorance' theory, but instead of relating confidence to the amount of unknown information (which is probably an incoherent idea), it relates it to its value.

Is it Coherent?

Information has value, ultimately, because it makes us more likely to choose a course-of-action matched optimally to our circumstances. Exactly why information adds value in this way follows from its intimate link to probability, discussed here. More information has the effect of - or rather, is the same as - pushing the probability of an unknown closer to 0 or to 1. So more information leads to greater certainty about outcomes. Where these outcomes are important to a decision - e.g. the prospect of rain, to a decision to take your umbrella with you to work - this has the effect of lowering risk.

More information means less chance of a decision-error

'Risk' has been defined in a large number of ways, varying from the woolly and circular to the relatively-robust. One simple definition, which works well for most purposes, is that a 'risk' is a possible outcome that, if it were to occur, would mean you'd wish you'd done things differently. If you take out insurance, the risk is that you don't actually need to make a claim. If you don't take out insurance, the risk is that you do. Risks don't necessarily represent decision errors. Risk, understood in this way, is inherent in decision-making under uncertainty: given that the uncertainty (whatever it is) is relevant to your decision, there will always be the possibility of being unlucky and retrospectively wishing you'd made a different choice.

This is where information comes in. Information pushes probabilities closer to 0 or 1. On average, this will tend to reduce your exposure to risk, even if it doesn't actually change a decision. For instance, let's say you are considering getting travel insurance for your camera, which is worth £500. This would cost £50. But you think there's only a 5% chance it'll be stolen or lost, so you decide not to get the insurance. Now suppose you receive information about your destination which suggests that crime is almost non-existent there, which means you revise your estimated probability of loss down to just 2%. This doesn't change your decision - you're still not going to take out insurance - but your exposure to risk has fallen from an average value of -£25 (5% of £500) to -£10 (2% of £500). In other words, you're better off, on average, as a result of the new information, even though it didn't change your behaviour.
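The arithmetic in the camera example can be written out as a short sketch. The figures are the ones used above; the function and variable names are ours.

```python
def expected_loss(probability: float, value: float) -> float:
    """Average loss from an uninsured item: chance of loss times its value."""
    return probability * value


camera_value = 500.0  # pounds
premium = 50.0        # pounds

loss_before = expected_loss(0.05, camera_value)  # before the crime information
loss_after = expected_loss(0.02, camera_value)   # after the crime information


def decide(expected: float) -> str:
    # Insure only if the premium is cheaper than the expected uninsured loss.
    return "insure" if premium < expected else "don't insure"


print(f"before: expected loss {loss_before:.2f}, decision: {decide(loss_before)}")
print(f"after:  expected loss {loss_after:.2f}, decision: {decide(loss_after)}")
print(f"improvement from the information: {loss_before - loss_after:.2f}")
```

The decision is the same in both cases, but the expected loss you are carrying falls by £15.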

We're not used to thinking of information as risk mitigation, but it has exactly the same effect. This is wherefrom information derives its economic value, and it's the basis for the 'expected value' theory of analytical confidence. The key idea is that if we expect new information to be of relatively high additional value, confidence will be low - because we ought not to make a decision yet, but to collect more information. But if new information is likely to be only of low value, confidence will be high, since further analysis is not likely to add any additional value to the decision-maker.

What determines the expected value of further information? The maths is a bit convoluted, but there are two key factors: current uncertainty, and the magnitude of the risks. The closer you get to a probability of 0 or 1 with your key uncertainties, the less valuable, on average, further information is likely to be. And, perhaps intuitively, the more expensive the risks and benefits of your decision are, the more valuable (all else being equal) further information is likely to be.
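As a rough illustration of those two factors, here is a sketch that reuses the camera-insurance decision and asks what it would be worth to find out for certain, in advance, whether the camera will be lost. This 'perfect information' case is a simplifying assumption of ours (real information is rarely perfect), and the function names are illustrative.

```python
def cost_without_information(p: float, loss: float, premium: float) -> float:
    """Expected cost of the better option when acting on probability p alone."""
    return min(premium, p * loss)


def value_of_perfect_information(p: float, loss: float, premium: float) -> float:
    """Expected saving from learning the outcome before deciding."""
    # With perfect foresight you insure only when the loss is going to happen,
    # so you pay the premium with probability p and nothing otherwise.
    cost_with_information = p * premium
    return cost_without_information(p, loss, premium) - cost_with_information


for p in (0.5, 0.1, 0.05, 0.02, 0.01):
    voi = value_of_perfect_information(p, loss=500.0, premium=50.0)
    print(f"p = {p:.2f}: further information worth up to {voi:.2f}")

# Same uncertainty, stakes ten times larger.
big_stakes = value_of_perfect_information(0.05, loss=5000.0, premium=500.0)
print(f"p = 0.05 with 10x stakes: {big_stakes:.2f}")
```

In this toy model the value of information shrinks as the probability heads towards 0, and scales up with the stakes. It is highest near the probability at which the decision would flip (here, premium divided by loss), which is where current uncertainty matters most to the choice.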

In summary, the notion that confidence relates to the expected value of further information has a sound theoretical basis.

Does it accord with usage?

Looking at our survey results, confidence was affected by relative magnitudes of risks within questions, but there was no consistency about the levels of confidence expressed between questions. For example, assessments about the authenticity of a £10 watch attracted greater confidence levels than similarly-based assessments where the watch was worth £800. But the absolute levels of confidence expressed were not dramatically different from those for questions involving the potential collapse of a bridge, or the cessation of a rural bus route. Likewise, as we covered here, there was no significant observed relationship between expressed confidence and current levels of uncertainty.

Turning to the qualitative definitions of 'confidence' supplied by survey respondents, there was limited support for the idea that confidence is related to expected value of future information, apart from one or two references to the 'ability to make a decision' and to 'knowing when not to make a decision'. In general, analysts do not think that confidence is related to the characteristics of the decision their analysis is supporting, nor to current levels of uncertainty, and seek to root it instead in characteristics of the evidence base or assessment methods, even where (as we've seen) these ideas might be hard to define coherently.

Is it decision-relevant?

Expected value of information certainly is decision-relevant. It is an important factor determining whether it's optimal to make a decision immediately, or to defer and collect more information first. However, it's only one half of the information you need to make this judgement definitively. The other half is the expected cost of further information. This is the final theory of confidence, which we'll be looking at in the next post.


The 'expected value' theory of confidence is both decision-relevant and coherent. However, analysts' assessments of confidence are not heavily influenced by it and it is not widely represented in the kinds of things analysts put forward for their proposed definitions.

In the next post we'll cover the final theory of confidence - that it captures the expected cost of further information.

Thursday, 16 July 2015

Confidence and Probability Part 8: The 'Prior Weight' Theory

Does analytical confidence measure the extent to which an assessment is based on 'background information' rather than case-specific data?

This is a tempting idea. There seems to be a qualitative difference between the following assessments:

"The probability of rain in Las Vegas tomorrow is 7%, because it rains on average there 25 days a year."


"The probability of rain in London tomorrow is 7%, because satellite imagery suggests that the weather front moving across the Atlantic is heading for Scotland and the north of England."

The first of these is based on a simple frequency, which by itself is a perfectly good approach to estimating a probability in the absence of any other information, but it doesn't take into account any specific information, including the time of year. The second statement seems to be based on some situation-specific information. We might therefore have more confidence making the second kind of statement than the first.

In a previous post, we considered the possibility that this kind of confidence measures having more information. This theory is problematic in a number of ways, most particularly in that there are no powerful alternatives to probability as a measure of information, implying that in both cases above (where the probability is 7%), the information content would be the same.

The 'prior weight' theory is a subtle alternative to the 'information' theory of confidence. According to the 'prior weight' theory, confidence is associated not with quantity of information, but with the extent to which the information is subject-specific, rather than about general, background frequencies. The 'prior weight' theory therefore rests on a meaningful distinction between these two types of information.

Is it coherent?

There are a few possible ways of attempting to demarcate 'background' from 'specific' information, but we'll consider two here: prior v posterior information, and frequency v belief. These are fairly technical distinctions, and if you're not familiar with the slightly-more arcane elements of probability theory, you might want to skip over this part of the discussion.

Prior / Posterior

The prior / posterior distinction comes from the theory of inference. The prior probability of a hypothesis summarises all the information you've already accounted for up to a certain point. When you get new information, it's added to the pool and your probability is transformed according to the relative likelihood of your having received that information under the hypothesis compared to its alternatives - according to Bayes' Theorem, in other words. The diagram we used in a previous post illustrates this process:

The trouble with this theory is that the distinction between 'prior' beliefs and new evidence is a purely practical one used to describe the impact of new information. One would end up at the same beliefs considering the evidence in a different order, or all at once in a single bite. The prior / posterior distinction does not correspond to a fundamental distinction between types of evidence.
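The point about ordering can be checked with a small sketch of Bayes' Theorem. The likelihoods below are hypothetical numbers chosen for illustration, and the sketch assumes the two pieces of evidence are independent of one another given the truth or falsity of the hypothesis.

```python
def update(prior: float, likelihood_if_true: float, likelihood_if_false: float) -> float:
    """Posterior probability of a hypothesis after one piece of evidence."""
    numerator = prior * likelihood_if_true
    return numerator / (numerator + (1.0 - prior) * likelihood_if_false)


prior = 0.07

# Hypothetical evidence: (chance of seeing it if true, chance of seeing it if false).
evidence_a = (0.8, 0.4)
evidence_b = (0.9, 0.6)

a_then_b = update(update(prior, *evidence_a), *evidence_b)
b_then_a = update(update(prior, *evidence_b), *evidence_a)

# Both pieces considered in a single bite: multiply the likelihoods together.
all_at_once = update(prior, evidence_a[0] * evidence_b[0], evidence_a[1] * evidence_b[1])

print(f"A then B:    {a_then_b:.4f}")
print(f"B then A:    {b_then_a:.4f}")
print(f"all at once: {all_at_once:.4f}")
```

All three routes arrive at the same posterior, so which item counts as 'prior' and which as 'new' depends only on the order in which we happened to process them.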

Frequency / Belief

The frequency / belief distinction tries to draw a division between evidence derived from statistical frequencies, and that derived from probabilistic judgements. This is a distinction which echoes the debate between Bayesians and frequentists, and rather tries to accommodate both schools in the same faculty. The idea is that 'background' information is derived from statistical frequencies, while 'specific' information is derived instead from Bayesian-style 'beliefs' that may not be statistical in nature. In the Las Vegas example, we were effectively picking a single day (tomorrow) from a sort of bag of days of which 7 out of 100 were rainy ones. In the London example, we were using a set of information that only applied to the specific tomorrow in question, and which had never been seen before nor would be likely to be seen again. In other words, 'specific' information is of a kind such that there is a reference-class consisting of just one situation, to wit the one you are in.

The problem with this approach to dividing 'background' and 'specific' is rather involved, but it boils down to this: it's always possible to decompose supposedly-'specific' information into a conjunction of individual items of information that are associated with statistical frequencies. In fact, if we couldn't, it wouldn't be possible to form any probabilistic judgements about the situation at all. For example, suppose we are thinking about a fingerprint left at the scene of a crime. It is a unique fingerprint, never before seen and never to be repeated. But our ability to reason about it, and its relationship to (for example) other fingerprints taken elsewhere or in our database, depends on our being able to decompose the 'fingerprint' into a number of more-abstract features, such as its size, or distinctive elements in its pattern, that allow us to make comparisons with other similar objects.

To be considered data at all, in other words, superficially-unique 'specific' pieces of information must be capable of being treated as composites of several 'background' pieces of information. So the distinction between 'frequency' based information (with large reference classes) and 'belief' based information (reference classes of 1) is not supportable on a theoretical level. 

Other ways of distinguishing 'background' from 'specific' information

There may be as-yet unidentified, coherent ways of distinguishing background information from specific information, but it seems unlikely, given the potential usefulness that any such division would have in (for example) legal or medical contexts - someone would have thought of it by now.

Does it accord with analysts' usage?

The short answer is: not in general. None of the respondents in our survey advanced it as a putative definition of 'confidence'. It's included here because of personal experience working with intelligence analysts, some of whom make use of a working distinction between 'background' evidence and 'specific' intelligence, and for whom the link to 'confidence' seems palpable.

Is it decision-relevant?

Would the distinction between 'background' and 'specific' information, even if coherent, have relevance to a decision-maker in addition to the probabilities those types of information supported? In common with other 'confidence' theories that seek to ground it in the nature of the evidence, the answer is 'no'. The reason, as before, is that the probability already summarises the strength of the evidence supporting a hypothesis, and optimal decision-making involves looking more-or-less only at outcomes and probabilities. It's rather like the distinction between a ton of lead and a ton of feathers: if what you're interested in is whether your truck will be able to carry it, the weight and not the composition is the important thing.


The idea that 'confidence' somehow captures the weight of 'background' compared to 'specific' data fails all three tests.

In the next post we'll look at our penultimate theory: the 'expected value' theory, which posits that 'confidence' measures the potential value of new information. This involves taking an economic approach to analysis, which seeks to understand its value added in terms of the impact it has on decision-making.