Saturday, 20 December 2014

Mandy Rice-Davies and the Non-diagnostic Evidence

Mandy Rice-Davies, who died yesterday aged 70, is known primarily for her pithy contribution to popular information theory literature.  When told in court that Lord Astor had denied having an affair with her, she replied simply: "He would, wouldn't he?"

The point is that if the conditional likelihood of any piece of evidence (Lord Astor's denial of having met Mandy) is the same (100%) under a given hypothesis (Lord Astor met Mandy) as under its negation (Lord Astor did not meet Mandy), this evidence will be non-diagnostic with respect to the hypothesis, as a straightforward implication of Bayes' Theorem.  Mandy Rice-Davies' formulation is of course superior in brevity.
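Her point can be sketched in a couple of lines of Python (a toy illustration, with invented numbers for the diagnostic case):

```python
def posterior_odds(prior_odds, p_e_given_h, p_e_given_not_h):
    """Bayes' Theorem in odds form: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * (p_e_given_h / p_e_given_not_h)

# Lord Astor would deny the affair whether or not it happened, so the
# likelihood of the denial is 100% under both hypotheses: the evidence
# is non-diagnostic and the odds do not move.
print(posterior_odds(1.0, 1.0, 1.0))   # still 1.0 (even odds)

# Contrast with diagnostic evidence, three times likelier under the hypothesis:
print(posterior_odds(1.0, 0.9, 0.3))   # roughly 3
```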

Friday, 19 December 2014

Sony Hacker Threats; Risk Assessment Difficulties

In response to threats from possibly-state-led hackers, Sony has pulled 'The Interview' from cinemas, almost certainly guaranteeing a significantly higher viewership when it proliferates on the internet in leaked form.  It's the latest in a surprising sequence of events, with more no doubt to come.

The debate about the wisdom of this decision has naturally focused on the identity of the hackers, the ethics of responding to coercion, and the impact on free speech.  But what if we focus purely on the risk assessment element of Sony's decision?  Can we quantify the risk to public safety that Sony was hoping to mitigate?  Trying to do so exposes some of the more interesting and difficult elements to risk assessment when we have relatively-unprecedented developments.

The background risk from terrorist attacks to entertainment events is very low. According to the Global Terrorism Database (GTD), there are only around 18 attacks worldwide per year on targets in the 'entertainment / cultural / stadiums / casinos' category, with an average of fewer than 1 of these in the US.  Attacks in this category are slightly less deadly than average, with a mean of 1.7 deaths per attack.  

The US deaths-per-attack figure is significantly lower, although curiously the GTD doesn't include the 2012 Aurora cinema attack in Colorado (probably for definitional reasons).  Even including this attack, though, the average risk of death from terrorism in cinemas for a US citizen is around 1 in 500,000,000 per year (one death, on average, US-wide, every couple of years or so).  The average US citizen goes to the cinema about four times a year, so a trip to the cinema exposes an average American to about a 1 in 2,000,000,000 chance of death from terrorism.  That's about 100 times lower than the risk of dying in a five-mile round trip by car to get to the cinema in the first place.
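The arithmetic can be checked directly, using the figures quoted above:

```python
annual_risk = 1 / 500_000_000   # average US citizen's annual risk of death from terrorism at the cinema
trips_per_year = 4              # average US cinema attendance
per_trip_risk = annual_risk / trips_per_year

print(1 / per_trip_risk)        # ~2e9: about a 1-in-2,000,000,000 chance per trip
```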

Photo: Fernando de Sousa
A generally-safe place to be

That's under normal circumstances.  What about when a major film distributor has faced explicit, coercive threats from capable hackers that might or might not be backed by nuclear-armed states?  This is when things get difficult.  Robust approaches to risk assessment, particularly for low-probability events, usually start by building a reference-class - a set of relevantly-similar events that can be used to form statistical base-rates to which scenario-specific modifiers can be applied.  In this case, the reference-class is so small (approaching 0) that the specifics, and assumptions about them, dominate the estimate.

The hackers threatened attacks on a 9/11 scale if the film were screened.  If these threats were absolutely credible, the expected number of deaths on the film's opening weekend would be in the hundreds or possibly thousands.  If the threats are entirely empty, then the expected number of deaths from terrorism in the opening weekend would be more like the background level of around one one-hundredth of a death.

Whether the risk is at the background level, or whether it is at the "hackers' threat" level, depends on what intelligence you have and which assumptions you make.  In between are five orders-of-magnitude of uncertainty in terms of expected impact.  Did Sony make the right call?  Judging by reviews of advance screenings and even Sony executives themselves, the answer might be 'yes' whatever you think the risk was.  

Saturday, 13 December 2014

Peripheral Vision in Animals and Organisations

Vision-related tasks in animals are accomplished in an extraordinary range of ways. There are at least ten basic eye structures, each of which can be adapted along dimensions such as colour perception, field-of-view, photoreceptor density, field of focus and so on. Examining what an animal's eyes seem to be optimised for can lead to insights about the resource, threat and physical environment they face.
Apex predators have forward-facing eyes, while prey animals tend to have eyes either side of their heads to maximise field of vision. Horizon-scanning animals such as lions and horses have a horizontal visual 'streak' of high photoreceptor density, while birds and tree-dwelling animals, including primates, tend to have a broadly circular central 'fovea' where visual acuity is highest. Colour perception tends to be much more discriminatory among land animals than in those living in the sea, where the chromatic spectrum is significantly narrower due to absorption by water.

In humans, our foveal and peripheral vision systems are quite distinct in their make-up, suggesting that they are optimised for different tasks. Foveal vision is extremely narrow, very high-resolution, and dominated by colour-sensitive 'cone' cells that require high levels of light to function. In contrast, peripheral vision is accomplished with 'rod' cells that respond well in low light levels but lack colour discrimination.

Rods and cones by angle from fovea (source: Wikipedia)

However, we are usually unaware of quite how narrow the angle of our high-acuity vision is. Only when we experience visual disturbances such as scintillating scotoma do we realise how difficult it is to perceive detail (e.g. for reading) outside of this tiny 'spotlight'. Most of our perception of the world is in any case the result of visual processing - thought to account for around a quarter of the brain's structure - and so our conscious acquaintance with the information our eyes are receiving is somewhat indirect. But we are primarily unaware of the fovea's narrowness, of course, because our head and eyes are highly mobile. Wherever we look, we see a detailed picture, and by-and-large the things we want to look at only rarely move faster than our eyes can catch. Foveal-level vision everywhere would be unnecessary, and it would also sacrifice perception in low light.

Instead of giving us high detail everywhere, human vision represents an efficient partnership between a small, mobile, highly-detailed foveal component, and a wide, low-resolution peripheral component. Peripheral vision's vital role is to identify objects of possible interest in order to cue examination by the fovea. Without peripheral vision to support it, the fovea would be completely debilitated, as the experience of tunnel vision sufferers attests.

Many organisations with analytical components have re-invented this division of responsibility, which aligns broadly to a distinction between hypothesis-generating and hypothesis-testing analytical tasks. Organisational peripheral vision involves identifying novel threats and opportunities, seeking new markets, and identifying new products and productive technology. Organisational foveal vision involves gathering data on known threats and opportunities, analysing existing markets, and exploiting existing products and technology.  Often, but not always, these tasks are conducted by different parts of the organisation.

The importance of peripheral vision is easy to underestimate though. By necessity, it does its job out of the spotlight. When analytical budgets get cut, it can look more attractive to cut horizon-scanning, threat-driven functions than to cut those functions dedicated to exploiting known profit drivers. But organisational tunnel vision carries existential risks for many organisations. Failure to spot emerging threats has arguably been a significant driver behind a host of disastrous business decisions, including General Motors' failure to adapt to the growing market for smaller cars, Blockbuster failing to acquire Netflix, Kodak's tardiness in promoting digital products, and Excite's refusal to buy Google in 1999.

The consequences for warning failure in defence and security can of course be more significant. A focus on known threats at the expense of novel ones may have lain behind failures adequately to anticipate and respond to developments such as the stationing of Soviet ICBMs in Cuba, the Iranian Revolution, or the September 11 attacks.

Although our everyday perceptual experience may relegate it to a supporting role, peripheral vision is essential for our survival, as an animal or an organisation.

Wednesday, 10 December 2014

Affine Transformations of Payoffs Should Not Affect Decision-making

Behavioural economics has unearthed interesting features of decision-making that sometimes seem to violate rationality assumptions.  These include loss aversion - people's willingness to take risks to avoid losses, but to avoid risks to take certain gains - and the endowment effect, in which people put a higher price on assets they own compared to those they don't.

These findings don't necessarily violate rationality.  They might also support the hypothesis that people have complex objectives comprising many different factors.  For instance, if losing confers psychological or social costs in itself, additional to any material loss, taking risks to avoid losses might be optimal.

But where objectives are well understood - say in business, where (mostly) the objective of a firm is to maximise long-run profits - optimal decision-making won't be influenced by anything other than the outcomes associated with each possible course of action, their relative probabilities, and the organisation's appetite for risk.

One possibly-surprising feature of optimal decisions is that they are robust to positive affine transformations of payoffs.  What this means is that multiplying all payoffs by the same positive factor, or adding or subtracting a constant sum from all payoffs, should make no difference to the optimal decision.  (Strictly speaking, for a risk-averse decision-maker the invariance applies to affine transformations of the utility attached to each outcome; for a risk-neutral decision-maker it applies to the payoffs themselves.)  This means that affine transformations make a good test of a proposed decision rule.  If all the outcomes were halved, or doubled, or all worth exactly £1m more than they are, the decision rule should always identify the same choice.  Inter alia, this is one reason that lump sums (like licensing fees or the poll tax) are considered to be the least distortionary types of tax.
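A quick sketch of this invariance, assuming a risk-neutral rule (choose the option with the highest expected payoff); the payoffs and probabilities are illustrative:

```python
def best_option(options, probs):
    """Pick the option with the highest expected payoff.
    options: dict mapping option name -> list of payoffs, one per scenario.
    probs: list of scenario probabilities."""
    def expected(payoffs):
        return sum(p * x for p, x in zip(probs, payoffs))
    return max(options, key=lambda name: expected(options[name]))

options = {"insure": [-60, -20], "risk_it": [-500, 0]}  # payoffs in [theft, no theft]
probs = [0.10, 0.90]

choice = best_option(options, probs)

# Apply an affine transformation: double every payoff and add 1000.
transformed = {k: [2 * x + 1000 for x in v] for k, v in options.items()}
assert best_option(transformed, probs) == choice  # same decision either way
```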

Evidential Support is More Complex than 'For' or 'Against'

A naive view of probability pictures it as the result of a sort of tug-of-war between arguments for a statement and arguments against that statement.  If the arguments in favour of a proposition beat the arguments against, the probability will be high.  If it's the other way round, it'll be low.  Some critical thinking techniques, such as argument mapping or analysis of competing hypotheses, can promote this way of thinking, which is unfortunate because it's misleading.

When evaluating a hypothesis or scenario, the evidence cannot easily be split into 'for' and 'against' arguments for separate consideration.  Instead, all of the evidence needs to be considered as a whole.  The final probability of the target hypothesis is determined by the likelihood of all of the evidence conditioned on that hypothesis and on each of its alternatives, and is not straightforwardly a summation of parts of the evidence.

As an example, take these three statements:

A: "Jones lives in Eton College"
B: "Jones is more than twenty years old"
C: "Jones is female"

Taken alone, (A) is certainly evidence against (B).  Knowing that Jones is at Eton makes it highly likely that he's a boy or a young man.  But if you know (C), that Jones is female, (A) becomes very strong evidence in favour of (B), since it's highly likely she's a teacher or the wife of one.  

Items of evidence cannot, in other words, be treated in isolation from one another.  
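The interaction can be made concrete with a small joint model of the Eton example.  The probabilities below are entirely invented, chosen only so that most residents are young male pupils while nearly all female residents are adults:

```python
# Invented likelihoods P(lives at Eton College | age, sex).
likelihood = {
    ("under_20", "male"): 0.90,
    ("under_20", "female"): 0.001,
    ("over_20", "male"): 0.10,
    ("over_20", "female"): 0.10,
}
prior = {k: 0.25 for k in likelihood}   # uniform prior over age and sex

def p_over_20(condition=lambda k: True):
    """P(B: over twenty | A: lives at Eton, plus any extra condition on (age, sex))."""
    joint = {k: prior[k] * likelihood[k] for k in likelihood if condition(k)}
    total = sum(joint.values())
    return sum(v for (age, _), v in joint.items() if age == "over_20") / total

p_b_given_a = p_over_20()                             # ~0.18: A alone is evidence against B
p_b_given_ac = p_over_20(lambda k: k[1] == "female")  # ~0.99: given C, A strongly supports B
```

The same item of evidence (A) pushes the probability of B in opposite directions depending on whether C is known, which is exactly the point: evidence cannot be sorted once and for all into 'for' and 'against' piles.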

Friday, 5 December 2014

Evidence of Absence

How much absence of evidence do you need for it to constitute evidence of absence?  Despite what the maxim says, all absence of evidence is evidence of absence, assuming that the evidence in question had a chance to show itself but didn't.

The question of when to conclude that something is absent is a surprisingly common problem.  When you are searching a suspect's house, when do you decide to give up looking?  When you are waiting for a lift, when do you conclude the lift is broken?  When you are searching satellite imagery for nuclear facilities, when can you assume they're not there?  When you are looking for ultrasound 'evidence' that a foetus is a boy, when do you conclude it's a girl?

These possible-needle-in-haystack problems (ones where we're not sure the needle is actually there) are all governed by the same underlying information process: in any given period of time, you have some probability of establishing a hypothesis with 100% certainty, but no chance per se of absolutely falsifying it.

It's in there somewhere... maybe

We can put numbers on the question by considering two search methods which represent opposite extremes of search efficiency.  The first is when you exhaustively search the haystack, straw by straw, until the needle is found, in very much the way that the Mythbusters did.  The second is when you randomly choose bits of the haystack to search, so that at any given time you might be searching part of the haystack you've already ruled out.

The first process is governed by a simple likelihood function.  Assume it takes a hundred man-hours to search a large haystack for a possible needle.  The probability of not finding the needle if it's there is (100-t)%, where t is the search time in hours so far.  (The probability of not finding it if it's not there is of course always 100%).  The second process - random searching with replacement - is governed by an exponential function, which is something that appears regularly in dynamic information problems.  Assuming the same search rate as above, the probability of not finding the needle after t hours is 100e^(-0.01t)%, where e is the base of the natural logarithm, 2.71828...

In the first process, you have a 50% chance of finding the needle, if it's there, after half the time.  In the second process you have a 50% chance of finding it after around 70% of the time.  If, when we start, we believe there's a 50% chance that the needle's in there at all, the graph of the probability it's there, over time, looks like this:

So intuition, which suggests systematic searching is better than random searching, is right in this case.  It is possible to imagine search strategies which are worse than random searching with replacement, which would involve being more likely to search areas you'd already looked at.  But in general, if it would take you time T to search the whole area systematically, absence of evidence after time T will mean that absence has become at least a factor of e (about 2.7) more likely, relative to presence, than it was when you started.
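Both search processes can be folded into a single Bayesian update (a 50% prior and a 100-hour haystack, as above):

```python
import math

def p_present(t, prior=0.5, T=100, method="systematic"):
    """Probability the needle is present, given no find after t hours of searching.
    Systematic search: P(not found | present) = 1 - t/T.
    Random search with replacement: P(not found | present) = exp(-t/T).
    P(not found | absent) is always 1."""
    if method == "systematic":
        p_nf_present = 1 - t / T
    else:
        p_nf_present = math.exp(-t / T)
    return prior * p_nf_present / (prior * p_nf_present + (1 - prior))

print(p_present(50))                    # 1/3: systematic search has halved the odds
print(p_present(100, method="random"))  # ~0.269: odds have fallen by a factor of e
```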

Sunday, 30 November 2014

A Simple Rule for Decisions under Uncertainty

Many everyday decisions under uncertainty are fairly simple in structure, and involve two possible options and a single scenario of concern whose probability is of interest.  Insurance is an explicit example here: when we are considering insuring an item against theft, we have to take into account the probability that it will be stolen.  Other decisions are similar, when analysed: the decision to take an umbrella to work will hinge on the probability of rain; the decision to change mobile phone tariffs might hinge on the probability that our usage will exceed a certain level; the decision to search a suspect's house might hinge on the probability that some diagnostic evidence will be found.

For decisions like these, it's helpful to know the 'critical probability' - the probability value above or below which your decision will differ.  There is a relatively easy way to derive this value, which doesn't require anything more than simple arithmetic.  The first step is to find two metrics which we can (for convenience) label 'cost' and 'risk'.

Under these kinds of simple binary decisions, there are four possible outcomes, each defined by a scenario-decision pair.  When considering insurance, for example, the four outcomes are (theft-no insurance), (theft-insurance), (no theft-no insurance), (no theft-insurance).  'Risk' and 'cost' (as defined here) are the differences between the values of the outcomes associated with each scenario.  In this case, these are the differences between (theft-insurance) and (theft-no insurance), and (no theft-insurance) and (no theft-no insurance).  It's easier to explain using a table like this:

When you've calculated 'cost' and 'risk', the 'critical probability' is simply: critical probability = cost / (cost + risk).

We can illustrate this by running with the example of insurance.  Suppose we have a £500 camera that we are taking on holiday and which we are wondering about insuring.  We are offered insurance that costs £20, with a £40 excess.  We put the outcomes into the grid as follows:

                  Theft       No theft
Insurance         -£60        -£20
No insurance      -£500       £0

According to the formula above, this gives us a 'critical probability' of 20 / 460, or about 4.3%.  This means that if the probability of theft exceeds this, we should take the insurance.  If the probability is lower, we should risk it.  
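The whole calculation fits in a few lines (the outcome values are the ones from the example above):

```python
def critical_probability(outcomes):
    """outcomes: {(scenario, decision): value} for a two-scenario, two-decision problem.
    Returns the scenario probability above which insuring beats not insuring."""
    # 'Cost': difference between decisions in the no-theft scenario.
    cost = outcomes[("no theft", "no insurance")] - outcomes[("no theft", "insurance")]
    # 'Risk': difference between decisions in the theft scenario.
    risk = outcomes[("theft", "insurance")] - outcomes[("theft", "no insurance")]
    return cost / (cost + risk)

outcomes = {
    ("theft", "no insurance"): -500,   # lose the camera
    ("theft", "insurance"): -60,       # premium plus excess
    ("no theft", "no insurance"): 0,
    ("no theft", "insurance"): -20,    # premium only
}
print(critical_probability(outcomes))  # ~0.0435, i.e. about 4.3%
```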

Incidentally, the labelling of the two metrics as 'cost' and 'risk', while convenient, is rather arbitrary and depends on the idea of there being a 'do nothing' baseline.  In general, thinking of one of your options as a 'baseline' can be harmful to good decision-making as it can stimulate biases such as the endowment effect.  It's best to think of the two things simply as values that help determine a decision.

Tuesday, 25 November 2014

Failures of Imagination: The Problem with 'Black Swan'

The term 'black swan', invented and popularised by Nassim Taleb, clearly fills a conceptual niche, given its popularity in the futures community.  Analytical failures can be broadly divided into two categories: failures of imagination, and critical thinking errors.  As used by Taleb, a 'black swan' is a failure of imagination in which, instead of a high-impact hypothesis or outcome being appraised and discounted, it is simply not considered at all.  Taleb's story is that until black swans were actually discovered, it was simply assumed that they did not exist.

"Before the discovery of Australia, people in the Old World were convinced that all swans were white, an unassailable belief as it seemed completely confirmed by empirical evidence." 
- The Black Swan, Nassim Nicholas Taleb

There are at least two reasons why the term 'black swan' might be a less than ideal term to use for this concept.  First, and perhaps most boringly, there is plenty of evidence that the concept of a black swan was considered well before the discovery of Australia.  Perhaps more conceptually problematic, though, is that the notion of a black swan can be generated fairly easily and algorithmically by simply combining known colours with known animals.  This makes it a less interesting sort of failure than the failure to consider objects with characteristics that are entirely unlike anything previously encountered.

Necessarily, it is not easy to think of examples.  Candidates might include the arrival of settlers to the New World, quasars, or Facebook.  In Flann O'Brien's remarkable novel 'The Third Policeman', the narrator hears a story about a box which contains a card that is of a completely unimaginable colour, one which sends anyone who sees it insane: "I thought it was a poor subject for conversation, this new colour. Apparently its newness was new enough to blast a man's brain to imbecility by the surprise of it. That was enough to know and quite sufficient to be required to believe. I thought it was an unlikely story but not for gold or diamonds would I open that box in the bedroom and look into it."

Smoking Guns: Binary Tests with Asymmetric Diagnosticity

A binary test is one which produces one of two outputs, which are usually (but not necessarily) thought of as 'positive' or 'negative'.  Metal detectors, pass/fail quality control tests, tests of foetal gender, and university admissions tests are all binary tests.

We often want binary tests to strongly discriminate between cases.  We'd like a metal detector to have a high probability of bleeping in the presence of metal, and a low probability of bleeping in the absence of metal.  Tests with these characteristics will tend to have a broadly symmetric impact on our beliefs.  If our metal detector is doing its job, a bleep will increase the probability of metal being present by some factor, and the absence of a bleep will concomitantly reduce it by a similar factor.  If the detector has, say, a 99% chance of bleeping when metal is present, and only a 1% chance of bleeping when it isn't present, the test will strongly discriminate between cases; a bleep will increase the odds of metal by a factor of 99, and the absence of a bleep will decrease the odds that metal is there by the same factor.

Not all tests are like this.  Some tests have an asymmetric impact on our beliefs.  Some tests for circulating tumour cells (CTCs), for example, have a strongly positive effect on the probability of cancer if detected, but only a relatively weak negative effect if they are absent.  According to this data, just under half of patients with metastatic colorectal cancer (mCRC) tested positive for CTCs, compared to only 3% of healthy patients.  Assuming this data is right, what does this mean for the impact of this test on the probability of mCRC?

Let us suppose that a patient's symptoms, history, circumstances and so on indicate a 5% probability of mCRC.  If a subsequent CTC test is positive, the probability would rise to around 45% - a change in the odds by a factor of about fifteen.  But if it came back negative, the probability would only fall to about 3% - a change in the odds by a factor of less than two.  The impacts of positive and negative results are therefore asymmetric.
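Redoing the update in odds form makes the asymmetry explicit (taking 'just under half' to be 45% sensitivity, against 3% false positives - illustrative figures):

```python
def update(prior, likelihood_ratio):
    """One Bayesian update, done in odds form: probability -> odds -> probability."""
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)

prior = 0.05                             # probability of mCRC before the CTC test
positive = update(prior, 0.45 / 0.03)    # LR of a positive: 15
negative = update(prior, 0.55 / 0.97)    # LR of a negative: ~0.57

print(positive)   # ~0.44: a positive multiplies the odds fifteen-fold
print(negative)   # ~0.03: a negative shrinks the odds by less than a factor of two
```

The same `update` function handles the symmetric metal detector too: with likelihood ratios of 99 and 1/99, a bleep and a non-bleep move the odds by equal and opposite factors.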

"Don't wait for the translation, answer 'yes' or 'no'!"

In the realm of security and law enforcement, tests of asymmetric diagnosticity are often called 'smoking guns', apparently in homage to a Sherlock Holmes story.  Perhaps the most celebrated example in modern times is the set of photos of Cuban missile sites, taken from a U2 spy plane, that was presented in the UN by Adlai Stevenson in 1962.  These photos made it near-certain that the USSR were putting nuclear weapons into Cuba.  But if the US had failed to get these photos, it wouldn't have proved that the USSR wasn't doing that.  Incriminating photos are asymmetrically diagnostic.

Evidence that is asymmetrically diagnostic forms an interesting and ubiquitous category.  The enduring but generally-misleading dogma that 'absence of evidence is not evidence of absence' is in fact only true (and even then only partly so) of asymmetrically diagnostic evidence.  Frustratingly though, it's easy to prove that tests like this must only rarely provide a positive result.  If positives were common, their absence would provide exonerating evidence - and we've assumed that isn't the case.  In other words, smoking guns are necessarily rare.  Not because the universe is contrary, but because of the fundamental nature of evidence itself.
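One way to see why, under illustrative assumptions: write p for the probability of a positive result under the hypothesis and q for the probability under its negation.  If a negative result is only weakly exonerating - say its likelihood ratio (1-p)/(1-q) is at least 0.9 - then p is forced to be small:

```python
def max_positive_rate(lr_pos, lr_neg_floor=0.9):
    """Upper bound on p = P(positive | H) for a test with positive likelihood
    ratio lr_pos = p/q whose negative result has likelihood ratio >= lr_neg_floor.
    Derivation: (1 - p) / (1 - q) >= lr_neg_floor, with q = p / lr_pos,
    rearranges to p <= (1 - lr_neg_floor) / (1 - lr_neg_floor / lr_pos)."""
    return (1 - lr_neg_floor) / (1 - lr_neg_floor / lr_pos)

print(max_positive_rate(100))   # ~0.10: the smoking gun can fire at most ~10% of the time
```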

Thinking in Odds

There are many ways to express something's probability.  Using the usual decimal method (0 to 1) is convenient for a number of reasons, and particularly when thinking in terms of risk and expected outcomes.  But it doesn't handle inference very well - inference being the process of incorporating information into one's probabilistic judgements.  The reason is that the impact of a piece of information on a decimal probability differs depending on what probability you start with.

Suppose there are two urns: urn A, in which three-quarters of the balls are black, and urn B, in which only a quarter are.  You know that one of the two urns has been chosen randomly, so that the probability it's urn A is 50%.  A ball is now drawn from the chosen urn - it's black - and then returned to the urn.  The black ball has raised the probability that urn A was chosen from 50% to 75%.  Another ball is drawn from the same urn, and is also black.  This time, the black ball has only raised the probability that urn A has been chosen from 75% to 90%.  The information is the same, but the effect on your probability is different.  In fact the mathematical impact on probability is quite convoluted to express algebraically.

Thinking in odds makes things a bit simpler.  Drawing a black ball has an evidential odds ratio of 3:1 in favour of urn A; in other words, a black ball is three times more probable if urn A has been chosen than if urn B was chosen.  Starting with odds of 1:1 (i.e. 50%), the first black ball raises the odds to 3:1.  The next has a similar effect, raising the odds from 3:1 (75%) to 9:1 (90%).  The odds are raised by the same factor (three) in both cases, respecting the fact that the information was the same.
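In code, the whole update is just a multiplication (with a black ball three times as probable from urn A as from urn B, the likelihood ratio is 3):

```python
def to_prob(odds):
    """Convert odds in favour into a decimal probability."""
    return odds / (1 + odds)

odds = 1.0      # 1:1, i.e. 50% that urn A was chosen
odds *= 3       # first black ball: 3:1, i.e. 75%
odds *= 3       # second black ball: 9:1, i.e. 90%

print(to_prob(odds))   # 0.9
```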

Expressing probabilities in terms of odds makes it easier to separate our base rate (the probability we started with) from the evidence (the new information that affects the probability).  For this reason, there is growing support for the use of odds to express the evidential power of scientific experiments, as an alternative to the perpetually-counterintuitive significance test.

Of course we shouldn't forget that decimal probabilities and odds merely express the same thing in two different ways.  The difference is in the way we respond to them cognitively.  Broadly, it's easier to use decimal probabilities for thinking about expected outcomes and decision-making, and odds for thinking about information and inference.  

Friday, 21 November 2014

How do you Prove a Decision is Bad?

McKinsey has a new report out evaluating the global harm from obesity, and the likely impact of counter-obesity policies.  They compare the costs of obesity - which they estimate at $2tn a year - with the costs of smoking, and war and violence, which they reckon similar.  To arrive at estimates like these with any semblance of certainty can only be misleading: they surely rest (as the report makes clear) on a web of inferences and assumptions that could probably be picked over to a considerable level of tedium.

But there is merit in examining McKinsey's premise, which is that obesity is clearly a net harm, thus implying a rationale for government intervention.  This is a superficially appealing premise, but it raises a number of difficult questions.  The traditional economic basis for government intervention is market failure.  This can occur for a number of reasons: if an industry is a natural monopoly, if information is expensive, if there are unmarketable external impacts (e.g. from pollution) and so on.  Government intervention against violence and war rests on a host of market failures including natural monopolies of defence suppliers but most particularly the lack of a 'market' in violence itself (the victims do not generally suffer voluntarily).  The case for intervention against smoking was (at least initially) driven by the negative externalities imposed by smokers on their co-habitees and neighbours.

It is harder to prove a market failure in the case of obesity.  The case for intervention seems to rest on the idea that people are making bad decisions about their diets, and need help to make better decisions through initiatives like smaller plates or restrictions on availability of junk food.  This is a very difficult thing to show.  It is demonstrably the case that people choose to eat food that leads to deleterious health effects.  But this behaviour is consistent with at least two hypotheses: first, that they are choosing rationally, and place a higher value on eating than health, and second, that they are choosing irrationally, and in fact place a higher value on health.  The observed behaviour simply does not allow us to sort between these two hypotheses.  

(Of course, obese people probably generally wish they were thin.  But then people who buy expensive handbags probably wish they could have the handbag and their hard-earned money.  Obesity is the cost of a certain kind of diet.   All activities have costs: their existence is not an argument for intervention.)

Is obesity like this stuck dog?

The problem of justifying government intervention with a claim of irrationality is germane to a wide range of policy questions.  But as there is no way to observe irrationality we should be particularly careful to examine arguments for intervention that rest on this claim alone.

Tuesday, 18 November 2014

Ways of Distinguishing Short-term from Long-term

The concepts of the 'short term' and 'long term' are used frequently by forecasters, analysts and organisations with a forward-looking remit.  But in my experience there is little coherence in their definitions.  Some organisations approach this issue by identifying a standard length of time - e.g. 'long term' meaning five or ten years or more.  These definitions are arbitrary.  In economic and decision theory, there are a number of definitions including the 'time taken to return to equilibrium' - a system-centric definition - and the 'time over which all inputs can be considered variable' - a decision-centric definition.

An approach which is perhaps more useful for forecasters is to think of the short-term / long-term distinction as relating to the types of information used.  Short-term forecasts primarily use information about the situation now.  Long-term forecasts primarily use information about base rates.  A short-term forecast will therefore be one that uses as a rationale information about what is happening now, while a long-term forecast will be largely invariant to the particular time it is being made.  This distinction is not, of course, a clear one: most forecasts will combine historical base rates with present-focused specifics.  But it is useful to think about the relative weightings placed on these two types of information as a measure of the extent to which it is a short- or a long-term forecast.

Waves for short-term forecasts, tides for long-term forecasts

Forecasting the weather in one minute's time will almost entirely involve information about what the weather is like now.  Forecasting the weather in one year's time will almost entirely involve historical data and very little information about what the weather is like now.  So in the context of weather forecasting, one minute is clearly short-term and one year is clearly long-term.

In more volatile systems, information about today will have less diagnostic value about the future than in more stable systems.  So long-term political forecasts look out far further than long-term weather forecasts.  An information-based distinction between the concepts of the short- and long-term captures these differences but in a way that is more powerful in that it encompasses other factors (such as the volume of information collected) as well.   

Thursday, 13 November 2014

Five Hypotheses to Explain a Correlation

A study published yesterday, of a common type, linked the number of fast food outlets with nearby obesity and diabetes rates.  It's a common mantra in the field of methodology that 'correlation is not causation'.  This isn't strictly true.  First, it might be the case that causation is, on a fundamental sort of level, nothing more than a very strong set of correlations.  Second, even if correlation and causation are not semantically equivalent, it's often the case that correlation is extremely good evidence for causation, and this should never be ignored.  Having said that, if people can only remember one thing about 'correlation' from their stats classes, 'correlation is not causation' isn't a bad candidate to promote good practice and scepticism about claims.

When, as an analyst, you have established a correlation between two features of the data A and B, there are always five distinct families of hypothesis you should bear in mind to explain it:

1. A causes B.  Perhaps obese or diabetic people choose to live near fast food restaurants?

2. B causes A.  Perhaps fast food restaurants nearby encourage people to become obese?

3. A and B are caused by a separate factor C.  Perhaps poor people are more likely to be obese or diabetic, and fast food outlets tend to open in poorer areas?

4. The data are a coincidence.  Perhaps it's just chance that these two things occur together in the study data?

5. The data are wrong.  Perhaps diabetics are more likely to be diagnosed in urban areas with more fast food restaurants, and rural diabetics are just not being picked up?

An observed correlation, by itself, will provide evidence in favour of hypotheses in any of these categories.  Only additional features of the data will help you sort between them.
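Hypothesis family 3 is particularly easy to demonstrate by simulation.  In this sketch (all numbers invented), a hidden factor C drives both A and B; A and B never influence one another, yet correlate strongly:

```python
import random

random.seed(42)
C = [random.gauss(0, 1) for _ in range(10_000)]   # hidden common cause
A = [c + random.gauss(0, 0.5) for c in C]          # driven only by C
B = [c + random.gauss(0, 0.5) for c in C]          # also driven only by C

def corr(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

# A and B are independent given C, but correlate at around 0.8 overall.
print(corr(A, B))
```

Nothing in the output distinguishes this from direct causation; only additional data (for example, the correlation of A and B within bands of C) would separate the hypotheses.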

The website Spurious Correlations allows you to generate your own correlations from a number of data sources.  As an analytical exercise, force yourself to come up with causal hypotheses to explain them.

L'Aquila Earthquake Conviction Overturned; Analytical Responsibility

The seven earthquake experts who were convicted of manslaughter in the wake of the L'Aquila quake in 2009 have had their convictions overturned.  This will be generally welcomed by scientists and other analysts because of the concern that legal risks would deter honest enquiry.  At the time of the original conviction, medical physicist Professor Malcolm Sperrin was quoted as saying that: "if the scientific community is to be penalised for making predictions that turn out to be incorrect, or for not accurately predicting an event that subsequently occurs, then scientific endeavour will be restricted to certainties only and the benefits that are associated with findings from medicine to physics will be stalled."

This is undoubtedly a valid concern.  But should analysts be entirely exempt from legal redress?  The analyst's role is to inform decisions by identifying relevant outcomes and assigning probabilities to them, via collecting and processing information.  Analysts add value by giving decision-makers an accurate picture of their decisions' likely consequences, so that their decisions under uncertainty can be well-informed.  Like other practitioners - GPs, financial advisers, and engineers, for example - their work has a practical and (in theory) measurable impact.  It is difficult to argue consistently that analysts should as a matter of principle be entirely protected from legal responsibility for their work.

"You will destroy a great empire..."

In the UK, however, the idea of scientific or analytical responsibility in general has not been sufficiently tested for there to be a clear legal position.  The government has issued principles governing scientific advice for policy but these are primarily ethical and relate to the motives of the advisers - conflicts of interest and so on - without touching on any requirement for methodological rigour.  Is it possible to imagine a legal code for analysts, failure to adhere to which might lead to culpability for negligence?  There are, it seems to me, two main problems with any project to build a notion of responsibility for analysts that could operate legally: first, that of measuring an analyst's performance in the first place, and second, that of establishing that they had a causal role in any wrongs incurred.

The first problem is that of ascertaining the extent to which an analyst was doing a bad enough job as to be regarded as negligent.  It isn't a hopeless endeavour to try to measure some aspects of an analyst's performance, as the Good Judgment Project has demonstrated.  But there are other key aspects of analysis - in particular, the generation of novel hypotheses or identification of new scenarios - for which we must almost entirely rely on analysts' brains as black boxes.  Could failure of imagination ever be proved negligent?  It's difficult to imagine how, given the obscurity of this kind of creative analytical activity.

The second problem is that of establishing an analyst's impact on a decision-maker.  The difficulty is that any given decision depends not just on analysis but on a range of other factors, including the objectives of the decision-maker, their appetite for risk, their resources and constraints, and so on.  To prove civil negligence, a claimant must establish that their damage would not have happened but for the defendant's actions.  This would be hard to prove for most decisions.  To prove criminal negligence or malfeasance on the part of an analyst would seem to require a standard of proof that was higher still.

Having said all this, analysts of every kind want to have impact, and to be taken seriously.  The idea of analysis as a discipline distinct from domain expertise is a new one, however, and to some extent we are at the birth of the profession.  There is room for a code of practice for analysis in general, even if building a legal framework around it would be difficult.

Monday, 10 November 2014

Deception; Bats

As well as possibly having solved some interesting problems relating to investment in information-acquisition, bats have now been demonstrated actively to jam one another's sonar to interfere with hunting accuracy. This fact alone allows us to make some inferences about the economic constraints that bats face: for example, that food resources are relatively scarce, that there are limited returns from co-operation over hunting, and that thwarting another bat carries a relatively high individual return.

Animal deception is extremely common.  In all its forms, deception always relies on the same underlying principle: the replication of signals associated with a hypothesis, under circumstances where that hypothesis is false.  Sending 'false' signals in this way is usually costly, so in order to evolve (or to present as optimal to a decision-maker), deception must also have a sufficient chance to induce a decision error in at least one other player in the same strategic game (i.e. someone whose decision can affect your payoff, e.g. a fellow bat).  This isn't a very demanding set of circumstances, hence the ubiquity of deceptive behaviour.

Johnny Chan vs Erik Seidel in the final of the 1988 World Series of Poker

Of course, the evolutionary (and strategic) response to deception is to increase investment in information gathering and processing.  But, as with the bats, it won't necessarily be possible to negate the impact of deception on your own decision-making, and even if it is possible, it won't always be worth the expense.  Poker presents an interesting case study.  Poker is complex enough that optimal strategies have only been explicitly derived for a few of its very restrictive forms.  But it is straightforward to show that, as with many games of asymmetric information, any such strategy must entail opportunistic and sometimes-random bluffing to mask the strength of one's hand. Optimal poker strategy will inevitably entail making mistakes: calling in a position of relative weakness and folding in a position of relative strength.  In poker, as with real life, being deceived doesn't mean you're playing wrongly.
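The role of bluffing can be illustrated with the standard indifference argument from game theory (the pot and bet sizes below are arbitrary, and this is a textbook simplification rather than anything derived in the post): a bettor who bluffs with just the right frequency leaves the caller with no profitable response, so being 'deceived' some of the time is built into correct play.

```python
def caller_ev(p_bluff, pot, bet):
    """Caller's expected value: win pot+bet against a bluff, lose the
    call amount against a value bet."""
    return p_bluff * (pot + bet) - (1 - p_bluff) * bet

def indifference_bluff_freq(pot, bet):
    """Bluff frequency at which the caller's EV of calling is zero,
    from solving caller_ev(p, pot, bet) = 0 for p."""
    return bet / (pot + 2 * bet)

pot, bet = 100, 100                    # a pot-sized bet
p = indifference_bluff_freq(pot, bet)  # 1/3 of bets are bluffs
print(p, caller_ev(p, pot, bet))       # caller EV is zero: indifferent
```

At this frequency the caller can neither profitably always-call nor always-fold, so some of their calls will inevitably catch value bets - the 'mistakes' that optimal play entails.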

"You know where you are with a complete liar, but when a chap mixes some truth with his yarns, you can't trust a word he says." - The Horse's Mouth (1944), Joyce Cary

Wednesday, 5 November 2014

Organisational Arationality

Aleph Insights' approach to analysing decision failure is based on a four-element model of an idealised decision.  These four elements are discrete decision-tasks: (i) the identification of objectives, (ii) the identification of resources, (iii) the identification of relevant outcomes and (iv) the assignment of probability to those outcomes.

For example, when you are deciding whether or not to take your umbrella to work, your objectives might include staying dry and minimising weight carried, your resources would constitute the carrying of an umbrella or not, the relevant outcomes would be the weather events that will affect your objectives, and their probability would be determined by whatever information you had at hand.
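The umbrella decision can be sketched as an expected-utility calculation.  The utilities and rain probability below are invented for illustration; the four-element structure is the point.

```python
P_RAIN = 0.3  # assumed probability of rain, e.g. from a forecast

# utility[action][outcome]: staying dry matters more than the
# inconvenience of carrying the umbrella (invented numbers).
utility = {
    "take":  {"rain": 8, "dry": 9},   # always slightly encumbered
    "leave": {"rain": 0, "dry": 10},  # soaked if it rains
}

def expected_utility(action):
    u = utility[action]
    return P_RAIN * u["rain"] + (1 - P_RAIN) * u["dry"]

best = max(utility, key=expected_utility)
print(best, {a: expected_utility(a) for a in utility})
# -> take, since 0.3*8 + 0.7*9 = 8.7 beats 0.3*0 + 0.7*10 = 7.0
```

Each of the four decision-tasks appears explicitly: objectives (the utilities), resources (the actions), outcomes (rain or dry), and probabilities (P_RAIN).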

When these four elements are in place, they constitute a complete rationale for a particular decision.  They have distinct failure modes, each associated with particular cognitive and organisational characteristics, and carrying a range of predictable effects on decisionmaking.  For example, insufficient attention to the identification of outcomes - failure of imagination - is associated with organisational or cultural rigidity, and carries a risk of surprise.  Much has been written about these four main categories of decision failure.

But there is a fifth kind of decision failure that is not often discussed.  We might label it 'arationality': the outward phenomena of decision-making are present, but there is no mechanism actually generating decisions from rationales.  This can still occur where every other component of the decision has been performed properly.

In the case of individuals it is impossible behaviourally (i.e. from the standpoint of an observer) to separate this kind of failure from any of the other failure types.  But arationality is arguably a greater risk when considering the decision-making of corporate persons such as companies, departments of state or states themselves.  In his analysis of the start of the First World War, Jonathan Glover writes:

"...for most people, the outbreak of war is experienced passively.  It comes like the outbreak of a thunderstorm.  Only a few people in governments take part in the discussions.  Negotiations break down: an ultimatum is issued.  The rest of us turn on the television news and find ourselves at war.  Often even the leaders who take the decisions are trapped in a war most of them do not want.  1914 was like that."  (Humanity, Glover 2001)

Organisational arationality is a sort of mechanical failure: a failure of a decision-making machine to produce the right output (the optimal decision) even when given the correct inputs.  We are comfortable thinking of organisations as people, and this is facilitated by our language ("North Korea has issued a statement..."), law, pyramidal hierarchies that lead to a single decision-maker, and by our cognitive metaphors (organisations have heads and arms).  But organisations are not people, even if people are their constituents, and sometimes the parts can all be functioning while the organisation fails.

Tuesday, 4 November 2014

Fun with the Visualisation of Uncertainty

The New York Times has a fun feature outlining a predictive model for today's Senate elections.  The most entertaining part lets you run a single simulation of the outcome by spinning wheels representing all the states.  This isn't just fun.  It's an excellent way of presenting uncertainty and some ways of handling it.  Each time you press the button, you are running one iteration of a 'Monte Carlo' model of the election.

A 'Monte Carlo' model is a way of generating a probability of an outcome by running a large number of simulations of it, where for each simulation the uncertain variables are randomly generated anew.  It's a useful approach if there are many events that interact or combine to produce an outcome.  It's easier, though less satisfying, than calculating a precise probability analytically; but many systems are sufficiently complex that an analytical calculation is not possible anyway.

Elections are relatively easy to present this way because each state's result can be considered (at least to an extent) as an independent event, and there are no complicated interactions between results to consider, since no result is in the information set of any voter.  So the Monte Carlo component of the model is quite straightforward to visualise.  (Although in this case the probabilities for each state seem to be fixed for each iteration, the underlying model probably has a far larger number of constituent variables of which these probabilities are themselves a simulated average.)
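A toy version of such a model is easy to write.  The per-state probabilities below are invented, not the New York Times's; each run of the loop is one 'spin of the wheels':

```python
import random

random.seed(1)
# Hypothetical probability that party X wins each contested seat.
state_probs = [0.9, 0.75, 0.6, 0.55, 0.5, 0.45, 0.3, 0.2]
SEATS_NEEDED = 5  # say, of the 8 contested seats

def one_simulation():
    """One election: spin every state's wheel once, count seats won."""
    return sum(random.random() < p for p in state_probs)

N = 100_000
wins = sum(one_simulation() >= SEATS_NEEDED for _ in range(N))
print(wins / N)  # estimated probability party X reaches the threshold
```

The single-button NYT feature is exactly `one_simulation()`; the headline probability is the average over many such runs.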

Inter alia, the use of miniature roulette wheels as a means of presenting probability allows readers to interact to an extent with an uncertain event in a way that might cement a more concrete understanding of what probabilities actually mean in terms of outcomes.  The evidence for this is currently limited but the approach has been considered in a number of contexts.  The science of data visualisation is relatively young though; we are a long way from consensus on the best way to visualise uncertainty to support decision-making most effectively.  Analysts should take an interest in experiments like the New York Times's as the future demand for interactive visualisations is only going to increase.

Monday, 3 November 2014

What do the UK's Terror Threat Levels Mean?

Since 2006, the UK government has published terrorism threat assessments in the form of 'threat levels'.  Originally only one overall threat level was published, but from 24 September 2010, different threat levels for Northern Ireland, Great Britain (both NI-related and domestic) and from international terrorism have been released.  Some explanatory wording is provided, which explains that (for example) the current international terrorism threat level, 'severe', means that an attack is 'highly likely', while the current threat to Great Britain from domestic and NI-related terrorism, 'moderate', means an attack is 'possible but not likely'.

However, these descriptive wordings are not associated explicitly with any probabilities.  This makes it difficult to apply the threat levels to decision-making.  If we don't know whether, for example, 'highly likely' means '1 in 100 per day' or '1 in 10 per day' then we can't easily use the threat level information to decide whether boarding a plane or travelling to the centre of London with our family is worth the risk.  To be fair to JTAC and the Security Service (who compile the assessments), the threat level is not supposed to affect our behaviour.  The Security Service says that:

"Threat levels in themselves do not require specific responses from the public. They are a tool for security practitioners working across different sectors of what we call the Critical National Infrastructure (CNI) and the police to use in determining what protective security response may be required." 

But can we use the published threat levels, along with data on terrorist attacks in Great Britain and Northern Ireland, to infer which probabilities the threat levels correspond to?  For example, the published GB threat level spent a total of 709 days at 'moderate' between first publication and the last published attack data at the end of 2013.  During those 709 days there were 16 attacks in Great Britain which were attributable to domestic or NI-related actors.  A best guess for the daily probability of an attack in this category is therefore about 1 in 44, or an attack somewhere in Great Britain roughly every six weeks.

But do we have enough information to make these kinds of inferences?  Well, we can make some inferences along these lines, because we do at least have some information.  It is very unlikely, for example, that a 'moderate' threat level corresponds to a daily probability of 1 in 1000, because then 16 attacks in just 709 days would be an extraordinarily high figure.  Likewise, the daily probability is not likely to be 1 in 10 - for then the observed number of attacks would be implausibly low.  This kind of reasoning - "if this were true, how likely are the data?" - is actually the engine of all probabilistic inference, and the probabilities it generates are called 'likelihoods'.  The likelihood associated with a hypothesis isn't the same as its probability, but it is a key component of its estimation using data.
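This likelihood reasoning can be sketched directly from the figures quoted above, under the simplifying assumption that each day sees at most one attack (so a binomial model applies):

```python
from math import comb

DAYS, ATTACKS = 709, 16  # days at 'moderate', attacks observed

def likelihood(p_daily):
    """Binomial probability of exactly 16 attack-days in 709 days,
    given a daily attack probability p_daily."""
    return comb(DAYS, ATTACKS) * p_daily**ATTACKS * (1 - p_daily)**(DAYS - ATTACKS)

for p in [1/1000, 1/100, 1/44, 1/10]:
    print(f"1 in {round(1/p)}: {likelihood(p):.3g}")
```

The likelihood peaks near 16/709 (about 1 in 44): hypotheses of 1-in-1000 or 1-in-10 both make the observed data wildly improbable, which is exactly the "if this were true, how likely are the data?" argument in the text.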

Taking the example of non-international, GB-targeted terrorism, and using the data to estimate likelihood functions for the rates of terrorist attacks under a 'substantial' and a 'moderate' threat level (the only two levels that have applied to this category of attack), produces the following:

What this tells us is that, in the absence of any other information about what the terms mean, the data are consistent with a 'substantial' threat level equating to somewhere between a 1-in-200 and 1-in-50 probability of an attack on a given day, and with a 'moderate' threat level equating to somewhere between a 1-in-100 and 1-in-25 daily probability of an attack.  Interestingly, the data suggest (but do not absolutely confirm) that the attack rate is lower under the higher, 'substantial', threat level.  We don't know why this is so, but it is consistent with a number of hypotheses; perhaps the response levels do effectively deter terrorism, for example, or perhaps threat levels merely lag, rather than predict, changes in terrorist attack rates.

We can use this kind of approach to make tentative inferences about what probabilities of attacks the other threat levels correspond to.  Northern Ireland has had a threat level of 'severe' since the first publication; this is consistent with a daily attack probability of just less than 1-in-5.  Meanwhile, the international terrorism threat level of 'severe' corresponds to a substantially lower daily attack probability of somewhere between 1-in-1000 and 1-in-40 (the wider bounds here are due to the fact that there have been around 4 international terrorism incidents - depending on how they're classified - since 2010 compared with over 200 in Northern Ireland).  Meanwhile, the international terrorism threat level of 'substantial' probably corresponds to a daily attack probability of between 1-in-1000 and 1-in-100.

Taking the current threat levels into account, then, the daily probability of a terrorist attack is around 20% in Northern Ireland, around 1-4% in Great Britain from domestic or NI-related attacks, and around 0.1-1% from international terrorist attacks.

Tuesday, 28 October 2014

Vagueness as an Optimal Communication Strategy

Imagine the following game.  You are given a random number between 1 and 100 and must communicate its value to a friend.  But you're only allowed to use the words 'low' and 'high'.  You win £100, less the difference between your number and your friend's guess.  If your number is 40 and she guesses 60, you'll win £80 (£100 less the £20 difference).  If your number is 40 and she guesses 90, you'll only win £50.  You're allowed to discuss strategy in advance, and we will assume that you are risk averse and so are interested in finding a way of maximising your guaranteed win (a 'maximin' strategy).    

One approach is this: you might agree in advance that you will say 'low' if the number is 1-50, and 'high' if it's 51-100.  Your friend would then guess 25 when you say 'low', and 75 when you say 'high'.  That way you'll never win less than £75.

But what happens if you're given the number 50?  In this case, it might be better not to communicate at all.  That way your friend has no information and may choose to minimise losses by guessing 50, guaranteeing (she thinks) a win of at least £50 but in fact (as you know) giving you a win of £100. This would be a form of strategic vagueness: your friend is more likely to make a better decision if you don't use your limited lexicon of 'high' and 'low' at all.
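The arithmetic of the three options for the number 50 is easily checked, assuming the agreed convention (she guesses 25 on 'low', 75 on 'high') and that a silent friend guesses 50 to minimise her own worst case:

```python
def winnings(number, guess):
    """£100 less the difference between the number and the guess."""
    return 100 - abs(number - guess)

number = 50
options = {
    "say 'low'":   winnings(number, 25),  # friend guesses 25
    "say 'high'":  winnings(number, 75),  # friend guesses 75
    "stay silent": winnings(number, 50),  # friend minimises her own risk
}
print(options)  # silence wins £100; either word wins only £75
```

For this one number, using the limited lexicon strictly loses money: vagueness is the optimal strategy.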

"...I couldn't possibly comment."

Given that the aim of analytical communication is accurately to convey information, and that our language, while rich, is nevertheless finite in its expressive range, we should expect to find real-life situations in which a strategically vague response is more likely to induce the correct belief in the receiver of the message.  Of course there are venal reasons for vagueness, including avoidance of accountability.  There is also a range of other philosophical theories for the origin of linguistic and epistemic vagueness.  But the possibility that strategic vagueness might be an optimal communications strategy opens some interesting questions for analysts and their customers.    

Friday, 24 October 2014

Terror Threat Levels and Intervals Between Attacks

The shootings in Ottawa on 22 October, which came only two days after a vehicle attack in Quebec, are an anomaly given the phenomenally low incidence of terrorism within Canada.  According to the Global Terrorism Database, only 8 people have been killed in terrorist attacks within Canada since 1970. The investigation is only just underway, but the attacks already seem to have some surprising features.  They are allegedly unconnected, at least in terms of any planning.  The terror threat level in Canada was raised to 'moderate' after the first attack, in response (the authorities say) to 'increased chatter' from radical groups, but the shooter on 22 October, Michael Zehaf-Bibeau, apparently has no direct connection to IS and AQ.  The details, when they emerge, will clearly hold lessons for intelligence analysts.

As in this case, terrorist threat levels often seem to change after an attack has happened.  This sometimes seems a bit like shutting the door after the horse has bolted.  But in fact, as the Canada incident illustrates, terrorist attacks really do come in clumps.  Canada's terrorist attacks are so infrequent that there is an average of about 230 days between them, since 1970.  A purely random process with this kind of interval would mean that only 3% of terrorist attacks would occur within the same week.  In Canada's case, though, nearly one in five terrorist attacks occurs within a week of the last one.  In other words, the rate of terrorist attacks rises by a factor of about six immediately following an attack.  A terrorist attack is therefore a relatively good indicator of another imminent attack - even disregarding any intelligence received - and the authorities almost certainly did the right thing in raising the threat level.
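The 3% figure can be checked with the standard model of a purely random process, in which intervals between events are exponentially distributed:

```python
import math

MEAN_INTERVAL = 230  # days between attacks in Canada since 1970

def p_interval_within(days, mean_interval):
    """Exponential inter-arrival times: P(T <= t) = 1 - exp(-t/mean)."""
    return 1 - math.exp(-days / mean_interval)

p_week = p_interval_within(7, MEAN_INTERVAL)
print(f"{p_week:.1%}")  # about 3% of intervals would be a week or less
print(0.19 / p_week)    # the observed ~19% is about six times higher
```

The gap between the purely-random 3% and the observed near-20% is the clumping effect described above.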

The effect is also clear in countries with relatively high levels of terrorism.  The chart below shows the observed frequency of intervals (in days) between attacks in Northern Ireland since 1970 (in blue) and the frequency that would be observed if attacks came purely randomly (in red).

As the chart shows, attacks are more likely to occur immediately after another attack (whether through reprisals or co-ordinated activity), although the effect is smaller than in Canada's case.  After about 2-3 days of peace the immediacy effect washes out and the daily frequency returns to the long-run average.

Wednesday, 22 October 2014

Cognitive Task Sequencing

While at Los Alamos, Richard Feynman designed an interesting way of debugging computer programs before the computers actually arrived:

From 'Surely You're Joking, Mr Feynman'

Effective application of cognitive structures and analytical techniques is somewhat like this.  Breaking an analytical approach to a question into a series of separate, logically-sequential, cognitive tasks makes each of the stages easier to perform and, as an added bonus, gives you an audit trail for the final answer.  The CIA's Tradecraft Primer suggests the following approach to task sequencing:

In the UK, the MOD and Cabinet Office intelligence analysis guide, Quick Wins for Busy Analysts, contains the following:

What these have in common is the evolution of a project from divergent, creative types of approach to convergent, critical, probabilistic assessment - in other words, from hypothesis generation to hypothesis testing.  Good analytical sequencing is, in short, the application of the scientific method.  In the words of Richard Feynman again:

"In general we look for a new law by the following process. First we guess it. Then we compute the consequences of the guess to see what would be implied if this law that we guessed is right. Then we compare the result of the computation to nature, with experiment or experience, compare it directly with observation, to see if it works. If it disagrees with experiment it is wrong. In that simple statement is the key to science."

A Simple Base Rate Formula

A number of studies have identified 'base rate neglect' as a significant factor undermining forecasting performance even among experts.  A 'base rate' is not easy to define precisely, but in essence it's a sort of 'information-lite' probability which you might assign to something if you had no specific information about the thing in question.  For example, since North Korea has carried out three nuclear tests in about the last nine years, a base rate for another test in the next year would be about one in three, or 33%.  If you're asked to make a judgement about the probability of an event in a forthcoming period of time, you should first construct a base rate, then use your knowledge of the specifics to adjust the probability up or down.  It seems simplistic, but anchoring to a base rate has been shown significantly to improve forecasting performance.

If your arithmetic is rusty, you can use the following simple formula to get a base rate for the occurrence of a defined event:

How far AHEAD are you looking?  Call this 'A'.
How far BACK can you look to identify comparable events?  Call this 'B'.  (Make sure the units are the same as for 'A' - e.g. months, years.)
What NUMBER of events of this kind have happened over this timeframe, anywhere?  Call this 'N'.
How big is the POPULATION of entities of which your subject of interest is a part?  Call this 'P'.

Your starting base rate is then given by: (A x N) / (B x P)

For example, suppose we were interested in the probability of a successful coup in Iran in the next five years.

How far AHEAD are we looking? 5 (years)
How far BACK can we look to identify comparable events?  68 (years)
What NUMBER of events of this kind have happened over this timeframe?  223 (successful coups since 1946, according to the Center for Systemic Peace)
How big is the POPULATION of entities (countries, in this case) of which Iran is a part?  The data cover 165 countries

The base rate is therefore: (5 x 223) / (68 x 165) = 0.099, or 9.9%, or more appropriately 'about one in ten'.
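The formula and worked example above can be wrapped in a few lines of code:

```python
def base_rate(ahead, back, n_events, population):
    """Base rate formula: (A x N) / (B x P).

    ahead: forecast horizon (A); back: lookback period in the same
    units (B); n_events: comparable events observed (N); population:
    number of entities the subject belongs to (P).
    """
    return (ahead * n_events) / (back * population)

# The Iran coup example: 5 years ahead, 68 years of data, 223 coups,
# 165 countries.
rate = base_rate(ahead=5, back=68, n_events=223, population=165)
print(f"{rate:.3f}")  # 0.099, i.e. about one in ten
```

As the text says, this is a starting anchor, not a forecast; the answer shifts with how you classify the event and the quality of the data.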

Remember this is just a starting point, not a forecast.  And there isn't just one base rate for an event - it will depend on how you classify the event and how good your data are.  But doing this simple step first will help mitigate a significant bias.

(NB. If you're dealing with events that have no precedents, or if the events are relatively frequent compared to your forecast horizon, you have a different problem on your hands and shouldn't use a simple formula like the one above.)

Monday, 20 October 2014

What Coincidences Tell Us

Today I happened to walk down South Hill Park in Hampstead.  South Hill Park is greatly beloved of fact fans as it's associated with one of the most remarkable coincidences in London's history.  South Hill Park is relatively well-known as the street where Ruth Ellis, the last woman to be hanged in the UK, shot her lover outside the Magdala pub.  What is less well known is that the second-last woman to be hanged in the UK, Styllou Christofi, also (and completely unrelatedly) committed the murder of which she was convicted on South Hill Park, at house number eleven.  The cherry on the aleatory cake is that South Hill Park is, quite unusually, distinctively noose-shaped.

 Ruth Ellis, Styllou Christofi, the noose-shaped South Hill Park

Hardened rationalists though we may be, we have to admit that this is an extremely tantalising confluence of events.  Great coincidences like these demand our attention.  The interesting thing about our reaction to coincidences is what it tells us about our cognitive machinery.

First, though, it is worth noting that, by-and-large, the occurrence of coincidences is not objectively remarkable.  Allowing that the South Hill Park coincidence has - let's say - a one-in-a-million probability, and that there are - getting very cavalier with our concepts - several billion facts, it's a certainty that a large number of the facts constitute coincidences.  Coincidences involving notorious people are a small proportion of the whole because there aren't that many notorious people, but we shouldn't, objectively, be that surprised.  

Why are we, then?  To answer this we need to work out what makes something a coincidence in the first place.  This is not as easy as it seems it should be.  Our first reaction is to say that a coincidence is some kind of low-probability event.  But this falls over very quickly on examination, since almost every event is low probability.  A bridge hand consisting of all thirteen spades is exactly as probable as a bridge hand consisting of 2-K of spades and the ace of hearts, or indeed any other particular combination of cards.  Discovering that all four people in your team share a birthday is exactly as probable as discovering that they all have some other specified but unremarkable combination of birthdays.
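The bridge-hand point is easily verified: every specific 13-card hand is one equally likely subset of the 52 cards, so the 'remarkable' hand and the unremarkable one have identical probabilities.

```python
from math import comb

total_hands = comb(52, 13)           # number of possible bridge hands
p_all_spades = 1 / total_hands       # the thirteen spades
p_arbitrary = 1 / total_hands        # 2-K of spades plus the ace of hearts
print(p_all_spades == p_arbitrary)   # True: equally (im)probable
print(total_hands)                   # 635,013,559,600 possible hands
```

Both hands have a probability of about one in 635 billion; only the all-spades hand feels like a coincidence, which is the puzzle the text goes on to resolve.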

What makes something a coincidence is not, then, directly the probability.  Coincidences seem instead to require data points with shared values across multiple fields.  This is a somewhat abstract way of putting things, but what it means is that:

(Name = Alice, Team = my team, Birthday = 1 June)
(Name = Bob, Team = my team, Birthday = 3 August)
(Name = Charlie, Team = my team, Birthday = 19 December)
(Name = Dave, Team = my team, Birthday = 5 June)

is not a coincidence, because the data have shared values in only one field ('Team'), but that

(Name = Alice, Team = my team, Birthday = 1 June)
(Name = Bob, Team = my team, Birthday = 1 June)
(Name = Charlie, Team = my team, Birthday = 1 June)
(Name = Dave, Team = my team, Birthday = 1 June)

is a coincidence, because the data have shared values in two fields ('Team' and 'Birthday').  It's not about probability - the probabilities of both these datasets are equal - but about features of the data.  The interesting thing is that, put like this, the connection between coincidence and hypothesis testing becomes very clear.  Broadly, shared data values provide evidence for the existence of a hypothesis which posits a lawlike relationship between the two fields; in this case, the data support (to an extent) hypotheses that I only hire people whose birthday is on 1 June, or that babies born in early summer are particularly suited to this kind of work, and so on.  It is only the low prior probability for hypotheses of this kind that leads us, ultimately, to dismiss them as possible explanations, and to accept the data as a 'mere' coincidence.

This doesn't of course stop us wondering at them.  And this is the interesting thing: it suggests that our sense of wonder at coincidences is how a currently-running unresolved hypothesis-generation algorithm expresses itself.  When we get the explanation, the coincidence - and the wonder - go away.  If I tell you that last night I was at a party and met not one but two old schoolfriends who I hadn't seen for twenty years, this is a remarkable coincidence until I tell you it was a school reunion, at which point it stops being interesting at least in one regard.  The data are the same, but the existence of the explanation rubs away the magic.  The feeling of awe we get from coincidences is the feeling of our brain doing science.

That's No Moon

Mimas, everyone's second-favourite moon, has been found to have an unorthodox wobble that is too big for a moon that size with a solid internal structure.  The authors of a paper in Science offer two explanations: that it has a rugby-ball-shaped core, or that it has an internal liquid ocean.  This is a great example of hypothesis generation, which is both a key part of analysis, and risky and difficult.  It is a fairly straightforward matter (assuming you know about moons and gravity and so on) to work out that the observed wobble isn't consistent with a solid core.  It's quite another matter to come up with possible hypotheses that are consistent with the evidence.  Not least, this is because (as a matter of logic) there are an infinite number of hypotheses that could account for any given set of evidence.  This means that there is no algorithm that can exhaustively generate all the possible explanations for a set of data.  How humans do it is still a matter for debate and undoubtedly one of the most impressive features of our cognitive architecture.

Logarithmic Probability Scores

Expressing probability as a percentage is something we're all used to doing, but in many ways it's of limited usefulness.  It's particularly inadequate for communicating small probabilities of the kinds considered in risk assessments such as the UK's National Security Risk Assessment, which looks at a range of risks whose probabilities might vary by orders of magnitude, from, say, one in a million to one in ten.  For these kinds of situations - where we're more interested in the order of magnitude than in the precise percentage - a logarithmic ('log') scale might be a better communication tool, and potentially more likely to support optimal decisions about risk.

On a log scale, a one point increase equates to an increase in magnitude by a constant factor, usually ten (since it makes the maths easier).  A log scale might start at '0' for a 1 in 1,000,000 probability, move to '1' for 1 in 100,000, and so on through to '5' for 1 in 10 (i.e. 10%) and '6' for 1 in 1 (i.e. 100% probable).  (A more mathematically-purist probability scale would actually top out at '0' for 100%, and use increasingly negative numbers for ever-lower probabilities.)  A log scale also brings the advantage that, if paired with a log scale for impact, expected impact - which is a highly decision-relevant metric - can be calculated by simply adding the scores up (since it's equivalent to multiplying together a probability and an impact).  One thing it can't do, though, is express a 0% probability (although arguably nothing should ever be assigned a 0% probability unless it's logically impossible).
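As a minimal sketch (in Python) of the base-10 scale described above, with '0' anchored at a 1-in-1,000,000 probability:

```python
import math

def log_score(probability):
    """Map a probability to the illustrative log scale above:
    0 = 1 in 1,000,000, 5 = 1 in 10, 6 = certainty (base 10).
    A 0% probability can't be scored - its logarithm is undefined."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return 6 + math.log10(probability)

print(log_score(1e-5))  # 1.0 (1 in 100,000)
print(log_score(0.1))   # 5.0
print(log_score(1.0))   # 6.0

# Paired with a log impact scale, log expected impact is just the sum
# of the two scores, since log(p * impact) = log(p) + log(impact).
```

The anchoring at 1-in-a-million is just the convention used in the example above; any fixed anchor works, since only differences between scores matter.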

For these reasons, log scales are used to simplify communications in a number of risk-related domains where the objects of interest vary in order of magnitude.  The Richter Scale is a log scale.  The Torino scale is used to combine the impact probability and kinetic energy for near-earth objects.

Several intelligence analysis organisations have developed standardised lexicons for expressing probabilities, such as the UK Defence Intelligence's uncertainty yardstick (last slide) and the US National Intelligence Council's 'What we Mean When We Say' (p.5) guidance.  To my knowledge, however, there are no standardised log scales used in these areas.  There may be an argument for their development, to enable easier communication and comparison of risk judgements concerning high-impact, low-probability events across the intelligence community and government more widely.  

Friday, 17 October 2014

Pareto Redux: 80:20 in Information Gathering

The so-called 'Pareto Principle' states, of systems to which it applies, that 80% of something is distributed within just 20% of something else, and vice versa.  People will often mention this in the context of allocating effort: if 80% of work is done in the first 20% of time, it might be better to produce five things 80% well rather than one thing 100% well.  Although it's frequently and often inappropriately cited by charlatans as an excuse for doing things badly, the Pareto Principle does have a mathematical basis in that many systems and processes produce a 'power law' distribution that can sometimes roughly follow the '80:20 rule'.

Interestingly, information-gathering, at least within a fairly abstract model, is one of these processes.  'Information' is here defined fairly standardly as anything which would change a probability.  As is often the case, we can use an 'urn problem' to stand in for a range of real-life analytical and decision problems.  Here, there are two urns containing different proportions of black and white balls: Urn A mostly black, Urn B mostly white.  One urn is chosen at random (50-50) and the other destroyed.  You need to work out whether Urn A or Urn B has been chosen - because you win £1m if you guess correctly.  Balls are repeatedly drawn out of the chosen urn and then replaced.

Every black ball that is drawn increases the probability that Urn A is the chosen one.  Every white ball concomitantly reduces it (and increases the probability of Urn B).  The impact on probability is very well-understood: each ball doubles the odds on its respective associated urn.  If we start with odds of 1-1 (representing a 50% probability for each urn), a black ball will increase the odds on Urn A to 2-1.  A second black ball will increase them to 4-1.  If we then draw a white ball, the odds go back down to 2-1, and so on.  If Urn A was chosen, the number of black balls would gradually outpace the number of white balls (in the long run) and the assessed probability of Urn A would creep ever closer to (but never actually reach) 100%.
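This updating process can be sketched in a few lines of Python.  The exact contents of the urns aren't important; any composition in which a black ball is twice as likely from Urn A as from Urn B (e.g. Urn A two-thirds black, Urn B two-thirds white) produces the doubling behaviour described above:

```python
from fractions import Fraction

# Assumed composition: Urn A is two-thirds black, Urn B two-thirds white,
# so each ball carries a likelihood ratio of 2 for its associated urn.
def update_odds(odds_on_a, ball):
    """Multiply the odds on Urn A by the likelihood ratio of the draw."""
    likelihood_ratio = Fraction(2) if ball == 'black' else Fraction(1, 2)
    return odds_on_a * likelihood_ratio

def probability(odds):
    """Convert odds to a probability: odds of 2-1 -> 2/3."""
    return odds / (odds + 1)

odds = Fraction(1)  # 1-1 to start, i.e. 50% for each urn
for ball in ['black', 'black', 'white']:
    odds = update_odds(odds, ball)

print(odds)                      # 2, i.e. back down to 2-1 on Urn A
print(float(probability(odds)))  # 0.666...
```

Using exact fractions rather than floats keeps the odds arithmetic clean however many balls are drawn.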

Because the odds ratio - which is what information directly affects - is related to the probability in a non-linear fashion, we end up with a Pareto-Principle-type effect when looking at probability compared to information quantity (i.e. number of balls drawn).  In fact, the relationship between probability and information quantity, on the assumption that the target fact is true, looks like this:
The relationship between information and probability is fairly linear from 50% to about 80%.  Above 90% the curve steepens dramatically, and on the assumption that information continues to be drawn from the same sample space (e.g. the same urn, with replacement) it edges closer and closer to 100% with increasing amounts of information, without ever reaching it.

The implication is something that most analysts realise intuitively: there is a diminishing marginal return to information the closer you are to certainty, in terms of the impact it has on the probability of your target hypothesis.  The amount of information that takes you from 50% to 80% probability will take you only from 80% to about 94%, and then from 94% to about 98%.  Because the expected utility of a decision scales linearly with probability (in all but exceptional cases), there is indeed an 'optimal' level of information at which it will be better simply to make a decision and stop processing information.
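The diminishing returns are easy to tabulate.  Assuming, as in the urn example, that each net confirming draw doubles the odds:

```python
def prob_after(net_confirming_draws, likelihood_ratio=2):
    """Probability of the target hypothesis after a given surplus of
    confirming draws, starting from even (1-1) odds."""
    odds = likelihood_ratio ** net_confirming_draws
    return odds / (odds + 1)

for n in range(0, 9, 2):
    print(n, round(prob_after(n), 3))
# 0 0.5
# 2 0.8
# 4 0.941
# 6 0.985
# 8 0.996
```

Each pair of draws adds the same amount of evidence but a shrinking amount of probability - the Pareto-type effect described above.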

Resource-management under Uncertainty; Bats

The police have reported being over-stretched with the large number of investigations into possible Islamist terrorist activity.  In my experience most analytical organisations feel stretched most of the time, and analysts always worry about missing things.  Curiously though, I haven't heard about any organisations that attempt to match resources to uncertainty in an explicit sort of way.  Apart from in exceptional circumstances, analysts tend to be reallocated within organisations fairly slowly in response to changes in perceived threat levels, and analytical organisations very rarely change significantly in size in the short term.  This is in contrast to many naturally-evolved information-processing systems.

The analytical resource-management task is fairly easy to state in cost-benefit terms.  Analytical work on a topic has a cost, and an expected benefit in terms of the extent to which it reduces uncertainty; it adds value through the possibility of changing a decision.  If you have £10 to bet on a two-horse race, with each horse priced at evens (1-1), and you currently believe Horse A has a 60% probability of winning, you will (if you have to) bet on it and expect to win £10 60% of the time and lose £10 40% of the time, for a net expected profit of £2.  If you are offered new information that will accurately predict the winning horse 90% of the time, your expected profit will (whatever it tells you) rise to £8.  This information would therefore have a net value of £6, and if it costs less than that then you should buy it.
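The arithmetic of that example can be made explicit.  A minimal sketch, using the £10 evens bet from the example above:

```python
def expected_winnings(p_back_winner, stake=10):
    """Expected profit on an evens bet won with probability p_back_winner:
    win the stake that often, lose it the rest of the time."""
    return p_back_winner * stake - (1 - p_back_winner) * stake

def value_of_information(p_now, p_with_info, stake=10):
    """Expected gain from information that raises the chance of
    backing the winner from p_now to p_with_info."""
    return expected_winnings(p_with_info, stake) - expected_winnings(p_now, stake)

print(expected_winnings(0.6))          # 2.0
print(expected_winnings(0.9))          # 8.0
print(value_of_information(0.6, 0.9))  # 6.0 - buy the tip if it costs less
```

The same structure - expected value with and without the information, minus its cost - underlies the more general resource-allocation problem discussed below, even where no general closed-form solution exists.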

Although there is no easily-stated general solution to this problem, you can show how the optimal amount of resources to throw at an analytical problem will change, and roughly how significantly, when the problem parameters change.  When the costs and risks associated with a problem rise, analysis becomes more valuable.  When new information increases uncertainty (i.e. pushes the probabilities associated with an outcome away from 0 or 1), analysis becomes more valuable.  When information becomes easier (cheaper) to gather, analysis becomes more valuable.  The optimising organisation might attempt to measure these things and move resources accordingly - both between analytical topics (within analysis teams) and towards and away from analysis in general (within larger organisations of which only parts produce analysis).

Of course it's not as simple as that, and that's not very simple in the first place.  It's expensive to move analytical resources, not least because it takes time for humans to become effective at analysing a new topic.  This adds an additional dimension of complexity to the problem.  But it surprises me that firms and other analysis organisations don't attempt explicitly to measure the value their analysis adds - perhaps by looking at the relative magnitude of the decisions it helps support - because among other things this would give them a basis on which to argue for more resources, and a framework to help explain why surprises were missed when they were.

Animals have evolved interesting solutions to this problem that we might learn from.  As humans, we do this so naturally we barely notice it.  Under high-risk situations - while driving, negotiating a ski run, or when walking around an antiques shop - our information processing goes into overdrive at the expense of our other faculties such as speech or abstract thought (one theory suggests this is why time seems to slow down when a disaster is unfolding).  Generally, humans sleep at night - switching information-processing to a bare minimum - when threat levels (from e.g. large predators) are lowest in the evolutionary environment.

A bat yesterday

Bats, however, make an interesting study because their information-gathering is easy to measure.  When bats are searching for insects they emit sonar pulses at intervals of around 100ms.  This enables them to 'see' objects up to 30-40 metres away, but with a resolution of only ten times a second.  In the final moments before capture (where the decision-informing benefit of frequent information is much higher) the pulse interval falls to 5ms - 200 times a second - but each pulse only provides information about objects less than about 1 metre away.  Sonar is expensive, though.  A medium-sized bat needs about one moth an hour; this would rise to around twenty moths an hour if its sonar were kept switched on at full power.  Bats have therefore found a solution to the problem of optimising information-acquisition from which analysis organisations perhaps have something to learn.