Sunday, 30 November 2014

A Simple Rule for Decisions under Uncertainty

Many everyday decisions under uncertainty are fairly simple in structure, and involve two possible options and a single scenario of concern whose probability is of interest.  Insurance is an explicit example here: when we are considering insuring an item against theft, we have to take into account the probability that it will be stolen.  Other decisions are similar, when analysed: the decision to take an umbrella to work will hinge on the probability of rain; the decision to change mobile phone tariffs might hinge on the probability that our usage will exceed a certain level; the decision to search a suspect's house might hinge on the probability that some diagnostic evidence will be found.

For decisions like these, it's helpful to know the 'critical probability' - the probability value above or below which your decision will differ.  There is a relatively easy way to derive this value, which doesn't require anything more than simple arithmetic.  The first step is to find two metrics which we can (for convenience) label 'cost' and 'risk'.

In simple binary decisions of this kind, there are four possible outcomes, each defined by a scenario-decision pair.  When considering insurance, for example, the four outcomes are (theft-no insurance), (theft-insurance), (no theft-no insurance) and (no theft-insurance).  'Risk' and 'cost' (as defined here) are the differences between the values of the two outcomes within each scenario.  In this case, 'risk' is the difference between (theft-insurance) and (theft-no insurance), and 'cost' is the difference between (no theft-no insurance) and (no theft-insurance).  It's easiest to picture as a grid, with the two scenarios as rows and the two options as columns: 'risk' is the gap between the options in the row for the scenario of concern (theft), and 'cost' is the gap between them in the other row (no theft).

When you've calculated 'cost' and 'risk', the 'critical probability' is calculated as follows:

critical probability = cost / (cost + risk)
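One way to see where this rule comes from, assuming we simply want to maximise expected value: write p for the probability of the scenario of concern, V(·) for the value of each outcome, 'act' for the protective option (e.g. insuring) and 'skip' for the other.  The two options are equally good when their expected values match:

```latex
\begin{aligned}
\text{risk} &= V(\text{scenario},\ \text{act}) - V(\text{scenario},\ \text{skip}) \\
\text{cost} &= V(\text{no scenario},\ \text{skip}) - V(\text{no scenario},\ \text{act}) \\
\mathbb{E}[\text{act}] - \mathbb{E}[\text{skip}] &= p \cdot \text{risk} - (1 - p) \cdot \text{cost} \\
\mathbb{E}[\text{act}] = \mathbb{E}[\text{skip}] &\iff p = p^{*} = \frac{\text{cost}}{\text{cost} + \text{risk}}
\end{aligned}
```

For p above p*, acting has the higher expected value; below it, skipping does.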

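We can illustrate this by running with the example of insurance.  Suppose we have a £500 camera that we are taking on holiday and which we are wondering about insuring.  We are offered insurance that costs £20, with a £40 excess.  Setting out the four outcomes: if the camera is stolen, we lose £500 without insurance but only £60 with it (the £20 premium plus the £40 excess), so 'risk' is £440; if it isn't stolen, we lose nothing without insurance and £20 with it, so 'cost' is £20.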

According to the formula above, this gives us a 'critical probability' of 20 / 460, or about 4.3%.  This means that if the probability of theft exceeds this, we should take the insurance.  If the probability is lower, we should risk it.
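A minimal Python sketch of the rule, assuming we simply maximise expected value (no extra weight for risk aversion).  The function and argument names are ours, for illustration; 'act' means the protective option (here, insuring) and 'skip' the alternative.

```python
def cost_and_risk(scenario_act, scenario_skip, no_scenario_act, no_scenario_skip):
    """Return ('cost', 'risk') from the values of the four outcomes.

    'act' is the protective option (e.g. insuring), 'skip' the alternative.
    """
    # 'risk': how much better acting is if the scenario of concern happens
    risk = scenario_act - scenario_skip
    # 'cost': how much worse acting is if the scenario does not happen
    cost = no_scenario_skip - no_scenario_act
    return cost, risk


def critical_probability(cost, risk):
    """Scenario probability at which both options have equal expected value."""
    return cost / (cost + risk)


# Camera example from the text: £500 camera, £20 premium, £40 excess
premium, excess, camera = 20, 40, 500
cost, risk = cost_and_risk(
    scenario_act=-(premium + excess),  # theft, insured: lose £60
    scenario_skip=-camera,             # theft, uninsured: lose £500
    no_scenario_act=-premium,          # no theft, insured: lose £20
    no_scenario_skip=0,                # no theft, uninsured: lose nothing
)
p_star = critical_probability(cost, risk)
print(cost, risk, round(p_star, 3))  # 20 440 0.043
```

Changing the premium or excess and re-running shows how sensitive the critical probability is to the terms on offer.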

Incidentally, the labelling of the two metrics as 'cost' and 'risk', while convenient, is rather arbitrary and depends on the idea of there being a 'do nothing' baseline.  In general, thinking of one of your options as a 'baseline' can be harmful to good decision-making as it can stimulate biases such as the endowment effect.  It's best to think of the two things simply as values that help determine a decision.

Tuesday, 25 November 2014

Failures of Imagination: The Problem with 'Black Swan'

The term 'black swan', as popularised by Nassim Taleb, clearly fills a conceptual niche, given its popularity in the futures community.  Analytical failures can be broadly divided into two categories: failures of imagination, and critical thinking errors.  As used by Taleb, a 'black swan' is a failure of imagination in which, instead of a high-impact hypothesis or outcome being appraised and discounted, it is simply not considered at all.  Taleb's story is that until black swans were actually discovered, it was simply assumed that they did not exist.

"Before the discovery of Australia, people in the Old World were convinced that all swans were white, an unassailable belief as it seemed completely confirmed by empirical evidence."
- The Black Swan, Nassim Nicholas Taleb

There are at least two reasons why the term 'black swan' might be a less than ideal term to use for this concept.  First, and perhaps most boringly, there is plenty of evidence that the concept of a black swan was considered well before the discovery of Australia.  Perhaps more conceptually problematic, though, is that the notion of a black swan can be generated fairly easily and algorithmically by simply combining known colours with known animals.  This makes it a less interesting sort of failure than the failure to consider objects with characteristics that are entirely unlike anything previously encountered.

Necessarily, it is not easy to think of examples.  Candidates might include the arrival of settlers in the New World, quasars, or Facebook.  In Flann O'Brien's remarkable novel 'The Third Policeman', the narrator hears a story about a box which contains a card that is of a completely unimaginable colour, one which sends anyone who sees it insane: "I thought it was a poor subject for conversation, this new colour. Apparently its newness was new enough to blast a man's brain to imbecility by the surprise of it. That was enough to know and quite sufficient to be required to believe. I thought it was an unlikely story but not for gold or diamonds would I open that box in the bedroom and look into it."

Smoking Guns: Binary Tests with Asymmetric Diagnosticity

A binary test is one which produces one of two outputs, which are usually (but not necessarily) thought of as 'positive' or 'negative'.  Metal detectors, pass/fail quality control tests, tests of foetal gender, and university admissions tests are all binary tests.

We often want binary tests to strongly discriminate between cases.  We'd like a metal detector to have a high probability of bleeping in the presence of metal, and a low probability of bleeping in the absence of metal.  Tests with these characteristics will tend to have a broadly symmetric impact on our beliefs.  If our metal detector is doing its job, a bleep will increase the probability of metal being present by some factor, and the absence of a bleep will concomitantly reduce it by a similar factor.  If the detector has, say, a 99% chance of bleeping when metal is present, and only a 1% chance of bleeping when it isn't present, the test will strongly discriminate between cases; a bleep will increase the odds of metal by a factor of 99, and the absence of a bleep will decrease the odds that metal is there by the same factor.

Not all tests are like this.  Some tests have an asymmetric impact on our beliefs.  Some tests for circulating tumour cells (CTCs), for example, have a strongly positive effect on the probability of cancer if detected, but only a relatively weak negative effect if they are absent.  According to one set of published data, just under half of patients with metastatic colorectal cancer (mCRC) tested positive for CTCs, compared to only 3% of healthy subjects.  Assuming these data are right, what does this mean for the impact of this test on the probability of mCRC?

Let us suppose that a patient's symptoms, history, circumstances and so on indicate a 5% probability of mCRC.  If a subsequent CTC test is positive, the probability would rise to around 45% - a change in the odds by a factor of about fifteen (from roughly 19:1 against to a little under evens).  But if it came back negative, the probability would only fall to about 3% - a change in the odds by a factor of less than two.  The impacts of positive and negative results are therefore asymmetric.
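The arithmetic can be checked with likelihood ratios.  A minimal sketch, assuming a hit rate of 0.48 as our reading of 'just under half' (the exact figure from the underlying data isn't given here), with the 3% false-positive rate and 5% prior from the text; the function names are ours.

```python
def to_odds(p):
    """Convert a probability to odds in favour."""
    return p / (1 - p)


def to_prob(odds):
    """Convert odds in favour back to a probability."""
    return odds / (1 + odds)


def likelihood_ratios(hit_rate, false_pos_rate):
    """Likelihood ratios of a positive and of a negative result."""
    lr_pos = hit_rate / false_pos_rate
    lr_neg = (1 - hit_rate) / (1 - false_pos_rate)
    return lr_pos, lr_neg


def posterior(prior, hit_rate, false_pos_rate, positive):
    """Probability of the condition after one test result."""
    lr_pos, lr_neg = likelihood_ratios(hit_rate, false_pos_rate)
    lr = lr_pos if positive else lr_neg
    return to_prob(to_odds(prior) * lr)


# Symmetric test (metal detector): a bleep multiplies the odds by about 99,
# silence divides them by about 99.
print(likelihood_ratios(0.99, 0.01))

# Asymmetric test (CTCs), assuming a 0.48 hit rate: compare the two shifts.
lr_pos, lr_neg = likelihood_ratios(0.48, 0.03)
print(f"positive multiplies odds by {lr_pos:.1f}; negative divides them by {1 / lr_neg:.2f}")
print(f"P(mCRC | positive) = {posterior(0.05, 0.48, 0.03, True):.2f}")
print(f"P(mCRC | negative) = {posterior(0.05, 0.48, 0.03, False):.3f}")
```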

"Don't wait for the translation, answer 'yes' or 'no'!"

In the realm of security and law enforcement, tests of asymmetric diagnosticity are often called 'smoking guns', apparently in homage to a Sherlock Holmes story.  Perhaps the most celebrated example in modern times is the set of photos of Cuban missile sites, taken from a U-2 spy plane, that was presented at the UN by Adlai Stevenson in 1962.  These photos made it near-certain that the USSR was putting nuclear weapons into Cuba.  But if the US had failed to get these photos, it wouldn't have proved that the USSR wasn't doing that.  Incriminating photos are asymmetrically diagnostic.

Evidence that is asymmetrically diagnostic forms an interesting and ubiquitous category.  The enduring but generally-misleading dogma that 'absence of evidence is not evidence of absence' is in fact only true (and even then only partly so) of asymmetrically diagnostic evidence.  Frustratingly though, it's easy to prove that tests like this must only rarely provide a positive result.  If positives were common, their absence would provide exonerating evidence - and we've assumed that isn't the case.  In other words, smoking guns are necessarily rare.  Not because the universe is contrary, but because of the fundamental nature of evidence itself.
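The 'easy to prove' step can be written out.  Write s for the probability of a positive result when the hypothesis H is true and f for the probability when it is false, and suppose (our parametrisation) that a positive is strong evidence, multiplying the odds by at least k > 1, while a negative is weak evidence, multiplying them by no less than 1 - ε for some small ε:

```latex
\frac{s}{f} \ge k \;\Rightarrow\; f \le \frac{s}{k},
\qquad
1 - \varepsilon \;\le\; \frac{1 - s}{1 - f} \;\le\; \frac{1 - s}{1 - s/k}
\;\;\Longrightarrow\;\;
s \;\le\; \frac{k\,\varepsilon}{k - 1 + \varepsilon}.
```

For large k the bound is roughly s ≤ ε, and since f < s the overall positive rate is smaller still.  For example, with k = 99 and ε = 0.05, s can be at most about 5%: a test whose negative result shifts the odds by no more than about 5% can come back positive at most about one time in twenty, even when H is true.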

Thinking in Odds

There are many ways to express something's probability.  Using the usual decimal method (0 to 1) is convenient for a number of reasons, and particularly when thinking in terms of risk and expected outcomes.  But it doesn't handle inference very well - inference being the process of incorporating information into one's probabilistic judgements.  The reason is that the impact of a piece of information on a decimal probability differs depending on what probability you start with.

Suppose you know that one of two urns, A and B, has been chosen randomly, so that the probability it's urn A is 50%; urn A contains a higher proportion of black balls than urn B.  A ball is now drawn from the chosen urn - it's black - and then returned to the urn.  The black ball has raised the probability that urn A was chosen from 50% to 75%.  Another ball is drawn from the same urn, and is also black.  This time, the black ball has only raised the probability that urn A has been chosen from 75% to 90%.  The information is the same, but the effect on your probability is different.  In fact the mathematical impact on probability is quite convoluted to express algebraically.

Thinking in odds makes things a bit simpler.  Drawing a black ball has an evidential odds ratio of 3:1 in favour of urn A; in other words, a black ball is three times as probable if urn A has been chosen as it is if urn B has been chosen.  Starting with odds of 1:1 (i.e. 50%), the first black ball raises the odds to 3:1.  The next has the same effect, raising the odds from 3:1 (75%) to 9:1 (90%).  The odds are raised by the same factor (three) in both cases, respecting the fact that the information was the same.
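A small sketch of the urn example using exact fractions, comparing the odds-form update (multiply by the likelihood ratio) with the same update written directly in probabilities.  Only the 3:1 likelihood ratio comes from the text; the function names are illustrative.

```python
from fractions import Fraction

LR_BLACK = Fraction(3)  # a black ball is 3x as probable under urn A as under urn B


def update_odds(odds, lr):
    """Odds form of Bayes' rule: posterior odds = prior odds x likelihood ratio."""
    return odds * lr


def update_prob(p, lr):
    """The same update written directly in probabilities (messier)."""
    return lr * p / (lr * p + (1 - p))


def odds_to_prob(odds):
    """Convert odds in favour to a probability."""
    return odds / (1 + odds)


odds = Fraction(1)  # 1:1, i.e. P(urn A) = 50%
p = Fraction(1, 2)
for draw in (1, 2):
    odds = update_odds(odds, LR_BLACK)
    p = update_prob(p, LR_BLACK)
    print(f"black ball {draw}: odds {odds}:1, P(urn A) = {p}")
# black ball 1: odds 3:1, P(urn A) = 3/4
# black ball 2: odds 9:1, P(urn A) = 9/10
```

The odds update is the same multiplication each time, while the probability update depends on where you start.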

Expressing probabilities in terms of odds makes it easier to separate our base rate (the probability we started with) from the evidence (the new information that affects the probability).  For this reason, there is growing support for the use of odds to express the evidential power of scientific experiments, as an alternative to the perpetually-counterintuitive significance test.

Of course we shouldn't forget that decimal probabilities and odds merely express the same thing in two different ways.  The difference is in the way we respond to them cognitively.  Broadly, it's easier to use decimal probabilities for thinking about expected outcomes and decision-making, and odds for thinking about information and inference.

Friday, 21 November 2014

How do you Prove a Decision is Bad?

McKinsey has a new report out evaluating the global harm from obesity, and the likely impact of counter-obesity policies.  They compare the costs of obesity - which they estimate at $2tn a year - with the costs of smoking, and of war and violence, which they reckon to be similar.  Presenting estimates like these with any semblance of certainty can only be misleading: they surely rest (as the report makes clear) on a web of inferences and assumptions that could probably be picked over to a considerable level of tedium.

But there is merit in examining McKinsey's premise, which is that obesity is clearly a net harm, thus implying a rationale for government intervention.  This is a superficially appealing premise, but it raises a number of difficult questions.  The traditional economic basis for government intervention is market failure.  This can occur for a number of reasons: if an industry is a natural monopoly, if information is expensive, if there are unmarketable external impacts (e.g. from pollution) and so on.  Government intervention against violence and war rests on a host of market failures including natural monopolies of defence suppliers but most particularly the lack of a 'market' in violence itself (the victims do not generally suffer voluntarily).  The case for intervention against smoking was (at least initially) driven by the negative externalities imposed by smokers on their co-habitees and neighbours.

It is harder to prove a market failure in the case of obesity.  The case for intervention seems to rest on the idea that people are making bad decisions about their diets, and need help to make better decisions through initiatives like smaller plates or restrictions on availability of junk food.  This is a very difficult thing to show.  It is demonstrably the case that people choose to eat food that leads to deleterious health effects.  But this behaviour is consistent with at least two hypotheses: first, that they are choosing rationally, and place a higher value on eating than health, and second, that they are choosing irrationally, and in fact place a higher value on health.  The observed behaviour simply does not allow us to sort between these two hypotheses.

(Of course, obese people probably generally wish they were thin.  But then people who buy expensive handbags probably wish they could have the handbag and their hard-earned money.  Obesity is the cost of a certain kind of diet.   All activities have costs: their existence is not an argument for intervention.)

Is obesity like this stuck dog?

The problem of justifying government intervention with a claim of irrationality is germane to a wide range of policy questions.  But as there is no way to observe irrationality we should be particularly careful to examine arguments for intervention that rest on this claim alone.

Tuesday, 18 November 2014

Ways of Distinguishing Short-term from Long-term

The concepts of the 'short term' and 'long term' are used frequently by forecasters, analysts and organisations with a forward-looking remit.  But in my experience there is little coherence in their definitions.  Some organisations approach this issue by identifying a standard length of time - e.g. 'long term' meaning five or ten years or more.  These definitions are arbitrary.  In economic and decision theory, there are a number of definitions including the 'time taken to return to equilibrium' - a system-centric definition - and the 'time over which all inputs can be considered variable' - a decision-centric definition.

An approach which is perhaps more useful for forecasters is to think of the short-term / long-term distinction as relating to the types of information used.  Short-term forecasts primarily use information about the situation now.  Long-term forecasts primarily use information about base rates.  A short-term forecast will therefore be one whose rationale rests mainly on what is happening now, while a long-term forecast will be largely invariant to the particular time at which it is made.  This distinction is not, of course, a clear-cut one: most forecasts will combine historical base rates with present-focused specifics.  But it is useful to think of the relative weightings placed on these two types of information as a measure of the extent to which a forecast is short- or long-term.

Waves for short-term forecasts, tides for long-term forecasts

Forecasting the weather in one minute's time will almost entirely involve information about what the weather is like now.  Forecasting the weather in one year's time will almost entirely involve historical data and very little information about what the weather is like now.  So in the context of weather forecasting, one minute is clearly short-term and one year is clearly long-term.

In more volatile systems, information about today will have less diagnostic value about the future than in more stable systems.  In this sense the weather is far more volatile than politics, so long-term political forecasts look out far further than long-term weather forecasts.  An information-based distinction between the short and long term captures these differences, and is more powerful because it also encompasses other factors (such as the volume of information collected).
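One way to make the 'relative weightings' idea concrete, under a strong simplifying assumption, is a first-order autoregressive (AR(1)) model, in which the best forecast h steps ahead is the base rate plus φ^h times today's deviation from it.  Today's information then carries weight φ^h and the base rate 1 - φ^h.  The model, the persistence values and the choice to call a forecast 'long-term' once the base rate carries more than half the weight are all illustrative assumptions; the sketch only shows that a system which quickly 'forgets' its present state crosses into the long term much sooner.

```python
def weights(phi, horizon):
    """Weights on today's state and on the base rate in an AR(1) forecast.

    In an AR(1) model the best forecast `horizon` steps ahead is
    base_rate + phi**horizon * (today - base_rate), so today's value gets
    weight phi**horizon and the base rate gets the rest.
    """
    w_now = phi ** horizon
    return w_now, 1 - w_now


def long_term_horizon(phi, cutoff=0.5):
    """First horizon at which the base rate carries more than `cutoff` of the weight.

    Assumes 0 < phi < 1, i.e. a system that eventually 'forgets' its present state.
    """
    h = 0
    while weights(phi, h)[1] <= cutoff:
        h += 1
    return h


# Hypothetical persistence per time step (e.g. per day)
print(long_term_horizon(0.6))   # 2  (fast-forgetting, 'volatile' system)
print(long_term_horizon(0.99))  # 69 (slow-forgetting, 'stable' system)
```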

Thursday, 13 November 2014

Five Hypotheses to Explain a Correlation

A study of a common type was published yesterday, linking the number of fast food outlets with nearby obesity and diabetes rates.  It's a common mantra in the field of methodology that 'correlation is not causation'.  This isn't strictly true.  First, it might be the case that causation is, on a fundamental sort of level, nothing more than a very strong set of correlations.  Second, even if correlation and causation are not semantically equivalent, it's often the case that correlation is extremely good evidence for causation, and this should never be ignored.  Having said that, if people can only remember one thing about 'correlation' from their stats classes, 'correlation is not causation' isn't a bad candidate to promote good practice and scepticism about claims.

When, as an analyst, you have established a correlation between two features of the data, A and B (here, say, A is the local rate of obesity and diabetes and B is the number of fast food outlets), there are always five distinct families of hypothesis you should bear in mind to explain it:

1. A causes B.  Perhaps obese or diabetic people choose to live near fast food restaurants?

2. B causes A.  Perhaps fast food restaurants nearby encourage people to become obese?

3. A and B are caused by a separate factor C.  Perhaps poor people are more likely to be obese or diabetic, and fast food outlets tend to open in poorer areas?

4. The data are a coincidence.  Perhaps it's just chance that these two things occur together in the study data?

5. The data are wrong.  Perhaps diabetics are more likely to be diagnosed in urban areas with more fast food restaurants, and rural diabetics are just not being picked up?

An observed correlation, by itself, will provide evidence in favour of hypotheses in any of these categories.  Only additional features of the data will help you sort between them.

The website Spurious Correlations allows you to generate your own correlations from a number of data sources.  As an analytical exercise, force yourself to come up with causal hypotheses to explain them.
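Hypothesis 4, coincidence, is easy to underestimate, particularly for data that trend over time.  A rough simulation sketch, under our own assumptions (two independent Gaussian random walks of 20 points each, and an arbitrary threshold of |r| > 0.5), estimates how often completely unrelated series look strongly correlated; the function names are illustrative.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def random_walk(n, rng):
    """A trending-looking series with no cause behind it: cumulative random steps."""
    level, walk = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        walk.append(level)
    return walk


def share_strongly_correlated(trials=2000, n=20, threshold=0.5, seed=1):
    """Fraction of pairs of independent random walks with |r| above `threshold`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        r = pearson(random_walk(n, rng), random_walk(n, rng))
        if abs(r) > threshold:
            hits += 1
    return hits / trials


print(f"{share_strongly_correlated():.0%} of unrelated pairs have |r| > 0.5")
```

Replacing the random walks with independent draws (no trend) gives a much lower share, which is one reason trending series are such a rich source of spurious correlations.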

L'Aquila Earthquake Conviction Overturned; Analytical Responsibility

Six of the seven earthquake experts who were convicted of manslaughter in the wake of the L'Aquila quake in 2009 have had their convictions overturned; the seventh, a civil protection official, had his conviction upheld but his sentence reduced.  This will be generally welcomed by scientists and other analysts because of the concern that legal risks would deter honest enquiry.  At the time of the original conviction, medical physicist Professor Malcolm Sperrin was quoted as saying that: "if the scientific community is to be penalised for making predictions that turn out to be incorrect, or for not accurately predicting an event that subsequently occurs, then scientific endeavour will be restricted to certainties only and the benefits that are associated with findings from medicine to physics will be stalled."

This is undoubtedly a valid concern.  But should analysts be entirely exempt from legal redress?  The analyst's role is to inform decisions by identifying relevant outcomes and assigning probabilities to them, via collecting and processing information.  Analysts add value by giving decision-makers an accurate picture of their decisions' likely consequences, so that their decisions under uncertainty can be well-informed.  Like other practitioners - GPs, financial advisers, and engineers, for example - analysts do work that has a practical and (in theory) measurable impact.  It is difficult to argue consistently that analysts should as a matter of principle be entirely protected from legal responsibility for their work.

"You will destroy a great empire..."

In the UK, however, the idea of scientific or analytical responsibility in general has not been sufficiently tested for there to be a clear legal position.  The government has issued principles governing scientific advice for policy but these are primarily ethical and relate to the motives of the advisers - conflicts of interest and so on - without touching on any requirement for methodological rigour.  Is it possible to imagine a legal code for analysts, failure to adhere to which might lead to culpability for negligence?  There are, it seems to me, two main problems with any project to build a legally workable notion of responsibility for analysts: first, that of measuring an analyst's performance in the first place, and second, that of establishing that they had a causal role in any wrongs incurred.

The first problem is that of ascertaining whether an analyst's work was poor enough to be regarded as negligent.  It isn't a hopeless endeavour to try to measure some aspects of an analyst's performance, as the Good Judgment Project has demonstrated.  But there are other key aspects of analysis - in particular, the generation of novel hypotheses or identification of new scenarios - for which we must almost entirely rely on analysts' brains as black boxes.  Could failure of imagination ever be proved negligent?  It's difficult to imagine how, given the obscurity of this kind of creative analytical activity.

The second problem is that of establishing an analyst's impact on a decision-maker.  Any given decision depends not just on analysis but on a range of other factors, including the objectives of the decision-maker, their appetite for risk, their resources and constraints and so on.  To prove civil negligence, a claimant must establish that their damage would not have happened but for the defendant's actions.  This would be hard to prove for most decisions.  To prove criminal negligence or malfeasance on the part of an analyst would seem to require a standard of proof that was higher still.

Having said all this, analysts of every kind want to have impact, and to be taken seriously.  The idea of analysis as distinct from domain expertise is a new one, however, and to some extent we are at the birth of the profession.  There is room for a code of practice for analysis in general, even if building a legal framework around it would be difficult.

Monday, 10 November 2014

Deception; Bats

As well as possibly having solved some interesting problems relating to investment in information-acquisition, bats have now been demonstrated actively to jam one another's sonar to interfere with hunting accuracy. This fact alone allows us to make some inferences about the economic constraints that bats face: for example, that food resources are relatively scarce, that there are limited returns from co-operation over hunting, and that thwarting another bat carries a relatively high individual return.

Animal deception is extremely common.  In all its forms, deception always relies on the same underlying principle: the replication of signals associated with a hypothesis, under circumstances where that hypothesis is false.  Sending 'false' signals in this way is usually costly, so in order to evolve (or to be chosen as optimal by a decision-maker), deception must also have a sufficient chance of inducing a decision error in at least one other player in the same strategic game (i.e. someone whose decision can affect your payoff, e.g. a fellow bat).  This isn't a very demanding set of conditions, hence the ubiquity of deceptive behaviour.

Johnny Chan vs Erik Seidel in the final of the 1988 World Series of Poker

Of course, the evolutionary (and strategic) response to deception is to increase investment in information gathering and processing.  But, as with the bats, it won't necessarily be possible to negate the impact of deception on your own decision-making, and even if it is possible, it won't always be worth the expense.  Poker presents an interesting case study.  Poker is complex enough that optimal strategies have only been explicitly derived for a few of its very restrictive forms.  But it is straightforward to show that, as with many games of asymmetric information, any such strategy must entail opportunistic and sometimes-random bluffing to mask the strength of one's hand. Optimal poker strategy will inevitably entail making mistakes: calling in a position of relative weakness and folding in a position of relative strength.  In poker, as with real life, being deceived doesn't mean you're playing wrongly.
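The point about bluffing and 'mistakes' can be illustrated with a standard toy model from game theory; the model and its numbers are our assumptions, not anything derived in the post.  One player bets B into a pot P holding either an unbeatable hand or a worthless one; the other holds a hand that beats only a bluff.  At equilibrium the bettor bluffs just often enough to make calling break even, and the caller calls just often enough to make bluffing break even.  The sketch checks both conditions and counts how often the caller's choice looks wrong in hindsight.

```python
from fractions import Fraction


def equilibrium(pot, bet):
    """Equilibrium frequencies for the one-street toy game described above."""
    pot, bet = Fraction(pot), Fraction(bet)
    bluff_share = bet / (pot + 2 * bet)  # share of bets that are bluffs
    call_freq = pot / (pot + bet)        # how often the bluff-catcher calls
    return bluff_share, call_freq


def caller_gain_from_calling(pot, bet, bluff_share):
    """Caller's expected gain from calling rather than folding."""
    # Calling wins pot + bet against a bluff and loses bet against a strong hand.
    return bluff_share * (pot + bet) - (1 - bluff_share) * bet


def bettor_gain_from_bluffing(pot, bet, call_freq):
    """Bettor's expected gain from bluffing a weak hand rather than giving up."""
    # The bluff takes the pot if the caller folds and loses the bet if called.
    return (1 - call_freq) * pot - call_freq * bet


b, c = equilibrium(100, 100)
print(f"bluff share {b}, call frequency {c}")  # bluff share 1/3, call frequency 1/2
# Neither player can gain by deviating:
print(caller_gain_from_calling(100, 100, b), bettor_gain_from_bluffing(100, 100, c))  # 0 0
# Yet in hindsight the caller is wrong half the time:
# calling into a strong hand, or folding to a bluff.
hindsight_errors = (1 - b) * c + b * (1 - c)
print(hindsight_errors)  # 1/2
```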

"You know where you are with a complete liar, but when a chap mixes some truth with his yarns, you can't trust a word he says." - The Horse's Mouth (1944), Joyce Cary

Wednesday, 5 November 2014

Organisational Arationality

Aleph Insights' approach to analysing decision failure is based on a four-element model of an idealised decision.  These four elements are discrete decision-tasks: (i) the identification of objectives, (ii) the identification of resources, (iii) the identification of relevant outcomes and (iv) the assignment of probability to those outcomes.

For example, when you are deciding whether or not to take your umbrella to work, your objectives might include staying dry and minimising weight carried, your resources would be the options open to you (carrying an umbrella or not), the relevant outcomes would be the weather events that will affect your objectives, and their probability would be determined by whatever information you had at hand.

When these four elements are in place, they constitute a complete rationale for a particular decision.  They have distinct failure modes, each associated with particular cognitive and organisational characteristics, and carrying a range of predictable effects on decision-making.  For example, insufficient attention to the identification of outcomes - failure of imagination - is associated with organisational or cultural rigidity, and carries a risk of surprise.  Much has been written about these four main categories of decision failure.
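A minimal sketch of the four elements as a data structure, using the umbrella example.  The scores (getting wet costs 10, carrying costs 1), the rain probabilities and the names `Rationale` and `decide` are hypothetical, and expected-value maximisation is our assumed rule for turning a rationale into a decision, not necessarily the one Aleph Insights uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rationale:
    """The four elements of an idealised decision (hypothetical representation)."""
    value: Callable[[str, str], float]  # (i) objectives: score of an (option, outcome) pair
    options: List[str]                  # (ii) resources: the courses of action available
    outcomes: List[str]                 # (iii) relevant outcomes
    probabilities: Dict[str, float]     # (iv) probability assigned to each outcome


def decide(r: Rationale) -> str:
    """Turn a complete rationale into a decision by maximising expected value."""
    def expected_value(option: str) -> float:
        return sum(r.probabilities[o] * r.value(option, o) for o in r.outcomes)
    return max(r.options, key=expected_value)


def umbrella_value(option: str, outcome: str) -> float:
    """Hypothetical scores: getting wet costs 10, carrying an umbrella costs 1."""
    wet = outcome == "rain" and option == "leave umbrella"
    carrying = option == "take umbrella"
    return -10 * wet - 1 * carrying


r = Rationale(
    value=umbrella_value,
    options=["take umbrella", "leave umbrella"],
    outcomes=["rain", "dry"],
    probabilities={"rain": 0.2, "dry": 0.8},
)
print(decide(r))  # take umbrella
```

Here `decide` plays the role of the mechanism that turns a rationale into a decision; the four fields can all be correct while that mechanism is missing or faulty.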

But there is a fifth kind of decision failure that is not often discussed.  We might label it 'arationality': it occurs where there are the outward phenomena of decision-making but no mechanism actually to generate decisions from rationales.  It can occur even where every other component of the decision has been performed properly.

In the case of individuals it is impossible behaviourally (i.e. from the standpoint of an observer) to separate this kind of failure from any of the other failure types.  But arationality is arguably a greater risk when considering the decision-making of corporate persons such as companies, departments of state or states themselves.  In his analysis of the start of the First World War, Jonathan Glover writes:

"...for most people, the outbreak of war is experienced passively.  It comes like the outbreak of a thunderstorm.  Only a few people in governments take part in the discussions.  Negotiations break down: an ultimatum is issued.  The rest of us turn on the television news and find ourselves at war.  Often even the leaders who take the decisions are trapped in a war most of them do not want.  1914 was like that."  (Humanity, Glover 2001)

Organisational arationality is a sort of mechanical failure: a failure of a decision-making machine to produce the right output (the optimal decision) even when given the correct inputs.  We are comfortable thinking of organisations as people, and this is facilitated by our language ("North Korea has issued a statement..."), law, pyramidal hierarchies that lead to a single decision-maker, and by our cognitive metaphors (organisations have heads and arms).  But organisations are not people, even if people are their constituents, and sometimes the parts can all be functioning while the organisation fails.

Tuesday, 4 November 2014

Fun with the Visualisation of Uncertainty

The New York Times has a fun feature outlining a predictive model for today's Senate elections.  The most entertaining part lets you run a single simulation of the outcome by spinning wheels representing all the states.  This isn't just fun.  It's an excellent way of presenting uncertainty and some ways of handling it.  Each time you press the button, you are running one iteration of a 'Monte Carlo' model of the election.

A 'Monte Carlo' model is a way of generating a probability of an outcome by running a large number of simulations of it, where for each simulation the uncertain variables are randomly generated anew.  It's a useful approach if there are many events that interact or combine to produce an outcome.  It's easier, though less satisfying, than calculating a precise probability analytically, but many systems are sufficiently complex that an analytical calculation is not possible anyway.
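A minimal Monte Carlo sketch in the spirit of the spinning wheels, with made-up numbers: five hypothetical contested races with fixed win probabilities for a party X, treated as independent, and a hypothetical threshold of three wins for control.  Because the races are independent here, the exact answer can also be computed, which lets us check the simulation.

```python
import random

# Hypothetical inputs, not real polling: party X's chance of winning each of
# five contested races, and the number of those races it needs for control.
RACE_PROBS = [0.9, 0.7, 0.55, 0.4, 0.3]
SEATS_NEEDED = 3


def simulate_once(probs, rng):
    """One 'spin of the wheels': how many races party X wins this time."""
    return sum(rng.random() < p for p in probs)


def monte_carlo(probs, needed, runs=100_000, seed=42):
    """Estimate P(party X wins at least `needed` races) by repeated simulation."""
    rng = random.Random(seed)
    wins = sum(simulate_once(probs, rng) >= needed for _ in range(runs))
    return wins / runs


def exact(probs, needed):
    """Exact answer for comparison: build up the distribution of wins race by race."""
    dist = [1.0]  # dist[k] = P(exactly k wins among races processed so far)
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)  # lose this race
            new[k + 1] += q * p    # win this race
        dist = new
    return sum(dist[needed:])


print(f"Monte Carlo: {monte_carlo(RACE_PROBS, SEATS_NEEDED):.3f}")
print(f"Exact:       {exact(RACE_PROBS, SEATS_NEEDED):.3f}")
```

With interacting or correlated races the exact calculation quickly becomes impractical, while the simulation only needs a way to generate one joint outcome at a time.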

Elections are relatively easy to present this way because each polity can be considered (at least to an extent) as an independent event, and there are no complicated interactions between results to consider since no result is in the information set of any voter.  So the Monte Carlo component of the model is quite straightforward to visualise.  (Although in this case the probabilities for each state seem to be fixed for each iteration, the underlying model probably has a far larger number of constituent variables of which these probabilities are themselves a simulated average.)

Inter alia, the use of miniature roulette wheels as a means of presenting probability allows readers to interact to an extent with an uncertain event in a way that might cement a more concrete understanding of what probabilities actually mean in terms of outcomes.  The evidence for this is currently limited but the approach has been considered in a number of contexts.  The science of data visualisation is relatively young though; we are a long way from consensus on the best way to visualise uncertainty to support decision-making most effectively.  Analysts should take an interest in experiments like the New York Times's as the future demand for interactive visualisations is only going to increase.

Monday, 3 November 2014

What do the UK's Terror Threat Levels Mean?

Since 2006, the UK government has published terrorism threat assessments in the form of 'threat levels'.  Originally only one overall threat level was published, but since 24 September 2010 separate threat levels have been released for Northern Ireland, for Great Britain (covering both NI-related and domestic terrorism) and for international terrorism.  Each level comes with explanatory wording: for example, the current international terrorism threat level, 'severe', means that an attack is 'highly likely', while the current threat to Great Britain from domestic and NI-related terrorism, 'moderate', means an attack is 'possible but not likely'.

However, these descriptive wordings are not associated explicitly with any probabilities.  This makes it difficult to apply the threat levels to decision-making.  If we don't know whether, for example, 'highly likely' means '1 in 100 per day' or '1 in 10 per day' then we can't easily use the threat level information to decide whether boarding a plane or travelling to the centre of London with our family is worth the risk.  To be fair to JTAC and the Security Service (who compile the assessments), the threat level is not supposed to affect our behaviour.  The Security Service says that:

"Threat levels in themselves do not require specific responses from the public. They are a tool for security practitioners working across different sectors of what we call the Critical National Infrastructure (CNI) and the police to use in determining what protective security response may be required."

But can we use the published threat levels, along with data on terrorist attacks in Great Britain and Northern Ireland, to infer which probabilities the threat levels correspond to?  For example, the published GB threat level spent a total of 709 days at 'moderate' between first publication and the last published attack data at the end of 2013.  During those 709 days there were 16 attacks in Great Britain which were attributable to domestic or NI-related actors.  A best guess for the daily probability of an attack in this category is therefore about 1 in 44, or an attack somewhere in Great Britain roughly every six weeks.

But do we have enough information to make these kinds of inferences?  Well, we have enough information to make some inferences along these lines, because we do at least have some information.  It is very unlikely, for example, that a 'moderate' threat level corresponds to a daily probability of 1 in 1000, because then 16 attacks in just 709 days would be an extraordinarily high figure.  Likewise, the daily probability is not likely to be 1 in 10 - for then the observed number of attacks would be implausibly low.  This kind of reasoning - "if this were true, how likely are the data?" - is actually the engine of all probabilistic inference, and the probabilities it generates are called 'likelihoods'.  The likelihood associated with a hypothesis isn't the same as its probability, but it is a key component of its estimation using data.
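A sketch of the 'if this were true, how likely are the data?' calculation for the 'moderate' period, assuming (our simplification) that each day independently sees an attack with probability p and at most one attack, so the count follows a binomial distribution.  It compares the likelihood of 16 attacks in 709 days under the daily probabilities discussed above, relative to the best-fitting value.

```python
import math

DAYS, ATTACKS = 709, 16  # GB 'moderate' period and attack count, from the text


def likelihood(p, days=DAYS, attacks=ATTACKS):
    """P(exactly `attacks` attack-days in `days` days | daily probability p).

    Assumes independent days with at most one attack each (a binomial model).
    """
    return math.comb(days, attacks) * p ** attacks * (1 - p) ** (days - attacks)


best = ATTACKS / DAYS  # the best-fitting daily probability, about 1 in 44
for label, p in [("1 in 1000", 1 / 1000), ("best fit", best), ("1 in 10", 1 / 10)]:
    print(f"{label}: likelihood relative to best fit = {likelihood(p) / likelihood(best):.1e}")
```

Both extreme hypotheses come out many orders of magnitude less likely than the best fit, which is the formal version of 'extraordinarily high' and 'implausibly low'.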

Taking the example of non-international, GB-targeted terrorism, and using the data to estimate likelihood functions for the rates of terrorist attacks under a 'substantial' and a 'moderate' threat level (the only two levels that have applied to this category of attack), gives a likelihood curve for the daily attack probability under each level.

What this tells us is that, in the absence of any other information about what the terms mean, the data are consistent with a 'substantial' threat level equating to somewhere between a 1-in-200 and 1-in-50 probability of an attack on a given day, and with a 'moderate' threat level equating to somewhere between a 1-in-100 and 1-in-25 daily probability of an attack.  Interestingly, the data suggest (but do not absolutely confirm) that the attack rate is lower under the higher, 'substantial', threat level.  We don't know why this is so, but it is consistent with a number of hypotheses; perhaps the response levels do effectively deter terrorism, for example, or perhaps threat levels merely lag, rather than predict, changes in terrorist attack rates.

We can use this kind of approach to make tentative inferences about what probabilities of attacks the other threat levels correspond to.  Northern Ireland has had a threat level of 'severe' since the first publication; this is consistent with a daily attack probability of just less than 1-in-5.  Meanwhile, the international terrorism threat level of 'severe' corresponds to a substantially lower daily attack probability of somewhere between 1-in-1000 and 1-in-40 (the wider bounds here are due to the fact that there have been around 4 international terrorism incidents - depending on how they're classified - since 2010 compared with over 200 in Northern Ireland).  Finally, the international terrorism threat level of 'substantial' probably corresponds to a daily attack probability of between 1-in-1000 and 1-in-100.

Taking the current threat levels into account, then, the daily probability of a terrorist attack is around 20% in Northern Ireland, around 1-4% in Great Britain from domestic or NI-related attacks, and around 0.1-2.5% from international terrorist attacks.