Wednesday, 25 March 2015

Confidence and Probability Part 3: Confidence Intervals

In the last post, we defined a 'confidence statement' as follows:

"I have confidence C that hypothesis H has probability P."

in the context of our quest to find a meaningful interpretation for this kind of thing.

People with some statistical training often reach instinctively for an analogy with 'confidence intervals', applied specifically to the 'probability' element.  Is confidence related to a confidence interval around the 'P' figure?  The idea here is that if we think the true figure for P is between 55% and 65%, perhaps we will have a higher confidence than if we think it's between 40% and 80%.

The idea that analytical confidence - confidence in the stated probability of a hypothesis - is something to do with confidence intervals carries a lot of problematic baggage.  It presupposes that we can talk meaningfully about ranges of probabilities.  This idea is much more complex than it seems, and we'll look at it in a forthcoming post.  But before we look at that idea in detail, it is worth setting out for non-statisticians what a 'confidence interval' is.  In particular, it's important to realise that a 'confidence interval' is a range of hypotheses.  It's not possible to ascribe a 'confidence interval' to a specific hypothesis, such as 'H' in the generic sentence above.  To explain exactly why, we'll need to look in a bit more detail at two types of hypotheses: orderable, and non-orderable.

Orderable and Non-Orderable Hypotheses

A lot of the hypotheses we are interested in for the purposes of everyday decision-making do not in general have a natural ordering about them.  If I'm planning a driving trip and thinking about things that can go wrong, the list of hypotheses might include:

"I run out of petrol."
"I get a flat tyre."
"My car's engine fails."
"I crash into a lamp-post."

and so on.  These hypotheses are not meaningfully orderable.  Of course, you could put them in alphabetical order, or arbitrarily number them, but the phenomena they describe are not naturally more or less of each other; engine failure is not a particularly extreme form of flat tyre, for example.


But that doesn't mean we can't attach probabilities to them, using available data and the usual inferential methods, e.g.:

"I run out of petrol." (p=0.1%)
"I get a flat tyre." (p=0.2%)
"My car's engine fails." (p=0.2%)
"I crash into a lamp-post." (p=0.01%)

If I'm planning a train trip, I might instead consider the following hypotheses:

"The train is delayed by 0-1 minutes"
"The train is delayed by 1-2 minutes"
"The train is delayed by 2-3 minutes"
"The train is delayed by 3-4 minutes"
(and so on, forever)

These hypotheses are meaningfully orderable, in terms of the length of the delay which the hypotheses propose.  As with the non-orderable hypotheses, we can attach probabilities to them, using available data and the usual inferential methods, e.g.:

"The train is delayed by 0-1 minutes" (p=40%)
"The train is delayed by 1-2 minutes" (p=24%)
"The train is delayed by 2-3 minutes" (p=14%)
"The train is delayed by 3-4 minutes" (p=9%)
(and so on, forever)

If you're not used to doing quantitative analysis, you might at this point wonder why the orderable / non-orderable distinction is supposed to be interesting.  The answer is that with orderable hypotheses, a whole range of mathematical techniques become available that simply don't work with the non-orderable type.  For example, we might be able to use a simple mathematical equation to describe the probabilities (a 'distribution') that will then act as a shorthand for the big table of hypotheses above.  In fact, we don't even need to break the hypotheses into minute-long chunks - we can just treat every possible length of delay as a separate hypothesis.  There'll be an infinite number of these, but as far as the maths is concerned, this isn't a problem.  If we drew a picture of the probability distribution, it might look something like this:

We might then be able to find some interesting metrics about this distribution, like its mean, standard deviation, maximum and so on.  One important thing to bear in mind, though, is that each hypothesis - each possible delay-length - has just one probability associated with it, and not a range of them.

(Strictly speaking, each hypothesis has a probability density associated with it.  But if you're a statistician you'll know this already, and if you're not, you probably don't need to worry about the distinction.)
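To make the 'distribution as shorthand' idea concrete, here is a minimal sketch in Python.  It assumes, purely for illustration, that delays follow an exponential distribution whose rate is chosen so that a 0-1 minute delay has probability 40%; the post does not say which distribution lies behind the table, and the names RATE, prob_delay_between and density are invented for this example.

```python
import math

# Assumption: exponential delays, with the rate (per minute) chosen so that
# P(delay between 0 and 1 minutes) = 40%, matching the first row of the table.
RATE = -math.log(1 - 0.40)


def prob_delay_between(start, end, rate=RATE):
    """Probability that the delay falls between start and end minutes."""
    return math.exp(-rate * start) - math.exp(-rate * end)


def density(delay, rate=RATE):
    """Probability density at one exact delay length (not itself a probability)."""
    return rate * math.exp(-rate * delay)


# One equation stands in for the whole table of minute-long hypotheses.
bin_probs = [prob_delay_between(m, m + 1) for m in range(4)]
for m, p in enumerate(bin_probs):
    print(f"The train is delayed by {m}-{m + 1} minutes (p={p:.0%})")
```

Under that assumption the four printed probabilities round to 40%, 24%, 14% and 9%, the same as the table.  Each exact delay length gets a single density value from density(), and each range gets a single probability from prob_delay_between(); neither is a range of probabilities.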

Now a 'confidence interval' is a range of hypotheses that have some relationship to what we might think of as the 'true' value of the thing we're interested in.  A '95% confidence interval' for the train delay length - given whatever information we have - might be 'between 10 seconds and 4 minutes 40 seconds'.  This would mean, roughly, that there is a 95% probability that the train will be delayed by between 10 seconds and 4 minutes 40 seconds.  (This definition is wrong, but not too wrong.  There's no way to define exactly what a confidence interval means without getting overly technical for our purposes.)
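Taking the simplified reading above at face value, an interval of this kind can be sketched as the range of delay lengths that holds the middle 95% of the probability.  The Python below assumes an exponential delay distribution with a hypothetical rate (the post does not specify one), so its endpoints will not match the '10 seconds to 4 minutes 40 seconds' figure, which is only an illustration.

```python
import math

# Assumption: exponential delays with a hypothetical rate per minute.
RATE = -math.log(0.6)


def quantile(q, rate=RATE):
    """Delay length below which a fraction q of the probability lies."""
    return -math.log(1 - q) / rate


def central_interval(coverage=0.95, rate=RATE):
    """Range of delay hypotheses holding the central `coverage` of probability."""
    tail = (1 - coverage) / 2
    return quantile(tail, rate), quantile(1 - tail, rate)


low, high = central_interval(0.95)
print(f"95% of the probability lies between {low:.2f} and {high:.2f} minutes")
```

The output is a pair of delay lengths, that is, a collection of hypotheses; there is nothing here that could be attached to one particular delay length on its own.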

The key thing to note though is this:

A confidence interval is a collection of hypotheses; it's not a concept that can be meaningfully applied to a particular hypothesis. 

With that point established, in the next post we'll look at the various competing theories of 'analytical confidence', before examining them closely to see which ones hold water.

Tuesday, 24 March 2015

Confidence and Probability Part 2: What do Confidence Statements Look Like?

In the last post we set out the problem of defining analytical 'confidence'.  In this post we'll define what 'analytical confidence statements' are, with some examples.

Probabilistic Judgements

Statements of analytical confidence attach levels of confidence to probabilistic judgements. A 'probabilistic judgement' is a probability attached to a hypothesis, such as:

"There is a 90% chance that it will rain tomorrow."

Probabilistic judgements sometimes use words to express probabilities, but they are probabilistic judgements nonetheless:

"It is extremely likely that human influence has been the dominant cause of the observed warming since the mid-20th century."  (IPCC (2013), Climate Change: the Physical Science Basis)

An assessment that something is certain or certainly false is still a probabilistic judgement, albeit one with probability 1 or 0:

"What I believe the assessed intelligence has established beyond doubt is that Saddam has continued to produce chemical and biological weapons."  (UK Government (2002), Iraq's Weapons of Mass Destruction)

So a 'probabilistic judgement' is a proposition, or statement, along with some measure of the probability which is assigned to it.

Confidence Statements

The assignment of confidence levels to probabilistic judgements is thus a further step down the road of qualifying a proposition.  It involves attaching a confidence level to a probabilistic judgement, with some intended effect on its force.  For example:

"The frequency of heavy precipitation events (or proportion of total rainfall from heavy falls) has increased over most areas (likely). Globally, the area of land classified as very dry has more than doubled since the 1970s (likely). There have been significant decreases in water storage in mountain glaciers and Northern Hemisphere snow cover. Shifts in the amplitude and timing of runoff in glacier- and snowmelt-fed rivers, and in ice-related phenomena in rivers and lakes, have been observed (high confidence)." (

In case it isn't clear here, the 'high confidence' at the end is intended to apply to the paragraph as a whole, including the statements flagged as 'likely'.  (The IPCC's guidance on confidence levels is here.)

Here is an example from a US National Intelligence Estimate:

"We continue to assess with low confidence that Iran probably has imported at least some weapons-usable fissile material, but still judge with moderate-to-high confidence it has not obtained enough for a nuclear weapon."

or, from a declassified paper released by the UK's Joint Intelligence Committee (JIC):

"Against that background, the JIC concluded that it is highly likely that the regime was responsible for the CW attacks on 21 August. The JIC had high confidence in all of its assessments except in relation to the regime’s precise motivation for carrying out an attack of this scale at this time – though intelligence may increase our confidence in the future."

Things that Look like Confidence Statements but Aren't what We're Talking About

There are a number of constructions which look like confidence statements, but which are not what we're talking about here.  Sometimes the language of confidence is used to make straightforward probabilistic judgements.  This kind of thing is quite common and includes statements like "I am confident it will rain" when this merely means something like "it is likely to rain".  Problematically, some analytical organisations actively recommend using confidence terms to express probability.  The extent to which probability and confidence are intertwined is something we'll look at shortly, but for clarity we are not thinking here about statements of probability disguised as confidence statements.

Confidence language is also sometimes used to express the credibility of a source of information.  This is the kind of confidence used to express reliability of or doubt about some evidence.  Ratings of this kind are used routinely in intelligence to express qualitative judgements about the nature of sources, but the idea also has a correlate in the information-theoretic concept of channel capacity.  There is a tempting symmetry in the following statements which might make us think that 'source confidence' is related to the kind of confidence we are discussing here:

"I am reasonably confident that there is an 80% chance of rain tomorrow."


"I am reasonably confident in Michael's judgement that there is an 80% chance of rain tomorrow."

However, we should not give in to this temptation.  The two statements are very different.  The second statement lends itself to a straightforward probabilistic interpretation: it suggests that there is some probability (your 'reasonable confidence') that Michael is in good command of his evidence and that his assessment of an 80% chance of rain is well-calibrated, but that there is also a probability that his assessment is poor and that some other probability (e.g. a base rate) would be more appropriate for decision-making purposes.  In theory the probabilities could then be combined (by you) into a final probability that would take account of Michael's judgement (i.e. as a piece of evidence) but not perhaps coincide with it.

But this is not an interpretation we can meaningfully use for:

"I am reasonably confident that there is an 80% chance of rain tomorrow."

unless the speaker is being particularly obtuse.  It is tempting to think that by making the statement above, you are specifying (part of) a probability distribution over a set of probabilities, and that you might clarify and continue in the following sort of manner:

"By 'reasonably confident', I mean that there is a 60% probability that that judgement is correct.  So what I mean is that there is a 60% chance that rain is 80% likely, and I also think there is a 40% chance that rain is instead 20% likely.  There is therefore a 60% x 80% + 40% x 20% = 56% chance of rain tomorrow."

This all sounds plausible.  However, for reasons we will set out later on, it cannot be given a coherent interpretation as meaning anything other than:

"There is a 56% probability of rain tomorrow."

The point here is that while there is a meaningful (but straightforward) interpretation for statements about confidence in third-party probability judgements (like Michael's) this interpretation cannot be made meaningful for statements about one's own judgements.
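The arithmetic in the quoted clarification is just a probability-weighted average, which the following Python sketch spells out.  The function name overall_probability is invented for this example; the weights and probabilities are the ones in the quotation.

```python
def overall_probability(scenarios):
    """Collapse (weight, probability) pairs into one overall probability.

    Each pair says: with this weight, the event has this probability.
    """
    total_weight = sum(weight for weight, _ in scenarios)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("scenario weights must sum to 1")
    return sum(weight * prob for weight, prob in scenarios)


# 60% chance that rain is 80% likely, 40% chance that it is 20% likely.
rain = overall_probability([(0.6, 0.8), (0.4, 0.2)])
print(f"Chance of rain tomorrow: {rain:.0%}")
```

However the weights are chosen, what comes out is a single number, which is why the two-layer statement ends up equivalent to a plain probabilistic judgement.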

In Summary

The kind of statement we are interested in is one in which a confidence level is attached to a probability that is attached to a hypothesis, or, generically:

"I have confidence C that hypothesis H has probability P."

In the next post, we'll take a detour into statistical confidence intervals, primarily to clarify why the kind of confidence we're talking about here can't straightforwardly be interpreted using them.

Wednesday, 18 March 2015

'Confidence' and 'Probability': Introduction

Analysts, researchers, scientists and other seekers of truth very often wish to express a level of confidence in a probabilistic judgement. For example, most people if given some time to try out a fair coin will be happy to say:

"This coin has a 50% chance of coming up heads in the next toss."

and to express 'high confidence' in such a judgement. But if asked a question about the relative ages of celebrities, we might be prepared out of total ignorance to sign up to

"There is a 50% chance that David Cameron is older than Rick Astley."

but would be considerably less happy to do so and would want to express 'low confidence' in our judgement. (Spoiler: Rick is the elder by about eight months.)

More seriously, the issue of the meaning and communication of 'confidence' is of significance in the provision of scientific and other analytical advice to policymakers. Plausibly, the lack of a coherent way of communicating confidence was one reason for the convictions of six Italian scientists and one official following the L'Aquila earthquake of 6 April 2009. Difficulties with expressing confidence bedevil the intelligence community, as expressed by Lord Butler in his review of the intelligence concerning WMDs prior to the Iraq War:

"Such assessments often include warnings that the evidence is thin (and the word ‘Judgement’ is itself a signal to the reader that it is not a statement of fact). But it is not the current JIC convention to express degrees of confidence in the judgement or to include alternative or minority hypotheses. The consequence is that the need to reach consensus may result in nuanced language. Subtleties such as “the intelligence indicates” rather than “the intelligence shows” may escape the untutored or busy reader. We also came across instances where Key Judgements unhelpfully omitted qualifications about the limitations of the intelligence which were elsewhere in the text."

In short, there is widely understood to be a concept of 'confidence' that needs to be attached to judgements so that researchers can express themselves clearly and readers are not misled.

However, although in my experience analysts (and their customers) have strong intuitions regarding confidence, they have a great deal of difficulty expressing what they mean by it. Those with a scientific or statistical training are apt to try to express confidence using wholly unsuitable tools such as (unfortunately-named) confidence intervals. A number of more intuitive approaches have been developed, not all of which are wholly consistent or satisfactory. The apparently-simple concept of 'confidence' turns out to be much more complex than it appears.

In the next few posts, we will explore the concept of confidence in analytical judgements, and present the results of research and surveys conducted by Aleph Insights. The main questions we will seek to answer are:

  • Is it possible to make the concept of 'confidence' in a judgement, distinct from its probability, meaningful?
  • Does our intuitive concept of 'confidence' align to one of these meaningful interpretations?
  • Is it possible to design a consistent communicative tool to express 'confidence', that would be usable by analysts and comprehensible to their customers?

Saturday, 7 March 2015

Transitions between Political Regime Types

Political regimes can be classified in a number of ways.  One important distinction is that which divides democracies from everything else.  Quite apart from democracy's appeal as an end in itself, the evidence suggests that democracies engage less in oppression, human rights abuses, and internal and external uses of political violence.  If one opposed those things, one might support democracy over other systems for instrumental reasons, even if one didn't hold that it was necessarily better at producing effective policies.

There are many theories about what causes countries to transition between democratic and non-democratic forms of government.  Wikipedia lists 14 factors advanced as possible promoters of democratisation, including wealth, education, culture, social equality and so on.  There are also the 'grand narrative' stories about political transition, such as Marxism or Fukuyama-style neoconservatism, in which the adoption of one system or another is seen as the result of a semi-predictable sequence of developments.

The proven difficulty of political forecasting, and the failure of one-size-fits-all theories to anticipate political crises, suggest that regime transition might be driven instead by numerous, possibly unobservable factors that jointly make it impossible to foresee political transitions in anything other than the very short term.  What if, instead of trying - perhaps fruitlessly - to develop complex models of political transition, we use something very simple which relies on no attempt to understand the underlying mechanics and looks instead only at observed frequencies of transition?  What would this 'total ignorance' approach tell us?

The University of Gothenburg's excellent 'Quality of Government' dataset provides (among many other things) a partly-subjective sixfold classification of political regimes into three kinds of democracy (parliamentary, presidential, and mixed), civilian dictatorship, military dictatorship, and royal dictatorship.  Data are not available for every country in every year, but assuming that at a wide zoom angle the picture is not systematically biased, the annual transition frequencies between the three dictatorship types and 'democracy', since 1946, are as follows:

The sizes of the boxes are not scaled to the average numbers of countries within them, and the percentages refer to the frequencies of countries in the box of origin: so on average 3% of military dictatorships become civilian dictatorships the following year, while 1% of democracies become military dictatorships, and so on.  Figures are to the nearest percentage point, and so transition frequencies of less than 0.5% are not shown to avoid clutter.
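For readers who want to see how frequencies of this kind are computed, here is a minimal Python sketch.  The observations list is a made-up toy panel, not the Quality of Government data, and the function name transition_frequencies is invented for this example; the only idea taken from the text is that each percentage is a share of the regimes in the box of origin.

```python
from collections import Counter, defaultdict

# Hypothetical toy panel of (country, year, regime) records.
observations = [
    ("A", 2000, "military"), ("A", 2001, "military"), ("A", 2002, "democracy"),
    ("B", 2000, "democracy"), ("B", 2001, "democracy"), ("B", 2002, "democracy"),
    ("C", 2000, "military"), ("C", 2002, "civilian"),  # 2001 is missing
]


def transition_frequencies(rows):
    """Share of each origin regime found in each regime the following year."""
    regime_at = {(country, year): regime for country, year, regime in rows}
    counts = defaultdict(Counter)
    for (country, year), regime in regime_at.items():
        next_regime = regime_at.get((country, year + 1))
        if next_regime is not None:  # skip pairs with a missing year
            counts[regime][next_regime] += 1
    return {
        origin: {dest: n / sum(dests.values()) for dest, n in dests.items()}
        for origin, dests in counts.items()
    }


freqs = transition_frequencies(observations)
for origin, dests in freqs.items():
    for dest, share in dests.items():
        print(f"{origin} -> {dest}: {share:.0%}")
```

Only consecutive-year pairs are counted, so gaps in the data (like country C's missing 2001) simply contribute nothing rather than being treated as a transition.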

Purely as descriptive statistics, the figures do tell an interesting and convincing story.  For whatever reasons, democracies and royal dictatorships have been the most stable systems.  Civilian and military dictatorships change state more frequently.  A military dictatorship is around four times more likely to transition to democracy than is a civilian one.  The route out of democracy is generally into military dictatorship (there is only one example of transition from democracy to royal dictatorship in the database - Nepal in 2002 - so this path is not shown on the diagram above).  And the royal dictatorship's days seem to be numbered: while dethronements occur occasionally, the creation of new royal dynasties does not.

Largely for fun, we can take an unwarranted step into forecasting by asking what would be implied if these frequencies were inherent features of the systems themselves rather than mere descriptive statistics.  We could then model political regime type as a Markov process, and ask what the 'steady state' numbers of regimes would be.  The answer is that the 'steady state' political world would look like this:

The royal dictatorships have disappeared.  The relative stability of democracy means that around three-quarters of countries are in that box.  But it's not the end of history, thanks to their occasional tendency to lapse into military dictatorships.  In this hypothetical future world, a military coup takes place in a democracy about once every 9 months.  Because it's a steady state, dictatorships go the other way at the same rate.
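A rough Python sketch of the steady-state calculation follows.  The transition probabilities in P are placeholders invented for this example: they respect the few figures quoted in the text (1% of democracies becoming military dictatorships, 3% of military dictatorships becoming civilian ones, military dictatorships four times as likely as civilian ones to democratise, and no route into royal dictatorship), but they are not the dataset's actual frequencies.  The starting point uses the 2008 regime counts given in this post.

```python
STATES = ["democracy", "civilian", "military", "royal"]

# Placeholder annual transition probabilities; each row sums to 1.
# These are NOT the dataset's figures, only values in the same spirit.
P = {
    "democracy": {"democracy": 0.99, "civilian": 0.00, "military": 0.01, "royal": 0.00},
    "civilian":  {"democracy": 0.01, "civilian": 0.97, "military": 0.02, "royal": 0.00},
    "military":  {"democracy": 0.04, "civilian": 0.03, "military": 0.93, "royal": 0.00},
    "royal":     {"democracy": 0.00, "civilian": 0.01, "military": 0.01, "royal": 0.98},
}


def step(shares):
    """Advance the shares of countries in each regime type by one year."""
    return {
        dest: sum(shares[src] * P[src][dest] for src in STATES)
        for dest in STATES
    }


def steady_state(shares, years=5000):
    """Apply the yearly transitions repeatedly until the shares settle."""
    for _ in range(years):
        shares = step(shares)
    return shares


# Start from the 2008 counts quoted in this post.
counts_2008 = {"democracy": 118, "civilian": 24, "military": 38, "royal": 12}
total = sum(counts_2008.values())
start = {state: n / total for state, n in counts_2008.items()}

steady = steady_state(start)
for state in STATES:
    print(f"{state}: {steady[state]:.1%}")
```

With these placeholder numbers the royal share decays to zero and democracy settles at five-sevenths of countries (about 71%), the same qualitative picture as described here; the real frequencies would of course give somewhat different shares.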

If - and it's a big 'if' - the assumptions required to build this model were right, how close would we be to steady state today?  The data only go as far as 2008, unfortunately, but at that time there were 118 democracies, 38 military dictatorships, 24 civilian dictatorships and 12 royal dictatorships.  Not far off, in other words, although this is less surprising given that the model is built using observed transitions.  The one certainty does seem to be that the royal dictatorship has had its day.  The question for Bahrain, Brunei, Jordan, Kuwait, Morocco, Oman, Qatar, Saudi Arabia, Swaziland, Tonga, UAE and Samoa must be: "who's next?"  And can they perform the historically-difficult transition straight to democracy (only Bhutan and Nepal seem to have managed this) or is an interim military or civilian dictatorship inevitable?