Friday, 5 December 2014

Evidence of Absence

How much absence of evidence do you need for it to constitute evidence of absence?  Despite what the maxim says, all absence of evidence is evidence of absence, assuming that the evidence in question had a chance to show itself but didn't.

The question of when to conclude that something is absent is a surprisingly common problem.  When you are searching a suspect's house, when do you decide to give up looking?  When you are waiting for a lift, when do you conclude the lift is broken?  When you are searching satellite imagery for nuclear facilities, when can you assume they're not there?  When you are looking for ultrasound 'evidence' that a foetus is a boy, when do you conclude it's a girl?

These possible-needle-in-haystack problems (ones where we're not sure the needle is actually there) are all governed by the same underlying information process: in any given period of time, you have some probability of establishing a hypothesis with 100% certainty, but no chance per se of absolutely falsifying it.

It's in there somewhere... maybe

We can put numbers on the question by considering two search methods which represent extreme cases for favourable searching.  The first is when you exhaustively search the haystack, straw by straw, until the needle is found, in very much the way that the Mythbusters did.  The second is when you randomly choose bits of the haystack to search, so that at any given time you might be searching part of the haystack you've already ruled out.

The first process is governed by a simple likelihood function.  Assume it takes a hundred man-hours to search a large haystack for a possible needle.  The probability of not finding it if it's there is (100-t)%, where t is the search time so far in hours (between 0 and 100).  (The probability of not finding it if it's not there is of course always 100%).  The second process - random searching with replacement - is governed by an exponential function, which is something that appears regularly in dynamic information problems.  Assuming the same search rate above, the probability of not finding the needle after t hours, if it's there, is (100e^(-0.01t))%, where e is the base of the natural logarithm, 2.71828...
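The two likelihood functions can be written down directly.  What follows is a minimal sketch rather than anything definitive: the function names and the SEARCH_HOURS constant are made up for illustration, and the 100-hour figure is the assumption from the paragraph above.

```python
import math

SEARCH_HOURS = 100.0  # assumed time to cover the whole haystack once


def miss_prob_systematic(t, total=SEARCH_HOURS):
    """P(needle not found by time t | needle is there), straw-by-straw search."""
    # The unsearched fraction shrinks linearly and hits zero at t = total.
    return max(0.0, 1.0 - t / total)


def miss_prob_random(t, total=SEARCH_HOURS):
    """P(needle not found by time t | needle is there), random search with replacement."""
    # Same search rate, but already-searched straw can be picked again,
    # so the miss probability decays exponentially and never reaches zero.
    return math.exp(-t / total)


# Time at which random searching reaches a 50% chance of success
half_time_random = SEARCH_HOURS * math.log(2)
print(f"{half_time_random:.1f} hours")
```

The half-way point for random searching comes out at 100 x ln 2, or roughly 69 hours, against 50 hours for the systematic search.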

In the first process, you have a 50% chance of finding the needle, if it's there, after half the time.  In the second process you have a 50% chance of finding it after around 70% of the time (100 x ln 2, or about 69 hours).  If, when we start, we believe there's a 50% chance that the needle's in there at all, the graph of the probability it's there, over time, looks like this:
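The curves follow from Bayes' rule applied to "nothing found so far".  Here is a rough sketch that tabulates a few points on each curve, assuming the 50% prior and 100-hour haystack used above; the helper names are hypothetical, chosen only for this example.

```python
import math


def posterior_present(prior, miss_prob):
    """P(needle is there | not found yet), by Bayes' rule.

    If the needle is absent the search always comes up empty, so the
    likelihood of "not found" under absence is 1.  Assumes prior < 1.
    """
    return prior * miss_prob / (prior * miss_prob + (1.0 - prior))


def miss_systematic(t, total=100.0):
    # Exhaustive search: unsearched fraction of the haystack
    return max(0.0, 1.0 - t / total)


def miss_random(t, total=100.0):
    # Random search with replacement: exponential decay
    return math.exp(-t / total)


for t in (0, 25, 50, 75, 100):
    sys_p = posterior_present(0.5, miss_systematic(t))
    rnd_p = posterior_present(0.5, miss_random(t))
    print(f"t={t:3d}h  systematic={sys_p:.3f}  random={rnd_p:.3f}")
```

Under these assumptions, fifty fruitless hours of systematic searching take the belief from one in two down to one in three, and it reaches zero at a hundred hours.  Random searching is still at 1/(1+e), or about 27%, at the hundred-hour mark.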

So intuition, which suggests systematic searching is better than random searching, is right in this case.  It is possible to imagine search strategies which are worse than random searching with replacement, which would involve being more likely to search areas you'd already looked at.  But in general, if it would take you time T to search the whole area, absence of evidence after time T will mean that absence is at least e times (about 2.7 times, so nearly three times) more likely, relative to presence, than it was when you started.
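The size of that shift is a likelihood ratio: the odds of absence against presence get multiplied by one over the probability of missing a needle that is really there.  A small sketch, with a hypothetical function name, for a search that has run for the full time T:

```python
import math


def absence_odds_multiplier(miss_prob):
    """Factor by which the odds of absence (versus presence) grow after a fruitless search."""
    # Bayes factor = P(nothing found | absent) / P(nothing found | present)
    #              = 1 / miss_prob
    if miss_prob <= 0.0:
        return math.inf  # an exhaustive search that finds nothing settles it
    return 1.0 / miss_prob


T = 100.0  # assumed time to cover the whole area once

# Random searching with replacement for time T: miss probability is e^(-1)
random_factor = absence_odds_multiplier(math.exp(-T / T))

# Systematic searching for time T: nothing is left unsearched
systematic_factor = absence_odds_multiplier(0.0)

print(f"random: {random_factor:.3f}")
print(f"systematic: {systematic_factor}")
```

For random searching with replacement the factor after time T works out to e, about 2.72, whatever the value of T and whatever the starting belief.  Any strategy that revisits old ground less often than that does better.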
