I want to sell you this nickel – it has special powers to detect sleep apnea. When you toss it in front of a patient, heads is a positive test for sleep apnea. Since we are in the age of evidence-based medicine, I want to convince you this is the real deal, so let me share the data with you. The experiment was to flip the coin for 180 people already diagnosed with sleep apnea (the true positives) and 20 healthy adults with no sleep disorders (the true negatives). We found that the nickel was correct 90% of the time when it came up heads! In statistical parlance, this means the nickel test has a 90% positive predictive value – excellent!

Let’s walk through the data calculations, in case you remain skeptical of the magic nickel. The 2x2 box (Figure A) shows how a standard experiment is set up. The left column contains the true cases and the right column contains the healthy controls. These columns represent the true or gold standard status of each person. The top row contains those who had a positive coin toss (heads), and the bottom row contains those who had a negative coin toss (tails). The sensitivity is the proportion of positive tests from the pool of known cases, and the specificity is the proportion of negative tests from the pool of healthy controls. The positive predictive value is the proportion of true positives over all who tested positive, while the negative predictive value is the proportion of true negatives over all who tested negative.
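To make the definitions concrete, here is a quick sketch in Python (my own illustration, not from the figures) that computes all four quantities from the coin-toss counts in the experiment: 90 heads and 90 tails among the 180 cases, and 10 heads and 10 tails among the 20 controls.

```python
# 2x2 table for the nickel experiment (Figure A layout):
#              cases (n=180)   controls (n=20)
# heads (+)    TP = 90         FP = 10
# tails (-)    FN = 90         TN = 10
TP, FP, FN, TN = 90, 10, 90, 10

sensitivity = TP / (TP + FN)  # positive tests among the known cases
specificity = TN / (TN + FP)  # negative tests among the healthy controls
ppv = TP / (TP + FP)          # true positives among all who tested positive
npv = TN / (TN + FN)          # true negatives among all who tested negative

print(sensitivity, specificity, ppv, npv)  # 0.5 0.5 0.9 0.1
```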

My coin is so special that it performs magic even though it has an equal chance of showing heads or tails when you flip it. Figure (B) shows that the sensitivity and specificity are each 50%, which would be expected for an ordinary coin. The magic comes when we calculate the positive predictive value: this is defined as the true positives (upper left box) divided by all positives (the sum of the upper left and upper right boxes). We see this is 90 divided by 90+10 (i.e., 100), which equals 90%! The math clearly shows the coin has the magic touch!

WAIT - Where is the trick? How can a fair coin, randomly showing up heads or tails, correctly predict a person’s disease status? Luckily, with a little help from Bayes’ Theorem we can see right through this smoke screen. In this experiment, 90% of the people tested had sleep apnea (180 of the 200 total). The coin flip was really just a random result. It came up heads for half of the people, as we would expect for a normal coin. If we looked at a random group from our experiment (say, half of them), we would expect 90% of them to have sleep apnea. The coin did just that – randomly “picked” half of the people as heads – and thus it didn’t tell us anything at all beyond what we already would have guessed. Likewise, when the coin came up tails, it was only “right” 10% of the time – because only 10% of the population did not have sleep apnea.
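If you want to watch the smoke screen dissolve for yourself, a short simulation (my own illustration, not part of the original experiment) makes the point: flip a fair virtual coin for 180 cases and 20 controls, and the heads group, being nothing more than a random sample, ends up roughly 90% diseased.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible
has_apnea = [True] * 180 + [False] * 20  # pre-test probability = 90%

# A fair coin "picks" roughly half the people as heads, at random
heads_group = [d for d in has_apnea if random.random() < 0.5]

# The heads group is just a random sample, so ~90% of it has the disease
fraction_diseased = sum(heads_group) / len(heads_group)
print(len(heads_group), fraction_diseased)
```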

Bayes’ Theorem tells us that we need three ingredients to make sense of any test result: sensitivity, specificity, and pre-test probability. Tragedy awaits any who tread the fields of test interpretation without being armed by all three. The sensitivity is the proportion of true cases correctly identified by the test as having the disease – the coin has a sensitivity of 50%. The specificity is the proportion of healthy people the test correctly identifies as NOT having the disease – the coin has a specificity of 50%. The pre-test probability is the proportion of the population being tested who have the disease – in this experiment, it was 90%.
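Combining the three ingredients is just Bayes’ Theorem applied to a positive test. A minimal sketch (my own, using the standard formula) shows how the coin’s “performance” is entirely a creature of the pre-test probability:

```python
def post_test_probability(sensitivity, specificity, pretest):
    """Probability of disease given a positive test (Bayes' Theorem)."""
    true_pos = sensitivity * pretest               # P(positive and diseased)
    false_pos = (1 - specificity) * (1 - pretest)  # P(positive and healthy)
    return true_pos / (true_pos + false_pos)

# The fair coin in a 90%-prevalence sample: "PPV" of about 90%
print(post_test_probability(0.5, 0.5, 0.9))

# The very same coin in a 10%-prevalence sample: "PPV" of only about 10%
print(post_test_probability(0.5, 0.5, 0.1))
```

The positive result moved the probability nowhere: the post-test probability equals the pre-test probability, which is exactly what chance-level performance means.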

Now that is a lot of math – if only there were a quick rule of thumb so you can’t be fooled by people trying to sell magic coins… there is! I call it the “Rule of 100”. If any test has a sensitivity and specificity that add up to 100%, the test is performing at chance level – like a random coin. In case you are wondering if this is something special about the 50%-50% coin example, Figure (C) shows the same calculations as (B) but now using a coin that is biased towards heads, which comes up 90% of the time. In this case, sensitivity (90%) plus specificity (10%) adds to 100, and we see the positive predictive value is still 90%, just as it was for the ordinary fair coin in the first example.
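The Rule of 100 can be verified by brute force: whenever specificity equals 100% minus sensitivity, the Bayes formula collapses and the positive predictive value equals the pre-test probability, no matter which coin you use. A quick check (my own sketch, not from the figures):

```python
def ppv(sens, spec, pretest):
    # Bayes' Theorem: P(disease | positive test)
    positives = sens * pretest + (1 - spec) * (1 - pretest)
    return sens * pretest / positives

# Any test with sensitivity + specificity = 100% performs at chance:
# its PPV is just the pre-test probability (90% here), nothing more.
for sens in (0.5, 0.9, 0.3):
    spec = 1 - sens  # forces the Rule of 100
    assert abs(ppv(sens, spec, 0.9) - 0.9) < 1e-9
print("PPV = pre-test probability for every chance-level test")
```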

Why would anyone make a test that was nothing more than a random number generator? It’s the tragedy of failing to consider all three ingredients, and unfortunately it happens more often than we would like. Although there are many published examples, one recent article serves to highlight the problem: it reported results for a sleep apnea screening tool that violates the Rule of 100, yet the authors called the test “accurate” (see my blog posting on this paper).

__Contributor__: Matt Bianchi MD PhD