Disease screening is an important part of medicine, as it can help doctors and patients prioritize who may warrant further testing and/or treatment. A trade-off often accompanies screening: ease of use and low cost (the benefits) are balanced against the sub-optimal accuracy of the screen (the risk). Beyond this balancing act, we cannot stop with the result in hand – the result also requires proper interpretation by the doctor. No test in medicine is perfect. One of the most important skills a physician can develop is an intuition for recognizing false-positive and false-negative results when testing for any disease. This intuition can be illustrated qualitatively with two examples. If I strongly suspect that a patient has a disease, but their screening test comes back negative, I might reasonably wonder whether the result was a false-negative. Consider a physician evaluating a negative screening questionnaire for sleep apnea from a patient who is obese and has other medical problems often linked to sleep apnea (e.g., heart disease, diabetes, difficult-to-control hypertension) – it would be a mistake to blindly trust the screening result in this case. Likewise, if a screening result comes back positive in a patient with a very low risk of sleep apnea, we may reasonably wonder if the result is a false-positive.
How often does this really happen, and how can doctors improve their interpretation of screening tests, especially when the result is unexpected compared with their clinical suspicion? Let’s look at the “STOP-BANG” questionnaire, which has 8 items and is considered the best-validated tool for sleep apnea screening. The acronym stands for snoring, tiredness, observed apnea, pressure (hypertension), BMI >35, age >50, neck circumference >40 cm, and gender (male). When the initial validation of this tool was published over 5 years ago, it was discussed as if the 8-item score indicated a patient’s risk. While this may seem reasonable on the surface, remember that we need that third ingredient of context: how likely was the patient to have sleep apnea in the first place, before we used the screen? Interested readers can see my editorial on how to interpret their initial validation results here. The need to combine prior information with current data is exactly where Bayes’ Theorem helps us: the sensitivity and specificity of the test are combined with the pre-test probability of disease to yield a new probability of disease (the “post-test probability”). See our recent blog on this topic for more details.
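The Bayesian update itself is only a few lines of arithmetic. As a minimal sketch, the function below takes the three ingredients – pre-test probability, sensitivity, and specificity – and returns the post-test probability for either a positive or a negative result. The numeric values in the example are purely illustrative assumptions, not the published STOP-BANG figures.

```python
def post_test_probability(pretest, sensitivity, specificity, positive=True):
    """Update a pre-test probability of disease given a screen result,
    using Bayes' Theorem."""
    if positive:
        # P(disease | positive) = sens*p / (sens*p + (1 - spec)*(1 - p))
        num = sensitivity * pretest
        den = num + (1 - specificity) * (1 - pretest)
    else:
        # P(disease | negative) = (1 - sens)*p / ((1 - sens)*p + spec*(1 - p))
        num = (1 - sensitivity) * pretest
        den = num + specificity * (1 - pretest)
    return num / den

# Illustrative numbers only (not the published STOP-BANG performance):
# a pre-test probability of 1/3, sensitivity 0.90, specificity 0.45.
print(post_test_probability(1/3, 0.90, 0.45, positive=True))   # → 0.45
print(post_test_probability(1/3, 0.90, 0.45, positive=False))  # → 0.10
```

Note how even with these made-up numbers, the same pre-test probability leads to very different post-test probabilities depending on the screen result – and, as the next section shows, the same screen result leads to very different post-test probabilities depending on the pre-test probability.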
Because of the ongoing interest in screening for OSA, several publications have emerged since then. Most recently, the original group published a new analysis attempting to improve the screen. Fortunately, they do discuss the pre-test probability. Unfortunately, they continue to claim that the score indicates the patient’s risk, which creates a paradox. In their publication, they performed polysomnography to determine what proportion of patients was normal and what proportions were diagnosed with mild, moderate, and severe sleep apnea. It turns out that about one-third of the patients they tested had at least moderate sleep apnea. Using this value as the pre-test probability, and then updating the probability after a positive STOP-BANG test, the post-test probability of sleep apnea was over 50%. It seems reasonable to call this post-test probability a “high risk” for sleep apnea. But it was >50% because of all three ingredients – not just the test result, but also the pre-test probability. For those with a normal STOP-BANG score, they assign the label “low risk”. One could argue whether the post-test probability here, of 20%, of at least moderate sleep apnea should really be considered “low”, especially since it is far higher than that reported in the general adult population. But that is not the paradox – that comes if we consider a positive result from the very same tool in a population where only 5% are expected to have sleep apnea. Remember, their claim is that a positive screening result means high risk. But in this kind of population, the positive screening result gives us a post-test probability lower than 20% – which the authors just said should be called “low risk”. This apparent paradox is understood, and avoided, simply by remembering Bayes’ Theorem: without knowing all three ingredients, we cannot interpret the test properly. Put another way, the risk is not simply given by the test result – it must be interpreted in context, and that context is the pre-test probability.
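The paradox can be made concrete with a small numerical sketch: the very same positive result from the very same screen yields sharply different post-test probabilities in the two populations. The sensitivity and specificity below are illustrative assumptions chosen for this sketch, not the published STOP-BANG figures.

```python
def ppv(pretest, sens=0.90, spec=0.45):
    """Post-test probability of disease given a POSITIVE screen (Bayes' Theorem).
    Default sens/spec are illustrative, not published STOP-BANG values."""
    return sens * pretest / (sens * pretest + (1 - spec) * (1 - pretest))

# Same positive screen result, two different populations:
print(round(ppv(1/3), 2))   # pre-test probability ~1/3, as in the study cohort
print(round(ppv(0.05), 2))  # pre-test probability 5%, a low-prevalence population
```

With the cohort-like pre-test probability of one-third, the positive screen lands well above the 20% that was labeled “low risk”; with a 5% pre-test probability, the same positive screen lands well below it. Nothing about the test changed – only the pre-test probability did.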
Contributed by: Matt Bianchi MD PhD