MGH Sleep Center - Director's Blog

The Tale of the Magic Coin: A Bayesian Tragedy

11/9/2016

 
     I want to sell you this nickel – it has special powers to detect sleep apnea.  When you toss it in front of a patient, heads is a positive test for sleep apnea.  Since we are in the age of evidence-based medicine, I want to convince you this is the real deal, so let me share the data with you.  The experiment was to flip the coin for 180 people already diagnosed with sleep apnea (the true cases) and 20 healthy adults with no sleep disorders (the healthy controls).  We found that the nickel was correct 90% of the time when it came up heads!  In statistical parlance, this means the nickel test has a 90% positive predictive value – excellent!
     Let’s walk through the data calculations, in case you remain skeptical of the magic nickel.  The 2x2 box (Figure A) shows how a standard experiment is set up.  The left column contains the true cases and the right column contains the healthy controls.  These columns represent the true or gold standard status of each person.  The top row contains those who had a positive coin toss (heads), and the bottom row contains those who had a negative coin toss (tails). The sensitivity is the proportion of positive tests from the pool of known cases, and the specificity is the proportion of negative tests from the pool of healthy controls.  The positive predictive value is the proportion of true positives over all who tested positive, while the negative predictive value is the proportion of true negatives over all who tested negative.     
     My coin is so special that it performs this magic even though it has an equal chance of showing heads or tails when you flip it.  Figure B shows that the sensitivity and specificity are each 50%, which is what we would expect for an ordinary coin.  The magic comes when we calculate the positive predictive value: this is defined as the true positives (upper left box) divided by all positives (the sum of the upper left and upper right boxes).  We see this is 90 divided by 90+10 (i.e., 100), which equals 90%!  The math clearly shows the coin has the magic touch!
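For readers who want to check the arithmetic themselves, here is a minimal Python sketch (my own illustration, not part of the original experiment or figures) that rebuilds the fair-coin 2x2 table and computes the four quantities defined above, using the same counts as the example: 180 people with sleep apnea, 20 healthy controls, and heads for half of each group.

```python
# Minimal sketch: 2x2 table for the "magic" fair coin.
# 180 people with sleep apnea (cases), 20 without (controls);
# a fair coin comes up heads (a "positive" test) for half of each group.

cases, controls = 180, 20

tp = cases // 2       # cases with heads (true positives)       = 90
fn = cases - tp       # cases with tails (false negatives)      = 90
fp = controls // 2    # controls with heads (false positives)   = 10
tn = controls - fp    # controls with tails (true negatives)    = 10

sensitivity = tp / (tp + fn)   # positive tests among known cases        -> 50%
specificity = tn / (tn + fp)   # negative tests among healthy controls   -> 50%
ppv = tp / (tp + fp)           # true positives among all positives      -> 90%
npv = tn / (tn + fn)           # true negatives among all negatives      -> 10%

print(f"sensitivity={sensitivity:.0%}  specificity={specificity:.0%}")
print(f"PPV={ppv:.0%}  NPV={npv:.0%}")
```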
     WAIT - Where is the trick?  How can a fair coin, randomly showing up heads or tails, correctly predict a person’s disease status?  Luckily, with a little help from Bayes’ Theorem we can see right through this smoke screen.  In this experiment, 90% of the people tested had sleep apnea (180 of the 200 total).  The coin flip was really just a random result.  It came up heads for half of the people, as we would expect for a normal coin.  If we looked at a random group from our experiment (say, half of them), we would expect 90% of them to have sleep apnea.  The coin did just that – randomly “picked” half of the people as heads – and thus it didn’t tell us anything at all beyond what we already would have guessed.  Likewise, when the coin came up tails, it was only “right” 10% of the time – because only 10% of the population did not have sleep apnea.   
     Bayes’ Theorem tells us that we need three ingredients to make sense of any test result:  sensitivity, specificity, and pre-test probability.  Tragedy awaits any who tread the fields of test interpretation without being armed by all three.  The sensitivity is the proportion of true cases correctly identified by the test as having the disease – the coin has a sensitivity of 50%.  The specificity is the proportion of healthy people the test correctly identifies as NOT having the disease – the coin has a specificity of 50%.  The pre-test probability is the proportion of the population being tested who have the disease – in this experiment, it was 90%.
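For the algebraically inclined, here is a worked version of the calculation using the standard form of Bayes' Theorem, with the pre-test probability written as p:

```latex
\[
\mathrm{PPV} \;=\; P(\text{disease}\mid \text{positive test})
  \;=\; \frac{\text{sensitivity}\cdot p}
             {\text{sensitivity}\cdot p \;+\; (1-\text{specificity})\cdot(1-p)}
  \;=\; \frac{0.5 \times 0.9}{0.5 \times 0.9 + 0.5 \times 0.1}
  \;=\; \frac{0.45}{0.50} \;=\; 90\%
\]
```

With a sensitivity and specificity of 50% each, the test terms cancel and the positive predictive value is simply the pre-test probability of 90%.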
     Now that is a lot of math – if only there were a quick rule of thumb so you can’t be fooled by people trying to sell magic coins… there is!  I call it the “Rule of 100”.  If a test’s sensitivity and specificity add up to 100%, the test is performing at chance level – like a random coin.  In case you are wondering whether this is something special about the 50%-50% coin example, Figure C shows the same calculations as Figure B, but now using a coin that is biased towards heads and comes up heads 90% of the time.  In this case, sensitivity (90%) plus specificity (10%) adds up to 100%, and the positive predictive value is still 90%, just as it was when the “magic” coin was an ordinary fair coin.
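To see that the Rule of 100 is not a quirk of the fair coin, here is a short Python sketch (again my own illustration, not one of the post's figures) that plugs several sensitivity/specificity pairs summing to 100% into Bayes' Theorem at the 90% pre-test probability used in the experiment; every pair yields a PPV equal to the pre-test probability.

```python
# Sketch: whenever sensitivity + specificity = 100%, a test performs at
# chance level and its PPV collapses to the pre-test probability.

def ppv(sensitivity, specificity, pretest):
    """Positive predictive value via Bayes' Theorem."""
    true_pos = sensitivity * pretest
    false_pos = (1 - specificity) * (1 - pretest)
    return true_pos / (true_pos + false_pos)

pretest = 0.90  # 90% of the tested group had sleep apnea

for sens in (0.50, 0.90, 0.99):   # pair each sensitivity with spec = 1 - sens
    spec = 1 - sens
    print(f"sens={sens:.0%}  spec={spec:.0%}  ->  PPV={ppv(sens, spec, pretest):.0%}")
# Every line prints PPV = 90%, i.e., the pre-test probability.
```

In other words, any test obeying sensitivity + specificity = 100% just hands you back whatever pre-test probability you started with.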
     Why would anyone make a test that was nothing more than a random number generator?  It’s the tragedy of failing to consider all three ingredients, and unfortunately it happens more often than we would like.  Although there are many published examples, one recent article highlights the problem: it reported screening results for a sleep apnea tool that violate the Rule of 100, yet the authors still called the test “accurate” (see my blog posting on this paper).
               

Contributor:  Matt Bianchi MD PhD

A reflection on risk:  the ASV safety alert 1 year later

11/2/2016

 

     On May 13, 2015, ResMed announced a safety alert ahead of publication of a large trial of their adaptive servo-ventilation (“ASV”) PAP system for heart failure patients with central apnea [1].  The primary endpoint the trial was designed to assess showed no effect of ASV.  The unexpected finding came in an exploratory subset analysis (not the main goal of the trial): higher cardiovascular mortality in those using ASV (annual risk of 10% versus 7.5%).  On May 15, the American Academy of Sleep Medicine posted its initial response to the announcement.  In June 2015, the annual SLEEP conference hosted a special session in which cogent criticisms and concerns were raised in a balanced discussion about the trial, but the risk announcements had already been made public.  A recent editorial detailed many points of uncertainty about the trial [2].  It remains unknown why the main outcome was negative, and why the subset analysis suggested increased cardiac risk with ASV use.
     As with the controversial SAVE trial results recently published (see the blog entry of September 22, 2016), the details of a study influence how confidently one can interpret the findings.  This should come as no surprise:  extensive effort and resources go into trial design to ensure the data are of the highest quality possible.  The MGH Sleep Division reviewed the ASV trial, called the “SERVE-HF” study, and admittedly our own physicians held diverse opinions about the results, each bringing their own calculus of risk tolerance.  Patients also are entitled to their own risk calculus – ideally formulated together with their treating physicians for any healthcare decision.
     The challenges of the SERVE-HF trial can be summarized in two major categories: experimental design and therapy effectiveness.  In this trial, wearing ASV for sleep apnea versus no treatment for sleep apnea was randomly assigned.  The intention of randomization is to make sure that other factors that could impact the results are evenly distributed (by chance) in each group.  The problem is, it didn’t work for the SERVE-HF trial.  By chance, the group assigned to wear ASV had one crucial difference: a 42% higher rate of anti-arrhythmic medication use.  Why is this important?  Because being on such a drug was associated with greater risk on all three major outcomes (the combined endpoint, all-cause mortality, and cardiovascular mortality).  The trial did not report smoking status or sleeping pill use, both of which are independently associated with mortality in other studies – might these also have been unevenly distributed (by chance) in the ASV group?  Might this have contributed to the excess cardiac mortality in the ASV group?
     Although the investigators made efforts to ensure the machines were working properly, many patients using ASV had significant levels of ongoing sleep apnea according to objective measures, including oximetry.  A machine-reported event index of <10 per hour was taken to indicate adequate therapy; however, recent data suggest that the proprietary algorithms used are too lenient, meaning that breathing problems are often worse than the machine is reporting [3].  In addition, compliance with ASV was modest, only about 3.5 hours per night.  Other details of the study include a benefit with ASV among patients with <20% Cheyne-Stokes pattern, and a non-significant trend toward worsening risk in those with LVEF <30%.
     These and other factors complicate interpretation of the data.  What about the possibility that ASV is actually harmful to certain patients?  One idea is that high pressures and over-ventilation negatively impact cardiac function, and that the patients in this trial were especially vulnerable because of their substantial heart failure.  Years before this trial, my colleague at BIDMC, Dr. Robert Thomas, described several potential concerns about the “complexity” of complex apnea and the reliance on machine algorithms [4].
     As the field struggles with these concerns and awaits future data to help navigate the uncertainty, providers and patients must work together to balance risk-benefit trade-offs.  This may be easier said than done, since risk tolerance can differ among regulators, physicians, and patients.

References:
[1] Cowie et al. (2015) Adaptive Servo-Ventilation for Central Sleep Apnea in Systolic Heart Failure. N Engl J Med. 373(12):1095-105. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4779593/

[2] Javaheri et al. (2016) SERVE-HF: More Questions Than Answers. Chest. 149(4):900-4.

[3] Reiter et al. (2016) Residual Events during Use of CPAP: Prevalence, Predictors, and Detection Accuracy. J Clin Sleep Med. 12(8):1153-8.

[4] Thomas (2011) The chemoreflex and sleep-disordered breathing: man and machine vs. the beast. Sleep Med. 12(6):533-5.

 
Contributor:  Matt Bianchi MD PhD
