ROC Curve

 

Binary Diagnostic Tests: A Friendly Cheat Sheet

One-page Concepts

  • Sensitivity (Se = TPR): ability to correctly flag patients as positive
  • Specificity (Sp = TNR): ability to correctly flag healthy people as negative
  • PPV: if the test is positive, the chance the person truly has the disease
  • NPV: if the test is negative, the chance the person is truly healthy
  • Prevalence: fraction of patients in your population (a prior)

Memory TIP

  • Se: “don’t miss cases” (minimize false negatives)
  • Sp: “avoid false alarms” (minimize false positives)
  • PPV/NPV depend heavily on prevalence (Bayes!)

2×2 Table (Confusion Matrix)

                    True Disease (D)   True No Disease (D^{c})
Test Positive (+)   TP = \(x_1\)       FP = \(x_2\)
Test Negative (-)   FN = \(x_3\)       TN = \(x_4\)
  • Sensitivity (Se, TPR): \(\displaystyle \frac{x_1}{x_1+x_3}=P(+\mid D)\)
  • Specificity (Sp, TNR): \(\displaystyle \frac{x_4}{x_2+x_4}=P(-\mid D^{c})\)
  • PPV: \(\displaystyle \frac{x_1}{x_1+x_2}=P(D\mid +)\)
  • NPV: \(\displaystyle \frac{x_4}{x_3+x_4}=P(D^{c}\mid -)\)
  • Prevalence (Prev): \(\displaystyle P(D)\)
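As a quick sketch (Python; the function name `diagnostic_metrics` and the dict layout are my own, not from the source), the five definitions above map directly onto the four counts:

```python
# Compute Se/Sp/PPV/NPV/prevalence from the 2x2 counts
# (x1 = TP, x2 = FP, x3 = FN, x4 = TN, as in the table above).

def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    total = tp + fp + fn + tn
    return {
        "Se":   tp / (tp + fn),     # P(+ | D)
        "Sp":   tn / (tn + fp),     # P(- | D^c)
        "PPV":  tp / (tp + fp),     # P(D | +)
        "NPV":  tn / (tn + fn),     # P(D^c | -)
        "Prev": (tp + fn) / total,  # P(D)
    }

# Counts from the population-of-10,000 worked example further down:
m = diagnostic_metrics(tp=99, fp=198, fn=1, tn=9702)
print(m)
```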

Why are PPV/NPV sensitive to prevalence? (Bayesian intuition)

\[ PPV=\frac{Se\cdot Prev}{Se\cdot Prev+(1-Sp)(1-Prev)},\qquad NPV=\frac{Sp\,(1-Prev)}{(1-Se)\,Prev+Sp\,(1-Prev)}. \]

  • With a rare disease (Prev ↓), false positives dominate → PPV drops
  • Conversely NPV gets very high (most people are healthy, so a negative is almost surely true)
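A minimal sketch of the two formulas above (plain Python; the helper names `ppv`/`npv` are mine), handy for re-running the Bayes update with your own prevalence:

```python
# PPV/NPV as functions of Se, Sp, and prevalence (Bayes' rule).

def ppv(se: float, sp: float, prev: float) -> float:
    """P(D | +): true positives over all positives."""
    return se * prev / (se * prev + (1 - sp) * (1 - prev))

def npv(se: float, sp: float, prev: float) -> float:
    """P(D^c | -): true negatives over all negatives."""
    return sp * (1 - prev) / ((1 - se) * prev + sp * (1 - prev))

# Rare disease: prevalence 1% with a very good test (Se=0.99, Sp=0.98)
print(round(ppv(0.99, 0.98, 0.01), 3))  # PPV ≈ 0.333
print(round(npv(0.99, 0.98, 0.01), 4))  # NPV ≈ 0.9999
```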

1-minute numeric feel
Prev \(=0.01\), Se \(=0.99\), Sp \(=0.98\)

\[ PPV \approx \frac{0.99\cdot 0.01}{0.99\cdot 0.01+0.02\cdot 0.99} \approx 0.33 \] → Only ~1 in 3 positives are true patients

\(NPV \approx 0.9999\) → A negative is almost certainly healthy

Bigger picture (population 10,000)

  • 100 patients, 9,900 healthy
    • TP = 99, FN = 1 (Se 0.99)
    • TN = 9,702, FP = 198 (Sp 0.98)
  • PPV = \(99/(99+198) \approx 0.333\)
  • NPV = \(9702/(9702+1) \approx 0.9999\)
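The same counts can be regenerated from Prev/Se/Sp alone, which makes a good sanity check (a sketch; the choice of `N` and the rounding are mine):

```python
# Expand Prev/Se/Sp into expected counts for a population of N = 10,000.
N, prev, se, sp = 10_000, 0.01, 0.99, 0.98

diseased = round(N * prev)   # 100 patients
healthy = N - diseased       # 9,900 healthy
tp = round(diseased * se)    # 99 true positives
fn = diseased - tp           # 1 missed case
tn = round(healthy * sp)     # 9,702 true negatives
fp = healthy - tn            # 198 false alarms

print("PPV:", tp / (tp + fp))  # ~0.333
print("NPV:", tn / (tn + fn))  # ~0.9999
```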

Which metric when? (Practical cues)

  • Maximize Se when missing a case is costly (cancer screening, infectious-disease triage)
    Lower the threshold (Se↑) → follow with a high-Sp confirmatory test
  • Maximize Sp when false positives are costly (invasive confirm, expensive therapy)
    Raise the threshold (Sp↑)
  • PPV/NPV are for patient-facing probabilities (“If positive, chance you truly have it is ~%”)
    Recompute with your prevalence via Bayes
  • Don’t use Accuracy alone: with rare diseases, saying “all negative” looks accurate but is useless

Thresholds and ROC (Receiver Operating Characteristic)

What’s a threshold?

Many tests/readers output a score (continuous/ordinal).
Classify as positive if score ≥ cutoff; otherwise negative.

  • Threshold ↑ → Sp↑, Se↓ (conservative: fewer false alarms, more misses)
  • Threshold ↓ → Se↑, Sp↓ (aggressive: fewer misses, more false alarms)

Se and Sp trade off. Fixing a single threshold can be misleading.


ROC curve: a performance map over thresholds

  • x-axis: FPR (= 1 - Sp) (false-positive rate)
  • y-axis: TPR (= Se) (true-positive rate)
  • Sweep thresholds, plot \((\mathrm{FPR}, \mathrm{TPR})\) → ROC curve
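A small sketch of that sweep (Python; the function `roc_points` and the toy scores/labels are made up for illustration, with label 1 = diseased):

```python
# Empirical ROC: one (FPR, TPR) point per candidate threshold.

def roc_points(scores, labels):
    pos = sum(labels)             # number of diseased subjects
    neg = len(labels) - pos       # number of healthy subjects
    pts = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        pts.append((fp / neg, tp / pos))  # (FPR, TPR)
    return pts

scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(roc_points(scores, labels))
```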

AUC (area under curve)

  • 1.0: perfect separation
  • 0.5: random guessing (diagonal)
  • Interpretation: probability a random patient scores higher than a random healthy subject

In practice: threshold-agnostic summary.
Even with similar AUCs, choose by the FPR/TPR region that matters for your use case.
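The probabilistic interpretation gives a direct way to compute AUC without drawing the curve at all (a sketch; ties between scores count as half a win, which matches the usual convention):

```python
# AUC as P(random diseased score > random healthy score), ties count half.

def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```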


Empirical ROC vs Parametric ROC

  • Empirical ROC: compute (Se, Sp) at each threshold → connect the dots
  • Parametric ROC: fit score distributions (e.g., normal for healthy/disease) → can provide AUC uncertainty (CIs)
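For the parametric flavor, a common choice is the binormal model: assume each group's scores are normal, and AUC then has the closed form \(\Phi\big((\mu_D-\mu_H)/\sqrt{\sigma_H^2+\sigma_D^2}\big)\). A sketch with illustrative (not fitted) parameters:

```python
# Binormal AUC: Phi(delta_mu / sqrt(var_H + var_D)), using the stdlib.
from statistics import NormalDist

mu_h, sd_h = 0.0, 1.0   # healthy score distribution (illustrative)
mu_d, sd_d = 1.5, 1.0   # diseased score distribution (illustrative)

auc = NormalDist().cdf((mu_d - mu_h) / (sd_h**2 + sd_d**2) ** 0.5)
print(round(auc, 3))
```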

Field cheat sheet

  1. Define your goal: minimize misses (Se↑) or false alarms (Sp↑)?
  2. Know your prevalence: prior from literature/registry/local data.
  3. Re-compute PPV/NPV by Bayes using your prevalence for patient counseling.
  4. Don’t fixate on one cutoff: use ROC/AUC and set a threshold in your operating region.
  5. Two-stage strategy: 1st screen (Se↑) → 2nd confirm (Sp↑).

One-liner

  • Se/Sp: intrinsic test skill (less affected by prevalence)
  • PPV/NPV: “probability it’s real” for patients (strongly prevalence-dependent)
  • ROC/AUC: fair, threshold-free comparison of overall performance

Tips

True Positive? False Positive?

On an ROC curve, FPR and TPR are plotted on the x and y axes.
“Positive” means the classifier/test said “yes”.

  • True vs False indicates whether that decision was correct.
    • True Positive (TP): actually diseased and called positive.
    • False Positive (FP): actually healthy but incorrectly called positive.

Relationship between TPR and FPR

Consider the tester (a clinician or classifier):

  • Doctor A calls almost everyone positive (very low threshold).
    • TPR high (few missed cases) and FPR high (many false alarms).
  • Doctor B calls almost everyone negative (very high threshold).
    • TPR low and FPR low (misses many true cases, few false alarms).

Great tests aim for high TPR with low FPR.
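Doctor A and Doctor B are just the two ends of the threshold sweep; a tiny sketch (toy scores/labels, hypothetical helper `tpr_fpr`) makes the endpoints explicit:

```python
# TPR/FPR at a single threshold; -inf = call everyone positive (Doctor A),
# +inf = call everyone negative (Doctor B).

def tpr_fpr(scores, labels, threshold):
    pos = sum(labels)
    neg = len(labels) - pos
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / pos, fp / neg

scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(tpr_fpr(scores, labels, float("-inf")))  # Doctor A: (1.0, 1.0)
print(tpr_fpr(scores, labels, float("inf")))   # Doctor B: (0.0, 0.0)
```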


References

  • Source: https://angeloyeo.github.io/2020/08/05/ROC.html