### Preamble

Simpson's Paradox can occur when an apparent association between two variables $X$ and $Y$ is affected by the presence of a confounding variable, $Z$. In Simpson's Paradox, the confounding is so extreme that the association between $X$ and $Y$ actually disappears or reverses itself after conditioning on the confounder $Z$.

Here's the entire 'statistical confounding' series:

- Part 1: Statistical confounding: why it matters: on the many ways that confounding affects statistical analyses

- Part 2: Simpson's Paradox: extreme statistical confounding (this post): understanding how statistical confounding can cause you to draw exactly the wrong conclusion

- Part 3: Linear regression is trickier than you think: a discussion of multivariate linear regression models

- Part 4: A gentle introduction to causal diagrams: a causal analysis of fake data relating COVID-19 incidence to wearing protective goggles

- Part 5: How to eliminate confounding in multivariate regression: how to do a causal analysis to eliminate confounding in your regression analyses### A famous example: kidney stone treatments

Here is a famous example of Simpson's Paradox occurring in nature, in a medical study comparing the efficacy of kidney stone treatments (here's a link to the original study).

In this example, we are comparing two treatments for kidney stones. The data show that, over all patients, Treatment B is successful in 83% of cases, and Treatment A is successful in only 78% of cases.

However, if we consider only patients with large kidney stones, then Treatment A is successful in 73% of cases, whereas Treatment B is successful in only 69% of cases.

And if we consider only patients with small kidney stones, the Treatment A is successful in 93% of cases, where Treatment B is successful in only 87% of cases.

Suppose you're a kidney stone patient. Which treatment would *you *prefer? Since I'd presumably have either a small kidney stone or a large one, and Treatment A works better for either one, I'd prefer Treatment A. But looking at all patients overall, this result says Treatment B is better. Does this mean that if I don't know what size kidney stone I have, I should prefer Treatment B? (No). Why is this happening?

This is happening because the small-vs-large-kidney stone factor is a confounding variable, as discussed in this post on statistical confounding from last week.

The diagram below shows the causal relationships among three variables applying to every kidney stone patient. Either Treatment A or B is selected for the patient. Either the treatment is either considered successful, or it isn't. And the confounding variable is in red: either the patient has a large kidney stone, or they do not.

The size of the kidney stone, reasonably, has an impact on how successful the treatment is; similarly, we're assuming the treatment choice affects the success of the treatment.

But here's the confounding factor: the stone size, in red, also affects the choice of treatment for the patient. Treatment A is more invasive (it's surgical), and so it's more likely
than Treatment B to be applied to severe cases with larger kidney
stones. Conversely, Treatment B is more likely to be applied to smaller
kidney stone cases, which are lower risk to begin with. Since the size of the kidney stone is influencing the choice of Treatments A vs. B, the causal
diagram has an arrow from the size variable to the Treatment variable. And this is the 'back door', from the stone size variable into the Treatment choice variable, that is causing the confounding.

To see what is actually happening, look at the total numbers of patients in each of the four kidney stone subgroups:

- Treatment A, large stones: 263
- Treatment A, small stones: 87
- Treatment B, large stones: 80
- Treatment B, small stones: 270

Clearly the size of the stone is impacting the treatment choice.

But stone size is also a huge predictor for treatment success: the larger the stone size, the harder it is for *any* treatment to succeed. So a higher proportion of small stone, Treatment B cases succeed than of large stone, Treatment A cases. And that's what's causing Simpson's Paradox.

### Visualizing Simpson's Paradox for count data

During data analysis, we'll break down the total sample of kidney stone patients into subgroups by whether they got Treatment A or B. We can break it down further in any way we choose; for example, we can subset the data by age, by gender, or by both at once. Or we can further subset the patients based on whether they had a large kidney stone. This subsetting will result in groups which we'll denote by $g$.

We can visualize subgroup $g$'s experimental results by placing it in the graph as a vector $\vec{g}$ from the origin to the point $(S_g,T_g)$, where $T_g$ is the number of patients in the group, and $S_g$ is the number of patients in the group with successful outcomes.

In the diagram above, we see that the subgroups Treatment A for small stones, and Treatment B for large stones, were much smaller in length than the other two (because there were fewer trials in those subgroups). But their lengths do not matter when considering the per-group success rates $S_g/T_g$; all that matters is their slopes. Treatment A's slope for small stones is higher than Treatment B's slope for small stones; the same holds the large stone groups. So within each subgroup, Treatment A is more successful.

But if we restrict our attention to the two longest vectors in the middle, we can see that the Treatment B, small stones vector has a higher slope than the Treatment A, large stones vector. This is mainly due to the fact that people with large kidney stones generally have worse outcomes, regardless of how they are treated.

We get the vector corresponding to the combined group in Treatment A by summing the two green Treatment A vectors. Similarly, we sum the two black Treatment B vectors to get the aggregated Treatment B vector. When we do this, we can see that the Treatment B vector has the higher slope.

Simpson's Paradox reversals don't occur often in nature, though there are a few examples (like this one). But subtler forms of statistical confounding definitely do occur, all the time, in settings where they affect the conclusions of observational studies.

## No comments:

## Post a Comment