Linear regression is trickier than you think

 Preamble

In my last two posts, I talked about statistical confounding: why it matters in statistics, and what it looks like when it gets really extreme (Simpson's Paradox).

In my next few blog posts, I want to talk about some tricks for controlling statistical confounding in the context of multivariate linear regression, which is about the simplest kind of model that can be used to relate more than two variables. Although I've taken a full load of statistics classes, including a whole course on multivariate linear regression alone, I never learned how to choose the right variables to include for a desired analysis until I came across the topic in Richard McElreath's book 'Statistical Rethinking'.

In short, it's likely to be something that most machine learning and data science practitioners wouldn't ordinarily pick up in a class on regression, and it's useful and kind of fun. 

Controlling confounding requires drawing hypothetical diagrams of how your variables might relate causally to each other, doing some checks to determine whether the data conflict with the hypotheses, and then using the diagrams to derive sets of variables to exclude and include. It's a nice interplay between high level thinking about causality, and mechanical variable selection. 

This week's post is an introduction where I'll set the stage a bit.

Multivariate linear regression

Multivariate linear regressions are among the first frequentist models you encounter as a statistician. They are used to relate an outcome variable $Y$ in a data set to any number of covariates $X_i$ that accompany it. For example, the height that a tree grows this year, $H$, might be associated with several continuous covariates, such as the number of hours of sunlight it receives per day, $S$, the amount of water it receives per day, $W$, and the iron content of the soil around it, $I$. These variables in turn may be associated with each other; for example, if the tree is not artificially watered, then $S$ and $W$ may be negatively correlated, since the sun doesn't usually shine when it's raining.

The model specification below is for a Bayesian linear regression model with $n$ covariates, and no higher-order terms. The distribution of $Y$ is normal, with a mean that linearly depends on the covariates $X_i$, and a variance parameter. All the parameters have priors, which the model specifies. Models like this are usually fitted using methods that sample the posterior distribution of the parameters given the observed data.  The results of Bayesian model fitting are usually very similar to frequentist model fitting results when there is sufficient data for analysis.

$
\begin{aligned}
Y &\sim N(\mu, \sigma^2) \\
\mu &= \alpha + \beta_1X_1 + \dots + \beta_nX_n \\
\alpha &\sim N(1, 0.5) \\
\beta_j &\sim N(0, 0.2) \text{ for } j=1,...,n \\
\sigma &\sim \text{Exponential}(1)
\end{aligned}
$

The fact that the scale of the modeled parameter $\mu$ is the same as that of $Y$, and the absence of higher-order terms (such as $x_1x_3$), make it easy to interpret the meaning of each slope parameter: $\beta_j$ is the expected change in the value of the outcome variable when the covariate $X_j$ increases by one unit, with the other covariates held fixed. The assumption that this expected change is always the same, independent of the values of the $n$ covariates, is built right into this model.
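To make this concrete, here is a minimal sketch of how a model like this might be written down in PyMC. The data are simulated, the names are mine, and I'm treating the second parameter of each normal prior as a standard deviation rather than a variance; it's meant as an illustration of the specification above, not a recipe.

```python
import numpy as np
import pymc as pm

# Simulated stand-in data (purely illustrative).
rng = np.random.default_rng(42)
n_obs, n_cov = 200, 3
X = rng.normal(size=(n_obs, n_cov))                            # covariates X_1 ... X_n
y = 1.0 + X @ np.array([0.3, -0.2, 0.1]) + rng.normal(scale=0.5, size=n_obs)

with pm.Model() as linear_model:
    alpha = pm.Normal("alpha", mu=1.0, sigma=0.5)              # alpha ~ N(1, 0.5)
    beta = pm.Normal("beta", mu=0.0, sigma=0.2, shape=n_cov)   # beta_j ~ N(0, 0.2)
    sigma = pm.Exponential("sigma", lam=1.0)                   # sigma ~ Exponential(1)
    mu = alpha + pm.math.dot(X, beta)
    pm.Normal("Y", mu=mu, sigma=sigma, observed=y)             # Y ~ N(mu, sigma^2)
    idata = pm.sample(1000, tune=1000)                         # sample the posterior
```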

This model is about as simple a statistical model as you can have for modeling data sets with a lot of variables. But when I was studying multivariate regression, the covariates used for modeling were often chosen without much explanation. Sometimes we would use all the variables available, and sometimes we would only use a subset of them. It wasn't until later that I learned how to choose which variables to include in a multivariate regression model. The choice depends on what you're trying to study, and on the causal relationships among all the variables.

And, of course, you don't know the causal relationships among the variables -- often, this is what you're trying to figure out by doing linear regression -- so you need to consider several possible diagrams of causal relationships.

The ultimate goal is to get statistical models that clearly answer your questions, and don't 'lie'. Actually, statistical models never lie, but they can mislead. Statistical confounding occurs when the apparent relationship between the value of a covariate $X$ and the outcome variable $Y$, as measured by a model, differs from the true causal effect of $X$ on $Y$. The effects of confounding can be so extreme that they result in Simpson's Paradox reversals, where the apparent association between variables is the opposite of the causal association.

It takes some know-how to eliminate confounding. Sometimes you have to be sure to include a variable in a multivariate regression in order to get an unconfounded model; sometimes including a variable will *cause* confounding.

Sometimes, nothing you can do will prevent confounding, because of an unobserved variable. But here is what you can do:

1. You can hypothesize one or more causal diagrams that relate the variables under study. You can consider some that include variables you may not have measured, in order to anticipate problems.
2. You might be able to discard some of these hypotheses, if the implied conditional independence relationships between the variables aren't supported by the data (a rough way to run this kind of check is sketched in the code after this list).
3. You can learn how to choose what variables to include and exclude, on the basis of the remaining hypothetical causal diagrams, to get multivariate regressions that aren't confounded.
4. You can also determine when confounding can't be prevented, because you would need to include a variable that isn't available. 
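To give a feel for step 2, one crude way to check a conditional independence that a candidate diagram implies is to regress both variables on the conditioning variable and correlate the residuals (a partial correlation). The variable names below are hypothetical placeholders, and this linear check is only a rough screen, not the only option.

```python
import numpy as np
from scipy import stats

def partial_corr_test(x, y, z):
    """Correlate the residuals of x ~ z and y ~ z.

    A partial correlation near zero is consistent with 'x independent of y
    given z' (for roughly linear, Gaussian data)."""
    Z = np.column_stack([np.ones_like(z), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return stats.pearsonr(rx, ry)

# Simulated example where z causes both x and y, so x ⊥ y | z should hold.
rng = np.random.default_rng(0)
z = rng.normal(size=500)
x = 2 * z + rng.normal(size=500)
y = -z + rng.normal(size=500)
r, p = partial_corr_test(x, y, z)
print(f"partial correlation: {r:.3f}, p-value: {p:.3f}")
```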

In succeeding posts, I'll show you how to go about doing this yourself.

A new productivity trick



This past week, instead of posting an article from my Zettelkasten, I wrote an article for LinkedIn on a new self-management trick I've been using. I've found that dissociating my 'worker' persona from my 'manager' persona -- literally, pretending that they are different people -- has been a useful aid to getting work planned and done.

I certainly don't want to give the impression that I'm any master of productivity and time management. I'm still looking for the perfect regimen that I can stick to. I've tried quite a few of them, and kept a couple of them; I'm a fan of Getting Things Done and Time Blocking, and I use both of those approaches when the mood to get my ducks in a row comes upon me. I've concluded that the best I can do is to have an arsenal of productivity tricks that I can deploy when I'm feeling uninspired (including the 'split-personality' trick), and to establish firm habits around scheduled work times. I can be found with my butt in my seat at my desk at the usual hours during every work day. That is the only trick I've ever found that really works consistently. 


Simpson's Paradox: extreme statistical confounding

 

Preamble

 

Simpson's Paradox is an extreme example of the effects of statistical confounding, which I discussed in last week's blog post, "Statistical Confounding: why it matters".

Simpson's Paradox can occur when an apparent association between two variables $X$ and $Y$ is affected by the presence of a confounding variable, $Z$. In Simpson's Paradox, the confounding is so extreme that the association between $X$ and $Y$ actually disappears or reverses itself after conditioning on the confounder $Z$.

Simpson's paradox can occur in count data or in continuous data. In this post, I'll talk about how to visualize Simpson's paradox for count data, and how to understand it as an example of statistical confounding.

It isn't actually a paradox; it makes complete sense, once you understand what's going on. It's just that it's not what our intuition tells us should happen. And whether it's 'wrong' depends on what goal you're shooting for. In the example below, if you want to make a choice for yourself based on understanding the relative effectiveness of the two treatments, you'd be best off choosing Treatment A. But if your goal is prediction -- who is likely to do better, a random patient who gets Treatment A or Treatment B? -- you're best off with Treatment B.
 
If that confuses you, keep reading.  
 

A famous example: kidney stone treatments

Here is a famous example of Simpson's Paradox occurring in nature, in a medical study comparing the efficacy of kidney stone treatments (here's a link to the original study).

In this example, we are comparing two treatments for kidney stones. The data show that, over all patients, Treatment B is successful in 83% of cases, and Treatment A is successful in only 78% of cases.

However, if we consider only patients with large kidney stones, then Treatment A is successful in 73% of cases, whereas Treatment B is successful in only 69% of cases.

And if we consider only patients with small kidney stones, Treatment A is successful in 93% of cases, whereas Treatment B is successful in only 87% of cases.

Suppose you're a kidney stone patient. Which treatment would you prefer? Since I'd presumably have either a small kidney stone or a large one, and Treatment A works better for either one, I'd prefer Treatment A. But looking at all patients overall, this result says Treatment B is better. Does this mean that if I don't know what size kidney stone I have, I should prefer Treatment B? (No). Why is this happening?

This is happening because the small-vs-large-kidney stone factor is a confounding variable, as discussed in this post on statistical confounding from last week.

The diagram below shows the causal relationships among three variables applying to every kidney stone patient. Either Treatment A or B is selected for the patient. The treatment is either considered successful, or it isn't. And the confounding variable is in red: either the patient has a large kidney stone, or they do not.


The size of the kidney stone, reasonably, has an impact on how successful the treatment is; similarly, we're assuming the treatment choice affects the success of the treatment. 

But here's the confounding factor: the stone size, in red, also affects the choice of treatment for the patient. Treatment A is more invasive (it's surgical), and so it's more likely than Treatment B to be applied to severe cases with larger kidney stones. Conversely, Treatment B is more likely to be applied to smaller kidney stone cases, which are lower risk to begin with. Since the size of the kidney stone is influencing the choice of Treatments A vs. B, the causal diagram has an arrow from the size variable to the Treatment variable. And this is the 'back door', from the stone size variable into the Treatment choice variable, that is causing the confounding.

To see what is actually happening, look at the total numbers of patients in each of the four kidney stone subgroups:

  • Treatment A, large stones: 263
  • Treatment A, small stones: 87
  • Treatment B, large stones: 80
  • Treatment B, small stones: 270

Clearly the size of the stone is impacting the treatment choice. 

But stone size is also a huge predictor for treatment success: the larger the stone size, the harder it is for any treatment to succeed. So a higher proportion of small stone, Treatment B cases succeed than of large stone, Treatment A cases. And that's what's causing Simpson's Paradox.
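The group sizes above, together with the success counts usually quoted for this study, are enough to reproduce the reversal numerically. Here's a quick sanity check (the success counts below are the commonly cited figures for the kidney stone study, not something I've re-derived from the paper):

```python
# Per-subgroup and aggregated success rates for the kidney stone example.
groups = {
    ("A", "large"): (192, 263),   # (successes, patients)
    ("A", "small"): (81, 87),
    ("B", "large"): (55, 80),
    ("B", "small"): (234, 270),
}

for (arm, size), (succ, total) in groups.items():
    print(f"Treatment {arm}, {size} stones: {succ}/{total} = {succ/total:.0%}")

for arm in ("A", "B"):
    succ = sum(s for (trt, _), (s, n) in groups.items() if trt == arm)
    total = sum(n for (trt, _), (s, n) in groups.items() if trt == arm)
    print(f"Treatment {arm} overall: {succ}/{total} = {succ/total:.0%}")
```

Within each stone size, Treatment A wins (73% vs. 69%, 93% vs. 87%), yet the aggregated rates come out 78% for A and 83% for B.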

Visualizing Simpson's Paradox for count data

 
Suppose we're running an experiment to assess the effect of a variable $x$ on a 'coin flip' variable $Y$. Each time we flip the coin, we'll call that a trial T. Each time $Y$ comes up heads, we'll call that a success S. The graph above has T on the x axis, and S on the y axis. Many experiments are modeled this way. In the kidney stone example, the variable $x$ refers to the choice of treatment, and the variable $Y$ refers to whether it had a successful outcome.

During data analysis, we'll break down the total sample of kidney stone patients into subgroups by whether they got Treatment A or B. We can break it down further in any way we choose; for example, we can subset the data by age, by gender, or by both at once. Or we can further subset the patients based on whether they had a large kidney stone. This subsetting will result in groups which we'll denote by $g$. 

We can visualize subgroup $g$'s experimental results by placing it in the graph as a vector $\vec{g}$ from the origin to the point $(T_g, S_g)$, where $T_g$ is the number of patients in the group, and $S_g$ is the number of patients in the group with successful outcomes.
 
The slope of $\vec{g}$ is $S_g/T_g$, so the slopes of the vectors therefore indicate the success rate within each subgroup (note that the slopes of these vectors can never be larger than 1, since you can't have more successes than trials). When you compare the success rates between groups in an experiment, you only need to look at the slopes of these vectors -- the sizes of the subgroups are not visible to you. But it's the disparities in subgroup sizes that cause Simpson's paradox to occur. 
 
The lengths of the vectors are a rough indicator of how many patients there were within each subgroup; the larger the number of patients in the group, $T_g$, the longer $\vec{g}$ will be.

In the diagram above, we see that the vectors for the subgroups Treatment A, small stones and Treatment B, large stones are much shorter than the other two (because there were fewer trials in those subgroups). But their lengths do not matter when considering the per-group success rates $S_g/T_g$; all that matters is their slopes. Treatment A's slope for small stones is higher than Treatment B's slope for small stones; the same holds for the large-stone groups. So within each subgroup, Treatment A is more successful.

But if we restrict our attention to the two longest vectors in the middle, we can see that the Treatment B, small stones vector has a higher slope than the Treatment A, large stones vector. This is mainly due to the fact that people with large kidney stones generally have worse outcomes, regardless of how they are treated.

In the diagram below, we are looking at the resulting vectors when all the Treatment A and B patients are grouped together, regardless of stone size. 

We get the vector corresponding to the combined group in Treatment A by summing the two green Treatment A vectors. Similarly, we sum the two black Treatment B vectors to get the aggregated Treatment B vector. When we do this, we can see that the Treatment B vector has the higher slope. 


 
This happens because, when we add the green vectors together to get the total Treatment A vector, the result is only slightly different from the much longer Treatment A, large stones group vector. Similarly, the summed vector for Treatment B is only slightly different from the much longer Treatment B, small stones group vector.
 
As a result, the combined Treatment A vector has a lower slope than the combined Treatment B vector, making it look less effective overall. This is Simpson's Paradox in visual form. 
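Here's a rough sketch of how the vector picture could be reproduced in matplotlib, using the same counts as before (again, the success counts are the commonly cited figures for this study; the plot details are just my own choices).

```python
# Trials on the x axis, successes on the y axis. Each subgroup is an arrow
# from the origin to (T_g, S_g); the dashed arrows are the component-wise
# sums giving the aggregated Treatment A and B results.
import matplotlib.pyplot as plt

subgroups = {
    "A, large": (263, 192), "A, small": (87, 81),
    "B, large": (80, 55),   "B, small": (270, 234),
}

fig, ax = plt.subplots()
for label, (t, s) in subgroups.items():
    color = "green" if label.startswith("A") else "black"
    ax.annotate("", xy=(t, s), xytext=(0, 0),
                arrowprops=dict(arrowstyle="->", color=color))
    ax.text(t, s, f"{label} ({s/t:.0%})", color=color)

for arm, color in (("A", "green"), ("B", "black")):
    T = sum(t for lab, (t, s) in subgroups.items() if lab.startswith(arm))
    S = sum(s for lab, (t, s) in subgroups.items() if lab.startswith(arm))
    ax.annotate("", xy=(T, S), xytext=(0, 0),
                arrowprops=dict(arrowstyle="->", color=color, linestyle="--"))
    ax.text(T, S, f"{arm} overall ({S/T:.0%})", color=color)

ax.set_xlabel("trials $T_g$")
ax.set_ylabel("successes $S_g$")
ax.set_xlim(0, 420)
ax.set_ylim(0, 330)
plt.show()
```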

Simpson's Paradox reversals don't occur often in nature, though there are a few examples (like this one). But subtler forms of statistical confounding definitely do occur, all the time, in settings where they affect the conclusions of observational studies.

Statistical confounding: why it matters

Preamble

This article is a brief introduction to statistical confounding. My hope is that, having read it, you'll be more on the lookout for it, and interested in learning a bit more about it. 

Statistical confounding, leading to errors in data-based decision-making, is a problem that has important consequences for public policy-making. This is always true, but it seems especially true in 2020-2021. Consider these questions:

1. Is lockdown the best policy to reduce COVID death rates, or would universal masking work just as well?

2. Would outlawing assault rifles lower death rates due to violence in the US? 

3. Would outlawing hate speech reduce the incidence of crime against minorities in the US?  

4. What effect would shutting down Trump's access to social media have on his more extreme supporters?

If you're making data-based decisions (or deciding whether to support them), it's important to be aware that confounding happens. For people practicing statistics, including scientists and analysts, it's important to understand how to prevent confounding from influencing your inferences, if possible. 

Identifying and preventing confounding is a topic that I haven't seen covered in most places -- not even in my multivariate linear regression classes. It's explained beautifully in Chapter 5 of Richard McElreath's book "Statistical Rethinking", which I highly recommend if you're up for a major investment of time and thought. 

This topic is the first in a cluster about statistical inference from my slipbox (what's a slipbox?).

Important note: I use the 'masking' example below as a case where confounding might hypothetically occur. I am not suggesting for a moment that masks don't fight COVID transmission. I am a huge fan of masking! Even though I wear glasses and am constantly fogged up.

Statistical Confounding

Suppose you've got an outcome you desire: for example, you want COVID cases per capita in your state to go down. Give COVID cases per capita a name: call it $Y$.

You've also got another variable, $X$, that you believe has an effect on $Y$. Perhaps $X$ is the fraction of people wearing masks whenever they go out in public. $X=0$ means no one is wearing masks; $X=1$ means everyone is.

You believe, based on numbers collected in a lot of other locations, that the higher the value of $X$ is, the lower the value of $Y$ is. After a political fight, you might be able to require everyone to mask up by passing a strict public mask ordinance: in this case, you would be forcing $X$ to have the value 1.


In order to determine whether to do this, you set up an experiment, a clinical trial for masks. You start with a representative group of people, and set half of them at random to wear masks whenever they go out, and the other half to not wear masks. The 'at-random' piece is important here, as it is in clinical trials. Setting $X$ forcibly to a specific value, chosen at random, can be thought of as applying an operator to $X$: call it the "do-operator". 

The do-operator is routinely applied in experimental science. For example, in a vaccine clinical trial, people aren't allowed to choose whether they get the placebo or the vaccine: one of these possibilities is chosen at random. This lets you assess the true causal effect of $X$ on $Y$.

If your experiment shows that mask-wearing is effective at lowering the per capita COVID case rate, you can then support a mask-wearing ordinance, with confidence that the ordinance will have the desired effect 'in the wild'. 

Statistical confounding occurs when the apparent relationship $p(Y|X)$ between the value of $X$ and the value of $Y$, observed in the wild rather than under experiment, differs from the true causal effect of $X$ on $Y$, $p(Y|do(X))$.

To put this in the context of masking, suppose we've observed in the wild that people who wear their masks when they go outside the house have lower COVID case rates per capita than people who don't. If we enforce a mask ordinance on the basis of this observation, it's possible that we might find that the law has no effect on the COVID case rate.  

This might happen because of the presence of other variables which affect the outcome variable, called confounder variables. In the case of the masking question, it may be that an important confounder is whether a person is concerned about catching COVID. If a person is concerned, it may be that in addition to wearing masks when they go out, they are also avoiding close contact with people outside their household. And perhaps that is the true cause of the reduction in the COVID rate among people who wear masks.


If it is the case that avoiding in-person meetings is the real cause of the lowered case rates, rather than wearing masks, then enforcing a masking law will not have the desired effect of reducing the case rate. And you definitely want to avoid passing an ineffective ordinance, for obvious reasons.
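A toy simulation makes the gap between $p(Y|X)$ and $p(Y|do(X))$ concrete. Every number below is made up: in this pretend world, concern drives both masking and avoiding contact, and only avoiding contact actually changes infection risk.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
concerned = rng.random(n) < 0.5          # Z: concerned about catching COVID

def infection(avoids_contact):
    # In this toy world only avoiding contact changes the risk; masks never
    # enter the calculation at all.
    risk = np.where(avoids_contact, 0.01, 0.05)
    return rng.random(n) < risk

# Observational world: concerned people tend both to mask and to avoid contact.
masks = concerned & (rng.random(n) < 0.9)
avoids = concerned & (rng.random(n) < 0.9)
infected = infection(avoids)
print("observed p(Y | mask)   :", infected[masks].mean())
print("observed p(Y | no mask):", infected[~masks].mean())

# Interventional world: masking assigned at random (the do-operator); contact
# behaviour still depends on concern, exactly as before.
masks_rand = rng.random(n) < 0.5
avoids_2 = concerned & (rng.random(n) < 0.9)
infected_2 = infection(avoids_2)
print("p(Y | do(mask))   :", infected_2[masks_rand].mean())
print("p(Y | do(no mask)):", infected_2[~masks_rand].mean())
```

In the observational comparison, mask-wearers show a much lower infection rate; under randomized masking, the two groups come out essentially the same, because masks were never doing the causal work in this made-up world.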

Next week, I'll talk about Simpson's Paradox, an extreme example of statistical confounding.

COVID vaccine efficacy

Preamble

This note started out as a reminder to myself about the definition of relative risk and vaccine efficacy, and morphed into a perusal of the FDA briefs on the Pfizer, Moderna, and J&J vaccines (links to all 3 briefs are at the bottom of the article).
 
It's really worth looking at the actual numbers of COVID cases among people in the studies -- they are surprisingly low. In some cases, they are so low that they make inference about vaccine efficacy hard. 
 
This is my first close look at the outcome of a clinical study. You have to make a lot of semi-arbitrary decisions, it seems, in order to design a clinical study. Even something as simple as a difference of 5 years in your cutoff for the 'older' age group can have an effect on inference. The 3 teams made all sorts of different decisions that make it hard to compare their outcomes head-to-head.

Above all, while writing this note, I wished many times that I could have gotten my hands on the actual data. I guess the current age of copious open data has spoiled me. 

Disclaimer: I do not have medical training, and nothing written here should be taken as medical advice. 

 

Definition of efficacy

 
Vaccine efficacy is defined as:

$$1-\text{relative risk} = 1-\frac{\text{Prob(outcome|treatment)}}{\text{Prob(outcome|no treatment)}}.$$

If the experiment has roughly equal treatment and control groups (as all the vaccine clinical trials did), then the probabilities can be replaced by counts:

$$1-\text{relative risk} \approx 1-\frac{\text{Count(outcome|treatment)}}{\text{Count(outcome|no treatment)}}.$$

So 95% effectiveness means that

$$\frac{\text{Count(outcome|treatment)}}{\text{Count(outcome|no treatment)}}\approx 1 - 0.95 = \frac{1}{20};$$

that is, for every 1 event in the vaccinated group, there were 20 in the unvaccinated group. 
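As a sanity check, here is that arithmetic in a couple of lines of Python, using the widely reported Pfizer headline counts (8 confirmed cases in the vaccine arm vs. 162 in the placebo arm).

```python
# Vaccine efficacy from case counts, using the equal-arms approximation above.
def efficacy(cases_vaccinated, cases_placebo):
    return 1 - cases_vaccinated / cases_placebo

print(f"{efficacy(8, 162):.1%}")   # roughly 95%
```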

What was the measured event (aka Primary Endpoint) used to measure vaccine efficacy?

 
TL;DR:  Patients needed to have more symptoms in order to satisfy the J&J or Moderna primary endpoints than to satisfy the Pfizer primary endpoint. All confirmed cases in all 3 clinical trials required positive PCR tests.

For Moderna: First Occurrence of confirmed COVID-19 (as defined by an adjudicated committee using a formal protocol) starting 14 Days after the Second Dose. Confirmed COVID-19 is defined on page 13 of the FDA brief, and requires at least 2 moderate COVID symptoms (i.e., fever, sore throat, cough, loss of taste or smell) or at least 1 severe respiratory symptom, as well as a positive PCR test. 

Moderna primary endpoint results.


For Pfizer: Confirmed COVID-19 beginning 7 days after the second dose. Confirmed cases had at least one symptom from the usual list of COVID symptoms, and a positive PCR test for COVID within 4 days of the symptom.

Pfizer primary endpoint results.

For J&J: 'Molecularly confirmed' (by a PCR test) moderate-to-severe/critical COVID infection, measured at least 14 and at least 28 days post-vaccination. They also studied the rates of severe/critical COVID, which required signs of at least one of severe respiratory illness, organ failure, respiratory failure, shock, ICU admission, or death. Definitions of the COVID illness levels are on page 15 of the FDA brief, and are similar to the Moderna definition of Confirmed COVID-19.


 

Thoughts about the results

 
Moderna and Pfizer both reported very high efficacies of about 95%. These were point estimates, i.e., single values summarizing the measured efficacy.

But the confidence interval (CI) is the thing to look at for each result, not the point estimate. The CI gives you information not only about the point estimate for efficacy, but also about the certainty of the efficacy measurement. The CI for efficacy always contains its point estimate, but the wider the CI, the less confidence you can have in the point estimate.

 
Moderna
 
 
The vaccine was tested with roughly equal control and vaccine arms. There were about 21,600 participants in each arm.

The 95% CI for people aged 18-65 is (90.6%, 97.9%), which is very high.

The point estimate of efficacy for people aged 65 and up was a bit lower, at 86.4%. The 95% confidence interval was (61.4%, 95.5%). The confidence interval is wider because only about 7,000 people over 65 were enrolled in the clinical trial, and there were only 33 COVID cases among that group (as opposed to 163 in the younger group); fewer cases mean more uncertainty as to the true efficacy of the vaccine.

If the cutoff for the older age group were lower, there would have been more cases in that group, and more confidence in the result. It would have been nice to have access to the raw clinical trial data.
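To get a feel for why fewer cases widen the interval, here is a rough normal-approximation confidence interval on the log relative risk (the Katz method). The case splits and arm sizes below are hypothetical, chosen only so the two scenarios share roughly the same point efficacy; the trials themselves used different (and more careful) methods, so treat this purely as an illustration.

```python
import math

def efficacy_ci(cases_vax, n_vax, cases_placebo, n_placebo, z=1.96):
    # Normal approximation on the log relative risk (Katz method).
    rr = (cases_vax / n_vax) / (cases_placebo / n_placebo)
    se = math.sqrt(1/cases_vax - 1/n_vax + 1/cases_placebo - 1/n_placebo)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return 1 - rr, (1 - hi, 1 - lo)     # point estimate, (lower, upper) efficacy

# Same ~86% point efficacy; few cases vs. many cases (hypothetical numbers).
print(efficacy_ci(4, 3500, 29, 3500))       # wide interval
print(efficacy_ci(20, 10500, 145, 10500))   # much narrower interval
```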

 
Pfizer
 
 
The vaccine was tested with roughly equal control and vaccine arms. There were about 18,200 people in each arm.

The division along age lines in this table occurs at age 55 years, rather than 65 years. This made the age groups a bit more balanced and resulted in more cases in the 55+ age group. Thus the 95% CI for the older age group is narrower than Moderna's, at (80.6%, 98.8%). The results for the younger group are even better.

 
Johnson & Johnson
 
 
J&J had two endpoints, one corresponding to moderate illness, and one to severe and critical illness. J&J has emphasized the efficacy of their vaccine against their endpoint of severe or critical COVID-19, so that's where I focused my attention.

The J&J study had some issues in its design that make it hard to draw conclusions. Because severe COVID is rarer, there were fewer cases of it in the final analysis, which means increased uncertainty for the conclusions. They also ran studies across several countries with wildly different base rates of COVID, and with different dominant COVID-19 strains. This makes me think nervously about aggregation confounding (Simpson's paradox) when all the results are thrown into one bucket. Again, access to the raw data would have been nice.

J&J's point estimate of 85% efficacy in the US against severe COVID, which you hear about all the time, is of questionable value, because the 95% CI was (-9%, 99.7%)! That's because there were only 8 severe COVID cases in the US arm of the trial -- 7 in the placebo group and 1 in the vaccine group. That's not enough to base any conclusions on. The same problem with a low total case count was found in Brazil.

Probably the best estimate of J&J efficacy against severe COVID came from the South African arm of the study, where the number of severe cases was largest (26 severe cases across both arms of the study after 28 days post-vaccination -- 22 in the placebo group and 4 in the vaccinated group). The point estimate there was 81.7%, and the 95% CI was (46.2%, 95.4%). Remember that the tough South African COVID variant was spreading during this study, so that's pretty good news as to J&J's efficacy against that variant.
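Plugging these case counts into the simple count approximation from the efficacy section roughly reproduces the quoted point estimates (the trial's own estimates adjust for person-time at risk, so the match is only approximate):

```python
def efficacy(cases_vaccinated, cases_placebo):
    return 1 - cases_vaccinated / cases_placebo

print(f"US, severe:           {efficacy(1, 7):.1%}")    # ~85.7% vs. the quoted 85%
print(f"South Africa, severe: {efficacy(4, 22):.1%}")   # ~81.8% vs. the quoted 81.7%
```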

If you throw all the people in those 3 locations into one bucket, you get this table describing the aggregate result for severe covid:

J&J aggregate results across all sites for severe COVID

I have two thoughts about this; one is that I'm suspicious of aggregation effects, due to the fact that the studies in the 3 countries were so different. The second is that the evidence for the effectiveness of J&J's vaccine is significantly stronger for onset 28 days post-vaccination than for 14 days post-vaccination; the jump in efficacy against severe COVID in the younger age group is more than 10 percentage points.

So, although I've read that you can consider yourself officially "J&J-immunized" 14 days post-vaccination, I intend to wait another 2 weeks after that, until the 28-day mark, before really relaxing the rules.

 
References
 
 

Launching "From my Slipbox"

 

Niklas Luhmann's original Zettelkasten

This post is the first in a series I'm launching on statistics, machine learning, productivity, and related interests: "From my slipbox".

A slipbox ("Zettelkasten" in German, translating to card-box) is a personal written record of ideas that you've gotten from things you've read, seen, or heard. Each Zettel is a card containing a writeup of a single concept that you've thoroughly digested and translated into your own words. The cards are also annotated with the addresses of other, related ideas captured in your slip-box, allowing you to follow the threads of ideas.

The Zettelkasten idea is credited to mid-20th-century German sociologist Niklas Luhmann, who spent decades building a physical slip-box in order to flesh out his ideas on a theory of society. It was constructed like a library card catalog, with ordered unique IDs for every card/idea (see the photo above -- it actually was housed in a library card catalog, apparently). 

These days, a slip-box is more likely than not to be digital, and there is specialized software to support it. The Archive seems to be especially popular among ZK aficionados, but I just noticed that it is only supported on MacOS. My own choice of tool is Obsidian.md, which is supported on all major operating systems (including Linux!), and supports math markdown. Both tools use local markdown files so that your data is not stored in a proprietary format (links to both tools are below). I store my ZK in a private Github repository for safety and versioning support.

There are plenty of people who build their ZK using physical cards and boxes, just as Luhmann did, just for the pleasure of it. I understand that pleasure -- I think by writing longhand -- but there are huge benefits to hyperlinking and digital backups.

I took up Zetteling very recently, in January 2021. I've always written copious longhand notes about technical things I've read and digested, some of which have become the 'writeups' I've posted in the past on topics like Kalman filters, the backpropagation algorithm, and design of experiments. But my longhand notes sometimes get lost or accidentally thrown out, and the effort required to get from my handwritten notes to material worth publishing is sometimes a deterrent.

I got excited about making a Zettelkasten for the following reasons:

1. It encourages my writing habit

2. It lets me put my thoughts into semi-formal writing immediately, rather than waiting until I have a large writing job to do

3. It fights the brain leakage problem, wherein I quickly forget the details of what I've learned

4. Luhmann claimed that new ideas emerged spontaneously from his Zettelkasten, simply because of its massive size and interconnectedness -- sort of like a huge neural network developing consciousness (I'd like to see that happen!)

5. The promise of more easily generating quality written content from existing Zettels is appealing

6. The idea is for you to spend time 'curating' your slip-box -- rereading your ideas, making new connections, etc. -- which aids my memory, appeals to my love of organization, and makes me feel productive even when I'm too tired to actually write.

A little over a month after getting started, I've written around 150 Zettels on topics such as neural nets, productivity, project planning, variational calculus, causality, statistical modeling, and on Zetteling itself. Each one is a sort of soundbite of some story or idea I found interesting.

Every Friday, I'll be posting a Zettel from my Zettelkasten -- often technical, but sometimes relating to consulting, productivity, or other topics. 

I am hoping that this series results in conversations, and occasional 'super-Zetteling' -- making new connections to interesting content from minds beyond my own.

Some Zettelkasten resources:
