Thursday, January 27, 2011

Lack of Correlation Does Not Show Lack of Causation

From XKCD
I'm sure many of you feel that it is disappointingly easy to become embarrassed for humanity whenever reading a discussion of correlations.  In academia's greatest charade, every Stats 101 class or Epidemiology 101 class or heck even a Psych 101 class will emphatically declare that correlation does not imply causation.  Then most people graduate and spend their entire lives reading causation into correlations.  Especially if they become epidemiologists.

Observational studies are entirely legitimate forms of evidence, and correlations are entirely useful statistics.  No one can question this.  However, these correlations simply show a relationship and tell us nothing about the explanation of that relationship.  

This doesn't change just because an explanation is biologically plausible.  Nothing ever changes it.  A correlation raises the possibility of a cause-and-effect relationship, but no more or less than it raises the possibility of a non-causal relationship.  

Nevertheless, many people who understand this may still believe that a lack of correlation can rule out a cause-and-effect relationship.

But it can't.  It can't even come close.

In fact, a lack of correlation tells us much less than a correlation does.  This is because a correlation at least tells us there is a relationship, even if it tells us nothing about why the relationship exists.  A lack of correlation, by contrast, does not even tell us that there isn't a relationship.

While "no relationship" is one possible explanation for a lack of correlation, there are several others:

  • Lack of statistical power.  The study may have needed a larger sample to detect the correlation.  This would not affect the correlation coefficient, but it would affect whether it is statistically significant.


  • Lack of sufficient range of variation.  Even a very strong, perfectly linear relationship will produce a smaller correlation coefficient if you study only a narrow slice of the range over which the variables vary.  Restrict the range narrowly enough and the correlation coefficient essentially disappears.  For example, suppose study time increases test scores linearly over the range of zero to ten hours per week.  The relationship may still be perfectly linear between eight and ten hours per week, but because you have limited the range of variation, the correlation coefficient will shrink, and as you narrow the range further it approaches zero.  This can make the study look as though it was not underpowered, because there is not even a non-significant correlation to detect -- but that is an illusion.


  • Lack of linearity.  Conventional correlation coefficients look for linear relationships.  As X increases, Y increases.  Or as X increases, Y decreases.  As Ned Kock frequently points out, many relationships found in nature are U-shaped or otherwise non-linear.  For example, Paul and Shou-Ching Jaminet suggest in Perfect Health Diet that there is an optimal range of carbohydrate consumption.   The risk of disease, according to their hypothesis, will be lowest in this range, and will increase as one departs from it either by increasing or decreasing carbohydrate intake.  If you're looking for a straight line when nature provides a U, you aren't going to find your line.


  • Incomplete or inappropriate adjustment for confounding factors. There may be other factors that affect the relationship that are not being taken into account.  On the other hand, perhaps there were statistical adjustments that were made that shouldn't have been made.  Assuming that the stats "have been adjusted for all the confounding factors" assumes that our knowledge of what may affect the relationship is complete or nearly complete.  In fact, our knowledge of what affects the relationship could be closer to a grain of sand in an entire seashore.  Moreover, understanding what is a true confounding factor requires understanding the cause-and-effect relationship -- and usually this is uncertain and controversial.  Failure to make the right adjustments results in a failure to make the relationship manifest, while making the wrong adjustments can hide a true relationship.

Thus, lack of correlation certainly does not imply lack of causation.  

Back to our regularly scheduled genetics series -- with a likely wheat interlude coming soon.

Enjoy the night!

Read more about the author, Chris Masterjohn, PhD, here.

21 comments:

  1. Really enjoying the genetics posts, Chris. Keep up the great work.

    Also very interested to hear more on wheat!

    Quick question if you have a minute- I have a great supply of local, grass-fed beef liver, and have developed a liking to it. In your opinion, what is an optimal amount per week assuming a pretty great diet overall?

    I'm currently eating a little over a 1/4 pound lightly cooked per week.

    TS

  2. Tyler, thanks! That's fine. Nutritional requirements vary. Eat what makes you feel best.

    Chris

  3. I just added a fourth bullet point:

    Incomplete or inappropriate adjustment for confounding factors. There may be other factors that affect the relationship that are not being taken into account. On the other hand, perhaps there were statistical adjustments that were made that shouldn't have been made. Assuming that the stats "have been adjusted for all the confounding factors" assumes that our knowledge of what may affect the relationship is complete or nearly complete. In fact, our knowledge of what affects the relationship could be closer to a grain of sand in an entire seashore. Moreover, understanding what is a true confounding factor requires understanding the cause-and-effect relationship -- and usually this is uncertain and controversial. Failure to make the right adjustments results in a failure to make the relationship manifest, while making the wrong adjustments can hide a true relationship.

  4. Might-o'chondri-AL, January 27, 2011 at 9:55 PM

    What hasn't killed me has helped me survive - proof is I'm alive. It's a statistic that proves if something is good for you then more of it has to be better. Just don't eat any yellow snow ....

  5. Good post.  I was wondering about this since reading Dr. Eades' post last week, where he states:

    "...although observational studies can’t show that correlation equals causation, they probably are valid in demonstrating the opposite: if there is no correlation, there probably isn’t much of a case for causation."

    That didn't sound quite right to me.

    Definitely looking forward to more on genetics! The microcosm of Darwinian natural selection in your last post and the lecture you linked on facebook were totally fascinating.

  6. So you're saying Donald Rumsfeld was right? :-P "absence of evidence is not the evidence of absence... simply because you don't have evidence that something does exist doesn't mean you have evidence that something doesn't exist... unknown unknowns... things that we don't know that we don't know."

  7. Justin, thanks! I'm glad you're enjoying the genetics posts. Dr. Eades is correct in the very limited sense that if you have conclusive evidence that there is no relationship, you can rule out a cause-and-effect relationship, for the obvious reason that the latter is a subtype of the former. However, for the reasons outlined above, you cannot conclude that there is no relationship based on the mere absence of a significant correlation.

    js290, Yikes, I shudder at the thought of agreeing with Rumsfeld on anything! But, the facts is what the facts is, depending on what your definition of 'is' is. And geez, I hate it when I agree with Bill Clinton.

  8. "larger sample to detect the correlation. This would not affect the correlation coefficient".

    Dear Chris, if you progressively increase the sample size (more statistical power), the correlation will be affected and can even be inverted. For example, Ancel Keys found a positive correlation between fat intake and CV deaths using just a few countries, but if you use many countries (http://bit.ly/eQWuDT) the correlation becomes negative. One question: in unhealthy cohorts, does the observation of beneficial effects of supposedly unhealthy foods reflect a higher probability of causality? For example, a series of studies associate increased animal protein, total fat and saturated fat with lower risk of stroke. In this case, is it more probable that the association is causal, compared for example with those studies, in healthy cohorts, that find whole grains to be very healthy even for diabetics?

  9. It seems that the first three reasons are all ways a correlation might fail to show up in a statistical test despite a real causation. These seem like a misapplication of statistics. Too often we use frequentist methods for answering Bayesian questions. We want to answer the question of what we should believe.

    Frequentist methods work like this.  There are two hypotheses:

    Null hypothesis: no correlation; assumed true for no reason other than convenience.
    Alternative hypothesis: a correlation exists.

    A p-value above the threshold (conventionally 0.05) means we do not reject the null. For most intents and purposes this means that people will accept the null hypothesis.

    This strikes at your point about statistical power. If one really wants to show that there is no relation, one has to do a lot more than just see whether a correlation is statistically significant.

    Furthermore, with a powerful enough test, everything has a correlation. Changing one variable, especially in nutrition, will have some impact, and given enough data you can show that just about everything in the world has some correlation with everything else. So not only do we have to watch for correlation not being the same as causation, but we also have to check whether a correlation is biologically significant.

  10. O Primitivo, I think you are misunderstanding. Of course if you include different populations, or if your sample was not a random sample, then expanding it can affect the correlation coefficient. What I meant was that if you have a random sample of a particular population, and you increase the sample size while continuing to make a random sample of the same population, it will affect the significance but not the magnitude of the correlation. No, I don't think the contextual factors you mention make any difference in the fact that these correlations do not imply causation in either case.

    Chris

  11. Mike, those are good points and I agree with most of what you said. However, lack of significance does not indicate acceptance of the null hypothesis; it indicates failure to reject the null hypothesis. I think a Bayesian approach is often too complicated for a simple observational study -- perhaps it would be appropriate for inclusion in a meta-analysis. Either way, it wouldn't change our inability to infer causation.

    Chris

  12. However, Chris, it does say that, within the range of variation of the control variable, the effect is insignificant or is being masked.

    And that means that the model being explored is not rich enough.

    It means: THINK AGAIN.

  13. Yes, Leon, I agree it means we should look elsewhere, but it doesn't constitute evidence of a lack of effect.

    Chris

  14. I am so glad you wrote this. *embarrassed face smiley*

  15. It DOES constitute "evidence of lack of effect" in the sample and population chosen.

    Since one use of sampling, of which correlation studies are an example, is to find evidence for or against some practical conclusion or the provision of advice.

    It is not just a theoretical philosophical speculation about an idea.

    When one does batch sampling in industry, there is the concept of accepting or rejecting the batch. Many medical tests have the unfortunate characteristic of producing false positives, or false negatives, or both!

    I am afraid that in the sample chosen, from a particular population, that absence of evidence IS evidence of absence, under the conditions chosen by the experimenter.

    As for Ancel Keys, his intellectual crime was that, when the sample size was increased and displayed quite a different correlation, he refused to acknowledge it, and no one could bring him to book.

    Had this been a case of civil tort, the finding would have been appealed to a higher court.

    Not a good example, Chris.

  16. Well, I will agree to disagree. I can't say much more than I've already said -- a correlation is a relationship, not an 'effect,' and there are numerous reasons for not seeing a relationship when in fact one exists. I did not offer Ancel Keys as an example. I don't think "increasing sample size" is an accurate description of including additional countries in his ecological study. It may be technically an increase in the size of his sample, but it is qualitatively more than that, and it is a completely different phenomenon than the one I wrote about.

  17. Great post Chris, thanks for the mention, and good comments. Leon knows his stats and math; he makes good points.

  18. Btw, I wrote a post entirely on point 2 (link below). I think lack of variation, sometimes done on purpose, is a huge problem.

    bit.ly/9CBEw1

  19. Thanks, Ned, and you're welcome. That's a great post you linked to.

    I agree Leon is making legitimate points, but I don't think they can be taken to contradict what you or I wrote. I agree, for example, that lack of a linear correlation in a sufficiently powered sample is evidence of a lack of a linear relationship within that sample, but that is essentially restating the fact. It doesn't demonstrate that there is no cause-and-effect relationship between the variables, as your post demonstrates.

    Chris

  20. Well now, after due consideration, my comments are directed towards the summary of the post as headlined. The inclusion of the adverb "always" would have changed the implications that you wish to be drawn from the post, as in:

    "Lack of Correlation Does Not Always Show Lack of Causation".

    You do not want to have the same journalistic standards as displayed in today's ScienceDaily:

    Potential 'Cure' for Type 1 Diabetes? (headline)

    http://www.sciencedaily.com/releases/2011/01/110126161835.htm

    which cure proposes killing alpha islets.

    When one gets to the actual abstract, it's just another genetically maimed mouse study.

  21. Chris, in the article "Vitamin A On Trial: Does Vitamin A Cause Osteoporosis?" you said: "cod liver oil, in fact, because of its vitamin A content, is the only source of essential fatty acids that can lower levels of harmful, free-radical lipid peroxides, while all other sources of essential fatty acids raise lipid peroxides." Yet in the article you used as reference for this (https://www.westonaprice.org/environmental-toxins/239-dioxins-in-animal-foods-a-case-for-vegetarianism.html) you said: "Cod liver oil, on the other hand, has been shown to inhibit lipid peroxidation. One study found that cod liver oil depressed drug-induced lipid peroxidation in mice under the same conditions by which soybean oil increased lipid peroxidation.52 Another study found that feeding cod liver oil entirely abolished the increased level of lipid peroxidation found in diabetic rats.55 In both studies, the depression of lipid peroxidation was related to a sparing effect on glutathione peroxidase activity, which was also the case in rats saved from a lethal dose of dioxin by vitamin A supplementation, suggesting that the protective effect of cod liver oil is due to its high vitamin A content."

    When did a suggestion become a certainty?

