Tuesday, September 6, 2011

Gary Taubes on Cherry-Picking and Paradigm Shifts (A Brief Thought on Science)

by Chris Masterjohn

Warning: A Serious Blog Post Occurs Somewhere Below

Some controversy recently erupted in the Twitter-sphere when a number of us, including Dave Dixon and Dallas Hartwig, were discussing Denise Minger's angular hypothesis of atherosclerosis, in which she proposed that increased concentrations of serum bananas and of other plasma constituents with pointy ends or sharp edges penetrate the blood vessel wall and initiate plaque development.  Andrew Badenoch's research showing that increased banana intake does not increase serum banana levels has made it difficult to base a dietary theory on this hypothesis, but we have tentatively concluded that picking cherries, because of their sphericity and resultant tendency to bounce cleanly off the blood vessel lining without causing any injury, is likely to lengthen lifespan.  

After several of us observed that not chewing such fruits is likely to preserve their roundness, reduce their insulinogenic properties, and lower their effect on reward centers in the brain, I used the definition of cherry-picking recently put forward by Gary Taubes to suggest that dismissing studies demonstrating the benefits of not chewing your food might significantly increase lifespan.  In other words, if Mr. Taubes is seeking key experiments that are capable of distinguishing between competing hypotheses, and if he considers this "cherry-picking," then his approach to studies like the one I just linked to, which support all three hypotheses, is likely to lead to increased cherry-picking and thus increased immunity to heart disease. In suggesting that Gary was likely to outlive most of us, I was simply wishing well to a man who has introduced innumerable people to the work of Weston Price and to the paleo movement, an achievement Melissa McEwen recently emphasized, and infusing this wish with a little of the humor that has thus far characterized the bulk of this discussion.

Nevertheless, some found this comment to be "snarky." Dallas and I therefore decided that humor just doesn't come across that well in 140 characters, and that this issue deserves a serious blog post rather than a bunch of tweets.  I had written such a serious blog post at lunch yesterday (Monday), but duty called, more important things arose, and I never managed to finish it.  Given the issue's import, I have decided to finish and publish that post.  So here it is — a very serious post about the art of cherry-picking. 

The Unfinished Blog Post Begins

There's really nothing like spending Labor Day making up for lost Hurricane Time developing new ways to reduce artifactual formation of malondialdehyde (MDA) during the homogenization of adipose tissue.  Yet even those of us who missed the memo that the labor movement invented a thing called "the weekend" have some time to blog during lunch — as long as it's just a brief thought about life, or the scientific method.

The Art of Picking Cherries

Gary Taubes recently wrote about the essentiality of "the supposedly heinous crime of cherry picking" to the progress of science:
This map-making exercise can be perceived as a justification for cherry-picking of the data, which, in a way, it is. But I’m arguing that such selective interpretation of the data is a fundamental requirement to make progress in any field of science, and particularly one as off the rails as that of obesity and nutrition. It is inherent to the process that Kuhn described as “map-making,” to taking a non-playable game – a dysfunctional paradigm – and making it playable.
Gary goes on to explain that launching a paradigm shift requires sifting through the vast mass of scientific data to isolate key experiments that can differentiate between competing hypotheses and to discard the rest.

I think Gary is making a critical point.  There is no sense in trying to wrestle with all the data.  A great deal of data (perhaps most) that has ever passed through a scientist's notebook or collection of Excel files probably remains unpublished, and most things that happen — that is, potential data — pass through time unrecorded.  Trying to amass "all" of the data is thus a naive exercise in futility.  

Moreover — and I think this is Gary's main point — most tests of a hypothesis could be interpreted as supporting multiple hypotheses, and when it comes time to tease these hypotheses apart, we get nowhere by looking at every experiment that supported one or another of them.  Instead, we need key experiments capable of distinguishing them.

Nevertheless, I don't think dismissing irrelevant data can actually be called "cherry-picking."  Cherry-picking, the way I see it, is the dismissal of relevant data.  This dismissal falls roughly into two categories:

  • If one type of observation is repeated multiple times with conflicting results, it would be cherry-picking to look only at the results that support our hypothesis.
  • If one hypothesis could be distinguished from another with several different types of observations and these different methods of hypothesis-splitting yield conflicting results, it would be cherry-picking to use only the types of tests that support our hypothesis.
Gary clearly isn't arguing for this type of cherry-picking.  He elaborates:
What we ultimately want, as Feynman suggested, is an experiment or an observation that can unambiguously — i.e., as close to unambiguously as we can get — differentiate between hypotheses or paradigms. 
Once again, I think he is making a critically important point and I am mostly in agreement.

At the same time, I think it would be somewhat naive to believe there could ever be a single experiment that could definitively distinguish between two competing hypotheses.  It would be all the more naive to think such an experiment could definitively support a single hypothesis, because most hypotheses that any given experiment supports are probably undreamed of.  If we can only think of two, three, or five hypotheses that we could use to interpret the results of a single experiment, it is probably our imagination and not our experimental precision that is the limiting factor.

I imagine Gary would agree, and I am therefore not suggesting that he is naive to these points, but simply offering an elaboration and clarification of the strong arguments he already made.

What we need to do is design our experiments to be as discriminating as possible, realizing that we will never achieve infinite precision.  We must then slice the data from many different angles like we would slice a pie — or perhaps a steak, if Gary would prefer, or even a low-reward plain potato — and attempt to paint all of these different forms of imperfectly discriminating evidence into a coherent picture.

LDL Oxidation as an Example

Take, for example, my contention that the oxidation of polyunsaturated fatty acids and proteins in the LDL membrane is a central event in the initiation of the atherosclerotic lesion and a less central but still important event in the inflammatory cascade that eventually enables that plaque to cause a heart attack.  I and many others have come to this conclusion by attempting to reconcile the evidence garnered from multiple approaches including test tube science, animal experiments, and the genetic and clinical evidence in humans.

There is no single, definitive experiment that could ever be performed that could, in and of itself, demonstrate this hypothesis to be true.

I explained in "Genes, LDL-Cholesterol Levels, and the Central Role of LDL Receptor Activity in Heart Disease" that statins, cholestyramine, and thyroid hormone all increase the activity of the LDL receptor, but none of them do so specifically.  We do not have a drug or dietary agent that only changes LDL receptor activity and does nothing else.  There are antibodies to PCSK9 currently being investigated for clinical use, which should inhibit the degradation of the LDL receptor.  If their specificity of action pans out, these might be able to show that LDL receptor activity governs heart disease risk even in people without genetic defects, and dose-response studies could define the range in which LDL receptor activity is important and whether its relationship to heart disease risk is linear.

It would be wrong, however, to consider this in and of itself definitive.  PCSK9 may do things we don't yet understand.  Future tests may show that the antibodies bind to other things besides PCSK9 that were not included in the initial specificity tests, or that the antibodies may elicit some unforeseen reaction of the immune system.

Such tests would also leave us in the dark about why increasing LDL receptor activity protects against heart disease.  Is it, as I contend, that robust clearance of lipoproteins from the blood prevents their oxidation?  

Current ways of testing the oxidation hypothesis in live humans are quite certainly insufficient.  There are no antioxidants we could supplement that would act specifically on the LDL particle, and adding single antioxidants is always risky business because doing so can actually disturb the antioxidant network and disrupt important communication signals.

Perhaps we could design an experiment where we randomize people to receive the anti-PCSK9 drug or a placebo.   Then we could inject half of each group with chemically purified oxidized LDL and half of each group with an inert solution that had gone through the same purification process (so it picks up all the same trace contaminants) but that lacks any oxidized LDL.  

That would never pass an Institutional Review Board for obvious ethical reasons.  Even if it did, however, and even if we showed that injection of oxidized LDL abolished the protective effects of increasing LDL receptor activity, there are still a whole host of objections to a definitive interpretation:
  • The fact that something can produce a disease experimentally does not mean it did produce the disease in everyone who has it.  What if this is one of many causes?  How important is it relative to other causes?
  • What if this experiment cannot be repeated in people of a different gender or ethnicity?  We would have to go back to the drawing board to attempt to explain why.  This would certainly make our results seem less definitive, but we would only know about this problem once we attempted to replicate the experiment in these other populations.
  • In humans who are not acting as laboratory guinea pigs, LDL oxidation is a continuous process.  Does injecting people on, say, a weekly basis with a larger amount of oxidized LDL than they would ever experience at one time create a fundamentally different scenario?  Perhaps in most people LDL never oxidizes fast enough to accumulate in the blood at a high enough concentration to cause harm.
We could go on and on.  The totality of all the possible objections to a definitive interpretation can never be satisfied with a single study.  Developing broad support for a hypothesis requires studying it in many different ways, looking at it from many different angles, using the most discriminating evidence possible while recognizing that its precision is imperfect, and attempting to fit all of the pieces of the puzzle together — without picking any cherries along the way. 

Read more about the author, Chris Masterjohn, PhD, here.


  1. Hi Chris. Probably due to the limitations of Twitter and not seeing the whole conversation I read the wrong thing into your tweet. It bugged me when I read it, but seemed out of character for you, which is why I asked about the intended snark level. Glad you clarified, and I probably owe you an apology.

    Just to beat my usual horse: experiments don't really distinguish between hypotheses; they shift around the relative evidential weights. Similarly, data aren't "relevant" or "irrelevant"; again, they may update the relative belief in different hypotheses to varying degrees. That could include having no effect, which I suppose really does imply complete irrelevancy.

    My point here is that these things live in a continuum. Trying to force things like belief in a hypothesis into a discrete true/false framework also forces you to make these sorts of binary decisions, like what to include/exclude. This can get weird, because you can wind up in situations where "truth" is path dependent, i.e. you only deemed a hypothesis true today because of some evidence you ignored years ago when it seemed irrelevant given your information then. Science (very definitely including the topics under discussion) is littered with examples of this.

    "Cherry-picking" as discussed by Taubes is probably only necessary if you're trying to jam the entire inference process into your head. Our brains are good at many things. Reasoning with uncertain evidence is not one of them, particularly when there are a lot of factors and interactions among them to consider. We over- and underweight things pretty radically (and in a context-dependent manner), as many interesting psychological experiments demonstrate.

    I have an idea that science has a deep need for computational tools to handle the sort of complex inference under uncertainty that lies at the core of the process. Extend that to building decision models from this information to help make choices about things like the appropriate treatment for metabolic disorders such as obesity. Sounds wacky, I know, but the mathematics and available computational power are driving towards this, where one really could potentially consider all of the data and wind up with a distribution of weights over competing hypotheses. Not nearly as far-fetched today as it would have been only ten years ago.

    If it sounds like I'm talking about Bayes' Theorem, there's a reason for that. A good book on the history of this is "The Theory That Would Not Die".
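    The updating described in this comment can be sketched in a few lines. The sketch below is purely illustrative: the hypothesis names are shorthand for the competing ideas discussed in the post, and the likelihood numbers are invented, not drawn from any real study.

```python
# Illustrative sketch of Bayesian updating over competing hypotheses.
# All hypothesis names and likelihood values are invented.

def update(priors, likelihoods):
    """Return posterior weights: P(H|E) proportional to P(E|H) * P(H)."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: w / total for h, w in unnormalized.items()}

# Three competing hypotheses, initially weighted equally.
weights = {"oxidation": 1 / 3, "insulin": 1 / 3, "reward": 1 / 3}

# Each experiment is summarized as P(observed result | hypothesis).
experiments = [
    {"oxidation": 0.8, "insulin": 0.4, "reward": 0.5},
    {"oxidation": 0.7, "insulin": 0.6, "reward": 0.2},
]

for likelihoods in experiments:
    weights = update(weights, likelihoods)

# No hypothesis is declared "true" or "false"; the weights simply shift.
print({h: round(w, 3) for h, w in weights.items()})
# -> {'oxidation': 0.622, 'insulin': 0.267, 'reward': 0.111}
```

    Note that no experiment here "distinguishes" the hypotheses outright; each one merely moves the weights, which is exactly the continuum the comment describes.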

  2. Neither Taubes nor anyone else has said anything implying that any experimental results can yield conclusions with "infinite precision" or absolute certainty, so you're dispelling a myth that nobody believes. Everyone from particle physicists to macroeconometricians acknowledges the possibility of unknown confounding variables... the point is that evidence becomes more logically useful the more strictly relevant variables can be isolated and confounding variables controlled, and this is accomplished in clinical trials through good study design. Such a well-designed study carries far more evidential weight than stacks of poorly-designed studies or uncontrolled epidemiological anecdotes or whatever.

  3. I know that nobody (at least non-crackpots) would ever categorically state that a hypothesis is absolutely true or false. But that's effectively how people reason about science in their heads. You can tell by how "scientific debates" often progress - the two sides don't exchange information and arrive at some common set of weights to the competing hypotheses. They tend to fight tooth and nail to defend their side, and the rest of us are supposed to somehow sort out what constitutes useful information and what's just dogmatic reasoning.

    And you still need to be able to quantify evidential weight, including how you update it. We wave our hands a lot and say things like "well-designed study carries far more evidential weight than stacks of poorly-designed studies or uncontrolled epidemiological anecdotes", but what does that mean in numbers? What does "well-designed" actually mean in terms of how an experiment shifts evidential weight? And clearly at some point I can amass enough anecdotal/epidemiological evidence to outweigh a clinical result. But how does one quantify that?
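    One conventional answer to the quantification question above is to score each piece of evidence as a Bayes factor and add the logarithms. The sketch below is hypothetical: every number in it is invented for illustration, and it assumes the anecdotes are independent, which real anecdotes rarely are, so it almost certainly overstates their combined weight.

```python
import math

# Illustrative sketch: evidential weight as a log10 Bayes factor,
# P(evidence | H1) / P(evidence | H2). All numbers are invented, and
# summing assumes the pieces of evidence are independent -- a strong
# assumption that real-world anecdotes rarely satisfy.

def log10_bayes_factor(p_given_h1, p_given_h2):
    return math.log10(p_given_h1 / p_given_h2)

# One well-designed trial whose result is 20x more likely under H1 than H2.
trial_weight = log10_bayes_factor(0.80, 0.04)

# One weak anecdote whose result is only 1.2x more likely under H1.
anecdote_weight = log10_bayes_factor(0.60, 0.50)

# How many independent anecdotes would it take to match the trial?
n_needed = math.ceil(trial_weight / anecdote_weight)
print(n_needed)  # -> 17 with these made-up numbers
```

    The point of the exercise is only that the trade-off the comment asks about can in principle be given numbers, once you commit to likelihoods for each piece of evidence under each hypothesis.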

  4. Gary Taubes consistently says the preponderance of the evidence points towards excessive consumption of refined carbohydrates as the main driver of obesity. His critics point toward the Kitava Study as one that disproves his theory.

    From http://www.carbohydratescankill.com/2412/pearl-of-kitava-study-1-of-2:
    "According to the Kitava Study, Kitavans consumed 300 out of 370 grams of carbohydrates from yam, sweet potato, and taro, 50 grams from fruits, 7 grams from coconut, and 14 grams from other vegetables. Because of the fairly primitive environment where Kitavans were in at the time when the study was conducted, the yam, sweet potato, and taro, as well as fruits and other vegetables, which they consumed, were likely low in both glycemic indices and glycemic loads without the work of bioengineering on these foods for improving the contents of digestible carbohydrates including starch. Thus, the blood glucose level of Kitavans was considerably stable and lower than which of their Western counterparts including Americans and Swedish at the time when the study was conducted. The secretion of serum insulin is normally corresponding to the level of blood glucose. Thus, the serum insulin level of Kitavans also was lower than which of their Western counterparts."

    From http://healwithfood.org/diet/kitavan-diet-foods.php:
    "Tubers, which are mainstays of the Kitavan diet and one of the primary sources of carbohydrates for Kitavans, generally have a low glycemic index rating:
    Cassava (boiled): 46
    Yam (boiled): 35
    Sweet potato (boiled): 44
    Taro (boiled): 56"

    Taubes has always pointed towards large amounts of refined carbohydrates as the culprit. It seems the Kitavans eat plenty of unrefined, lower-glycemic foods. This doesn't contradict what Taubes thinks.

    Why they don't get lung cancer from smoking is another issue.

  5. Hey folks,

    I'll respond to these comments when I'm done with my lab work. But Anonymous, sheesh -- you used to leave the best comments and ever since you accused me of being a USDA shill you've been leaving the worst! I used to have such faith in you! I still do, my friend, if for no other reason than you've left so many comments on my blog and made such great contributions to the discussion, ever since I started allowing anonymous comments.

    In any case, surely if you read my blog you saw that I explicitly stated that I was acknowledging something I believed Taubes already agreed with, and offering further elaboration of the strong arguments he already made.

    So you are arguing against a mythical dispelling of a myth, my friend. I never claimed to dispel some myth that Taubes was promoting.

    But please note that you are always welcome here.


  6. I thought Anonymous was talking to me. Now I'm sad. ;-)

  7. He must have been writing to Anonymous. Now I'm jealous. :(

  8. To twitter, or not to twitter? You guys are not helping me out here! These 'philosophy of science' posts are both enlightening and amusing. Keep them coming, Chris; very helpful and appreciated.

  9. Hi Dave,

    Whenever I see the name "Dave Dixon," I think of Bayes. When I see the name "Bayes," I think of you. That's one of the more straightforward correlations in the world of biology.

    I accept your apology, although I don't feel one is necessary and I certainly wasn't offended. It's quite difficult to keep up with tweets comprehensively and I don't consider that anyone's responsibility. I used that as an intro in part because I wasn't sure of whether to publish this post till Dallas suggested I do so in response to this confusion, and in part because I thought it would be amusing to summarize the tweets about the sphericity theory in one place.

    I agree that experiments shift around evidential weights, and I didn't mean to suggest otherwise. Indeed, I think that the points I made suggest that each angle we slice and each repetition of a study simply shifts our confidence in a hypothesis in one direction or another to a greater or lesser degree. That said, it's still important for a study to be designed in such a way that it discriminates between alternative hypotheses. This allows the study to shift our confidence toward or away from different hypotheses.

    I also agree that there is no binary division between "relevant" and "irrelevant," but certainly some data are more or less irrelevant to certain specific questions. If I'm trying to address "do plain potatoes increase the likelihood of obesity when eaten to satiety as a large proportion of the diet," and I see data on preference for certain toothpaste brands, I think it's fair to say that data is not intrinsically "irrelevant" but it is irrelevant to the question I'm asking. I don't see any point in amassing all this data and then trying to plug it into some equation where it is multiplied by zero as a relevancy coefficient (or by negative infinity for an irrelevancy coefficient).

    I do think, however, that there is an important place for categorical analysis. I don't think everything is a continuum. I think truth exhibits properties of particles and properties of waves, depending on how we're looking at it. The human mind works with categories by nature, and for good reason. That mind is, as far as I can tell, currently the best thing to use for data analysis, though with help from some tools.

    I think current meta-analyses are using very good ways of looking at the totality of the data with both continuous and categorical classifications, and attempting to assign various evidential weights to all of the data. Not perfect, but always improving.


  10. Stephen,

    Those are interesting points, but I didn't intend this post to favor or oppose the carbohydrate hypothesis of obesity.

    Gordon, glad you like them! If you want me to help you out, I will go this far for now: the verb form would actually be "to tweet, or not to tweet?" :)


  11. Uh sir, you clearly are funded by the USDA-industrial-meat complex. You don't even have plausible deniability at this point. The WPF is a well known right-wing front for flesh-peddling corporations and their government cronies.


    I wrote the anon post above, and I gave my name etc. so I don't know why it posted as anonymous. My point was not to defend Taubes as such, but to whine about you raising the 'lack of infinite precision and absolute certainty' issue at all given neither Taubes nor anyone else has put forward such an absurd position of infinite absolutism.

    I get annoyed by throwaways to 'lack of infinite precision and absolute certainty' because it is distracting and a non-issue. It is a point that can be made against pretty much any claim ever and, to paraphrase Popper, a criticism that can be brought against everything ought not be brought against anything.

  12. Hi Cal,

    Thanks for the clarification, and I'm sorry the system shut out your name. I don't think this is a 'distracting non-issue,' however. I was not battling a straw man claiming infinite precision; I was expanding Taubes's point about the need for observations that distinguish between competing explanations to say that, on top of this, we need different types of observations in order to build support. This isn't taken as obvious by everyone. Take, for example, the treatment of the Coronary Primary Prevention Trial. This was almost universally hailed as the final proof that "lowering blood lipids" prevents heart disease. It should have been considered one piece of evidence that might support this hypothesis but could support others, and it should have been maintained that more study was needed to cut at this issue from different angles. Instead, it was broadly concluded that radically different forms of "lowering blood lipids," like not eating butter and eggs, would accomplish the same thing. So, I think this is very relevant, not at all distracting, and not a non-issue.

    I don't think the Popperian paraphrase really applies here because I wasn't criticizing anything. Certainly, if I had criticized a study for having less than infinite precision, I should be severely punished, perhaps by having a cow's intestine strung about my neck in the center of a courtyard, but I didn't do that.


  13. Imo an apt critique of the CPPT interpretations would probably not be to say 'no experimental result is definitive, there is never final proof, this should be studied from multiple angles, and more relevant research is always potentially needed' (those are ever-present givens applicable to anything), but instead to point out the specific problems with the conclusions drawn from that trial and specifically how the results could reasonably be explained by or support a particular viable competing hypothesis.

    The former objection would be a distracting non-issue whereas the latter would not. I think you agree with this. But I got a different impression when initially reading "definitive" and "infinite precision" and whatnot in your post, potentially due to my reading undue philosophical positions into what you wrote.

  14. "At the same time, I think it would be somewhat naive to believe there could ever be a single experiment that could definitively distinguish between two competing hypotheses."

    Well ... as Popper noted, Einstein derived from his theory three predictions of what should be observable, all of which contradicted Newton's theory.

    So you just had to watch out and see if those observations were as predicted. That's your experiment.

    So in fact I think you can say that "a single experiment" could "distinguish between two competing hypotheses".

    In practice, people tend to try to account for observations that don't match their theory by coming up with "Ptolemaic epicycles," as Gary explained. Indeed, we might say that some people use such devices to protect their theories. (This certainly happens with ideology - e.g., Marxian, Freudian - which is why Popper wished to say it was not science and looked for a clear demarcation between that which is science and that which is not.)

  15. Chris,

    Thanks for the post. After the post-AHS blog torrent, I had to refresh my knowledge of what Popper actually advocated. I think you are spot on to clarify that Popper's philosophy does not necessarily require a clean-cut deathblow to one hypothesis over another. And as food for thought, I would like to add Popper's analogy of biological evolution to describe scientific progress.

    Biological evolution - and the progress of science - is not necessarily survival of the fittest, but rather, survival of the fit-enough for a particular niche (for a scientific theory, niche = current evidence and available methodology).

    A prevailing hypothesis (or theory) of obesity will still have many unanswered questions, and will not meet every hypothetical criterion of causality. Instead, a prevailing hypothesis will just do this better than other hypotheses.

    We are not seeking to prove a hypothesis, but rather, to fail to refute a hypothesis.

    Or simply put, I think we all need to lower our standards a bit :)

  16. Hi Cal,

    I agree with your analysis of the correct approach to take in criticizing the CPPT, but I admit I am rather confused that you are offering this as a criticism of anything I've written. Certainly, when I've written about the CPPT, I've done exactly what you suggest and listed specific criticisms and specific interpretive confounders. And this is how I also approached the criticism of a trial aiming to test the hypothesis I offered above that LDL receptor activity governs the risk of heart disease by promoting clearance of plasma lipoproteins and thus preventing their oxidation.

    You seem to think I am this guy:

    Enter THIS GUY and THAT GUY.

    I believe that lowering blood lipids will reduce the risk of heart disease.

    Prove it!

    THAT GUY conducts a randomized controlled trial with an agent that lowers blood lipids and it successfully lowers the risk of heart disease.

    That's not proof -- there's no such thing as proof!

    If I criticized studies the way you seem to be suggesting I do, I would have to consider the very action of believing anything to be so futile I would give it up entirely, yet I stated a rather specific and detailed biological hypothesis in this post that I believe strongly enough to call it a "contention" of mine, so clearly I don't cling intransigently to some silly belief that all belief is futile.

    We seem to be having some trouble communicating, which is giving birth to the irony that I consider your focus on my phrase "infinite precision" to itself be a distracting non-issue. I devoted two words to this phrase, and multiple paragraphs to delineation of specific interpretive confounders in my example hypothesis.

    My conclusion was not that we can never know anything, but rather that providing clear support for a hypothesis involves more than seeking a precisely discriminating study. It requires multiple studies approaching the issue from different angles, and repetitions of each study. Thus, there is a balance between seeking precision and seeking to reconcile the totality of the evidence, and I think Gary's post, while very illuminating, put too much emphasis on the former at the expense of the latter. We must seek to develop a substantial amount of high-quality discriminating evidence using multiple approaches and seek to incorporate its totality into our analysis.


  17. Can anybody explain to me why people in Asia don't have obesity, diabetes, and heart disease epidemics while eating tons of rice and noodles every day? What about Italians enjoying pasta and bread with every meal? Not to mention the French with their rich cuisine. Please, I need an answer so I can convince my family members to eat more healthily.
    Thank you.

  18. Elena, I think that the lack of a certain kind of chronic stress and superior social cohesion are probably factors. Societal factors affect the HPA axis, which mediates several inflammatory markers, which in turn affect the oxidation of cholesterol. There are also many other possible factors, like alcohol, quality of food, genetics, and the amount of low-intensity exercise. However, the rates of, for example, heart disease are not uniformly reported, especially in the case of France and Japan, because of cultural biases. I hope this helped.


