Data Scientist Explains How AI’s Seductive Power Can Mislead Biomarker Researchers by Forbes – Entrepreneurs

Happy 2019! As regular readers know, I’ve been writing quite a bit over the last year about the opportunities and challenges associated with bringing advances in data and digital to bear on the discovery and development of impactful new medicines. I’ve been struck by the potential of many of these powerful approaches, tools, and techniques, but underwhelmed by the drooling that’s often accompanied them.

An important theme of this column has been that despite what seems like exceptional potential, the impact of data science and digital on drug discovery and development to date has been conspicuously limited. This may reflect the extravagant expectations around big data, which has come to be viewed as a self-evident religion (preached by managerialist consultants) rather than as a potentially useful tool that must rigorously prove itself in context, as I recently discussed. I’ve also examined the impact of cultural factors (and how the culture of data science differs from that of pharma); the challenge of AI black boxes; and the importance of understanding the difference between invention and implementation. I’ve also highlighted opportunities around real world data, including the likely extremely positive impact of the recently announced appointment of Dr. Amy Abernethy, a deeply thoughtful, tech-savvy oncologist, as Deputy FDA Commissioner.

Today, I am writing about where the rubber squarely meets the road in AI and drug development: the use of algorithms to mine big data for informative patterns. In particular, I share what I believe is an extremely important, fairly critical, ultimately constructive perspective on the challenge of developing valid diagnostic biomarkers, presented by one of the most thoughtful data scientists I know, Imran Haque. His story is below.

Imran Haque: Grounded Data Scientist

I first met data scientist Imran Haque when he was a scientist at Counsyl, a genetic testing company focused on carrier screening. I had no connection with the company – I wasn’t a collaborator, investor, or client – but I was very impressed by what I had heard about their product from colleagues in each of these categories, and I was particularly impressed by the conversations I had over the years with an exceptionally savvy scientist there – Haque – who had recently completed his PhD in computer science at Stanford, and seemed well-versed in both biology and common sense.

Data scientist Imran Haque, Ph.D. (Photo courtesy I. Haque.)

I was especially intrigued by work Haque presented (and subsequently published) questioning the standard recommendation to employ carrier screening only for relatively common conditions like cystic fibrosis (CF), but not for a range of other similarly severe conditions that individually occur much less often. Haque’s analysis demonstrated that while any individual rare disease is, indeed, uncommon, the chances of a child having at least one of a number of these diseases are actually higher than the odds of having CF. He additionally showed that the utility of CF-only screening guidelines was highly dependent on the ethnicity of the screened population; for example, because CF is far less common among East Asians than other “rare” genetic diseases, existing screening guidelines missed about 94% of affected pregnancies in this group, according to Haque. Consequently, he argued, if you’re going to recommend CF screening on the basis of its frequency, it might make sense to consider screening for a broader range of rare genetic conditions as well.
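The arithmetic behind this argument can be sketched in a few lines of Python. The per-disease risks below are made-up illustrative values, not figures from Haque’s published analysis, and the calculation assumes the conditions occur independently of one another.

```python
from math import prod

def risk_of_at_least_one(risks):
    """Probability that at least one of several independent conditions occurs."""
    return 1.0 - prod(1.0 - r for r in risks)

# Hypothetical per-pregnancy risks (illustrative only, not from the source).
cf_risk = 1 / 3000
rare_disease_risks = [1 / 20000] * 20  # twenty conditions, each far rarer than CF

combined = risk_of_at_least_one(rare_disease_risks)
print(f"CF alone:            {cf_risk:.6f}")
print(f"Any of 20 rare ones: {combined:.6f}")
```

Under these assumed numbers, each rare condition is several times less likely than CF, yet the chance of at least one of them exceeds the CF risk, which is the shape of the argument described above.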

After over five years at Counsyl – and several promotions, ultimately to V.P., Scientific Affairs – Haque left to become chief scientific officer at Freenome (the high-profile cancer screening company that’s not Grail).

While there, he co-authored with Olivier Elemento what struck me as a bombshell paper demonstrating the limitations of screening the blood for DNA derived from very early cancers (“ctDNA,” for “circulating tumor DNA”). As will be discussed in more detail below, they showed that the actual number of cancer DNA molecules available was so vanishingly small that you’d need to start with an unwieldy quantity of blood to have a good chance of collecting even a single cancer DNA molecule.

Haque remained at Freenome for nearly two years, departing in Autumn 2018; the details of the separation are not known to me, and are not the focus of this post.

In October of 2018, Haque – representing himself, and not any entity with which he’s been associated, he emphasized – delivered a captivating talk at a cancer big data conference in Rhode Island, weaving together several topical themes into what struck me as a particularly important, cautionary perspective on the application of AI and big data to biology in general, and biomarker identification in particular (mostly diagnostic biomarkers).

Haque didn’t mince words. He introduced his talk as an exploration of “the ways in which current applications of machine learning and statistics fail when applied to biomarker discovery.”

He frames the discovery of diagnostic biomarkers as proceeding along one of two general routes, mechanistic and empirical.

Mechanistic Biomarker Discovery

In mechanism-based approaches, you begin with a fairly developed sense of how a pathological process works; this understanding leads to the directed development of an assay, at which point you “pray that it generalizes.”

The paradigmatic mechanistic example here, says Haque, is ctDNA. As he describes it, the logic is that:

(1) Tumor cells have DNA mutations which most normal cells do not;

(2) Cells, both normal and tumor, occasionally shed DNA into the blood;

(3) Thus it should be possible to identify ctDNA, even though likely rare, and use this info to provide an early diagnosis of cancer.

That’s the mechanistic reasoning. Unfortunately, as he proceeds to demonstrate using the analysis from the (previously cited) paper with Olivier Elemento, a problem arises. Their study revealed that for most early stage tumors, only a tiny fraction of the total circulating DNA is derived from the tumor; most comes from other (normal) cells. So, in some cases, you would need to draw an impractical amount of blood – 80 mL – to get just a single ctDNA molecule, and you’d presumably aspire to capture more than a single one to build a reliable test.
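A rough sampling calculation shows why a single expected molecule is not enough. This is a minimal sketch, not the analysis from the paper: the concentration and tumor fraction are hypothetical values chosen only so that an 80 mL draw yields about one tumor-derived molecule, and molecule capture is modeled as a Poisson process.

```python
from math import exp

def expected_tumor_molecules(blood_ml, copies_per_ml, tumor_fraction):
    """Expected number of tumor-derived copies of one locus in a blood draw."""
    return blood_ml * copies_per_ml * tumor_fraction

def prob_at_least_one(expected):
    """Poisson probability of capturing one or more molecules."""
    return 1.0 - exp(-expected)

# Hypothetical inputs chosen for illustration; not taken from the paper.
copies_per_ml = 1250    # copies of a given locus per mL of blood
tumor_fraction = 1e-5   # share of circulating DNA that is tumor-derived

for blood_ml in (10, 80):
    lam = expected_tumor_molecules(blood_ml, copies_per_ml, tumor_fraction)
    print(f"{blood_ml} mL: expected {lam:.3f} molecules, "
          f"P(>=1) = {prob_at_least_one(lam):.2f}")
```

Even when the expected count is exactly one molecule, the Poisson model leaves roughly a 37% chance of capturing none at all, so a test that depends on seeing that molecule would miss a large share of cases.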

You might think you could mitigate this problem by testing for several tumor mutations simultaneously; the logic is that you’d be more likely to see at least one of many rare mutations than you are to see one specific rare mutation. But it turns out that actually, many normal cells happen to harbor mutations typically associated with cancer cells – 1% of healthy colon crypts (a region of the tissue) contain such “driver” mutations, according to a preprint cited by Haque. Thus, the more broadly you sample for driver mutations, say (vs testing for a single one), the greater your chances of surfacing a false positive – in other words, you lose specificity.
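The loss of specificity can be illustrated with a short calculation. This is a sketch under stated assumptions: the 1% per-marker false-positive rate is a hypothetical value (it is not the colon crypt figure above), and markers are treated as independent.

```python
def panel_false_positive_rate(per_marker_fpr, n_markers):
    """Chance that a healthy person trips at least one of n independent markers."""
    return 1.0 - (1.0 - per_marker_fpr) ** n_markers

per_marker_fpr = 0.01  # hypothetical: each marker alone is 99% specific
for n in (1, 10, 50):
    fpr = panel_false_positive_rate(per_marker_fpr, n)
    print(f"{n:>2} markers: specificity {1 - fpr:.3f}")
```

Each marker looks highly specific on its own, but calling a sample positive when any marker fires compounds the individual error rates, so the panel as a whole becomes progressively less specific as it grows.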

In short, he concludes, using biological understanding to identify biomarkers is hardly foolproof; in this case, you lose specificity, because it turns out the biology is more complex than originally assumed, and there are driver mutations in normal cells; and you lose sensitivity, because of physiological constraints – not enough ctDNA in sample. The bottom line, Haque says: “it’s never quite as rosy as it starts.”

Empirical Biomarker Discovery

Next, Haque considers a contrasting route, empirical discovery. In this approach, you don’t presume deep biological understanding, and hope to learn through a broad-based discovery process. For example, in tumor diagnosis, you might decide that a tumor might either secrete unique proteins or cause the body to secrete unique proteins, and thus you can distinguish between the presence and absence of a tumor by the patterns of proteins detected – but you don’t know which proteins, or how the pattern will shift. In other words, the extent of your biological knowledge at the start is that “protein profiles should be different in these two conditions.”

The game plan for these situations, Haque says, generally consists of four steps:

(1) Select a high-content assay (protein array, mass spec, etc.)

(2) Collect “a lot” of samples

(3) Something – lately, machine learning (ML)

(4) Voilà – biomarker!

This may be a viable strategy in theory, and has worked for a number of ML applications in non-biological domains. But the problem in this case is that sample collection is actually relatively costly (vs crawling web pages, say), and the scale of samples you realistically wind up with is far, far less – many orders of magnitude less – than what you need for a suitably robust analysis.

In practice what you wind up doing, Haque says, is to begin with enriched case-control cohorts (patients with and without cancer, say), identify potential biomarkers (“patterns”) in these patients, then “validate” these (his air quotes) using progressively larger cohorts.

This carries no guarantee of reliability, he contends, invoking the experience of PLCO, a study that used this approach to identify 28 potential protein biomarkers for ovarian cancer. Unfortunately, none proved better than the current (highly imperfect) gold standard of CA125. Moreover, combinations of selected biomarkers didn’t appear to improve performance.

Haque said that four sources of difficulty observed in PLCO continue to challenge empirical discovery today:

  • Technical variability (characteristics of individual biomarker assays);
  • Population variability between cases and controls (cohorts may have been different in underappreciated ways);
  • Biological variability between the enriched case-control studies and the validation cohorts (patients who have already been diagnosed with cancer may be fundamentally different from patients who are just being screened, and do not yet carry a diagnosis);
  • Non-independence (you don’t know whether potential biomarkers reflect independent attributes).

Thus, Haque emphasizes, while the idea of empirical discovery is exciting, pragmatic problems – generally around the scale of samples available – fundamentally limit the actual utility of this approach. It’s not that the ML doesn’t deliver a result – it does. (I suspect this is why there are so many companies in this space pitching their wares.) The challenge is that the intrinsic likelihood that these positives will robustly generalize to a relevant population is incredibly low. Stated more directly – the vast majority of the “signatures” demonstrated by these approaches are likely to be false positives. As Haque again concludes, “it’s never quite as rosy as it starts.”
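A small simulation makes the point concrete. It is a minimal sketch, not drawn from Haque’s talk: every feature is random noise, the cohort sizes and feature count are arbitrary assumptions, and “discovery” is reduced to picking the single best-looking feature.

```python
import random

def accuracy(feature, labels):
    """Fraction of samples where a binary feature matches the binary label."""
    return sum(f == y for f, y in zip(feature, labels)) / len(labels)

def simulate(n_train=20, n_valid=1000, n_features=2000, seed=0):
    rng = random.Random(seed)
    n = n_train + n_valid
    labels = [rng.getrandbits(1) for _ in range(n)]
    # Every feature is pure noise: none carries real information about the label.
    features = [[rng.getrandbits(1) for _ in range(n)] for _ in range(n_features)]

    # "Discovery": keep the feature that best matches the labels in the small cohort.
    best = max(features, key=lambda f: accuracy(f[:n_train], labels[:n_train]))
    train_acc = accuracy(best[:n_train], labels[:n_train])
    # "Validation": score that same feature on the larger, unseen cohort.
    valid_acc = accuracy(best[n_train:], labels[n_train:])
    return train_acc, valid_acc

train_acc, valid_acc = simulate()
print(f"discovery accuracy:  {train_acc:.2f}")
print(f"validation accuracy: {valid_acc:.2f}")
```

With thousands of candidate features and only twenty samples, some noise feature will nearly always look impressive in the discovery cohort, then fall back to coin-flip performance in validation. The search always returns a “signature”; whether it means anything is a separate question.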

An Integrated Solution?

Haque offers a Harry Potter analogy that perhaps some readers, and my daughters, will appreciate. We wish that ML approaches worked as effectively as Hermione’s magic, Haque says, but more often, you might initially conclude it works like Ron’s (not too well). But that’s not even the best analogy, Haque continues. “I’d argue that ML methods are more like Gilderoy Lockhart: big fakers, unless you can pin them down.”

It’s not that ML practitioners are dishonest, Haque emphasizes. Rather, it’s the techniques themselves – and “the reactions their results elicit from those who use them” (his words). In other words (my translation), the techniques seem to induce overreaching impulses in those who use them, so perhaps the actual analogy should be some kind of dark magic, or (my daughter tells me) the presence of a horcrux?

As Haque explains it, with ML, you nearly always get some result, but it almost inevitably will be overfit to hidden variables in your training set. “A useful heuristic,” he says, is that “statistical methods will always take the easiest way out to the answer you ‘want’: right for the wrong reasons looks right in your small data set, even if it stops working in a larger set.”
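The “right for the wrong reasons” failure can be sketched with a toy example. Everything here is hypothetical: the hidden variable is imagined as the clinic that handled the sample, the genuine marker is assumed to agree with disease status 65% of the time, and the “model” is just a rule that keeps the best-looking feature.

```python
import random

def accuracy(feature, labels):
    """Fraction of samples where a binary feature matches the binary label."""
    return sum(f == y for f, y in zip(feature, labels)) / len(labels)

def make_cohort(rng, n, site_tracks_disease):
    """Return (labels, features) for a simulated cohort of n people."""
    labels = [rng.getrandbits(1) for _ in range(n)]
    # Weak but genuine marker: agrees with true disease status 65% of the time.
    marker = [y if rng.random() < 0.65 else 1 - y for y in labels]
    if site_tracks_disease:
        # Cases and controls were collected at different clinics.
        site = list(labels)
    else:
        # Clinic is unrelated to disease status.
        site = [rng.getrandbits(1) for _ in range(n)]
    return labels, {"marker": marker, "site_artifact": site}

rng = random.Random(1)
train_labels, train = make_cohort(rng, 200, site_tracks_disease=True)
valid_labels, valid = make_cohort(rng, 2000, site_tracks_disease=False)

# The "model" keeps whichever feature looks best in the discovery cohort.
chosen = max(train, key=lambda name: accuracy(train[name], train_labels))
print("chosen feature:", chosen)
print("discovery accuracy: ", accuracy(train[chosen], train_labels))
print("validation accuracy:", accuracy(valid[chosen], valid_labels))
print("real marker in validation:", accuracy(valid["marker"], valid_labels))
```

The selection rule prefers the clinic artifact because it separates cases from controls perfectly in the discovery cohort, and it discards the weaker marker that actually tracks disease. Once the clinic no longer lines up with diagnosis, the chosen feature carries no information.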

One possible solution, Haque suggests, is that “instead of designing a (ML-driven) discovery project to work, you design it not to fail – specifically design around all the ways that it might cheat.”

Here’s where mechanistic approaches may come into play. Haque’s proposed solution is to strive to integrate ML and mechanism, where mechanism acts as a constraint, profoundly limiting the degrees of freedom in a way that makes it possible to solve your problem with an achievable amount of data. In particular, Haque says, mechanism “allows us to define negative and positive controls – which provide invariants you can enforce on or teach to a model.”
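One way to read “invariants you can enforce” is as a set of checks a candidate model must pass before its results are taken seriously. The sketch below is only an illustration of that reading, not Haque’s method: the sample fields, the two toy models, and the 0.5 threshold are all hypothetical.

```python
def violates_controls(score, negative_controls, positive_controls, threshold=0.5):
    """List the control samples on which a scoring function breaks its invariant.

    score: any callable mapping a sample to a number in [0, 1].
    Negative controls must score below the threshold; positive controls
    must score at or above it.
    """
    failures = []
    for sample in negative_controls:
        if score(sample) >= threshold:
            failures.append(("negative", sample))
    for sample in positive_controls:
        if score(sample) < threshold:
            failures.append(("positive", sample))
    return failures

# Hypothetical samples: a mechanism-derived signal plus a nuisance variable.
negative_controls = [{"signal": 0.0, "site": 1}, {"signal": 0.0, "site": 0}]
positive_controls = [{"signal": 0.9, "site": 0}, {"signal": 0.8, "site": 1}]

def cheating_model(sample):
    return float(sample["site"])  # keys on the nuisance variable

def mechanistic_model(sample):
    return sample["signal"]       # keys on the signal the mechanism predicts

print(len(violates_controls(cheating_model, negative_controls, positive_controls)))
print(len(violates_controls(mechanistic_model, negative_controls, positive_controls)))
```

Because the controls are built from mechanism (samples known to contain, or not contain, the signal) and deliberately vary the nuisance variable, a model that leans on the nuisance variable fails them even when it looks accurate on the training cohort.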

Bottom Line

The remarkable progress that’s been made in machine learning must be useful for more than identifying cat pictures on the internet. Biology and medicine offer no shortage of important classification problems begging for improved solutions. As Haque’s experiences show, translating ML approaches to contemporary challenges in clinical diagnosis is not just difficult, but really tricky; the most significant challenge may well be that the power of ML permits us to convince ourselves we’ve found something significant, because it’s relatively easy to find something. The hope Haque describes is that the judicious integration of ML and mechanism-based approaches will result in an implementable solution that will advance science and benefit patients.

January 3, 2019 at 08:02PM
Forbes – Entrepreneurs