There is a replication crisis in AI. Read why reproducibility is critical for AI in healthcare, and how to facilitate it in your own AI deployments.

The scientific method has underpinned science for the past several centuries, and one could argue that its roots go back much further, all the way to Aristotle. At the core of the scientific method lies the concept of falsifiability: the idea that a scientific hypothesis must be capable of being falsified, that is, contradicted by evidence. For example, Newton’s laws of motion and Einstein’s theory of general relativity made different predictions about the precession of Mercury’s orbit. Close observation and measurement of that orbit were consistent with Einstein’s theory; therefore, Newton’s laws were falsified. Of course, we still rely on Newton’s laws for convenience in situations in which the curvature of space-time is not a significant factor.


The scientific method is important in healthcare as well. Formulating hypotheses about health, disease, and treatments is useful only if those hypotheses can be tested, and if new hypotheses can be developed when one is proven incorrect—that is, falsified. Significantly, the rise of evidence-based medicine (EBM) confirms the importance of the scientific method in healthcare. EBM applies the scientific method to the practice of medicine, using techniques from the hard sciences, engineering, and statistics. Through meta-analysis of the scientific literature, risk-benefit analysis, and randomized controlled trials, healthcare professionals can make “conscientious, explicit, and judicious use of current best evidence” in their everyday practice, which is the goal of EBM.

AI in healthcare means neural networks

These days, when we talk about AI in healthcare, we’re really talking about neural networks of some kind: convolutional, deep, recurrent, and so on. These neural networks power some of the amazing AI-based applications that are already helping patients and clinicians. They help identify abnormalities in images at the imaging modality in radiology and cardiology, and they can help flag COVID-19 patients who are at higher risk of requiring mechanical ventilation or dialysis. They give care teams precious hours to administer treatment that can keep patients from “crashing” from sepsis. Neural networks also help people with medication adherence, power chatbots that answer patients’ questions about their care, and can help fight billing fraud. And that list is just a few of the myriad examples from current clinical practice and research.


Even though neural networks are built on established, well-understood mathematical tools, such as linear algebra, calculus, and differential equations, there’s still a vigorous debate in many circles about the black-box problem: the idea that we don’t really know how a neural network arrives at its output. Those who raise this concern point out that in deep learning, which is widely used in healthcare AI, neural networks have many hidden layers. So, we have no real way to know what each node, or neuron, has “learned” and how it contributes to the final answer that emerges from the network’s output layer.
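A toy example makes the point concrete. The following sketch (purely illustrative, not healthcare code) runs a forward pass through a one-hidden-layer network with random weights; even at this tiny scale, the hidden activations are just numbers with no self-evident meaning:

```python
import math
import random

def forward(x, w_hidden, w_out):
    """One forward pass through a toy network with a single hidden layer."""
    # Each hidden "neuron" is a weighted sum of the inputs squashed by tanh.
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    # The output is a weighted sum of the hidden activations.
    output = sum(w * h for w, h in zip(w_out, hidden))
    return hidden, output

random.seed(0)
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
w_out = [random.uniform(-1, 1) for _ in range(4)]
hidden, y = forward([0.5, -1.2, 0.3], w_hidden, w_out)
# `hidden` is just four floats between -1 and 1; nothing about them says
# what each neuron has "learned" -- and production networks have millions.
```

Scale those four hidden values up to millions of parameters across dozens of layers, and you have the black-box problem in a nutshell.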

Neural networks mimic the mystery of the human brain

Let’s take a step back and remember that neural networks so far are crude approximations of the human brain, in terms of both complexity and metrics such as neurons and synapses (nodes and parameters). Refinements are well underway, however. Consider the groundbreaking GPT-3 language model from OpenAI, introduced in May 2020, which has an unprecedented 175 billion parameters (still not quite the more than 100 trillion synapses in the human brain, but closer). And the pace of development in GPU computation has led some to declare the era of Moore’s Law over, to be replaced by what is being called Huang’s Law. It’s therefore reasonable to expect that the complexity of neural networks may approach or even surpass that of the human brain.


Yet the basic fact is that the human brain remains a mystery. Instead of focusing on comparing neural networks with the human brain, in the context of any conversation about black boxes, we should keep in mind that the human brain itself is a black box. We can easily fall into a rabbit hole by following questions such as, “How do we know that we know something?” which are vastly outside the scope of this blog. Nonetheless, it’s crucial not to lose sight of the fact that denouncing neural networks for being black boxes is ironic coming from a species that doesn’t understand its own brain.

Reproducibility is a crisis in AI

Beyond the debate over how neural networks arrive at their output lies a problem that seems more urgent. Returning to the scientific method and reproducibility: there is a reproducibility crisis in AI. As Science put it, “The booming field of artificial intelligence (AI) is grappling with a replication crisis, much like the ones that have afflicted psychology, medicine, and other fields over the past decade.” NeurIPS, a premier AI conference, now asks researchers to submit a reproducibility checklist covering items that are often omitted from papers: the number of models trained before the “best” one was selected, the computing power used, and links to code and datasets.
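Much of what such a checklist asks for can be captured automatically. Here is a minimal, hypothetical sketch (the function name and manifest fields are ours, not part of the NeurIPS checklist) that records the random seed, the runtime environment, and a fingerprint of the training data alongside a run:

```python
import hashlib
import platform
import random
import sys

def experiment_manifest(seed, dataset_path, config):
    """Collect the bookkeeping a reproducibility checklist asks for:
    the random seed, the runtime environment, and a data fingerprint."""
    random.seed(seed)  # fix the seed before any training code runs
    with open(dataset_path, "rb") as f:
        data_hash = hashlib.sha256(f.read()).hexdigest()
    return {
        "seed": seed,
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "dataset_sha256": data_hash,
        "config": config,  # hyperparameters, models tried, compute used
    }
```

Saving such a manifest next to each trained model gives reviewers, and your future self, the seed, environment, and exact data that produced a result.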

NetApp technology facilitates reproducibility

What can NetApp do to help? As data-management specialists, we have data scientists on our teams who grapple with these problems and develop tools that help the data-science community. One such solution is the NetApp® AI Control Plane. Among other benefits, this solution makes it easy to create NetApp Snapshot™ copies of entire AI training environments (models and data) for traceability and versioning. AI researchers can create Snapshot copies of models and data directly from the environments that they work in; these copies require only minimal extra storage capacity and incur negligible performance overhead. The result is a simple yet powerful way for researchers to keep track of entire environments so that their results can be reproduced in the future.
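AI Control Plane does this at the storage layer. As a language-agnostic illustration of the underlying idea—versioning a model and its data together, as one unit—here is a hypothetical sketch; the function names and registry format are ours, not NetApp’s API:

```python
import hashlib
import json
import time
from pathlib import Path

def _sha256(path):
    """Fingerprint a file so its exact contents can be verified later."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def snapshot_environment(model_path, data_paths, registry="snapshots.json"):
    """Record model and dataset fingerprints as a single versioned unit,
    mimicking what a storage-level snapshot captures in one operation."""
    record = {
        "timestamp": time.time(),
        "model_sha256": _sha256(model_path),
        "data_sha256": {str(p): _sha256(p) for p in data_paths},
    }
    reg = Path(registry)
    history = json.loads(reg.read_text()) if reg.exists() else []
    history.append(record)
    reg.write_text(json.dumps(history, indent=2))
    return record
```

The crucial property—which a storage-level snapshot provides without copying any data—is that the model and the data it was trained on are captured together, so a result can always be traced back to both.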


Take the first step to facilitate reproducibility in your AI deployments. For in-depth information about NetApp AI Control Plane, read the technical report.


You can also access on-demand content from the NetApp INSIGHT® 2020 digital event. Find sessions about AI, healthcare, AI in healthcare, NetApp AI Control Plane, and much more. To access content, register at no cost.

Esteban Rubens

Esteban joined NetApp to build a Healthcare AI practice leveraging our full portfolio to help create ML-based solutions that improve patient care and reduce provider burnout. Esteban has been in the Healthcare IT industry for 15 years, having gone from being a storage geek at various startups to spending 12 years as a healthcare-storage geek at FUJIFILM Medical Systems. He's a visible participant in the AI-in-Healthcare conversation, speaking and writing at length on the subject. He is particularly interested in the translation of Machine Learning research into clinical practice, and the integration of AI tools into existing workflows. He is a competitive powerlifter in the USAPL federation, so he will try to sneak early-morning training in wherever he's traveling.
