Brief History of DNA/Genomics

Although it’s been almost 70 years since Watson, Crick, Wilkins, and Franklin elucidated the structure of DNA (deoxyribonucleic acid), research in genomics has picked up only in the last two decades or so. Two major hurdles held the field back: the high cost of sequencing, and the lack of computational infrastructure to derive usable insights from the huge amount of information available in genomes.

 

As a measure of the amount of data that genomics can produce, consider that each somatic human cell contains about 2 meters of DNA, with 3 billion nitrogenous bases strung along a backbone of sugar-phosphate molecules. Although this could theoretically be represented in 750 MB of data per human (at 2 bits per base), whole genome sequences can in practice occupy up to 100 GB per human, to account for real-life error-proofing redundancies.
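The 750 MB figure can be checked with a few lines of arithmetic. The sketch below is only an illustration, assuming a fixed 2-bit code for the four bases (A, C, G, T) and decimal units (1 MB = 10^6 bytes). The names `packed_size_bytes` and `pack` are hypothetical helpers introduced here, not part of any genomics toolkit.

```python
# Back-of-the-envelope genome size, using the figures quoted in the text.
BASES_PER_GENOME = 3_000_000_000   # nitrogenous bases, as stated above
BITS_PER_BASE = 2                  # assumption: 4 possible bases -> 2 bits each

# Assumed (arbitrary) 2-bit code for each base.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}


def packed_size_bytes(n_bases: int, bits_per_base: int = BITS_PER_BASE) -> int:
    """Bytes needed to store n_bases at a fixed width, rounded up."""
    return (n_bases * bits_per_base + 7) // 8


def pack(seq: str) -> bytes:
    """Pack a string of A/C/G/T into 2 bits per base (4 bases per byte)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODES[base] << (2 * (i % 4))
    return bytes(out)


theoretical = packed_size_bytes(BASES_PER_GENOME)
practical = 100 * 10**9            # the "up to 100 GB" figure from the text

print(f"theoretical: {theoretical / 10**6:.0f} MB")
print(f"practical / theoretical: {practical / theoretical:.0f}x")
```

Under these assumptions, the practical upper bound quoted above is a little over 130 times the theoretical minimum, which gives a sense of how much of real sequencing output is redundancy rather than the base sequence itself.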

 

Over the last two decades, research in genomics has more than compensated for the initial growth delays. Experts observing the exponential growth of data generated in genomics expect that growth to continue, surpassing even other large data domains such as online streaming and astronomy. They predict that the genomics industry may require 2 to 40 exabytes of storage capacity by 2025.

 

Figure 2: Growth of DNA Sequencing (© 2015 Stephens et al.)

 


Here are some possible reasons for the explosive increase in genomics research:

  • Innovations in sequencing technology (next-generation sequencing) and the associated reduction in sequencing cost (a human whole genome sequence can cost as little as $300)
  • Availability of high computational power in CPUs (central processing units) and GPUs (graphics processing units) required for high-performance computing (HPC) infrastructure
  • Artificial intelligence (AI) and deep learning algorithms for deriving insights from large data sets
  • Availability of faster storage media such as solid-state drives, and striping technologies such as RAID, for feeding data to the AI engines at high speed
  • Development of high-performance and parallel computing technologies, such as remote direct memory access (RDMA) over Converged Ethernet (RoCE), for moving data very fast
  • Social and economic impact of epidemics such as SARS, MERS, COVID-19, etc.
  • Public interest in Personalized Medicine

The genomics market

Genomics research is driving a large market. According to an estimate by Precedence Research, the global genomics market was valued at US$20.06 billion in 2020 and is expected to exceed US$72.13 billion by 2030.

 

As shown in Figure 3, genomics research supports many use cases across many industries:

Figure 3: Genomics use cases (infographic)

We all know the core role that genomics research is playing in dealing with the COVID-19 pandemic. In the last 14 months or so, there have been thousands of publications on COVID-19 genomics alone, not to mention countless webinars and other online meetups.

 

Here are some IT use cases for genomics workloads:

  • High-performance computing
  • Primary and secondary genome processing
  • Moving data seamlessly across multiple locations and multiple clouds
  • Securely sharing genomics data across researchers, EHRs, patients, and other stakeholders
  • Cost reduction and optimization of storage requirements for archives

Why FlexPod for genomics?

The FlexPod® solution from Cisco and NetApp can serve as the IT backbone of genomics research. NetApp and Cisco validate the infrastructure's hardware components, technologies, and software. This validation makes deployment and management of the IT infrastructure significantly less risky.

 

Just as nitrogenous bases, sugar, and phosphate are the building blocks of DNA and RNA, FlexPod is composed of building blocks from Cisco and NetApp. ONTAP provides the storage layer, while Cisco UCS blade or rack servers and MDS and Nexus switches form the compute and networking layers of a FlexPod unit.

 

FlexPod is a total solution for genomic data management that provides one seamless platform for simplicity and speed for genomics workloads. Figure 4 illustrates FlexPod’s value for genomics workloads.

 

Figure 4: FlexPod for genomics Infographic

 

In a nutshell, genomics expects IT infrastructure to solve these challenges:

  • High storage requirements and associated costs
  • Need for heavy AI compute power to crunch large data sets
  • Ability to share data safely and seamlessly across multiple premises and multiple clouds

FlexPod addresses those challenges with:

  • Common storage platform across on-premises and private and public clouds
  • Opex and capex savings
  • Innovations like AFF, NVMe, and FlexPod AI
  • Multi-cloud connectivity
  • Ability to comply with healthcare security regulations

A FlexPod unit (image: FlexPod Datacenter)

It has already been established that FlexPod can improve a healthcare organization’s EHR and medical imaging datacenter and help reduce clinician burnout, increase patient safety, and enable doctors to deliver better, faster, more affordable care. This infographic is the latest of the many resources that FlexPod has produced for the life sciences and healthcare industry, and it describes how FlexPod can also benefit a life science organization doing genomics research. FlexPod thus offers value across the IT infrastructure use cases in healthcare and life sciences.

 

To learn more, watch the FlexPod session at NetApp Insight 2020, "Enabling Healthcare Digital Transformation with FlexPod" (session SPD-1149-1), and review other resources for the healthcare and life sciences industry.

Dr. Nikhil Joshi

Dr. Nikhil Joshi holds an MBBS (the degree for physicians in India) and an M.Tech in Biomedical Engineering from the Indian Institute of Technology Bombay, India. Prior to joining NetApp, Nikhil worked on solution management and development of multiple healthcare applications for over a decade, and observed how EHRs and the underlying IT infrastructure can influence the lives of physicians, clinicians, and non-clinical staff, as well as their ability to deliver care to patients.
