THE ROLE OF BIG DATA ANALYTICS IN GENOMICS Motivity Labs Career November 28, 2022



Big data analytics in genomics is the study of the information in DNA sequences using powerful computational and statistical methods to understand and decode it.  This functional information inside the DNA sequence helps scientists and researchers discover how DNA differences affect human well-being and disease. How big is this data and how does this big data analytics in genomics help find solutions to various life-threatening diseases in humans? Let’s find out. But first, for the sake of perspective, a little science lesson.

The little science lesson:

The human body is made up of cells, billions of them! Each cell contains a nucleus, and inside these nuclei reside chromosomes. These chromosomes contain thread-like structures that contain genes, which are orderly arranged, and contain DNA. This DNA holds the information and instructions that are needed for a human body to develop, reproduce, and survive.   It is estimated that humans have around 20000 genes in every cell of the body. Every person has two duplicates of each gene, one from each parent. The person’s characteristics are determined by the dominant gene of the two copies that are passed on by the parents.  The entire set of DNA in an organism is called a genome. Almost every cell in the body carries a copy of the genome. The genome plays the role of an instruction manual for the human body to work.  It is the blueprint of a building, where the building is the human, it is the musical score with all the notes where the human is the music!  Even though the DNA in a gene is similar between people most of the time, it can occasionally vary between people. This variant does not affect health most of the time, but a few variants can make a human ill.

The Human Genome Project:

In 1990, the Human Genome Project was instigated by the U.S. Department of Energy and the National Institute of Health with significant contributions from various countries like Japan, Germany, France, China, and others. The main goals of this project were to discover the complete set of human genes, determine the sequence of millions of base pairs that constitute human DNA, store the information, improve tools for data analytics, and address the ethical, social, and legal issues that would arise from the project.  The project was completed in 2003 with an accurate and complete human genome sequence obtained and made available for further study to researchers and scientists. The humungous data the project has provided helps develop research tools that will allow scientists to pinpoint the genes that are involved in diseases both rare and common.  Scientists are now generating more genome data from millions of people around the world to study how the genome operates and its effects on human health and disease.  The data of a single human genome is about 200 gigabytes. This big data and analytics in healthcare is a huge achievement initiated by the HGP. As big data analytics in genomics is growing exponentially by the day, scientists are now able to see more clearly the functioning of the human body and the deadly diseases it inherits or develops along.

Genetic disorders:

As learned from the little science lesson above, our genes are made of DNA which contains instructions for the cell on how to behave and what to do, that make one unique. Occasionally, these genes may go, rogue and when that happens, one is in trouble! You receive half of your genes from each parent and may inherit this rogue gene from either or both of them.  Gene mutations can also happen due to several other factors like issues with DNA, lifestyle, health habits, and even environmental factors. All these rogue genes can act strangely other than what they are supposed to do and cause genetic disorders either at birth or develop over time. The most common genetic disorders are Down syndrome, Alzheimer’s disease, Autism, Arthritis, Cancers, Coronary artery diseases, diabetes, and Migraine to name a few.  But what causes the genes to go rogue? The DNA in your genes instructs the body to make different types of proteins that make you stay healthy and fit.  When a mutation occurs, due to whatever may be the reason, the genes’ protein-making mechanism goes for a toss and it loses its ability to produce the required proteins and this causes genetic disorders and diseases. Unfortunately, most genetic disorders do not have a cure and you cannot prevent them even.  Some genetic disorders have treatments to slow down the progression of the disease depending on the severity and the type.  But all hope isn’t lost. Big data and analytics in healthcare, especially big data analytics for genomic medicine have opened the doors for new approaches to treating these genetic disorders.

Big data analytics:

Big data, as the name suggests, is data that is so big that conventional data processing systems and traditional methods can’t handle them. These data sets with large volumes and complexity are stored in data warehouses, data lakes, data pipelines, and Hadoop, an open-source framework that is used to store and process these large data sets that allow clustering of multiple computers for analyzing huge data sets in parallel for quicker results.  Big data analytics refers to the collection and analyses of huge data sets that contain diversified data. These analytics can be very useful for every business ranging from banking and financial services, advertising and marketing, entertainment, and in healthcare it consulting services. Big data and analytics in healthcare it consulting services refer to electronic health records, EHRs that are so large and diversified and gathered from various sources like clinical data, medical imaging, lab results, pharmacy, patient records, and data collected from patient monitoring machines and tools.  This humongous volume of data can be used by data scientists to discover various associations, patterns, and trends to extract various insights for diagnosis and arriving at treatment decisions.  The potentiality of big data and analytics in healthcare it consulting services is immense as the data accumulation when compared to the past,   has been at a rapid pace and in real-time.  The ability to apply real-time data analytics on high volumes of data is revolutionizing healthcare.  Let’s see what big data has to offer genomics.

Big data analytics for genomic medicine:

Let’s go back to the human genome project for a moment. The project gave us the complete draft of 3 billion letters in our DNA. There are 4 letters that represent 4 individual molecules called nucleotides viz., Adenine(A), Cytosine(C), Guanine(G), and Thymine(T).   Almost everyone contains the same 3 billion DNA base pairs. These 3 billion letters are one genome. This genome contains instructions for building and maintaining our body. We all are made of proteins and these genes contain instructions to make different types of proteins that are useful for various bodily functions and growth. Almost 99.9% of our DNA is the same for all humans, meaning humans are 99.9% similar genetically.  But in the remaining 0.1%, some differences can sometimes cause various disorders and diseases.   If we can compare the genomes of various healthy persons with that one genome having a rogue gene that is causing the genetic disorder, we can able to identify the cause for the same. This is where big data analytics steps in. The data about a single genome occupies 200 gigabytes.  It took 14 years to sequence one single genome at US$ 2.2 billion cost with thousands of scientists working in collaboration. From there, things can become a little easier by sequencing 2 genomes by 2007, 9 genomes by 2009, and about 100,000 genomes by 2013. While the first sequence cost was US$ 2.2 billion, the same costed about US$ 3,000 in 2013. With the rapid technological growth, the cost now stands around US$ 600, and can perform genome sequencing in just 5 hours! These statistics would give you an idea of how rapidly we are succeeding in bringing technology to aid healthcare with every day being one step closer to unraveling the mysteries behind the human genome.  The data thus gathered is being subjected to various analytics by various professionals from the associated fields to get a better idea of what’s happening with those rogue genes and what instructions, the supposed gene has to give, were missing.

Big data analytics for genomic medicine works by comparing one DNA with many others to examine whether the subject has any differences that might contribute to a disorder or disease. Once that difference is found, appropriate treatment or intervention can be made by designing personalized strategies for the subject.  Big data analytics in genomics reveals hidden patterns, uncovers unknown correlations, and other valuable insights by comparing and examining large datasets. 

 With next-gen sequencing technologies such as Whole Genome Sequencing, WGS, Whole Exome Sequencing, WES, and Target Sequencing taking giant strides and making rapid progress, applying them to biomedical study and medical practice would result in not only identifying various genetic diseases but also designing precision medicine that would allow medical professionals to predict more accurately what preventive and therapeutic treatments would work best for specific illnesses.

All thanks to big data analytics, without which all this progress in healthcare would not have been possible!

37+ demos
Buy now