Medicine is awash in data! Never before have we had access to so much health data. Data science is currently experiencing a boom in biomedical research and applications, and within a few years it will likely become a permanent fixture of medicine.

However, fascinated as we were by the emerging possibilities in genomics after the first human genome was decoded in the early 2000s, today we are still a long way from being able to predict the occurrence of complex diseases from a single person’s genome. This is probably because complex diseases such as cancer and diabetes are caused by interactions among millions of base pairs in human DNA, rather than by changes in individual genes.

Exploring and analyzing these interactions requires vast amounts of data, which are now becoming available. Data volumes are exploding in multiple dimensions, and it is now possible to sequence the billions of base pairs of a genome in a matter of days. This opens completely new perspectives that were beyond imagination a few years ago.

Researchers are now shifting their attention from individual genomes to the DNA of humankind as a whole, i.e., to the population level.

At the same time, health monitoring is shifting from sporadic measurements, such as annual check-ups, to continuous real-time measurements thanks to wearable devices, such as connected watches, and smartphone apps, which track all kinds of data ranging from pulse to footsteps. Alongside these new possibilities, hospital patient files are increasingly digitized. All these data harbor great potential: used in a focused manner, they could help create personalized therapies and enable doctors to intervene before a patient’s medical condition deteriorates.

To this end, novel algorithms must be able to spot statistically significant patterns in enormous collections of data and separate relevant from irrelevant information. In other words, they must be able to distinguish significant correlations from random ones in patient data in order to properly identify occurrences of disease. Such systems require machine learning techniques, an important branch of data science. The idea behind “machine learning” is for programs to recognize patterns and rules in a given dataset and to get better at doing so over time. In general, the larger the training datasets, the better the algorithms become. Thanks to the advances in genomics and the increasing digitization of patient data, data science is becoming more and more relevant for medicine.
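To make the idea of telling a significant correlation from a random one more concrete, here is a minimal sketch in Python using a permutation test. It is only an illustration under stated assumptions: the data are synthetic, and the names (resting_pulse, linked_marker, unrelated_marker, permutation_p_value) are hypothetical and not taken from any real study or library. The test shuffles one variable many times and counts how often chance alone produces a correlation at least as strong as the observed one.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Estimate how often shuffled data correlate at least as strongly
    (in absolute value) as the observed data."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any real link between xs and ys
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # The +1 terms keep the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Synthetic, purely illustrative "patient" data (assumption, not real data).
rng = random.Random(42)
resting_pulse = [rng.gauss(70, 8) for _ in range(60)]
# A marker that genuinely depends on pulse, plus measurement noise.
linked_marker = [0.5 * p + rng.gauss(0, 3) for p in resting_pulse]
# A marker generated independently of pulse.
unrelated_marker = [rng.gauss(0, 1) for _ in range(60)]

p_linked = permutation_p_value(resting_pulse, linked_marker)
p_unrelated = permutation_p_value(resting_pulse, unrelated_marker)
print(f"linked marker:    p = {p_linked:.4f}")
print(f"unrelated marker: p = {p_unrelated:.4f}")
```

A small p-value suggests that the correlation is unlikely to be a product of chance. When thousands of variables are screened at once, as in genomics, the threshold for "small" would additionally need to be corrected for multiple testing.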

Experts agree that one of the greatest challenges for data science applications, in healthcare but also in other domains, is the lack of data harmonization.

Anonymized datasets obtained from different hospitals are often not directly comparable. Fortunately, the Swiss Personalized Health Network (SPHN) aims to ensure data interoperability and to facilitate data transfer between hospitals and research institutions.

There are also other challenges when dealing with health data. The primary concern has always been confidentiality and data security, rather than recording and structuring all relevant data for more in-depth analysis. Data scientists and engineers therefore need to show more clearly how they can support medical professionals without disrupting day-to-day routines or jeopardizing data security.

But one thing is for sure: even if it takes time for data science to be fully accepted in hospitals, it will not only produce a new generation of doctors, but also create a new kind of medicine.

 
