- [Instructor] One of the most amazing things about the parallel growth of data science and genetics has been the growth of data. Really, we have an embarrassment of riches now. Think about it. The human genome has about 20,000 protein-coding genes, and it is made up of about three billion base pairs. And when researchers first published a draft sequence of the human genome in 2001, this was the result of years of enormous amounts of work. It was a gargantuan undertaking.
But in the years since then, researchers have sequenced many thousands of genomes, and it's becoming a relatively simple task for a lab to do this all on its own. And what this means is that genetic data is now produced much faster than it can be organized, analyzed, interpreted, or put to use. Part of the reason is that the methods for structuring data, that is, having a shared way of putting it together and making it easy to move between labs and between researchers, as well as the methods for communicating findings, lag dramatically behind the ability to generate data.
Data's a good thing, but you have to be able to make sense of it. And so we've got a lot of data there that is potentially of great use. There's a lot of untapped potential, but let me talk about a few of the things that have been particularly successful in using genetics and data science in healthcare. Number one is predicting disease risk through genetics. When most people think of genetics and disease, they think of single-gene causes, like "the gene for cancer." But things are rarely so simple.
Most diseases involve what are called polygenic interactions, or the combined effect of many genes, possibly thousands, all at once. The enormous increase in the amount of data available, along with the development of data science, has made it possible to detect many of these polygenic effects. So researchers can actually now say that this combination of, for instance, 400 genes collectively works to predict a particular disease. There's a notable study on Parkinson's disease, conducted as part of the Parkinson's Progression Markers Initiative, that's the PPMI, that made it possible for researchers to combine genetic data, imaging data, clinical data, and demographic data to develop a highly accurate model for predicting a patient's risk of Parkinson's disease.
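To make the polygenic idea concrete, here's a minimal sketch of how many small genetic effects can be combined into one risk estimate. It is not the PPMI model or any published method. The function names, the per-variant weights, the genotypes, and the intercept are all made up for illustration. It assumes a simple additive score, which is one common way to approach this.

```python
import math


def polygenic_score(genotypes, weights):
    """Weighted sum of risk-allele counts (0, 1, or 2 copies per variant)."""
    if len(genotypes) != len(weights):
        raise ValueError("genotypes and weights must have the same length")
    return sum(g * w for g, w in zip(genotypes, weights))


def risk_probability(score, intercept):
    """Map a score to a probability with a logistic function."""
    return 1.0 / (1.0 + math.exp(-(intercept + score)))


# Hypothetical effect weights for four variants (illustrative values only).
weights = [0.5, -0.25, 0.75, 0.25]
# Hypothetical person: number of risk alleles carried at each variant.
genotypes = [2, 1, 0, 1]

score = polygenic_score(genotypes, weights)
# The intercept is an assumed baseline, not an estimate from real data.
prob = risk_probability(score, intercept=-3.0)
print(f"score={score:.2f}, estimated risk={prob:.3f}")
```

Notice that the output is a probability, not a diagnosis. A real model would use hundreds or thousands of variants, with weights estimated from large studies.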
Similarly, researchers at other institutions have also made progress on mining DNA data to predict diseases ranging from coronary artery disease to breast cancer to diabetes and a host of others. But it's important to remember that DNA is not destiny. Even in identical twins, who share the same DNA, diseases do not show up the same way. Really, what data science can do is give probabilities, not a concrete "this is the way it is." And that's important to remember because not only are there tests for things like certain kinds of cancer or Parkinson's, but researchers have also been developing tests for things like IQ and mental illness.
Some of these, because they're based on DNA, can be administered while the embryo is still in the womb. And this raises a huge number of ethical dilemmas. We'll talk about some of these later, but it recalls the movie Gattaca, where people were categorized and really channeled according to their genetic potential. It's important to remember that even in identical twins, who, again, have 100% the same DNA, something like schizophrenia, a major mental illness, shows only about 50% concordance.
So what that means is that if one twin has developed schizophrenia, there's only a 50% chance, a flip of a coin, that the other one has it as well. So it's important to be a little humble, a little circumspect, when interpreting the probabilities that come from predictive analytics for diseases through DNA. And there's also, along with that, a need to focus on controllable factors, especially when you look at things like heart disease. There are so many things that a person can do, like diet and exercise, that can help reduce their risk, even though none of it changes their DNA.
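The twin figure mentioned above can be read as a concordance rate: among affected twins, how often the co-twin is affected too. Here's a small sketch of that arithmetic. The twin pairs are invented for illustration, and the function name is hypothetical. It computes what is usually called probandwise concordance, assuming every affected twin in the sample is counted.

```python
def probandwise_concordance(pairs):
    """Fraction of affected twins whose co-twin is also affected.

    Each pair is (twin_a_affected, twin_b_affected) as booleans.
    """
    concordant = sum(1 for a, b in pairs if a and b)  # both twins affected
    discordant = sum(1 for a, b in pairs if a != b)   # exactly one affected
    affected = 2 * concordant + discordant            # total affected twins
    if affected == 0:
        raise ValueError("no affected twins in the sample")
    return 2 * concordant / affected


# Invented sample of five identical-twin pairs (not real data).
twin_pairs = [
    (True, True),
    (True, False),
    (False, True),
    (False, False),
    (False, False),
]

rate = probandwise_concordance(twin_pairs)
print(f"concordance: {rate:.0%}")
```

In this toy sample, four twins are affected and two of them have an affected co-twin, so the rate comes out to one half, even though every pair shares the same DNA.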
But keeping all of that in mind, the ability to gather huge amounts of genetic data, and the mechanisms that are making it easier to sort through potentially trillions of data points to try to find an association with a disease, are bringing exciting potential for finding and diagnosing diseases and understanding their nature through the combination of genetics and data science.