- [Voiceover] The ethical issues raised by analyzing data in these formats are a critical topic within data science. Data science is a creative field, and it often uses data in ways it was never really intended to be used. That raises a few big issues. The first of these is privacy. Confidentiality is a major concern: some information shouldn't be shared, even if it's not anonymous. This matters because data science often relies on data sources that weren't designed for sharing.
The second issue is anonymity. The HIPAA regulations have been revised to make it much harder to identify unique individuals within data. That means if you're using publicly available data, it may be much closer to anonymous. Proprietary data, on the other hand, such as data a client gives you from its own records, may still contain identifiers, and in that case the data is not anonymous. Anonymity is a central element of research ethics, and it needs to be considered carefully.
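As a rough illustration of why leftover identifiers matter, here is a minimal Python sketch, assuming the data arrives as a list of dictionaries. The field names (`name`, `zip`, `age_band`, `diagnosis`), the sample records, and both helper functions are hypothetical. The sketch drops a direct identifier and then counts how many records are still unique on the remaining quasi-identifiers (a simple uniqueness check in the spirit of k-anonymity). It is not a test of HIPAA compliance.

```python
from collections import Counter

def drop_direct_identifiers(records, identifiers):
    """Return copies of the records with the named identifier fields removed."""
    return [{k: v for k, v in r.items() if k not in identifiers} for r in records]

def unique_records(records, quasi_identifiers):
    """Return records whose combination of quasi-identifier values appears only once."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return [r for r, key in zip(records, keys) if counts[key] == 1]

# Hypothetical client data that still carries a direct identifier ("name").
records = [
    {"name": "A. Smith", "zip": "02139", "age_band": "30-39", "diagnosis": "flu"},
    {"name": "B. Jones", "zip": "02139", "age_band": "30-39", "diagnosis": "asthma"},
    {"name": "C. Lee", "zip": "94110", "age_band": "60-69", "diagnosis": "flu"},
]

deidentified = drop_direct_identifiers(records, {"name"})
at_risk = unique_records(deidentified, ["zip", "age_band"])
print(len(at_risk), "record(s) still unique on zip + age band")
```

Removing the name is not enough in this toy example: the third record is the only one in its ZIP code and age band, so anyone who knows those two facts about that person can still pick out the diagnosis.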
Another issue, which falls into a somewhat different category, is copyright. Scraping websites, pulling images or text from a site to extract data from them, is a common practice. It's important to know that some scraping may violate copyright. If the image or text is copyrighted, you could get yourself in hot water by using it without the copyright holder's permission. I'll finish with a couple of other issues you may not have considered. The first is potential bias.
The algorithms used in data science are, of course, value neutral in and of themselves. They don't have opinions. But an algorithm is only as neutral as the rules and the data its programmers give it. The second issue is overconfidence. Every data science analysis is a limited simplification, so you still need humans in the loop. The problem is that people believe that if a result came out of a machine learning algorithm, it must be true, it must be the right thing, without realizing the bias and the problems of interpretation that remain.
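To make the bias point concrete, here is a deliberately naive Python sketch. The groups, the decision history, and the threshold rule are all hypothetical. The code applies exactly the same rule to every group, so in that narrow sense it is value neutral, but its predictions inherit whatever imbalance exists in the historical data it learns from.

```python
from collections import defaultdict

def fit_group_rates(history):
    """Naive 'model': learn the historical approval rate for each group."""
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for group, approved in history:
        totals[group] += 1
        approvals[group] += approved
    return {group: approvals[group] / totals[group] for group in totals}

def predict(rates, group, threshold=0.5):
    """Apply the same threshold to every group; the rule itself treats groups alike."""
    return rates[group] >= threshold

# Hypothetical historical decisions: (group, 1 = approved, 0 = denied).
history = [("A", 1)] * 8 + [("A", 0)] * 2 + [("B", 1)] * 3 + [("B", 0)] * 7

rates = fit_group_rates(history)
print(rates)  # {'A': 0.8, 'B': 0.3}
print(predict(rates, "A"), predict(rates, "B"))  # True False
```

The arithmetic treats both groups identically, yet group B is always denied, because the model simply reproduces the pattern in the decisions it was given.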
People are still needed here. So our conclusions are that data science has potential and it has risks; we knew that. More importantly, analyses are not value neutral, and personal human judgment is always needed in planning data science projects, carrying them out, and interpreting their results.
- Assess the skills required for a career in data science.
- Evaluate different sources of data, including metrics and APIs.
- Explore data through graphs and statistics.
- Discover how data scientists use programming languages such as R, Python, and SQL.
- Assess the role of mathematics, such as algebra, in data science.
- Assess the role of applied statistics, such as confidence intervals, in data science.
- Assess the role of machine learning, such as artificial neural networks, in data science.
- Define the components of effective data visualization.