From the course: Ethics and Law in Data Analytics

Bias in data processing: Part 1

- There will obviously be any number of differences between human bias and machine bias. And in this course, we are, of course, concerned with the bias of machines. But you may be surprised by how many similarities there are and how large they are. So let's begin with a basic distinction in human bias, and then we will use that to help us think about the problem of machine bias. When we speak of human bias, we are talking about either conscious bias or unconscious bias. And a quick programming note: many of you have heard the terms explicit and implicit used to describe bias. I mean the same thing with this distinction, but the terms conscious and unconscious are more descriptive, so I'll use those instead. A conscious bias is simply one you are aware of. For example, you may be a male hiring manager who is convinced that women are worse employees, and so you disadvantage them in the hiring process. On the other hand, an unconscious bias is still a real bias, but by definition you are unaware that you have it. In fact, you might very strongly and sincerely insist that you are not biased. But if the bias is unconscious, it could still be affecting your decision making, and you might never realize it. Here we are not thinking about the machine equivalent of conscious bias. That would be, to use our earlier example, a male data scientist intentionally designing a hiring algorithm that disadvantages female candidates. It goes without saying that this is extremely unethical and certainly illegal, and so it must be rooted out. But this is not the present concern. We are worried about the machine equivalent of human unconscious bias: the kind that slips into our minds over time, through our experiences, but that, by definition, we are not aware of as it affects our assumptions and perceptions of other people. There is indeed a machine equivalent of this kind of bias, because bias can easily enter data processing without us noticing it, in two ways. First, machines learn their decision rules from the data that we give them. A machine first identifies patterns in the data and then infers decision-making procedures in accordance with those patterns. Now, those patterns are probably quite accurate, because machines are really good at their jobs. But if a machine learns from data that was itself the product of bias, whatever historical biases existed are now in the subconscious of the machine, so to speak. And this is not some hypothetical future concern. In the previous video, Eva gave some examples of how bias exists in the collection process. This is simply a version of the age-old garbage in, garbage out principle. If you feed the machine biased data, what do you think it's going to learn? In human terms, this is like sending a child to a school that uses biased textbooks, or exposing them to movies or language that show racial bias. What do you think the child is going to learn?
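To make the garbage in, garbage out point concrete, here is a minimal sketch (not from the course) of how a model trained on biased historical hiring records can reproduce that bias. The data is synthetic, and the feature names, the 1.5-point advantage given to group A, and the choice of scikit-learn's LogisticRegression are all illustrative assumptions.

```python
# Minimal sketch (assumed, not from the course): a model trained on biased
# historical hiring decisions learns to reproduce that bias.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Synthetic "historical" records: both groups are equally skilled on average,
# but past human decisions gave group A (coded 0) an arbitrary advantage.
group = rng.integers(0, 2, size=n)          # 0 = group A, 1 = group B
skill = rng.normal(0.0, 1.0, size=n)        # same skill distribution for both groups
advantage = np.where(group == 0, 1.5, 0.0)  # bias baked into the historical outcomes
hired = (skill + advantage + rng.normal(0.0, 1.0, size=n) > 1.0).astype(int)

# The model only sees the data; it infers its decision rule from these patterns.
X = np.column_stack([skill, group])
model = LogisticRegression().fit(X, hired)

# For two candidates with identical skill, the model assigns a noticeably lower
# hiring probability to the group B candidate -- the historical bias is now
# part of the machine's "subconscious."
same_skill_candidates = np.array([[0.0, 0], [0.0, 1]])
print(model.predict_proba(same_skill_candidates)[:, 1])
```

Nothing in this code is explicitly discriminatory; the skew comes entirely from the historical labels the model learns from, which is exactly the unconscious-bias analogy being drawn here.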