From the course: Building Recommender Systems with Machine Learning and AI

Fraud, the perils of clickstream, and international concerns - Python Tutorial



- Another real-world problem is people trying to game your system. If the items your recommender system promotes to your users end up being purchased more, the makers of those items have a financial incentive to game your system into recommending their items more often. People with certain ideological agendas might also purposely try to make your system recommend items that promote their own ideology, or stop it from recommending items that run counter to it. Some hacker might even be bored and try to create humorous pairings in your recommender system, just for their own amusement. Google bombs are an example of this.
Fighting people like this is generally a never-ending arms race, but there is one technique that works remarkably well: make sure that recommendations are only generated from people who actually spent money on the item. Recommendations based on implicit ratings from purchasing data are almost impervious to these sorts of attacks, because it would be prohibitively expensive for someone to buy enough of an item to artificially inflate its presence in your recommendations. And when people vote with their wallets, it's a very strong and reliable indication of interest that leads to better recommendations overall.
Sometimes, however, you don't have enough purchase data to work with. There are still precautions you can take. For example, if your recommender system is based on star reviews, you can make sure that you only allow reviews from people you know actually purchased or consumed the content in question. If you allow people to rate items they haven't actually seen or used, you're opening yourself up to all sorts of attacks.
Using click data should always be a last resort. It's very easy to fake click data using bots, and even when the clicks come from real people, click data has its own set of problems. Implicit clickstream data, such as images people click on, is fraught with problems.
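The purchase-verification precaution described above can be sketched in a few lines of Python. This is only an illustration under assumed data shapes: the `Rating` record, the `keep_verified_ratings` helper, and the sample users and items are all hypothetical and are not taken from the course code.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Rating(NamedTuple):
    """Hypothetical star-rating record; field names are assumptions."""
    user_id: str
    item_id: str
    stars: float


def keep_verified_ratings(
    ratings: Iterable[Rating],
    purchases: Iterable[Tuple[str, str]],
) -> List[Rating]:
    """Keep only ratings whose (user, item) pair has a matching purchase."""
    purchased = set(purchases)  # set lookup keeps the filter linear in size
    return [r for r in ratings if (r.user_id, r.item_id) in purchased]


# Made-up sample data for illustration only.
ratings = [
    Rating("alice", "book-1", 5.0),
    Rating("bob", "book-1", 1.0),  # bob never bought book-1, so this is dropped
    Rating("bob", "book-2", 4.0),
]
purchases = [("alice", "book-1"), ("bob", "book-2")]

verified = keep_verified_ratings(ratings, purchases)
```

Only the verified ratings would then be passed on to whatever training step your recommender uses, so a user who never bought an item cannot push its score up or down.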
You should always be extremely skeptical about building a recommender system that relies only on things people click on, such as ads. Not only are these sorts of systems highly susceptible to gaming, they're also susceptible to quirks of human behavior that aren't useful for recommendations. I've learned this the hard way a couple of times. If you ever build a system that recommends products based on the product images people click on when they see them in an online ad, I promise you that what you build will end up as a pornography detection system. The reality is that people instinctively click on images that appear sexual in nature. Your recommender system will end up converging on products whose pictures show a lot of flesh, and there won't be anything you can do about it. Even if you explicitly filter out items that are sexual in nature, you'll end up discovering products that just vaguely look like sex toys or various pieces of sexual anatomy. I'm not making this up. I'm probably not at liberty to talk about the details, but I've seen this happen more than once. Never, ever build a recommender system based on image clicks.
Implicit data in general tends to be very low quality unless it's backed by a purchase or actual consumption. Clickstream is a very unreliable signal of interest: what people click on and what they buy can be very different things.
We're not done yet. There are a lot of little gotchas when it comes to using recommender systems in the real world, and I don't want you to have to learn them the hard way. Another consideration is dealing with international markets. If your recommender system spans customers in different countries, there may be specific challenges that you need to consider. For example, do you pool international customer data together when training your recommender system, or keep everything separated by country?
In most cases, you'll want to keep things separate, since you don't want to recommend items in a foreign language to people who don't speak that language, and cultural differences may influence people's tastes in different countries as well.
There is also the problem of availability and content restrictions. Movies in particular are often released on different schedules in different countries, and may be covered by different licensing agreements depending on the country. You may need to filter certain movies out based on the country the user is in before presenting them as recommendations. Some countries also have legal restrictions on what sort of content can be consumed, which must be taken into consideration. Germany restricts content that promotes Nazi ideology, for example, and China restricts a long list of political topics.
Since your recommender system depends on collecting data on individual interests, there are also privacy laws to take into consideration, and these too vary by country. I am not a lawyer, and these laws change all the time, so you'll want to consult your company's legal and IT security departments to ensure that any personal information you collect in the course of building your recommender system is collected in accordance with the laws of every country you operate in.
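The two international precautions described above, keeping training data separated by country and filtering recommendations by availability and legal restrictions, might look something like the following minimal sketch. The function names, the tuple layout, and the `available_in` and `restricted_in` lookup tables are hypothetical; a real system would load this information from its own catalog and from rules supplied by its legal team.

```python
from collections import defaultdict


def split_by_country(ratings):
    """Group (user_id, country, item_id, rating) tuples by country.

    Each group could then be used to train a separate per-country model.
    """
    by_country = defaultdict(list)
    for user_id, country, item_id, rating in ratings:
        by_country[country].append((user_id, item_id, rating))
    return dict(by_country)


def filter_for_country(ranked_items, country, available_in, restricted_in):
    """Drop items that are unavailable or restricted in the user's country.

    The original ranking order of the surviving items is preserved.
    """
    blocked = restricted_in.get(country, set())
    return [
        item
        for item in ranked_items
        if country in available_in.get(item, set()) and item not in blocked
    ]


# Made-up sample data for illustration only.
ratings = [
    ("u1", "US", "movie-a", 5.0),
    ("u2", "DE", "movie-a", 3.0),
    ("u3", "US", "movie-b", 4.0),
]
available_in = {
    "movie-a": {"US", "DE"},
    "movie-b": {"US"},          # not yet released in DE
    "movie-c": {"US", "DE"},
}
restricted_in = {"DE": {"movie-c"}}  # hypothetical legal restriction

training_sets = split_by_country(ratings)
ranked = ["movie-b", "movie-c", "movie-a"]
de_recs = filter_for_country(ranked, "DE", available_in, restricted_in)
us_recs = filter_for_country(ranked, "US", available_in, restricted_in)
```

Applying the filter as a final step, after ranking, means the same model output can be reused while availability and restriction rules change independently of training.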
