From the course: Data Ingestion with Python

Finding outliers (manual)

- [Narrator] An outlier is a data point that differs significantly from the others. Basically, bad data. Pandas makes it easy to find these values and replace them. Let's have a look. So we start IPython. Then we import pandas as pd, and since our data is in an SQLite database, we import sqlite3. The connection is sqlite3.connect, and the database is rides.db. Our dataframe is pd.read_sql with the query SELECT * FROM rides, and we give it the connection. We have 10,000 taxi rides. We can be nice and close the connection, since we don't need it anymore. If we look, for example, at the 90% quantile of the trip distance, we see that it's about seven miles. However, if we look at the maximum trip distance, it's 932.9 miles, which doesn't look like a real taxi ride. So we're going to change all the rides that are above 100 miles. First we need to find them, so the mask is where the dataframe's trip distance is greater than 100. And we have seven of these rides. Now let's take a fill…
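The steps above can be sketched as a short script. This is a minimal sketch under stated assumptions, not the course's code. Because rides.db isn't available here, it builds an in-memory SQLite database with a made-up `rides` table, so the numbers won't match the 10,000 real rides. The column name `trip_distance` is an assumption. The transcript cuts off before showing how the outliers are replaced, so the final median replacement is only one possible choice and is not necessarily what the course does next.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for rides.db: an in-memory SQLite database with a
# made-up `rides` table. The `trip_distance` column name is an assumption.
conn = sqlite3.connect(':memory:')
distances = [0.5 + 0.25 * i for i in range(28)] + [932.9, 150.0]
pd.DataFrame({'trip_distance': distances}).to_sql('rides', conn, index=False)

# Load every ride, then close the connection: the data now lives in memory.
df = pd.read_sql('SELECT * FROM rides', conn)
conn.close()

q90 = df['trip_distance'].quantile(0.9)  # 90% of rides are at most this long
max_distance = df['trip_distance'].max()  # the suspicious maximum
print(f'90% quantile: {q90:.3f}, max: {max_distance}')

# Boolean mask: True for every ride longer than 100 miles.
mask = df['trip_distance'] > 100
print(f'outliers: {mask.sum()}')
print(df[mask])  # inspect the offending rows

# One possible fix (an assumption, not necessarily the course's next step):
# replace the outliers with the median of the remaining distances.
median = df.loc[~mask, 'trip_distance'].median()
df.loc[mask, 'trip_distance'] = median
```

Comparing a high quantile with the maximum is a quick way to spot a long tail. The 90% quantile barely moves because of a few extreme rows, while the maximum jumps. The boolean mask then serves both to inspect (`df[mask]`) and to overwrite (`df.loc[mask, ...]`) the same rows.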