From the course: Faster pandas
The limitations of object dtype
- Pandas and NumPy have a lot of specialized types for fast processing. For example, a NumPy int64 takes less memory than a Python int and can be used directly by the CPU. There are cases when loading data where pandas won't be able to guess the type, and then it will default to a Python type, which has the object dtype. Let's have a look. In IPython, we can import pandas as pd and then df = pd.read_csv of the logs that we have. And now we can look at the dtypes with df.dtypes. And you see that origin, date, time, method, and path are object. Especially date and time should probably be some kind of a timestamp and not an object. So we can have a look with df[['date', 'time']].head(). We can convert them to a timestamp using either pandas to_datetime or, while reading the CSV, the parse_dates option of read_csv. Let's see what the performance implication of the object dtype is. Say you would like to know how many unique time values there are. So we're going to do df['time'].nunique(), and we have 6,533 unique times. Now…
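The steps described in the transcript can be sketched roughly as follows. This is a minimal illustration, not the course's own code: the CSV contents are hypothetical stand-in data (the real log file is not shown), and only the column names come from the transcript. On pandas versions before 3.0 the text columns print as object; newer versions may report a dedicated string dtype instead.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the course's log file.
# Only the column names are taken from the transcript.
CSV_TEXT = """origin,date,time,method,path
10.0.0.1,2021-07-01,00:00:01,GET,/index.html
10.0.0.2,2021-07-01,00:00:01,GET,/about.html
10.0.0.1,2021-07-02,00:00:05,POST,/login
"""

# Default load: pandas does not guess a timestamp type for "date",
# so it is read as a text column.
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df.dtypes)

# Option 1: convert the column after loading.
df["date"] = pd.to_datetime(df["date"])

# Option 2: ask read_csv to parse the column while reading.
df_parsed = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["date"])
print(df_parsed.dtypes)

# Count distinct values in the still-unconverted "time" column.
n_unique_times = df["time"].nunique()
print(n_unique_times)
```

Parsing at read time keeps the conversion in one place, while converting afterwards is handy when the DataFrame already exists; which one fits better depends on how the data is loaded.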
Contents
- The limitations of appending (3m 28s)
- The limitations of object dtype (2m 21s)
- The limitations of row iteration (3m 18s)
- Understanding the isin function (4m 39s)
- Parsing time once (2m 42s)
- Challenge: Query a DataFrame (1m 38s)
- Solution: Query a DataFrame (1m 29s)