From the course: Python Functions for Data Science

Create a pandas DataFrame - Python Tutorial

Create a pandas DataFrame

- [Instructor] As a reminder, the pandas library is a powerful open source tool commonly used by data scientists. It supports data manipulation and data analysis. The two primary data structures of pandas are Series and DataFrame. A DataFrame is a two-dimensional labeled data structure with columns that can hold different data types. In this video, I'll be demonstrating a handful of approaches to create DataFrames using functions from pandas.

First, I'm going to import the libraries I'll need in this notebook. I'll import NumPy and give it the alias np, and I'll import pandas and give it the alias pd. Let's say that I have a CSV file named grades.csv that contains students' grades across five exams in a particular course. To read that CSV file and create a pandas DataFrame from the data in it, I can use the read_csv function from pandas. I'll call read_csv, pass in the name of the CSV file, and save the result in a variable named grades. It would look like this. Now, in this cell, I'll type in grades and run the cell to see the pandas DataFrame that I created from the CSV file. If I want to view just the first few rows of the DataFrame, I can use the head method like this.

Another way to create a pandas DataFrame is by calling the DataFrame function and passing in data. The data can be in any of the following forms: a Python dictionary of one-dimensional NumPy arrays, lists, dictionaries, or Series; a multi-dimensional NumPy array; a structured or record array; a Series; or another DataFrame.

Let's say that I want to create a pandas DataFrame from a Python dictionary of Series. I would use the DataFrame function. It would look something like this. As you can see, in this example, the dictionary's keys became the DataFrame's column labels and the dictionary's values became the DataFrame's column values. Also, the index given for both of the Series in the dictionary formed the DataFrame's index.
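As a rough sketch of the steps described so far (not the instructor's actual notebook), the code might look something like the following. The CSV contents, column names, and Series values are invented for illustration, and an in-memory string stands in for grades.csv so the example runs on its own.

```python
import io

import pandas as pd

# Stand-in for grades.csv. The column names and values are made up;
# the real file's contents are not given in the video description.
csv_text = """student,exam1,exam2,exam3,exam4,exam5
Ana,90,85,78,92,88
Ben,72,80,69,75,81
Cal,88,91,95,89,93
"""

# read_csv accepts a path such as "grades.csv" or any file-like object.
grades = pd.read_csv(io.StringIO(csv_text))

# head() returns the first five rows by default; pass a number for fewer.
first_two = grades.head(2)
print(first_two)

# Dictionary of Series: the keys become column labels, and the indexes
# of the Series are combined to form the DataFrame's index.
series_dict = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([4.0, 5.0, 6.0], index=["a", "b", "c"]),
}
df_from_series = pd.DataFrame(series_dict)
print(df_from_series)
```

With a real file on disk, the read step would simply be pd.read_csv("grades.csv").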
Note that the DataFrame's index can be considered row labels.

Next, say I want to create a pandas DataFrame from a Python dictionary of lists. I would use the DataFrame function again. It would look something like this. As you can see, in this example, the dictionary's keys became the DataFrame's column labels and the dictionary's values became the DataFrame's column values. Also, the DataFrame's index was specified in the function call.

Then say I want to create a pandas DataFrame from a multi-dimensional NumPy array. I would again use the DataFrame function. It would look something like this. As you can see, the rows of the multi-dimensional NumPy array I specified in the function call became the DataFrame's row values. The index I specified became the DataFrame's row labels and the keyword argument columns that I specified became the DataFrame's column labels.

And that's it. You've seen the most common approaches to creating pandas DataFrames. Keep these in mind when you want to store your data in a pandas DataFrame.
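The dictionary-of-lists and NumPy array approaches described above can be sketched roughly as follows. The labels and values here are placeholders chosen for illustration, not the ones used in the video, and the array is assumed to be two-dimensional.

```python
import numpy as np
import pandas as pd

# Dictionary of lists: keys become column labels, each list supplies a
# column's values, and the row labels are passed through index.
df_from_lists = pd.DataFrame(
    {"one": [1, 2, 3], "two": [4, 5, 6]},
    index=["a", "b", "c"],
)
print(df_from_lists)

# Two-dimensional NumPy array: each row of the array becomes a row of
# the DataFrame; index gives row labels, columns gives column labels.
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
df_from_array = pd.DataFrame(
    array_2d,
    index=["row1", "row2"],
    columns=["x", "y", "z"],
)
print(df_from_array)
```

If index or columns is left out, pandas falls back to default integer labels starting at 0.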
