In this video, learn about schemas.
- [Narrator] A schema defines the column names and the data type of each column. For example, are the columns going to hold IntegerType, StringType, or DateType values, and so on? In Spark, we just type df.dtypes or df.printSchema(). It's very similar to what you would have done in pandas with df.dtypes. Spark can infer the schema by default: it takes a look at a couple of rows of the data and tries to determine what type each column should be. What I've found is that in a production environment, you want to explicitly define your schemas.
Defining schemas in Spark is easy; you just need to remember a couple of things. You need to import the different types from pyspark.sql.types. A schema is a StructType made up of a number of fields of type StructField, and each StructField has three components: the name of the column, the type of that column (is it a string, a float, and so on), and finally whether that column can contain missing or null values. You can also optionally specify metadata, that is, extra information about that specific column.
- Benefits of the Apache Spark ecosystem
- Working with the DataFrame API
- Working with columns and rows
- Leveraging built-in Spark functions
- Creating your own functions in Spark
- Working with Resilient Distributed Datasets (RDDs)
Skill Level Intermediate
1. Introduction to Apache Spark
2. Technical Setup
3. Working with the DataFrame API
5. Resilient Distributed Datasets (RDDs)