From the course: Data Ingestion with Python

What should be in schema

- I hope you're now convinced that you need a schema. But what should go in it? I'd say everything you need to make sense of the data. Here are some parts to consider. Description: some text about what this data is. In our example, PGTM should be spelled out as Peak Gust Time. Types: what's the type of the data? Is it an integer, a float, text? Units: what are the units of the data? In our case, temperature is in tenths of a degree Celsius. Constraints: the lowest temperature ever recorded was minus 89.2 Celsius, and the highest was 57.8 Celsius. However, if you measure something other than air temperature, those limits will differ. Constraints between fields: you can't have snow when the temperature is above a certain point. Relations: what contains what? Is it a one-to-one or a one-to-many relationship? Anything that can help you make sense of the data should be in the schema. Don't get to a state where you don't know what a piece of data means or how to check its quality.
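The parts of a schema listed above (description, type, units, constraints, constraints between fields, relations) can be written down in code and used to check incoming records. The sketch below is a minimal, hypothetical example, not the course's own implementation: the field names `station`, `tmax` and `snow`, the 10 Celsius snow threshold, and the set of known stations are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    """One column in the schema: what it means, its type, units and valid range."""
    name: str
    description: str
    dtype: type
    units: Optional[str] = None
    min_value: Optional[float] = None
    max_value: Optional[float] = None


# Hypothetical schema for daily weather readings (field names are assumptions).
SCHEMA = {
    "station": Field("station", "Weather station identifier", str),
    "tmax": Field("tmax", "Maximum temperature", int,
                  units="tenths of degrees Celsius",
                  # Record low -89.2 C and record high 57.8 C, stored in tenths.
                  min_value=-892, max_value=578),
    "snow": Field("snow", "Snowfall", int, units="mm", min_value=0),
}

# Hypothetical cross-field threshold: no snow when tmax is above 10 C (100 tenths).
SNOW_MAX_TMAX = 100


def validate(record: dict, known_stations: set) -> list:
    """Return a list of human-readable problems found in one record."""
    errors = []

    # Per-field checks: presence, type, and range constraints.
    for name, field in SCHEMA.items():
        if name not in record:
            errors.append(f"{name}: missing")
            continue
        value = record[name]
        if not isinstance(value, field.dtype):
            errors.append(f"{name}: expected {field.dtype.__name__}, "
                          f"got {type(value).__name__}")
            continue
        if field.min_value is not None and value < field.min_value:
            errors.append(f"{name}: {value} below minimum {field.min_value}")
        if field.max_value is not None and value > field.max_value:
            errors.append(f"{name}: {value} above maximum {field.max_value}")

    # Constraint between fields, checked only when both fields have the right type.
    tmax, snow = record.get("tmax"), record.get("snow")
    if isinstance(tmax, int) and isinstance(snow, int):
        if snow > 0 and tmax > SNOW_MAX_TMAX:
            errors.append(f"snow={snow} with tmax={tmax}: "
                          f"snow not expected above {SNOW_MAX_TMAX} tenths C")

    # Relation: one station has many readings, so each reading must
    # point to exactly one known station.
    station = record.get("station")
    if isinstance(station, str) and station not in known_stations:
        errors.append(f"station: unknown station {station!r}")

    return errors


if __name__ == "__main__":
    stations = {"ST001"}
    print(validate({"station": "ST001", "tmax": 215, "snow": 0}, stations))  # []
    # Reports a range error, a cross-field error, and a relation error.
    print(validate({"station": "ST999", "tmax": 650, "snow": 5}, stations))
```

Keeping the description and units next to the type and range means the same object documents the data and checks it. A real pipeline might instead use a schema library, but the idea is the same.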
