From the course: Effective Serialization with Python


Detect encoding


- Most of the time, input data will be bytes, and in order to convert it to a str, you need to know the encoding. How can you know the encoding? Some protocols provide a way for you to know it. For example, in HTTP there's a Content-Type header. So we run curl, which is a command-line HTTP client. We say, show me the headers, and https://www.linkedin.com. And we are going to pipe it through head to see just the beginning of the data that is coming in. And we see that LinkedIn is sending us text/html, which is an HTML document, but it also says that the character encoding is utf-8. In other formats, such as JSON, you know in advance because JSON by definition is UTF-8. In other cases you might not be that lucky, and you need to guess the encoding. You can use the external chardet package to guess the encoding. So python -m pip install chardet. And after it's installed we can play around with it. So…
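
The two approaches mentioned in the transcript can be sketched roughly as follows. This is a minimal illustration, not the code from the course: the helper names `charset_from_content_type` and `decode_body` are hypothetical, the UTF-8 fallback is an assumption, and the chardet step only runs if the package happens to be installed.

```python
from email.message import Message


def charset_from_content_type(header_value, default=None):
    """Return the charset parameter of a Content-Type header value, if any."""
    # The standard library's email parser understands the
    # "type/subtype; param=value" syntax used by HTTP Content-Type headers.
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(default)


def decode_body(body, content_type):
    """Decode bytes using the declared charset, guessing if none is declared."""
    charset = charset_from_content_type(content_type)
    if charset is None:
        try:
            import chardet  # external package: python -m pip install chardet
        except ImportError:
            charset = "utf-8"  # assumption: fall back to UTF-8 without chardet
        else:
            # chardet.detect returns a dict with "encoding" and "confidence";
            # "encoding" can be None when no guess could be made.
            charset = chardet.detect(body)["encoding"] or "utf-8"
    return body.decode(charset)


if __name__ == "__main__":
    print(charset_from_content_type("text/html; charset=utf-8"))
    print(decode_body("héllo".encode("utf-8"), "text/html; charset=utf-8"))
```

A guess from chardet is only a statistical estimate, so a declared encoding, when one is available, is usually the safer choice.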
