From the course: Data Ingestion with Python


Unstructured text

- [Instructor] Sometimes, data is written in a way that's easier for humans to understand. This is called unstructured data or semi-structured data. The usual tool for these situations is regular expressions. I won't teach you regular expressions here; see our classes on the subject. Regular expressions have a bad reputation since, once they are written, it's hard to understand what they do. For example, here's a regular expression to parse email addresses. Even knowing what it parses, it's hard to understand. I usually start with sites like pythex. I copy over some of the lines from our logs and then start constructing the regular expression. So I want 'of' and then the number of passengers, then 'started at' and then everything that is not a space to capture the date. Then 'paid', then I need a dollar sign and then some digits, followed by a dot, followed by some more digits. And now it looks good, right? So I have one passenger, the date and the price and here, five…
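
The pattern described above can be sketched in Python with the standard `re` module. This is a minimal sketch, not the course's own code: the sample log lines, the group names, and the `parse_line` helper are assumptions made for illustration, since the transcript does not show the real log format.

```python
import re

# Hypothetical log lines: the real log format is not shown in the transcript.
LOG_LINES = [
    "Ride of 1 passenger started at 2019-01-01T12:30:00 and paid $5.30",
    "Ride of 5 passengers started at 2019-01-02T08:15:42 and paid $12.75",
]

# 'of' + passenger count, 'started at' + run of non-space characters (the date),
# 'paid' + dollar sign + digits, a dot, more digits.
RIDE_RE = re.compile(
    r"of (?P<passengers>\d+).*"
    r"started at (?P<start>\S+).*"
    r"paid \$(?P<amount>\d+\.\d+)"
)


def parse_line(line):
    """Return a dict of the captured fields, or None if the line does not match."""
    match = RIDE_RE.search(line)
    if match is None:
        return None
    return {
        "passengers": int(match.group("passengers")),
        "start": match.group("start"),
        "amount": float(match.group("amount")),
    }


if __name__ == "__main__":
    for line in LOG_LINES:
        print(parse_line(line))
```

Named groups keep the expression somewhat readable after the fact, which helps with the reputation problem mentioned above. Returning `None` for non-matching lines lets the caller decide whether to skip or report them.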