
Operating on character data with bytes and byte arrays

From: Python 3 Essential Training

Bytes and bytearrays are like tuples and lists, except that instead of containing arbitrary objects, they contain bytes: 8-bit words of data. An 8-bit word can hold 256 different values (0 through 255), and that is sometimes very convenient. In particular, it's convenient for converting strings, and that is where you will see it used most often. You will see it used for other binary data as well, but it's most often used for converting strings. And we have a great example of this right here.
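A minimal sketch of the tuple/list analogy (the variable names are illustrative only): `bytes` cannot be changed after creation, `bytearray` can, and both accept only integers in the range 0 through 255.

```python
# bytes is immutable (like a tuple); bytearray is mutable (like a list).
frozen = bytes([72, 105])          # b'Hi'
mutable = bytearray(frozen)        # a mutable copy: bytearray(b'Hi')

mutable[0] = 104                   # item assignment works on a bytearray
mutable.append(33)                 # 33 is ord('!')

try:
    frozen[0] = 104                # bytes does not support item assignment
except TypeError:
    frozen_is_immutable = True

# Every element must be an int in range(256); 256 does not fit in 8 bits.
try:
    mutable.append(256)
except ValueError:
    rejects_256 = True

print(frozen, mutable)             # b'Hi' bytearray(b'hi!')
```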

This is a text file that I created for this purpose. When I created it on my Mac, it had this lovely little pattern of international characters that makes a little picture. It's a viral thing that had been floating around on Facebook, and I thought it would be great for illustrating this problem, because in some circumstances it can't be displayed and doesn't look right, and if you try to read it as ASCII data in Python, you will get an exception. So I loaded it up on the PC that I am using here, saw this, and went, oh, drat.

It's not pretty. It doesn't look like it, but this is actually a great illustration of the problem, because this system is not handling the UTF-8 international characters properly, whereas my other system was. Both are running the same software: the same Eclipse and the same version of Python. And yet when we display this file here, it looks like this, whereas on my Mac it looked different. You will see what it's supposed to look like in a moment, because we are going to use Python to convert it into a form that displays properly here.
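Both symptoms described above can be reproduced in a few lines. This sketch assumes the non-UTF-8 default is Windows code page 1252 (`cp1252`); the transcript does not say which encoding the PC actually uses. Decoding UTF-8 bytes with the wrong codec produces garbled text ("mojibake"), while decoding them as strict ASCII raises an exception.

```python
# Bytes as they would appear in a UTF-8 file containing "café".
data = "café".encode("utf-8")      # b'caf\xc3\xa9'

# Decoding with the right codec recovers the text.
right = data.decode("utf-8")       # 'café'

# Decoding with cp1252 (an assumed example of a non-UTF-8 default)
# does not fail; it silently produces the wrong characters.
wrong = data.decode("cp1252")      # 'cafÃ©'

# Decoding as strict ASCII fails outright.
try:
    data.decode("ascii")
except UnicodeDecodeError as err:
    ascii_error = err
```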

So this is what the file looks like on this PC. If you are using a different operating system and you actually see the fancy characters, just shh, don't tell anybody; the example will still work just fine. I'll start by making a working copy of containers.py called containers-working.py. I'll close the original, open the working copy, and start by opening the input file. I'll call the file object fin.

fin = open('utf8.txt', 'r', encoding = 'utf_8'). We open utf8.txt for reading and set its encoding to 'utf_8'. That string is meaningful to Python ('utf-8' and 'utf8' are accepted aliases for the same codec), and it tells Python to decode the file as UTF-8 instead of using your system's default encoding, which on many systems, including this Windows PC, is something other than UTF-8.
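A small sketch showing that the codec names are interchangeable and that an explicit `encoding` argument makes reading independent of the platform default. The temporary file stands in for the course's utf8.txt; its contents are made up for the demo.

```python
import codecs
import locale
import os
import tempfile

# 'utf_8', 'utf-8', 'utf8', and 'UTF8' all resolve to the same codec.
names = {codecs.lookup(n).name for n in ("utf_8", "utf-8", "utf8", "UTF8")}
print(names)                                 # {'utf-8'}

# The default open() would use when no encoding is given (varies by platform).
print(locale.getpreferredencoding(False))

# Write and read back a UTF-8 file, naming the encoding both times.
path = os.path.join(tempfile.mkdtemp(), "utf8.txt")
with open(path, "w", encoding="utf_8") as f:
    f.write("naïve ☃\n")
with open(path, "r", encoding="utf_8") as fin:
    text = fin.read()
```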

UTF-8 is a really, really useful encoding. Unicode was originally designed as a 16-bit, double-wide character set, which doesn't work well in systems built around 8-bit text, and the whole world didn't adopt it in that form. UTF-8 encodes Unicode in 8-bit units, and its first 128 characters (0 through 127) are exactly the same as ASCII.

So you can safely set your encoding to UTF-8 and it will work just fine with plain ASCII text. For every other character, it uses a clever system of setting the high bits of a byte to tell the decoder that the character takes additional bytes (up to four bytes in total). This all happens transparently, behind the scenes, if your system implements UTF-8 properly. These days most web browsers handle UTF-8 just fine, but some desktop environments don't, and the one I am working on here obviously doesn't. So we are opening this file and telling Python explicitly that its encoding is UTF-8, so that it ignores the default encoding.
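To make the high-bit scheme concrete, this sketch (the helper name `utf8_layout` is hypothetical) prints each character's UTF-8 bytes in binary. A single-byte character starts with `0`; the lead byte of a multi-byte sequence starts with `110`, `1110`, or `11110` for two, three, or four bytes; and every continuation byte starts with `10`.

```python
def utf8_layout(ch):
    """Return (code point, UTF-8 bytes of ch as 8-bit binary strings)."""
    return ord(ch), [format(b, "08b") for b in ch.encode("utf-8")]

# Plain ASCII text encodes to the same bytes in ASCII and UTF-8.
assert "Hello".encode("utf-8") == "Hello".encode("ascii")

for ch in ("A", "é", "€", "😀"):
    cp, bits = utf8_layout(ch)
    print(f"U+{cp:04X}", len(bits), "byte(s):", " ".join(bits))
# U+0041 1 byte(s): 01000001
# U+00E9 2 byte(s): 11000011 10101001
# U+20AC 3 byte(s): 11100010 10000010 10101100
# U+1F600 4 byte(s): 11110000 10011111 10011000 10000000
```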

Next, I'll open an output file. I'll call it utf8.html, because we are going to view it in a browser, even though we won't put any actual HTML markup in it, and I'll open it for writing. Then I'll set up a bytearray called outbytes, initialized with the bytearray() constructor. A bytearray is a mutable sequence of bytes.

It can't hold any other kind of object, only bytes. Now we start iterating through the file with for line in fin, and then immediately iterate through each line with for c in line, because a string is an iterable object. For each character we use the ord() built-in function, which returns the integer code point of that character, and check whether ord(c) is greater than 127.

There are 128 values in UTF-8 that are just plain ASCII: 0 through 127. So if the value is greater than 127, we are going to do something special with it. Otherwise, we just append it to outbytes with outbytes.append(ord(c)). And if it is greater than 127, we are going to do this fancy thing here: outbytes +=.
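A sketch of the loop structure and of why characters above 127 need special handling. Here `io.StringIO` stands in for the open file (an assumption for the demo). Appending `ord(c)` directly works for ASCII, but for `é` it stores a single 0xE9 byte, which is not valid UTF-8, and for characters beyond 255 `append()` refuses the value entirely.

```python
import io

# io.StringIO stands in for the open file object fin (demo assumption).
fin = io.StringIO("ab\né\n")

codes = []
for line in fin:                 # a text file iterates line by line
    for c in line:               # a str iterates character by character
        codes.append(ord(c))     # ord() returns the integer Unicode code point
print(codes)                     # [97, 98, 10, 233, 10]

# 233 fits in one byte, so append() accepts it...
naive = bytearray()
naive.append(ord("é"))
try:
    naive.decode("utf-8")        # ...but a lone 0xE9 byte is not valid UTF-8
except UnicodeDecodeError:
    naive_is_invalid = True

# Many code points do not fit in a byte at all.
try:
    bytearray().append(ord("€"))  # ord('€') is 8364
except ValueError:
    euro_rejected = True
```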

When you use the += operator on a mutable container type such as a bytearray, it has the same effect as appending, except that you can append more than one element at a time. So here I am going to create a bytes object (bytes are immutable sequences of bytes) by encoding a string. When you pass a string to the bytes constructor, it requires an encoding. The string is going to be a numeric character reference, the kind of XML entity that starts with an ampersand and a pound sign (&#) and ends with a semicolon. If you are familiar with XML entities, they look like that: in between you put a decimal number, which is interpreted as the character's Unicode code point.
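A minimal sketch of the two operations this step relies on: `bytes(str, encoding=...)` to turn text into bytes, and `+=` to extend a bytearray with several bytes at once (compared with `append()`, which adds exactly one). The sample values are made up.

```python
outbytes = bytearray(b"caf")

# bytes(str, encoding) encodes the string into a new immutable bytes object.
chunk = bytes("&#0233;", encoding="utf_8")

outbytes += chunk          # += extends the bytearray with every byte in chunk
outbytes.append(ord("!"))  # append() adds exactly one byte

# Passing a str without an encoding is an error.
try:
    bytes("text")
except TypeError:
    needs_encoding = True

print(outbytes)            # bytearray(b'caf&#0233;!')
```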

Inside that string I am going to use a format replacement field with the format spec 04d: a decimal integer, zero-padded to at least four digits. I know this all looks very complicated; I told you this line is where all the magic happens. I call format(ord(c)) on that string, and then give the bytes constructor an encoding, which is UTF-8, because we use UTF-8 for everything wherever we can. So now, if the character is outside the normal ASCII range, we encode it as a numeric character reference, which can be used in an HTML context, and that will allow us to display our fancy little picture.
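A sketch of the formatting step, assuming the template is `'&#{:04d};'` (the transcript describes the `&#`, the `04` decimal format, and `ord(c)`; the exact exercise-file string may differ). `html.unescape` from the standard library plays the role of the browser here, confirming that the reference maps back to the original character. Note that `04` is a minimum width, so larger code points simply use more digits.

```python
import html

c = "é"
entity = "&#{:04d};".format(ord(c))   # zero-padded decimal code point
print(entity)                         # &#0233;

# A browser (or html.unescape) turns the reference back into the character.
print(html.unescape(entity))          # é

# Code points with more than four digits are not truncated.
wide = "&#{:04d};".format(ord("😀"))  # '&#128512;'
```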

Otherwise, if the character is in the normal ASCII range (127 or lower), we just append it to outbytes. So now we have an outbytes bytearray that holds all of the bytes for our string, and what we need to do is turn it into a string. We'll call it outstr, and we'll use the str() constructor to build it from outbytes, and guess what? We are going to use encoding = 'utf_8'.
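A short sketch of the decoding step. `str(bytes_like, encoding=...)` decodes the bytes and gives the same result as calling `.decode()`. Leaving out the encoding is a common slip: `str()` then returns the object's printable representation instead of decoding it.

```python
outbytes = bytearray(b"caf&#0233;")

# str(bytes_like, encoding) decodes; equivalent to outbytes.decode(encoding).
outstr = str(outbytes, encoding="utf_8")
same = outbytes.decode("utf_8")

# Without an encoding, str() returns the repr rather than decoded text.
as_repr = str(outbytes)          # "bytearray(b'caf&#0233;')"
```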

Now all we need to do is print it to our output file with print(outstr, file = fout). We'll also print it to the screen so we can see it, and then print the word Done. Let me save this so no catastrophe happens. To recap: this reads the UTF-8 text file that we can't display properly on this system, decodes it as UTF-8, and writes it out to our HTML file, replacing every character outside the ASCII range with a numeric character reference. That's really all we are doing here.
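Putting the steps together, here is a sketch of the whole conversion as described above. It is a reconstruction, not the course's exercise file: the function name `to_char_refs` and the temporary paths are hypothetical, `with` blocks replace the bare `open()` calls so the files are closed, and the output file names its encoding explicitly (its contents are pure ASCII, so any ASCII-compatible encoding would also work).

```python
import os
import tempfile


def to_char_refs(in_path, out_path):
    """Copy a UTF-8 text file, replacing non-ASCII characters with &#NNNN; references."""
    outbytes = bytearray()
    with open(in_path, "r", encoding="utf_8") as fin:
        for line in fin:
            for c in line:
                if ord(c) > 127:
                    # Non-ASCII: append the bytes of a numeric character reference.
                    outbytes += bytes("&#{:04d};".format(ord(c)), encoding="utf_8")
                else:
                    # ASCII: the code point is also its single UTF-8 byte.
                    outbytes.append(ord(c))
    outstr = str(outbytes, encoding="utf_8")
    with open(out_path, "w", encoding="utf_8") as fout:
        print(outstr, file=fout)
    return outstr


# Demo with a throwaway directory instead of the course's exercise files.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "utf8.txt")
dst = os.path.join(workdir, "utf8.html")
with open(src, "w", encoding="utf_8") as f:
    f.write("é ☃\n")

result = to_char_refs(src, dst)
print(result)       # &#0233; &#9731;
print("Done.")
```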

So we've saved it and we run it, and it looks like I've got a typo somewhere. Yes, right there: I needed an s. That's all right. Save that, run it again, and there's our converted string. The fancy characters were replaced with these references, and the numbers are the decimal Unicode code points of each of those characters. Now if we refresh the file system, because Eclipse doesn't do that for us, and open the file in the little browser inside Eclipse, there is our fancy little picture. That's what it looked like in the text file.

This UTF-8 file has some interesting characters in it that we weren't able to see on this system, and by encoding them as Unicode character references, we are able to see them. The way we did this is with a bytearray. The beauty of a bytearray is that you can operate on encoded character data, because encoded text is a sequence of bytes, and a bytearray is mutable, so you can insert things and change them. All we did here was use it as an accumulator.
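A quick sketch of both points: in Python 3 a character and its encoded bytes are different things (one character can take several bytes), and a bytearray supports in-place edits beyond `append()`, such as slice assignment, `insert()`, and `extend()`.

```python
text = "é"
encoded = text.encode("utf-8")
print(len(text), len(encoded))   # 1 2  (one character, two bytes)

buf = bytearray(b"Hello")
buf[0:1] = b"J"                  # replace a slice in place
buf.insert(5, ord("!"))          # insert a single byte at index 5
buf.extend(b" ok")               # append several bytes at once
print(buf)                       # bytearray(b'Jello! ok')
```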

As we went through the string with the problem characters in it, whenever we found a character we needed to convert, we built a bytes object for its character reference with the bytes constructor and appended all of those bytes to outbytes, our bytearray. Otherwise, if the character was within the ASCII range, we just appended it as-is. So these characters here were appended the normal way, and for these characters we used numeric character references, which represent the Unicode characters, and we got our fancy little guy to display just the way we needed him to.
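For comparison, Python's codec machinery can perform the same substitution in one call through the `'xmlcharrefreplace'` error handler. This is a swapped-in alternative, not the technique shown in the video, and it differs in one detail: the decimal numbers are not zero-padded.

```python
import html

text = "café ☃"

# Characters that ASCII cannot encode become &#NNN; references.
ascii_html = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_html)    # b'caf&#233; &#9731;'

# The references decode back to the original text.
roundtrip = html.unescape(ascii_html.decode("ascii"))
```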

So that is a very common use of bytearrays; they are a very effective tool for jobs like this. You will see an example very much like this one in the example code later in the course.


This video is part of

Python 3 Essential Training

87 video lessons · 39672 viewers

Bill Weinman
Author

  1. 5m 14s
    1. Welcome
      1m 32s
    2. Understanding prerequisites for Python
      2m 4s
    3. Using the exercise files
      1m 38s
  2. 33m 29s
    1. Getting started with "Hello World"
      4m 43s
    2. Selecting code with conditionals
      4m 45s
    3. Repeating code with a loop
      4m 13s
    4. Reusing code with a function
      2m 43s
    5. Creating sequences with generator functions
      2m 46s
    6. Reusing code and data with a class
      4m 39s
    7. Greater reusability with inheritance and polymorphism
      7m 17s
    8. Handling errors with exceptions
      2m 23s
  3. 22m 32s
    1. Installing Python 3 and Eclipse for Windows
      11m 24s
    2. Installing Python 3 and Eclipse for Mac
      11m 8s
  4. 28m 0s
    1. Creating a main script
      3m 27s
    2. Understanding whitespace in Python
      4m 8s
    3. Commenting code
      3m 28s
    4. Assigning values
      3m 37s
    5. Selecting code and values with conditionals
      4m 46s
    6. Creating and using functions
      3m 54s
    7. Creating and using objects
      4m 40s
  5. 31m 23s
    1. Understanding variables and objects in Python
      2m 46s
    2. Distinguishing mutable and immutable objects
      2m 41s
    3. Using numbers
      3m 34s
    4. Using strings
      6m 38s
    5. Aggregating values with lists and tuples
      4m 55s
    6. Creating associative lists with dictionaries
      4m 24s
    7. Finding the type and identity of a variable
      4m 45s
    8. Specifying logical values with True and False
      1m 40s
  6. 9m 42s
    1. Selecting code with if and else conditional statements
      2m 22s
    2. Setting multiple choices with elif
      2m 14s
    3. Understanding other strategies for multiple choices
      2m 38s
    4. Using the conditional expression
      2m 28s
  7. 11m 26s
    1. Creating loops with while
      1m 27s
    2. Iterating with for
      3m 54s
    3. Enumerating iterators
      3m 22s
    4. Controlling loop flow with break, continue, and else
      2m 43s
  8. 23m 28s
    1. Performing simple arithmetic
      2m 14s
    2. Operating on bitwise values
      3m 30s
    3. Comparing values
      3m 32s
    4. Operating on Boolean values
      2m 59s
    5. Operating on parts of a container with the slice operator
      6m 52s
    6. Understanding operator precedence
      4m 21s
  9. 11m 34s
    1. Using the re module
      1m 4s
    2. Searching with regular expressions
      3m 12s
    3. Replacing with regular expressions
      3m 29s
    4. Reusing regular expressions with re.compile
      3m 49s
  10. 9m 10s
    1. Learning how exceptions work
      1m 18s
    2. Handling exceptions
      4m 15s
    3. Raising exceptions
      3m 37s
  11. 23m 1s
    1. Defining functions
      6m 23s
    2. Using lists of arguments
      2m 26s
    3. Using named function arguments
      4m 32s
    4. Returning values from functions
      1m 55s
    5. Creating a sequence with a generator function
      7m 45s
  12. 47m 29s
    1. Understanding classes and objects
      5m 12s
    2. Using methods
      6m 12s
    3. Using object data
      10m 4s
    4. Understanding inheritance
      5m 11s
    5. Applying polymorphism to classes
      7m 13s
    6. Using generators
      9m 48s
    7. Using decorators
      3m 49s
  13. 18m 54s
    1. Understanding strings as objects
      3m 25s
    2. Working with common string methods
      5m 24s
    3. Formatting strings with str.format
      5m 31s
    4. Splitting and joining strings
      2m 49s
    5. Finding and using standard string methods
      1m 45s
  14. 25m 27s
    1. Creating sequences with tuples and lists
      4m 6s
    2. Operating on sequences with built-in methods
      5m 50s
    3. Organizing data with dictionaries
      4m 56s
    4. Operating on character data with bytes and byte arrays
      10m 35s
  15. 11m 46s
    1. Opening files
      2m 4s
    2. Reading and writing text files
      4m 33s
    3. Reading and writing binary files
      5m 9s
  16. 21m 27s
    1. Creating a database with SQLite 3
      6m 56s
    2. Creating, retrieving, updating, and deleting records
      7m 31s
    3. Creating a database object
      7m 0s
  17. 18m 27s
    1. Using standard library modules
      8m 0s
    2. Finding third-party modules
      5m 47s
    3. Creating a module
      4m 40s
  18. 23m 11s
    1. Dealing with syntax errors
      8m 19s
    2. Dealing with runtime errors
      4m 0s
    3. Dealing with logical errors
      4m 22s
    4. Using unit tests
      6m 30s
  19. 19m 56s
    1. Normalizing a database interface
      6m 39s
    2. Deconstructing a database application
      8m 9s
    3. Displaying random entries from a database
      5m 8s
  20. 29s
    1. Goodbye
      29s
