One of the most important changes introduced in Python 3 is that the string type (str) is now Unicode, and text is kept strictly separate from binary data (bytes). A str is a sequence of Unicode characters; to store or transmit it, the characters must be encoded into bytes. Bytes do not carry their encoding with them, so the program reading them has to know, or be told, which encoding was used; decoding with the wrong one raises an error or produces garbled text. UTF-8 (Unicode Transformation Format, 8-bit) is the most widely used encoding and the default for encode() and decode() in Python 3, but other encodings will definitely be encountered.
In programming terms, a byte is a group of 8 bits, so a single byte can represent 256 distinct values (0 to 255). A string is a sequence of characters, and those characters are encoded to bytes so that the computer can store and process them.
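The difference between counting characters and counting bytes can be seen directly in the interpreter. The sketch below uses a sample string (the variable names are only illustrative) to show that one character may need more than one byte:

```python
# A str counts characters; a bytes object counts 8-bit values.
text = "na\u00efve"            # "naive" with a diaeresis on the i: 5 characters
data = text.encode("utf-8")    # encode the characters into bytes

print(len(text))    # 5 characters
print(len(data))    # 6 bytes, because the accented letter needs two bytes in UTF-8
print(list(data))   # every byte is an integer between 0 and 255
```

The accented letter is written as the escape \u00ef only so that the snippet stays pure ASCII; typing the character directly gives the same result.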
In this article, we will learn how to create bytes objects in Python 3. We will also look at different methods for encoding and decoding bytes and strings. Lastly, we will see why converting bytes to strings, and vice versa, matters.
Creating Bytes Object in Python
First, let's see how to create a bytes object in Python. Python has two built-in functions for this: bytes() and bytearray(). The difference is that bytes() returns an immutable object, which cannot be changed after it is created, while bytearray() returns a mutable one. Let us look at some examples for better understanding, starting with the syntax of bytes():
    bytes(x, encoding, errors)
Let's look at an example of how to use bytes() first. If we pass it an integer, it returns a bytes object of that length filled with zero bytes; if we pass it a string together with an encoding, it returns the encoded string as a bytes object.
    # Using a string (an encoding must be given)
    text = "Bytes Object in String"
    convert = bytes(text, 'utf-8')
    print(convert)
    print(type(convert))

    # Using an integer
    convert = bytes(8)
    print(convert)
    print(type(convert))
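Besides a string or an integer, bytes() also accepts an iterable of integers, each between 0 and 255. A minimal sketch comparing the three kinds of source (all variable names here are illustrative):

```python
# Three kinds of source argument accepted by bytes().
from_text = bytes("ABC", "utf-8")    # a string plus an explicit encoding
from_size = bytes(3)                 # an integer: three zero bytes
from_ints = bytes([65, 66, 67])      # an iterable of integers in 0..255

print(from_text)   # b'ABC'
print(from_size)   # b'\x00\x00\x00'
print(from_ints)   # b'ABC'

# Integers outside 0..255 do not fit in one byte and are rejected.
try:
    bytes([256])
except ValueError as exc:
    print("ValueError:", exc)
```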
The bytearray() function takes a source value, which can be an integer or a string. When the source is a string, the encoding argument is required (there is no default, and omitting it raises a TypeError), while the optional errors argument says what to do if encoding fails. The function returns a new bytearray object. The syntax of bytearray() is as follows:
    bytearray(x, encoding, errors)
Let's look at an example using bytearray(). With an integer it returns a zero-filled byte array of that length, while with a string it returns a byte array holding the encoded string.
    # Using a string (an encoding must be given)
    text = "Bytes Object in String"
    convert = bytearray(text, 'utf-8')
    print(convert)
    print(type(convert))

    # Using an integer
    convert = bytearray(8)
    print(convert)
    print(type(convert))
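The practical difference between the two types shows up as soon as we try to change a value. The following sketch, with illustrative variable names, modifies a bytearray in place, shows that the same operation fails on bytes, and shows what happens when a string is passed without an encoding:

```python
frozen = bytes("abc", "utf-8")
editable = bytearray("abc", "utf-8")

# A bytearray can be changed in place; each item is an integer.
editable[0] = ord("x")
editable.extend(b"!")
print(editable)            # bytearray(b'xbc!')

# A bytes object does not support item assignment.
try:
    frozen[0] = ord("x")
except TypeError as exc:
    print("TypeError:", exc)

# A string source without an encoding is rejected as well.
try:
    bytearray("abc")
except TypeError as exc:
    print("TypeError:", exc)
```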
Prefixing a string literal with b also creates a bytes object in Python. The b stands for bytes and is the literal syntax for bytes objects. Such a literal may contain only ASCII characters; any other byte value has to be written as an escape such as \xcf. The syntax for creating a bytes object this way is simple:
    b"mystring"
    b''' mystring '''
With the b prefix we can create bytes objects using single, double, or triple quotes. Let's look at an example of how to use this.
    print(b"Encoding byte string")
    print(b''' Encoding byte string ''')
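A few properties of bytes literals are easy to check interactively. The sketch below (variable names are illustrative) compares a literal with the result of encoding the same text, and shows what indexing and slicing return:

```python
literal = b"Encoding byte string"

# A bytes literal equals the ASCII encoding of the same text.
print(literal == "Encoding byte string".encode("ascii"))   # True

# Indexing gives an integer; slicing gives another bytes object.
print(literal[0])     # 69, the ASCII code of "E"
print(literal[:8])    # b'Encoding'

# Byte values outside ASCII are written with \x escapes.
tau = b"\xcf\x84"                           # UTF-8 bytes of one Greek letter
print(len(tau), len(tau.decode("utf-8")))   # 2 bytes, 1 character
```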
The encode() method can also be used to convert a string to bytes. It encodes the string with the specified encoding and returns a bytes object; by default “UTF-8” is used. Both parameters of encode() are optional. The syntax for the encode() method is:

    string.encode(encoding='utf-8', errors='strict')
For the first example, suppose we have a string that we need to convert to bytes. We will look at the default conversion, explicit UTF-8, and UTF-16.
    # Simple encoding to bytes (UTF-8 by default)
    text = "$c6"
    encoded = text.encode()
    print(encoded)

    # Special characters to UTF-8
    encoded = 'τoρνoς'.encode('utf-8')
    print(encoded)

    # Special characters to UTF-16
    encoded = 'τoρνoς'.encode('utf-16')
    print(encoded)
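The second, optional argument of encode() is errors, which controls what happens when a character cannot be represented in the target encoding. The sketch below encodes the same sample word to ASCII under several standard error handlers; the word is written with \u escapes only to keep the snippet itself pure ASCII:

```python
# The sample word from the example above: Greek letters mixed with Latin "o".
text = "\u03c4o\u03c1\u03bdo\u03c2"

# The default handler, "strict", raises when a character does not fit.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("UnicodeEncodeError:", exc.reason)

print(text.encode("ascii", errors="ignore"))            # b'oo'
print(text.encode("ascii", errors="replace"))           # b'?o??o?'
print(text.encode("ascii", errors="backslashreplace"))  # escapes such as \u03c4
```

Only "strict" and "backslashreplace" keep the information intact or fail loudly; "ignore" and "replace" lose data, so they are best reserved for cases where that loss is acceptable.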
The decode() method converts bytes back to the original string, which makes it the opposite of encode(). The default encoding assumed is “UTF-8”. The syntax used for the decode() method is:

    bytes.decode(encoding='utf-8', errors='strict')
    # Simple decoding to string (UTF-8 by default)
    decoded = b'$c6'.decode()
    print(decoded)

    # UTF-8 to string
    decoded = b'\xcf\x84o\xcf\x81\xce\xbdo\xcf\x82'.decode('utf-8')
    print(decoded)

    # UTF-16 to string
    decoded = b'\xff\xfe\xc4\x03o\x00\xc1\x03\xbd\x03o\x00\xc2\x03'.decode('utf-16')
    print(decoded)
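Decoding only returns the original text when the same encoding is used in both directions. The sketch below, using illustrative variable names, checks a round trip, decodes with a mismatched codec, and handles truncated input with the errors argument:

```python
original = "\u03c4o\u03c1\u03bdo\u03c2"    # the sample word, written with escapes
encoded = original.encode("utf-8")

# Round trip: decoding with the same encoding restores the text.
print(encoded.decode("utf-8") == original)   # True

# A mismatched codec may not fail at all; it silently returns wrong text.
garbled = encoded.decode("latin-1")
print(garbled == original)                   # False
print(len(original), len(garbled))           # 6 characters versus 10

# Cutting the data in the middle of a character breaks strict decoding.
truncated = encoded[:-1]
try:
    truncated.decode("utf-8")
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc.reason)

# errors="replace" substitutes U+FFFD for the bytes that cannot be decoded.
repaired = truncated.decode("utf-8", errors="replace")
print(repaired.endswith("\ufffd"))           # True
```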
Encoding Newline in Bytes
We have seen how encoding works for a single-line string. Suppose the string spans several lines; let's look at how encoding works in that case.
    str_multiline = '''
    Python 2.7
    Python 3.6
    Python 3.7
    '''
    print(str_multiline.encode('utf-8'))
As you can see in the output, each newline is shown as the escape sequence “\n”.
Splitting Bytes on Newlines
In the example above, a string containing newlines was converted to bytes. Let's now take a look at how we can convert bytes back to a string while keeping the formatting intact.
    bytes_multiline = b'\nPython 2.7\nPython 3.6\nPython 3.7\n'
    bytes_multiline_str = bytes_multiline.decode('utf-8')
    print(bytes_multiline_str)
    print(type(bytes_multiline_str))
Decoding the bytes gives back a string (the type returned by decode() is str), and printing that string renders each “\n” as an actual line break. This is one way to split bytes on newlines; let's take a look at another way this can be achieved.
    bytes_multiline = b'\nPython 2.7\nPython 3.6\nPython 3.7\n'
    for line in bytes_multiline.decode('utf-8').split("\n"):
        print(line)
Here the bytes are first decoded into a single string, split("\n") then breaks that string into a list of lines, and the loop prints each line in turn. The decoding happens once, on the whole bytes object, before any splitting takes place.
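Splitting can also be done on the bytes themselves, before any decoding. A minimal sketch using the same sample data; note that the separator passed to split() must itself be bytes:

```python
bytes_multiline = b'\nPython 2.7\nPython 3.6\nPython 3.7\n'

# split() on bytes takes a bytes separator and returns a list of bytes.
parts = bytes_multiline.split(b"\n")
print(parts)     # [b'', b'Python 2.7', b'Python 3.6', b'Python 3.7', b'']

# splitlines() recognises line endings and drops the trailing empty piece.
lines = bytes_multiline.splitlines()
print(lines)     # [b'', b'Python 2.7', b'Python 3.6', b'Python 3.7']

# Decode each non-empty line separately.
versions = [line.decode("utf-8") for line in lines if line]
print(versions)  # ['Python 2.7', 'Python 3.6', 'Python 3.7']
```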
Writing Bytes to File
First, we need to understand how file I/O works in Python. The open() function accepts different access modes, including r, rb, w, and wb. The r mode opens a file for reading as text, while rb opens it for reading in binary format; likewise, w opens a file for writing text and wb for writing binary data. Adding + to a mode, as in rb+ or wb+, opens the file for both reading and writing.
Suppose we want to write bytes to a file in python. The mode for writing is “w” but since we want to write in binary format the mode “wb” is used.
    bytes_input = b"0x41x420x43xcfx84oxcf"
    with open("bytes.txt", "wb") as file:  # open the file in binary write mode
        file.write(bytes_input)            # the with block closes the file for us
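To check that the data arrives on disk unchanged, the bytes can be written and then read back. The sketch below works in a temporary directory so that no real file is touched; the file name and the sample payload are assumptions made for illustration:

```python
import os
import tempfile

payload = b"\x41\x42\x43"    # sample data: the bytes for "ABC"

# Work in a temporary directory that is removed when the block ends.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "bytes.bin")   # hypothetical file name

    with open(path, "wb") as file:
        # A file opened in binary mode rejects str; it only accepts bytes.
        try:
            file.write("ABC")
        except TypeError as exc:
            print("TypeError:", exc)
        written = file.write(payload)           # returns the number of bytes written

    with open(path, "rb") as file:
        read_back = file.read()

print(written)                # 3
print(read_back == payload)   # True
```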
Reading Bytes from File
As with writing, reading bytes depends on the file access mode. In this example, we assume a text file, sample.txt, that contains the single string “Hello” and is read in binary mode. Let's see how to output each character as a byte.
    with open("sample.txt", "rb") as file:
        byte = file.read(1)      # read one byte at a time
        while byte:              # an empty bytes object means end of file
            print(byte)
            byte = file.read(1)
Each character is read from the text file and printed as a byte until the end of the file is reached, at which point read() returns an empty bytes object and the loop stops.
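The same file can be read in text mode or in binary mode, and in larger chunks than a single byte. The following sketch creates its own sample.txt in a temporary directory (an assumption, so that the snippet runs anywhere) and compares the results:

```python
import os
import tempfile
from functools import partial

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")
    with open(path, "w", encoding="utf-8") as file:
        file.write("Hello")

    # Text mode returns str; binary mode returns bytes.
    with open(path, "r", encoding="utf-8") as file:
        as_text = file.read()
    with open(path, "rb") as file:
        as_bytes = file.read()

    # iter() with a sentinel keeps calling file.read(2) until it returns b"".
    with open(path, "rb") as file:
        chunks = list(iter(partial(file.read, 2), b""))

print(type(as_text).__name__, type(as_bytes).__name__)   # str bytes
print(chunks)                                            # [b'He', b'll', b'o']
```

Reading in fixed-size chunks keeps memory use constant for large files; the chunk size of 2 is chosen here only to make the output easy to follow.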
Importance of Bytes and String Conversion
Computers store and transmit data as bytes. Bytes can therefore be written to disk as they are, whereas a string must first be encoded to bytes before it is saved and decoded again when it is read back.
One of the most popular applications of bytes handling nowadays is in machine learning. Machine learning models are often stored in pickle files, which hold the serialized object as bytes and therefore have to be written and read in binary mode.
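The standard pickle module illustrates this: serializing an object produces bytes, and loading those bytes rebuilds the object. In the sketch below a plain dictionary stands in for a trained model, which is an assumption made to keep the example self-contained:

```python
import pickle

# A stand-in for a trained model; a real one would come from an ML library.
model = {"weights": [0.25, 0.5], "bias": 1.0}

blob = pickle.dumps(model)       # serialize the object to bytes
print(type(blob).__name__)       # bytes

restored = pickle.loads(blob)    # rebuild the object from the bytes
print(restored == model)         # True
```

When the bytes go to a file instead, pickle.dump() and pickle.load() expect the file to be opened in "wb" and "rb" mode respectively. Pickle data should only be loaded from sources you trust, since loading it can execute arbitrary code.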