This article is part 3 of 4 in the Encoding and Decoding Python Strings series.

Prerequisites

Knowledge of the following is required:

  1. Python 3
  2. Basic Python data structures such as dictionaries
  3. File operations in Python

Introduction

Literally, the term pickle means preserving something in a saline solution. Only here, instead of vegetables, it's objects. Not everything in life can be seen as 0s and 1s (gosh! philosophy), but pickling helps us get close, since it converts most Python objects, however complex, into 0s and 1s (byte streams). This process is referred to as pickling, serialization, flattening or marshalling. The resulting byte stream can be converted back into Python objects by a process known as unpickling.

Why Pickle?

Since we are dealing with binary data, the data is not written but dumped and, similarly, it is not read but loaded. For example, when you play a game like 'Dave' and reach a certain level, you would want to save your progress, right? The game has various attributes, such as health and gems collected. So when you save your game, say at level 7 with one heart of health and three thousand points, an object is created from a class Dave with these values. When you click the 'Save' button, this object is serialized and saved, or in other words, pickled. When you restore a saved game, you load the data from its pickled state, thus unpickling it.
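
The save-game scenario can be sketched roughly as follows. This is only an illustration, not how any real game stores its saves: the Dave class, its attribute names, and the values are assumptions based on the description above. It uses pickle.dumps() and pickle.loads(), which work on bytes held in memory rather than on a file.

```python
import pickle


class Dave:
    """Hypothetical game state; the attribute names are illustrative."""

    def __init__(self, level, health, points):
        self.level = level
        self.health = health
        self.points = points


# State at the moment the player clicks 'Save'
game = Dave(level=7, health=1, points=3000)

# Pickling: turn the object into a byte stream
saved = pickle.dumps(game)

# Unpickling: rebuild an equivalent object when the game is restored
restored = pickle.loads(saved)
print(restored.level, restored.health, restored.points)
```

Note that the class must be importable when the data is unpickled, because the pickle stores a reference to the class and not its code.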

The real world uses of Pickling and Unpickling are widespread as they allow you to easily send data from one server to another, and store it in a file or database.

WARNING: Never unpickle data received from an untrusted source, as this poses serious security risks. The pickle module cannot detect malicious data or raise errors while unpickling it, and a maliciously crafted pickle can execute arbitrary code when it is loaded.
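
To see why this matters, the sketch below builds a pickle that calls a function as soon as it is loaded. The Demo class is hypothetical and the function here is a harmless print, but an attacker could substitute any callable, which is why loading untrusted pickles is unsafe.

```python
import pickle


class Demo:
    """Hypothetical class whose pickle triggers a function call on load."""

    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call print(...)"
        return (print, ("this ran during unpickling",))


payload = pickle.dumps(Demo())

# Loading the bytes calls print(); the result is whatever print() returns
result = pickle.loads(payload)
```

Here the unpickled "object" is simply the return value of print(), that is, None. Nothing in the byte stream warns the reader that a call will take place.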

Pickling and unpickling can be used only after the corresponding module, pickle, is imported. You can do this with the following statement:

import pickle

Pickle at Work

Now let’s see a simple example of how to pickle a dictionary.

import pickle

emp = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E"}

# Open the file in binary write mode; the with block closes it automatically
with open("Emp.pickle", "wb") as pickling_on:
    pickle.dump(emp, pickling_on)

Note the use of “wb” instead of “w”, since pickle writes bytes rather than text. At this point, you can open the Emp.pickle file in the current working directory in a text editor such as Notepad to see what the pickled data looks like.

So, now that the data has been pickled, let’s work on how to unpickle this dictionary.

import pickle

# Open the file in binary read mode and rebuild the dictionary
with open("Emp.pickle", "rb") as pickle_off:
    emp = pickle.load(pickle_off)
print(emp)

You will now get the employee dictionary as we initialized it earlier. Note the use of “rb” instead of “r”, as we are reading bytes. This is a very basic example, so be sure to try more on your own.

If you want the pickled data as a bytes object in memory instead of writing it to a file, use dumps(). Similarly, to rebuild an object from a bytes object rather than from a file, use loads().
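
A minimal sketch of the in-memory variants, reusing the same employee dictionary as above (the variable names data and restored are just illustrative):

```python
import pickle

emp = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E"}

# dumps() returns the pickled data as a bytes object; no file is involved
data = pickle.dumps(emp)
print(type(data))

# loads() rebuilds an equal, but separate, dictionary from those bytes
restored = pickle.loads(data)
print(restored == emp)
```

This form is handy when the bytes are going somewhere other than a local file, such as a database column or a network socket.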

Data stream format

The data stream format is referred to as the protocol, which specifies the output format of the pickled data. Several protocol versions are available, and you should know which one you are using to avoid compatibility issues.

  • Protocol version 0 - the original text-based format that is backwards compatible with earlier versions of Python.
  • Protocol version 1 - an old binary format which is also compatible with earlier versions of Python.
  • Protocol version 2 - introduced in Python 2.3; it provides more efficient pickling of classes and instances.
  • Protocol version 3 - introduced in Python 3.0; it is not backwards compatible, so it cannot be unpickled by Python 2.
  • Protocol version 4 - added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations.
  • Protocol version 5 - added in Python 3.8. It adds support for out-of-band data.

Note that the protocol version is saved as part of the pickle data, so it is detected automatically when unpickling. To pickle data with a specific protocol, pass the protocol argument to dump().

To find the highest protocol version available, use the following attribute after importing the pickle module. It holds the highest protocol version your Python installation supports.

pickle.HIGHEST_PROTOCOL
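
As a rough sketch of how protocols are chosen in practice, the example below prints the protocol constants and pickles the same dictionary with two different protocols. The variable names are illustrative, and the printed values depend on your Python version, so none are shown here.

```python
import pickle

emp = {1: "A", 2: "B", 3: "C"}

# Highest protocol this interpreter supports, and the one used by default
print(pickle.HIGHEST_PROTOCOL)
print(pickle.DEFAULT_PROTOCOL)

# The protocol is chosen when pickling, via the protocol argument
text_pickle = pickle.dumps(emp, protocol=0)    # original text-based format
binary_pickle = pickle.dumps(emp, protocol=2)  # binary format from Python 2.3

# Pickles from protocol 2 onwards start with a marker byte and the version
print(binary_pickle[:2])

# No protocol is passed when unpickling; it is read from the data itself
print(pickle.loads(text_pickle) == pickle.loads(binary_pickle))
```

Choosing a lower protocol only makes sense when the data must be read by an older Python version; otherwise the default is usually a reasonable choice.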

Exceptions

Some of the common exceptions to look out for:

  1. pickle.PicklingError: This exception is raised when you try to pickle an object that doesn’t support pickling.
  2. pickle.UnpicklingError: This exception is raised when there is a problem unpickling an object, for example when a file contains corrupted data.
  3. EOFError: This exception is raised when the end of the file is reached before unpickling is complete, for example when loading from an empty file.
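
The exceptions above can be triggered deliberately, as in the following sketch. The helper describe_load is hypothetical and exists only for this illustration. For the pickling failure, a lambda is used as an example of an unpicklable object; the exact exception type can vary with the object and the Python version, so the sketch catches a few likely candidates.

```python
import pickle


def describe_load(data):
    """Try to unpickle data and report which exception, if any, occurred."""
    try:
        pickle.loads(data)
    except pickle.UnpicklingError:
        return "UnpicklingError"
    except EOFError:
        return "EOFError"
    return "ok"


print(describe_load(pickle.dumps([1, 2, 3])))  # a valid pickle
print(describe_load(b"not a pickle"))          # bytes that are not pickle data
print(describe_load(b""))                      # empty input

# Pickling failure: lambdas cannot be pickled
try:
    pickle.dumps(lambda x: x)
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    print(type(exc).__name__)
```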

Advantages

  1. Helps in saving complicated data.
  2. Quite easy to use: it takes only a few lines of code.
  3. Saved data is not easily readable by humans, which hides it from casual inspection (though this is not real security, since pickled data is not encrypted).

Disadvantages

  1. Non-Python programs may not be able to reconstruct pickled Python objects.
  2. Security risks in unpickling data from malicious sources.

Pickling is considered an advanced topic, so keep practicing and learning to get the hang of it. Be sure to check out these related topics: Pickler, Unpickler, and cPickle (the C implementation of pickle in Python 2). Happy Pythoning!
