Last Updated: Friday 31st May 2013

Remember that a hash is a function that takes a variable-length sequence of bytes and converts it to a fixed-length sequence. Calculating a file's hash is useful whenever you need to check whether two files are identical, make sure that a file's contents have not changed, or check the integrity of a file transmitted over a network. Some websites publish the MD5 or SHA checksum of a downloadable file. After downloading, you can compute the hash yourself and compare it with the published value to verify that the file arrived intact.
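As an illustration of checking a download against a published checksum, here is a minimal sketch. The function name matches_checksum is hypothetical, and it assumes the checksum is published as a hexadecimal string. It uses hashlib.new(), which builds a hash object from an algorithm name such as 'md5' or 'sha1'.

```python
import hashlib

def matches_checksum(path, expected_hex, algorithm='md5'):
    """Return True if the file's digest equals the published checksum.

    Hypothetical helper: `expected_hex` is assumed to be a hex string.
    """
    with open(path, 'rb') as f:
        # Reads the whole file at once to keep the sketch short
        actual = hashlib.new(algorithm, f.read()).hexdigest()
    # Published checksums are sometimes upper-case or carry stray whitespace
    return actual == expected_hex.strip().lower()
```

For example, matches_checksum('download.zip', published_value, 'sha1') (both arguments hypothetical) returns True only when the downloaded bytes hash to the published value.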

Hashing Algorithms

The most commonly used algorithms for hashing files are MD5 and SHA-1. They are popular because they are fast and give a good way to tell files apart. Neither is considered secure against deliberate tampering any more, since practical collisions are known for both, so for security-sensitive checks a stronger algorithm such as SHA-256 is a safer choice. The hash function uses only the contents of the file, not its name. If two separate files produce the same hash, their contents are very likely identical, even if the files have different names.
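The following sketch illustrates that only the bytes matter: it writes identical content to two files with different, made-up names and compares their MD5 digests. The helper names md5_hex and same_contents are hypothetical, and reading each file whole keeps the example short.

```python
import hashlib
import os
import shutil
import tempfile

def md5_hex(path):
    """MD5 of a file's bytes; the file name is never fed to the hash."""
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()

def same_contents(path_a, path_b):
    """Hypothetical helper: True when both files hash to the same digest."""
    return md5_hex(path_a) == md5_hex(path_b)

if __name__ == '__main__':
    tmpdir = tempfile.mkdtemp()
    try:
        a = os.path.join(tmpdir, 'report.txt')
        b = os.path.join(tmpdir, 'backup_2013.bak')
        for path in (a, b):
            with open(path, 'wb') as f:
                f.write(b'identical bytes')
        print(same_contents(a, b))  # True: same bytes, different names
    finally:
        shutil.rmtree(tmpdir)
```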

MD5 File Hash in Python

The code works with Python 2.7 and later (including Python 3.x).
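A minimal sketch of hashing a whole file with MD5 might look like this. The function name md5_of_file is hypothetical; the steps (open in binary mode, read, update the hasher, take the hex digest) follow the description in this section.

```python
import hashlib

def md5_of_file(path):
    """Return the hex MD5 digest of the file at `path`, reading it all at once."""
    hasher = hashlib.md5()
    # 'rb' opens the file in binary mode, so read() returns bytes, not text
    with open(path, 'rb') as afile:
        buf = afile.read()  # loads the entire file into memory
        hasher.update(buf)
    # No afile.close() is needed: the with statement closes the file
    return hasher.hexdigest()
```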

The code above calculates the MD5 digest of the file. The file is opened in 'rb' mode, which means it is read in binary mode. This matters because the MD5 hasher works on a sequence of bytes, and reading in binary mode means you can hash any type of file, not only text files.

Pay attention to the read() call. When it is called with no arguments, as in this case, it reads the entire file into memory at once. That is risky if you do not know how large the file is, because a very large file can exhaust the available memory. A better version reads the file in fixed-size blocks:

MD5 Hash for Large Files in Python
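Here is a sketch of the block-by-block approach. The block size of 65536 bytes (64 KiB) is an assumed, commonly used value rather than a requirement, and md5_of_large_file is a hypothetical name. Only one block is held in memory at a time.

```python
import hashlib

BLOCKSIZE = 65536  # 64 KiB per read; an assumed, commonly used size

def md5_of_large_file(path, blocksize=BLOCKSIZE):
    """Return the hex MD5 digest of a file, reading it in fixed-size blocks."""
    hasher = hashlib.md5()
    with open(path, 'rb') as afile:
        buf = afile.read(blocksize)
        while buf:  # an empty bytes object means end of file
            hasher.update(buf)
            buf = afile.read(blocksize)
    return hasher.hexdigest()
```

This works because calling update() repeatedly gives the same digest as a single update() call on all the data concatenated, so the result matches the whole-file version. The loop uses the idiomatic `while buf` test instead of `while len(buf) > 0`. On Python 3.11 and later, hashlib.file_digest(afile, 'md5') performs this chunked reading for you.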

To use another algorithm, just replace the md5 call with another supported constructor, e.g. sha1:

SHA1 File Hash in Python
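A sketch of the SHA-1 variant follows; the only change from the MD5 large-file approach is the constructor. The name sha1_of_file and the 64 KiB default block size are assumptions for illustration.

```python
import hashlib

def sha1_of_file(path, blocksize=65536):
    """Return the hex SHA-1 digest of a file, reading it in fixed-size blocks."""
    hasher = hashlib.sha1()  # the only change from the MD5 version
    with open(path, 'rb') as afile:
        buf = afile.read(blocksize)
        while buf:
            hasher.update(buf)
            buf = afile.read(blocksize)
    return hasher.hexdigest()  # 40 hex characters for SHA-1
```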

If you need a list of the hash algorithms supported on your system, use hashlib.algorithms_available (added in Python 3.2, and also present in Python 2.7.9 and later). Finally, for another look at hashing, be sure to check out the article on hashing Python strings.
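The sketch below prints the algorithm names hashlib knows about and builds a hash object from a name with hashlib.new(). The helper new_hasher is hypothetical; it only adds a clearer error message before delegating to hashlib.new().

```python
import hashlib

def new_hasher(name):
    """Build a hash object from an algorithm name such as 'sha1' or 'sha256'."""
    if name not in hashlib.algorithms_available:
        raise ValueError('unsupported hash algorithm: %r' % name)
    return hashlib.new(name)

if __name__ == '__main__':
    # Names present on every platform (e.g. 'md5', 'sha1', 'sha256')
    print(sorted(hashlib.algorithms_guaranteed))
    # Everything this interpreter can use; may include extra OpenSSL algorithms
    print(sorted(hashlib.algorithms_available))
```

algorithms_guaranteed is always a subset of algorithms_available, so sticking to the guaranteed names keeps code portable across machines.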


  • andri_ch

    Thanks for your article!

    In the “MD5 File Hash in Python” section, afile.close() is useless because you opened afile using the ‘with’ statement, which calls afile.close() anyway under the hood.

  • Christopher Mann

    Great article! Concise, descriptive, and with compact code examples. Thanks for publishing this.

    • Jackson Cooper

      Thanks Christopher :-).

  • Ali

    Great. Thanks for this article 🙂

  • Mike Ill Kilmer

    Helped me understand how the hashing is being used in your find duplicate files article.

  • Jürgen Erhard

    “while len(buf) > 0”

    “while buf” is how you do that.

  • Draqun

    You saved my life bro! Great article.

  • Spencer Davis

    Great article. Thanks 🙂