Last Updated: Wednesday 14th August 2013

A hash function takes a variable-length sequence of bytes as input and converts it to a fixed-length sequence. It is a one-way function: if f is the hashing function, calculating f(x) is fast and simple, but recovering x from f(x) is computationally infeasible for a well-designed function. The value returned by a hash function is often called a hash, message digest, hash value, or checksum. Because the output has a fixed length while the input does not, two different inputs can produce the same output; this is called a collision. For a good hash function collisions are extremely unlikely to occur by chance, but depending on the algorithm it may be feasible to find one deliberately.

Now suppose you want to hash the string "Hello World" with the SHA1 function. The result is `0a4d55a8d778e5022fab701977c5d840bbc486d0`.
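
As a quick check, this digest can be reproduced with the `hashlib` module from the standard library, which is covered in detail further down. This is only a minimal sketch for the string "Hello World"; the variable name `digest` is chosen for illustration.

```python
import hashlib

# SHA-1 of the bytes b"Hello World"; hexdigest() returns 40 hex characters (160 bits)
digest = hashlib.sha1(b"Hello World").hexdigest()
print(digest)
```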

Hash functions are used inside some cryptographic algorithms, in digital signatures, message authentication codes, manipulation detection, fingerprints, checksums (message integrity check), hash tables, password storage and much more. As a Python programmer you may need these functions to check for duplicate data or files, to check data integrity when you transmit information over a network, to securely store passwords in databases, or maybe some work related to cryptography.

To be clear, hash functions are not a cryptographic protocol: they do not encrypt or decrypt information. They are, however, a fundamental part of many cryptographic protocols and tools.

Some of the most used hash functions are:

• MD5: Message digest algorithm producing a 128-bit hash value. It is still widely used to check data integrity against accidental corruption, but it is not suitable for security-sensitive uses because of its known vulnerabilities (practical collision attacks).
• SHA: Group of algorithms designed by the U.S. NSA and published as part of the U.S. Federal Information Processing Standards. These algorithms are used widely in several cryptographic applications. The digest length ranges from 160 bits (SHA-1) to 512 bits (SHA-512).
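
The digest sizes listed above can be confirmed with the `hashlib` module introduced in the next paragraph. A small sketch; the `names` list and `sizes` dictionary are illustrative only.

```python
import hashlib

# digest_size is reported in bytes, so multiply by 8 to get bits
names = ["md5", "sha1", "sha224", "sha256", "sha384", "sha512"]
sizes = {name: hashlib.new(name).digest_size * 8 for name in names}

for name, bits in sizes.items():
    print(name, bits)
```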

The `hashlib` module, included in the Python standard library, provides an interface to the most popular hashing algorithms. `hashlib` implements some of the algorithms itself; if you have OpenSSL installed, `hashlib` can use the algorithms OpenSSL provides as well.

This code is made to work in Python 3.2 and above. If you want to run these examples in Python 2.x, just remove the references to `algorithms_available` and `algorithms_guaranteed`.

First, import the `hashlib` module:
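
The import is a single line; no third-party package is needed, since `hashlib` ships with the standard library.

```python
import hashlib
```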

Now we use `algorithms_available` or `algorithms_guaranteed` to list the algorithms available.
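
A minimal sketch of that listing, assuming Python 3.2 or later. The names `guaranteed` and `available` are illustrative, and the exact output depends on your Python build and OpenSSL version.

```python
import hashlib

# Both attributes are sets of algorithm names (strings); sort them for stable output
guaranteed = sorted(hashlib.algorithms_guaranteed)
available = sorted(hashlib.algorithms_available)

print(guaranteed)
print(available)
```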

The `algorithms_available` attribute lists all the algorithms available on the system, including the ones available through OpenSSL. In this case you may see duplicate names in the list. `algorithms_guaranteed` only lists the algorithms present in the module. `md5`, `sha1`, `sha224`, `sha256`, `sha384` and `sha512` are always present.

## MD5
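
A minimal sketch of hashing the string "Hello World" with MD5 (the name `hash_object` is illustrative):

```python
import hashlib

# The b prefix makes the literal a bytes object, which is what md5() expects
hash_object = hashlib.md5(b"Hello World")

# hexdigest() gives the hash as a hexadecimal string
print(hash_object.hexdigest())
```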

The code above takes the "Hello World" string and prints the hex digest of that string. `hexdigest` returns a hexadecimal string representing the hash; if you need the raw sequence of bytes, use `digest` instead.

Note the "b" preceding the string literal: it makes the literal a `bytes` object, because the hashing function only accepts a sequence of bytes as a parameter. In Python 2, it accepted a plain string literal. So, if you need to take some input from the console and hash it, do not forget to encode the string as a sequence of bytes:
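
One way to do this, sketched under the assumption that UTF-8 is an acceptable encoding for your input (`md5_of_text` is a hypothetical helper, not part of `hashlib`):

```python
import hashlib

def md5_of_text(text, encoding="utf-8"):
    # Hash functions work on bytes, so encode the str first
    return hashlib.md5(text.encode(encoding)).hexdigest()

# In an interactive script the text could come from input(), e.g.
# print(md5_of_text(input("Enter the string to hash: ")))
print(md5_of_text("Hello World"))
```

Encoding explicitly also means that non-ASCII input such as "é" is handled without relying on a default codec.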

## Using OpenSSL Algorithms

Now suppose you need an algorithm provided by OpenSSL. Using `algorithms_available`, you can find the name of the algorithm you want to use. In this case, "DSA" is available on my computer. You can then use the `new` function and the `update` method:
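
The set of OpenSSL-provided names varies between installations, and "DSA" may not be offered on yours. The sketch below therefore uses `sha256` as a stand-in; `algorithm_name` and `hash_object` are illustrative names, and you would substitute any entry from your own `algorithms_available`.

```python
import hashlib

# Assumption: swap in any name that appears in hashlib.algorithms_available
# on your machine; "sha256" is used here because it is always present.
algorithm_name = "sha256"

hash_object = hashlib.new(algorithm_name)
hash_object.update(b"Hello ")  # data can be fed in several chunks
hash_object.update(b"World")
print(hash_object.hexdigest())
```

Calling `update` repeatedly is equivalent to hashing the concatenation of all the chunks, which is handy for large files read piece by piece.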

## Practical example: hashing passwords

In the following example we are hashing a password in order to store it in a database. In this example we are using a salt. A salt is a random sequence added to the password string before using the hash function. The salt is used in order to prevent dictionary attacks and rainbow table attacks. However, if you are making real-world applications and working with users' passwords, make sure to stay up to date on the latest vulnerabilities in this field. If you want to find out more about secure passwords, please refer to this article.
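
A minimal sketch of salted hashing, assuming SHA-256 as the hash and `uuid.uuid4()` as the salt source. `hash_password` and `check_password` are illustrative helper names, and the `digest:salt` storage format is an assumption. For production systems a deliberately slow key-derivation function such as `hashlib.pbkdf2_hmac` (Python 3.4+) is generally a better fit than a single hash round.

```python
import hashlib
import hmac
import uuid

def hash_password(password):
    # uuid4() supplies a random value that serves as the per-password salt
    salt = uuid.uuid4().hex
    digest = hashlib.sha256(salt.encode() + password.encode()).hexdigest()
    # Store the salt next to the digest so the check can recompute it
    return digest + ":" + salt

def check_password(stored, candidate):
    digest, salt = stored.split(":")
    recomputed = hashlib.sha256(salt.encode() + candidate.encode()).hexdigest()
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(digest, recomputed)

stored = hash_password("correct horse")
print(check_password(stored, "correct horse"))
print(check_password(stored, "wrong guess"))
```

Because a fresh salt is generated on every call, hashing the same password twice gives two different stored strings.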


• Dre Peters

Andres, thanks for your wonderful tutorial, I learnt some things in it and I’ve had to edit to taste.

• Kristoffer Legind

Very nice overview, kudos.
There are also pyBlake and pyStein, which provide nice alternatives.

• Can this be used with Jython too?

• Python Lover

hehhe, u wanna say username or password not matched 😛 we dont want to give out any hints now !!

• alain

can you salt using a static value? and if so how would i do that ?

• You could, but using the same salt for everything is the same as not using a salt. Salts should be randomly generated every time. This article uses `uuid.uuid4()` to do that. There’s other ways too – `random`, `time`, `/dev/random`, etc.

• Akshay

How do I delete an image from the directory if its hash value is the same as another's?

• `os.remove()`

• Akshay

Its not working for me ….Here is my code i want to delete same hash value data from the disk

```
import hashlib
import os

k = open('C:HomepandpKoala.jpg')
j = open('C:HomepandpKoala1.jpg')
#dic={l:k.name,m:j.name}
if(l==m):
    os.remove(k)
```

or tell me how to do it with the help of dictionary

• `if l == m: os.remove(dic[m])`

• Akshay

But before that I have to close the file, otherwise it will show an error.

• If you store hashes as keys in a dictionary, duplicate keys are lost, because a later entry overwrites an earlier one with the same key. e.g. `dic = {'hash': '123', 'hash': '234'}` evaluates to `{'hash': '234'}`.

You could, however, use one dictionary per file, containing the filename and hash. Here’s an example (untested):

```
import os
from hashlib import md5

def file_hash(filepath):
    # Read in binary mode so the image data is hashed byte for byte
    with open(filepath, 'rb') as f:
        return md5(f.read()).hexdigest()

os.chdir(r'C:Homepandp')
file1 = {'hash': file_hash('Koala.jpg'), 'filename': 'Koala.jpg'}
file2 = {'hash': file_hash('Koala1.jpg'), 'filename': 'Koala1.jpg'}

# Remove the second file only if its contents match the first
if file1['hash'] == file2['hash']:
    os.remove(file2['filename'])
```

• Akshay

Got it ……..thanks a lot jackson !!!

• Akshay

Jackson, I need to ask one more thing: what if I want to scan a whole directory and remove the duplicate .jpg files from it? For each image there is a hash key and its value, i.e. the path. How do I tackle that, since a dictionary only holds unique keys?

• Depends how you want to handle deleting the duplicates. The simplest way would be to keep a set of the hashes seen so far, adding to it as you find and hash each file. If you come across a file whose hash is already in the set, delete the duplicate file. e.g. (untested sketch):

```
import hashlib
import os

def remove_duplicates(root):
    seen = set()  # hashes of the files encountered so far
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, 'rb') as f:
                file_hash = hashlib.md5(f.read()).hexdigest()
            if file_hash in seen:
                os.remove(path)  # Duplicate!
            else:
                seen.add(file_hash)
```
• Akshay

ok …. thanks for the pseudocode jackson !!!

• Akshay

Thanks Jackson for your reply …. after closing that file it worked for me !!!!

• Dimsum

Hi! I’ve tried your method, but it seems that it doesn’t work with special characters.

The input is: &é"'(-è_çà)=

and I get an error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)

Do you know how to avoid such a thing ?

• Derek

Is there any particular reason you didnt just return a tuple for hash_password?

• agrim khanna

How would we convert a hash back to the original string in python ?

• guest

Hashes are one-way functions. The only way to reverse a hash is to try all combinations of what the original input could be.

• Gleb Zverinskiy

Spelling error in the sentence under the first picture.

Now suppose you want to hash the string “Hello Word” with the SHA1 Function, the result is 0a4d55a8d778e5022fab701977c5d840bbc486d0.

Missing “l”

• how to turn it back to string again?