Last Updated: Wednesday 14th August 2013

When you use a scripting language like Python, one thing you will find yourself doing over and over again is walking a directory tree and processing the files in it. While there are many ways to do this, Python's standard library offers a function, os.walk, that makes this process a breeze.

Basic Python Directory Traversal

Here's a really simple example that walks a directory tree, printing out the name of each directory and the files contained:
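
A minimal sketch of such a loop, using the variable names dirName, subdirList and fileList that are described next. The function name list_tree, the root_dir parameter and the "Found directory" label are illustrative assumptions:

```python
import os

def list_tree(root_dir="."):
    """Print each directory under root_dir, followed by the files it contains."""
    # os.walk yields one (dirName, subdirList, fileList) tuple per directory
    for dirName, subdirList, fileList in os.walk(root_dir):
        print("Found directory: %s" % dirName)
        for fname in fileList:
            print("\t%s" % fname)
```

Calling list_tree(".") walks the tree rooted at the current working directory.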

os.walk takes care of the details, and on every pass of the loop, it gives us three things:

  • dirName: The path of the directory it is currently visiting.
  • subdirList: A list of the names of the sub-directories in that directory.
  • fileList: A list of the names of the files in that directory.
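
The entries in subdirList and fileList are bare names, not full paths. As a rough sketch of how the three values fit together, the hypothetical helper below (full_paths is not part of os) joins each name onto dirName with os.path.join:

```python
import os

def full_paths(root_dir="."):
    """Return (directories, files) found under root_dir as joined paths."""
    dirs, files = [], []
    for dirName, subdirList, fileList in os.walk(root_dir):
        # Each entry is only a name; join it onto dirName to get a usable path
        dirs.extend(os.path.join(dirName, d) for d in subdirList)
        files.extend(os.path.join(dirName, f) for f in fileList)
    return dirs, files
```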

Let's say we have a directory tree that looks like this:

+--- [subdir1]
|     |
|     +--- file1a.txt
|     +--- file1b.png
+--- [subdir2]
+--- file2a.jpeg
+--- file2b.html

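Run from the top of this tree, the code above reports the top-level directory (.) and the files in it first, followed by each of the two sub-directories and the files they contain.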
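
To try this without touching real files, the sketch below recreates the example tree in a temporary directory and walks it. It reads the diagram as two files sitting at the top level next to the two sub-directories. build_example_tree and walk_report are hypothetical helpers; names are sorted because os.walk returns them in arbitrary order, and paths are shown relative to the top directory, so the exact text may differ from what the unsorted loop prints on your system.

```python
import os
import tempfile

def build_example_tree(top):
    # Layout taken from the diagram: two sub-directories, two top-level files
    os.makedirs(os.path.join(top, "subdir1"))
    os.makedirs(os.path.join(top, "subdir2"))
    for rel in ("subdir1/file1a.txt", "subdir1/file1b.png",
                "file2a.jpeg", "file2b.html"):
        open(os.path.join(top, *rel.split("/")), "w").close()

def walk_report(top):
    lines = []
    for dirName, subdirList, fileList in os.walk(top):
        subdirList.sort()  # in-place sort fixes the order of descent
        lines.append("Found directory: %s" % os.path.relpath(dirName, top))
        for fname in sorted(fileList):
            lines.append("\t%s" % fname)
    return lines

with tempfile.TemporaryDirectory() as top:
    build_example_tree(top)
    print("\n".join(walk_report(top)))

# Prints:
# Found directory: .
#     file2a.jpeg
#     file2b.html
# Found directory: subdir1
#     file1a.txt
#     file1b.png
# Found directory: subdir2
```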

Changing the Way the Directory Tree is Traversed

By default, os.walk traverses the directory tree top-down: a directory is passed to you for processing first, and only then does os.walk descend into its sub-directories. We can see this behaviour in the output of the first example: the parent directory (.) was printed first, then its 2 sub-directories.

Sometimes we want to traverse the directory tree bottom-up: files at the very bottom of the tree are processed first, and we then work our way up through the directories. We can tell os.walk to do this by passing topdown=False.
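
A minimal sketch of a bottom-up walk. Only the topdown=False argument differs from a plain os.walk loop; the function name list_tree_bottom_up and the printed label are illustrative assumptions:

```python
import os

def list_tree_bottom_up(root_dir="."):
    """Print directories and their files, deepest directories first."""
    # topdown=False: a directory is reported only after all of its sub-directories
    for dirName, subdirList, fileList in os.walk(root_dir, topdown=False):
        print("Found directory: %s" % dirName)
        for fname in fileList:
            print("\t%s" % fname)
```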

With the example tree, the two sub-directories and their files are now reported before the top-level directory (.): we get the files in the sub-directories first, then ascend the directory tree.

Selectively Recursing Into Sub-Directories

The examples so far have simply walked the entire directory tree, but os.walk allows us to selectively skip parts of the tree.

For each directory os.walk gives us, it also provides a list of sub-directories (in subdirList). If we modify this list, we can control which sub-directories os.walk will descend into. Let's tweak our example above so that we skip the first sub-directory.
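
One way this tweak might look, as a sketch: the first entry of subdirList is deleted in place before os.walk moves on. "First" here means first in whatever order os.walk happened to list the sub-directories, and the entry is removed at every level of the tree, not only at the top. The function name list_tree_skip_first is an illustrative assumption.

```python
import os

def list_tree_skip_first(root_dir="."):
    """Walk root_dir top-down, skipping the first sub-directory at each level."""
    for dirName, subdirList, fileList in os.walk(root_dir):
        print("Found directory: %s" % dirName)
        for fname in fileList:
            print("\t%s" % fname)
        # Remove the first entry, if there is one, so os.walk never descends into it
        if len(subdirList) > 0:
            del subdirList[0]
```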

With the example tree, the first sub-directory (subdir1) and its files no longer appear in the output: it was indeed skipped.

This only works when the directory tree is traversed top-down. In a bottom-up traversal, sub-directories are processed before their parent directory, so by the time we could modify subdirList, the sub-directories have already been visited and the change has no effect.
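
A small sketch that illustrates the point: the hypothetical helper below empties subdirList on every pass of a bottom-up walk and records which directories were visited anyway.

```python
import os

def visited_bottom_up_with_pruning(root_dir):
    """Return the directories visited when pruning is attempted bottom-up."""
    visited = []
    for dirName, subdirList, fileList in os.walk(root_dir, topdown=False):
        visited.append(dirName)
        # Too late: with topdown=False the sub-directories were already yielded
        del subdirList[:]
    return visited
```

Every directory in the tree still shows up in the returned list, sub-directories before their parents.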

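It's also important to modify subdirList in place (with del or slice assignment, for instance), so that os.walk, which holds a reference to the same list, will see the changes. If we instead assigned a new list to the name (for example, subdirList = subdirList[1:]), we would create a new list of sub-directories, one that os.walk wouldn't know about.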
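
The difference between the two approaches can be sketched as follows. visited_dirs and its in_place flag are hypothetical, introduced only to compare slice assignment against rebinding the name:

```python
import os

def visited_dirs(root_dir, in_place):
    """Return the directories os.walk visits when we try to drop the first sub-directory."""
    visited = []
    for dirName, subdirList, fileList in os.walk(root_dir):
        visited.append(dirName)
        if in_place:
            # Slice assignment changes the list object that os.walk is holding
            subdirList[:] = subdirList[1:]
        else:
            # Rebinding only points our local name at a new list; os.walk's list is untouched
            subdirList = subdirList[1:]
    return visited
```

On a tree with a single sub-directory, the in-place version visits only the top directory, while the rebinding version still descends into the sub-directory.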

For a more comprehensive tutorial on Python's os.walk function, check out the recipe Recursive File and Directory Manipulation in Python. Or, to look at traversing directories in another way (using recursion), check out the recipe Recursive Directory Traversal in Python: Make a list of your movies!.

  • jaime

    Good article.. But, please use lower_case_with_underscores for variable names…

    See the style guide (PEP8) –

  • @Jaime, it's just a convention, not a necessity. I guess this was to show the usage, keeping simplicity at its best. Cheers. Thumbs up for the article, well explained.

  • Carter Jim

    great article!

  • Henry Lin

    Good tutorial =]

    • John Scarborough

      @Henry Lin, from Shreveport?

      • Henry Lin

        Sorry, you’re probably thinking of a different Henry Lin.

        • Akhil Tandon

          From UIUC

  • Donny

    this might be stupid, but how do you actually run this script? I did it in the Python IDE and clicked ‘run module’ but nothing happened.

  • Very very helpful … BUT the first example does not show, and should show what to do to get the info embedded in the var “subdirList”.