Published: Friday 28th February 2025

How to Convert Files with Pandoc and Python

Pandoc is a well-known, powerful, and flexible document conversion tool that lets you convert files between formats such as Markdown, HTML, LaTeX, and PDF. It is commonly used in publishing workflows, academic writing, and software documentation.

Today, let us take you through this tool: how it works, its key features, and how to use it effectively with Python.

What is Pandoc?

Pandoc is an open-source command-line tool that converts between various text-based document formats. It supports a vast range of input and output formats, making it the "only tool you need" for document conversion.

Supported Input and Output Pandoc Formats

Here are some of the input formats Pandoc supports:

  • Markdown
  • LaTeX
  • HTML
  • Word (.docx)
  • EPUB

It can produce output files in the following formats, among others:

  • PDF
  • HTML
  • LaTeX
  • Word (.docx)
  • EPUB
  • RTF

How to Install Pandoc

Here are the instructions to install Pandoc on different operating systems.

Windows

  1. Download the Pandoc installer from Pandoc’s official site.
  2. Run the installer and follow the setup instructions.

macOS (Using Homebrew)

Run this command in a Terminal window to install Pandoc on macOS:

brew install pandoc

Debian-based Linux Operating System

sudo apt install pandoc
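
Before scripting any conversions, it can help to confirm from Python that the pandoc executable is actually available on the PATH. The following is a minimal sketch using only the standard library; the function name pandoc_version is a hypothetical helper chosen for this example, not something Pandoc provides.

```python
import shutil
import subprocess


def pandoc_version(executable="pandoc"):
    """Return the first line of `pandoc --version`, or None if Pandoc is not found."""
    # shutil.which returns the full path of the executable, or None if it is not on PATH
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    version = pandoc_version()
    print(version or "Pandoc was not found on PATH")
```

If the function returns None, revisit the installation steps above before running the later examples.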

Basic Pandoc Commands

Once installed, Pandoc can be used from the command line to convert documents.

How to Convert Markdown to HTML with Pandoc

Run this command to convert a Markdown file to HTML:

pandoc input.md -o output.html

How to Convert Markdown to PDF with Pandoc

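To convert a Markdown file to PDF, run this command. Note that Pandoc relies on an external PDF engine for this step (by default a LaTeX installation that provides pdflatex), so one needs to be installed: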

pandoc input.md -o output.pdf

How to Convert Word Document to Markdown

Run this command to convert a DOCX file to Markdown:

pandoc input.docx -o output.md

How to Convert Markdown to LaTeX

To convert Markdown to LaTeX, execute this command:

pandoc input.md -o output.tex

How to Convert EPUB to Word with Pandoc

Run this command to convert an EPUB file to DOCX format:

pandoc input.epub -o output.docx
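
In the commands above, Pandoc infers the input and output formats from the file extensions. When an extension is missing or ambiguous, the formats can be named explicitly with the -f (--from) and -t (--to) options. As a rough sketch, a small Python helper could assemble such a command as a list of arguments; the name build_pandoc_command and the file names are assumptions made for this example.

```python
def build_pandoc_command(input_file, output_file, from_format=None, to_format=None):
    """Build a Pandoc argument list, optionally naming the formats explicitly."""
    command = ["pandoc", input_file, "-o", output_file]
    if from_format:
        # -f / --from tells Pandoc how to read the input
        command += ["-f", from_format]
    if to_format:
        # -t / --to tells Pandoc which format to write
        command += ["-t", to_format]
    return command


# A .txt file that actually contains Markdown, converted to HTML
print(build_pandoc_command("notes.txt", "notes.html", "markdown", "html"))
# → ['pandoc', 'notes.txt', '-o', 'notes.html', '-f', 'markdown', '-t', 'html']
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems with file names that contain spaces.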

How to Use Pandoc with Python

Pandoc can be integrated with Python using the "subprocess" module or the "pypandoc" library.

Here is how you can use "subprocess" to run Pandoc commands:

import subprocess

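def convert_md_to_pdf(input_file, output_file):
    # Build the command as a list so file names with spaces need no shell quoting
    command = ["pandoc", input_file, "-o", output_file]
    # check=True raises CalledProcessError if Pandoc exits with an error
    subprocess.run(command, check=True)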

convert_md_to_pdf("input.md", "output.pdf")
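
A conversion can fail for ordinary reasons, such as a missing input file or Pandoc not being installed. One possible way to handle this, sketched below, is to capture Pandoc's error output and return it to the caller instead of letting the exception propagate. The function name run_pandoc and its (ok, message) return shape are assumptions made for illustration.

```python
import subprocess


def run_pandoc(args, executable="pandoc"):
    """Run Pandoc with the given arguments and return a tuple (ok, message)."""
    try:
        result = subprocess.run(
            [executable, *args], capture_output=True, text=True, check=True
        )
    except FileNotFoundError:
        # Raised when the executable itself cannot be found
        return False, f"{executable} is not installed or not on PATH"
    except subprocess.CalledProcessError as error:
        # Pandoc reports its errors on stderr
        return False, error.stderr.strip()
    # On success, stderr may still contain warnings worth showing
    return True, result.stderr.strip()


if __name__ == "__main__":
    ok, message = run_pandoc(["input.md", "-o", "output.html"])
    print("Converted" if ok else f"Conversion failed: {message}")
```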

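To use the "pypandoc" library instead, install it (for example with pip install pypandoc) and use this script:

import pypandoc

# With outputfile set, convert_file writes to disk and returns an empty string
output = pypandoc.convert_file("input.md", "pdf", outputfile="output.pdf")
assert output == ""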

Advanced Usage and Features

How to Add Metadata

Here is how you can add metadata such as the title and author to a file (a date can be set the same way):

pandoc input.md -o output.pdf --metadata title="My Document" --metadata author="Python Central"
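
When the metadata comes from a script rather than being typed by hand, the repeated --metadata options can be generated from a dictionary. The sketch below assumes a hypothetical helper named metadata_args; the date value is only sample data.

```python
def metadata_args(metadata):
    """Turn a dict into a flat list of --metadata KEY=VALUE arguments."""
    args = []
    for key, value in metadata.items():
        args += ["--metadata", f"{key}={value}"]
    return args


command = ["pandoc", "input.md", "-o", "output.pdf"]
command += metadata_args(
    {"title": "My Document", "author": "Python Central", "date": "2025-02-28"}
)
print(command)
```

Because each value is passed as its own list element, no extra quoting is needed around values that contain spaces when the list is handed to subprocess.run.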

How to Customize Output with Templates

Pandoc allows you to define custom templates for consistent formatting. For example, you can use a LaTeX template for PDF conversion.

pandoc input.md -o output.pdf --template=custom.tex

How to Create a Table of Contents

Pandoc can generate a table of contents (TOC) from the headings in your document. Here is how you can create one when converting a Markdown file:

pandoc input.md -o output.pdf --toc

How to Apply CSS for HTML Output

To link a CSS stylesheet to the HTML output generated from a Markdown file, use this command:

pandoc input.md -o output.html --css=style.css

How to Merge Multiple Files

Here is how to merge multiple files with Pandoc. The input files are combined in the order they are listed:

pandoc file1.md file2.md -o merged.pdf
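
Since the order of the input files determines the order of the merged document, a script that collects files from a folder should sort them explicitly. A minimal sketch, assuming chapters are named so that alphabetical order is the reading order (the helper name merge_command and the chapters folder are hypothetical):

```python
from pathlib import Path


def merge_command(directory, output_file, pattern="*.md"):
    """Build a Pandoc command that merges all matching files in sorted order."""
    files = sorted(str(path) for path in Path(directory).glob(pattern))
    if not files:
        raise ValueError(f"No files matching {pattern} in {directory}")
    # Input files come first, in the order they should appear in the output
    return ["pandoc", *files, "-o", output_file]


if __name__ == "__main__":
    try:
        print(merge_command("chapters", "merged.pdf"))
    except ValueError as error:
        print(error)
```

Prefixing file names with numbers such as 01-intro.md and 02-body.md is one simple way to make the sorted order match the intended order.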

Basic Python Scripts to Use Pandoc

Here is a cheat sheet for quick reference:

Convert Markdown to HTML with Python

import subprocess

# Define the input and output files
input_file = "example.md"
output_file = "example.html"

# Run the Pandoc command; check=True raises an error if the conversion fails
subprocess.run(["pandoc", input_file, "-o", output_file], check=True)

print(f"Converted {input_file} to {output_file}")

Convert Markdown to PDF with Python

import subprocess

# Define the input and output files
input_file = "example.md"
output_file = "example.pdf"

# Run the Pandoc command; check=True raises an error if the conversion fails
subprocess.run(["pandoc", input_file, "-o", output_file], check=True)

print(f"Converted {input_file} to {output_file}")

These examples should help you get started with automating document conversion using Pandoc and Python.
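
As a next step, the single-file scripts above could be extended to convert every Markdown file in a folder. The following is only a sketch under stated assumptions: the function names plan_conversions and convert_all and the docs folder are hypothetical, and each output file is written next to its source with a new extension.

```python
import subprocess
from pathlib import Path


def plan_conversions(source_dir, target_ext=".html"):
    """Return (source, destination) path pairs for every .md file in source_dir."""
    sources = sorted(Path(source_dir).glob("*.md"))
    return [(src, src.with_suffix(target_ext)) for src in sources]


def convert_all(source_dir, target_ext=".html"):
    """Convert each planned pair with Pandoc, stopping at the first failure."""
    for src, dst in plan_conversions(source_dir, target_ext):
        subprocess.run(["pandoc", str(src), "-o", str(dst)], check=True)
        print(f"Converted {src} to {dst}")


if __name__ == "__main__":
    # List what would be converted without running Pandoc
    for src, dst in plan_conversions("docs"):
        print(f"{src} -> {dst}")
```

Separating the planning step from the conversion step makes it possible to preview the file list before any files are written.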

Wrapping Up

Pandoc is a great tool for developers, writers, and researchers who need to convert, format, and publish documents efficiently. By mastering Pandoc’s command-line options, you can streamline your workflow and ensure high-quality document outputs.
