Pandoc is a well-known, powerful, and flexible document conversion tool that lets you convert files between markup formats such as Markdown, HTML, and LaTeX, and produce PDF output. It is commonly used in publishing workflows, academic writing, and software documentation.
Today, let us walk you through this tool: how it works, its key features, and how to use it effectively with Python.
What is Pandoc?
Pandoc is an open-source command-line tool that converts between a wide range of text-based document formats. Because it supports so many input and output formats, it is often the only tool you need for document conversion.
Supported Input and Output Pandoc Formats
Here are some of the input formats Pandoc supports:
- Markdown
- LaTeX
- HTML
- Word (.docx)
- EPUB
It can produce output files in formats such as:
- HTML
- LaTeX
- Word (.docx)
- EPUB
- RTF
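The lists above are only a sample. Once Pandoc is installed (see the next section), you can ask it which formats your version supports with its `--list-input-formats` and `--list-output-formats` options. The following is a minimal sketch of doing that from Python; the function names `parse_format_list` and `list_pandoc_formats` are illustrative, not part of any library, and the code assumes `pandoc` is on your PATH.

```python
import subprocess


def parse_format_list(text):
    """Turn Pandoc's one-format-per-line output into a list of names."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def list_pandoc_formats(direction="input"):
    """Ask the installed pandoc for its supported input or output formats.

    Assumes the `pandoc` executable is available on PATH.
    """
    if direction not in ("input", "output"):
        raise ValueError("direction must be 'input' or 'output'")
    result = subprocess.run(
        ["pandoc", f"--list-{direction}-formats"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_format_list(result.stdout)
```

Calling `list_pandoc_formats("output")` on a machine with Pandoc installed should return the format names your version can write.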
How to Install Pandoc
Here are the instructions to install Pandoc on different operating systems.
Windows
- Download the Pandoc installer from Pandoc’s official site.
- Run the installer and follow the setup instructions.
macOS (Using Homebrew)
Execute this command in a Terminal window to install Pandoc on macOS:
brew install pandoc
Debian-based Linux Operating System
sudo apt install pandoc
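After installing, it can help to confirm that Pandoc is actually reachable before scripting against it. Here is a small sketch using only the Python standard library; the helper names `find_pandoc` and `pandoc_version` are hypothetical and chosen for illustration.

```python
import shutil
import subprocess


def find_pandoc():
    """Return the full path to the pandoc executable, or None if it is not on PATH."""
    return shutil.which("pandoc")


def pandoc_version():
    """Return the first line of `pandoc --version`, or None if pandoc is missing."""
    path = find_pandoc()
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None
```

If `find_pandoc()` returns `None`, the installation step did not put Pandoc on your PATH, and the commands in the following sections will fail.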
Basic Pandoc Commands
Once installed, Pandoc can be used from the command line to convert documents.
How to Convert Markdown to HTML with Pandoc
Execute this command to convert a Markdown file to HTML:
pandoc input.md -o output.html
How to Convert Markdown to PDF with Pandoc
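To convert a Markdown file to PDF, run this command. Pandoc relies on an external PDF engine for this step (by default pdflatex, from a LaTeX distribution), so one must be installed: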
pandoc input.md -o output.pdf
How to Convert Word Document to Markdown
Execute this command to convert a DOCX file to Markdown:
pandoc input.docx -o output.md
How to Convert Markdown to LaTeX
To convert Markdown to LaTeX, execute this command:
pandoc input.md -o output.tex
How to Convert EPUB to Word with Pandoc
Execute this command to convert an EPUB file to DOCX format:
pandoc input.epub -o output.docx
How to Use Pandoc with Python
Pandoc can be integrated with Python using the "subprocess" module or the "pypandoc" library.
Here is how you can use "subprocess" to run Pandoc commands:
import subprocess

def convert_md_to_pdf(input_file, output_file):
    command = ["pandoc", input_file, "-o", output_file]
    subprocess.run(command, check=True)

convert_md_to_pdf("input.md", "output.pdf")
To use the "pypandoc" library instead, use this script:
import pypandoc

# With outputfile set, convert_file writes to disk and returns an empty string
output = pypandoc.convert_file("input.md", "pdf", outputfile="output.pdf")
assert output == ""
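When calling Pandoc through "subprocess", two failures are common: the executable is not installed, or Pandoc exits with an error (for example, because no PDF engine is available). The sketch below shows one way to report both cases without a traceback. The function `run_pandoc` and its `executable` parameter are hypothetical names introduced for this example.

```python
import subprocess


def run_pandoc(args, executable="pandoc"):
    """Run pandoc with the given argument list.

    Returns (True, "") on success, or (False, message) on failure.
    """
    try:
        subprocess.run(
            [executable, *args], check=True, capture_output=True, text=True
        )
    except FileNotFoundError:
        # The executable itself could not be located
        return False, f"{executable} was not found on PATH"
    except subprocess.CalledProcessError as err:
        # Pandoc ran but reported an error; prefer its own message
        message = err.stderr.strip() or f"exit code {err.returncode}"
        return False, message
    return True, ""
```

For example, `run_pandoc(["input.md", "-o", "output.pdf"])` returns a success flag plus Pandoc's own error text, which you can log or show to the user.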
Advanced Usage and Features
How to Add Metadata
Here is how you can add metadata (such as title, author, and date) to a file:
pandoc input.md -o output.pdf --metadata title="My Document" --metadata author="Python Central"
How to Customize Output with Templates
Pandoc allows you to define custom templates for consistent formatting. For example, you can use a LaTeX template for PDF conversion.
pandoc input.md -o output.pdf --template=custom.tex
How to Create a Table of Contents
Pandoc can generate a table of contents (TOC) from a document's headings. Here is how to create one when converting a Markdown file to PDF:
pandoc input.md -o output.pdf --toc
How to Apply CSS for HTML Output
To apply a CSS stylesheet when converting a Markdown file to HTML, use this command:
pandoc input.md -o output.html --css=style.css
How to Merge Multiple Files
Here is how to merge multiple files into a single output document with Pandoc. The input files are combined in the order given:
pandoc file1.md file2.md -o merged.pdf
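The options covered in this section can be combined in a single call. As a rough sketch, the helper below assembles such a command as an argument list suitable for "subprocess". The function `build_pandoc_command` and its parameter names are assumptions made for illustration; only the Pandoc flags themselves (`--metadata`, `--toc`, `--template`, `--css`) come from the commands shown above.

```python
def build_pandoc_command(inputs, output, metadata=None, toc=False,
                         template=None, css=None):
    """Build a pandoc argument list from the options used in this section.

    inputs   -- list of input file paths (merged in the order given)
    metadata -- optional dict, e.g. {"title": "My Document"}
    """
    command = ["pandoc", *inputs, "-o", output]
    for key, value in (metadata or {}).items():
        # Passed as one argument, so no shell quoting is needed
        command += ["--metadata", f"{key}={value}"]
    if toc:
        command.append("--toc")
    if template:
        command.append(f"--template={template}")
    if css:
        command.append(f"--css={css}")
    return command
```

The returned list can be handed to `subprocess.run(command, check=True)`. Building a list rather than a shell string avoids quoting problems with values that contain spaces.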
Basic Python Scripts to Use Pandoc
Here is a cheat sheet for quick reference:
Convert Markdown to HTML with Python
import subprocess

# Define the input and output files
input_file = "example.md"
output_file = "example.html"

# Run the Pandoc command; check=True raises an error if the conversion fails
subprocess.run(["pandoc", input_file, "-o", output_file], check=True)
print(f"Converted {input_file} to {output_file}")
Convert Markdown to PDF with Python
import subprocess

# Define the input and output files
input_file = "example.md"
output_file = "example.pdf"

# Run the Pandoc command; check=True raises an error if the conversion fails
subprocess.run(["pandoc", input_file, "-o", output_file], check=True)
print(f"Converted {input_file} to {output_file}")
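Convert a Folder of Markdown Files with Python
The same pattern extends to batch conversion. The sketch below only plans the commands, one per Markdown file in a folder, so you can inspect them before running anything. The function name `plan_batch_conversion` and its parameters are hypothetical and chosen for this example.

```python
from pathlib import Path


def plan_batch_conversion(source_dir, target_ext=".html"):
    """Return one pandoc command per .md file found directly in source_dir.

    Each output file sits next to its input, with the extension swapped.
    """
    commands = []
    for md_file in sorted(Path(source_dir).glob("*.md")):
        output_file = md_file.with_suffix(target_ext)
        commands.append(["pandoc", str(md_file), "-o", str(output_file)])
    return commands
```

Each returned command can then be passed to `subprocess.run(command, check=True)` in a loop.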
These examples should help you get started with automating document conversion using Pandoc and Python.
Wrapping Up
Pandoc is a great tool for developers, writers, and researchers who need to convert, format, and publish documents efficiently. By mastering Pandoc’s command-line options, you can streamline your workflow and ensure high-quality document outputs.