Zydecx's Site

Debug code, debug life, debug today!

How to Convert Documents With Pandoc

By zydecx

What's Pandoc

According to the official site, Pandoc is "your swiss-army knife" for converting files from one markup format into another.

Pandoc can convert documents in markdown, reStructuredText, textile, HTML, DocBook, LaTeX, MediaWiki markup, TWiki markup, OPML, Emacs Org-Mode, Txt2Tags, Microsoft Word docx, EPUB, or Haddock markup to

  • HTML formats: XHTML, HTML5, and HTML slide shows using Slidy, reveal.js, Slideous, S5, or DZSlides.
  • Word processor formats: Microsoft Word docx, OpenOffice/LibreOffice ODT, OpenDocument XML
  • Ebooks: EPUB version 2 or 3, FictionBook2
  • Documentation formats: DocBook, GNU TexInfo, Groff man pages, Haddock markup
  • Page layout formats: InDesign ICML
  • Outline formats: OPML
  • TeX formats: LaTeX, ConTeXt, LaTeX Beamer slides
  • PDF via LaTeX
  • Lightweight markup formats: Markdown, reStructuredText, AsciiDoc, MediaWiki markup, DokuWiki markup, Emacs Org-Mode, Textile
  • Custom formats: custom writers can be written in Lua.

How to Install Pandoc

Windows users can download the package installer from Pandoc's download page and run it. Afterwards, run pandoc -v in a command prompt to verify that Pandoc is installed correctly.

NOTE: The default package cannot produce PDF output on its own; a LaTeX distribution is also required. The official site recommends MiKTeX. However, MiKTeX has some issues when exporting Chinese characters. In that case, CTeX Full is a better choice.
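The installation check above can be sketched as a small script. This is only a sketch, assuming a POSIX shell (Linux, Mac OS X, or something like Git Bash on Windows); the helper name check_tool is made up for illustration and is not part of Pandoc.

```shell
#!/bin/sh
# Report whether a tool is on PATH.
# Prints "<name>: found" or "<name>: missing" (and returns 1 when missing).
# check_tool is a hypothetical helper, not a Pandoc command.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# Pandoc itself: print the first line of its version banner when present.
check_tool pandoc && pandoc -v | head -n 1

# LaTeX engines are only needed for PDF output, so a miss is not fatal here.
check_tool pdflatex || true
check_tool xelatex || true
```

If pandoc is reported as missing right after installation, opening a new terminal so that PATH is reloaded is usually worth trying first.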

For Mac OS X or Linux, refer to the official site for installation instructions.

How to Convert Documents With Pandoc

  1. Convert a web page (HTML) to docx

    pandoc -f html -t docx -o savefile.docx http://www.baidu.com
  2. Convert an HTML file to Markdown

    pandoc -f html -t markdown -o savefile.md fromfile.html
  3. Convert an HTML file to PDF

    pandoc -f html -t latex -o savefile.pdf fromfile.html
    pandoc -o savefile.pdf fromfile.html
  4. Convert a Markdown file to MediaWiki markup

    pandoc -f markdown_github -t mediawiki -o savefile.wiki fromfile.md
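The single-file commands above extend naturally to a whole folder. The following is a minimal sketch, assuming a POSIX shell and that the .html files sit in the current directory; md_name is a hypothetical helper introduced only for this example.

```shell
#!/bin/sh
# Map an input path to its output path by swapping the .html suffix for .md.
# md_name is a made-up helper for this sketch.
md_name() {
    printf '%s\n' "${1%.html}.md"
}

# Convert every .html file in the current directory to Markdown,
# using the same options as the single-file example.
for src in ./*.html; do
    [ -e "$src" ] || continue   # the glob matched nothing
    pandoc -f html -t markdown -o "$(md_name "$src")" "$src"
done
```

Changing the -t value and the suffix in md_name should be enough to target another output format.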

How to Export Documents With Chinese Characters to PDF

If your documents contain only English characters, you can skip this section. It covers the problems of exporting documents with Chinese characters to PDF.

  1. Install CTeX Full instead of MiKTeX

  2. Define a template

    Export Pandoc's standard template using the following command:

    pandoc -D latex > template.tex

    Open template.tex and find the comment % if luatex or xelatex, then add the code below after it.

    % SUPPORT for Chinese
    \usepackage[boldfont,slantfont,CJKsetspaces,CJKchecksingle]{xeCJK}
    \usepackage{fontspec,xltxtra,xunicode}
    \defaultfontfeatures{Mapping=tex-text,Scale=MatchLowercase}
    
    \punctstyle{quanjiao}
    \setCJKmainfont{SimSun} 
    \setCJKsansfont{KaiTi}
    \setCJKmonofont{SimSun}

    Note:
    In my version of Pandoc (1.13.2), the default code following the comment % if luatex or xelatex is shown below.

    20. \else % if luatex or xelatex
    21. \ifxetex
    22.   \usepackage{mathspec}
    23.   \usepackage{xltxtra,xunicode}
    24. \else
    25.   \usepackage{fontspec}
    26. \fi
    27. 
    28. \defaultfontfeatures{Mapping=tex-text,Scale=MatchLowercase}
    29. \newcommand{\euro}{€}

    Errors occur if the code is added directly after line 20. Adding it at line 27 turned out to work.

  3. Export documents

    Export documents to PDF using the following command:

    pandoc -o savefile.pdf fromfile.html --latex-engine=xelatex --template=template.tex

    template.tex is the template modified in step 2.
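A quick way to check the whole setup is to export a tiny test document. The sketch below assumes a POSIX shell and that template.tex from step 2 is in the current directory; the file names cjk-test.md and cjk-test.pdf are made up for this example. It uses the same --latex-engine option as the command above, which matches Pandoc 1.13.2; Pandoc 2.0 and later name this option --pdf-engine.

```shell
#!/bin/sh
# Write a tiny Markdown file containing both English and Chinese text.
# cjk-test.md is a hypothetical file name used only for this check.
printf '%s\n' '# 测试 Test' '' '你好，世界。Hello, world.' > cjk-test.md

# Run the export only if the tools and the template from step 2 are present.
if command -v pandoc >/dev/null 2>&1 \
    && command -v xelatex >/dev/null 2>&1 \
    && [ -f template.tex ]; then
    pandoc -o cjk-test.pdf cjk-test.md --latex-engine=xelatex --template=template.tex
else
    echo "skipped: pandoc, xelatex, or template.tex not found"
fi
```

If the Chinese characters are missing or shown as boxes in cjk-test.pdf, the fonts named in the template are the first thing to check.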

Thanks to this blog for solving the problem.

According to another blog, it is also possible to download pm-template.latex and use it as the template for exporting to PDF. The only thing to note with this template is that LiHei Pro must be replaced with a Chinese font installed on your machine.
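The font replacement can be done by hand in an editor, or with sed as in the rough sketch below. It assumes a POSIX shell and that pm-template.latex is in the current directory; swap_font and my-template.latex are names invented for this example, and SimSun is only a stand-in for whichever Chinese font is installed.

```shell
#!/bin/sh
# Replace every occurrence of the font name "LiHei Pro" in the text on stdin.
# $1 is the replacement font name; it must not contain "/" or "&",
# because it is pasted directly into the sed expression.
# swap_font is a hypothetical helper for this sketch.
swap_font() {
    sed "s/LiHei Pro/$1/g"
}

# Write the adjusted template next to the original, leaving the download intact.
if [ -f pm-template.latex ]; then
    swap_font "SimSun" < pm-template.latex > my-template.latex
fi
```

The resulting my-template.latex can then be passed to --template in place of template.tex.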

Pandoc's Markdown

Pandoc's author is clearly proud of its extended Markdown syntax; otherwise he would not devote about 2/3 of the documentation to it.

How to Produce Slide Shows with Pandoc

It's fantastic to find that simple, concise slides can be made with Pandoc. One can keep collecting notes and occasionally turn them into slides to share with other people, without spending much time working out how to build a PPT.
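As a minimal sketch of that workflow, the script below writes a two-slide deck in Markdown and turns it into an HTML slide show with Slidy, one of the slide formats listed earlier. It assumes a POSIX shell; the file names slides.md and slides.html and the slide text are made up for illustration.

```shell
#!/bin/sh
# Write a small deck. In this deck each level-1 heading starts a new slide,
# and the line beginning with % is the title block.
cat > slides.md <<'EOF'
% Example Deck

# What is Pandoc

- Converts between markup formats
- Runs from the command line

# Why slides

- Notes and slides share one source file
EOF

# Build a standalone HTML slide show if pandoc is available.
if command -v pandoc >/dev/null 2>&1; then
    pandoc -s -t slidy -o slides.html slides.md
else
    echo "skipped: pandoc not found"
fi
```

Swapping slidy for dzslides or s5 in the -t option targets the other HTML slide formats.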

