How to Convert Documents With Pandoc
What's Pandoc
According to the official site, Pandoc is a Swiss-army knife for converting files from one markup format into another.
Pandoc can convert documents in Markdown, reStructuredText, Textile, HTML, DocBook, LaTeX, MediaWiki markup, TWiki markup, OPML, Emacs Org-Mode, Txt2Tags, Microsoft Word docx, EPUB, or Haddock markup to:
- HTML formats: XHTML, HTML5, and HTML slide shows using Slidy, reveal.js, Slideous, S5, or DZSlides.
- Word processor formats: Microsoft Word docx, OpenOffice/LibreOffice ODT, OpenDocument XML
- Ebooks: EPUB version 2 or 3, FictionBook2
- Documentation formats: DocBook, GNU Texinfo, Groff man pages, Haddock markup
- Page layout formats: InDesign ICML
- Outline formats: OPML
- TeX formats: LaTeX, ConTeXt, LaTeX Beamer slides
- PDF via LaTeX
- Lightweight markup formats: Markdown, reStructuredText, AsciiDoc, MediaWiki markup, DokuWiki markup, Emacs Org-Mode, Textile
- Custom formats: custom writers can be written in Lua.
How to Install Pandoc
Windows users can download the installer from Pandoc's download page and run it. Afterwards, run pandoc -v in the Command Prompt to verify that it is installed correctly.
NOTE: The default package cannot produce PDF output on its own; a LaTeX distribution is also required. The official site recommends MiKTeX. However, MiKTeX has some issues when exporting Chinese characters, and in that case CTeX Full is a better choice.
Users of Mac OS X or Linux should refer to the official site for installation instructions.
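Because PDF output depends on a separate LaTeX install, it can help to check that both tools are on the PATH before converting anything. The sketch below is for Mac OS X, Linux, or a Unix-like shell on Windows; require_tools is a hypothetical helper name, and xelatex is assumed as the LaTeX engine because the Chinese PDF section uses it.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are not on PATH.
# Returns 0 if every command is found, 1 otherwise.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: pandoc for conversion, xelatex (from a LaTeX distribution) for PDF.
# require_tools pandoc xelatex && pandoc -v
```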
How to Convert Documents With Pandoc
Convert a webpage (HTML) to docx
pandoc -f html -t docx -o savefile.docx http://www.baidu.com
Convert an HTML file to Markdown
pandoc -f html -t markdown -o savefile.md fromfile.html
Convert an HTML file to PDF
pandoc -f html -t latex -o savefile.pdf fromfile.html
or, letting Pandoc infer the formats from the file extensions:
pandoc -o savefile.pdf fromfile.html
Convert a Markdown file to MediaWiki
pandoc -f markdown_github -t mediawiki -o savefile.wiki fromfile.md
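The same commands can be looped over a folder. Here is a minimal sketch that applies the HTML-to-Markdown command to every .html file in a directory; html_dir_to_md is a hypothetical name, and writing each .md next to its .html is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: convert every .html file in a directory to Markdown,
# writing fromfile.md next to fromfile.html (same flags as the example above).
html_dir_to_md() {
  dir="${1:-.}"
  for f in "$dir"/*.html; do
    # If nothing matches, the pattern stays unexpanded; skip it.
    [ -e "$f" ] || continue
    pandoc -f html -t markdown -o "${f%.html}.md" "$f" || return 1
  done
  return 0
}
```

For example, html_dir_to_md ./pages converts ./pages/a.html to ./pages/a.md, and so on.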
How to Export Document with Chinese Characters to PDF
If you only work with documents in English, you can skip this section. It covers the problems of exporting documents containing Chinese characters to PDF.
Install CTeX Full instead of MiKTeX
Define a template
Export Pandoc's default LaTeX template with the following command:
pandoc -D latex > template.tex
Open template.tex and find the phrase
% if luatex or xelatex
then add the code below after it:
% SUPPORT for Chinese
\usepackage[boldfont,slantfont,CJKsetspaces,CJKchecksingle]{xeCJK}
\usepackage{fontspec,xltxtra,xunicode}
\defaultfontfeatures{Mapping=tex-text,Scale=MatchLowercase}
\punctstyle{quanjiao}
\setCJKmainfont{SimSun}
\setCJKsansfont{KaiTi}
\setCJKmonofont{SimSun}
Note:
In my version of Pandoc (1.13.2), the default code starting at the phrase % if luatex or xelatex is:
20. \else % if luatex or xelatex
21. \ifxetex
22. \usepackage{mathspec}
23. \usepackage{xltxtra,xunicode}
24. \else
25. \usepackage{fontspec}
26. \fi
27.
28. \defaultfontfeatures{Mapping=tex-text,Scale=MatchLowercase}
29. \newcommand{\euro}{€}
Adding the code directly after line 20 causes errors; adding it at line 27 (the blank line after \fi) works.
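Editing the template by hand works, but the insertion can also be scripted. The sketch below assumes that "line 27" can be located as the line just before the first line starting with \defaultfontfeatures, which holds for the 1.13.2 listing but may not for other template versions; add_cjk_support is a hypothetical name. The xeCJK options and \punctstyle are copied verbatim from this post, and newer xeCJK releases may deprecate some of them.

```shell
#!/bin/sh
# Hypothetical helper: insert the Chinese-support block into a template
# exported with `pandoc -D latex > template.tex`.
# Assumption: the block goes right before the first line starting with
# \defaultfontfeatures (line 27 in the 1.13.2 listing above).
add_cjk_support() {
  template="$1"
  # Single quotes keep every backslash literal.
  cjk_block='% SUPPORT for Chinese
\usepackage[boldfont,slantfont,CJKsetspaces,CJKchecksingle]{xeCJK}
\usepackage{fontspec,xltxtra,xunicode}
\defaultfontfeatures{Mapping=tex-text,Scale=MatchLowercase}
\punctstyle{quanjiao}
\setCJKmainfont{SimSun}
\setCJKsansfont{KaiTi}
\setCJKmonofont{SimSun}'
  # Passed via ENVIRON because awk -v would interpret the backslashes.
  if CJK_BLOCK="$cjk_block" awk '
      !inserted && /^[ \t]*\\defaultfontfeatures/ {
        print ENVIRON["CJK_BLOCK"]
        inserted = 1
      }
      { print }
      END { if (!inserted) exit 1 }
    ' "$template" > "$template.tmp"; then
    mv "$template.tmp" "$template"
  else
    rm -f "$template.tmp"
    echo "no defaultfontfeatures line found in $template" >&2
    return 1
  fi
}
```

Usage: pandoc -D latex > template.tex && add_cjk_support template.tex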
Export documents
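Export documents to PDF with the following command:
pandoc -o savefile.pdf fromfile.html --latex-engine=xelatex --template=template.tex
Here template.tex is the template modified in the previous step. In Pandoc 2.0 and later, --latex-engine was renamed --pdf-engine, so use --pdf-engine=xelatex there.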
Thanks to this blog for solving the problem.
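Since the engine option in the export command is spelled differently in Pandoc 1.x and in 2.0 and later, a small wrapper can pick it automatically. This is a sketch under the assumption that the first line of pandoc --version reads "pandoc X.Y.Z"; engine_option and to_pdf_cjk are hypothetical names.

```shell
#!/bin/sh
# Print the engine option for a Pandoc version string such as "1.13.2".
# Pandoc 1.x calls it --latex-engine; 2.0 and later call it --pdf-engine.
engine_option() {
  major=${1%%.*}
  case "$major" in
    ''|*[!0-9]*) echo "unrecognized version: $1" >&2; return 1 ;;
  esac
  if [ "$major" -ge 2 ]; then
    echo "--pdf-engine"
  else
    echo "--latex-engine"
  fi
}

# Hypothetical wrapper: to_pdf_cjk fromfile.html savefile.pdf template.tex
to_pdf_cjk() {
  # Assumes the first line of `pandoc --version` is "pandoc X.Y.Z".
  version=$(pandoc --version | head -n 1 | awk '{print $2}')
  option=$(engine_option "$version") || return 1
  pandoc -o "$2" "$1" "$option=xelatex" --template="$3"
}
```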
According to another blog, you can also download pm-template.latex and use it as the template for exporting to PDF. With this template, the only thing to watch out for is to replace LiHei Pro with a Chinese font installed on your machine.
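The font replacement is a plain text substitution, so sed can do it. A minimal sketch follows; set_cjk_font is a hypothetical name and "Noto Sans CJK SC" is only an example font name. On systems with fontconfig, fc-list :lang=zh family lists the Chinese fonts that are installed.

```shell
#!/bin/sh
# Hypothetical helper: print pm-template.latex with "LiHei Pro" replaced
# by the given font name.
# Usage: set_cjk_font pm-template.latex "Noto Sans CJK SC" > my-template.latex
set_cjk_font() {
  # "|" is the sed delimiter, so the font name must not contain "|", "&" or "\".
  sed "s|LiHei Pro|$2|g" "$1"
}
```

The result can then be passed to Pandoc with --template=my-template.latex.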
Pandoc's Markdown
Pandoc's author is clearly proud of its extended Markdown; otherwise he wouldn't devote about two-thirds of the documentation to it.
How to Produce Slide Shows with Pandoc
It's great that Pandoc can produce simple, concise slides. You can keep collecting notes and occasionally turn them into slides to share with others, without spending much time building a PowerPoint deck.
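As a sketch of that workflow, the script below writes a small Markdown source and builds a Slidy slide show from it. It assumes Pandoc's title block (the % lines) and one slide per level-1 heading, which is what Pandoc picks here because each heading is followed directly by content; the slide text and "Your Name" are placeholders.

```shell
#!/bin/sh
# Write a small Pandoc Markdown slide source (title block plus one slide
# per level-1 heading), then build a Slidy slide show if pandoc is present.
cat > slides.md <<'EOF'
% Sharing Notes with Pandoc
% Your Name

# Why Pandoc

- Slides are plain Markdown
- One source, many output formats

# How

- Collect notes as usual
- Convert them when it is time to share
EOF

if command -v pandoc >/dev/null 2>&1; then
  # -s produces a standalone HTML file; slidy is one of the slide writers.
  pandoc -t slidy -s -o slides.html slides.md
fi
```

Swapping -t slidy for -t dzslides or -t revealjs targets the other HTML slide formats listed at the top (reveal.js may need its files available), and adding -i makes list items appear one at a time.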