LaTeX to Word Conversion with pandoc, by Jochen Autschbach

LaTeX to Word Conversion with Pandoc

(see an update re. conversion directly from PDF at the bottom of this page)

Suppose you have a LaTeX article, with plenty of literature references included via BibTeX, and you need to convert it to MS Word. I found that conversion with pandoc requires the least amount of time, meaning the least amount of work spent on fixing problems in the resulting Word file. For some comments about LaTeX vs. word processors, see this page.

One issue that I came across is that using LaTeX math mode in-line, via $...$, triggers the creation of equation editor objects in the converted Word file. I prefer to keep the resulting Word file simple, with sub- and superscript formatting in Word instead of an equation object, if possible. Therefore, I define

   \renewcommand{\sup}[1]{\textsuperscript{#1}}
   \newcommand{\sub}[1]{\textsubscript{#1}}

and use \sub{...} and \sup{...} for sub- and superscripts outside of display equations. This will translate into simple sub-/superscript formatting in the Word file, instead of an equation object. [A few comments are in order: (i) There is of course no need to define the \sub{...} and \sup{...} macros, if you don’t mind writing out the full names of the commands every time. (ii) If you don’t mind equation editor objects for simple sub-/superscripts, just use LaTeX math mode. (iii) Depending on other packages that you load, \sup may have to be defined newly, or re-defined.]
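As a minimal sketch of how these two macros might be used in running text (the chemical formulas are illustrative only and are not taken from the template):

```latex
\documentclass{article}
% \sup already exists in the LaTeX kernel as the math operator "sup",
% so with the plain article class it is redefined, not newly defined.
\renewcommand{\sup}[1]{\textsuperscript{#1}}
% \textsubscript is part of the LaTeX kernel since the 2015 release;
% older installations may need \usepackage{fixltx2e}.
\newcommand{\sub}[1]{\textsubscript{#1}}
\begin{document}
% Text-mode sub- and superscripts, no math mode involved.
The \sup{13}C NMR spectrum of CH\sub{3}OH was recorded in D\sub{2}O.
The aqueous Fe\sup{2+} ion is high-spin.
\end{document}
```

Note that redefining \sup makes the math operator of the same name unavailable in display equations; if that operator is needed, a different macro name would be the safer choice.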

Also to avoid unnecessary equation editor objects, I define macros to generate Greek characters and selected other special characters via Unicode instead of using LaTeX math mode. Examples are shown in the LaTeX template for conversion with pandoc provided here. You can also use the Unicode characters directly in the text if your editor has a simple way of inserting them. If you compile the LaTeX file with xelatex, for example, using a font such as Cambria as the base font, the Greek characters will show up properly both in the PDF generated by xelatex and in the converted Word file. lualatex should work the same way, but I haven’t tested that.
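A sketch of what such Unicode macros could look like is given below. The macro names (\txalpha, \txbeta, \txdegree) are hypothetical and may differ from those in the template, and the example assumes that the Cambria font is installed and that the file is saved as UTF-8.

```latex
\documentclass{article}
\usepackage{fontspec}     % requires xelatex or lualatex
\setmainfont{Cambria}     % assumption: Cambria is installed on the system
% Hypothetical macro names; each expands to a plain Unicode character,
% so no math mode (and no equation object in Word) is involved.
\newcommand{\txalpha}{α}
\newcommand{\txbeta}{β}
\newcommand{\txdegree}{°}
\begin{document}
The \txalpha{} and \txbeta{} spin orbitals are treated separately.
The bond angle is 109.5\txdegree.
\end{document}
```

The empty braces after a macro keep the following space from being swallowed by LaTeX.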

Key to a successful conversion is that you keep the LaTeX code plain and don’t load any convenience packages that aren’t supported by pandoc.

I provide here:

a LaTeX template for conversion to Word via pandoc, along with the bib file used
the corresponding PDF file generated by xelatex
the converted Word file
the corresponding PDF saved with Word

The workflow is explained in the preamble of the LaTeX file. I repeat the key steps here. Of course, steps 1 and 2 only have to be done the first time, or when there is an update for pandoc or pandoc-crossref.

1. Install pandoc: https://pandoc.org
2. Install a matching version of pandoc-crossref: https://github.com/lierdakil/pandoc-crossref/releases
3. For your literature citations, download a suitable citation style file from the Zotero style library and put the csl file in the same directory as the LaTeX file to be converted. For my example, I used journal-of-the-american-chemical-society.csl
4. Make sure the LaTeX file compiles properly with xelatex and bibtex, and the typeset result (PDF) looks as intended.
5. Assuming you compile the citations from a larger set of bib files, create a single bib file for the article that contains all of the citations used, via
      bibexport -o thecitations.bib pandoc-template.aux
   Then change the bibliography command in the LaTeX file to read
      \bibliography{thecitations}
   and repeat the xelatex-bibtex-xelatex cycle once more to confirm your citations show up properly.

Conversion of pandoc-template.tex to the Word file converted.docx is then done with (single command)

      pandoc -F pandoc-crossref -M autoEqnLabels -M tableEqns --citeproc -t markdown-citations  -s pandoc-template.tex -f latex -t docx -o converted.docx  --bibliography=thecitations.bib  --csl=journal-of-the-american-chemical-society.csl

You can also provide a reference Word file to control formatting and font selections a bit better. For example, the file offered for download here is a Word file set up to use the Cambria font for most elements. To use this file, add the following option to the conversion command above:
--reference-doc cambria-template.docx
Automatic numbering of equation references in the Word output does not seem to work yet, or I’m using the pandoc-crossref filter incorrectly. Let me know if you have an idea how to get this to work. For the time being, we will have to update equation references manually in the Word output, which is relatively easy if you use descriptive labels as suggested here.
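To illustrate what a descriptive label might look like, here is a small sketch. The label name eq:schroedinger and the equation are chosen for illustration only and are not taken from the template; the point is simply that a label that names the equation makes it easier to tell which equation a reference points to when fixing it by hand in Word.

```latex
\documentclass{article}
\begin{document}
The time-independent Schr\"odinger equation is
\begin{equation}\label{eq:schroedinger}
  \hat{H}\psi = E\psi
\end{equation}
% A descriptive label such as eq:schroedinger is easier to match to its
% equation than a generic one such as eq:1.
As Equation~\ref{eq:schroedinger} shows, the energy is an eigenvalue.
\end{document}
```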

Update 2024-06-12: Shen Lin <slin@nankai.edu.cn> informed me by email that Word 2019 can open a LaTeX-generated PDF directly. I tried this with the template provided on this page as well as with a simple (no equations) 20-page article manuscript, and the results were actually quite good. I use XeLaTeX to generate PDFs, and it was important to turn off ligatures with the font that I was using; otherwise, character combinations that led to ligature glyphs (ﬂ, ﬁ, ﬀ, etc.) were not converted into the Word file, causing many misspelled words. To avoid this, I used

\defaultfontfeatures{Ligatures=CommonOff,Ligatures=TeXOff, 
Ligatures=RequiredOff,Ligatures=ContextualOff,Ligatures=HistoricOff}
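For context, a minimal sketch of where this command might sit in a preamble is shown below. It assumes the fontspec package (which provides \defaultfontfeatures) and uses Cambria only as an example font; the test sentence is made up to contain several ligature candidates.

```latex
\documentclass{article}
\usepackage{fontspec}   % xelatex or lualatex only
% Default features apply to fonts selected afterwards, so this
% command is placed before \setmainfont.
\defaultfontfeatures{Ligatures=CommonOff,Ligatures=TeXOff,
  Ligatures=RequiredOff,Ligatures=ContextualOff,Ligatures=HistoricOff}
\setmainfont{Cambria}   % assumption: any installed font can be used here
\begin{document}
% Words containing fi, fl, ff, ffi to check that no ligature glyphs appear.
The first five affiliated offices filed flawless reflections.
\end{document}
```

Be aware that Ligatures=TeXOff also disables the usual TeX input conventions, such as -- for an en dash and `` '' for curly quotes, so those characters may need to be entered directly as Unicode.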
