 

LaTeX to Word Conversion with Pandoc

(see an update re. conversion directly from PDF at the bottom of this page)

Suppose you have a LaTeX article, with plenty of literature references included via BibTeX, and you need to convert it to MS Word. I found that conversion with pandoc requires the least time, in the sense of the least work spent fixing problems in the resulting Word file. For some comments about LaTeX vs. word processors, see this page.

One issue that I came across is that using LaTeX Math mode in-line, via $...$, triggers the creation of equation editor objects in the converted Word file. I prefer to keep the resulting Word file simple, with sub- and superscript formatting in Word instead of an equation object, if possible. Therefore, I define

  \renewcommand{\sup}[1]{\textsuperscript{#1}} 
  \newcommand{\sub}[1]{\textsubscript{#1}}

and use \sub{...} and \sup{...} for sub- and superscripts outside of display equations. This will translate into simple sub-/superscript formatting in the Word file, instead of an equation object. [A few comments are in order: (i) There is of course no need to define the \sub{...} and \sup{...} macros if you don't mind writing out the full command names every time. (ii) If you don't mind equation editor objects for simple sub-/superscripts, just use LaTeX math mode. (iii) Standard LaTeX already defines \sup as the math-mode supremum operator, which is why it is redefined with \renewcommand above (it is then no longer available as that operator). Depending on other packages that you load, either macro may need \newcommand or \renewcommand.]
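For illustration, here is a hypothetical sentence written both ways. The example text and numbers are made up. With \sub and \sup defined as above, the first version should come out as plain sub-/superscript formatting in Word. In the second version, pandoc turns each $...$ into a Word equation object:

  % Hypothetical sentence using \sub and \sup (defined as above)
  The reaction of H\sub{2} with O\sub{2} has a rate constant
  of about 10\sup{5}~s\sup{-1}.

  % The same sentence in math mode; each $...$ becomes an
  % equation object in the converted Word file
  The reaction of H$_2$ with O$_2$ has a rate constant
  of about $10^{5}$~s$^{-1}$.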

Also to avoid unnecessary equation editor objects, I define macros to generate Greek characters and selected other special characters via Unicode instead of using LaTeX math mode. Examples are shown in the LaTeX template for conversion with pandoc provided here. You can also use the Unicode characters directly in the text if your editor has a simple way of inserting them. If you compile the LaTeX file with xelatex, for example, using a font such as Cambria as the base font, the Greek characters will show up properly both in the PDF generated by xelatex and in the converted Word file. lualatex should work the same way, but I haven't tested that.
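Below is a sketch of what such macros can look like. It assumes xelatex with the fontspec package and a system where Cambria is installed. The macro names are made up for this example and are not necessarily the ones used in the template. Save the LaTeX file as UTF-8:

  \usepackage{fontspec}
  \setmainfont{Cambria}     % assumes Cambria is installed
  % Hypothetical macro names; each expands to a single Unicode character
  \newcommand{\ualpha}{α}   % U+03B1 GREEK SMALL LETTER ALPHA
  \newcommand{\ubeta}{β}    % U+03B2 GREEK SMALL LETTER BETA
  \newcommand{\uDelta}{Δ}   % U+0394 GREEK CAPITAL LETTER DELTA
  \newcommand{\udeg}{°}     % U+00B0 DEGREE SIGN

In the text you would then write, for example, \ualpha{} particles or 25\udeg{}C. The empty braces end the command name. This keeps the following space in the first case and stops the letter C from being read as part of the name in the second.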

Key to a successful conversion is that you keep the LaTeX code plain and don’t load any convenience packages that aren’t supported by pandoc.
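As an illustration of what "plain" can mean, here is a hypothetical skeleton that uses only standard LaTeX commands, plus fontspec for xelatex and amsmath for equations. Which packages pandoc handles well is not spelled out on this page, so this selection is an assumption. The bib key is made up. The name thecitations matches the bib file created in step 5 below. \bibliographystyle only affects the PDF, because pandoc formats the references with the CSL file:

  \documentclass{article}
  \usepackage{fontspec}   % font selection for xelatex
  \usepackage{amsmath}    % display equations
  \title{Article title}
  \author{A. Author}
  \begin{document}
  \maketitle
  \section{Introduction}
  Plain running text with a citation \cite{somekey}.  % hypothetical key
  \bibliographystyle{unsrt}
  \bibliography{thecitations}
  \end{document}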

I provide here

The workflow is explained in the preamble of the LaTeX file. I repeat the key steps here. Of course, steps 1 and 2 only have to be done the first time, or when there is an update for pandoc or pandoc-crossref.

  1. Install pandoc: https://pandoc.org
     

  2. Install a version of pandoc-crossref that matches your pandoc version (each pandoc-crossref release is built for a specific pandoc version): https://github.com/lierdakil/pandoc-crossref/releases
     

  3. For your literature citations, download a suitable citation style file from the Zotero style library and put the csl file in the same directory as the LaTeX file to be converted. For my example, I used journal-of-the-american-chemical-society.csl.
     

  4. Make sure the LaTeX file compiles properly with xelatex and bibtex, and that the typeset result (PDF) looks as intended.
     

  5. Assuming your citations are drawn from a larger set of bib files, create a single bib file for the article that contains all of the citations used, via

    bibexport -o thecitations.bib pandoc-template.aux

    Then change the bibliography command in the LaTeX file to read

    \bibliography{thecitations}

    and repeat the xelatex-bibtex-xelatex cycle once more to confirm your citations show up properly.
     

  6. Convert pandoc-template.tex to the Word file converted.docx with a single command:

        pandoc -F pandoc-crossref -M autoEqnLabels -M tableEqns --citeproc -s pandoc-template.tex -f latex -t docx -o converted.docx --bibliography=thecitations.bib --csl=journal-of-the-american-chemical-society.csl

     

  7. Automatic numbering of equation references in the Word output does not seem to work yet, or I'm using the pandoc-crossref filter incorrectly. Let me know if you have an idea how to get this to work. For the time being, equation references have to be updated manually in the Word output, which is relatively easy if you use descriptive labels as suggested here.
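Here is a hypothetical example of a descriptive equation label. The eq: prefix follows pandoc-crossref's naming convention for equation labels. Whether the template uses exactly this form is an assumption. When equation references have to be fixed by hand, a label such as eq:total-energy makes it easy to see in the LaTeX source which equation each reference is meant to point to. A generic label such as eq1 does not:

  \begin{equation}
    E = T + V
    \label{eq:total-energy}  % descriptive label instead of, e.g., eq1
  \end{equation}
  Inserting the kinetic energy into Equation~\ref{eq:total-energy} gives ...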
     

Update 2024-06-12: Shen Lin <slin@nankai.edu.cn> informed me by email that Word 2019 can open a LaTeX-generated PDF directly. I tried this with the template provided on this page as well as with a simple (no equations) 20-page article manuscript, and the results were actually quite good. I use XeLaTeX to generate PDFs, and it was important to turn off ligatures for the font that I was using. Otherwise, character combinations that are typeset as ligature glyphs (fl, fi, ff, etc.) were not carried over into the Word file, causing many misspelled words. To avoid this, I used

\defaultfontfeatures{Ligatures=CommonOff,Ligatures=TeXOff, 
Ligatures=RequiredOff,Ligatures=ContextualOff,Ligatures=HistoricOff}
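If you also set the base font in the preamble, the order matters. \defaultfontfeatures only applies to fonts selected after it, so it has to come before \setmainfont. Cambria below is just an example and is assumed to be installed. Note that TeXOff also turns off the TeX-style input ligatures. With it, -- and --- no longer produce en and em dashes, and `` and '' no longer produce curly quotes, so type those characters directly as Unicode. A sketch:

  \usepackage{fontspec}
  % Must precede \setmainfont: only fonts loaded afterwards get these features
  \defaultfontfeatures{Ligatures=CommonOff,Ligatures=TeXOff,
    Ligatures=RequiredOff,Ligatures=ContextualOff,Ligatures=HistoricOff}
  \setmainfont{Cambria}  % example font; substitute any installed font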

______________  
 

© 2021 – 2024 J. Autschbach.