Generating HTML pages from Latex
While latex is pretty much "not designed" for web content, it is very useful to generate a web version of a latex document. The purpose of latex is clearly typesetting layouts on a pre-defined page, but when you want to share the information with others, it's generally a lot easier for them to go to a webpage than it is to download and open a PDF. In addition, it's generally easier to view a webpage than a PDF because the content is continuous, and one can scroll around and click hyperlinks in a way that is far more fluid than in a PDF.
Now that MathML and SVG are becoming more supported by web browsers, there is a strong case for sharing mathy documents on the web in addition to paper documents (or PDFs, which are only slightly more readable than paper).
To this end, I've been evaluating various different Latex to HTML converters. I've tried the following on Linux (Ubuntu):
By far my favorite is LaTeXML. It generates crisp, simple pages using MathML and CSS, making it easy to customize the style. It doesn't support a whole lot of the packages that I would generally like to use (like algorithm2e), but then again none of the converters do. Also, the arXiv project is working on a branch of LaTeXML, so there is promise that it will grow quickly to support a lot of the best packages.
Document Setup
My current approach to generating both PDF and HTML output from latex source is to use a separate top-level document for each. The directory structure looks something like this:
    document
     |- document_html.tex
     |- document_pdf.tex
     |- document.tex
     |- preamble_common.tex
     |- preamble_html.tex
     |- preamble_pdf.tex
     \- references.bib
The two versions of document_[output].tex are the top-level files. They look like this:
    %document_html.tex
    \documentclass[10pt]{article}
    \input{preamble_common}
    \input{preamble_html}
    \begin{document}
    \input{document}
    \end{document}
The pdf version is the same, but it uses preamble_pdf as an input instead. Note that in latex you cannot nest \include directives, but you can nest \input directives. Also, \include inserts a page break, so there is no need to use it here. Instead, document.tex may \include its chapters as separate tex files or the like.
Makefile
To ease the process of generating the different types, I'm using a makefile.
    # The following definitions are the specifics of this project
    PDF_OUTPUT  := document.pdf
    HTML_OUTPUT := document.html

    PDF_MAIN    := document_pdf.tex
    HTML_MAIN   := document_html.tex

    COMMON_TEX  := document.tex \
                   preamble_common.tex

    # note: these must reference COMMON_TEX (the variable defined above)
    PDF_TEX     := $(COMMON_TEX) \
                   document_pdf.tex \
                   preamble_pdf.tex

    HTML_TEX    := $(COMMON_TEX) \
                   document_html.tex \
                   preamble_html.tex

    BIB         := references.bib

    # these variables are the dependencies for the outputs
    PDF_SRC     := $(PDF_TEX) $(BIB)
    HTML_SRC    := $(HTML_TEX) $(BIB)

    # none of these targets name real files
    .PHONY: all pdf html clean

    # the 'all' target will make both the pdf and html outputs
    all: pdf html

    # the 'pdf' target will make the pdf output
    pdf: $(PDF_OUTPUT)

    # the 'html' target will make the html output
    html: $(HTML_OUTPUT)

    # the pdf output depends on the pdf tex files and the bibliography
    # we use a shell script to optionally run pdflatex multiple times until the
    # output does not suggest that we rerun latex
    # (recipe lines below must begin with a tab character)
    $(PDF_OUTPUT): $(PDF_SRC)
    	@echo "Running pdflatex on $(PDF_MAIN)"
    	@pdflatex $(basename $(PDF_MAIN)) > $(basename $(PDF_MAIN))_0.log
    	@echo "Running bibtex"
    	@-bibtex $(basename $(PDF_MAIN)) > bibtex_pdf.log
    	@echo "Checking for rerun suggestion"
    	@for ITER in 1 2 3 4; do \
    		STABELIZED=`grep "Rerun" $(basename $(PDF_MAIN)).log`; \
    		if [ -z "$$STABELIZED" ]; then \
    			echo "Document stabilized after $$ITER iterations"; \
    			break; \
    		fi; \
    		echo "Document not stabilized, rerunning pdflatex"; \
    		pdflatex $(basename $(PDF_MAIN)) > $(basename $(PDF_MAIN))_$$ITER.log; \
    	done
    	@echo "Copying pdf to target file"
    	@cp $(basename $(PDF_MAIN)).pdf $(PDF_OUTPUT)

    # the html output depends on the html tex files and the bibliography
    # we have to process all of the bibliography files separately into xml files,
    # and then include them all in the call to the postprocessor
    $(HTML_OUTPUT): $(HTML_SRC)
    	@echo "Running latexml on $(HTML_MAIN)"
    	@latexml $(HTML_MAIN) -dest=$(basename $(HTML_OUTPUT)).xml > $(basename $(HTML_MAIN)).log 2>&1
    	@BIBSTRING=""; \
    	for BIBFILE in $(BIB); do \
    		echo "Running latexml on $$BIBFILE"; \
    		XMLFILE=`basename "$$BIBFILE" .bib`.xml; \
    		LOGFILE=`basename "$$BIBFILE" .bib`_html.log; \
    		latexml $$BIBFILE -dest=$$XMLFILE > $$LOGFILE 2>&1; \
    		BIBSTRING="$$BIBSTRING -bibliography=$$XMLFILE"; \
    	done; \
    	echo $$BIBSTRING > bibstring.txt
    	@echo "postprocessing with `cat bibstring.txt`"
    	@latexmlpost $(basename $(HTML_OUTPUT)).xml `cat bibstring.txt` -dest=$(HTML_OUTPUT) -css=navbar-left.css

    # the 2>/dev/null redirects stderr to the null device so that we don't get error
    # messages in the console when rm has nothing to remove
    clean:
    	@-rm -v *.log 2>/dev/null
    	@-rm -v *.out 2>/dev/null
    	@-rm -v *.aux 2>/dev/null
    	@-rm -v *.xml 2>/dev/null
    	@-rm -v *.pdf 2>/dev/null
    	@-rm -v *.html 2>/dev/null
    	@-rm -v bibstring.txt 2>/dev/null
Some notes on the makefile. I execute bibtex ignoring errors (the dash symbol before 'bibtex') because bibtex will exit with an error if it doesn't find any citations, or if there is no bibliography. The console output of each pdflatex run is written to a logfile named "document_pdf_<i>.log" where "<i>" is the iteration number. The output of pdflatex and bibtex is suppressed by dumping it to these logfiles (I find the verbosity useless to have in the console).
The shell script in the PDF recipe iterates up to four times. The first thing it does is grep the log that pdflatex itself wrote on its most recent run ("document_pdf.log"), looking for the line where latex recommends that we "Rerun" latex. If it finds such a line, it sets the shell variable STABELIZED to that string. Otherwise the variable gets the empty string. Despite its name, then, a non-empty STABELIZED means the document has not yet stabilized. We test whether the string is empty. If it is, we're done, so we break the loop. If it's not, we rerun pdflatex.
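The same loop can be pulled out of the makefile into a small script, which makes it easier to try by hand. This is only a sketch: the function names needs_rerun and rerun_until_stable are made up for illustration, and it assumes the command being rerun rewrites the log file that is checked, as pdflatex does.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical, not part of any tool.

# Succeeds (exit status 0) if the log file contains a "Rerun" suggestion.
needs_rerun() {
    grep -q "Rerun" "$1"
}

# Usage: rerun_until_stable LOGFILE MAX_RUNS COMMAND [ARGS...]
# Re-executes COMMAND while LOGFILE still suggests a rerun, at most
# MAX_RUNS times. Console output of run N goes to <logname>_N.log.
rerun_until_stable() {
    log=$1
    max=$2
    shift 2
    iter=0
    while [ "$iter" -lt "$max" ] && needs_rerun "$log"; do
        iter=$((iter + 1))
        echo "Document not stabilized, rerun $iter"
        "$@" > "${log%.log}_$iter.log"
    done
    echo "Stopped after $iter rerun(s)"
}
```

After an initial pdflatex run, it would be called as: rerun_until_stable document_pdf.log 4 pdflatex document_pdf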
The shell script in the HTML recipe iterates over each of the (potentially multiple, potentially zero) bibliography files, processing each of them with latexml. For each one it appends the string "-bibliography=<filename>.xml" to the BIBSTRING shell variable. The last thing it does is echo the contents of that shell variable to the file "bibstring.txt". This is necessary because make runs each recipe line in its own shell, so a shell variable set on one line is gone by the next. Writing the string to a file lets the subsequent commands find it.
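For reference, here is the flag-building step on its own as a shell function. This is a sketch, and bib_flags is a made-up name. It prints the flags instead of writing them to a file, so within a single shell the result can be captured with command substitution.

```shell
#!/bin/sh
# Hypothetical helper: print one -bibliography flag per .bib file given.
bib_flags() {
    flags=""
    for bibfile in "$@"; do
        # references.bib -> references.xml
        xmlfile=$(basename "$bibfile" .bib).xml
        flags="$flags -bibliography=$xmlfile"
    done
    # Strip the leading space left by the first append.
    printf '%s\n' "${flags# }"
}
```

Calling bib_flags references.bib prints "-bibliography=references.xml", and calling it with no arguments prints an empty line, which matches the zero-bibliography case above.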