Writing a textbook using Jupyter and Sphinx

Sun 15 November 2020

I spent some time under coronavirus lockdown writing a book on generating mathematical artwork using Python. If you want a preview of how the PDF format turned out, look at the free sample of the book here.

Because the book tightly integrates Python code, generated images, and text, I decided to write the manuscript in Jupyter notebooks and use nbsphinx to convert them into PDF and EPUB outputs. Sphinx is primarily a tool for writing online documentation for Python libraries, so adapting it for a book with chapters, sections, and bibliographies took some experimenting, but in the end it produced a very nice result.

A template for setting up the Ebook can be obtained from my Bitbucket repository. More details of the settings and configuration follow.

Sphinx, typically used to generate documentation for Python packages, converted the Jupyter notebooks and RST files into the book output in both Portable Document Format (PDF) and Electronic Publication (EPUB) formats. Another Python package, nbsphinx, was installed so that Sphinx could read the Jupyter notebooks.

Jupyter notebooks have a built-in option for writing documentation cells in Markdown syntax. That works well within a single notebook, but it is limiting when adding citations or cross-references to sections written in other notebook files. For this reason, the text was written in Jupyter raw cells with RST syntax. For each of these cells, the cell type was changed from Code to Raw, and the “Raw NBConvert Format” option was set to “ReStructured Text”.

Code cells were added and run as normal in Jupyter. Some code cells were hidden where the code is not relevant to the chapter, such as repeated imports or functions already defined in previous chapters. To hide a cell, metadata was added to it: in Jupyter, from the View menu, select Cell Toolbar > Edit Metadata, then click the Metadata button on the cell to hide and add the following dictionary item.

{"nbsphinx": "hidden"}
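Clicking through the metadata editor gets tedious when many cells need hiding. Because an .ipynb file is plain JSON, the same metadata can be added with a short script. The following is only a sketch under an assumed convention: the "# hide" marker comment and the function names are hypothetical and are not part of nbsphinx or of the book's template.

```python
import json

# The metadata item that nbsphinx looks for on a hidden cell.
HIDDEN = {"nbsphinx": "hidden"}


def hide_marked_cells(notebook, marker="# hide"):
    """Add the 'hidden' metadata to code cells whose source starts with marker.

    `notebook` is the dict obtained from json.load() on an .ipynb file.
    The marker comment is a hypothetical convention chosen for this sketch.
    Returns the number of cells that were changed.
    """
    changed = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if text.lstrip().startswith(marker):
            cell.setdefault("metadata", {}).update(HIDDEN)
            changed += 1
    return changed


def hide_in_file(path, marker="# hide"):
    """Apply hide_marked_cells to a notebook file and rewrite it in place."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    changed = hide_marked_cells(notebook, marker)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1, ensure_ascii=False)
        f.write("\n")
    return changed
```

Calling hide_in_file("chapter1.ipynb") (a hypothetical file name) would tag every code cell that begins with the marker and leave all other cells untouched.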

I could not find an easy way to hide a code cell but still show its output using nbsphinx. For these cases, the code cell was hidden with the above metadata and its output was saved to an image file. The next cell, a raw RST cell, then loaded that image file with the ".. image::" directive.

A few Sphinx extensions were installed and enabled to help with formatting:

  • sphinx.ext.imgmath: for converting equations into PNG format for EPUB compatibility
  • sphinxcontrib.bibtex: for adding citations and a bibliography
  • sphinxcontrib.cairosvgconverter: for converting SVG output from Jupyter cells into PDF for LaTeX compatibility
  • sphinx.ext.mathjax: for rendering equations with MathJax in HTML output

HTML output was used only for proofing and debugging the book; it is not part of the final PDF or EPUB product.
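In conf.py, extensions are enabled through the extensions list. The fragment below is a minimal sketch, not the book's actual configuration. In particular, I am assuming all of the extensions are listed together; whether both math extensions were loaded at once or switched between builds is not something this write-up records.

```python
# conf.py (fragment): a sketch of enabling the extensions named above.
extensions = [
    "nbsphinx",                         # lets Sphinx read Jupyter notebooks
    "sphinx.ext.imgmath",               # equations as images (EPUB)
    "sphinxcontrib.bibtex",             # citations and bibliography
    "sphinxcontrib.cairosvgconverter",  # SVG cell output to PDF (LaTeX)
    "sphinx.ext.mathjax",               # MathJax equations (HTML proofing)
]
```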

Another formatting issue I didn’t like was that, as in a typical Jupyter notebook, each cell is numbered with a prompt such as In[1] or Out[1]. While useful in a code development notebook, the prompts just add clutter and confusion to a book. The Sphinx developers offer a way to hide these prompts using CSS, but it only works in HTML and HTML-based output formats, which excludes LaTeX. Setting these lines

nbsphinx_input_prompt = ''
nbsphinx_output_prompt = ''

in the conf.py file removes the prompts from all output formats.

For the EPUB format, a separate cover page and header file were required, while the LaTeX version stored its cover page in the LaTeX preamble variable defined in the Sphinx conf.py file.

The PDF is generated via LaTeX, which usually does a nice job of placing images and text and paginating the document. In some cases, however, page breaks landed in places that don’t work; for example, in Chapter 1 the original document had a page break right in the middle of the text-based Mandelbrot output. To remedy these, LaTeX raw directives were used:

.. raw:: latex

Since the ".. raw:: latex" directive is only processed for LaTeX (PDF) output, it doesn’t affect the EPUB or HTML versions, which do not share the same pagination.
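For the cover pages mentioned earlier, the relevant conf.py settings might look roughly like the sketch below. The latex_elements preamble key is where the LaTeX cover material lived; using epub_cover for the EPUB side is my assumption about the relevant Sphinx option, and the file names and preamble contents are placeholders rather than the book's real ones.

```python
# conf.py (fragment): a hedged sketch of the cover-page settings.

# EPUB: Sphinx's epub_cover option takes (cover image, HTML template).
# Both file names here are hypothetical placeholders.
epub_cover = ("_static/cover.png", "epub-cover.html")

# PDF: extra LaTeX placed in the document preamble.
# The contents below are placeholders, not the book's actual preamble.
latex_elements = {
    "preamble": r"""
\usepackage{graphicx}
% cover-page definitions would go here
""",
}
```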

My Python environment was installed using the Anaconda Python distribution, with a specific conda environment set up to specify all the package versions for the book. The specific environment, including all version numbers used to compile the book, is defined by the environment.yml file. Some of these packages, such as numpy, matplotlib, scipy, and sympy, are not necessary for typesetting the book itself, but were used by my particular code.

    name: book
    channels:
      - conda-forge
    dependencies:
      - numpy==1.19.1
      - matplotlib==3.3.0
      - scipy==1.5.2
      - sympy==1.6.1
      - pillow==7.2.0
      - jupyterlab==2.2.4
      - numba==0.50.1
      - sphinx==3.3.1
      - nbsphinx==0.7.1
      - sphinxcontrib-bibtex==1.0.0
      - cairosvg==2.4.2
      - pip==20.2.1
      - pip:
        - sphinxcontrib-svg2pdfconverter[CairoSVG]==1.1.0
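Since the build depends on these exact versions, it can be handy to check the active environment before compiling. The helper below is a small sketch using only the standard library (Python 3.8+); the function names are my own, and it assumes the conda package names match the installed distribution names, which is not guaranteed for every package.

```python
from importlib import metadata


def parse_pin(pin):
    """Split a 'name[extra]==version' pin into (name, version)."""
    name, _, version = pin.partition("==")
    name = name.split("[", 1)[0].strip()  # drop any [extras] suffix
    return name, version.strip()


def find_mismatches(pins):
    """Return {name: (wanted, installed)} for pins that are not satisfied.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for pin in pins:
        name, wanted = parse_pin(pin)
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    pins = ["sphinx==3.3.1", "nbsphinx==0.7.1", "sphinxcontrib-bibtex==1.0.0"]
    for name, (wanted, installed) in find_mismatches(pins).items():
        print(f"{name}: wanted {wanted}, found {installed}")
```

An empty result from find_mismatches means every listed pin matches what is installed.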