Pandoc, LaTeX, and Memoir

A while back, I wrote about the “inner workings” of my writing. My stories are created using Markdown, which I run through a program called Pandoc to turn into EPUB format. (Then, to make Amazon happy, I send that through KindleGen, which spits out a MOBI file that can then go on the Kindle Store.) It works, and there’s a minimum of fuss. No fiddling with margins and page layout, no worrying about arcane or proprietary file formats, just a lot of text that already looks pretty much like a book.

Well, Amazon has a new thing for their KDP self-publishers: paperbacks. If you remember CreateSpace, it’s kinda like that, but integrated with the “main” Kindle Store. All you really have to do is upload your manuscript in a different format, and they’ll even give you an ISBN. (Note for non-US readers: my country seriously overcharges for ISBNs, so getting one for free is a big deal.) And the paper book shows up on Amazon as an option alongside the Kindle digital version. My brother already tried it with his book Angel’s Sin, and it seems to have worked.

So, of course, now I’m going to do the same with Before I Wake and the forthcoming Nocturne, as well as some of my future projects. To do this, however, I’ve had to delve deeper into the mechanics of my workflow.

The format issue

Amazon doesn’t like EPUBs. That’s well known. For digital books, they really, really want you to send them either a MOBI file, or something like HTML or a Word document. That’s most assuredly because of DRM. (It can’t be because they don’t know how to convert, since they give you a command-line tool to do so!) Be that as it may, I don’t really mind the last little step of running KindleGen to make an Amazon-friendly version; it’s easily automated, and I’ll still have the EPUB ready to go on Patreon or wherever.

With this new paperback option, however, there’s a problem: they don’t take MOBI, either! Nope, if you want to upload a manuscript for actual printing, your options are Word DOC/DOCX, plain HTML (possibly zipped with images and stylesheets), or “print-ready” PDF. That last is code for, “Do all the layout yourself, ’cause we ain’t touching it.”

Well, there’s the dilemma. Pandoc will happily output just about whatever format you like, but each of the options available has its downsides. Microsoft Word documents require (naturally) Microsoft Word, which isn’t really an option for a Linux user like myself. (The web app version of Office is also a nonstarter, for much the same reasons.) Zipped HTML is essentially an EPUB already, but then you have all the layout issues that come from shoving a “streaming” markup format like HTML into the “blocks” of a printed page. Fiddly bits like margins and headers and page numbers, and all with no usable previewer.

So what does that leave? Only one thing: PDF. And Pandoc can make a PDF, but not by itself. Fortunately, it knows someone who can help.

The type type

TeX (that’s really how it’s meant to be written in plain text) is the famous typesetting program originally developed by the equally famous Donald Knuth. I’ve used it many times before, on Linux and on Windows, and it works great for what it is: a “programmer’s” interface to text layout. Not a word processor, but a text processor.

TeX has been extended a few times over the past 40 or so years, and it has accrued an entire ecosystem of add-ons, bells and whistles, and documentation. If you’re willing to put in the work, you can get a seriously beautiful document. By default, classic TeX writes a DVI file (traditionally converted to PostScript for printing), which is relatively arcane and not really useful to anyone. But far more common these days is its PDF option. Its print-ready PDF option.

I don’t mind writing a bit of code. I’d rather do that than play around in a word processor GUI, clicking at buttons and tweaking margins. Give me the linear word any day of the week. So I decided I’d try to use TeX (actually, the much simpler wrapper LaTeX, and be absolutely sure you capitalize that one right!) with Pandoc to make a printable PDF of one of my books.
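As a rough sketch of what that combination looks like on the command line: this assumes a Pandoc 2.x or later install (older releases spelled the flag --latex-engine) and a TeX distribution that provides XeLaTeX. The file names and the little run helper are placeholders of mine, not part of either tool.

```shell
#!/bin/sh
# Placeholder file names; swap in your own manuscript and output.
manuscript="book.md"
output="book.pdf"

# Hypothetical helper: echo a command, and execute it only when RUN=1 is set.
run() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Ask Pandoc to go Markdown -> LaTeX -> PDF, with XeLaTeX doing the typesetting.
run pandoc "$manuscript" --pdf-engine=xelatex -o "$output"
```

Run it as is to see the command, or with RUN=1 in the environment to actually build the PDF.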

Writing my memoir

The full story is going to play out over the next few weeks. I’ve been searching for new material for the “Code” posts here, and now I’ve found it: a deep look into what it takes for me, a very non-artistic writer experienced with programming in multiple languages and environments, to create something that looks like a book.

In the first of multiple upcoming posts, I’ll look at memoir, a wonderful LaTeX extension (“class”, as they’re called) used for creating books that truly look like they were designed by professionals. It’s not exactly plug-and-play, and I’ll gladly admit that I had to do a lot of work to beat it into shape, but I only had to do it once. Now, every book I write can use the same foundation, the same basic template.

After that, I’ll go back to Pandoc and show you the work I did to convince it to do what I wanted. I’ve never written a horror story before, but this might be the closest to it, from a programmer’s perspective. It was a coding nightmare, one I’m not sure I’m out of yet, but the end result is everything I need in a book, as you’ll see.

How I made a book with Markdown and Pandoc

So I’m getting ready to self-publish my first book. I’ll have more detail about that as soon as it’s done; for now, I’m going to talk a little about the behind-the-scenes work. This post really straddles the line between writing and computers, and there will be some technical bits, so be warned.

The tech

I’ll admit it. I don’t like word processors that much. Microsoft Word, LibreOffice Writer, or whatever else is out there (even the old standby: WordPerfect), I don’t really care for them. They have their upsides, true, but they just don’t “fit” me. I suspect two reasons for this. First, I’m a programmer. I’m not afraid of text, and I don’t need shiny buttons and WYSIWYG styling. Second, I can be a bit obsessive. Presented with all the options of a modern word processor, like fonts and colors and borders and a table of contents, I’d spend more time fiddling with options than I would writing! So, when I want to write, I don’t bother with the fancy office apps. I just fire up a text editor (Vim is my personal choice, but I wouldn’t recommend it for you) and get to it.

“But what about formatting?” you may ask. Well, that’s an interesting story. At first, I didn’t even bother with inline formatting. I used the old-school, ad hoc styling familiar to anybody who remembers USENET, IRC, or email conversations. Sure, I could use HTML, just like a web page would, but the tags get in the way, and they’re pretty ugly. So I simply followed a few conventions, namely:

  • Chapter headers are marked by a following line of = or -.
  • A blank line means a paragraph break.
  • Emphasis (italics or text in a foreign language, for example) is indicated by surrounding _.
  • Bold text (when I need it, which is rare) uses *.
  • Scene breaks are made with a line containing multiple * and nothing else. (e.g., * * *)

Anything else—paragraph indentation, true dashes, block quotes, etc.—I’d take care of when it was time to publish. (“I’ll fix it in post.”) Simple, quick, and to the point. As a bonus, the text file is completely readable.
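To make that concrete, here is a made-up scrap of text in those conventions, written out by a small shell script. The sample text is invented, and the two grep patterns are only my guess at how you might check such a file mechanically; they aren't part of any tool.

```shell
#!/bin/sh
# Write a tiny sample chapter (invented text) into a temporary file.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
Chapter One
===========

The door was _not_ supposed to be open.

* * *

"*Stop*," she said.
EOF

# Scene breaks: lines made up of nothing but asterisks and spaces.
scene_breaks=$(grep -c '^\*[* ]*$' "$sample")
# Chapter headers: the underline row of = or - beneath a title.
chapters=$(grep -c -E '^(=+|-+)$' "$sample")

echo "chapters: $chapters, scene breaks: $scene_breaks"
```

Because the manuscript is plain text, ordinary Unix tools can count or check its structure without any special software.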

Mark it up

I based this system on email conventions and the style used by Project Gutenberg for their text ebooks. And it worked. I’ve written about 400,000 words this way, and it’s certainly good for getting down to business. But it takes a lot of post-processing, and that’s work. As a programmer, I like to avoid work.

Enter Markdown. It’s not much more than a codified set of conventions for representing HTML-like styling in plain text, and it’s little different from what I was already using. Sounds great. Even better, it has tool support! (There’s even a WordPress plugin, which means I can write these posts in Markdown, using Vim, and they come out as HTML for you.)

Markdown is great for its intended purpose, as an HTML replacement. Books need more than that, though; they aren’t just text and formatting. And that’s where the crown jewel comes in: Pandoc. It takes in Markdown text and spits out HTML or EPUB. And EPUB is what I want, because that’s the standard for ebooks (except Kindle, which uses MOBI, but that’s beside the point).

Putting the pieces together

All this together means that I have a complete set of book-making tools without ever touching a word processor, typesetting program, or anything of the sort. It’s not perfect, it’s not fancy, and it certainly isn’t anywhere near professional. But I’m not a professional, am I?

For those wondering, here are the steps:

  1. Write book text in Pandoc-flavored Markdown. (Pandoc has its own Markdown extensions which are absolutely vital, like header identifiers and smart punctuation.)

  2. Write all the other text—copyright, dedication, “About the Author”, and whatever else you need. (“Front matter” and “back matter” are the technical terms.) I put these in separate Markdown files.

  3. Create EPUB metadata file. This contains the author, title, date, and other attributes that ebook readers can use. (Pandoc uses a format called YAML for this, but it also takes XML.)

  4. Make a cover. This one’s the hard part for me, since I have approximately zero artistic talent.

  5. Create stylesheet and add styling. EPUB uses the same CSS styling as HTML web pages, and Pandoc helps you a lot with this. Also, this is where I fix things like chapter headings, drop caps/raised initials, and so on.

  6. Run Pandoc to generate the EPUB. (The command would probably look something like this: pandoc --smart --normalize --toc-depth=1 --epub-stylesheet=<stylesheet file> --epub-cover-image=<cover image> -o <output file> <front matter .md file> <main book text file(s)> <back matter .md file> <metadata .yml file>. If your metadata is XML instead, leave it out of the input list and pass it with --epub-metadata=<metadata .xml file>.)

  7. Open the output file in an ebook reader (Calibre, for me) and take a look.

  8. Repeat steps 5 and 6 until the formatting looks right.

  9. Run KindleGen to make a MOBI file. You only need this if you intend to publish on Amazon’s store. (I do, so I had to do this step.)

  10. Bask in the glory of creating a book! Oh, and upload your book to wherever. That’s probably a good idea, too.
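Steps 3, 6, and 9 can be strung together in a small script. This is only a sketch under a few assumptions: it uses Pandoc 2.x option names, where --smart and --normalize no longer exist and --css replaces --epub-stylesheet. The file names, the metadata values, and the run helper are all placeholders.

```shell
#!/bin/sh
# Scratch directory so the sketch doesn't clobber real files.
workdir="$(mktemp -d)"

# Step 3: a minimal YAML metadata file (placeholder values).
cat > "$workdir/metadata.yaml" <<'EOF'
---
title: My Book
author: A. N. Author
lang: en-US
---
EOF

# Hypothetical helper: echo a command, and execute it only when RUN=1 is set.
run() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Step 6: build the EPUB from front matter, book text, back matter, and metadata.
run pandoc --toc-depth=1 --css=style.css --epub-cover-image=cover.jpg \
    -o book.epub front.md book.md back.md "$workdir/metadata.yaml"

# Step 9: convert the EPUB for Amazon (KindleGen writes book.mobi next to it).
run kindlegen book.epub
```

By default the script only prints the two commands. Set RUN=1 once the placeholder names point at real files.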

Yeah, there are easier methods. A lot of people seem allergic to the command line; if you’re one of them, this isn’t the way for you. But I’m comfortable in the terminal. As I said, I’m a programmer, so I have to be. The hardest part for me (except the cover) was figuring out the options I needed to make something that looked like a proper ebook.

Even if you don’t use my cobbled-together method of creating an ebook, you still owe it to yourself to check out Pandoc. It’s so much easier, in my opinion, than a word processor or ebook editor. There are even graphical front-ends out there, if that’s what you prefer. But I like working with plain text. It’s easy, it’s readable, and it just works.