How I made a book with Markdown and Pandoc

So I’m getting ready to self-publish my first book. I’ll have more detail about that as soon as it’s done; for now, I’m going to talk a little about the behind-the-scenes work. This post really straddles the line between writing and computers, and there will be some technical bits, so be warned.

The tech

I’ll admit it. I don’t like word processors that much. Whether it’s Microsoft Word, LibreOffice Writer, or whatever else is out there (even the old standby, WordPerfect), I don’t really care for them. They have their upsides, true, but they just don’t “fit” me. I suspect two reasons for this. First, I’m a programmer. I’m not afraid of text, and I don’t need shiny buttons and WYSIWYG styling. Second, I can be a bit obsessive. Presented with all the options of a modern word processor, like fonts and colors and borders and a table of contents, I’d spend more time fiddling with options than I would writing! So, when I want to write, I don’t bother with the fancy office apps. I just fire up a text editor (Vim is my personal choice, but I wouldn’t recommend it for you) and get to it.

“But what about formatting?” you may ask. Well, that’s an interesting story. At first, I didn’t even bother with inline formatting. I used the old-school, ad hoc styling familiar to anybody who remembers USENET, IRC, or email conversations. Sure, I could use HTML, just like a web page would, but the tags get in the way, and they’re pretty ugly. So I simply followed a few conventions, namely:

  • Chapter headers are marked by a following line of = or -.
  • A blank line means a paragraph break.
  • Emphasis (italics or text in a foreign language, for example) is indicated by surrounding _.
  • Bold text (when I need it, which is rare) uses *.
  • Scene breaks are made with a line containing multiple * and nothing else. (e.g., * * *)

Anything else—paragraph indentation, true dashes, block quotes, etc.—I’d take care of when it was time to publish. (“I’ll fix it in post.”) Simple, quick, and to the point. As a bonus, the text file is completely readable.

Mark it up

I based this system on email conventions and the style used by Project Gutenberg for their text ebooks. And it worked. I’ve written about 400,000 words this way, and it’s certainly good for getting down to business. But it takes a lot of post-processing, and that’s work. As a programmer, work is something I like to avoid.
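
A minimal sketch of what such a post-processing pass could look like, assuming the goal is HTML. The function name postprocess and the choice of tags are made up for illustration, and it only covers emphasis, bold, and scene breaks; headers, dashes, and paragraph handling are left out.

```shell
#!/bin/sh
# Hypothetical post-processing pass for the ad hoc conventions above.
# Reads plain text on stdin and writes lightly tagged HTML to stdout.
postprocess() {
  # Rule order matters: scene breaks go first so that their asterisks
  # are not mistaken for bold markers by the last rule.
  sed -E \
    -e 's/^[[:space:]]*\*([[:space:]]*\*)+[[:space:]]*$/<hr \/>/' \
    -e 's/_([^_]+)_/<em>\1<\/em>/g' \
    -e 's/\*([^*]+)\*/<strong>\1<\/strong>/g'
}

# Demo input: one styled sentence and one scene break.
postprocess <<'EOF'
It was _very_ *important*.
* * *
EOF
# prints:
#   It was <em>very</em> <strong>important</strong>.
#   <hr />
```

The -E flag (extended regular expressions) works in both GNU and BSD sed. Patterns this simple will trip over things like snake_case words, which is part of why hand-rolled post-processing turns into real work.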

Enter Markdown. It’s not much more than a codified set of conventions for representing HTML-like styling in plain text, and it’s little different from what I was already using. Sounds great. Even better, it has tool support! (There’s even a WordPress plugin, which means I can write these posts in Markdown, using Vim, and they come out as HTML for you.)

Markdown is great for its intended purpose, as an HTML replacement. Books need more than that, though; they aren’t just text and formatting. And that’s where the crown jewel comes in: Pandoc. It takes in Markdown text and spits out HTML or EPUB. And EPUB is what I want, because that’s the standard for ebooks (except Kindle, which uses MOBI, but that’s beside the point).
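
As a rough illustration of that Markdown-in, HTML-or-EPUB-out step, the sketch below writes a tiny sample file and converts it both ways. The file names and the sample text are invented for the example, and the conversion is skipped if pandoc isn’t installed.

```shell
#!/bin/sh
# Write a small Pandoc-flavored Markdown sample (placeholder content).
# The two % lines at the top are a Pandoc title block: title, then author.
cat > sample.md <<'EOF'
% A Sample Book
% A. N. Author

Chapter One
===========

It was a _dark_ and **stormy** night.

* * *

The next morning was no better.
EOF

# Convert only when pandoc is available on the PATH.
if command -v pandoc >/dev/null 2>&1; then
  pandoc sample.md -s -o sample.html   # standalone HTML page
  pandoc sample.md -o sample.epub      # EPUB ebook
else
  echo "pandoc not found; skipping conversion" >&2
fi
```

One difference from the ad hoc conventions: Markdown spells bold with doubled asterisks, while a single pair means emphasis, same as underscores.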

Putting the pieces together

All this together means that I have a complete set of book-making tools without ever touching a word processor, typesetting program, or anything of the sort. It’s not perfect, it’s not fancy, and it certainly isn’t anywhere near professional. But I’m not a professional, am I?

For those wondering, here are the steps:

  1. Write book text in Pandoc-flavored Markdown. (Pandoc has its own Markdown extensions which are absolutely vital, like header identifiers and smart punctuation.)

  2. Write all the other text—copyright, dedication, “About the Author”, and whatever else you need. (“Front matter” and “back matter” are the technical terms.) I put these in separate Markdown files.

  3. Create EPUB metadata file. This contains the author, title, date, and other attributes that ebook readers can use. (Pandoc uses a format called YAML for this, but it also takes XML.)

  4. Make a cover. This one’s the hard part for me, since I have approximately zero artistic talent.

  5. Create stylesheet and add styling. EPUB uses the same CSS styling as HTML web pages, and Pandoc helps you a lot with this. Also, this is where I fix things like chapter headings, drop caps/raised initials, and so on.

  6. Run Pandoc to generate the EPUB. The command would probably look something like this:

       pandoc --smart --normalize --toc-depth=1 --epub-stylesheet=<stylesheet file> --epub-cover-image=<cover image> -o <output file> <front matter .md file> <main book text file(s)> <back matter .md file> <metadata .yml file>

     (If your metadata is XML rather than YAML, pass it with --epub-metadata=<metadata .xml file> instead of listing it as an input file.)

  7. Open the output file in an ebook reader (Calibre, for me) and take a look.

  8. Repeat steps 5 through 7 until the formatting looks right.

  9. Run KindleGen to make a MOBI file. You only need this if you intend to publish on Amazon’s store. (I do, so I had to do this step.)

  10. Bask in the glory of creating a book! Oh, and upload your book to wherever. That’s probably a good idea, too.
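
Steps 3 and 5 are the most config-like of the bunch, so here is a hedged sketch of both as a short script. Every file name and value in it is a placeholder, and the CSS rules are only one guess at what fixing chapter headings and raised initials might involve; ebook readers vary in how much CSS they honor.

```shell
#!/bin/sh
# Sketch of steps 3 and 5. All names and values below are placeholders.

# Step 3: EPUB metadata as a YAML block. The --- and ... delimiters let
# Pandoc pick it up when the file is listed among the Markdown inputs.
cat > metadata.yml <<'EOF'
---
title: My First Book
author: A. N. Author
date: 2020-01-01
lang: en-US
rights: Copyright 2020 A. N. Author
...
EOF

# Step 5: a small stylesheet for book-like formatting.
cat > book.css <<'EOF'
/* Center chapter headings and start each chapter on a new page. */
h1 { text-align: center; page-break-before: always; }
/* Book-style paragraphs: indented first line, no gap between them. */
p { margin: 0; text-indent: 1.5em; }
/* First paragraph of a chapter: no indent, plus a raised initial. */
h1 + p { text-indent: 0; }
h1 + p::first-letter { font-size: 2em; }
EOF
```

Keeping both files next to the manuscript means the whole book, formatting included, stays in plain text.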

Yeah, there are easier methods. A lot of people seem allergic to the command line; if you’re one of them, this isn’t the way for you. But I’m comfortable in the terminal. As I said, I’m a programmer, so I have to be. The hardest part for me (except the cover) was figuring out the options I needed to make something that looked like a proper ebook.
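
Worth noting: the option names in step 6 come from Pandoc 1.x. Pandoc 2.0 removed --smart and --normalize (smart punctuation became a Markdown extension that is on by default) and replaced --epub-stylesheet with --css. Below is a sketch of the same command for a newer Pandoc. The file names, the build_epub function, and the PANDOC override are all placeholders for illustration.

```shell
#!/bin/sh
# Step 6 for Pandoc 2.x and later. File names are placeholders.
STYLESHEET=${STYLESHEET:-book.css}
COVER=${COVER:-cover.jpg}
OUTPUT=${OUTPUT:-book.epub}
PANDOC=${PANDOC:-pandoc}   # set PANDOC=echo to print the arguments instead

build_epub() {
  # --smart is now the "smart" extension (already on by default for
  # Pandoc's Markdown; spelled out here for clarity), --normalize is
  # gone with no replacement needed, and --epub-stylesheet became --css.
  "$PANDOC" --from=markdown+smart --toc-depth=1 \
    --css="$STYLESHEET" \
    --epub-cover-image="$COVER" \
    -o "$OUTPUT" "$@"
}

# Build only when input files are given, in reading order, e.g.:
#   sh build.sh front.md book.md back.md metadata.yml
if [ "$#" -gt 0 ]; then
  build_epub "$@"
fi
```

Wrapping the command in a script also makes the edit-build-check loop a single keystroke away, which helps when the formatting takes a few rounds to get right.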

Even if you don’t use my cobbled-together method of creating an ebook, you still owe it to yourself to check out Pandoc. It’s so much easier, in my opinion, than a word processor or ebook editor. There are even graphical front-ends out there, if that’s what you prefer. But I like working with plain text. It’s easy, it’s readable, and it just works.
