Lettrine and other packages

TeX and its descendants (LaTeX, et al.) have a vast array of add-on packages for an author to use. Most of these are so specific that they’d probably only be useful to a handful of people, but some are almost universal. Memoir, of course, is one of them, though I’ve already spoken about it. This time, I’d like to look at a few others that I use.

Lettrine

The lettrine package is what I use to make drop caps and raised initials, as you’ll recall from the debacle that is my Pandoc filters. For paperback books, especially fiction, these are a nice typographic touch, the kind of thing that, I feel, makes a book look more professional. Personally, I prefer raised initials rather than the dropped capitals, but lettrine works for both.

It’s geared towards European languages, and the examples are actually only in French and German, not English. The documentation, however, is perfectly readable.

Using lettrine isn’t that hard. Unless you need some serious customization, you can get by with just putting the first letter (the one you want to raise or drop) in one set of braces, then anything you’d like in small caps in another: \lettrine{L}{ike this}.

By default, that gives you a two-line dropped capital, but you can change that with options that you place in square brackets before the text. So, to get my preferred raising, I would do: \lettrine[lines=1]{T}{his}. The manual has more options you can use, mostly for tweaking problem letters like J and Q, typesetting opening quotation marks in normal size, and even adding images for something like a medieval manuscript.

Microtype

The second package, microtype, is one of the more complicated ones. Fortunately, there’s not a lot you have to do to use it. Just including it in your document already gets you something subtly better.

What microtype actually does is hard to explain without delving deep into typography. Basically, it gives you a way to change aspects of a font such as kerning and have those changes affect the entire document. I’ll freely admit that I don’t understand everything it does, nor how it works. And the manual is over 240 pages long, so that won’t change anytime soon. Still, I can’t deny its usefulness.

selnolig

Finally, we have selnolig. This one is a bit obscure, compared to the other two, but it turned out to be exactly what I needed for one very specific scenario. Thus, I thought it made a good illustration of the breadth of TeX packages.

If you look closely at a (good) printed book, you’ll notice that the letters aren’t always distinct. In printing, we use a number of ligatures to join letters, which helps them “flow” together. Letter pairs and triplets like “fl”, “ffi”, and “ft” are often combined like this, though there are cases where it’s recommended that you don’t.

The selnolig package handles all that for you, breaking up the automatic ligatures TeX likes to add in the words where they don’t necessarily belong. It also activates some “historic” ligatures, if you want them, so your book can look like it was written in the 1700s.

Far more important, however, is the ability to selectively disable ligatures that gives the package its name. The font I used in Before I Wake (which I’ll probably continue to use in future books) has a very annoying “Th” ligature. Personally, I just don’t like the way it looks; it makes that combination of letters look too…thin. So I went looking for a way to get rid of it, and I found selnolig. Ridding myself of this pesky addition was a single line of code: \nolig{Th}{T|h}. That tells selnolig to always break “Th” into a separate “T” and “h”, with an invisible space in between. This space stops the ligature from forming, which is exactly what I wanted.

Everything else

I haven’t even touched on the myriad other TeX packages out there. There’s not enough time in my life to go through them. Of course, there are quite a few that I couldn’t live without: geometry, fontspec, graphicx, etc. For my aborted attempt at a popular book on mathematics, I used tikz to draw diagrams. I tried to write a book about conlangs almost ten years ago, and I used tipa for that one. Whatever you’re looking for, you’ll probably find it over at CTAN.

And that concludes this little series. Now, it’s back to writing books, rather than writing about writing them. Look for the latest fruit borne from my work with Pandoc, LaTeX, Memoir, and all the rest coming in July.

Pandoc filter for books

Pandoc is a great tool, as I’ve already stated, but it can’t do everything. And some of the things it does aren’t exactly what you’d want when creating a book. This is especially true when working on a print-ready PDF, as I’ve been doing.

Fortunately, there is a solution. Unfortunately, it’s not a pretty one. The way Pandoc works internally—I’m simplifying a lot here, so bear with me—is by turning your input document (Markdown, in my case) into an AST, then building your output from that. That’s basically the same thing a compiler does, so you could think of Pandoc as something like a Markdown compiler that can output PDF, EPUB, HTML, or whatever.

In addition to the usual “compiler” niceties, you’re also given access to an intermediate stage. If you tell Pandoc to let you, you can use filters to modify the AST before it’s sent off to the output stage. That lets you do a lot of modifications and effects that aren’t possible with plain Markdown or LaTeX or even HTML. It’s great, but…
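To make that intermediate stage a little more concrete, here is a minimal sketch of what a filter sees. It assumes the JSON form of the AST that recent Pandoc versions emit with -t json, where every node is an object with a type tag "t" and, for most types, contents "c". The sample document and the names walk_ast and node_types are mine, made up for illustration.

```python
# A tiny document in the shape of Pandoc's JSON AST (assumed layout:
# recent Pandoc versions; older releases used a different top level).
DOC = {
    "pandoc-api-version": [1, 22],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [{"t": "Str", "c": "Hello,"},
                            {"t": "Space"},
                            {"t": "Str", "c": "world."}]},
        {"t": "HorizontalRule"},
    ],
}


def walk_ast(node, visit):
    """Call visit(node) on every tagged node, depth-first."""
    if isinstance(node, dict):
        if "t" in node:
            visit(node)
        for child in node.values():
            walk_ast(child, visit)
    elif isinstance(node, list):
        for child in node:
            walk_ast(child, visit)


def node_types(doc):
    """Return the type tags found in the document, in visiting order."""
    seen = []
    walk_ast(doc["blocks"], lambda n: seen.append(n["t"]))
    return seen


print(node_types(DOC))
```

A filter is just a program that reads a structure like this, changes some of the nodes, and hands it back to Pandoc for the output stage.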

Cruel and unusual

But Pandoc is written in Haskell. Haskell, if you’re not familiar with programming languages, is the tenth circle Dante didn’t tell you about. It’s awful, if you’ve ever written code in any other language, because it’s designed around a philosophy that doesn’t really match anything else in the programming world. (Seriously, go look up “monads” if you’re bored.) For us mere mortals, it’s sheer torture trying to understand a Haskell program, much less write one. And Pandoc’s default language for writing filters, alas, is this monstrosity.

If I had to do that, I’d have given up months ago. But I’m in luck, because Pandoc’s developer recognizes that we’re not all masochists, and he gave us the option to write filters in Python instead. Now that I can use. It’s not pretty. It’s not nice. But it gets the job done, and it does so without needing to install extra libraries or anything like that.

So, I’ve written a few filters that take care of some of the drudgery of converting Markdown into a decent-looking PDF. You can find them in this Gist if you want to see the gory details, and I’ll describe each of them below.

Fancy breaks

In Pandoc’s version of Markdown, you can get a horizontal rule (the HTML hr element) by making a line containing only asterisks with spaces between them: * * * is what I use these days. It’s simple enough, and you can use CSS to make it not appear as an actual line across the page, but as a nice vertical blank space that serves as a scene break. It even carries over into MOBI format when you use KindleGen.

But it doesn’t work for PDFs. Well, it does, but there’s an even better way. Since I’m using Memoir, I get what are called “fancy” breaks. In print, they’re nothing more than a centered set of asterisks, stars, or any other icon you’d like to use. Those can be a bit tacky if they show up after every scene, though, so there’s another option that shows the “fancy” break only when it would fall at the end of a page and puts in a “plain” blank space otherwise. In Memoir, this is the \pfbreak command, and it’s smart enough to choose the right style every time.

So all the fancybreak.py filter does is swap out Pandoc’s HorizontalRule AST element, replacing it with the raw LaTeX code for Memoir’s “plain fancy break”. Take out the boilerplate, and it’s literally only three lines of code. Simple, even for me.
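For the curious, the idea can be sketched in a few lines. This is not the code from the Gist; it is a standalone approximation that works directly on the JSON AST, assuming the layout recent Pandoc versions use (a top-level "blocks" list, and RawBlock nodes holding a format name plus the raw text). The function names are hypothetical, and it only looks at top-level blocks, which is where scene breaks live in my books.

```python
import json
import sys


def pfbreak():
    """Raw LaTeX block that asks Memoir for a plain/fancy break."""
    return {"t": "RawBlock", "c": ["latex", "\\pfbreak"]}


def fancy_breaks(blocks):
    """Replace every top-level HorizontalRule with \\pfbreak."""
    return [pfbreak() if b.get("t") == "HorizontalRule" else b
            for b in blocks]


def main():
    # Boilerplate: read the AST from stdin, write the result to stdout.
    # Assumes the document is an object with a "blocks" list.
    doc = json.load(sys.stdin)
    doc["blocks"] = fancy_breaks(doc["blocks"])
    json.dump(doc, sys.stdout)
```

Add a call to main() at the bottom and the script can be handed to Pandoc with its --filter option. Since the replacement is raw LaTeX, it only makes sense for the PDF build.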

Writing links

Another difference between print and digital editions of a book comes from the formatting available. E-books are interactive in a way paper can’t be. They can use hyperlinks, and I do exactly that. But it’s impossible to click on a link in a paperback, and blue doesn’t show up in a black and white book, so I need to get rid of the link part. Ideally, I’d like to keep the address, though.

For this, I wrote the writelinks.py filter. This one’s a little bit harder to explain from a code point of view. From the reader’s perspective, though, it’s easy: every link is removed, but its address is added to the text in parentheses instead. It comes out as preformatted (or verbatim) text, in whatever monospaced font I’m using. (I actually don’t remember which one.)

The guts of this filter are only 5 lines, and the hardest part was working out exactly what I had to do to get the link address. Pandoc’s API documentation isn’t very helpful in this regard, and it gets even worse in a moment.
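Here is a rough sketch of that transformation, again as a standalone approximation rather than the Gist code. It assumes the JSON AST of recent Pandoc versions, where a Link node holds three things: attributes, the link text, and a [url, title] pair. Older versions left out the attributes, so the unpacking would need adjusting there. The names expand_link and write_links are hypothetical.

```python
def expand_link(node):
    """Return the inlines that replace a Link: its text, then (url)."""
    _attr, text, (url, _title) = node["c"]
    return text + [
        {"t": "Space"},
        {"t": "Str", "c": "("},
        # Code is Pandoc's inline verbatim element; empty attributes.
        {"t": "Code", "c": [["", [], []], url]},
        {"t": "Str", "c": ")"},
    ]


def write_links(node):
    """Rebuild the tree, splicing expanded links into their parent list."""
    if isinstance(node, list):
        out = []
        for child in node:
            if isinstance(child, dict) and child.get("t") == "Link":
                out.extend(write_links(expand_link(child)))
            else:
                out.append(write_links(child))
        return out
    if isinstance(node, dict):
        return {key: write_links(value) for key, value in node.items()}
    return node
```

The splicing is the fiddly part: one Link turns into several inlines, so the replacement has to be merged into the surrounding list instead of swapped one for one.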

Drop caps and raised initials

Here’s where I was ready to gouge my own eyeballs out. If you look at the code for dropcaps.py and raisedinitials.py, you’ll probably see why. Let’s back up just a second, though, so we can ask a simple question: What was I thinking? (Don’t answer that.)

I like the “raised initial” style for books. With this, the first letter of a chapter is printed bigger than the rest, and the rest of the first word is printed in regular-sized small caps. Other people like “drop caps”, where the initial letter hangs down into the first paragraph. Either way, one LaTeX package, lettrine, takes care of your needs. Using it with Memoir is a matter of importing it and adding a bit of markup at the beginning of each chapter.

Using it with Pandoc, on the other hand, takes more work. Since I don’t want to sprinkle LaTeX code all over my source documents, I made these filters to inject that code later in the process. And that was…not fun at all. After a lot of trial and error (going from Haskell to Python and back doesn’t give you a lot of useful diagnostics), I settled on the process I used in these filters. They’re the same thing, by the way. The only real difference is their output.

Well, and dropcaps.py has to break up a Quoted element so it doesn’t blow up the opening quotation mark instead of the first letter. Doing that required some trickery. If you’d like to try it for yourself, I suggest drinking heavily. If you don’t drink, well, you’ll want to by the time you’re done.
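To give a flavor of the approach without the gore, here is a simplified sketch of the raised-initial case. It is not the Gist code. It assumes the JSON AST of recent Pandoc versions, treats every level-1 Header as a chapter start, and only handles a first paragraph that opens with a plain word, so the Quoted trickery is left out entirely. The names lettrine and raise_initials are hypothetical, and the word is passed to LaTeX without any escaping.

```python
def lettrine(word, lines=1):
    """Raw LaTeX inline: first letter enlarged, rest of the word in small caps."""
    initial, rest = word[0], word[1:]
    code = "\\lettrine[lines=%d]{%s}{%s}" % (lines, initial, rest)
    return {"t": "RawInline", "c": ["latex", code]}


def raise_initials(blocks):
    """Wrap the first word after each level-1 Header in \\lettrine."""
    out = []
    after_chapter = False
    for block in blocks:
        if after_chapter and block.get("t") == "Para":
            inlines = block["c"]
            first = inlines[0] if inlines else {}
            if first.get("t") == "Str" and first["c"]:
                block = {"t": "Para",
                         "c": [lettrine(first["c"])] + inlines[1:]}
        # Header contents are [level, attributes, inlines].
        after_chapter = block.get("t") == "Header" and block["c"][0] == 1
        out.append(block)
    return out
```

Passing lines=2 to lettrine would give the drop-cap variant, which is why the two filters are nearly identical.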

Limitations and future expansion

Anyway, after I finished this herculean task, I had a set of filters that would let me use my original source files but produce something much more suited to Memoir and the paperback format. Now I’ve got fancy scene breaks, links that write themselves out when they’re in a PDF, and those wonderfully enormous initial letters starting each chapter.

Can I do more? Of course I can. The last two filters don’t take into account “front matter” chapters. For my current novels, that’s not a problem, as I don’t use those. But if you need something with, say, an extended foreword, then you’d need to hack on the scripts to fix that.

There’s also nothing at all I can do for the opening pages of the book, the parts that come before the text. Here, the formats are too different even for filters. I’m talking about the title page, copyright page, dedication, and things like that. (These, in fact, are considered front matter, but they’re not part of a chapter, so the last paragraph doesn’t apply.) I still need to maintain two versions of those, and I don’t see any real way around that.

Still, what I’ve got so far is good. It was a lot of work, but it’s work I only have to do once. That’s the beauty of programming in a nutshell. It’s automation. Sure, I could have done the editing by hand instead of writing scripts to do it for me, and I probably would have been done sooner, but now I won’t have to do it all over again for Nocturne or any other book I write in the future.

To close out this miniseries, I have one more post in mind. This one will look at some of the additional LaTeX packages I used, like the lettrine one I mentioned above. By the time that comes out, maybe I’ll even have another book ready.

Playing with Memoir

Last time, I talked a little about how I used Pandoc to create a paperback book. Well, since I wrote that, I’ve not only posted the thing, but I have a copy of my own. Seriously. That’s a strange feeling, as I wrote about on Patreon.

Anyway, I promised I’d talk about how I did it, so that’s what I’ll do. First off, we’ll look at Memoir, one of the greatest inventions in the history of computer-aided authorship.

Optional text

Memoir is a LaTeX class; essentially, it’s a software package that gives you a framework for creating beautiful books with less painstaking effort than you would expect. (Not none, mind you. If you don’t know what you’re doing—I can’t say I do—then it can be…unwieldy.)

It’s not perfect, and the documentation is lacking in some respects (the package’s author actively refuses to tell you how to do some things that upset his aesthetic sensibilities), but it’s far superior to anything you’d get out of a word processor. Oh, and it’s like code, too, which is great for logical, left-brain types like me.

So, let’s assume you know how to use LaTeX and include classes and all that, because this isn’t a tutorial. Instead, I’ll talk about what I did to beat this beast into shape.

First off, we’ve got the class options. Like most LaTeX packages, Memoir is customizable in the extreme. It’s not meant only for books; you can do a journal article with it, or a thesis, or just about anything that could appear in print. So it has to be ready for all those different printing formats. Want to make everything print only on one side of the page? You can do that. Multicolumn output, like in a newspaper? Sure, why not?

The list goes on, but I only need a few options. “Real” books are single-column and double-sided, so I’ll be using the appropriate class options, onecolumn and twoside. Books in English start on the right-hand page, so add in openright. But wait! Since most books use these options anyway, Memoir simply makes them the default, so I don’t have to do anything! (Now, if you’re making manga or something, you might need to use openleft instead, but that’s the exception, not the rule.)

Besides those, I only need to specify two other options. One is ebook, which sets the page to a nice 6″ x 9″—exactly the same as Amazon’s default paperback size. If you want something else, it can get…nontrivial, but let’s stick to the basics. Oh, and I want american, because I am one; this changes some of the typography rules, though I’ll confess I don’t know which ones.

Set it up

The remainder of the LaTeX “coding” is mostly a series of markup commands, which work a bit like HTML tags. The primary “content” ones are \frontmatter, \mainmatter, and \backmatter, which are common to Memoir and other packages; they tell the system where in the book you are. A preface, for instance, is in the front matter, and you can configure things so it gets its pages numbered in Roman numerals. Pretty much the usual, really, and not Memoir-specific.

For typography, some of the things I did include:

  • Changing margins. Amazon is finicky when it comes to these. It actually rejected my original design, because Memoir’s 0.5″ is apparently less than their 0.5″. So I’m using 0.75″ on the left and right for Before I Wake, and I suspect Nocturne will need something even bigger on the inside edge. Top and bottom get 1″ each, which seems comfortable.

  • Adding subtitle support. I don’t need this for either of the two novels I mentioned, but I might later on. Pandoc passes the subtitle part of its metadata through to LaTeX, but Memoir doesn’t support it. So I fixed that.

  • Creating a new title page. This was fun, for varying values of “fun”. Mostly, I just needed something functional. Then I had to do it again, to make the “half-title” page that professional books have.

  • Fixing headers and footers. This was mostly just configuration: page numbers in the outer corner of the header, author and title alternately in the middle, and footers left blank. Not too bad.

  • Changing the chapter style. Here’s where I almost gave up. By default, Pandoc tells LaTeX to create numbered chapters. Well, I did that myself. Rather than go back and change that (it would screw up the EPUB creation), I told Memoir to ignore the pre-made numbering completely. This is especially important when I get to Nocturne, because it has a prologue and epilogue. Having it put “Chapter 1: Prologue” would just be stupid.

  • Adding blank pages. Now, you might be wondering about this one. Trust me, it’s for a good cause. Memoir is smart enough to add blank pages to make a chapter start on the right side (that openright thing I mentioned earlier), but it won’t do that at the end of the book, or if you go and manually make a title page, like I did. Oh, and if you’re doing a print book, remember that it ends on the left page.

The whole thing was almost a hundred lines of code, including the text for, e.g., the copyright and dedication pages. All in all, it took about three or four hours of work, but I really only have to do it once. Next time around, I just tweak a few values here and there, and that’s it. Automation. It’ll eventually take everybody’s job.

Coming up

So that’s enough to get something that looks like a book, but I’m still not done. Next up, you’ll get to see the bane of my existence: Pandoc filters. And then I’ll throw in a little bit about some interesting LaTeX packages I use, because I need Code posts. See you then!

Pandoc, LaTeX, and Memoir

A while back, I wrote about the “inner workings” of my writing. My stories are created using Markdown, which I run through a program called Pandoc to turn into EPUB format. (Then, to make Amazon happy, I send that through KindleGen, which spits out a MOBI file that can then go on the Kindle Store.) It works, and there’s a minimum of fuss. No fiddling with margins and page layout, no worrying about arcane or proprietary file formats, just a lot of text that already looks pretty much like a book.

Well, Amazon has a new thing for their KDP self-publishers: paperbacks. If you remember CreateSpace, it’s kinda like that, but integrated with the “main” Kindle Store. All you really have to do is upload a new format manuscript, and they’ll even give you an ISBN. (Note for non-US readers: my country seriously overcharges for ISBNs, so getting one for free is a big deal.) And the paper book shows up on Amazon as an option alongside the Kindle digital version. My brother already tried it with his book Angel’s Sin, and it seems to have worked.

So, of course, now I’m going to do the same with Before I Wake and the forthcoming Nocturne, as well as some of my future projects. To do this, however, I’ve had to delve deeper into the mechanics of my workflow.

The format issue

Amazon doesn’t like EPUBs. That’s well known. For digital books, they really, really want you to send them either a MOBI file, or something like HTML or a Word document. That’s most assuredly because of DRM. (It can’t be because they don’t know how to convert, since they give you a command-line tool to do so!) Be that as it may, I don’t really mind the last little step of running KindleGen to make an Amazon-friendly version; it’s easily automated, and I’ll still have the EPUB ready to go on Patreon or wherever.

With this new paperback option, however, there’s a problem: they don’t take MOBI, either! Nope, if you want to upload a manuscript for actual printing, your options are Word DOC/DOCX, plain HTML (possibly zipped with images and stylesheets), or “print-ready” PDF. That last is code for, “Do all the layout yourself, ’cause we ain’t touching it.”

Well, there’s the dilemma. Pandoc will happily output just about whatever format you like, but each of the options available has its downsides. Microsoft Word documents require (naturally) Microsoft Word, which isn’t really an option for a Linux user like myself. (The web app version of Office is also a nonstarter, for much the same reasons.) Zipped HTML is essentially an EPUB already, but then you have all the layout issues that come from shoving a “streaming” markup format like HTML into the “blocks” of a printed page. Fiddly bits like margins and headers and page numbers, and all with no usable previewer.

So what does that leave? Only one thing: PDF. And Pandoc can make a PDF, but not by itself. Fortunately, it knows someone who can help.

The type type

TeX (that’s really how it’s meant to be written in plain text) is the famous typesetting program originally developed by the equally famous Donald Knuth. I’ve used it many times before, on Linux and on Windows, and it works great for what it is: a “programmer’s” interface to text layout. Not a word processor, but a text processor.

TeX has been extended a few times over the past 40 or so years, and it has accrued an entire ecosystem of add-ons, bells and whistles, and documentation. If you’re willing to put in the work, you can get a seriously beautiful document. By default, it comes out in DVI format, which is relatively arcane and not really useful to anyone. But far more common these days is its PDF option. Its print-ready PDF option.

I don’t mind writing a bit of code. I’d rather do that than play around in a word processor GUI, clicking at buttons and tweaking margins. Give me the linear word any day of the week. So I decided I’d try to use TeX (actually, the much simpler wrapper LaTeX, and be absolutely sure you capitalize that one right!) with Pandoc to make a printable PDF of one of my books.

Writing my memoir

The full story is going to play out over the next few weeks. I’ve been searching for new material for the “Code” posts here, and now I’ve found it: a deep look into what it takes for me, a very non-artistic writer experienced with programming in multiple languages and environments, to create something that looks like a book.

In the first of multiple upcoming posts, I’ll look at memoir, a wonderful LaTeX extension (“class”, as they’re called) used for creating books that truly look like they were designed by professionals. It’s not exactly plug-and-play, and I’ll gladly admit that I had to do a lot of work to beat it into shape, but I only had to do it once. Now, every book I write can use the same foundation, the same basic template.

After that, I’ll go back to Pandoc and show you the work I did to convince it to do what I wanted. I’ve never written a horror story before, but this might be the closest to it, from a programmer’s perspective. It was a coding nightmare, one I’m not sure I’m out of yet, but the end result is everything I need in a book, as you’ll see.

How I made a book with Markdown and Pandoc

So I’m getting ready to self-publish my first book. I’ll have more detail about that as soon as it’s done; for now, I’m going to talk a little about the behind-the-scenes work. This post really straddles the line between writing and computers, and there will be some technical bits, so be warned.

The tech

I’ll admit it. I don’t like word processors that much. Microsoft Word, LibreOffice Writer, or whatever else is out there (even the old standby: WordPerfect), I don’t really care for them. They have their upsides, true, but they just don’t “fit” me. I suspect two reasons for this. First, I’m a programmer. I’m not afraid of text, and I don’t need shiny buttons and WYSIWYG styling. Second, I can be a bit obsessive. Presented with all the options of a modern word processor, like fonts and colors and borders and a table of contents, I’d spend more time fiddling with options than I would writing! So, when I want to write, I don’t bother with the fancy office apps. I just fire up a text editor (Vim is my personal choice, but I wouldn’t recommend it for you) and get to it.

“But what about formatting?” you may ask. Well, that’s an interesting story. At first, I didn’t even bother with inline formatting. I used the old-school, ad hoc styling familiar to anybody who remembers USENET, IRC, or email conversations. Sure, I could use HTML, just like a web page would, but the tags get in the way, and they’re pretty ugly. So I simply followed a few conventions, namely:

  • Chapter headers are marked by a following line of = or -.
  • A blank line means a paragraph break.
  • Emphasis (italics or text in a foreign language, for example) is indicated by surrounding _.
  • Bold text (when I need it, which is rare) uses *.
  • Scene breaks are made with a line containing multiple * and nothing else. (e.g., * * *)

Anything else—paragraph indentation, true dashes, block quotes, etc.—I’d take care of when it was time to publish. (“I’ll fix it in post.”) Simple, quick, and to the point. As a bonus, the text file is completely readable.

Mark it up

I based this system on email conventions and the style used by Project Gutenberg for their text ebooks. And it worked. I’ve written about 400,000 words this way, and it’s certainly good for getting down to business. But it takes a lot of post-processing, and that’s work. As a programmer, I like to avoid work.

Enter Markdown. It’s not much more than a codified set of conventions for representing HTML-like styling in plain text, and it’s little different from what I was already using. Sounds great. Even better, it has tool support! (There’s even a WordPress plugin, which means I can write these posts in Markdown, using Vim, and they come out as HTML for you.)

Markdown is great for its intended purpose, as an HTML replacement. Books need more than that, though; they aren’t just text and formatting. And that’s where the crown jewel comes in: Pandoc. It takes in Markdown text and spits out HTML or EPUB. And EPUB is what I want, because that’s the standard for ebooks (except Kindle, which uses MOBI, but that’s beside the point).

Putting the pieces together

All this together means that I have a complete set of book-making tools without ever touching a word processor, typesetting program, or anything of the sort. It’s not perfect, it’s not fancy, and it certainly isn’t anywhere near professional. But I’m not a professional, am I?

For those wondering, here are the steps:

  1. Write book text in Pandoc-flavored Markdown. (Pandoc has its own Markdown extensions which are absolutely vital, like header identifiers and smart punctuation.)

  2. Write all the other text—copyright, dedication, “About the Author”, and whatever else you need. (“Front matter” and “back matter” are the technical terms.) I put these in separate Markdown files.

  3. Create EPUB metadata file. This contains the author, title, date, and other attributes that ebook readers can use. (Pandoc uses a format called YAML for this, but it also takes XML.)

  4. Make a cover. This one’s the hard part for me, since I have approximately zero artistic talent.

  5. Create stylesheet and add styling. EPUB uses the same CSS styling as HTML web pages, and Pandoc helps you a lot with this. Also, this is where I fix things like chapter headings, drop caps/raised initials, and so on.

  6. Run Pandoc to generate the EPUB. (The command would probably look something like this: pandoc --smart --normalize --toc-depth=1 --epub-stylesheet=<stylesheet file> --epub-cover-image=<cover image> -o <output file> <front matter .md file> <main book text file(s)> <back matter .md file> <metadata .yml or .xml file>)

  7. Open the output file in an ebook reader (Calibre, for me) and take a look.

  8. Repeat steps 5 and 6 until the formatting looks right.

  9. Run KindleGen to make a MOBI file. You only need this if you intend to publish on Amazon’s store. (I do, so I had to do this step.)

  10. Bask in the glory of creating a book! Oh, and upload your book to wherever. That’s probably a good idea, too.
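If you’d rather script step 6 than retype it, a small wrapper like the one below is one way to go. It’s only a sketch: the flags are exactly the ones quoted in step 6, the function names and the file names in the usage note are placeholders, and it assumes pandoc is on your PATH. Note that Pandoc 2.0 removed --smart and --normalize, and I believe later versions use --css in place of --epub-stylesheet, so check pandoc --help for the version you have.

```python
import subprocess


def epub_command(stylesheet, cover, output, sources, metadata):
    """Build the argument list for the EPUB run described in step 6."""
    return [
        "pandoc", "--smart", "--normalize", "--toc-depth=1",
        "--epub-stylesheet=" + stylesheet,
        "--epub-cover-image=" + cover,
        "-o", output,
        *sources,   # front matter, main text file(s), back matter, in order
        metadata,   # the .yml or .xml metadata file goes last
    ]


def build_epub(stylesheet, cover, output, sources, metadata):
    """Run Pandoc and raise an error if it exits with a failure code."""
    subprocess.run(
        epub_command(stylesheet, cover, output, sources, metadata),
        check=True,
    )
```

A call might look like build_epub("book.css", "cover.png", "book.epub", ["front.md", "book.md", "back.md"], "meta.yml"), with your own file names swapped in.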

Yeah, there are easier methods. A lot of people seem allergic to the command line; if you’re one of them, this isn’t the way for you. But I’m comfortable in the terminal. As I said, I’m a programmer, so I have to be. The hardest part for me (except the cover) was figuring out the options I needed to make something that looked like a proper ebook.

Even if you don’t use my cobbled-together method of creating an ebook, you still owe it to yourself to check out Pandoc. It’s so much easier, in my opinion, than a word processor or ebook editor. There are even graphical front-ends out there, if that’s what you prefer. But I like working with plain text. It’s easy, it’s readable, and it just works.