Cracking the coding code

Woman in 1940s garb standing in front of a huge machine with lots of rotors

Got an e-mail from a fellow book designer this morning asking, “Do you have a blog post about marking up a MS for the designer/typesetter?” Um, I couldn’t remember; had to search my own blog to find out. I found I’d written two posts in which such issues come into play—

  • May I take your order? (September 30, 2006)—in which I show the sample pages I prepared to instruct a typesetter on a moderately complicated book design
  • How stylish are you? (January 19, 2008)—in which I listed and explained the most common style names I use when marking up or laying out a bookish document

But both of these posts are written from the designer’s desk, whereas my friend was, he later explained, looking for information that might help a fledgling editor (in this case, an editorial intern) understand how to mark up a manuscript. To which I said, “Um, hello, the Chicago Manual?” I know there’s some discussion of markup right there in the front, but I realized I hadn’t consulted that section in the 15th edition in years, and I hadn’t yet checked it in the 16th edition at all. So I looked! And found that there is now a sizable chunk of appendix devoted to markup, with an eye toward producing multiple output formats—print, HTML, e-books, and more. That appendix is heavy going, though, and more theoretical than practical. How might a designer or production editor explain, in, say, under twenty minutes, how a clever intern should mark up a manuscript?

Well. I do think the first two paragraphs in CMS16’s Markup section in Appendix A are a good place to start. Key points:

From A.6, “What is markup?”—

There are four basic ways to apply markup. . . . Briefly, they include markup by pencil on paper; generic markup in a word processor, which is similar to paper markup except that it is typed into the document (see 2.78); word-processing styles (see 2.79); and markup in a formal language such as XML (see 2.80). All of these share a fundamental purpose: to identify each element of a manuscript, from chapter numbers to chapter titles, subheads, paragraphs for running text and block quotations, emphasized text, entries in a bibliography, and so forth.

From A.7, “Semantic and functional markup”—

A formal markup language describes the structure of a document and identifies its components. This type of markup is sometimes referred to as semantic markup because it names the parts of a document rather than describing their appearance. To take one example, though the title Origin of Species might be distinguished from the surrounding text by italics, more meaningful markup would label it as a book title. Such markup would not only differentiate book titles from other types of italicized text (such as names of species or emphasized terms) but would allow them to be presented in something other than the customary italics if desired—for example, boldface or underscored. . . . Moreover, some markup may have formatting or functionality associated with it in one medium but not another. For example, markup that determines hyperlinked text—graphically distinct and clickable in an electronic document—may not be evident on the printed page.

In other words,

  • it doesn’t really matter how you indicate the style changes, as long as you follow a system that makes sense; and
  • mark up what the text is, not what you think it should look like.
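
To make that second point concrete, here is a minimal Python sketch of why semantic labels pay off. The label names ("book-title", "species", "emphasis") and the two output "media" are made up for illustration; they don't come from CMS16 or from any real style sheet. The point is only that text tagged once by what it is can be presented differently in each output format.

```python
# Hypothetical per-medium style sheets: each semantic label maps to a
# presentation template. All labels and templates here are illustrative.
STYLE_SHEETS = {
    "html": {
        "book-title": "<cite>{}</cite>",
        "species": "<i>{}</i>",
        "emphasis": "<em>{}</em>",
    },
    "plain": {
        "book-title": "_{}_",
        "species": "_{}_",
        "emphasis": "*{}*",
    },
}


def render(spans, medium):
    """Render (label, text) pairs; a label of None means plain running text."""
    sheet = STYLE_SHEETS[medium]
    return "".join(
        sheet[label].format(text) if label else text for label, text in spans
    )


sentence = [
    (None, "Darwin discusses "),
    ("species", "Columba livia"),
    (None, " in "),
    ("book-title", "Origin of Species"),
    (None, "."),
]

print(render(sentence, "html"))
print(render(sentence, "plain"))
```

If everything had simply been marked "italic," there would be no way to send the book title one direction and the species name another.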

That’s the theory. Now how do you practice it? Well, that depends on which method, of the four listed in A.6, you are following:

  1. markup by pencil on paper
  2. generic markup in a word processor
  3. word-processing styles
  4. markup in a formal language such as XML

I’ve seen the first three used extensively in the wild, but so far I’ve yet to receive a manuscript marked up in XML. Not holding my breath, either; XML is rat simple in some ways, but it can also make people extremely cranky.

Every production editor, designer, and compositor probably has his or her own preferred method, and if you’re working in house or freelancing for a steady client, you’ll use the house method. But at your next job, or whenever the current production chief retires, they’re likely to use a different system, so you should understand the gist of all four. Going through them one by one, therefore . . .

  1. Markup by pencil on paper

    This is the easiest and lowest tech, and probably therefore still the most common. There’s an illustration of this kind of coding in CMS16, figure 2.5. The basic idea is to write a brief code in the margin and circle it. It’s what we used at St. Martin’s while I was there, and I’ll bet a modest amount of money (hey, I’m underemployed; gimme a break) that it’s what they’re still using four years later. It’s the method I described in the penultimate paragraph of May I take your order?, which highlights the main drawback of this system:

    There’s only one chunk of poetry in this book, but it’s a long and messy one comprising selections from a longer, oddly structured work. The headings on these poetry extracts had all been coded correctly—A and B heads only—on the design dupe in red, and then recoded incorrectly—with three levels of heads—in blue on the setting copy. So I tossed it back on the production editor’s desk with a note asking if it was okay to change them back, and when she said it was, I erased all the (illegible, anyway) codes and remarked them in my color (which is usually brown). Frankly, it’s kind of annoying that the MS comes to me already coded, because quite often I find that the coding is wrong. I spend a lot of time erasing or crossing the codes out and rewriting them.

    Should you need to correct the coding, you either have to erase the old marks—which can be difficult, with some varieties of pencil—or cross them out—which can get ugly. I prefer to cross them out and re-mark in a different color, because it makes it easier for the typesetter to tell what’s going on. If I must erase, I do so as thoroughly as possible (using a Mars eraser or a retractable eraser stick—never the eraser on the end of the pencil, which exists solely to torment and dismay) and then write the new mark right over the old one, so the previous code can’t be read.

    When using this method, it’s rare to label every paragraph. What styles should you mark, then, and how often? The example in CMS shows only the heads being tagged—CT for the chapter title and A for an A-level head; everything else is assumed to be body text. Unless I’m given specific instructions, though, I prefer to err on the side of . . . well, specificity. I’d add the code TNI (per the naming scheme I described in How stylish are you? [BUT: see my comment below, added while I was actually awake]) next to the first paragraph under each heading, and TX next to the first indented paragraph below each of those. After that, I wouldn’t mark anything until the next change in style. I would also err on the side of semantic distinction. I’ll use separate codes for quoted prose extracts, quoted correspondence, and quoted verse, though they might all end up being typeset using essentially the same style.

  2. Generic markup in a word processor

    A lot of people do this in exactly the same way as in method 1, but within the actual file instead of on paper. That is, instead of writing “CT” in the margin next to a chapter title and circling it, they type “<ct>” or “[[CT]]” or “***CT***” at the head of the title itself, and without any space after it. The codes could be wrapped in any characters or groups of characters, as long as they’re not ones that occur in the actual text itself. (Angle brackets are probably a safe choice for a novel, as long as it’s not science fiction, but it’d be unwise to use them to tag a book about anything related to Web technology, since they’re likely to appear in examples of HTML.)
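
    If you have the manuscript as plain text, you can check a set of candidate wrappers before committing to one. This is a rough Python sketch, not a standard tool; the function name and the sample strings are invented for the example.

```python
def safe_delimiters(text, candidates):
    """Return the (open, close) pairs whose characters never occur in text."""
    return [(o, c) for o, c in candidates if o not in text and c not in text]


# The three wrapper styles mentioned above.
CANDIDATES = [("<", ">"), ("[[", "]]"), ("***", "***")]

# Made-up snippets standing in for whole manuscripts.
novel = "She opened the door. [Nobody] was there."
web_book = "Wrap the title in <h1> tags."

print(safe_delimiters(novel, CANDIDATES))     # all three are usable
print(safe_delimiters(web_book, CANDIDATES))  # angle brackets are ruled out
```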

    Another way to do this kind of markup is with paired start and end tags, as recommended in CMS16 2.78, “Generic markup for electronic manuscripts”:

    <cn> . . . </cn> chapter number
    <ct> . . . </ct> chapter title
    <a> . . . </a> first-level subhead (A-head)
    <b> . . . </b> second-level subhead (B-head)
    <ext> . . . </ext> block quotation (prose extract)
    <po> . . . </po> poetry extract
    <note-a> . . . </note-a> first-level subhead in endnotes section
    <tdotb> t with dot below (i.e., when the Unicode character for ṭ is not available in the font being used to prepare the manuscript; see 11.2)
    <! . . . !> instruction to the typesetter—for example, to consult hard copy or page image for proper alignment or other formatting

    The idea here is that you wrap everything in these pairs of labels:

    <ct>How I Spent My Summer Vacation</ct>
    <tnic1>This summer, I went to Italy with Elisabeth and her family. We stayed a week in each of two separate towns.</tnic1>
    <a>Casole d’Elsa</a>
    <tni>For the first week, we—Elisabeth, her husband, their two daughters, Elisabeth’s parents, and I—stayed at a resort tucked amid the hills of Tuscany. At the top of the nearest hill sat the tiny walled town of Casole d’Elsa.</tni>
    <tx>We spent a lot of time at the pool.</tx>

    Depending on your familiarity with and loyalty toward HTML, this markup system may look comforting or terrifying. Or something in between, which is where it sits for me. It’s my feeling that if you’re going to do this much markup, you might as well aim for valid XML. But if it serves as training wheels for less tech-y publishing people who’re trying to make that transition, that’s worthwhile, too, I guess.
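
    One cheap way to see how close paired generic markup is to well-formed XML is to wrap it in a root element and hand it to an XML parser. Here is a sketch using Python's standard library. The `<ms>` wrapper and the function name are my own inventions, and the check assumes the text contains no stray ampersands or angle brackets. Unpaired codes such as `<tdotb>` and typesetter instructions in the `<! . . . !>` form are not XML and would be rejected, so this only works for the paired subset.

```python
import xml.etree.ElementTree as ET


def check_markup(tagged_text):
    """Parse paired generic codes as XML.

    Returns a list of (code, text) pairs for the top-level elements, or
    raises ET.ParseError if a tag is unclosed or mismatched.
    """
    root = ET.fromstring(f"<ms>{tagged_text}</ms>")  # <ms> is a made-up root
    return [(el.tag, el.text or "") for el in root]


sample = (
    "<ct>How I Spent My Summer Vacation</ct>\n"
    "<a>Casole d’Elsa</a>\n"
    "<tx>We spent a lot of time at the pool.</tx>\n"
)

for code, text in check_markup(sample):
    print(code, "|", text)

try:
    check_markup("<ct>Title</ct><tx>Missing close")
except ET.ParseError as err:
    print("Markup problem:", err)
```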

  3. Word-processing styles

    This method is described in general terms in CMS16 2.79, “Software-generated styles.” As implemented specifically in MS Word, it has been my preferred method for the last, oh, fifteen years, ever since my programmer friend Steve showed me his super-secret method for quickly applying Word’s built-in styles (which I think was a custom control panel, written in Visual Basic? that supported keyboard shortcuts?). That was in Word 95 for Windows, mind you. MS Word has supported styles since forever, but most Office users still have no idea they exist, much less how to take advantage of them. And a lot of the best ways of taking advantage of them depend on automation, which is a lot less idiot-friendly since Microsoft killed off Visual Basic macros in Office 2008 for Mac.

    Sigh.

    Styles are by no means unique to MS Word. RTF documents can contain styles, too, and most modern word processors can read and write RTF, if not .doc files. I’ve yet to find another program that makes styles as easy to use as Word does, though.[1] And by easy, I mean fast. And by fast, I mean keyboard-controllable. I really don’t want to have to keep switching between keyboard and mouse when I’m doing something as repetitive as styling a book-length manuscript.

    So I’m still using Word 2004, and I have no idea where you’d find the equivalent controls in the newer versions with the “ribbon” interface. In general, though, the process would be as follows:

    1. Set up some styles. This can be tedious, if you use as many styles as I do, but you need only do it once, and then you can save the whole wad in a document template. I tend to use a lot of colors and fonts, to make the styles easy to distinguish at a glance. Since I’m just going to map them over to corresponding styles in InDesign by name, it doesn’t matter how hideous the Word doc ends up looking. If you’re coding an MS for somebody else, you might want to avoid the psychedelic ransom-note look, though.
    2. If the MS contains reasonably consistent localized formatting (e.g., all the A-level heads are bold, and all the B-heads are bold and italicized), use the “Select all” button in the formatting palette to apply styles to identically formatted noncontiguous chunks.
    3. Use search-and-replace to apply styles as thoroughly as possible to less consistent localized formatting.
    4. Use macros to apply styles to other weird but semiconsistent stuff. For example, if all the chapter titles are in all caps, and each one is preceded by a chapter number and followed by an epigraph, I might record a macro that searches for text in all caps, moves the cursor to the head of the line and applies the CT style, moves the cursor up one line and applies the CN style, and then moves the cursor down two lines and applies the EPI style. I’d then step through the document, styling the first three elements in each chapter using this macro.
    5. Once you’ve applied as much styling as you can through automation, check the whole document page by page to deal with whatever you’ve missed. When you’re done, nothing in the document will be styled as “Normal”; every distinct type of element in the text should have a named style attached to it.
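
    The logic of the macro in step 4 can be sketched outside Word, too. The Python below is not a Word macro and touches no Word API; it treats the manuscript as a plain list of paragraph strings and applies the same rule (an all-caps paragraph is a chapter title, the paragraph before it is the chapter number, the one after it is the epigraph). It assumes chapter numbers are numerals; a spelled-out "ONE" would itself look like a title and confuse it. The sample paragraphs are invented.

```python
def style_chapter_openers(paragraphs):
    """Assign CN, CT, and EPI around each all-caps paragraph.

    Returns (style, text) pairs; a style of None means the paragraph still
    needs attention in the page-by-page pass.
    """
    styles = [None] * len(paragraphs)
    for i, text in enumerate(paragraphs):
        # str.isupper() is True only if there is at least one letter and
        # every letter is uppercase, so a bare "1" does not match.
        if text.isupper() and 0 < i < len(paragraphs) - 1:
            styles[i - 1] = "CN"   # chapter number, one line up
            styles[i] = "CT"       # chapter title
            styles[i + 1] = "EPI"  # epigraph, one line down
    return list(zip(styles, paragraphs))


manuscript = [
    "1",
    "HOW I SPENT MY SUMMER VACATION",
    "All roads lead to Rome.",
    "This summer, I went to Italy with Elisabeth and her family.",
]

for style, text in style_chapter_openers(manuscript):
    print(f"{style or '??':<4}{text}")
```

    Anything still showing "??" is what you would pick up by hand in step 5.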

    (All that said, even though I’m still running Office 2004 so I can keep my blessed macros, lately I’ve been doing more of my styling in InDesign. It’s faster to do so using InDesign’s QuickApply than it is to use Word’s formatting palette or the style dropdown on Word’s formatting toolbar. But if you’re an editorial intern who’s being tasked with coding a manuscript, you don’t have that option, so tough luck.)

  4. Markup in a formal language such as XML

    Weeeeeell, as I said, I’ve never personally seen this used on a real project. So although I’ve been wanking around with XML for years, and I could explain it in a general way, my description of XML markup as it’s used in publishing would be, at best, a SWAG. There’s some interesting stuff at Start With XML, but it seems geared more toward management types than toward us lowly manuscript monkeys, and there’ve been no new posts on the blog for more than a year.

    So I’m going to pull a Butterfly McQueen and say, “Lawzy, we got to have a doctor! I don’t know nothin’ ’bout birthin’ [XML] babies!” Anybody know of a good how-to-XMLize-books resource that’s adapted to the meanest understanding—or, at least, to the small-press level of meanness? If you have actual battlefield experience with this, care to drop some knowledge in the comments?

No matter which markup method you use, be sure to make a list of all the styles you ended up applying, with explanations as necessary. Keep your style names short, but don’t make anybody have to guess what they stand for.
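
If the manuscript was coded with typed-in tags (method 2), a few lines of Python can draft that list for you. This is only a sketch: the regular expression assumes angle-bracket codes, and the LEGEND dictionary is a hypothetical stand-in for your own explanations.

```python
import re
from collections import Counter


def style_inventory(tagged_text):
    """Count each opening code, such as <ct> or <tni>, in the text."""
    # Closing tags start with "/", so this pattern skips them.
    return Counter(re.findall(r"<([A-Za-z][\w-]*)>", tagged_text))


# Hypothetical explanations; anything missing gets flagged in the printout.
LEGEND = {
    "ct": "chapter title",
    "a": "first-level subhead (A-head)",
    "tx": "body text",
}

sample = "<ct>Title</ct><a>Head</a><tx>One.</tx><tx>Two.</tx><sba>Sidebar head</sba>"

for code, count in sorted(style_inventory(sample).items()):
    print(f"{code:<6}{count:>3}  {LEGEND.get(code, '?? needs an explanation')}")
```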

Phew. Epic. Any questions? Anyone? Bueller?

Photo: US Navy Cryptanalytic Bombe by brewbooks / J Brew; some rights reserved.

  1. Any recommendations? And don’t say NeoOffice; I use it sometimes, but it makes me sad.

5 thoughts on “Cracking the coding code”

  1. mark up what the text is, not what you think it should look like.

    This is so crucial. I am still coming across (hardcopy) mss with some heads marked “bold, all caps” (or some such) rather than–or in addition to–the correct heading level. Argh! I may have to send the copyeditors to this post the next time I see it….

  2. It occurs to me, in the clear light of day, that I ought to occasionally pay attention to what I’m writing. It has therefore also occurred to me that I’ve gone and done exactly what I just said not to do, ignoring the principle Shelby has confirmed as crucial:

    mark up what the text is, not what you think it should look like.

    So, what the hell am I doing recommending the use of a TNI (text, no indent) style, which is, obviously, presentational rather than semantic? Oops. Do as I think I’m saying, not as I say I’m doing.

    In practice, I do make a bit more of a semantic distinction than that. I use a separate style—C1—for the first paragraph of every chapter, and what I really mean by TNI in the markup context is “first paragraph following a subhead” (though when I’m typesetting, yes, it just means “text, no indent”). I could get more specific than that—A1 for the first para after an A-head, B1, C1, etc.—but I’ve only ever done so while typesetting, for finer control over the text’s appearance. When coding a manuscript for someone else, it’s probably overkill.

    I also use a completely separate set of codes for sidebars (SBA, SBB, SB1, SBTNI, SBTX, SBEXT, SBBL, SBNL, etc.), captions (CAPTNI [which is really CAP1], CAPTX, etc.), tables (TA, TB, TTNI [aka T1], TTX, etc.), or any other stuff that’s not regular body text.

  3. Do you have any thoughts about markup in flat ASCII files?

    People like agents and editors sometimes ask for material submitted in the bodies of email messages, not as attached word-processing documents. Some agents encourage prospective clients to send a sample chapter this way in the initial query, and it’s pretty clear they use “Can you follow these instructions?” for triage of queries. Even if the recipient would be willing to accept something like Microsoft Word format, I’d quite often prefer to send flat ASCII if the recipient is willing to accept it, since flat ASCII is closer to what I use internally and less likely to raise software issues. But then even the most basic typographic effects may not have a representation. Where in a paper manuscript I would use underlining to indicate emphasis (which would likely end up typeset as italics), I’m not sure what to do in ASCII. There are a few different traditions from old-style Internet communication, including *asterisks* and _underscores_, but either of those may be confusing for someone who wasn’t on the Net in the days when they were more widely used. Then there’s the possibility of sending HTML email, but that raises plenty of problems of its own.

  4. Hmm. Excellent question. The last time I routinely received manuscripts in plaintext, they were marked up with Quark XPress tags or Xtags (PDF, 1970 KB), of which I was not exceedingly fond. In this modern day and age, I’d go with Markdown syntax—which is informed by the same classic Internet conventions you mentioned—and include a headnote stating as much, with a link to the Dingus in case the recipient wants to convert it to something prettier.

    As a typesetter, I would be delighted to receive manuscripts in Markdown. That’d save me quite a lot of cleanup work.

    I use Michael Yoshitaka Erlewine’s Markdown for WordPress and bbPress plug-in on this site, which is why if you wrap words in asterisks or underscores in the comments, they come out tagged for emphasis.

    Note also Milian Wolff’s Markdownify and its corresponding anti-dingus, for converting HTML to Markdown via PHP. There are other tools that do this using Python, etc.
