PDFs in EPUBs: Test results

Trojan Horse

What a thing was this, too, which that mighty man wrought and endured in the carven horse, wherein all we chiefs of the Argives were sitting, bearing to the Trojans death and fate!

—Homer, Odyssey (translated by A.T. Murray, 1919)

A couple of weeks ago, Hugh McGuire tweeted this:

https://twitter.com/hughmcguire/status/231068424506322945

and then he blogged the replies he received in his post “Including a PDF in an EPUB.”

This is something I’d been wondering about for a while, too. I remembered Joshua Tallent of eBookArchitects mentioning in at least one workshop that it was possible to embed a PDF in an EPUB, but I’d never tried it. The company I work for publishes a lot of craft books whose print editions have patterns and templates in the back, and so far we’ve either had to suppress the e-book versions entirely or supply those patterns to our e-book readers through the Web. Hugh’s post reminded me that I’d been meaning to test Joshua’s tip to see if it would help solve our pattern problem, so I finally just did that.

TL;DR: Yes, you can embed a PDF in an EPUB so that all its pages are viewable in iBooks and Adobe RMSDK–based readers, but display is wonky and not necessarily readable, and you can’t print the PDF at full size, if at all, so it doesn’t solve my particular problem. It might solve your problems, though, so a more detailed breakdown of what I found follows.

I started by Googling the problem, which led me to two relevant MobileRead threads.

One of those, in turn, led me to this handy piece of the EPUB 2 spec: 2.3.1.1: Items That Are Not OPS Core Media Types, which includes sample code for listing a PDF in an OPF.

To create a test file, I embedded two PDFs in an EPUB from my QA pile.

  1. templates.pdf, the actual patterns from the book in question, which are downloadable from the Web. The e-book already included an HTML page linking to the website, so I added two hyperlinks from there to the PDF (one around a thumbnail GIF of the first page of the PDF, and the other around some text) and stuck the file in the EPUB’s images folder. I added this file to the OPF manifest and used the page it was linked from as the fallback document. Then I added it to the spine as a nonlinear item, two lines below the HTML document that linked to it.

  2. incompetence.pdf, because the article “Unskilled and Unaware of It: How Difficulties in Recognizing One’s Own Incompetence Lead to Inflated Self-Assessments” was the first PDF I found sitting in my downloads folder; go figure. I created an HTML page of the paper’s abstract to use as a fallback—listed only in the manifest—and added the PDF to the spine and NCX.

Relevant lines from the manifest:

[sourcecode lang=”xml”]




[/sourcecode]
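As I read section 2.3.1.1, the gist is that a manifest item whose media type is not an OPS core type needs a `fallback` attribute naming another manifest item. Here is a minimal Python sketch of that rule, run against a made-up manifest rather than the markup from my test file. The ids, the hrefs, the `orphan` item, the `items_missing_fallback` helper, and the abbreviated core-type list are all assumptions for illustration, and the check only looks one level deep instead of following the whole fallback chain.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"

# Partial list of OPS core media types, assumed sufficient for this sketch.
CORE_TYPES = {
    "application/xhtml+xml",
    "application/x-dtbncx+xml",
    "text/css",
    "image/gif",
    "image/jpeg",
    "image/png",
    "image/svg+xml",
}

# Hypothetical manifest: every id and href here is invented for illustration.
SAMPLE_OPF = """
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <manifest>
    <item id="bm4" href="bm4.html" media-type="application/xhtml+xml"/>
    <item id="templates" href="images/templates.pdf"
          media-type="application/pdf" fallback="bm4"/>
    <item id="abstract" href="abstract.html" media-type="application/xhtml+xml"/>
    <item id="incompetence" href="images/incompetence.pdf"
          media-type="application/pdf" fallback="abstract"/>
    <item id="orphan" href="images/orphan.pdf" media-type="application/pdf"/>
  </manifest>
</package>
"""


def items_missing_fallback(opf_text):
    """Return ids of non-core manifest items with no usable fallback."""
    root = ET.fromstring(opf_text.strip())
    items = root.findall(f"{{{OPF_NS}}}manifest/{{{OPF_NS}}}item")
    known_ids = {item.get("id") for item in items}
    problems = []
    for item in items:
        if item.get("media-type") in CORE_TYPES:
            continue  # core types need no fallback
        fallback = item.get("fallback")
        # Flag a missing fallback or one that points at an unknown id.
        if fallback is None or fallback not in known_ids:
            problems.append(item.get("id"))
    return problems


if __name__ == "__main__":
    print(items_missing_fallback(SAMPLE_OPF))
```

In this sample only the `orphan` item is flagged, because the other two PDFs each name an HTML page as their fallback.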

Relevant portion of the spine (bm4 contains the links to templates.pdf):

[sourcecode lang=”xml”]







[/sourcecode]
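To see where a reader that ignores 'linear="no"' will put a nonlinear item, it helps to list the spine in document order. This is a minimal Python sketch, assuming a hypothetical spine shaped like the one I described above. The idrefs are placeholders (only `bm4` is a real id from my file), and `spine_order` is a helper I made up for illustration.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"

# Hypothetical spine: the nonlinear PDF sits two lines below bm4,
# and the second PDF is an ordinary linear item.
SAMPLE_SPINE = """
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <spine toc="ncx">
    <itemref idref="bm4"/>
    <itemref idref="bm5"/>
    <itemref idref="templates" linear="no"/>
    <itemref idref="incompetence"/>
  </spine>
</package>
"""


def spine_order(opf_text):
    """Return (idref, is_linear) pairs in spine order."""
    root = ET.fromstring(opf_text.strip())
    spine = root.find(f"{{{OPF_NS}}}spine")
    order = []
    for ref in spine.findall(f"{{{OPF_NS}}}itemref"):
        # linear defaults to "yes" when the attribute is absent
        is_linear = ref.get("linear", "yes") != "no"
        order.append((ref.get("idref"), is_linear))
    return order


if __name__ == "__main__":
    for position, (idref, is_linear) in enumerate(spine_order(SAMPLE_SPINE), start=1):
        flag = "linear" if is_linear else "nonlinear"
        print(position, idref, flag)
```

A reader that honors the attribute should skip the third entry during normal paging; one that ignores it will show the PDF in third position, right after `bm5`.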

This test file validates using ePubCheck 3.

So, then the fun part! Here’s what I found when I tried to view this EPUB using what I had lying around:

iBooks on an iPad

Nonlinear PDF

  • Opens just like a zoomed image, from either the link-wrapped GIF or the linked text.
  • The PDF pages scroll vertically.
  • Horizontal swipe gestures have no effect; you must tap the “Done” button to return to the HTML page from which the PDF was linked.
  • In landscape orientation, the PDF opens to fit the width of the screen.

Linear PDF

  • The PDF pages scroll vertically.
  • Horizontal swipe gestures flip between the PDF and the adjacent HTML documents, as usual.
  • In landscape view, if the PDF falls on a recto page, it is scaled to fit the width of that page (i.e., tiny). If, however, the PDF falls on a verso page, the PDF is scaled to fit the full width of the screen, but only the left side of the PDF is visible. The right half is overlaid with the recto page containing the next (HTML, in my test file) section of the EPUB. It scrolls vertically but not horizontally, so there is no way to view the right half of the file.

Correct, though too small to actually read:
iBooks PDF-rendering in landscape view

Not so correct:
iBooks PDF-rendering glitch

In any case, you can’t print from iBooks, as far as I know, and I can’t imagine a way to get PDFs that are embedded in an EPUB to display in any other PDF-capable iOS app. So although iBooks has the best support for this trick among the apps I tested (which is not saying much), this still doesn’t help me at all.

Also, for the record, iBooks crashes whenever I close the book. But maybe that’s just me? In my experience, iBooks crashes pretty much every time I blink.

Nook Color, Nook Simple Touch

Nonlinear PDF

  • Tapping the linked thumbnail image does nothing except turn the page or open the menu at the bottom of the screen. It’s the same as tapping a section of plain text. Typical Nook shenanigans. The text link to the PDF works fine.
  • The 'linear="no"' attribute is ignored. If you page to the position in the document that corresponds to where this PDF is listed in the spine, the nonlinear file is right there in plain sight.
  • Some (but not all!) of the formatting in the PDF has been stripped out. The text is illegibly small and not affected by the Nook software’s font-size button.

Linear PDF

  • The PDF appears right where it should be, according to the spine.
  • Most of the PDF’s formatting has been stripped out, and the text is illegibly small.

PDF page viewed on a Nook Color (three screenshots)

I’d say embedding PDFs in EPUBs intended for the Nook is a baaaaaad idea.

Adobe Digital Editions (ADE) 1.7.2.1131

Nonlinear PDF

  • Clicking the linked thumbnail image does nothing. The text link to the PDF works fine.
  • The 'linear="no"' attribute is ignored. The linked PDF can be found in the position corresponding to where it is listed in the spine.
  • ADE opens each PDF at the width of an HTML EPUB page. The PDF is shown one page per spread (on the leftmost page, if you’re using a multicolumn view) and can be paged through normally.

Linear PDF

  • Displayed the same as the nonlinear PDF, at its designated position in the spine.

This version of ADE allows you to print pages from an EPUB, if the file is not locked down. It does not, however, print PDF pages at 100%, but rather at about 75%. Not helpful for my craft pattern problem, but acceptable for some other types of content.

ADE 1.8.0 Preview

Nonlinear PDF

  • The 'linear="no"' attribute is ignored. Spine position is maintained.
  • Clicking the image link on the HTML page works!
  • PDF display is the same as in ADE 1.7.2.

Linear PDF

  • Displayed the same as the nonlinear PDF, at its designated position in the spine.

There is no option to print from ADE 1.8.0 Preview, even if the file lacks DRM. Because useful features give you cancer.

Run through KindleGen 2.5.1

Fails to compile. KindleGen (MAC OSX V2.5 build 0626-3a91e28) ignores both internal PDFs and therefore throws an error when it finds references to them:

[sourcecode lang=”text”]
Info(prcgen):I1006: Resolving hyperlinks
Info(prcgen):I1010: Writing hyperlinks
Warning(prcgen):W14001: Hyperlink not resolved: /var/folders/RZ/RZlyxmL8Feuj2mWkVh8TZLNQTjQ/-Tmp-/mobi-pq60cV/OEBPS/images/templates.pdf
Warning(prcgen):W14002: Some hyperlinks could not be resolved.
. . .
Info(prcgen):I1049: Building table of content URL: /var/folders/RZ/RZlyxmL8Feuj2mWkVh8TZLNQTjQ/-Tmp-/mobi-pq60cV/ncx.ncx
Error(prcgen):E24010: Hyperlink not resolved in toc:/var/folders/RZ/RZlyxmL8Feuj2mWkVh8TZLNQTjQ/-Tmp-/mobi-pq60cV/OEBPS/images/incompetence.pdf#
Error(prcgen):E24001: The table of content could not be built.
Info(prcgen):I1038: MOBI file could not be generated because of errors!
[/sourcecode]
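Since KindleGen treats a table-of-contents entry that points at a PDF as a fatal error, a quick pre-flight scan of the NCX can flag the problem before conversion. This is a minimal Python sketch against a hypothetical, trimmed-down NCX. The ids, the labels, and the `ncx_pdf_targets` helper are made up for illustration, and none of this is part of KindleGen itself.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"

# Hypothetical NCX, trimmed to the navMap; ids and labels are invented.
SAMPLE_NCX = """
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="np1" playOrder="1">
      <navLabel><text>Templates</text></navLabel>
      <content src="bm4.html"/>
    </navPoint>
    <navPoint id="np2" playOrder="2">
      <navLabel><text>Unskilled and Unaware of It</text></navLabel>
      <content src="images/incompetence.pdf"/>
    </navPoint>
  </navMap>
</ncx>
"""


def ncx_pdf_targets(ncx_text):
    """Return every NCX content src whose file part ends in .pdf."""
    root = ET.fromstring(ncx_text.strip())
    targets = []
    for content in root.iter(f"{{{NCX_NS}}}content"):
        src = content.get("src", "")
        # Drop any fragment identifier before checking the extension.
        path = src.split("#", 1)[0]
        if path.lower().endswith(".pdf"):
            targets.append(src)
    return targets


if __name__ == "__main__":
    for src in ncx_pdf_targets(SAMPLE_NCX):
        print("NCX entry points at a PDF:", src)
```

Any entry this reports would need to be removed or retargeted at an HTML page in a Kindle-specific copy of the file; the W14001 warnings for ordinary hyperlinks are a separate matter and are not covered here.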

Conclusion

Embedding PDFs may be useful, as Joshua suggests, for providing alternate versions of large data tables . . . if your EPUB is being sold in the iBookstore. On a Nook, however, the PDF version of a table may actually turn out to be less legible than the HTML or GIF version. Given the inconsistent and buggy support for PDFs in this small sample of e-readers, I’d say this trick causes more problems than it solves. Maybe when EPUB 3 support comes prancing in on the back of a unicorn . . . ?

Or is there something I’m missing about how to do this?

Photo: Trojan Horse at the Mt Olympus Theme Park, Wisconsin Dells, WI by mac9001 / Maciej Ciupa; some rights reserved.

4 Responses

  1. karen jensen, August 27, 2012 at 2:35 pm

    I work with PDF files to estimate construction projects. The squirreliness of scale when printing them has driven me to digital takeoff tools. Often the percentage by which the hard copies are off is a large fraction of the margin of error I’m trying to stay within.

  2. mokane, March 13, 2015 at 12:24 pm

    You could try running the PDF through this open source program if it’s letter size or A4 (http://willus.com/k2pdfopt/).

    Did you try regenerating the PDF in a more e-book-friendly size? Would a fixed-layout EPUB solve the problem, assuming you could regenerate the PDF at the same dimensions as the fixed-layout EPUB, i.e., not letter size or A4?
